Commercial Crawlers · Active

GPTBot

OpenAI's training data crawler for GPT models including ChatGPT and GPT-4.

Operated by OpenAI

What is GPTBot?

GPTBot is OpenAI's primary web crawler for collecting training data for GPT models, including ChatGPT, GPT-4, and future models. It crawls web pages at scale to build the datasets used during model pre-training and fine-tuning.

GPTBot is the most widely discussed AI training crawler, largely because of OpenAI's market prominence. It respects robots.txt, and OpenAI publishes its IP ranges at openai.com/gptbot.json. Compared with OpenAI's real-time browsing agents, its crawl rate is moderate (around 100 pages/hour on major sites).
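Because a user-agent header can be spoofed, the published IP ranges are useful for confirming that a request really came from GPTBot. The following is a minimal sketch, assuming you have already loaded the CIDR blocks from the published file into a list of strings. The function name `ip_in_ranges` and the sample ranges are hypothetical; the ranges shown are reserved documentation blocks, not OpenAI's actual addresses.

```python
import ipaddress


def ip_in_ranges(ip, cidrs):
    """Return True if `ip` falls inside any CIDR block in `cidrs`."""
    addr = ipaddress.ip_address(ip)
    # strict=False tolerates blocks written with host bits set.
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in cidrs)


# Placeholder ranges (RFC 5737 documentation blocks), NOT OpenAI's real ranges.
SAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/28"]

print(ip_in_ranges("192.0.2.17", SAMPLE_RANGES))   # True
print(ip_in_ranges("203.0.113.5", SAMPLE_RANGES))  # False
```

A request that claims to be GPTBot but originates outside the published ranges is likely an impersonator and can be treated like any other unidentified client.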

The decision to allow or block GPTBot is one of the most consequential AI policy decisions site owners face today. Allowing it means your content may influence GPT model behavior; blocking it keeps your content out of training but has no effect on ChatGPT's real-time browsing (that's ChatGPT-User) or search features (that's OAI-SearchBot).

User-Agent Strings

These are the known user-agent patterns used by GPTBot. Use them to identify this crawler in your server logs or configure robots.txt rules.

GPTBot
gptbot
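To identify the crawler in server logs, a case-insensitive substring match on these patterns is usually enough. This is a minimal sketch; the helper name `is_gptbot` and the sample header values are illustrative assumptions, and the exact user-agent string GPTBot sends may differ.

```python
import re

# Matches both "GPTBot" and "gptbot" anywhere in the header value.
GPTBOT_PATTERN = re.compile(r"gptbot", re.IGNORECASE)


def is_gptbot(user_agent):
    """Return True if the user-agent header matches a known GPTBot pattern."""
    return bool(GPTBOT_PATTERN.search(user_agent or ""))


# Illustrative header values, not copied from real traffic.
print(is_gptbot("Mozilla/5.0 (compatible; GPTBot/1.0)"))       # True
print(is_gptbot("Mozilla/5.0 (compatible; ChatGPT-User/1.0)"))  # False
```

Matching on the header alone identifies what a client claims to be, not what it is, so this check is best paired with IP verification.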

robots.txt example:

User-agent: GPTBot
Disallow: /private/
Allow: /
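Before deploying rules like these, it can help to check how a parser interprets them. The sketch below uses Python's standard `urllib.robotparser` to evaluate the example above alongside a full block (`Disallow: /`). The helper `build_parser` and the example.com URLs are hypothetical. Python's parser applies the first matching rule, while some crawlers use longest-match precedence; for these two rule sets both approaches give the same answer.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The partial rules from the example above.
PARTIAL_RULES = """\
User-agent: GPTBot
Disallow: /private/
Allow: /
"""

# A full block: keeps GPTBot out of the entire site.
FULL_BLOCK = """\
User-agent: GPTBot
Disallow: /
"""

partial = build_parser(PARTIAL_RULES)
blocked = build_parser(FULL_BLOCK)

print(partial.can_fetch("GPTBot", "https://example.com/private/report.html"))  # False
print(partial.can_fetch("GPTBot", "https://example.com/blog/post"))            # True
print(blocked.can_fetch("GPTBot", "https://example.com/blog/post"))            # False
# Other agents are unaffected by a GPTBot-only rule group.
print(blocked.can_fetch("ChatGPT-User", "https://example.com/blog/post"))      # True
```

The last check illustrates why a GPTBot rule leaves ChatGPT-User and OAI-SearchBot untouched: each agent needs its own rule group.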

How to Manage GPTBot

1. Block GPTBot in robots.txt if you don't want your content used for AI training.

2. This does NOT affect ChatGPT browsing or search; those are separate agents.

3. Use Switch journeys to serve modified content specifically to GPTBot.

4. Monitor GPTBot crawl patterns to understand which content interests OpenAI.
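Monitoring crawl patterns can start with a simple tally of which paths GPTBot requests. The sketch below assumes access logs in the common "combined" format (request line, status, size, referrer, user agent); the function name `gptbot_paths` and the sample log lines are hypothetical, so adjust the regular expression if your server logs differently.

```python
import re
from collections import Counter

# Pulls the request path and user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def gptbot_paths(lines):
    """Count requests per path for lines whose user agent mentions GPTBot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "gptbot" in match.group("ua").lower():
            counts[match.group("path")] += 1
    return counts


# Made-up log lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2025:10:00:00 +0000] "GET /blog/a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.10 - - [01/Jan/2025:10:00:05 +0000] "GET /blog/a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.10 - - [01/Jan/2025:10:00:09 +0000] "GET /docs/b HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:10:01:00 +0000] "GET /blog/a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for path, hits in gptbot_paths(SAMPLE_LOG).most_common():
    print(path, hits)
# /blog/a 2
# /docs/b 1
```

Run over real logs, the most frequently requested paths give a rough picture of which sections of the site the crawler visits most.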

