Commercial Crawlers
Commercial crawlers are operated by AI companies to collect training data for large language models. They scrape web content at scale, often without direct user interaction.
AI2Bot
Commercial Crawlers | Allen AI
Allen Institute for AI's research crawler for academic AI development.
UA: AI2Bot, ai2bot
Amazonbot
Commercial Crawlers | Amazon
Amazon's web crawler powering Alexa, Amazon search, and AI services.
UA: Amazonbot
Applebot-Extended
Commercial Crawlers | Apple
Apple's AI training token controlling how Applebot data is used for Apple Intelligence.
UA: Applebot-Extended
Bytespider
Commercial Crawlers | ByteDance
ByteDance's web crawler for TikTok AI and LLM training data.
UA: Bytespider, bytespider, Bytedance
CCBot
Commercial Crawlers | Common Crawl
Common Crawl's crawler, which builds an open web archive used by multiple AI companies for training.
UA: CCBot, ccbot
ClaudeBot
Commercial Crawlers | Anthropic
Anthropic's web crawler collecting training data for Claude models.
UA: ClaudeBot, claudebot, Claude-Web, anthropic-ai, Anthropic
cohere-ai
Commercial Crawlers | Cohere
Cohere's web crawler for enterprise AI and language model training.
UA: cohere-ai, CohereBot
DeepSeekBot
Commercial Crawlers | DeepSeek
DeepSeek's web crawler for their open-source large language models.
UA: DeepSeekBot, deepseek
Diffbot
Commercial Crawlers | Diffbot
Diffbot's AI-powered web scraping and knowledge graph crawler.
UA: Diffbot, diffbot
Google-Extended
Commercial Crawlers | Google
Google's AI training token controlling use of Googlebot-crawled content for AI.
UA: Google-Extended
GPTBot
Commercial Crawlers | OpenAI
OpenAI's training data crawler for GPT models including ChatGPT and GPT-4.
UA: GPTBot, gptbot
ICC-Crawler
Commercial Crawlers | NICT
Japan's NICT research crawler for AI and multilingual data collection.
UA: ICC-Crawler
Meta-WebIndexer
Commercial Crawlers | Meta
Meta's web indexer for improving Meta AI search and knowledge.
UA: Meta-WebIndexer, meta-webindexer
webzio
Commercial Crawlers | Webz.io
Webz.io's data extraction crawler used by AI companies for training data.
UA: webzio
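The UA tokens listed above can be used to recognize these crawlers in request logs or to draft robots.txt rules. The following is a minimal sketch, not a definitive implementation: the `CRAWLER_TOKENS` table, `match_crawler`, and `robots_txt_block` are hypothetical names introduced for illustration, and the matching rule (case-insensitive substring search on the User-Agent header) is an assumption based on the mixed-case tokens shown in the list.

```python
# Hypothetical lookup table built from the "UA:" lines in the list above.
# Case variants (e.g. "gptbot") are dropped because matching is case-insensitive.
CRAWLER_TOKENS = {
    "AI2Bot": ["AI2Bot"],
    "Amazonbot": ["Amazonbot"],
    "Applebot-Extended": ["Applebot-Extended"],
    "Bytespider": ["Bytespider", "Bytedance"],
    "CCBot": ["CCBot"],
    "ClaudeBot": ["ClaudeBot", "Claude-Web", "anthropic-ai", "Anthropic"],
    "cohere-ai": ["cohere-ai", "CohereBot"],
    "DeepSeekBot": ["DeepSeekBot", "deepseek"],
    "Diffbot": ["Diffbot"],
    "Google-Extended": ["Google-Extended"],
    "GPTBot": ["GPTBot"],
    "ICC-Crawler": ["ICC-Crawler"],
    "Meta-WebIndexer": ["Meta-WebIndexer"],
    "webzio": ["webzio"],
}


def match_crawler(user_agent):
    """Return the listed crawler name whose token appears in the header, else None.

    Assumption: a case-insensitive substring match is good enough. User-Agent
    headers are trivially spoofed, so treat a match as a hint, not as proof.
    """
    ua = user_agent.lower()
    for name, tokens in CRAWLER_TOKENS.items():
        if any(token.lower() in ua for token in tokens):
            return name
    return None


def robots_txt_block(names, path="/"):
    """Build robots.txt groups that disallow `path` for each named token."""
    lines = []
    for name in names:
        lines.append(f"User-agent: {name}")
        lines.append(f"Disallow: {path}")
        lines.append("")  # blank line separates groups
    return "\n".join(lines)


if __name__ == "__main__":
    print(match_crawler("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(robots_txt_block(["Google-Extended", "Applebot-Extended"]))
```

Google-Extended and Applebot-Extended are described above as training tokens rather than crawlers, so they are more likely to be useful in the robots.txt output than in header matching. A plain `Applebot` or `Googlebot` header does not match either token in this sketch, because the full hyphenated token must appear in the header.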
Manage all commercial crawlers with Switch
Detect, track, and build custom response journeys for every commercial crawler visiting your site. Five-minute setup.