webzio
Webz.io's data extraction crawler, whose output is sold to AI companies as training data.
What is webzio?
Webz.io (formerly Omgili) operates a web data extraction service that sells structured web data to AI companies, enterprises, and researchers. Its crawler collects web content and structures it into machine-readable feeds for those downstream consumers.
Unlike company-specific training crawlers, Webz.io acts as a data intermediary: content it crawls may end up in multiple AI training pipelines through its commercial data products. This makes managing Webz.io access an important part of AI content governance.
The crawler identifies itself as "webzio" and operates at low to moderate rates. Webz.io provides data feeds covering news, forums, reviews, and general web content.
User-Agent Strings
These are the known user-agent patterns used by webzio. Use them to identify this crawler in your server logs or configure robots.txt rules.
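As a rough sketch of the log check described above, the Python snippet below counts requests whose user-agent field contains the token "webzio", ignoring case. It assumes the common combined log format, where the user agent is the last quoted field on each line. The sample lines and the function name count_webzio_hits are made up for illustration, and the real user-agent string may carry additional details.

```python
import re

# Assumption: combined log format, where the user agent is the final quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

def count_webzio_hits(log_lines):
    """Count log lines whose user-agent field mentions 'webzio' (case-insensitive)."""
    hits = 0
    for line in log_lines:
        match = UA_FIELD.search(line)
        if match and "webzio" in match.group(1).lower():
            hits += 1
    return hits

# Hypothetical sample lines, not real traffic.
sample = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /news HTTP/1.1" 200 512 "-" "webzio (hypothetical UA string)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:02 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]
print(count_webzio_hits(sample))
```

Matching on the user-agent field rather than the whole line avoids counting requests that merely have "webzio" in a URL or referrer.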
robots.txt example:
User-agent: webzio
Disallow: /private/
Allow: /
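Before deploying rules like these, it can help to confirm they behave as intended. The sketch below feeds the same three directives to Python's standard urllib.robotparser and asks whether the "webzio" agent may fetch a given URL. The example.com URLs and the helper name webzio_can_fetch are illustrative assumptions, and this parser's matching may differ in edge cases from whatever logic the crawler itself applies.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the robots.txt example, one directive per line.
ROBOTS_TXT = """\
User-agent: webzio
Disallow: /private/
Allow: /
"""

def webzio_can_fetch(url, robots_txt=ROBOTS_TXT):
    """Return True if the given rules permit the 'webzio' agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("webzio", url)

# Hypothetical URLs used only to exercise the rules.
print(webzio_can_fetch("https://example.com/private/report.html"))
print(webzio_can_fetch("https://example.com/blog/post.html"))
```

With these rules the first call prints False and the second prints True: paths under /private/ are disallowed and everything else stays open to the crawler.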
How to Manage webzio
Content may be resold to multiple AI companies, so consider blocking webzio if you want to protect your content.
Acts as a data intermediary, not a direct AI model trainer.
Low to moderate crawl rates.
Use Switch to track webzio alongside direct AI training crawlers.
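If you decide to block at the server instead of relying on robots.txt alone, one possible approach is a small WSGI middleware that answers 403 whenever the User-Agent header contains "webzio". This is a minimal sketch under that assumption: block_webzio, demo_app, and status_for are hypothetical names, and a user-agent check only catches requests that identify themselves honestly.

```python
def block_webzio(app):
    """Wrap a WSGI app and answer 403 to requests whose User-Agent mentions 'webzio'."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if "webzio" in user_agent.lower():
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Any other client is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware

def demo_app(environ, start_response):
    """Hypothetical stand-in for your real WSGI application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]

def status_for(user_agent):
    """Simulate one request and return the response status line."""
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    block_webzio(demo_app)({"HTTP_USER_AGENT": user_agent}, start_response)
    return captured["status"]

print(status_for("webzio"))
print(status_for("Mozilla/5.0"))
```

The simulated requests show the wrapper returning 403 Forbidden for the crawler's token and 200 OK for an ordinary browser string.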
Related Agents
AI2Bot
Commercial Crawlers | Allen AI
Allen Institute for AI's research crawler for academic AI development.
Amazonbot
Commercial Crawlers | Amazon
Amazon's web crawler powering Alexa, Amazon search, and AI services.
Applebot-Extended
Commercial Crawlers | Apple
Apple's AI training token controlling how Applebot data is used for Apple Intelligence.
Bytespider
Commercial Crawlers | ByteDance
ByteDance's web crawler for TikTok AI and LLM training data.
CCBot
Commercial Crawlers | Common Crawl
Common Crawl's crawler, which builds an open web archive used by multiple AI companies for training.
ClaudeBot
Commercial Crawlers | Anthropic
Anthropic's web crawler collecting training data for Claude models.