Commercial Crawlers · Active

Diffbot

Diffbot's AI-powered web scraping and knowledge graph crawler.

Operated by Diffbot

What is Diffbot?

Diffbot operates AI-powered web scraping infrastructure that builds a comprehensive knowledge graph from the public web. Unlike traditional crawlers, Diffbot uses computer vision and natural language processing (NLP) to interpret page structure and extract structured data automatically.

Diffbot's technology is used by companies worldwide for competitive intelligence, lead generation, and data enrichment. Its knowledge graph contains billions of entities extracted from web pages, making it one of the largest commercial web data products.

The crawler identifies itself as "Diffbot" and visits pages to extract structured information such as product details, article content, organization data, and person profiles. This data feeds Diffbot's commercial APIs, which are used by thousands of businesses.

User-Agent Strings

These are the known user-agent patterns used by Diffbot. Use them to identify this crawler in your server logs or configure robots.txt rules.

Diffbot
diffbot
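Because the token appears in both capitalizations, log filters should match it case-insensitively. The sketch below assumes your server writes the Combined Log Format (adjust the regex for other layouts). It keeps lines whose user agent contains "diffbot" and counts them per hour, which also gives a rough check on the crawl rate. The function names and sample log lines are invented for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes the Combined Log Format:
# host ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def is_diffbot(user_agent: str) -> bool:
    """Case-insensitive substring test, so 'Diffbot' and 'diffbot' both match."""
    return "diffbot" in user_agent.lower()

def diffbot_hits_per_hour(lines) -> Counter:
    """Count log lines from Diffbot, bucketed by the hour of the request timestamp."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or not is_diffbot(match.group("agent")):
            continue  # unparseable line or some other client
        # Timestamps look like 10/Oct/2024:13:55:36 +0000
        stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts

# Invented sample lines; real user-agent strings may carry more text around the token.
sample = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /products/1 HTTP/1.1" 200 512 "-" "Diffbot"',
    '203.0.113.5 - - [10/Oct/2024:13:58:02 +0000] "GET /products/2 HTTP/1.1" 200 498 "-" "diffbot"',
    '198.51.100.7 - - [10/Oct/2024:14:01:10 +0000] "GET / HTTP/1.1" 200 1024 "-" "ExampleBrowser/1.0"',
]
print(diffbot_hits_per_hour(sample))  # Counter({'2024-10-10 13:00': 2})
```

A substring match is deliberately loose: it also catches longer user-agent strings that embed the token. Keep in mind that any client can send this header, so log counts show self-identified Diffbot traffic only.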

robots.txt example:

User-agent: Diffbot
Disallow: /private/
Allow: /
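One way to sanity-check rules like these before deploying them is Python's standard urllib.robotparser module. This sketch parses the example above verbatim and asks which hypothetical URLs each agent may fetch.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt example above, copied verbatim.
ROBOTS_TXT = """\
User-agent: Diffbot
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() takes an iterable of lines

private_url = "https://example.com/private/report.html"  # hypothetical URLs
public_url = "https://example.com/blog/"

print(parser.can_fetch("Diffbot", private_url))    # False
print(parser.can_fetch("diffbot", private_url))    # False: agent matching ignores case
print(parser.can_fetch("Diffbot", public_url))     # True
print(parser.can_fetch("Googlebot", private_url))  # True: no group applies to Googlebot
```

Python's parser applies the first matching rule in file order, whereas RFC 9309 uses the longest matching path. For this file both approaches give the same answers, but listing the more specific "Disallow" line first keeps the result the same under either interpretation.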

How to Manage Diffbot

1. Block Diffbot if you don't want structured data extracted from your pages.
2. Diffbot extracts product information, pricing, and organization data.
3. Expect low to moderate crawl rates.
4. Use Switch to identify Diffbot and serve it limited content if desired.
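Switch's own configuration isn't covered on this page. As a generic, hand-rolled illustration of the idea in item 4 (not Switch's API), the WSGI middleware below returns a reduced page when the User-Agent header contains "diffbot" and passes every other request to the wrapped app. The names limit_for_diffbot, product_page, and fetch, plus the page markup and price, are hypothetical.

```python
# Hypothetical WSGI middleware, a stand-in for the idea in item 4, not Switch's API.
# User-agent headers can be spoofed, so treat this as a courtesy filter, not access control.

def limit_for_diffbot(app, limited_body: bytes):
    """Wrap a WSGI app so requests identifying as Diffbot receive `limited_body`."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if "diffbot" in user_agent.lower():  # matches "Diffbot" and "diffbot"
            start_response("200 OK", [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(limited_body))),
            ])
            return [limited_body]
        return app(environ, start_response)  # everyone else gets the full page
    return middleware

def product_page(environ, start_response):
    """Example full page; the markup and price are made up."""
    body = b"<h1>Widget</h1><p>Price: $19.99</p>"
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]

app = limit_for_diffbot(product_page, b"<h1>Widget</h1>")

def fetch(app, user_agent):
    """Call a WSGI app directly and return (status, body) for quick checks."""
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    body = b"".join(app({"HTTP_USER_AGENT": user_agent}, start_response))
    return captured["status"], body

print(fetch(app, "Diffbot"))             # ('200 OK', b'<h1>Widget</h1>')
print(fetch(app, "ExampleBrowser/1.0"))  # ('200 OK', b'<h1>Widget</h1><p>Price: $19.99</p>')
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() serves the wrapped app. If a shared cache sits in front of the server, the responses would also need a "Vary: User-Agent" header so the limited page isn't served to other visitors.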

How to block Diffbot
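The robots.txt example above only fences off /private/. To keep Diffbot off the entire site, its group needs "Disallow: /" instead. The sketch below uses Python's standard urllib.robotparser to contrast that rule with an empty "Disallow:" value, a common slip because an empty value permits everything. The helper name and URLs are hypothetical. robots.txt is a request that well-behaved crawlers honor, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

def parse_diffbot_group(disallow_value: str) -> RobotFileParser:
    """Parse a one-group robots.txt for Diffbot with a single Disallow line (hypothetical helper)."""
    text = f"User-agent: Diffbot\nDisallow: {disallow_value}\n"
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

block_all = parse_diffbot_group("/")  # "Disallow: /" covers every path, including the root
allow_all = parse_diffbot_group("")   # an empty "Disallow:" value blocks nothing

for path in ("/", "/products/1", "/about"):
    url = "https://example.com" + path  # hypothetical site
    print(path, block_all.can_fetch("Diffbot", url), allow_all.can_fetch("Diffbot", url))
# / False True
# /products/1 False True
# /about False True
```

Other crawlers are unaffected by either file, because the only group targets Diffbot.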

Start managing Diffbot today

Switch detects, tracks, and lets you build custom journeys for Diffbot and 35+ other AI agents and crawlers. Set up in five minutes.
