How to Block CCBot

Complete guide to blocking CCBot (Common Crawl) from crawling your website using robots.txt, server configuration, and Switch workflows.

Operated by Common Crawl | Commercial Crawlers

Should You Block CCBot?

CCBot collects web pages for Common Crawl's open dataset, which is widely used to train AI models. Blocking it keeps your content out of future Common Crawl datasets without affecting your search visibility.

Blocking CCBot is a common choice for sites that want to control how their content is used in AI training.

Blocking Methods

1. robots.txt

Effectiveness: High for cooperative crawlers

Add a Disallow rule for CCBot's user-agent string in your robots.txt file. This is the standard, cooperative method that well-behaved crawlers respect.

2. Server-side UA filtering

Effectiveness: High

Configure your web server or CDN (nginx, Apache, Cloudflare) to reject requests whose User-Agent header matches CCBot's user-agent patterns. The request is rejected at the server or edge, before your application processes it.
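The same User-Agent check can also be sketched in Python for setups where the server or CDN configuration is out of reach. This is a minimal illustration, not the nginx, Apache, or Cloudflare configuration itself: it assumes a WSGI application, and `block_ccbot` and `hello_app` are hypothetical names introduced here.

```python
import re

# Case-insensitive match on the "CCBot" token from this guide's pattern list.
CCBOT_PATTERN = re.compile(r"ccbot", re.IGNORECASE)


def block_ccbot(app):
    """Wrap a WSGI app so requests with a CCBot User-Agent receive a 403."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if CCBOT_PATTERN.search(user_agent):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Any other client is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Hypothetical stand-in for the real application."""
    body = b"Hello\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"),
         ("Content-Length", str(len(body)))],
    )
    return [body]


# The object a WSGI server would be pointed at.
application = block_ccbot(hello_app)
```

Because the check runs inside the Python process, the request still reaches your application server. Filtering in nginx, Apache, or the CDN rejects it earlier.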

3. Switch Journey Workflows

Effectiveness: Highest (granular, real-time control)

Create a custom journey in Switch that detects CCBot and routes it to a block action, challenge, redirect, or modified content — without touching your server configuration.

robots.txt — Block CCBot

Add the following to your robots.txt file (at the root of your domain) to block CCBot:

User-agent: CCBot
Disallow: /

User-agent: ccbot
Disallow: /

robots.txt — Allow with Restrictions

Alternatively, allow CCBot on most pages while blocking specific directories:

User-agent: CCBot
Disallow: /private/
Allow: /

User-agent: ccbot
Disallow: /private/
Allow: /
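Either rule set above can be sanity-checked before deployment. The sketch below uses Python's standard-library `urllib.robotparser`. It only approximates how CCBot itself reads the file, and `example.com` and the helper name `can_fetch` are placeholders introduced for illustration. Note that Python's parser applies the first matching rule in file order, which is why `Disallow: /private/` comes before `Allow: /`.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" rules from this guide.
BLOCK_ALL = """\
User-agent: CCBot
Disallow: /
"""

# The "allow with restrictions" rules from this guide.
RESTRICTED = """\
User-agent: CCBot
Disallow: /private/
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, rules in (("block all", BLOCK_ALL), ("restricted", RESTRICTED)):
        for path in ("/", "/private/report.html", "/blog/"):
            url = "https://example.com" + path
            print(name, path, can_fetch(rules, "CCBot", url))
```

Python's parser compares user-agent names case-insensitively, so a single `CCBot` group is enough for this check.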

CCBot User-Agent Strings

Use these patterns to identify CCBot in your server logs or firewall rules:

CCBot
ccbot
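To see whether CCBot is already visiting, the patterns above can be matched against the User-Agent field of an access log. The following is a rough sketch that assumes the log uses the common "combined" format, where the User-Agent is the last quoted field. `ccbot_paths` is a hypothetical helper name, and the regular expression will need adjusting for other formats.

```python
import re
from collections import Counter

# Case-insensitive match covering both "CCBot" and "ccbot".
CCBOT_PATTERN = re.compile(r"ccbot", re.IGNORECASE)

# Assumed combined log format: ... "METHOD /path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" .* "(?P<ua>[^"]*)"\s*$'
)


def ccbot_paths(log_lines):
    """Count requested paths for log lines whose User-Agent matches CCBot."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # Skip lines that do not fit the assumed format.
        if CCBOT_PATTERN.search(match.group("ua")):
            hits[match.group("path")] += 1
    return hits
```

Matching only the User-Agent field, rather than the whole line, avoids false positives from URLs or referrers that happen to contain "ccbot".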

Frequently Asked Questions

Does blocking CCBot affect my Google search rankings?

No. Blocking CCBot does not affect your Google search rankings. Only blocking Googlebot impacts Google Search visibility.

Does CCBot respect robots.txt?

Yes, CCBot respects robots.txt directives. Adding a Disallow rule for its user-agent will prevent it from crawling blocked paths.

Can I allow CCBot on some pages but not others?

Yes. Use robots.txt to disallow specific directories, or use Switch journey workflows for granular page-level control with conditional logic.

Go beyond robots.txt

Switch detects CCBot in real time and lets you build custom journey workflows to block, challenge, redirect, or serve modified content. No server changes required.
