Commercial Crawlers | Active

ICC-Crawler

Name: ICC-Crawler
Author: NICT

Japan's NICT research crawler for AI and multilingual data collection.

What is ICC-Crawler?

ICC-Crawler is operated by Japan's National Institute of Information and Communications Technology (NICT) for collecting multilingual web data for AI and machine learning research. NICT is a government-funded research institute focused on information and communications technology.

The crawler supports Japan's national AI research initiatives, particularly in multilingual natural language processing. Data collected feeds into research projects that advance Japanese language AI capabilities and cross-lingual understanding.

ICC-Crawler is one of the few government-backed AI research crawlers, representing the growing role of national institutions in AI development. It operates at low volumes and is primarily interested in multilingual content.

User-Agent Strings

These are the known user-agent patterns used by ICC-Crawler. Use them to identify this crawler in your server logs or configure robots.txt rules.

ICC-Crawler
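To illustrate the log-identification use, here is a minimal Python sketch. It assumes access logs in the common "combined" format, where the user agent is the last quoted field on each line. The sample lines, IP addresses, and version string are hypothetical placeholders, not real ICC-Crawler traffic.

```python
import re

# Token taken from the user-agent pattern listed above, lowercased for comparison.
UA_TOKEN = "icc-crawler"

# Assumption: combined log format, where the user agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def is_icc_crawler(user_agent):
    """Return True if the user-agent string contains the ICC-Crawler token."""
    return UA_TOKEN in user_agent.lower()


def icc_crawler_lines(log_lines):
    """Return the log lines whose user-agent field matches ICC-Crawler."""
    hits = []
    for line in log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match and is_icc_crawler(match.group(1)):
            hits.append(line)
    return hits


# Hypothetical sample lines for illustration only.
sample_log = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /ja/ HTTP/1.1" 200 512 '
    '"-" "ICC-Crawler/0.0 (hypothetical version string)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:02 +0000] "GET / HTTP/1.1" 200 1024 '
    '"-" "Mozilla/5.0"',
]

print(len(icc_crawler_lines(sample_log)))  # prints 1
```

A substring match on the user agent is easy to spoof, so treat the result as an indication rather than proof of origin.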

robots.txt example:

User-agent: ICC-Crawler
Disallow: /private/
Allow: /
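One way to check how these rules are interpreted is to feed them to Python's standard-library `urllib.robotparser`. This is a sketch only: the `example.com` URLs are placeholders, and the parser reflects Python's reading of the file, which may differ from how ICC-Crawler itself evaluates it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the example above.
ROBOTS_TXT = """\
User-agent: ICC-Crawler
Disallow: /private/
Allow: /
"""


def icc_crawler_may_fetch(url, robots_txt=ROBOTS_TXT):
    """Return True if the given rules permit ICC-Crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("ICC-Crawler", url)


# Placeholder URLs for illustration.
print(icc_crawler_may_fetch("https://example.com/private/report.html"))  # False
print(icc_crawler_may_fetch("https://example.com/blog/post.html"))       # True
```

Python's parser applies the first matching rule in file order, which is why `Disallow: /private/` is listed before `Allow: /` here.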

How to Manage ICC-Crawler

Government-backed research crawler with minimal commercial implications.

Very low crawl rates, so bandwidth impact is negligible.

Particularly relevant for multilingual and Japanese-language content.

Use Switch to monitor it as part of your overall AI crawler tracking.

How to block ICC-Crawler
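
The simplest way to block ICC-Crawler is a robots.txt group with `User-agent: ICC-Crawler` followed by `Disallow: /`, which relies on the crawler honoring that file. For enforcement at the application layer, the following is a minimal sketch, assuming a Python WSGI application. The names `block_icc_crawler`, `hello_app`, and `status_for` are hypothetical and exist only for this illustration.

```python
from wsgiref.util import setup_testing_defaults

# Token from the ICC-Crawler user-agent pattern, lowercased for comparison.
BLOCKED_TOKEN = "icc-crawler"


def block_icc_crawler(app):
    """Wrap a WSGI app so requests identifying as ICC-Crawler get a 403."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_TOKEN in user_agent.lower():
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Hypothetical stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


def status_for(user_agent):
    """Send one fake request through the wrapped app and return its status."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    captured = []

    def start_response(status, headers, exc_info=None):
        captured.append(status)

    list(block_icc_crawler(hello_app)(environ, start_response))
    return captured[0]


print(status_for("ICC-Crawler"))  # 403 Forbidden
print(status_for("Mozilla/5.0"))  # 200 OK
```

The same check can be expressed in a web server or CDN rule instead. The middleware form is shown only because it is self-contained.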

Related Agents

Name: AI2Bot
Category: Commercial Crawlers
Author: Allen AI
Allen Institute for AI's research crawler for academic AI development.

Name: Amazonbot
Category: Commercial Crawlers
Author: Amazon
Amazon's web crawler powering Alexa, Amazon search, and AI services.

Name: Applebot-Extended
Category: Commercial Crawlers
Author: Apple
Apple's AI training token controlling how Applebot data is used for Apple Intelligence.

Name: Bytespider
Category: Commercial Crawlers
Author: ByteDance
ByteDance's web crawler for TikTok AI and LLM training data.

Name: CCBot
Category: Commercial Crawlers
Author: Common Crawl
Common Crawl's open-source web archive used by multiple AI companies for training.

Name: ClaudeBot
Category: Commercial Crawlers
Author: Anthropic
Anthropic's web crawler collecting training data for Claude models.