Commercial Crawlers · Active

Bytespider

ByteDance's web crawler for TikTok AI and LLM training data.

Operated by ByteDance

What is Bytespider?

Bytespider is ByteDance's web crawler, used to collect training data for its AI models and services, including TikTok's recommendation algorithms and its LLM products. ByteDance is one of the world's largest tech companies, operating TikTok and several AI research divisions.

The crawler has been observed crawling at high request rates, and ByteDance does not always fully document how the collected data is used. It identifies itself with "Bytespider" or "Bytedance" in the user-agent string.

Bytespider has faced scrutiny for its aggressive crawling behavior and the geopolitical implications of data collection by a China-headquartered company. Site owners should make informed decisions about allowing access based on their content policies and audience.

User-Agent Strings

These are the known user-agent patterns used by Bytespider. Use them to identify this crawler in your server logs or configure robots.txt rules.

Bytespider
bytespider
Bytedance
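As a minimal sketch of matching these patterns against log entries, the Python snippet below does a case-insensitive search for either token. The function name `is_bytespider` and the sample user-agent strings are illustrative assumptions, not strings taken from real Bytespider traffic. User-agent headers can be spoofed, so a match only shows what the client claims to be.

```python
import re

# Case-insensitive match for the tokens listed above.
BYTESPIDER_RE = re.compile(r"bytespider|bytedance", re.IGNORECASE)


def is_bytespider(user_agent: str) -> bool:
    """Return True if the user-agent string contains a known Bytespider token."""
    return bool(BYTESPIDER_RE.search(user_agent))


if __name__ == "__main__":
    # Hypothetical user-agent strings, for illustration only.
    samples = [
        "Mozilla/5.0 (compatible; Bytespider)",
        "ExampleBrowser/1.0",
    ]
    for ua in samples:
        print(ua, "->", is_bytespider(ua))
```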

robots.txt example:

User-agent: Bytespider
Disallow: /private/
Allow: /
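To see how rules like these are interpreted, the sketch below feeds the same three lines to Python's standard `urllib.robotparser` and asks whether a given URL may be fetched. The `example.com` URLs and the helper name `is_allowed` are assumptions for illustration. Note that robots.txt is advisory: it states your preference, and whether a crawler honors it is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above.
RULES = """\
User-agent: Bytespider
Disallow: /private/
Allow: /
"""


def is_allowed(rules: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Placeholder URLs, not real endpoints.
    for url in ("https://example.com/blog/post", "https://example.com/private/data"):
        print(url, "->", is_allowed(RULES, "Bytespider", url))
```

Python's parser applies the first matching rule, which is why `Disallow: /private/` is listed before `Allow: /` here. Other parsers may resolve precedence differently, for example by preferring the longest matching path, so it is worth testing with the tools you rely on.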

How to Manage Bytespider

1. Consider geopolitical and data policy implications before allowing access.

2. Bytespider can crawl aggressively, so use Switch to monitor crawl rates and patterns.

3. Block it in robots.txt if you don't want ByteDance to use your content for AI training.

4. Bytespider is separate from TikTok's social crawler, so manage the two independently.
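The monitoring and blocking steps above can be sketched in plain Python, independent of any particular tool. This is a rough illustration under stated assumptions: `requests_per_minute` expects log records you have already parsed into `(timestamp, user_agent)` pairs, and the function names, record shape, and `example.com` URL are all hypothetical. The block rule itself is the standard robots.txt form for disallowing a whole site for one user agent.

```python
from collections import Counter
from datetime import datetime
from urllib.robotparser import RobotFileParser

PATTERNS = ("bytespider", "bytedance")

# robots.txt rules asking Bytespider to stay off the entire site.
BLOCK_RULES = """\
User-agent: Bytespider
Disallow: /
"""


def requests_per_minute(records):
    """Count Bytespider requests per minute.

    `records` is an iterable of (datetime, user_agent) pairs, assumed to be
    parsed from your own access logs beforehand.
    """
    counts = Counter()
    for timestamp, user_agent in records:
        if any(p in user_agent.lower() for p in PATTERNS):
            # Truncate to the minute so requests are bucketed together.
            counts[timestamp.replace(second=0, microsecond=0)] += 1
    return counts


def is_blocked(rules: str, agent: str, url: str) -> bool:
    """Report whether the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, url)
```

A sudden jump in the per-minute counts is a reasonable signal to review access. Because robots.txt relies on the crawler's cooperation, checking the counts again after adding the block rule shows whether it is being respected.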


