What is robots.txt?
A text file at a website's root that tells crawlers which pages they can and cannot access.
robots.txt is a plain text file placed at the root of a website (e.g., example.com/robots.txt) that provides instructions to web crawlers about which pages or sections they should or shouldn't access. It follows the Robots Exclusion Protocol, a voluntary standard that cooperative crawlers respect.
The file uses simple directives: User-agent specifies which crawler the rules apply to, Disallow blocks access to specific paths, and Allow permits access to paths within an otherwise disallowed directory. Crawl-delay suggests a minimum wait between requests, though not all crawlers honor it (Googlebot, for example, ignores it). A Sitemap directive points crawlers to your XML sitemap.
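For illustration, here is a minimal robots.txt combining these directives; the paths, bot name, and sitemap URL are placeholders, not recommendations:

```
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Crawl-delay: 10

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```

Rules are grouped by User-agent: the first group applies to all crawlers, while the second blocks one specific crawler from the entire site.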
Important limitation: robots.txt is advisory, not enforced. Well-behaved crawlers (Googlebot, GPTBot, ClaudeBot) respect it, but malicious scrapers and browser-based agents ignore it entirely. For enforceable access control, you need server-side measures or tools like Switch that can block or challenge non-compliant visitors.
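As a sketch of what server-side enforcement means in contrast to robots.txt: the check below runs on every request, so a crawler cannot simply ignore it. The names (BLOCKED_AGENTS, is_blocked) are hypothetical, not a real framework API:

```python
# Hypothetical server-side check: reject requests from crawlers known to
# ignore robots.txt. A real deployment would run this in a request handler
# or reverse-proxy rule before serving any content.

BLOCKED_AGENTS = ("badbot", "scrapercorp")  # illustrative bot names

def is_blocked(user_agent: str) -> bool:
    """Return True if the request should be rejected (e.g., with HTTP 403)."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENTS)

print(is_blocked("Mozilla/5.0 (compatible; BadBot/1.0)"))    # True
print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1)")) # False
```

Note the gap this leaves: user-agent strings are trivially spoofed, which is why matching on them alone is not sufficient and behavioral detection becomes necessary.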
How Switch Helps
Switch complements robots.txt by providing enforceable access control for crawlers that ignore robots.txt, plus behavioral detection for agents that don't use identifiable user-agent strings.
Get Started Free