⚡ 25+ AI Crawlers · Toggle Each · Free

AI Robots.txt Generator

Control which AI training crawlers can scrape your site. Block GPTBot, ClaudeBot, Perplexity, Apple, Google, Bytespider — or pick exactly which ones to allow.

Quick presets

🚫 Block all AI crawlers
Recommended for paywalled / premium content sites
🔍 Block training, allow search
Block model training but allow ChatGPT/Perplexity to cite you live
✅ Allow all AI crawlers
Maximum AI visibility — recommended for new content sites

Or pick exactly which bots to block

Your robots.txt — copy & paste into the root of your site

Place this at https://yoursite.com/robots.txt. Most static hosts (Vercel, Netlify, Cloudflare Pages) serve it automatically as long as the file sits in the directory they publish, which is often the project root or a public/ folder.
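If you would rather script the file than copy it, here is a minimal Python sketch of what a "block these bots" output could look like. The function name build_robots_txt and the choice of four bots are assumptions for illustration, not the generator's actual code.

```python
def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows the whole site for each listed agent."""
    lines = []
    for agent in blocked_agents:
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow: /")
        lines.append("")  # a blank line separates one group from the next
    # Every crawler not named above stays allowed.
    lines.append("User-agent: *")
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


# Hypothetical selection; swap in whichever bots you toggled.
robots_txt = build_robots_txt(["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"])
print(robots_txt)
```

Save the printed text as robots.txt in the folder your host publishes.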

What changed in 2024–2026 with AI crawlers

Until 2023, robots.txt was mostly about telling Google and Bing what not to crawl for search. Today, a new generation of crawlers exists: bots that scrape the web specifically to train large language models. The best-known user-agent tokens are GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), Google-Extended (Gemini), Applebot-Extended (Apple Intelligence), Bytespider (ByteDance), and CCBot (Common Crawl, whose data many AI labs use).

Most respect robots.txt. A few don't (or are rumored not to). Major publishers — NYT, Reuters, BBC, Axel Springer — have explicitly blocked them. Smaller sites are split: some want the AI exposure, others want to protect their content from being trained on without compensation.

The 3 main strategies

The 25+ AI crawlers we cover

Beyond the famous ones (GPTBot, ClaudeBot, PerplexityBot), there are crawlers from Cohere (cohere-ai), Diffbot (Diffbot), Amazon (Amazonbot), Meta (FacebookBot, Meta-ExternalAgent), ByteDance/TikTok (Bytespider), Yandex (YandexAI), and several data-collection crawlers (CCBot, Omgili, omgilibot). New ones appear monthly. Our list is updated weekly from public documentation.

Three things people get wrong

Should you also use llms.txt?

Yes. Complement robots.txt with llms.txt, a newer, AI-specific format (still a proposal rather than a ratified standard) that tells LLMs what your site is about and how you would like it to be cited. robots.txt says "don't crawl here"; llms.txt says "here's what I am if you do crawl." They're complementary, not competing.
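To make that concrete, here is a rough Python sketch that writes a small llms.txt. It assumes the layout described in the public llms.txt proposal (a title heading, a one-line summary, then sections of links). The function name, site name, and URLs are made up for illustration.

```python
def build_llms_txt(title, summary, sections):
    """Return llms.txt text: a title, a short summary, then sections of links."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical site details; replace with your own.
llms_txt = build_llms_txt(
    "Example Site",
    "Guides and reference material about widgets.",
    {"Docs": [("Docs", "https://example.com/docs", "Product documentation")]},
)
print(llms_txt)
```

Like robots.txt, the result would go at the root of your site, at /llms.txt.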

FAQ

Will blocking AI crawlers hurt my Google rankings? No. Googlebot is for search; Google-Extended is for Gemini training. Blocking Google-Extended doesn't affect your search ranking.
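You can sanity-check that the two tokens are treated separately with Python's standard-library robots.txt parser. This is only a local sketch: it shows how one parser reads the rules, not how Google itself behaves, and example.com is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Rules that block only the Gemini-training token.
rules = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

googlebot_ok = parser.can_fetch("Googlebot", "https://example.com/post")
extended_ok = parser.can_fetch("Google-Extended", "https://example.com/post")
print("Googlebot allowed:", googlebot_ok)
print("Google-Extended allowed:", extended_ok)
```

Googlebot has no matching group in these rules, so the parser leaves it allowed.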

What if I want to allow some content but not the rest? Scope the rules by path. Under User-agent: GPTBot, add Disallow: /paid/ followed by Allow: /, each on its own line. That blocks paywalled pages and allows public ones.
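Here is a small Python sketch that checks those scoped rules locally with the standard-library parser before you deploy. The two page paths are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# The scoped rules from the answer above, one directive per line.
scoped_rules = """\
User-agent: GPTBot
Disallow: /paid/
Allow: /
"""

scoped = RobotFileParser()
scoped.parse(scoped_rules.splitlines())

# Hypothetical paths: one behind the paywall, one public.
paid_allowed = scoped.can_fetch("GPTBot", "/paid/report.html")
public_allowed = scoped.can_fetch("GPTBot", "/blog/hello.html")
print("paid page allowed:", paid_allowed)
print("public page allowed:", public_allowed)
```

Python's parser applies the first rule that matches, while many crawlers use the longest match. Listing the narrower Disallow before the broad Allow gives the same answer under both.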

How do I verify it's working? For Googlebot, use the robots.txt report in Google Search Console, which replaced the older robots.txt tester. For other bots, check your access logs after deployment. The user-agent strings show which bots are respecting (or violating) your rules.
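For the access-log check, a short Python sketch like this can count requests from AI bots to paths you disallowed. It assumes a combined-style log where the request and user agent appear in quotes. The sample lines, IP addresses, and simplified user-agent strings are invented for illustration.

```python
import re
from collections import Counter

# A few tokens from this page; extend the list to match your robots.txt.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Bytespider", "CCBot"]
# Pulls the path out of a quoted request such as "GET /page HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+"')


def find_violations(log_lines, disallowed_prefixes):
    """Count requests per AI agent that hit a disallowed path prefix."""
    prefixes = tuple(disallowed_prefixes)
    violations = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match or not match.group(1).startswith(prefixes):
            continue
        for agent in AI_AGENTS:
            if agent.lower() in line.lower():
                violations[agent] += 1
    return violations


# Made-up sample lines, not real traffic.
sample_log = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /paid/report.html HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.0"',
    '203.0.113.6 - - [01/Jan/2025:00:00:02 +0000] "GET /blog/hello.html HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; ClaudeBot/1.0"',
    '203.0.113.7 - - [01/Jan/2025:00:00:03 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0; compatible; CCBot/2.0"',
]
violations = find_violations(sample_log, ["/paid/"])
print(dict(violations))
```

User-agent strings can be spoofed, so treat a hit as a lead to investigate, not proof of who made the request.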