What changed in 2024–2026 with AI crawlers
Until 2023, robots.txt was mostly about telling Google and Bing what not to crawl for search. Since then, a new generation of crawlers has arrived: bots that scrape the web specifically to gather training data for large language models. Among them: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), Google-Extended (Gemini), Applebot-Extended (Apple Intelligence), Bytespider (ByteDance), and CCBot (Common Crawl, whose corpus many AI labs train on).
Most respect robots.txt. A few don't (or are rumored not to). Major publishers — NYT, Reuters, BBC, Axel Springer — have explicitly blocked them. Smaller sites are split: some want the AI exposure, others want to protect their content from being trained on without compensation.
The 3 main strategies
- Block all AI crawlers: highest content protection, but you also lose visibility in ChatGPT search, Perplexity citations, Claude Web. Best for paywalled sites and original journalism.
- Block training, allow live retrieval: the nuanced option. `GPTBot` gathers training data; `ChatGPT-User` fetches pages live when a user asks a question. You can allow one and block the other. Hardest to maintain, but it gets you AI exposure without contributing to training.
- Allow all: if your goal is visibility (most marketing sites, blogs), let AI bots in. They'll cite you, drive traffic, and you'll be findable in AI answers.
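A minimal robots.txt sketch of the middle strategy (bot tokens here are the ones OpenAI documents publicly; verify current names before deploying):

```
# Block OpenAI's training crawler...
User-agent: GPTBot
Disallow: /

# ...but allow the live-retrieval agent used when ChatGPT browses for a user.
User-agent: ChatGPT-User
Allow: /
```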
The 25+ AI crawlers we cover
Beyond the famous ones (GPTBot, ClaudeBot, PerplexityBot), there are crawlers from Cohere (cohere-ai), Diffbot (Diffbot), Amazon (Amazonbot), Meta (FacebookBot, Meta-ExternalAgent), TikTok (Bytespider), Yandex (YandexAI), and several research crawlers (CCBot, Omgili, omgilibot). New ones appear monthly. Our list is updated weekly from public documentation.
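With this many bots, it helps that consecutive `User-agent` lines can share a single rule group — a sketch, with an illustrative selection of crawlers:

```
# Consecutive User-agent lines form one group governed by the rules below.
User-agent: CCBot
User-agent: Bytespider
User-agent: Amazonbot
Disallow: /
```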
Three things people get wrong
- Casing matters: `User-agent: GPTBot` works. The spec (RFC 9309) requires case-insensitive matching, so `User-agent: gptbot` also works in most parsers — but use the bot's documented casing to stay safe with stricter ones.
- Path matters: `Disallow: /` blocks everything. `Disallow:` (blank) allows everything. `Disallow: /private/` only blocks that folder.
- It's not law-enforced. Robots.txt is a polite request. Bad-faith crawlers ignore it. For real protection, you need WAF rules, rate limiting, or Cloudflare's bot management.
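The path rules above, spelled out in one file (the bot names and the /private/ folder are illustrative):

```
# Blank Disallow allows everything...
User-agent: ExampleBot
Disallow:

# ...a bare slash blocks everything...
User-agent: OtherBot
Disallow: /

# ...and a path prefix blocks only that folder.
User-agent: ThirdBot
Disallow: /private/
```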
Should you also use llms.txt?
Yes, complement robots.txt with llms.txt — a newer, AI-specific format that tells LLMs what your site is about and how it should be cited. Robots.txt says don't crawl, llms.txt says here's what I am if you do crawl. They're complementary, not competing.
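A minimal llms.txt sketch following the proposed format — an H1 with the site name, a blockquote summary, then linked sections. Every name and URL here is a placeholder:

```
# Example Site

> One-sentence summary of what this site covers and who it's for.

## Docs

- [Getting started](https://example.com/docs/start): setup guide
- [API reference](https://example.com/docs/api): endpoint documentation
```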
FAQ
Will blocking AI crawlers hurt my Google rankings? No. Googlebot is for search, Google-Extended is for Gemini training. Blocking Google-Extended doesn't affect your search ranking.
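The split looks like this in robots.txt — a sketch of blocking Gemini training only:

```
# Block Gemini training. Googlebot is not listed here,
# so search crawling and rankings are unaffected.
User-agent: Google-Extended
Disallow: /
```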
What if I want to allow some content but not others? You can scope rules per bot: `User-agent: GPTBot`, then `Disallow: /paid/` and `Allow: /`. Block paywalled pages, allow public ones.
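As a complete file, that scoping looks like this (the /paid/ prefix is illustrative — adjust to wherever your paywalled content lives):

```
User-agent: GPTBot
# Keep paywalled content out...
Disallow: /paid/
# ...while everything else stays crawlable.
Allow: /
```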
How do I verify it's working? Use Google Search Console's robots.txt tester for Googlebot. For others, check your access logs after deployment — you should see the user-agent strings of bots respecting (or violating) your rules.
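For the log check, a grep along these lines works on most servers. It's shown here against a sample log line so the command is self-contained; in practice, replace the `echo` with something like `cat /var/log/nginx/access.log` (the path depends on your setup):

```shell
# Count hits whose user-agent string matches a known AI crawler.
echo '203.0.113.9 - - [10/May/2025:12:00:00 +0000] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"' \
  | grep -cE 'GPTBot|ClaudeBot|PerplexityBot'
# → 1
```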