How this AI text detector works
This AI text detector uses nine linguistic and statistical heuristics that have been shown — across multiple academic and industry evaluations — to separate machine-generated text from human writing more reliably than any single signal alone. Each heuristic returns a sub-score; the final verdict combines them into one weighted probability.
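The weighting scheme can be sketched as a simple normalized weighted average. The weights and signal names below are invented for illustration; the detector's actual weights are not documented here.

```javascript
// Combine per-heuristic sub-scores (each in [0, 1]) into one probability.
// The weights are illustrative placeholders, not the detector's real values.
function combineScores(subScores, weights) {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const [name, score] of Object.entries(subScores)) {
    const w = weights[name] ?? 1; // unknown signals get a default weight of 1
    weightedSum += w * score;
    totalWeight += w;
  }
  return totalWeight > 0 ? weightedSum / totalWeight : 0;
}
```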
1. Em-dash and en-dash density
Large language models, especially the GPT-4 and Claude families, dramatically over-use the em-dash (—) compared to almost any human writer. A 1,000-word human essay uses one or two em-dashes; a typical ChatGPT essay uses six to ten. This signal alone correctly flags a large portion of unedited LLM output.
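A minimal sketch of this heuristic: count em- and en-dashes and normalize per 1,000 words. The whitespace-based word count is a simplification of whatever tokenization the detector actually uses.

```javascript
// Em-dash and en-dash occurrences per 1,000 words.
function dashDensity(text) {
  const words = text.split(/\s+/).filter(Boolean).length;
  const dashes = (text.match(/[—–]/g) || []).length;
  return words > 0 ? (dashes / words) * 1000 : 0;
}
```

By the article's own figures, a score around 1–2 per thousand words is human-typical and 6–10 is AI-typical.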
2. Sentence-length variance (burstiness)
Human writing is bursty: short sentences next to long ones, fragments next to complex clauses. LLM output tends toward a steadier rhythm, with sentence lengths clustered near a mean. We compute the coefficient of variation of sentence lengths; values below 0.4 are AI-typical.
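The coefficient of variation is the standard deviation of sentence lengths divided by their mean. A rough sketch, assuming sentences end at `.`, `!`, or `?` (real sentence segmentation is messier than this):

```javascript
// Burstiness: coefficient of variation (std dev / mean) of sentence lengths.
function burstiness(text) {
  const sentences = text.split(/[.!?]+/).map(s => s.trim()).filter(Boolean);
  if (sentences.length === 0) return 0;
  const lengths = sentences.map(s => s.split(/\s+/).length);
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((a, b) => a + (b - mean) ** 2, 0) / lengths.length;
  return mean > 0 ? Math.sqrt(variance) / mean : 0;
}
```

Uniform sentence lengths yield a value near 0; mixing fragments with long sentences pushes it well above the 0.4 threshold.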
3. Vocabulary diversity (type-token ratio)
The ratio of unique words to total words. Humans repeat themselves at predictable rates. LLMs often run higher on this metric for short text but lower on longer text where the model's preferred vocabulary recurs heavily.
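The type-token ratio itself is straightforward to compute. A sketch, assuming case-insensitive comparison and a simple word regex (the detector's actual tokenization may differ):

```javascript
// Type-token ratio: unique words / total words, case-insensitive.
function typeTokenRatio(text) {
  const tokens = text.toLowerCase().match(/[a-z']+/g) || [];
  if (tokens.length === 0) return 0;
  const types = new Set(tokens); // deduplicate to count unique word types
  return types.size / tokens.length;
}
```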
4. AI-tell phrases
A curated list of 30+ phrases that LLMs use far more often than humans: "delve into", "in the realm of", "navigate the complexities", "in today's fast-paced world", "it's important to note", "tapestry of", "seamlessly integrate", "leverage", "robust", "comprehensive guide". The presence of three or more AI-tells is a very strong signal.
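The phrase check reduces to substring matching against the curated list. The array below is only the subset of tells quoted in this section, not the full 30+ entry list:

```javascript
// Subset of the AI-tell list for illustration; the real list has 30+ phrases.
const AI_TELLS = [
  "delve into",
  "in the realm of",
  "navigate the complexities",
  "it's important to note",
  "tapestry of",
  "seamlessly integrate",
];

// Number of distinct AI-tell phrases present in the text.
function countAiTells(text) {
  const lower = text.toLowerCase();
  return AI_TELLS.filter(phrase => lower.includes(phrase)).length;
}
```

Per the rule above, a return value of 3 or more would be treated as a very strong signal.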
5. Perplexity proxy
True perplexity needs a language model. We approximate it with token-level surprise using a small reference frequency table — the more uniformly common the words, the more "AI-flat" the text reads. Real perplexity is measured against models like GPT-2 in academic detectors, and our proxy correlates moderately well in practice.
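One way to sketch such a proxy: average the negative log-probability of each word under a reference frequency table, with a small floor for unseen words. The tiny table below is a toy stand-in for the real reference table:

```javascript
// Toy frequency table standing in for the detector's reference table.
const FREQ = { the: 0.05, of: 0.03, and: 0.03, a: 0.02, quantum: 0.00001 };

// Mean bits of surprise per word: higher means rarer, more "human" vocabulary.
function meanSurprise(text) {
  const tokens = text.toLowerCase().match(/[a-z]+/g) || [];
  if (tokens.length === 0) return 0;
  const floor = 1e-6; // probability assigned to out-of-table words
  const totalBits = tokens.reduce(
    (sum, t) => sum + -Math.log2(FREQ[t] ?? floor), 0
  );
  return totalBits / tokens.length;
}
```

Text built from uniformly common words scores low ("AI-flat"); rare vocabulary scores high.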
6. Comma-to-sentence ratio
LLMs love compound clauses. The mean number of commas per sentence runs noticeably higher in unedited model output, especially for explanatory or instructional prose.
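A minimal sketch of this ratio, again assuming `.`, `!`, and `?` as sentence terminators:

```javascript
// Mean number of commas per sentence.
function commasPerSentence(text) {
  const sentences = text.split(/[.!?]+/).filter(s => s.trim());
  const commas = (text.match(/,/g) || []).length;
  return sentences.length > 0 ? commas / sentences.length : 0;
}
```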
7. Filler-word frequency
Humans write filler — "kind of", "sort of", "basically", "actually", "you know", "I mean". LLMs strip these almost entirely unless explicitly prompted to mimic them.
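Filler frequency can be sketched as occurrence counting over the filler list quoted above, normalized by word count. Naive substring matching is an assumption here; it can over-count fillers embedded inside other words:

```javascript
// Fillers quoted in this section; the detector's list may differ.
const FILLERS = ["kind of", "sort of", "basically", "actually", "you know", "i mean"];

// Filler occurrences per word of input text.
function fillerRate(text) {
  const lower = text.toLowerCase();
  const words = lower.split(/\s+/).filter(Boolean).length;
  let hits = 0;
  for (const filler of FILLERS) {
    hits += lower.split(filler).length - 1; // occurrences of this filler
  }
  return words > 0 ? hits / words : 0;
}
```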
8. Sentence-opener uniformity
LLMs over-use a few sentence-opening structures: "Moreover,", "Furthermore,", "However,", "Additionally,", "In conclusion,". A human writer rotates openers more freely.
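This check reduces to measuring what fraction of sentences start with one of the listed openers. A sketch using the openers named above:

```javascript
// Openers quoted in this section, lowercased for matching.
const AI_OPENERS = ["moreover", "furthermore", "however", "additionally", "in conclusion"];

// Fraction of sentences beginning with a stock AI opener.
function openerUniformity(text) {
  const sentences = text.split(/[.!?]+/).map(s => s.trim()).filter(Boolean);
  if (sentences.length === 0) return 0;
  const hits = sentences.filter(s =>
    AI_OPENERS.some(opener => s.toLowerCase().startsWith(opener))
  ).length;
  return hits / sentences.length;
}
```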
9. Repetition of short n-grams
Even when the high-level meaning differs, AI output reuses 3-grams and 4-grams within a single document at higher rates than humans. The detector counts repeated trigrams as a tie-breaker signal.
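Repeated-trigram counting can be sketched with a sliding window and a seen-set. This counts every reappearance of a trigram after its first occurrence:

```javascript
// Count trigram occurrences beyond each trigram's first appearance.
function repeatedTrigrams(text) {
  const tokens = text.toLowerCase().match(/[a-z']+/g) || [];
  const seen = new Map();
  let repeats = 0;
  for (let i = 0; i + 2 < tokens.length; i++) {
    const gram = tokens.slice(i, i + 3).join(" ");
    const count = seen.get(gram) || 0;
    if (count >= 1) repeats++; // already seen at least once: a repeat
    seen.set(gram, count + 1);
  }
  return repeats;
}
```

In a real detector this raw count would be normalized by document length before use as a tie-breaker.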
How accurate is a heuristic AI text detector?
Pure-heuristic detectors like this one typically reach 75–85% accuracy on raw, unedited LLM output of 300+ words. Accuracy drops on shorter text (under 200 words) and on text that has been lightly edited by a human. Commercial detectors that combine heuristics with a fine-tuned classifier (Originality, GPTZero, Turnitin) reach 90%+ but charge per check and send your text to their servers. This tool is free and honest about its limits.
What this detector is good at
- Spotting unedited ChatGPT, Claude, Gemini, and Llama output of 300+ words
- Catching the most common AI-tell phrases that survive light editing
- Running entirely offline — your text never leaves your machine
- Returning a per-signal breakdown so you can see why something looks AI
What it cannot do
- Detect heavily human-edited AI text — once a human rewrites the openers and trims AI-tells, all heuristic detectors weaken
- Distinguish AI text from text deliberately written in an AI-mimicking style
- Replace a forensic check by a teacher or editor — treat the verdict as a probability, not proof
FAQ
Does this AI text detector send my text anywhere? No. Open the browser network tab while you analyze — you will see zero requests after the page loads. The whole detector is JavaScript running on your machine.
Can I use this for student essays? Use it as a probability, not a verdict. False positives are real — non-native English writers and people with formal writing styles can trigger AI signals. Combine the result with a conversation, draft history, or in-class writing sample.
How is this different from GPTZero or Originality.ai? Those tools use trained classifiers and cost money per check. This is a heuristic-only tool, free, and fully private. It is less accurate at the margins but easier to trust because you can read the source and see exactly what it checks.
Will it detect Claude / GPT-5 / Gemini 2 specifically? The heuristics target the shared register of large autoregressive models. ChatGPT, Claude, Gemini, Llama, and Mistral output all have very similar tells in unedited form. A model fine-tuned for an extreme style (poetry, code, marketing copy) will read more naturally and be harder to catch.
Why is short text harder? The signals are statistical — they need a sample large enough that variance averages out. Under 200 words, almost any text can swing either direction. Aim for 300+ for stable verdicts.