Forecast token costs across every active client build before the SOW gets signed — not after the first invoice from OpenAI hits a margin you committed three months ago. Compare GPT-5, Claude 4.6, Gemini 3, DeepSeek, and 25+ more models against your real workflow shape.
The category of "AI agency" barely existed three years ago. In 2026 it is one of the fastest-growing segments of the services economy — small, lean shops shipping agent builds, n8n and Make automations wired into LLMs, custom GPTs, retrieval pipelines, voice agents, and bespoke Claude or GPT-5 workflows for mid-market clients who do not have an in-house AI team. The work pays well. The margin trap underneath it is brutal.
Token spend is the line that breaks AI agencies. A six-figure retainer that looks gorgeous on the contract can quietly slide into negative margin if the client triples their automation run volume in month two and the agreement did not carve out a usage cap. A fixed-bid agent build that quoted at $18,000 can burn $4,200 in OpenAI charges across the eval and iteration phase alone if nobody modeled the build-phase multiplier up front.
This calculator is the missing pre-bid step. Enter realistic prompt size, expected run volume, model choice, and client count, and it cross-multiplies against published rates for every major hosted model. The output is a per-workflow and per-client cost table that drops straight into a proposal appendix, a master SOW, or the COGS row of an internal margin tracker.
Before you send a fixed bid on an agent or automation build, model the expected token spend across the build phase (eval runs, iteration, multi-turn debugging) and the steady-state run phase the client will inherit. Most under-bids on AI agency work come from skipping the build-phase multiplier — iteration burns four to twelve times more tokens than steady-state operation. The calculator surfaces the spread so the bid lands with margin instead of hope.
Whether you bundle LLM costs into the retainer or pass them through with a handling markup, the client meeting goes considerably better when you arrive with a defensible per-run cost number. The calculator generates the line item your SOW needs: estimated monthly token spend at projected volume, the handling markup band, and the usage cap above which overage billing kicks in. Procurement teams sign faster when the AI line is itemized rather than buried.
Agencies serving multiple clients on overlapping AI workflows almost always discover that two or three accounts quietly drive 70-85% of their token spend. The calculator models per-client cost so you see which retainers are subsidizing which, and where a price increase, a model swap, or a routing change recovers margin fastest. Run it once per quarter and the next round of retainer renewals walks in with real numbers attached.
The economics of agent work differ from one-shot LLM calls. A multi-step agent that loops through tool calls, plans, retries, and re-asks the model can burn 8-30x the tokens of a single completion per task. The calculator handles agent loops explicitly, so you can forecast both the build-phase eval cost and the per-execution run cost the client will see after launch — the two numbers that determine whether a flat-fee build is profitable.
Most AI agency engagements default to a flagship model because it is the safe answer in a pitch. The calculator lets you model a routed setup — Haiku 4.5 or Gemini Flash on high-volume classification, intent routing, and pre-processing; Claude Sonnet 4.6 or GPT-5 on the explicit user-facing generation — and shows the margin recovered. On most active agency engagements the routing layer pays for itself inside the first three weeks of operation.
Three failure modes show up repeatedly inside AI agencies in 2026, and all three are visible in the calculator before the bid leaves the inbox:
For canonical per-token rates check OpenAI's pricing page and Anthropic's Claude pricing. For broader benchmarks on agency pricing and AI-services margins, the Demand Curve and HubSpot agency-economics reports track the shift in deliverable composition as AI work absorbs a larger share of agency revenue.
To make the numbers concrete, here is how a typical four-client AI agency portfolio — two automation retainers, one agent build, one custom GPT engagement — lands when run through the calculator:
| Model | Monthly token spend | Margin at 35% markup |
|---|---|---|
| GPT-5 across all workflows | $2,840 | $994 absorbed by agency |
| Claude Sonnet 4.6 across all | $1,920 | $672 absorbed by agency |
| Gemini 3 Flash across all | $284 | $99 absorbed by agency |
| DeepSeek V3.1 across all | $172 | $60 absorbed by agency |
| Mixed routing (recommended) | $640 | $224 absorbed by agency |
Numbers above are illustrative. Plug your real portfolio shape into the live tool to get a current comparison against the latest published rates. The "mixed routing" row is the pattern most profitable AI agencies settle into — cheap models for high-volume background steps, a flagship reserved for the explicit user-facing generation pass. Multiply your own per-client cost by the number of clients on each retainer tier to see exactly where your margin is leaking and which engagement deserves a price increase at renewal.
The token bill is one line on your P&L. The other lines — brand, marketing site, client compliance, and acquisition assets — the TinyTools suite covers most of them without adding a single SaaS seat to your monthly stack:
The pattern is the same across all of them: free, single-purpose, no signup, no extra seat license to expense back to a client. For broader reading on AI agency operations and pricing, the Indie Hackers archive tracks emerging AI-services business models, and the Harvard Business Review professional services category publishes regular work on services pricing and margin defense.
Yes — the table is plain HTML, so it pastes cleanly into Google Docs, Notion, PandaDoc, or any SOW template. Most agencies paste the per-run cost into the deliverables section and the assumptions block into the appendix so procurement can audit the math.
Yes. Agent runs that loop through multiple tool calls, planning passes, and retries are first-class. Set the average steps per task and the calculator multiplies the per-step token cost accordingly, so the per-execution number reflects realistic agent behavior rather than a single-shot completion.
Yes. GPT-5 mini, Claude Haiku 4.5, Gemini 3 Flash, and DeepSeek's full lineup are all included. Mini tiers run 5-20x cheaper than the flagship and are usually the right default for embeddings, classification, intent routing, and any internal step the client does not consciously experience as "the AI."
The calculator reads from a price table we update whenever a major provider publishes a change. Expect 1-3 day lag on smaller providers, near-real-time on the top five.
Yes — the cost breakdown is calibrated to be a defensible artifact in a pricing conversation. Many AI agency owners walk into a renewal meeting with the calculator output printed as an appendix, which turns the "your costs went up" conversation from confrontation into shared math.