Price retainers and fixed-bid AI work without leaking margin client by client. Plug in deliverable mix, prompt context, and revision rounds — get an honest per-account and per-month cost across GPT-5, Claude 4.6, Gemini 3, DeepSeek, and 25+ more models.
An agency's cost stack used to be predictable: salaries, software seats, and pass-through media. Then 2025 happened, the work shifted from manual production to AI-augmented production, and a brand-new line item appeared inside almost every retainer — variable token spend that scales with deliverable volume, prompt context, and how many rounds a brand director sends back. Most agencies are still pricing as if that line doesn't exist, or as if it's small enough to swallow.
The math gets dangerous when you scale. A creative team running thirty social variants a week per account, with research briefs that pull 20,000 tokens of brand context and three revision cycles, can burn anywhere from $80 to $1,800 per month per client depending entirely on model selection. Multiply that across a roster of twelve accounts and the wrong default model can turn a 22% retainer margin into a 9% retainer margin without a single new hire on payroll. That difference is the size of an account director's salary or the difference between hitting and missing your annual bonus pool.
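The per-account math above is simple enough to sketch. This is a minimal illustration, not the calculator's actual implementation; the per-million-token rates are hypothetical placeholders, not any provider's published prices:

```python
# A minimal sketch of the per-account spend math. Rate figures below are
# hypothetical placeholders, not any provider's published prices.

def monthly_token_cost(deliverables, context_tokens, output_tokens,
                       revision_rounds, in_rate_per_m, out_rate_per_m):
    """Monthly spend for one account, assuming every revision round
    resends the full context and regenerates the output."""
    rounds = 1 + revision_rounds            # first draft + revisions
    tokens_in = deliverables * rounds * context_tokens
    tokens_out = deliverables * rounds * output_tokens
    return (tokens_in / 1e6) * in_rate_per_m + (tokens_out / 1e6) * out_rate_per_m

# 30 variants/week ≈ 130/month, 20k-token briefs, 3 revision cycles
flagship = monthly_token_cost(130, 20_000, 1_000, 3, 15.00, 60.00)
budget = monthly_token_cost(130, 20_000, 1_000, 3, 0.10, 0.40)
print(f"flagship ≈ ${flagship:,.2f}/mo, budget ≈ ${budget:,.2f}/mo")
```

Even with made-up rates, the shape of the result is the point: the same workload lands two orders of magnitude apart depending on model tier, which is the spread that quietly eats the retainer margin.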
This calculator exists to put the number on the page before the SOW gets signed. Enter the realistic shape of your service mix — brief length, output volume, revision count, deliverables per month — and it cross-multiplies against published rates for every major hosted model. The output is a sortable cost table that pastes straight into a pitch deck, a retainer worksheet, or a pass-through line on a client invoice.
Before you commit to a monthly retainer ceiling, model the realistic AI bill at the agreed deliverable volume. A "content sprint" retainer that promises 40 social posts, 4 long-form blogs, and 2 email sequences a month sits in a very different cost band depending on whether the underlying model is GPT-5 or Gemini Flash. The calculator's volume slider shows the spread so you set the retainer at a number that survives Q4.
If you bill API usage as a transparent pass-through with markup (the dominant 2026 model), procurement reviewers will compare your number against the public per-token rate. The calculator gives you the at-cost figure plus a clean markup breakdown. Drop it into the MSA and the procurement conversation gets shorter, the renewal conversation gets cleaner, and the receivables aging stays under control.
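The markup breakdown itself is simple arithmetic. A sketch of what a transparent pass-through line might look like, with an example at-cost figure and an assumed 20% markup (both illustrative inputs, not recommendations):

```python
# Sketch of a transparent pass-through line: at-cost API spend plus a
# stated markup percentage. The inputs below are illustrative examples.
def passthrough_line(at_cost, markup_pct):
    markup = round(at_cost * markup_pct / 100, 2)
    return {
        "at_cost": round(at_cost, 2),   # what the provider actually bills
        "markup_pct": markup_pct,       # disclosed in the MSA
        "markup": markup,
        "billed": round(at_cost + markup, 2),
    }

line = passthrough_line(at_cost=31.62, markup_pct=20.0)
# a 20% markup on $31.62 at-cost bills out at $37.94
```

Showing all four fields, rather than a single blended number, is what keeps the procurement comparison against public per-token rates short.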
When you're pitching a new account, the AI line is the one number you can't easily back-of-envelope in a Slack thread. The calculator gives a defensible pitch number in under two minutes, and the same view exports the assumptions sheet your finance lead needs for the win-rate review. No more "we'll figure out tooling after we win it" surprises in week three.
Most agencies discover, sometimes painfully, that paid social management eats more AI token spend than long-form content because of the sheer number of variants. The calculator lets you slice by service line — SEO content, paid social, email, brand — and surfaces which lines are quietly margin-thin so you can re-price or restructure before the next pricing review.
Specialist subs (AI copy shops, automated UGC houses) are pitching agencies hard in 2026. Plug your in-house workload into the calculator, compare against the sub's flat per-deliverable price, and the answer is usually obvious in under a minute. It also exposes when a sub's quote is too cheap to be sustainable, which protects you from the inevitable mid-contract repricing.
Three failure modes show up over and over inside AI-augmented agency work in 2026:

- Underpriced context: the brand brief and reference material get resent with every call, so a 20,000-token context is billed again on every draft and every revision round.
- Underestimated output: long-form deliverables generate at output-token rates, which most providers price well above input rates.
- Uncounted revisions: retainers get scoped off the first-draft cost, while the actual bill scales with the draft plus every round a reviewer sends back.

The calculator surfaces all three by design — it asks for context size, output length, and revision count separately, then recomputes against every supported provider. For the canonical per-token rates, check OpenAI's pricing page and Anthropic's Claude pricing. For broader coverage of how agencies are restructuring retainers around AI tooling, Digiday and the 4A's publish regular pieces on agency billing models.
To make the numbers concrete, here's how a typical mid-size B2B content retainer (62 deliverables a month in this example) lands when run through the calculator:
| Model | Cost / deliverable | Monthly per account |
|---|---|---|
| GPT-5 | $0.74 | $45.88 |
| Claude Sonnet 4.6 | $0.51 | $31.62 |
| Gemini 3 Flash | $0.07 | $4.34 |
| DeepSeek V3.1 | $0.04 | $2.48 |
| Mixed (Flash draft + Sonnet polish) | $0.18 | $11.16 |
Numbers above are illustrative. Plug your real per-account shape into the live tool to get a current comparison against the latest published rates. The "mixed" row is the pattern most profitable agencies settle into — a cheap workhorse for first-pass production, a smarter model for the senior-creative polish layer. Multiply by your roster size to see the annualized impact on agency margin.
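Annualizing that illustrative table across a roster makes the spread concrete. This sketch simply multiplies the example per-account figures by a hypothetical twelve-account roster; the model names and costs come straight from the table above:

```python
# Annualize the illustrative per-account figures from the table above
# across a hypothetical twelve-account roster.
per_account_monthly = {
    "GPT-5": 45.88,
    "Claude Sonnet 4.6": 31.62,
    "Gemini 3 Flash": 4.34,
    "DeepSeek V3.1": 2.48,
    "Mixed (Flash draft + Sonnet polish)": 11.16,
}

ROSTER_SIZE = 12  # hypothetical account count

for model, monthly in per_account_monthly.items():
    annual = monthly * ROSTER_SIZE * 12
    print(f"{model:40s} ${annual:>9,.2f}/yr")
```

On these example figures, the gap between the flagship default and the mixed pattern is roughly $5,000 a year per twelve accounts — visible on a P&L even before roster growth.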
The AI bill is one line on the account P&L. The TinyTools suite already covers most of the other lines that matter without adding another seat or another vendor login.
The pattern is the same across all of them: free, single-purpose, no signup, no extra seat license to expense. For a wider view of how agencies are restructuring fees around AI tooling, the Adweek archive tracks billing-model shifts at independent and holding-company shops, and the IAB publishes guidance on how media markups translate to AI-assisted services.
Yes — the table is plain HTML, so it pastes cleanly into a Notion SOW, a Google Doc pitch, a Keynote deck, or a Figma proposal. Many agencies paste the per-account number directly into the master service agreement as a transparent pass-through estimate with a stated markup.
Yes. GPT-5 mini, Claude Haiku 4.5, Gemini 3 Flash, and DeepSeek's full lineup are all included. Mini tiers run 5-20x cheaper than the flagship and are good enough for caption variants, alt text, hashtag generation, and outline drafts — the majority of an agency's volume.
The calculator reads from a price table that we update whenever a major provider publishes a change. Expect 1-3 day lag on smaller providers, near-real-time on the top five.
The current calculator models one workload at a time, but agencies typically run it three or four times — one per service line — and sum the totals into a roster-wide budget. We are working on a multi-account dashboard for the next release.
Self-hosted GPU pricing is too workload-dependent to model precisely, but we cover hosted serverless rates (Together, Fireworks, Groq, Bedrock) for Llama, Mistral, Qwen, and DeepSeek — those are a reasonable upper bound for what a private deployment saves once ops overhead is factored in. For regulated industries that require it, pair the calculator with the AI Disclosure Generator so the client deliverable carries the correct label.