Question 1

How should an AI agency price token costs into a client project?

Accepted Answer

The dominant 2026 pattern is a transparent pass-through line on the SOW with a 25-50% handling markup, plus a usage cap above which overage billing applies. To size the line, estimate tokens per workflow run, multiply by the expected run volume across the engagement window and by the published model rate, then add 30-40% headroom for prompt iteration and retries. Put the result on the master statement as 'LLM API usage, estimated, billed at cost plus handling.' The TinyTools AI Cost Calculator produces the underlying number, so the line item is grounded in math rather than a guessed margin.
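The arithmetic in this answer can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function names, the single blended per-million-token rate (real price sheets usually list input and output tokens separately), the 35% defaults, and the rule that overage is billed at cost plus the same markup are all assumptions made for the example.

```python
def estimate_llm_line_item(tokens_per_run, runs, rate_per_million_usd,
                           headroom=0.35, markup=0.35):
    """Return (raw_cost, cost_with_headroom, billed_amount) in USD.

    headroom: assumed buffer for prompt iteration and retries (answer cites 30-40%).
    markup:   assumed handling markup (answer cites 25-50%).
    """
    raw_cost = tokens_per_run * runs / 1_000_000 * rate_per_million_usd
    cost_with_headroom = raw_cost * (1 + headroom)
    billed_amount = cost_with_headroom * (1 + markup)
    return (round(raw_cost, 2),
            round(cost_with_headroom, 2),
            round(billed_amount, 2))


def overage_charge(actual_cost_usd, cap_usd, markup=0.35):
    """Amount billed for usage above the cap; zero while under the cap.

    Billing overage at cost plus the same markup is an assumption.
    """
    excess = max(0.0, actual_cost_usd - cap_usd)
    return round(excess * (1 + markup), 2)


# Hypothetical workload: 6,000 tokens per run, 20,000 runs, $5 per million tokens.
raw, padded, billed = estimate_llm_line_item(
    tokens_per_run=6_000, runs=20_000, rate_per_million_usd=5.0)
print(f"raw ${raw}, with headroom ${padded}, billed ${billed}")
```

Keeping headroom and markup as separate parameters mirrors the answer: the headroom is a forecasting buffer, while the markup is what the client sees as handling.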

Question 2

Should an AI agency eat token costs or pass them through to clients?

Accepted Answer

Below roughly $200 per month in expected token spend, most AI agencies bundle the bill into the retainer and absorb it as a cost of doing business. Above that, pass-through with a handling markup is standard: eating a token bill that scales unpredictably with the client's volume is how AI agencies discover, three months in, that their flagship retainer is margin-negative. The deciding factor is who controls the run volume. If the client controls it, pass through. If your agency controls it, bundle and price the risk in.
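One way to read this rule is as a small decision function. The sketch below is an interpretation, not a standard: the function name and return strings are hypothetical, and it assumes the $200 threshold is checked first and the volume-control question only matters above it.

```python
def token_billing_mode(expected_monthly_spend_usd, client_controls_volume,
                       bundle_threshold_usd=200.0):
    """Suggest how to bill token spend, following the rule of thumb above."""
    # Small, predictable bills: absorb into the retainer.
    if expected_monthly_spend_usd < bundle_threshold_usd:
        return "bundle into retainer"
    # Larger bills: who controls run volume decides the billing mode.
    if client_controls_volume:
        return "pass through with handling markup"
    return "bundle and price the risk in"


print(token_billing_mode(120.0, client_controls_volume=True))
print(token_billing_mode(1_500.0, client_controls_volume=True))
print(token_billing_mode(1_500.0, client_controls_volume=False))
```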

Question 3

What is a healthy gross margin for an AI agency in 2026?

Accepted Answer

Established AI agencies running automation, agent, and custom LLM build work target 55-72% blended gross margin in 2026, with the AI tooling line treated as variable COGS. Boutique builders and prompt-engineering shops can push higher (70-82%) because token spend is a smaller share of delivered value. Anything below 45% blended margin usually means the agency is either eating client-controlled token spend without markup or running flagship models on workloads that would happily accept a mini-tier model.
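To check where an engagement falls against these ranges, blended gross margin can be computed as revenue minus cost of goods sold, divided by revenue, with token spend counted as variable COGS. The sketch below is illustrative only: the function names, the split of COGS into delivery and token components, and the sample figures are assumptions.

```python
def blended_gross_margin(revenue_usd, delivery_cogs_usd, token_cogs_usd):
    """Gross margin as a fraction of revenue, with tokens as variable COGS."""
    if revenue_usd <= 0:
        raise ValueError("revenue must be positive")
    return (revenue_usd - delivery_cogs_usd - token_cogs_usd) / revenue_usd


def margin_band(margin):
    """Label a margin against the ranges quoted for established agencies."""
    if margin < 0.45:
        return "below 45%: check unmarked token spend and model choice"
    if 0.55 <= margin <= 0.72:
        return "within the 55-72% target range"
    return "outside the 55-72% target range"


# Hypothetical month: $40,000 revenue, $12,000 delivery cost, $3,000 token spend.
margin = blended_gross_margin(40_000, 12_000, 3_000)
print(f"{margin:.1%}", margin_band(margin))
```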

Question 4

How do AI agencies model build versus run costs for agents?

Accepted Answer

Agent builds have two cost phases that need separate modeling. The build phase is heavy on prompt iteration, eval runs, and multi-turn debugging, and usually costs 4-12x the per-task token cost of steady-state operation. The run phase is the per-execution cost the client will see on every invocation post-launch. Quoting fixed-bid build work without forecasting the iteration multiplier is the single most common way AI agencies under-bid an engagement. The calculator's iteration multiplier accounts for the build phase explicitly.
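A rough way to keep the two phases apart in a bid is to forecast them with separate functions. This is a sketch under stated assumptions, not the calculator's own formula: the function names, the idea of counting build work in tasks, and the sample numbers are hypothetical. Only the 4-12x multiplier range comes from the answer above.

```python
def build_phase_token_cost(steady_cost_per_task_usd, build_tasks, multiplier):
    """Token cost of the build phase in USD.

    multiplier: how much more a task costs during iteration than in
    steady state (the answer cites 4-12x).
    """
    return round(steady_cost_per_task_usd * multiplier * build_tasks, 2)


def run_phase_token_cost(steady_cost_per_task_usd, monthly_runs):
    """Steady-state token cost per month after launch, in USD."""
    return round(steady_cost_per_task_usd * monthly_runs, 2)


# Hypothetical agent: $0.04 per task in steady state, 5,000 tasks
# exercised during the build, 30,000 production runs per month.
low = build_phase_token_cost(0.04, 5_000, multiplier=4)
high = build_phase_token_cost(0.04, 5_000, multiplier=12)
monthly = run_phase_token_cost(0.04, 30_000)
print(f"build phase: ${low} to ${high}; run phase: ${monthly} per month")
```

Quoting the build phase as a low-to-high range rather than a single figure makes the iteration risk visible in a fixed-bid proposal.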

Question 5

Is the AI Cost Calculator free for AI agency owners?

Accepted Answer

Yes. The calculator is free, requires no signup, and runs entirely in your browser. Nothing about your client list, workflow definitions, or margin assumptions leaves your machine. Use the estimates inside SOWs, project bids, retainer proposals, and internal margin models with no licensing constraints.

Model	Monthly token spend	Margin at 35% markup
GPT-5 across all workflows	$2,840	$994 absorbed by agency
Claude Sonnet 4.6 across all	$1,920	$672 absorbed by agency
Gemini 3 Flash across all	$284	$99 absorbed by agency
DeepSeek V3.1 across all	$172	$60 absorbed by agency
Mixed routing (recommended)	$640	$224 absorbed by agency
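The third column of the table equals 35% of the monthly token spend, rounded to whole dollars. The short script below reproduces those figures from the spend column. The variable names are chosen for illustration, and the spend figures are copied from the table rather than derived from any pricing data.

```python
MARKUP = 0.35  # handling markup used in the table

# Monthly token spend in USD, copied from the table above.
monthly_spend_usd = {
    "GPT-5 across all workflows": 2840,
    "Claude Sonnet 4.6 across all": 1920,
    "Gemini 3 Flash across all": 284,
    "DeepSeek V3.1 across all": 172,
    "Mixed routing (recommended)": 640,
}

# Markup amount per option, rounded to whole dollars as in the table.
markup_amounts = {name: round(spend * MARKUP)
                  for name, spend in monthly_spend_usd.items()}

for name, amount in markup_amounts.items():
    print(f"{name}: ${monthly_spend_usd[name]} spend, ${amount} at 35%")

# Difference in monthly spend between the most expensive option and mixed routing.
savings = (monthly_spend_usd["GPT-5 across all workflows"]
           - monthly_spend_usd["Mixed routing (recommended)"])
print(f"Mixed routing spends ${savings} less per month than GPT-5 everywhere")
```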

AI Cost Calculator for AI Agency Owners

Why AI agencies need a real cost model, not a vibe

Five jobs it actually does for AI agency owners

1. Pre-bid token forecasting on fixed-price builds

2. SOW pass-through and handling markup math

3. Retainer pricing across a portfolio of clients

4. Build-vs-run cost separation for agent projects

5. Model arbitrage and routing planning

What AI agencies usually get wrong

Sample AI agency portfolio workload

How this fits the rest of an AI agency's stack

Frequently asked questions

Can I drop the per-workflow cost number directly into a client SOW?

Does it model agent loops, tool calls, and multi-step workflows?

Does it cover the cheap mini and Flash tiers for high-volume background work?

How current is the pricing data?

Can I share the output with a client during pricing discussions?