Question 1

Does an MCP server itself cost tokens, or is the cost downstream?

Accepted Answer

The MCP server itself does not bill tokens. It is a protocol layer that exposes tools, resources, and prompts to a host model. The cost lives one layer up, inside the host (Claude Desktop, Cursor, Windsurf, GPT-5 with MCP, or a custom client), because the schema of every tool your server exposes is injected into the host's context window, and every resource read flows back into that context. A chatty server with 30 tools and 8KB of resource bodies can quietly add 6,000-12,000 tokens of input context to every user turn, even when no tool is invoked, and that overhead compounds quickly over a multi-hour session.
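To make the scale of that always-on overhead concrete, here is a rough sketch of how it could be estimated. It assumes about 4 characters per token, which is a common rule of thumb and not any host's actual tokenizer, and the tool definition (`search_tickets`) is hypothetical.

```python
import json

CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def static_overhead_tokens(tools: list[dict], resource_bodies: list[str]) -> int:
    """Tokens carried on every turn: tool schemas plus resource text held in context."""
    schema_tokens = sum(estimate_tokens(json.dumps(t)) for t in tools)
    resource_tokens = sum(estimate_tokens(b) for b in resource_bodies)
    return schema_tokens + resource_tokens


# Hypothetical tool definition, for illustration only.
example_tool = {
    "name": "search_tickets",
    "description": "Search support tickets by keyword and status.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords to match."},
            "status": {"type": "string", "enum": ["open", "closed"]},
        },
        "required": ["query"],
    },
}

tools = [example_tool] * 30      # stand-in for 30 similar tools
resources = ["x" * 8 * 1024]     # stand-in for 8 KB of resource text
per_turn = static_overhead_tokens(tools, resources)
print(f"~{per_turn} tokens per turn, ~{per_turn * 50} over a 50-turn session")
```

Real tokenizers will give different counts, so treat the result as an order-of-magnitude check rather than a billing figure.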

Question 2

How do I budget token cost for an MCP server in production?

Accepted Answer

Estimate three numbers and combine them: (1) the static schema overhead of your server in tokens, meaning every tool signature, parameter description, and resource manifest exposed to the host on connection; (2) the per-invocation context cost when a tool actually fires, including the model's reasoning trace and the tool result you return; (3) the expected invocation frequency across active sessions. The static overhead scales with the number of turns, and the per-invocation cost scales with the number of tool calls. The TinyTools AI Cost Calculator handles all three explicitly. The number that surprises new MCP server developers is usually (1), the static overhead, because it is billed on every turn regardless of whether anything was called.
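One way to combine the three inputs is sketched below. This is not the calculator's own formula: it is a simplified model that assumes a single blended price for all tokens, and every figure in the example is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class McpBudget:
    static_tokens_per_turn: int      # (1) schema overhead carried on every turn
    tokens_per_invocation: int       # (2) reasoning trace + tool result per call
    invocations_per_session: float   # (3) expected tool calls per session
    turns_per_session: int
    sessions_per_month: int
    usd_per_million_tokens: float    # assumption: one blended rate for all tokens

    def static_tokens_per_session(self) -> float:
        return self.static_tokens_per_turn * self.turns_per_session

    def tokens_per_session(self) -> float:
        dynamic = self.tokens_per_invocation * self.invocations_per_session
        return self.static_tokens_per_session() + dynamic

    def monthly_usd(self) -> float:
        total = self.tokens_per_session() * self.sessions_per_month
        return total / 1_000_000 * self.usd_per_million_tokens

    def static_share(self) -> float:
        """Fraction of session tokens that is pure schema overhead."""
        return self.static_tokens_per_session() / self.tokens_per_session()


# Illustrative inputs only; substitute your own measurements.
budget = McpBudget(
    static_tokens_per_turn=6_000,
    tokens_per_invocation=3_000,
    invocations_per_session=10,
    turns_per_session=40,
    sessions_per_month=1_000,
    usd_per_million_tokens=3.0,
)
print(f"${budget.monthly_usd():.2f} per month, "
      f"{budget.static_share():.0%} of tokens are static overhead")
```

A fuller model would price input and output tokens separately and account for any prompt caching the host applies.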

Question 3

How do tool descriptions affect token cost in MCP?

Accepted Answer

Every word in a tool description, parameter name, parameter description, and JSON schema enum gets serialized into the host model's context at session start and is typically re-sent on every turn, whether or not a tool is called. A 40-tool server with verbose descriptions can ship 4,000-9,000 tokens of pure schema overhead per turn. Shortening tool names from sentences to verbs, collapsing redundant descriptions, and trimming optional parameters is the single highest-leverage cost optimization for production MCP servers.
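A small before-and-after comparison illustrates the effect of tightening a definition. Both tool definitions are hypothetical, and the token estimate is the rough "serialized length divided by 4" heuristic, not a real tokenizer.

```python
import json


def approx_tokens(tool: dict) -> int:
    """Rough estimate: serialized JSON length / 4 (assumed heuristic)."""
    return len(json.dumps(tool)) // 4


# Hypothetical verbose definition: sentence-like name, padded descriptions.
verbose = {
    "name": "search_all_customer_support_tickets_by_keyword",
    "description": (
        "This tool allows you to search through all of the customer support "
        "tickets in the system using a keyword query and returns the matches."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The keyword query string that you want to use.",
            }
        },
        "required": ["query"],
    },
}

# Same tool, tightened: verb-style name, one-line description.
tight = {
    "name": "search_tickets",
    "description": "Search support tickets by keyword.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Keywords."}},
        "required": ["query"],
    },
}

saved_per_tool = approx_tokens(verbose) - approx_tokens(tight)
print(f"verbose ~{approx_tokens(verbose)} tokens, tight ~{approx_tokens(tight)} tokens")
print(f"~{saved_per_tool * 40} tokens saved per turn across 40 similar tools")
```

The saving per tool looks small in isolation; it matters because it is multiplied by the tool count and then by every turn.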

Question 4

Should an MCP server return small or large resource payloads?

Accepted Answer

Smaller and paginated, always. A resource read that returns a 200KB file expands into roughly 50,000 tokens of context (at about 4 bytes per token) that the host model then carries for the rest of the session. Return pointers, summaries, and pagination cursors; let the host model request the specific slice it needs through a follow-up tool call. The cost difference between a chatty server and a disciplined server on the same workload is usually 5-15x on monthly token spend, driven mostly by resource payload shape rather than by the tools themselves.
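The cursor-based approach can be sketched in plain Python as below. This is not tied to any MCP SDK: the function name `read_page`, the page size, and the response field names are all assumptions chosen for illustration.

```python
from typing import Optional

PAGE_BYTES = 4096  # assumption: roughly 1,000 tokens per page at ~4 bytes per token


def read_page(data: bytes, cursor: int = 0, page_bytes: int = PAGE_BYTES) -> dict:
    """Return one slice of a resource plus a cursor for the next slice, if any."""
    if cursor < 0 or cursor > len(data):
        raise ValueError("cursor out of range")
    end = min(cursor + page_bytes, len(data))
    next_cursor: Optional[int] = end if end < len(data) else None
    return {
        # errors="replace" guards against a multi-byte character split at a page edge
        "content": data[cursor:end].decode("utf-8", errors="replace"),
        "next_cursor": next_cursor,   # None means there is nothing left to read
        "total_bytes": len(data),     # lets the caller decide whether to keep going
    }


# A 200 KB resource is served in small pages instead of one large payload.
big_resource = b"a" * 200_000
first = read_page(big_resource)
print(len(first["content"]), first["next_cursor"], first["total_bytes"])
```

With this shape the host only pays for the pages it actually requests, and `total_bytes` gives it enough information to stop early.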

Question 5

Is the AI Cost Calculator free for MCP server developers?

Accepted Answer

Yes. The calculator is free, requires no signup, and runs entirely in your browser. Nothing about your server's tool schema, resource shape, or expected invocation pattern leaves your machine. Use the estimates inside spec docs, ADRs, internal cost models, and pricing pages for public MCP servers without any licensing constraints.

Host model	Per-session cost (this server)	Schema overhead (share of session cost)
Claude Sonnet 4.6 (Desktop)	$0.92	$0.38 (41%)
Claude Opus 4.6	$3.10	$1.28 (41%)
GPT-5 with MCP	$1.45	$0.58 (40%)
GPT-5 mini with MCP	$0.18	$0.07 (39%)
Gemini 3 with MCP	$0.71	$0.29 (41%)

AI Cost Calculator for MCP Server Developers

The hidden token bill behind every MCP server

Five jobs it actually does for MCP server developers

1. Static schema overhead estimation

2. Per-tool-call invocation cost

3. Resource payload sizing decisions

4. Host model selection for MCP-aware clients

5. Pricing a hosted MCP service

What MCP server developers usually get wrong

Sample MCP server cost shape

How this fits the rest of an MCP server developer's stack

Frequently asked questions

Does the calculator know about MCP-specific billing semantics?

Can I model a hosted MCP service with subscription tiers?

Does it cover the cheap mini and Flash host tiers?

How current is the pricing data?

Can I publish the output in my server's README?