Agent cost simulator

Estimate the real cost of your AI agent

Most calculators model a single API call. Real agents run loops with tool calls, context growth, reasoning tokens, retries, and prompt caching. Pick a workload template, tweak the numbers, see what it actually costs.

Inputs: model, workload template

Outputs: estimated monthly cost, cost per task, annualized cost

Use prompt caching: system prompt and tools schema become cacheable.

Cache hit rate: 75%

Workload parameters (10 inputs)

Turns per task: round-trips per task.
Tasks / month
System prompt (tokens): stable, cacheable.
Tools schema (tokens): sent every turn, cacheable.
User input / turn
Output / turn
Tool calls / turn
Tool result avg (tokens)
Reasoning tokens / turn: o3, Claude thinking, Gemini thinking; 0 for non-reasoning models.
Retry rate (%): failed tool calls plus bad outputs.

Cost breakdown / task

Cacheable input
Variable input
Output
Reasoning
Subtotal × retry

With vs without caching (monthly)

Without caching
With caching
Caching saves

The stable prefix (system prompt + tools schema) is cacheable. Variable input (user input + tool results) pays the full input price. Output tokens and reasoning tokens are both billed at the output price. The per-task subtotal is multiplied by (1 + retry rate). Caching uses a steady-state approximation, with cache writes amortized across hits.
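The per-task formula described above can be sketched in Python as follows. This is a minimal sketch, not the calculator's actual source: the per-million-token prices are placeholders, and the cache multipliers `cache_read_mult` and `cache_write_mult` are assumed values standing in for whatever your provider charges. The steady-state approximation is modeled by assuming that each cache hit pays the read price and each miss pays the write price.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000


@dataclass
class Prices:
    """USD per million tokens. All values are placeholders, not real provider rates."""
    input: float
    output: float
    cache_read_mult: float = 0.1    # assumed: cache hits billed at 10% of input price
    cache_write_mult: float = 1.25  # assumed: cache misses (writes) billed at 125%


@dataclass
class Workload:
    turns_per_task: int
    tasks_per_month: int
    system_tokens: int
    tools_tokens: int
    user_input_per_turn: int
    output_per_turn: int
    tool_calls_per_turn: float
    tool_result_avg: int
    reasoning_per_turn: int
    retry_rate_pct: float


def per_task_breakdown(w: Workload, p: Prices, caching: bool, hit_rate: float) -> dict:
    turns = w.turns_per_task
    # Stable prefix (system + tools) is resent on every turn.
    prefix_tokens = (w.system_tokens + w.tools_tokens) * turns
    if caching:
        # Steady state: hit_rate of prefix reads are cache hits; the rest are
        # misses that pay to (re)write the cache.
        prefix_mult = hit_rate * p.cache_read_mult + (1 - hit_rate) * p.cache_write_mult
    else:
        prefix_mult = 1.0
    variable_tokens = (w.user_input_per_turn + w.tool_calls_per_turn * w.tool_result_avg) * turns

    b = {
        "cacheable_input": prefix_tokens * p.input * prefix_mult / PER_MILLION,
        "variable_input": variable_tokens * p.input / PER_MILLION,
        "output": w.output_per_turn * turns * p.output / PER_MILLION,
        # Reasoning tokens are billed at the output price.
        "reasoning": w.reasoning_per_turn * turns * p.output / PER_MILLION,
    }
    b["subtotal"] = sum(b.values())
    # Retries re-run work, so the whole task cost scales by (1 + retry rate).
    b["total"] = b["subtotal"] * (1 + w.retry_rate_pct / 100)
    return b


def monthly_summary(w: Workload, p: Prices, hit_rate: float = 0.75) -> dict:
    without = per_task_breakdown(w, p, caching=False, hit_rate=hit_rate)["total"] * w.tasks_per_month
    with_cache = per_task_breakdown(w, p, caching=True, hit_rate=hit_rate)["total"] * w.tasks_per_month
    return {
        "without_caching": without,
        "with_caching": with_cache,
        "caching_saves": without - with_cache,
        "annualized_with_caching": with_cache * 12,
    }


if __name__ == "__main__":
    w = Workload(turns_per_task=10, tasks_per_month=1000, system_tokens=2000,
                 tools_tokens=1000, user_input_per_turn=200, output_per_turn=500,
                 tool_calls_per_turn=1, tool_result_avg=1000, reasoning_per_turn=0,
                 retry_rate_pct=10)
    p = Prices(input=3.0, output=15.0)  # placeholder prices
    print(monthly_summary(w, p))
```

With these assumed multipliers, a low hit rate (below roughly 22%) makes the blended prefix price exceed the plain input price, so caching can cost more than not caching when misses dominate.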

What this models

Every turn in an agent loop sends a stable prefix plus variable input, and generates output that may include reasoning. Caching changes the math on the prefix. Retries multiply everything.

Per-turn input

System prompt: stable, cacheable
Tools schema: stable, cacheable
User input: varies per turn
Tool results: varies per turn

Per-turn output

Reasoning tokens: billed as output (o3, Claude thinking, Gemini thinking)
Tool call args: small, billed as output
Final generation: the model's response
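A single turn's tokens can be split along the two lists above. This is a minimal sketch: `tool_call_args_tokens` is a hypothetical per-call figure separated out only for illustration, and, like the per-turn model, it counts a turn's tool results as that turn's input even though in practice they arrive on the next request.

```python
from typing import NamedTuple


class TurnTokens(NamedTuple):
    cacheable_input: int  # stable prefix, billed at the cache-adjusted input price
    variable_input: int   # per-turn content, billed at the full input price
    output: int           # billed at the output price (includes reasoning)


def turn_tokens(system: int, tools: int, user_input: int, tool_calls: int,
                tool_result_avg: int, tool_call_args_tokens: int,
                reasoning: int, final_generation: int) -> TurnTokens:
    # Input side: stable prefix vs. content that changes every turn.
    cacheable = system + tools
    variable = user_input + tool_calls * tool_result_avg
    # Output side: reasoning + tool call arguments + the model's response.
    output = reasoning + tool_calls * tool_call_args_tokens + final_generation
    return TurnTokens(cacheable, variable, output)


print(turn_tokens(system=2000, tools=1000, user_input=200, tool_calls=2,
                  tool_result_avg=1500, tool_call_args_tokens=50,
                  reasoning=800, final_generation=300))
# → TurnTokens(cacheable_input=3000, variable_input=3200, output=1200)
```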

Workload templates

The dropdown above pre-fills representative numbers for common shapes. Adjust freely.

Chatbot. Multi-turn conversation, no tools. Defaults: 8 turns, small system prompt, no tool calls, high cache hit rate (system prompt reused across all conversations).
Coding agent. ReAct-style loop with file and exec tools. Defaults: 15 turns, large tool schema, ~2 tool calls per turn, ~1,500 tokens of tool results, reasoning tokens, 10% retry rate.
RAG Q&A. Single-turn retrieval-augmented. Defaults: 1 turn, small system prompt, ~5,000 tokens of retrieved chunks per query, no tool calls, caching disabled by default.
Customer support agent. Moderate-length conversation with KB lookup tools. Defaults: 6 turns, mid-sized tools schema, 1 tool call per turn.
Data extraction. Single-shot structured output from documents. Defaults: 1 turn, ~8,000 token document input, ~1,500 token JSON output.
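Pre-filling from a template amounts to defaults plus user overrides. This is a minimal sketch with hypothetical parameter keys. Only the numbers stated in the descriptions above are filled in; anything else (system prompt size, tools schema size, prices) must be supplied by the caller. Counting RAG chunks and extraction documents as per-turn user input is an assumption, since either way they are variable input at the full input price. Mapping the coding agent's ~1,500 tokens of tool results to a per-call average is also an assumption.

```python
# Only values stated in the template descriptions; other parameters are left to the caller.
TEMPLATES: dict[str, dict] = {
    "chatbot": {"turns_per_task": 8, "tool_calls_per_turn": 0},
    # Assumption: "~1,500 tokens of tool results" is treated as the per-call average.
    "coding_agent": {"turns_per_task": 15, "tool_calls_per_turn": 2,
                     "tool_result_avg": 1500, "retry_rate_pct": 10},
    # Assumption: retrieved chunks are counted as variable user input.
    "rag_qa": {"turns_per_task": 1, "user_input_per_turn": 5000,
               "tool_calls_per_turn": 0, "caching": False},
    "customer_support": {"turns_per_task": 6, "tool_calls_per_turn": 1},
    # Assumption: the document is counted as variable user input.
    "data_extraction": {"turns_per_task": 1, "user_input_per_turn": 8000,
                        "output_per_turn": 1500},
}


def load_template(name: str, **overrides) -> dict:
    """Return a template's defaults with user overrides applied on top."""
    if name not in TEMPLATES:
        raise KeyError(f"unknown template: {name!r}")
    # Build a new dict so overrides never mutate the shared defaults.
    return {**TEMPLATES[name], **overrides}


cfg = load_template("coding_agent", turns_per_task=20, system_tokens=3000)
print(cfg["turns_per_task"], cfg["tool_result_avg"])  # → 20 1500
```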

Reality check. Agent costs are dominated by two things: how many turns the loop takes, and how much repeated context you send. If your monthly bill looks scary, look at those two before swapping models. Often the answer is "send less context per turn" or "cap the loop earlier," not "swap GPT-5 for Haiku."
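To see why turn count and repeated context dominate, compare a loop whose per-turn input stays flat with one that resends the full message history on every call, as a stateless chat API requires unless history is trimmed or summarized. This is a minimal sketch with hypothetical numbers: `prefix` is the stable system + tools tokens, and `added_per_turn` is the new user input plus tool results that each turn contributes. The flat case matches fixed per-turn inputs; the growing case is an assumption about how your loop manages history.

```python
def total_input_tokens(turns: int, prefix: int, added_per_turn: int,
                       resend_history: bool) -> int:
    """Total input tokens across one task's loop."""
    if not resend_history:
        # Flat model: each turn sends the prefix plus only that turn's new input.
        return turns * (prefix + added_per_turn)
    # Growing model: turn k resends the prefix plus k turns of accumulated input,
    # so the history term sums to added_per_turn * turns * (turns + 1) / 2.
    return turns * prefix + added_per_turn * turns * (turns + 1) // 2


for n in (15, 30):
    flat = total_input_tokens(n, prefix=3000, added_per_turn=1000, resend_history=False)
    grow = total_input_tokens(n, prefix=3000, added_per_turn=1000, resend_history=True)
    print(f"{n} turns: flat={flat:,} growing={grow:,}")
# → 15 turns: flat=60,000 growing=165,000
# → 30 turns: flat=120,000 growing=555,000
```

Doubling the loop from 15 to 30 turns doubles input in the flat case but more than triples it when history is resent, which is why capping the loop or trimming context often moves the bill more than a cheaper per-token price.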

For an explicit caching-vs-no-caching breakdown across providers, see the cache calculator.

Using Claude Code specifically? The general agent calculator above is the right tool for "any agent, any model" workloads. For a Claude-Code-specific view that handles Pro vs Max vs API recommendations and bakes in Claude Code's typical session shape, see the Claude Code cost calculator.