Agent cost simulator
Estimate the real cost of your AI agent
Most calculators model a single API call. Real agents run loops with tool calls, context growth, reasoning tokens, retries, and prompt caching. Pick a workload template, tweak the numbers, see what it actually costs.
The simulator takes 10 workload inputs and reports these cost lines:
- Cacheable input
- Variable input
- Output
- Reasoning
- Subtotal × (1 + retry rate)
- Without caching
- With caching
- Caching saves
The stable prefix (system prompt + tool schema) is cacheable. Variable input (user input + tool results) pays the full input price. Output includes generated tokens plus reasoning tokens. The total scales by (1 + retry rate). Caching uses a steady-state approximation, with cache writes amortized across hits.
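The breakdown above can be sketched in Python. This is a minimal sketch, not the simulator's source: the class and function names, the per-million-token price fields, and the blended cache rate (hits pay a cache-read price, misses pay a cache-write price, weighted by a hit rate) are assumptions about how a steady-state cache approximation might be computed. The prices in the usage lines are illustrative, not any provider's published rates.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000


@dataclass
class Prices:
    """Illustrative USD prices per million tokens, supplied by the caller."""
    input: float
    output: float
    cache_read: float   # assumed price of a prefix token served from cache
    cache_write: float  # assumed price of a prefix token written on a miss


@dataclass
class Workload:
    """Hypothetical per-turn token counts for one agent run."""
    turns: int
    cacheable_tokens: int   # system prompt + tool schema
    variable_tokens: int    # user input + tool results
    output_tokens: int      # tool-call args + final generation
    reasoning_tokens: int   # billed at the output price
    cache_hit_rate: float   # fraction of prefix reads served from cache
    retry_rate: float       # 0.10 means 10% extra calls


def cost_breakdown(w: Workload, p: Prices) -> dict[str, float]:
    retry = 1 + w.retry_rate

    # Run-level costs before retries.
    cacheable = w.cacheable_tokens * w.turns
    variable_cost = w.variable_tokens * w.turns * p.input / PER_MILLION
    output_cost = w.output_tokens * w.turns * p.output / PER_MILLION
    reasoning_cost = w.reasoning_tokens * w.turns * p.output / PER_MILLION

    # Without caching, the prefix pays the full input price every turn.
    prefix_full = cacheable * p.input / PER_MILLION
    # Steady-state assumption: hits pay the read price, misses pay the write price.
    blended = w.cache_hit_rate * p.cache_read + (1 - w.cache_hit_rate) * p.cache_write
    prefix_cached = cacheable * blended / PER_MILLION

    rest = variable_cost + output_cost + reasoning_cost
    without_caching = (prefix_full + rest) * retry
    with_caching = (prefix_cached + rest) * retry
    return {
        "cacheable_input": prefix_cached,  # prefix cost with caching, before retries
        "variable_input": variable_cost,
        "output": output_cost,
        "reasoning": reasoning_cost,
        "without_caching": without_caching,
        "with_caching": with_caching,
        "caching_saves": without_caching - with_caching,
    }


if __name__ == "__main__":
    w = Workload(turns=10, cacheable_tokens=2_000, variable_tokens=1_000,
                 output_tokens=500, reasoning_tokens=0,
                 cache_hit_rate=0.9, retry_rate=0.1)
    p = Prices(input=3.0, output=15.0, cache_read=0.3, cache_write=3.75)
    print(cost_breakdown(w, p))
```

Applying the retry multiplier to both totals assumes retried calls see the same cache hit rate as first attempts; a real workload may differ.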
What this models
Every turn in an agent loop sends a stable prefix plus variable input, and generates output that may include reasoning. Caching lowers the cost of the prefix. Retries multiply the cost of every turn.
Per-turn input
- System prompt: stable, cacheable
- Tools schema: stable, cacheable
- User input: varies per turn
- Tool results: varies per turn
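The input list above maps onto two billing buckets, which a small hypothetical helper can make explicit. The second function goes beyond the stated per-turn model: it assumes an agent that re-sends all earlier variable input on every turn (one way context growth can arise) and shows why that makes total variable input grow quadratically with turn count. Both function names are illustrative, and prior-turn outputs are ignored for simplicity.

```python
def per_turn_input(system: int, tools: int, user: int,
                   tool_calls: int, tokens_per_result: int) -> tuple[int, int]:
    """Split one turn's input tokens into (cacheable, variable)."""
    cacheable = system + tools                        # stable prefix, same every turn
    variable = user + tool_calls * tokens_per_result  # changes every turn
    return cacheable, variable


def variable_input_over_run(turns: int, per_turn_variable: int,
                            resend_history: bool) -> int:
    """Total variable input tokens billed across a run.

    With resend_history=True (an assumption about the agent), turn t carries
    t * per_turn_variable tokens, so the run total is the triangular sum
    per_turn_variable * turns * (turns + 1) / 2.
    """
    if not resend_history:
        return turns * per_turn_variable
    return per_turn_variable * turns * (turns + 1) // 2
```

Doubling the turn count roughly quadruples the total when history is re-sent, which is why long agent loops tend to cost more than a per-turn estimate suggests.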
Per-turn output
- Reasoning tokens: billed as output (o3, Claude thinking, Gemini thinking)
- Tool call args: small, billed as output
- Final generation: the model's response
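Since all three output components bill at the output rate, a turn's output cost is a single multiplication; splitting reasoning out shows how much of the bill is tokens the user never sees. The function name and the per-million-token price convention are illustrative assumptions.

```python
PER_MILLION = 1_000_000


def per_turn_output_cost(reasoning: int, tool_call_args: int, final: int,
                         output_price: float) -> dict[str, float]:
    """Split one turn's output cost; every part bills at output_price (USD per 1M tokens)."""
    visible = tool_call_args + final  # tokens that appear in the response
    return {
        "output": visible * output_price / PER_MILLION,
        "reasoning": reasoning * output_price / PER_MILLION,
    }
```

When reasoning tokens outnumber visible output, as they can on thinking-enabled models, the reasoning line can dominate the per-turn cost.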
Workload templates
The template dropdown pre-fills representative numbers for common shapes. Adjust freely.
- Chatbot. Multi-turn conversation, no tools. Defaults: 8 turns, small system, no tool calls, high cache hit rate (system reused across all conversations).
- Coding agent. ReAct-style loop with file and exec tools. Defaults: 15 turns, large tool schema, ~2 tool calls per turn, ~1,500 tokens of tool results, reasoning tokens, 10% retry rate.
- RAG Q&A. Single-turn retrieval-augmented. Defaults: 1 turn, small system, ~5,000 tokens of retrieved chunks per query, no tool calls, caching disabled by default.
- Customer support agent. Moderate-length conversation with KB lookup tools. Defaults: 6 turns, mid-sized tools, 1 tool call per turn.
- Data extraction. Single-shot structured output from documents. Defaults: 1 turn, ~8,000 token document input, ~1,500 token JSON output.
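The templates can be represented as plain data. This sketch records only the defaults stated above; fields the descriptions leave unspecified (schema sizes, reasoning volume, hit rates) stay None rather than being guessed. The Template class, its field names, and customize() are hypothetical, not the simulator's code.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Template:
    turns: int
    tool_calls_per_turn: Optional[float] = None
    tool_result_tokens: Optional[int] = None
    retrieved_tokens: Optional[int] = None
    document_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    retry_rate: Optional[float] = None
    caching: Optional[bool] = None  # None = not stated in the template description


# Only the defaults named in the template descriptions are filled in.
TEMPLATES = {
    "chatbot": Template(turns=8, tool_calls_per_turn=0, caching=True),
    "coding_agent": Template(turns=15, tool_calls_per_turn=2,
                             tool_result_tokens=1500, retry_rate=0.10),
    "rag_qa": Template(turns=1, tool_calls_per_turn=0,
                       retrieved_tokens=5000, caching=False),
    "support_agent": Template(turns=6, tool_calls_per_turn=1),
    "data_extraction": Template(turns=1, document_tokens=8000, output_tokens=1500),
}


def customize(name: str, **overrides) -> Template:
    """Return a copy of a template with some fields changed."""
    return replace(TEMPLATES[name], **overrides)
```

Because the dataclass is frozen and dataclasses.replace returns a new instance, adjusting a template never mutates the shared defaults.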
For an explicit caching-vs-no-caching breakdown across providers, see the cache calculator.