Prompt caching ROI calculator
Anthropic discounts cached input tokens by ~90%; OpenAI and Gemini offer similar discounts. But how much do you actually save? Plug in your numbers to see your real monthly cost with caching, plus the breakeven hit rate where caching starts paying for itself.
How much of your prompt is system message + repeated context (cacheable). Tools, system, and a long shared knowledge base are usually 70–95%.
% of calls that hit the cache (vs. need a fresh write). High-traffic chatbots see 80–95%. Long-tail content is usually 30–60%.
This calculator models Anthropic-style caching: cache reads bill at the cached-input rate, cache writes at the cache-write rate (typically 1.25× the regular input price). It uses a steady-state approximation: cache writes are assumed to be amortized across many hits.
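The steady-state model above can be sketched in a few lines. This is an illustrative approximation, not the calculator's actual implementation; the function name and parameters are assumptions, and prices are passed in per token.

```python
def monthly_cost_with_caching(
    calls_per_month: float,
    input_tokens_per_call: float,
    cacheable_fraction: float,  # share of the prompt that is cacheable, 0-1
    hit_rate: float,            # share of calls that read from the cache, 0-1
    input_price: float,         # $ per token, regular input
    cached_price: float,        # $ per token, cache read (~0.1x input for Anthropic)
    write_price: float,         # $ per token, cache write (~1.25x input for Anthropic)
) -> float:
    """Steady-state monthly input cost with Anthropic-style caching."""
    cacheable = input_tokens_per_call * cacheable_fraction
    uncacheable = input_tokens_per_call - cacheable
    # Hits read the cacheable prefix at the discounted rate;
    # misses re-write it at the (more expensive) cache-write rate.
    per_call = (
        uncacheable * input_price
        + hit_rate * cacheable * cached_price
        + (1 - hit_rate) * cacheable * write_price
    )
    return calls_per_month * per_call
```

For example, 1M calls/month at 2,000 input tokens each, 80% cacheable, 90% hit rate, with Claude-like prices ($3/MTok input, $0.30 cached, $3.75 write) comes to about $2,232/month versus $6,000 without caching.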
How prompt caching actually works
When your app sends the same long context repeatedly (system prompts, tool definitions, knowledge bases, few-shot examples), providers can cache the encoded version of those tokens on their side. Subsequent calls within the cache lifetime read from the cache at a heavily discounted rate.
Three things to know:
- Cache writes cost more than regular input. Anthropic charges ~1.25× the input price on the first call that primes the cache (5-minute TTL) and 2× for the 1-hour TTL, so caching only saves money once enough hits amortize the write.
- Hit rate is everything. If most of your traffic is unique long-tail prompts, caching may cost more than it saves. High-traffic chatbots with shared system prompts see 80–95% hit rates and the savings are massive.
- Breakeven hit rate tells you the threshold where caching pays for itself. For Claude, it's typically around 22%. Below that, skip it.
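The ~22% figure falls out of the pricing multiples. Per cacheable token, no caching costs 1× the input price; with caching, a hit costs the read multiple r (~0.1) and a miss costs the write multiple w (~1.25). Breakeven is the hit rate h where h·r + (1−h)·w = 1, i.e. h = (w−1)/(w−r). A small sketch (function name is ours, defaults are Anthropic's published multiples):

```python
def breakeven_hit_rate(write_mult: float = 1.25, read_mult: float = 0.1) -> float:
    """Hit rate h where h*read_mult + (1 - h)*write_mult == 1,
    i.e. caching the prefix costs the same as not caching it."""
    return (write_mult - 1) / (write_mult - read_mult)

# Anthropic 5-min TTL: ~1.25x writes, ~0.1x reads
print(round(breakeven_hit_rate(1.25, 0.1), 3))  # 0.217 -> ~22%
# 1-hour TTL: 2x writes raises the bar considerably
print(round(breakeven_hit_rate(2.0, 0.1), 3))   # 0.526 -> ~53%
```

Note the breakeven only depends on the price multiples, not on traffic volume or prompt length.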
OpenAI's prompt caching is automatic and has no separate write price; cached tokens are simply 50% off. Gemini's context caching charges a per-hour storage fee plus discounted reads. This calculator models the Anthropic-style explicit-cache-write case, which is the most complex and where you most need the math. For the simpler schemes, ignore the breakeven: the savings are a strict improvement.