Prompt caching ROI calculator
Anthropic discounts cached input tokens by ~90%; OpenAI and Gemini offer similar discounts. But how much do you actually save? Plug in your numbers to see your real monthly cost with caching, plus the breakeven hit rate where caching starts paying for itself.
How much of your prompt is system message + repeated context (cacheable). Tools, system, and a long shared knowledge base are usually 70–95%.
% of calls that hit the cache (vs. need a fresh write). High-traffic chatbots see 80–95%. Long-tail content is usually 30–60%.
This calculator models Anthropic-style caching: cache reads bill at the cached-input rate, cache writes at the cache-write rate (typically 1.25× the regular input price). It uses a steady-state approximation: cache writes are assumed to be amortized across many hits.
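The steady-state model above can be sketched in a few lines. This is an illustrative approximation, not the calculator's actual implementation; the function name and parameters are assumptions, and prices are passed in per token.

```python
def monthly_cost_with_caching(
    calls_per_month: float,
    input_tokens_per_call: float,
    cacheable_fraction: float,  # share of the prompt that is cacheable, 0-1
    hit_rate: float,            # share of calls that read from the cache, 0-1
    input_price: float,         # $ per token, regular input
    cached_price: float,        # $ per token, cache read (~0.1x input for Anthropic)
    write_price: float,         # $ per token, cache write (~1.25x input for Anthropic)
) -> float:
    """Steady-state monthly input cost with Anthropic-style caching."""
    cacheable = input_tokens_per_call * cacheable_fraction
    uncacheable = input_tokens_per_call - cacheable
    # Hits read the cacheable prefix at the discounted rate;
    # misses re-write it at the (more expensive) cache-write rate.
    per_call = (
        uncacheable * input_price
        + hit_rate * cacheable * cached_price
        + (1 - hit_rate) * cacheable * write_price
    )
    return calls_per_month * per_call
```

For example, 1M calls/month at 2,000 input tokens each, 80% cacheable, 90% hit rate, with Claude-like prices ($3/MTok input, $0.30 cached, $3.75 write) comes to about $2,232/month versus $6,000 without caching.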
How prompt caching actually works
When your app sends the same long context repeatedly (system prompts, tool definitions, knowledge bases, few-shot examples), providers can cache the encoded version of those tokens on their side. Subsequent calls within the cache lifetime read from the cache at a heavily discounted rate.
Three things to know:
- Cache writes cost more than regular input. Anthropic charges ~1.25× the input price on the first call that primes the cache (5-minute TTL) and 2× for the 1-hour TTL, so caching only saves money once enough hits amortize the write.
- Hit rate is everything. If most of your traffic is unique long-tail prompts, caching may cost more than it saves. High-traffic chatbots with shared system prompts see 80–95% hit rates and the savings are massive.
- Breakeven hit rate tells you the threshold where caching pays for itself. For Claude, it's typically around 22%. Below that, skip it.
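The ~22% figure falls out of the pricing multiples. Per cacheable token, no caching costs 1× the input price; with caching, a hit costs the read multiple r (~0.1) and a miss costs the write multiple w (~1.25). Breakeven is the hit rate h where h·r + (1−h)·w = 1, i.e. h = (w−1)/(w−r). A small sketch (function name is ours, defaults are Anthropic's published multiples):

```python
def breakeven_hit_rate(write_mult: float = 1.25, read_mult: float = 0.1) -> float:
    """Hit rate h where h*read_mult + (1 - h)*write_mult == 1,
    i.e. caching the prefix costs the same as not caching it."""
    return (write_mult - 1) / (write_mult - read_mult)

# Anthropic 5-min TTL: ~1.25x writes, ~0.1x reads
print(round(breakeven_hit_rate(1.25, 0.1), 3))  # 0.217 -> ~22%
# 1-hour TTL: 2x writes raises the bar considerably
print(round(breakeven_hit_rate(2.0, 0.1), 3))   # 0.526 -> ~53%
```

Note the breakeven only depends on the price multiples, not on traffic volume or prompt length.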
OpenAI's prompt caching is automatic and has no separate write price; cached tokens are simply 50% off. Gemini's context caching charges a per-hour storage fee plus discounted reads. This calculator models the Anthropic-style explicit-cache-write case, which is the most complex and where you most need the math. For the simpler schemes, ignore the breakeven: the savings are a strict improvement.