LLM Cost Calculator
Estimate monthly and annual LLM API spend across 12 models. Enter your daily requests and token counts to see which provider is cheapest.
Estimated monthly cost (GPT-4o)
$22.50
Monthly requests
3,000
Monthly input tokens
3,000,000
Monthly output tokens
1,500,000
Annual cost
$270.00
Cost per request
$0.00750
Model comparison (sorted by monthly cost)
| Provider | Model | Input cost | Output cost | Monthly cost | Annual cost |
|---|---|---|---|---|---|
| Google | Gemini Flash (cheapest) | $0.2250 | $0.4500 | $0.6750 | $8.10 |
| OpenAI | GPT-4.1 Nano | $0.3000 | $0.6000 | $0.9000 | $10.80 |
| DeepSeek | DeepSeek Reasoner | $0.4200 | $0.8250 | $1.25 | $14.94 |
| Mistral | Mistral Small | $0.6000 | $0.9000 | $1.50 | $18.00 |
| DeepSeek | DeepSeek Chat | $0.8100 | $1.65 | $2.46 | $29.52 |
| OpenAI | GPT-4.1 Mini | $1.20 | $2.40 | $3.60 | $43.20 |
| Google | Gemini Pro | $3.75 | $7.50 | $11.25 | $135.00 |
| Mistral | Mistral Large | $6.00 | $9.00 | $15.00 | $180.00 |
| OpenAI | GPT-4.1 | $6.00 | $12.00 | $18.00 | $216.00 |
| OpenAI | GPT-4o | $7.50 | $15.00 | $22.50 | $270.00 |
| Anthropic | Claude Sonnet | $9.00 | $22.50 | $31.50 | $378.00 |
| Anthropic | Claude Opus | $45.00 | $112.50 | $157.50 | $1,890.00 |
Charts: monthly cost comparison, annual cost comparison, and cost breakdown (input vs output) showing the GPT-4o monthly split.
Reference pricing (per 1M tokens)
| Model | Provider | Input | Output | Context |
|---|---|---|---|---|
| GPT-4o | OpenAI | $2.500 | $10.00 | 128K |
| GPT-4.1 | OpenAI | $2.000 | $8.00 | 128K |
| GPT-4.1 Mini | OpenAI | $0.400 | $1.60 | 128K |
| GPT-4.1 Nano | OpenAI | $0.100 | $0.40 | 128K |
| Claude Sonnet | Anthropic | $3.000 | $15.00 | 200K |
| Claude Opus | Anthropic | $15.000 | $75.00 | 200K |
| Gemini Flash | Google | $0.075 | $0.30 | 1M |
| Gemini Pro | Google | $1.250 | $5.00 | 1M |
| DeepSeek Chat | DeepSeek | $0.270 | $1.10 | 64K |
| DeepSeek Reasoner | DeepSeek | $0.140 | $0.55 | 64K |
| Mistral Large | Mistral | $2.000 | $6.00 | 128K |
| Mistral Small | Mistral | $0.200 | $0.60 | 128K |
Pricing reflects published API rates as of June 2026. Verify current rates on each provider site before budgeting.
About the LLM cost calculator
LLM API pricing is billed per token, with separate rates for input and output. This calculator models your real usage pattern: daily requests, average tokens per call, and days per month. It then applies published June 2026 rates for GPT-4o, GPT-4.1, Claude, Gemini, DeepSeek, and Mistral so you can compare monthly and annual spend before you commit to a provider.
Use it to answer common budget questions: how much will my AI app cost, which model is cheapest at my volume, and what is the cost per request or per user. The comparison table sorts all models by monthly cost and highlights the cheapest option automatically.
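As a rough illustration of how such a comparison could be produced, the sketch below ranks the 12 models from the reference pricing table by monthly cost. The `PRICES` dictionary and the `rank_models` helper are names invented for this example; they are not the calculator's actual code.

```python
# Reference prices from this page, in USD per 1M tokens: (input, output).
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4.1": (2.00, 8.00),
    "GPT-4.1 Mini": (0.40, 1.60),
    "GPT-4.1 Nano": (0.10, 0.40),
    "Claude Sonnet": (3.00, 15.00),
    "Claude Opus": (15.00, 75.00),
    "Gemini Flash": (0.075, 0.30),
    "Gemini Pro": (1.25, 5.00),
    "DeepSeek Chat": (0.27, 1.10),
    "DeepSeek Reasoner": (0.14, 0.55),
    "Mistral Large": (2.00, 6.00),
    "Mistral Small": (0.20, 0.60),
}


def rank_models(input_tokens, output_tokens, prices=PRICES):
    """Return (model, monthly_cost) pairs sorted from cheapest to most expensive."""
    costs = {
        model: input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price
        for model, (in_price, out_price) in prices.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Example volume from this page: 3M input and 1.5M output tokens per month.
ranking = rank_models(3_000_000, 1_500_000)
cheapest_model, _ = ranking[0]
for model, cost in ranking:
    tag = "  <- cheapest" if model == cheapest_model else ""
    print(f"{model:<18} ${cost:>8.2f}{tag}")
```

At this volume the ordering matches the comparison table, with Gemini Flash first and Claude Opus last.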
How it works
Features
12 model database
GPT-4o, GPT-4.1, Claude, Gemini, DeepSeek, and Mistral with June 2026 rates.
Multi-provider comparison
Sort all models by monthly cost and highlight the cheapest automatically.
Advanced usage mode
Model costs per user with requests per user and monthly growth %.
Cost charts
Monthly, annual, and input vs output breakdown visualizations.
Export and share
Copy results, download CSV, or share a prefilled link.
Free to use
No account or signup required.
What is LLM pricing?
LLM pricing is the per-token rate providers charge when you call their API. Unlike flat SaaS subscriptions, API billing scales with usage. Every prompt you send and every response you receive consumes tokens, and those tokens are multiplied by model-specific rates.
Flagship models like GPT-4o and Claude Opus cost more because they use larger architectures and deliver higher quality on complex tasks. Nano, mini, and flash tiers trade some capability for dramatically lower per-token rates, which makes them ideal for classification, extraction, and high-volume chatbots.
How token billing works
Providers tokenize your text into subword units before processing. You are charged for every token in your prompt (input) and every token the model generates (output). System prompts, conversation history, and retrieved context all count toward input tokens.
Billing is calculated as (input tokens / 1,000,000) × input price + (output tokens / 1,000,000) × output price. There is no per-request flat fee on standard API tiers. A single long request with a large context window can cost more than many short requests combined.
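The billing formula can be written as a small function. This is a minimal sketch: `request_cost` and its parameter names are illustrative, and the rates are the GPT-4o reference prices listed on this page.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD for one request, with prices quoted per 1M tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# GPT-4o reference rates: $2.50 input and $10.00 output per 1M tokens.
cost = request_cost(1_000, 500, 2.50, 10.00)
print(f"${cost:.4f}")  # $0.0075
```

Because the formula is linear in tokens, the same function works for a single request or for a whole month of aggregated token counts.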
Input vs output tokens
Input tokens are everything you send to the model: user messages, system instructions, tool results, and RAG context. Output tokens are the model's completion. Output is priced higher on nearly every model because generation requires sequential compute for each token produced.
Apps that generate long responses (summaries, code, reports) skew toward output cost. Apps that send large context but expect short answers (classification, routing) skew toward input cost. The breakdown chart in this calculator shows which side drives your bill for the selected model.
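To see which side drives a bill, you can compute the split directly. The sketch below assumes the GPT-4o reference rates and this page's example volume; `cost_split` is a hypothetical helper, not part of the calculator.

```python
def cost_split(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return (input_cost, output_cost, output_share) for a token volume."""
    input_cost = input_tokens / 1e6 * input_price_per_m
    output_cost = output_tokens / 1e6 * output_price_per_m
    total = input_cost + output_cost
    # Guard against division by zero when there is no usage at all.
    output_share = output_cost / total if total else 0.0
    return input_cost, output_cost, output_share


# 3M input and 1.5M output tokens per month at GPT-4o rates.
input_cost, output_cost, output_share = cost_split(3_000_000, 1_500_000, 2.50, 10.00)
print(f"input ${input_cost:.2f}, output ${output_cost:.2f}, "
      f"output share {output_share:.0%}")
```

Here output makes up about two thirds of the bill even though it is only a third of the tokens, because the output rate is four times the input rate.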
How to estimate AI costs
Start with realistic averages: count tokens in a typical request using the AI token counter, multiply by expected daily requests, and multiply again by working days per month. Add a 10 to 20% buffer for retries, tool calls, and context growth.
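The estimation steps can be sketched as follows. The inputs (100 requests per day, 1,000 input and 500 output tokens per request, 30 days) are assumptions chosen to reproduce this page's example totals, and `monthly_estimate` is an illustrative name.

```python
def monthly_estimate(daily_requests, input_tokens_per_request,
                     output_tokens_per_request, days_per_month,
                     input_price_per_m, output_price_per_m, buffer=0.0):
    """Estimate monthly volume and cost; `buffer` is a fraction such as 0.2 for 20%."""
    requests = daily_requests * days_per_month
    scale = 1 + buffer  # headroom for retries, tool calls, and context growth
    input_tokens = requests * input_tokens_per_request * scale
    output_tokens = requests * output_tokens_per_request * scale
    cost = (input_tokens / 1e6 * input_price_per_m
            + output_tokens / 1e6 * output_price_per_m)
    return {"requests": requests, "input_tokens": input_tokens,
            "output_tokens": output_tokens, "cost": cost}


# GPT-4o reference rates, first without and then with a 20% buffer.
base = monthly_estimate(100, 1_000, 500, 30, 2.50, 10.00)
padded = monthly_estimate(100, 1_000, 500, 30, 2.50, 10.00, buffer=0.20)
print(f"base ${base['cost']:.2f}, with buffer ${padded['cost']:.2f}")
```

The unbuffered result matches the $22.50 monthly figure shown for GPT-4o; the buffered figure is the one to budget against.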
Run the estimate on both your primary model and the cheapest model in the comparison table. The gap between them is your potential savings from model tiering. For multi-model apps, estimate each endpoint separately and sum the results.
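For a multi-model app, the per-endpoint approach might look like the sketch below. The `Endpoint` class, the endpoint names, and the router's token volumes are all hypothetical; only the per-million rates come from the reference table.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    monthly_input_tokens: int
    monthly_output_tokens: int
    input_price_per_m: float
    output_price_per_m: float

    def monthly_cost(self) -> float:
        """Monthly cost in USD for this endpoint alone."""
        return (self.monthly_input_tokens / 1e6 * self.input_price_per_m
                + self.monthly_output_tokens / 1e6 * self.output_price_per_m)


endpoints = [
    Endpoint("chat", 3_000_000, 1_500_000, 2.50, 10.00),  # GPT-4o rates
    Endpoint("router", 2_000_000, 100_000, 0.10, 0.40),   # GPT-4.1 Nano rates
]

for e in endpoints:
    print(f"{e.name:<8} ${e.monthly_cost():.2f}")
total = sum(e.monthly_cost() for e in endpoints)
print(f"{'total':<8} ${total:.2f}")
```

Keeping endpoints separate makes it easier to see which one to move to a cheaper model first.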
How to reduce API spending
The fastest savings come from model selection: move routing, classification, and FAQ tasks to GPT-4.1 Nano or Gemini Flash. Cap max_tokens on output-heavy endpoints. Cache identical system prompts where your provider supports it.
Trim RAG context to only the chunks you need. Batch overnight jobs through provider batch APIs where available for 50% discounts. Monitor cost per request weekly and set alerts before you hit budget limits.
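A quick way to gauge the effect of model tiering is to blend two models by traffic share. The sketch below assumes the routed traffic has the same input and output mix as the rest, which is a simplification; `tiered_cost` and `cheap_share` are illustrative names.

```python
def monthly_cost(input_tokens, output_tokens, in_price, out_price):
    """Monthly cost in USD, with prices quoted per 1M tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price


def tiered_cost(input_tokens, output_tokens, primary, cheap, cheap_share):
    """Blend two models, sending `cheap_share` of all tokens to the cheaper one.

    `primary` and `cheap` are (input_price, output_price) tuples per 1M tokens.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    cheap_part = monthly_cost(input_tokens * cheap_share,
                              output_tokens * cheap_share, *cheap)
    primary_part = monthly_cost(input_tokens * (1 - cheap_share),
                                output_tokens * (1 - cheap_share), *primary)
    return cheap_part + primary_part


GPT_4O = (2.50, 10.00)       # reference rates from this page
GPT_41_NANO = (0.10, 0.40)

before = tiered_cost(3_000_000, 1_500_000, GPT_4O, GPT_41_NANO, cheap_share=0.0)
after = tiered_cost(3_000_000, 1_500_000, GPT_4O, GPT_41_NANO, cheap_share=0.5)
print(f"before ${before:.2f}, after ${after:.2f}, saved {1 - after / before:.0%}")
```

Moving half of the example traffic from GPT-4o to GPT-4.1 Nano cuts the bill roughly in half, because the cheaper model's share is close to free by comparison.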
GPT vs Claude vs Gemini cost comparison
At typical chat volumes, Gemini Flash is often the lowest monthly cost thanks to $0.075 input and $0.30 output per million tokens. GPT-4o sits in the mid tier at $2.50 / $10.00 and offers strong general performance. Claude Sonnet at $3.00 / $15.00 costs about 40% more per request than GPT-4o at this page's example volume, but excels at coding and long-context work with its 200K-token context window.
DeepSeek Chat and Reasoner undercut GPT-4o and Claude Sonnet on price, though not Gemini Flash, and suit developers comfortable with their data policies. For a detailed provider breakdown, see the LLM cost comparison guide. Provider-specific calculators: OpenAI, Claude.
Best practices for AI budget planning
Set a monthly API budget before launch and track cost per active user from day one. Separate development and production keys so test traffic does not inflate forecasts. Re-run this calculator when you change models, add features, or onboard new user segments.
Plan for 2x to 3x growth in the first quarter if your product gains traction. Use the advanced growth field to stress-test higher volume. Keep a fallback model configured so you can downgrade quickly if costs spike unexpectedly.
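A growth stress test can be approximated with compound monthly growth. This sketch assumes cost scales linearly with request volume and that growth compounds once per month; `project_costs` and the 50% growth rate are illustrative choices, not the calculator's internals.

```python
def project_costs(base_monthly_cost, monthly_growth, months):
    """Monthly costs under compound growth; the first entry is the base month."""
    return [base_monthly_cost * (1 + monthly_growth) ** m for m in range(months)]


# GPT-4o example from this page ($22.50 per month), growing 50% per month.
projection = project_costs(22.50, 0.50, 3)
for month, cost in enumerate(projection, start=1):
    print(f"month {month}: ${cost:.2f}")
print(f"quarter total: ${sum(projection):.2f}")
print(f"growth by month 3: {projection[-1] / projection[0]:.2f}x")
```

At 50% monthly growth, spend reaches 2.25x the base by the third month, which falls inside the 2x to 3x range mentioned above.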
Frequently asked questions about LLM API pricing
What is an LLM cost calculator?
An LLM cost calculator estimates how much you will spend on large language model API calls based on your usage volume, average tokens per request, and model pricing. It helps you compare providers before you scale an AI product.
How are AI API costs calculated?
Providers bill by the token. You multiply your monthly input tokens by the input price per million, add output tokens times the output price per million, and sum the two. This calculator applies that formula across 12 major models automatically.
Why do output tokens cost more?
Generating text requires more compute than reading it. Models run autoregressive decoding for every output token, so providers price output above input: 3x to 5x the input rate for the models in the reference table above.
How many tokens equal one word?
English text averages about 1.3 tokens per word, though code and JSON use more. A 500-word prompt is roughly 650 input tokens. Use the AI token counter for precise counts on your actual text.
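For a quick estimate before running a real tokenizer, the 1.3 tokens-per-word average can be applied directly. This is only a rule of thumb; `estimate_tokens` is a hypothetical helper and does not replace an actual token count.

```python
TOKENS_PER_WORD = 1.3  # rough English average quoted above; code and JSON run higher


def estimate_tokens(word_count: int, tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Approximate token count from a word count, rounded to the nearest token."""
    return round(word_count * tokens_per_word)


print(estimate_tokens(500))  # 650
```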
Which AI model is cheapest?
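For high-volume workloads, Gemini Flash and GPT-4.1 Nano are typically the lowest cost per request. DeepSeek Chat and Reasoner also rank among the cheapest for developers. At the reference rates above, Gemini Flash has both the lowest input price and the lowest output price, so it stays cheapest at any input-to-output ratio; the ratio changes how large the gap is.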
How can I reduce LLM costs?
Route simple tasks to smaller models, cap max output tokens, cache repeated system prompts, trim verbose context, and batch non-urgent jobs. Combining model tiering with output limits can cut spend by 50 to 70%, depending on the workload.
What is token pricing?
Token pricing is the per-million-token rate a provider charges for API usage. Input and output are priced separately. Rates vary by model tier, with nano and flash models costing far less than flagship reasoning models.
How much does GPT-4o cost?
GPT-4o costs $2.50 per million input tokens and $10.00 per million output tokens as of June 2026. A request with 1,000 input and 500 output tokens costs about $0.0075 before volume discounts.
How much does Claude cost?
Claude Sonnet costs $3.00 per million input tokens and $15.00 per million output tokens. Claude Opus costs $15.00 input and $75.00 output per million. Sonnet is the default production choice for most apps.
Should I use GPT, Claude, or Gemini?
Choose GPT for the broadest ecosystem and tool support. Claude excels at long documents and coding with a 200K context window. Gemini Flash is the best value for high-volume tasks with a 1M context window. Compare all three in the table above.