LLM Cost Calculator

Compare GPT, Claude, Gemini, DeepSeek, and Mistral API costs. Updated June 2026.

Estimate monthly and annual LLM API spend across 12 models. Enter your daily requests and token counts to see which provider is cheapest.

Number of requests per day

Days per month

Average input tokens per request

Average output tokens per request

Results model

Estimated monthly cost (GPT-4o)

$22.50

Monthly requests

3,000

Monthly input tokens

3,000,000

Monthly output tokens

1,500,000

Annual cost

$270.00

Cost per request

$0.00750

Input: $7.50 | Output: $15.00

Model comparison (sorted by monthly cost)

Provider	Model	Input cost	Output cost	Monthly cost	Annual cost
Google	Gemini Flash (cheapest)	$0.2250	$0.4500	$0.6750	$8.10
OpenAI	GPT-4.1 Nano	$0.3000	$0.6000	$0.9000	$10.80
DeepSeek	DeepSeek Reasoner	$0.4200	$0.8250	$1.25	$14.94
Mistral	Mistral Small	$0.6000	$0.9000	$1.50	$18.00
DeepSeek	DeepSeek Chat	$0.8100	$1.65	$2.46	$29.52
OpenAI	GPT-4.1 Mini	$1.20	$2.40	$3.60	$43.20
Google	Gemini Pro	$3.75	$7.50	$11.25	$135.00
Mistral	Mistral Large	$6.00	$9.00	$15.00	$180.00
OpenAI	GPT-4.1	$6.00	$12.00	$18.00	$216.00
OpenAI	GPT-4o	$7.50	$15.00	$22.50	$270.00
Anthropic	Claude Sonnet	$9.00	$22.50	$31.50	$378.00
Anthropic	Claude Opus	$45.00	$112.50	$157.50	$1,890.00

[Chart: monthly cost comparison]

[Chart: annual cost comparison]

Cost breakdown (input vs output)

GPT-4o monthly split

Input: $7.50 (33.3%)

Output: $15.00 (66.7%)

Reference pricing (per 1M tokens)

Model	Provider	Input	Output	Context
GPT-4o	OpenAI	$2.500	$10.00	128K
GPT-4.1	OpenAI	$2.000	$8.00	128K
GPT-4.1 Mini	OpenAI	$0.400	$1.60	128K
GPT-4.1 Nano	OpenAI	$0.100	$0.40	128K
Claude Sonnet	Anthropic	$3.000	$15.00	200K
Claude Opus	Anthropic	$15.000	$75.00	200K
Gemini Flash	Google	$0.075	$0.30	1M
Gemini Pro	Google	$1.250	$5.00	1M
DeepSeek Chat	DeepSeek	$0.270	$1.10	64K
DeepSeek Reasoner	DeepSeek	$0.140	$0.55	64K
Mistral Large	Mistral	$2.000	$6.00	128K
Mistral Small	Mistral	$0.200	$0.60	128K

Pricing reflects published API rates as of June 2026. Verify current rates on each provider site before budgeting.

About the LLM cost calculator

LLM API pricing is billed per token, with separate rates for input and output. This calculator models your real usage pattern: daily requests, average tokens per call, and days per month. It then applies published June 2026 rates for GPT-4o, GPT-4.1, Claude, Gemini, DeepSeek, and Mistral so you can compare monthly and annual spend before you commit to a provider.

Use it to answer common budget questions: how much will my AI app cost, which model is cheapest at my volume, and what is the cost per request or per user. The comparison table sorts all models by monthly cost and highlights the cheapest option automatically.
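The sort-and-highlight step can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: `PRICES` holds a four-model subset of the reference pricing table, and `monthly_cost` and `rank_models` are hypothetical names. The workload of 100 requests per day over 30 days is an assumption chosen to match the 3,000 monthly requests shown in the results.

```python
# Per-1M-token rates (input, output), copied from the reference pricing table.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet": (3.00, 15.00),
    "Gemini Flash": (0.075, 0.30),
    "GPT-4.1 Nano": (0.10, 0.40),
}


def monthly_cost(requests_per_day, days_per_month, input_tokens, output_tokens, rates):
    """Monthly USD cost for one model, given average tokens per request."""
    input_rate, output_rate = rates
    requests = requests_per_day * days_per_month
    per_request = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return requests * per_request


def rank_models(requests_per_day, days_per_month, input_tokens, output_tokens):
    """Return (model, monthly cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: monthly_cost(requests_per_day, days_per_month,
                           input_tokens, output_tokens, rates)
        for name, rates in PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


ranking = rank_models(100, 30, 1000, 500)
cheapest_model = ranking[0][0]
for name, cost in ranking:
    label = " (cheapest)" if name == cheapest_model else ""
    print(f"{name:<14} ${cost:,.4f}{label}")
```

With these inputs, Gemini Flash comes first and Claude Sonnet last, matching the order of those four models in the comparison table.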

How it works

Enter daily usage

Set requests per day, days per month, and average input and output tokens.

Compare all models

See costs for OpenAI, Anthropic, Google, DeepSeek, and Mistral side by side.

Review charts

Monthly, annual, and input vs output breakdown charts update instantly.

Share results

Copy, download CSV, or share a link with your team.

Features

12-model database

GPT-4o, GPT-4.1, Claude, Gemini, DeepSeek, and Mistral with June 2026 rates.

Multi-provider comparison

Sort all models by monthly cost and highlight the cheapest automatically.

Advanced usage mode

Estimate cost per user from requests per user and a monthly growth percentage.

Cost charts

Monthly, annual, and input vs output breakdown visualizations.

Export and share

Copy results, download CSV, or share a prefilled link.

Free to use

No account or signup required.

What is LLM pricing?

LLM pricing is the per-token rate providers charge when you call their API. Unlike flat SaaS subscriptions, API billing scales with usage. Every prompt you send and every response you receive consumes tokens, and those tokens are multiplied by model-specific rates.

Flagship models like GPT-4o and Claude Opus cost more because they use larger architectures and deliver higher quality on complex tasks. Nano, mini, and flash tiers trade some capability for dramatically lower per-token rates, which makes them ideal for classification, extraction, and high-volume chatbots.

How token billing works

Providers tokenize your text into subword units before processing. You are charged for every token in your prompt (input) and every token the model generates (output). System prompts, conversation history, and retrieved context all count toward input tokens.

Billing is calculated as (input tokens / 1,000,000) × input price + (output tokens / 1,000,000) × output price. There is no per-request flat fee on standard API tiers. A single long request with a large context window can cost more than many short requests combined.
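As a worked instance of the billing formula, here is a minimal Python sketch. The function name `token_cost` is hypothetical, and the rates are the GPT-4o figures from the reference pricing table.

```python
def token_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """USD cost for a number of tokens at per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Monthly totals from the results panel: 3,000,000 input and 1,500,000 output tokens.
monthly = token_cost(3_000_000, 1_500_000, 2.50, 10.00)
print(f"${monthly:.2f}")  # $22.50

# The same formula applied to a single request of 1,000 input and 500 output tokens.
per_request = token_cost(1_000, 500, 2.50, 10.00)
```

The monthly figure matches the $22.50 shown for GPT-4o, and the per-request figure comes out at roughly $0.0075.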

Input vs output tokens

Input tokens are everything you send to the model: user messages, system instructions, tool results, and RAG context. Output tokens are the model's completion. Output is priced higher on nearly every model because generation requires sequential compute for each token produced.

Apps that generate long responses (summaries, code, reports) skew toward output cost. Apps that send large context but expect short answers (classification, routing) skew toward input cost. The breakdown chart in this calculator shows which side drives your bill for the selected model.
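The input versus output split can be computed directly from token counts and rates. The sketch below is illustrative only: `cost_split` is a hypothetical helper, the rates are the GPT-4o reference rates, and the classification workload of 4,000 input and 20 output tokens is an assumed example.

```python
def cost_split(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return (input share, output share) of total cost as fractions of 1."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    total = input_cost + output_cost
    if total == 0:
        return 0.0, 0.0  # nothing billed, so there is no split to report
    return input_cost / total, output_cost / total


# Chat-style request: 1,000 input and 500 output tokens at GPT-4o rates.
chat_in, chat_out = cost_split(1_000, 500, 2.50, 10.00)
print(f"Input {chat_in:.1%}, output {chat_out:.1%}")  # Input 33.3%, output 66.7%

# Assumed classification request: large context, very short answer.
cls_in, cls_out = cost_split(4_000, 20, 2.50, 10.00)
```

In the second case input accounts for roughly 98% of the cost, so trimming context matters more there than capping output.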

How to estimate AI costs

Start with realistic averages: count tokens in a typical request using the AI token counter, multiply by expected daily requests, and multiply again by working days per month. Add a 10 to 20% buffer for retries, tool calls, and context growth.
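The estimate above reduces to one multiplication plus a buffer. A minimal sketch follows, where `estimate_monthly_tokens` and its `buffer_percent` parameter are hypothetical names and the 15% default is simply the midpoint of the suggested 10 to 20% range.

```python
def estimate_monthly_tokens(tokens_per_request, requests_per_day, days_per_month,
                            buffer_percent=15):
    """Monthly token volume with a percentage buffer for retries and context growth."""
    base = tokens_per_request * requests_per_day * days_per_month
    # Integer arithmetic keeps the result exact for whole-number inputs.
    return base * (100 + buffer_percent) // 100


# Assumed workload: 1,000 input tokens per request, 100 requests a day, 30 days.
buffered_input_tokens = estimate_monthly_tokens(1_000, 100, 30)
print(f"{buffered_input_tokens:,}")  # 3,450,000
```

Run it once for input tokens and once for output tokens, then price each total at the model's per-million rates.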

Run the estimate on both your primary model and the cheapest model in the comparison table. The gap between them is your potential savings from model tiering. For multi-model apps, estimate each endpoint separately and sum the results.
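To make the tiering gap concrete, the sketch below blends two monthly estimates by the share of traffic routed to the cheaper model. The helper `blended_monthly_cost` and the 60% routing share are assumptions for illustration. The two monthly figures are the GPT-4o and Gemini Flash rows of the comparison table.

```python
def blended_monthly_cost(primary_cost, cheap_cost, cheap_share):
    """Monthly cost when `cheap_share` (0 to 1) of traffic goes to the cheaper model."""
    if not 0 <= cheap_share <= 1:
        raise ValueError("cheap_share must be between 0 and 1")
    return primary_cost * (1 - cheap_share) + cheap_cost * cheap_share


primary = 22.50  # GPT-4o monthly cost from the comparison table
cheapest = 0.675  # Gemini Flash monthly cost from the comparison table

max_savings = primary - cheapest  # upper bound: all traffic moved to the cheap model
blended = blended_monthly_cost(primary, cheapest, cheap_share=0.60)
realized_savings = primary - blended

print(f"Maximum savings: ${max_savings:.3f}")
print(f"Blended cost at 60% routing: ${blended:.3f}")
print(f"Realized savings: ${realized_savings:.3f}")
```

The blended figure assumes both tiers see the same average token counts per request, which rarely holds exactly. For better accuracy, estimate each endpoint with its own token averages.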

How to reduce API spending

The fastest savings come from model selection: move routing, classification, and FAQ tasks to GPT-4.1 Nano or Gemini Flash. Cap max_tokens on output-heavy endpoints. Cache identical system prompts where your provider supports it.

Trim RAG context to only the chunks you need. Batch overnight jobs through provider batch APIs where available; some providers discount batch traffic by about 50%. Monitor cost per request weekly and set alerts before you hit budget limits.

GPT vs Claude vs Gemini cost comparison

At typical chat volumes, Gemini Flash is often the lowest monthly cost thanks to $0.075 input and $0.30 output per million tokens. GPT-4o sits in the mid tier at $2.50 / $10.00 and offers strong general performance. Claude Sonnet at $3.00 / $15.00 costs about 40% more than GPT-4o at the example workload ($31.50 vs $22.50 per month) but excels at coding and long-context work with its 200K-token context window.

DeepSeek Chat and Reasoner undercut GPT-4o and Claude Sonnet on price, though not Gemini Flash, and suit developers comfortable with their data policies. For a detailed provider breakdown, see the LLM cost comparison guide. Provider-specific calculators: OpenAI, Claude.

Best practices for AI budget planning

Set a monthly API budget before launch and track cost per active user from day one. Separate development and production keys so test traffic does not inflate forecasts. Re-run this calculator when you change models, add features, or onboard new user segments.

Plan for 2x to 3x growth in the first quarter if your product gains traction. Use the advanced growth field to stress-test higher volume. Keep a fallback model configured so you can downgrade quickly if costs spike unexpectedly.
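A growth stress test can be sketched as compound monthly growth applied to a starting bill. This is an assumption about how such a projection might work, not a description of the calculator's advanced growth field. `project_monthly_costs` is a hypothetical name, and the 50% monthly growth rate is an illustrative value.

```python
def project_monthly_costs(base_cost, monthly_growth, months):
    """Projected cost per month, compounding `monthly_growth` (0.5 means 50%)."""
    return [base_cost * (1 + monthly_growth) ** m for m in range(months)]


# Start from the $22.50 GPT-4o estimate and grow usage 50% per month.
projection = project_monthly_costs(22.50, 0.50, 4)
for month, cost in enumerate(projection, start=1):
    print(f"Month {month}: ${cost:,.2f}")

growth_multiple = projection[-1] / projection[0]  # 1.5 ** 3 = 3.375
```

At 50% monthly growth, the fourth month's bill is about 3.4 times the first, slightly above the 2x to 3x range, which makes it a reasonable upper bound for a first-quarter stress test.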

Frequently asked questions about LLM API pricing

What is an LLM cost calculator?

An LLM cost calculator estimates how much you will spend on large language model API calls based on your usage volume, average tokens per request, and model pricing. It helps you compare providers before you scale an AI product.

How are AI API costs calculated?

Providers bill by the token. You multiply your monthly input tokens by the input price per million, add output tokens times the output price per million, and sum the two. This calculator applies that formula across 12 major models automatically.

Why do output tokens cost more?

Generating text requires more compute than reading it. Models run autoregressive decoding for every output token, so the models in the reference pricing table price output at 3x to 5x the input rate.

How many tokens equal one word?

English text averages about 1.3 tokens per word, though code and JSON use more. A 500-word prompt is roughly 650 input tokens. Use the AI token counter for precise counts on your actual text.
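For a quick estimate before running a real tokenizer, the 1.3 tokens-per-word average can be applied to a whitespace word count. The helper `estimate_tokens` is hypothetical, and the result is only an approximation: real tokenizers vary by model and produce more tokens for code and JSON.

```python
def estimate_tokens(text, tokens_per_word=1.3):
    """Rough token estimate from a whitespace word count."""
    word_count = len(text.split())
    return round(word_count * tokens_per_word)


prompt = "word " * 500  # stand-in for a 500-word prompt
estimated = estimate_tokens(prompt)
print(estimated)  # 650
```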

Which AI model is cheapest?

For high-volume workloads, Gemini Flash and GPT-4.1 Nano are typically the lowest cost per request. DeepSeek Chat and Reasoner also rank among the cheapest for developers. The cheapest model depends on your input-to-output ratio.

How can I reduce LLM costs?

Route simple tasks to smaller models, cap max output tokens, cache repeated system prompts, trim verbose context, and batch non-urgent jobs. Most teams cut spend 50 to 70% by combining model tiering with output limits.

What is token pricing?

Token pricing is the per-million-token rate a provider charges for API usage. Input and output are priced separately. Rates vary by model tier, with nano and flash models costing far less than flagship reasoning models.

How much does GPT-4o cost?

GPT-4o costs $2.50 per million input tokens and $10.00 per million output tokens as of June 2026. A request with 1,000 input and 500 output tokens costs about $0.0075 before volume discounts.

How much does Claude cost?

Claude Sonnet costs $3.00 per million input tokens and $15.00 per million output tokens. Claude Opus costs $15.00 input and $75.00 output per million. Sonnet is the default production choice for most apps.

Should I use GPT, Claude, or Gemini?

Choose GPT for the broadest ecosystem and tool support. Claude excels at long documents and coding with a 200K context window. Gemini Flash is the best value for high-volume tasks with a 1M context window. Compare all three in the table above.