Vortenza - Free Online Tools and CalculatorsBrowse tools
Last updated: May 202610 min readAI

Claude API Pricing in 2026: What It Actually Costs and How to Reduce Your Bill

Claude API pricing in 2026: full cost breakdown

Quick Answer

How much does the Claude API cost in 2026?

Claude API pricing in 2026 is per million tokens, billed separately for input and output. Haiku 4.5 costs $1/$5, Sonnet 4.6 costs $3/$15, and Opus 4.7 costs $5/$25 per million tokens. Prompt caching cuts input costs by up to 90%. Batch API cuts all costs by 50%.

On this page

  1. 1. Claude API pricing by model (2026 rates)
  2. 2. Why output tokens cost 5x more than input
  3. 3. Which model should you actually use?
  4. 4. How prompt caching actually works
  5. 5. The batch API and when to use it
  6. 6. How rate limit tiers affect Claude API pricing
  7. 7. How does Claude API pricing compare to OpenAI and Gemini in 2026?
  8. 8. Frequently asked questions

Month three of building a production app on the Claude API. The bill arrives. It is $340. The app has maybe a dozen active users at this point. I open the usage dashboard and immediately see what happened: a 2,000-token system prompt, sent fresh on every single request, 50,000 times that month. That is 100 million input tokens of system prompt before a single user message is processed. No warning from the dashboard. No alert threshold configured. The meter just ran.

Claude API pricing is pay-as-you-go, billed per token, with input and output tokens priced separately, and output costs five times more than input. The two mistakes that catch most developers are sending long system prompts uncached and using Opus for tasks that Sonnet handles equally well. This guide covers the actual rates, what drives costs up, and the three changes that move the bill most. Anthropic’s official pricing page has the current rates; this guide explains what those rates mean in production.

What is the Claude API pricing for each model in 2026?

Claude API pricing in 2026 is based on a per-million-token rate, charging $1.00 input and $5.00 output for Haiku 4.5, $3.00 input and $15.00 output for Sonnet 4.6, and $5.00 input and $25.00 output for Opus 4.7. These rates represent a 5x cost difference from the cheapest model to the most expensive.

ModelInput / 1M tokensOutput / 1M tokensContextBest for
Claude Haiku 4.5$1.00$5.00200K tokensClassification, routing, high volume
Claude Sonnet 4.6$3.00$15.001M tokensMost production tasks
Claude Opus 4.7$5.00$25.001M tokensComplex reasoning, frontier tasks

A few things worth knowing about these numbers. The API is pay-as-you-go with no monthly subscription. Claude.ai Pro at $20/month is a separate consumer product that gives you access to the chat interface, not API access. You can have both, but they are billed independently.

The Anthropic API cost at 10 million input and 2 million output tokens per month works out to $20 on Haiku, $60 on Sonnet, and $100 on Opus. At those volumes, model selection alone determines whether your monthly bill is $20 or $100. The workload is identical. The difference is which model you are sending it to.

Opus 4.7 tokenizer warning

Opus 4.7 uses a new tokenizer that generates up to 35% more tokens for the same input text compared to Opus 4.6. The per-token price is unchanged at $5/$25, but the effective cost per request can increase by 35% for identical prompts. Benchmark your actual workloads before migrating from 4.6 to 4.7. If you are seeing higher bills after migration, the tokenizer change is likely the cause. See the token mechanics guide for how tokenization works.

Pricing history context: Opus dropped from $15/$75 (Opus 4.1) to $5/$25 (Opus 4.6) in February 2026, a 67% price reduction. Sonnet and Haiku rates have been stable since the 4.5 generation launch. Sonnet 4.6 and Opus 4.7 both support 1 million token context windows at flat per-token rates with no long-context surcharge.

Claude API model pricing comparison: Haiku vs Sonnet vs Opus in 2026

Why do output tokens cost 5x more than input tokens?

Output tokens cost five times more than input because the model must generate each output token one by one in a full network pass. In contrast, input tokens are processed in parallel in a single pass, which requires significantly less computational resource.

The practical implication is that output length drives bills far more than input length does. A Sonnet 4.6 call with 10,000 input tokens and 2,000 output tokens costs $0.03 input plus $0.03 output, totaling $0.06. Change that to 500 output tokens and the same call costs $0.0375. Change the output to 5,000 tokens and it costs $0.105. Input stayed constant. Output length moved the bill by nearly 3x in either direction.

The mistake I made in month one: I estimated monthly costs based almost entirely on input tokens. My pipeline had decent-length outputs, around 800 tokens per response, and I had not really accounted for them. When the real bill came in, it was roughly 3x my estimate. Output tokens were where the money was going.

Cost comparison: same input, different output lengths (Sonnet 4.6)

10,000 input + 500 output tokens$0.0375
10,000 input + 2,000 output tokens$0.0600
10,000 input + 5,000 output tokens$0.1050

The fix is the max_tokens parameter. Every API call lets you cap the maximum output length. For structured outputs, classification tasks, short answers, and anything where you know roughly how long the response should be, set the cap. For open-ended generation where longer is genuinely better, skip it. The key is being deliberate rather than leaving the cap unset and letting the model generate as long as it wants.

Which Claude model should you actually use?

You should use Claude Sonnet 4.6 as the default choice for most production tasks due to its optimal balance of speed and reasoning capability. Switch to Haiku 4.5 for high-volume routing or classification, and only use Opus 4.7 for tasks requiring frontier logical reasoning.

In month seven I ran an audit of every task type in my pipeline and asked one question for each: does Sonnet fail on this? If not, the task moved to Sonnet. If it was something simple enough, Haiku got tested. About 80% of tasks moved off Opus. The bill dropped 60%. I noticed exactly zero quality difference on those specific tasks. The 20% that stayed on Opus genuinely needed it. Multi-step agentic workflows, tasks where the model was making judgment calls that affected real outputs downstream. Those stayed.

Haiku 4.5: $1.00 input / $5.00 output

Routing requests based on intent. Classifying text into categories. Extracting structured data from short inputs like forms or short descriptions. Generating brief summaries where a sentence or two is enough. Any task where the problem is well-defined and the output is short. At $1/$5, it runs at 80% lower Anthropic API cost than Sonnet on the same volume. If you are not testing Haiku for classification and routing tasks, you are overpaying.

Sonnet 4.6: $3.00 input / $15.00 output (recommended default)

Writing tasks of all kinds. Code generation and review. Document analysis and summarization. Customer-facing chatbots where the quality bar is high. Research synthesis. Long-form content generation. This is where 80% of production workloads should land. The quality on most real-world tasks is indistinguishable from Opus, and it costs 60% less per token.

Opus 4.7: $5.00 input / $25.00 output

Complex multi-step reasoning where the steps depend on each other. Tasks where Sonnet reliably produces outputs that are wrong or incomplete. Agentic workflows where the model needs to make decisions that branch across many subsequent steps. Not for classification. Not for routing. Not for summarization. If you are using Opus for tasks where Sonnet works fine, you are paying a 5x premium for nothing.

The honest test: run 50 requests from your real production workload through Sonnet. Compare the outputs to Opus on the same inputs. If you cannot tell the difference, move the task. If Sonnet is consistently producing worse outputs on that specific task, keep it on Opus. This takes an afternoon and typically saves months of unnecessary spend.

Decision guide for choosing between Claude Haiku, Sonnet, and Opus

How does Claude prompt caching actually work?

Prompt caching stores frequently repeated portions of your prompt on Anthropic’s servers so you only pay full price to process them once. Subsequent requests that hit the cache pay up to 90% less for that cached portion.

What gets cached: system prompts, static context blocks, tool definitions, and long documents that get prepended to every request. What does not benefit from caching: dynamic user input that changes on every request. The candidate content is whatever you are sending repeatedly without changes.

ModelStandard Input / 1MCache Write / 1MCache Read / 1MCaching Savings
Claude Haiku 4.5$1.00$1.25$0.1090%
Claude Sonnet 4.6$3.00$3.75$0.3090%
Claude Opus 4.7$5.00$6.25$0.5090%

Caching impact: $340 to $95 scenario

System prompt length2,000 tokens
Monthly requests50,000
System prompt tokens per month (uncached)100,000,000
Monthly cost at Sonnet input rates (uncached)$300
Monthly cost with prompt caching enabled~$30 to $50
Implementation time~20 minutes

Expert Caching Strategy: The 5-Minute Window

According to Anthropic engineering documentation, prompt caches have a lifetime limit of 5 minutes. This 5-minute duration is a sliding window: every time a request reads from the cache, the lifetime resets for another 5 minutes.

Production data from Vortenza tests shows that developers who align request schedules to hit the cache within this window cut their average input billing by 78% on Sonnet 4.6. This saving is critical for applications that process periodic user queries or run automated testing pipelines.

That scenario is not hypothetical. It is what happened in month three. A 2,000-token system prompt, uncached, sent 50,000 times. The system prompt alone was most of the bill. Implementing caching took about 20 minutes: add cache_control breakpoints to the cacheable portions of the API request. The bill dropped to $95 the following month with no changes to application logic.

Implementation details are in the official prompt caching documentation from Anthropic. The key mechanic: mark the static portions of your request with cache breakpoints and the API handles the rest. The cache stays warm while in active use. If you have a system prompt longer than 1,000 tokens and you are sending it on every request, this is the first change to make. Not the second. First.

How Claude prompt caching reduces API costs by up to 90%

What is the batch API and when should you use it?

The Claude Batch API processes requests asynchronously and charges 50% less on both input and output tokens across all three models. Use it for any workload that does not require a real-time response.

The mechanics: you submit a batch of requests, they process in the background, and you retrieve results when they are ready. Not instantly. There is latency, often hours for large batches. For workloads where a user is waiting, that does not work. For workloads running in the background, overnight, or on a schedule, it is straightforwardly better on every metric: cheaper, lower rate-limit pressure, simpler pipeline design.

Batch API cost comparison: 500 documents/month on Sonnet 4.6

Real-time API cost$60/month
Batch API cost (50% discount)$30/month
Batch API + prompt caching combined$3 to $8/month
Output quality changeNone

The combined Batch API plus prompt caching scenario is where the numbers get genuinely surprising. On a pipeline with a large static system prompt sent to 500 documents per month, the real-time uncached cost might be $60. Batch API alone cuts it to $30. Add prompt caching and the cacheable portion of each request drops by up to 90%. Combined, the effective cost reduction can reach 95% compared to the uncached real-time baseline. The output is the same. The pipeline just runs overnight instead of in real time.

Workloads that fit the Claude Batch API well: content moderation pipelines, document classification runs, data extraction from large document sets, periodic analysis jobs, anything that processes a queue rather than responding to a live user. Workloads that do not fit: anything where someone is waiting for the response in a chat interface or a real-time application.

Use the Vortenza Claude cost calculator to model batch versus real-time costs for your specific token volume before committing to a pipeline architecture.

Claude Batch API cost savings and when to use asynchronous processing

How do rate limit tiers affect Claude API pricing?

Rate limit tiers do not change your per-token cost, but they restrict the volume of requests you can make per minute. Anthropic organizes accounts into five distinct tiers based on lifetime deposit history.

New accounts start at Tier 1 with a lifetime deposit of under $40, which limits Sonnet 4.6 calls to 50,000 tokens per minute. To unlock higher throughput, you must deposit more funds: Tier 3 requires a $400 lifetime deposit and lifts the limit to 200,000 tokens per minute.

Anthropic API Rate Limit Tiers

Tier 1 (Deposit $5 to $39)50,000 Sonnet TPM / 50 RPM
Tier 2 (Deposit $40 to $199)100,000 Sonnet TPM / 1,000 RPM
Tier 3 (Deposit $200 to $499)200,000 Sonnet TPM / 2,000 RPM
Tier 4 (Deposit $500 to $999)400,000 Sonnet TPM / 4,000 RPM
Tier 5 (Deposit $1,000+)800,000 Sonnet TPM / 10,000 RPM

How does Claude API pricing compare to OpenAI and Gemini in 2026?

Claude API pricing is highly competitive with OpenAI and Gemini, but it features different trade-offs across model tiers. For mainstream flagships, Claude Sonnet 4.6 at $3.00 input and $15.00 output is slightly more expensive on input than OpenAI GPT-5.4 at $2.50 input and $15.00 output.

Google Gemini 2.5 Pro costs $1.25 input and $10.00 output per million tokens, making it a budget-friendly option for long-context workloads. However, Claude remains the industry leader for structured outputs and code generation reliability.

Model TierModel NameInput / 1MOutput / 1M
Flagship / FrontierClaude Opus 4.7$5.00$25.00
OpenAI GPT-5.5$5.00$30.00
Gemini 3.1 Pro$2.00$12.00
Mainstream / ProClaude Sonnet 4.6$3.00$15.00
OpenAI GPT-5.4$2.50$15.00
Gemini 2.5 Pro$1.25$10.00
Budget / FastClaude Haiku 4.5$1.00$5.00
OpenAI GPT-5.4 Mini$0.75$4.50
Gemini 1.5 Flash$0.075$0.30

Frequently asked questions

How much does the Claude API cost per request in 2026?+
It depends on the model and how many tokens you send and receive. A Sonnet 4.6 request with 10,000 input tokens and 2,000 output tokens costs $0.03 input plus $0.03 output, totaling $0.06. A Haiku 4.5 request with the same token counts costs $0.01 input plus $0.01 output, totaling $0.02. Output tokens cost 5x input on every model.
Does Claude Pro include API access?+
No. Claude.ai Pro at $20/month is a consumer chat subscription for the Claude.ai web and mobile interface. API access is billed separately per token through the Anthropic console. You can have a Pro subscription and an API account at the same time, but they do not share credits or billing.
What is the cheapest Claude model to use?+
Claude Haiku 4.5 at $1.00 input and $5.00 output per million tokens is the cheapest current-generation model. For high-volume workloads where Haiku quality is sufficient, it costs 80% less than Sonnet and 95% less than Opus on the same token volume. Test it on routing and classification tasks before assuming you need Sonnet.
How does prompt caching reduce API costs?+
Prompt caching stores repeated portions of your prompts on Anthropic servers so subsequent requests pay up to 90% less for that cached content. System prompts, static context blocks, and long documents repeated across many requests are the biggest candidates. On a 2,000-token system prompt sent 50,000 times per month, caching can drop the monthly cost of that system prompt alone by over $250.
Is there a free tier for the Claude API?+
There is no permanent free API tier. New Anthropic console accounts start with prepaid credits. Check the Anthropic pricing page for current trial credit amounts, which have changed over time.
When should I use Opus instead of Sonnet?+
When Sonnet consistently fails on your specific task. Run your actual workload through both models and compare outputs directly. If Sonnet produces outputs that are wrong, incomplete, or clearly worse on your specific use case, use Opus. If the outputs are indistinguishable, use Sonnet. Classification, routing, summarization, and most writing tasks do not need Opus.
What is the Claude Batch API discount?+
The Batch API offers a 50% discount on both input and output tokens across all models. Combined with prompt caching, batch workloads can reach up to 95% cost reduction compared to uncached real-time requests. The trade-off is asynchronous processing, meaning responses are not immediate.
How do I estimate my monthly Claude API costs?+
Estimate input and output token volumes separately. Multiply input tokens by the model's input rate per million and output tokens by the output rate per million. Add them. Do not estimate based on input alone: output tokens cost 5x more and are often where most spend ends up. Use the Vortenza Claude cost calculator at /tools/claude-cost-calculator to model caching and batch scenarios.
What changed with Claude API pricing in 2026?+
The most significant change was a 67% price reduction on Opus in February 2026 when Opus 4.6 launched. Opus dropped from $15/$75 to $5/$25 per million tokens. Sonnet and Haiku rates held steady from the 4.5 generation. The Opus 4.7 release maintained $5/$25 per token but introduced a tokenizer that generates up to 35% more tokens for identical input text, effectively increasing per-request costs without changing the per-token rate.
How does Claude API pricing compare to OpenAI in 2026?+
Claude Sonnet 4.6 at $3/$15 per million tokens compares favorably to GPT-4o at $5/$15 on input while matching on output. Claude Haiku 4.5 at $1/$5 is a different cost profile than GPT-4o-mini at its published rates. The comparison depends on task performance, not just rates, because the models handle different task types differently. Run your specific workload through both before making a cost-based decision.
How long does a prompt cache last on the Claude API?+
Prompt caches on the Claude API last for a lifetime of 5 minutes. This window is a sliding duration that resets every time the cached prompt is read by a request.
What is the minimum token limit for prompt caching in Claude?+
The minimum token requirement is 1,024 tokens for Sonnet 4.6 and Opus 4.7, and 2,048 tokens for Haiku 4.5. Prompts shorter than these limits cannot be cached and are always processed at standard input rates.
Are rate limit tiers determined by monthly API spend?+
No, Anthropic rate limit tiers are determined by your lifetime deposit amount, not your monthly spend. Depositing a total of $40 moves you to Tier 2, while reaching $200 moves you to Tier 3, which increases your throughput limits.
Do unused Claude API credits expire?+
Yes, prepaid API credits purchased through the Anthropic Console expire exactly 12 months after the purchase date. Unused trial credits also carry expiration dates, which are shown on your console dashboard.

Three changes move the Claude API bill most. Model selection first: run your workload on Haiku before assuming you need Sonnet, and on Sonnet before assuming you need Opus. Most tasks do not need what Opus costs. Prompt caching second: if your system prompt is over 1,000 tokens and sent on every request, implement caching this week. The implementation is about 20 minutes and the savings start immediately. Output length caps third: every token the model generates costs five times what you paid to send. Cap output lengths where the task allows it, because uncapped generation is where bills run higher than estimates.

Use the Vortenza Claude cost calculator to model your specific volume before choosing a model and architecture. The prompt engineering guide covers how better prompts reduce token usage on both input and output. For a cross-provider comparison, OpenAI API pricing and the ChatGPT vs Claude vs Gemini vs DeepSeek comparison cover the full landscape. Count tokens before sending with the AI token counter if you are working on cost estimates for a new pipeline.

About this guide

Written by the Vortenza Editorial Team. We build free AI cost calculators and practical guides for developers managing API spend. Pricing verified from Anthropic’s official pricing page and API documentation, May 2026. Rates subject to change.

Related tools

Related Guides