What an LLM API actually costs

The same model can cost 5x more depending on where you run it. Enter your monthly token volume and see every model ranked by your real bill, across 18 providers.

77 model and provider rates, per 1M tokens, standard non-cached. Checked June 2026.

Cheapest for this workload

Model	Weights	Provider	$/1M in	$/1M out	Ctx	Your $/mo
gpt-oss-20B	open	OpenRouter	$0.029	$0.14	131k	$3.42
Llama 3.1 8B	open	Groq	$0.050	$0.080	128k	$3.96
gpt-oss-120B	open	OpenRouter	$0.039	$0.18	131k	$4.50
gpt-oss-20B	open	Together AI	$0.050	$0.20	128k	$5.40
Qwen3 235B	open	DeepInfra	$0.071	$0.10	256k	$5.46
Qwen3 235B	open	OpenRouter	$0.071	$0.10	262k	$5.46
Mistral Small 3.2 24B	open	DeepInfra	$0.075	$0.20	125k	$6.90
gpt-oss-20B	open	Fireworks AI	$0.070	$0.30	-	$7.80
gpt-oss-20B	open	Groq	$0.075	$0.30	128k	$8.10
Llama 4 Scout	open	DeepInfra	$0.080	$0.30	320k	$8.40
Llama 4 Scout	open	OpenRouter	$0.080	$0.30	10000k	$8.40
DeepSeek V4-Flash	open	DeepInfra	$0.10	$0.20	1024k	$8.40
Llama 3.3 70B	open	DeepInfra	$0.10	$0.32	128k	$9.84
Llama 4 Scout	open	Groq	$0.11	$0.34	128k	$10.7
Ministral 3 8B	open	Mistral (1st-party)	$0.15	$0.15	128k	$10.8
DeepSeek V4-Flash	open	DeepSeek (1st-party)	$0.14	$0.28	1000k	$11.8
DeepSeek V4-Flash	open	Fireworks AI	$0.14	$0.28	-	$11.8
Mistral Small 4	open	Mistral (1st-party)	$0.15	$0.60	128k	$16.2
gpt-oss-120B	open	Together AI	$0.15	$0.60	128k	$16.2
gpt-oss-120B	open	Fireworks AI	$0.15	$0.60	-	$16.2
gpt-oss-120B	open	Groq	$0.15	$0.60	128k	$16.2
Llama 4 Maverick	open	DeepInfra	$0.15	$0.60	1024k	$16.2
Llama 4 Maverick	open	OpenRouter	$0.15	$0.60	1000k	$16.2
Llama 4 Scout	open	Together AI	$0.18	$0.59	1000k	$17.9
Qwen3 235B	open	Together AI	$0.20	$0.60	262k	$19.2
DeepSeek V3	open	OpenRouter	$0.20	$0.80	131k	$21.6
Qwen3 32B	open	Groq	$0.29	$0.59	131k	$24.5
Llama 4 Maverick	open	AWS Bedrock	$0.24	$0.97	1000k	$26.0
Llama 4 Maverick	open	Together AI	$0.27	$0.85	1048k	$26.4
GPT-5.4 nano	closed	OpenAI (1st-party)	$0.20	$1.25	400k	$27.0
Codestral	closed	Mistral (1st-party)	$0.30	$0.90	256k	$28.8
Qwen3-Coder 480B	open	DeepInfra	$0.30	$1	256k	$30.0
gpt-oss-120B	open	Cerebras	$0.35	$0.75	131k	$30.0
MiniMax M3	open	MiniMax (1st-party)	$0.30	$1.20	1000k	$32.4
Gemini 3.1 Flash-Lite	closed	Google (1st-party)	$0.25	$1.50	-	$33.0
Qwen3-Coder Flash	open	Alibaba (1st-party)	$0.30	$1.50	1000k	$36.0
DeepSeek V4-Pro	open	DeepSeek (1st-party)	$0.43	$0.87	1000k	$36.5
Llama 3.3 70B	open	Groq	$0.59	$0.79	128k	$44.9
Mistral Large 3	open	Mistral (1st-party)	$0.50	$1.50	128k	$48.0
Mistral Large 3	open	AWS Bedrock	$0.50	$1.50	-	$48.0
Qwen3 235B	open	Cerebras	$0.60	$1.20	131k	$50.4
Llama 3.3 70B	open	AWS Bedrock	$0.72	$0.72	128k	$51.8
DeepSeek-R1	open	DeepInfra	$0.50	$2.15	160k	$55.8
Kimi K2	open	OpenRouter	$0.57	$2.30	131k	$61.8
DeepSeek-R1	open	OpenRouter	$0.70	$2.50	164k	$72.0
Llama 3.3 70B	open	Together AI	$1.04	$1.04	131k	$74.9
Grok Build 0.1	closed	xAI (1st-party)	$1	$2	256k	$84.0
Kimi K2	open	Groq	$1	$3	256k	$96.0
GPT-5.4 mini	closed	OpenAI (1st-party)	$0.75	$4.50	400k	$99.0
MAI-Code-1 Flash	closed	Microsoft (1st-party)	$0.75	$4.50	-	$99.0
Grok 4.3	closed	xAI (1st-party)	$1.25	$2.50	1000k	$105
Kimi K2.6	open	Moonshot (1st-party)	$0.95	$4	262k	$105
Kimi K2.7 Code	open	Moonshot (1st-party)	$0.95	$4	262k	$105
Kimi K2.6	open	Fireworks AI	$0.95	$4	-	$105
DeepSeek V4-Pro	open	DeepInfra	$1.30	$2.60	1024k	$109
Claude Haiku 4.5	closed	Anthropic (1st-party)	$1	$5	200k	$120
Qwen3-Coder Plus	open	Alibaba (1st-party)	$1	$5	1000k	$120
Kimi K2.6	open	Together AI	$1.20	$4.50	262k	$126
GLM-5.2	open	Z.ai (1st-party)	$1.40	$4.40	200k	$137
Qwen3-Max	closed	Alibaba (1st-party)	$1.20	$6	262k	$144
Qwen3-Coder 480B	open	Together AI	$2	$2	262k	$144
Qwen3-Coder 480B	open	Cerebras	$2	$2	131k	$144
DeepSeek V4-Pro	open	Fireworks AI	$1.74	$3.48	-	$146
DeepSeek V4-Pro	open	Together AI	$2.10	$4.40	512k	$179
Kimi (moonshot-v1-128k)	closed	Moonshot (1st-party)	$2	$5	131k	$180
Mistral Medium 3.5	open	Mistral (1st-party)	$1.50	$7.50	128k	$180
Mixtral 8x22B	open	Mistral (1st-party)	$2	$6	64k	$192
Gemini 3.5 Flash	closed	Google (1st-party)	$1.50	$9	1000k	$198
Gemini 3.1 Pro	closed	Google (1st-party)	$2	$12	1000k	$264
GPT-5.3 Codex	closed	OpenAI (1st-party)	$1.75	$14	400k	$273
GPT-5.4	closed	OpenAI (1st-party)	$2.50	$15	1050k	$330
Claude Sonnet 4.6	closed	Anthropic (1st-party)	$3	$15	1000k	$360
Claude Opus 4.8	closed	Anthropic (1st-party)	$5	$25	1000k	$600
GPT-5.5	closed	OpenAI (1st-party)	$5	$30	1050k	$660
Claude Fable 5	closed	Anthropic (1st-party)	$10	$50	1000k	$1,200
Claude Mythos 5	closed	Anthropic (1st-party)	$10	$50	1000k	$1,200
GPT-5.5 Pro	closed	OpenAI (1st-party)	$30	$180	1050k	$3,960

Sorted cheapest-first for your token volume. Open = open-weight (runnable on multiple hosts). Coloured dot = our confidence in the rate. Tap a provider to verify at the source. Rates exclude caching and batch discounts.

What the headline rate does not tell you

The things that move your real LLM bill, and that no single pricing page will tell you.

The same open model is not one price

An open-weight model (Llama 4, DeepSeek, Qwen3, Kimi K2, gpt-oss) is sold by many hosts at very different rates. DeepSeek V4-Pro input, for example, ranges from about $0.44 per 1M tokens first-party to $2.10 on Together. Pick the model first, then the cheapest host that meets your latency and data needs.
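As a rough sketch of "pick the model first, then the cheapest host", the snippet below filters the table's Llama 4 Scout rows by a minimum context window and returns the cheapest remaining host. The `Offer` type, the `cheapest_host` helper, and the 60M input / 12M output monthly volume are illustrative assumptions, not part of the tool; latency and data-residency checks would be further filters of the same shape.

```python
from typing import NamedTuple, Optional

class Offer(NamedTuple):
    host: str
    in_rate: float          # $ per 1M input tokens
    out_rate: float         # $ per 1M output tokens
    ctx_k: Optional[int]    # context window in thousands of tokens, None if unlisted

# Llama 4 Scout rows copied from the table above.
LLAMA_4_SCOUT = [
    Offer("DeepInfra", 0.080, 0.30, 320),
    Offer("OpenRouter", 0.080, 0.30, 10000),
    Offer("Groq", 0.11, 0.34, 128),
    Offer("Together AI", 0.18, 0.59, 1000),
]

def cheapest_host(offers, in_m, out_m, min_ctx_k=0):
    """Return the cheapest offer whose context window is at least min_ctx_k.

    in_m and out_m are monthly token volumes in millions. Offers with an
    unlisted context window fail any positive context requirement.
    """
    eligible = [
        o for o in offers
        if min_ctx_k <= 0 or (o.ctx_k is not None and o.ctx_k >= min_ctx_k)
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda o: in_m * o.in_rate + out_m * o.out_rate)

# Hypothetical requirement: prompts of up to 500k tokens.
best = cheapest_host(LLAMA_4_SCOUT, in_m=60, out_m=12, min_ctx_k=500)
print(best.host if best else "no host meets the requirement")
```

With a 500k-token requirement only OpenRouter and Together AI remain eligible, so the cheaper of those two is returned rather than the overall cheapest row.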

First-party vs reseller

A first-party API (the model owner) often has the lowest price and the newest version, but a host like Groq or Cerebras can be far faster, and hyperscalers (Bedrock, Azure) add governance and data residency at a markup.

Input and output are priced differently

Output tokens usually cost 2 to 8x as much as input tokens. A chatbot (output-heavy) and a RAG pipeline (input-heavy) therefore rank models differently, which is why this tool takes your in/out split, not a single number.
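To see how the split changes the ranking, here is a small sketch using two rows from the table, GPT-5.4 mini ($0.75 / $4.50) and Grok 4.3 ($1.25 / $2.50). The two workloads (60M in / 12M out, and the reverse) are assumed for illustration, and the helper names are hypothetical.

```python
def monthly_cost(in_m, out_m, in_rate, out_rate):
    """Monthly bill in dollars; volumes in millions of tokens, rates in $ per 1M."""
    return in_m * in_rate + out_m * out_rate

# (input rate, output rate) in $ per 1M tokens, from the table above.
MODELS = {
    "GPT-5.4 mini": (0.75, 4.50),
    "Grok 4.3": (1.25, 2.50),
}

def cheapest(in_m, out_m):
    """Name of the cheaper model for a given monthly input/output volume."""
    return min(MODELS, key=lambda name: monthly_cost(in_m, out_m, *MODELS[name]))

print(cheapest(60, 12))  # input-heavy workload
print(cheapest(12, 60))  # output-heavy workload
```

On the input-heavy split GPT-5.4 mini comes out at $99 against $105 for Grok 4.3; on the output-heavy split the order reverses ($279 against $165), because the 4.50 output rate dominates.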

Caching and batch cut the bill

Most providers bill cached input at roughly 10% of the normal rate, and batch APIs run about 50% off. For repeated context (RAG, agents) the cached rate, not the headline, is your real cost.
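A minimal sketch of what those discounts do to a bill, assuming the rough figures above (cached input at 10% of the normal rate, batch at 50% off). The function names and the 80% cache-hit share are hypothetical, the two discounts are modelled separately rather than stacked, and any surcharge a provider adds for writing to the cache is ignored.

```python
def cost_with_caching(in_m, out_m, in_rate, out_rate,
                      cached_share, cached_multiplier=0.10):
    """Monthly bill when a share of input tokens is served from cache.

    Volumes are in millions of tokens, rates in $ per 1M tokens.
    cached_multiplier=0.10 is the approximate figure quoted in the text.
    """
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    fresh_input = in_m * (1.0 - cached_share) * in_rate
    cached_input = in_m * cached_share * in_rate * cached_multiplier
    return fresh_input + cached_input + out_m * out_rate

def cost_with_batch(in_m, out_m, in_rate, out_rate, batch_discount=0.50):
    """Monthly bill when every request goes through a batch API."""
    return (in_m * in_rate + out_m * out_rate) * (1.0 - batch_discount)

# Claude Sonnet 4.6 row ($3 / $15) at an assumed 60M in / 12M out per month.
headline = cost_with_caching(60, 12, 3.0, 15.0, cached_share=0.0)
cached = cost_with_caching(60, 12, 3.0, 15.0, cached_share=0.8)
batched = cost_with_batch(60, 12, 3.0, 15.0)
print(f"headline ${headline:.2f}, 80% cached ${cached:.2f}, batch ${batched:.2f}")
```

Under these assumptions the $360 headline drops to about $230 with 80% of input cached, and to $180 through batch. Output tokens are never cached, so caching helps input-heavy workloads most.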

Context-length price tiers

Some models charge more once a prompt crosses a size threshold. Gemini 3.1 Pro doubles input over 200k tokens; Qwen and GPT-5.x step up past their base tier. Long-context workloads can cost several times the headline rate.
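The effect of a tier can be sketched as below, using the Gemini 3.1 Pro example ($2 per 1M input, doubling over 200k tokens). This assumes the higher rate applies to the whole prompt once it crosses the threshold; providers may bill tiers differently, and any change to the output rate is not modelled. The `input_cost` helper is hypothetical.

```python
def input_cost(prompt_tokens, base_rate, threshold_tokens, multiplier):
    """Dollar cost of one prompt's input tokens; base_rate is in $ per 1M tokens.

    Assumption: once the prompt exceeds threshold_tokens, the multiplied
    rate applies to every token in the prompt, not only the excess.
    """
    rate = base_rate * multiplier if prompt_tokens > threshold_tokens else base_rate
    return prompt_tokens / 1_000_000 * rate

short_prompt = input_cost(150_000, base_rate=2.0, threshold_tokens=200_000, multiplier=2.0)
long_prompt = input_cost(300_000, base_rate=2.0, threshold_tokens=200_000, multiplier=2.0)
print(f"150k-token prompt: ${short_prompt:.2f}")
print(f"300k-token prompt: ${long_prompt:.2f}")
```

Under that assumption, doubling the prompt from 150k to 300k tokens quadruples its input cost, from $0.30 to $1.20.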

Open weights shift the cost, they do not remove it

An open model is cheap per token, but you own quality, evals, rate limits, and uptime. The cheapest row is not always the cheapest total once reliability and engineering time are counted.

Frequently asked questions

What is the cheapest LLM API in 2026?

For most workloads the cheapest options are small open-weight models on commodity hosts: OpenAI gpt-oss-20B on OpenRouter (about $0.03 in / $0.14 out per 1M), Llama 3.1 8B on Groq ($0.05 / $0.08), and Qwen3 235B on DeepInfra ($0.071 / $0.10). The cheapest frontier-class option is usually DeepSeek V4 first-party. The right answer depends on your input-to-output token mix, which is why this tool takes both.

How much does GPT-5.5 cost compared to Claude Opus 4.8?

As of June 2026, GPT-5.5 is $5 per 1M input and $30 per 1M output; Claude Opus 4.8 is $5 input and $25 output. They are identical on input, but Opus is cheaper on output, so output-heavy workloads favor Claude and input-heavy ones are roughly even. Both are far above mid-tier models like GPT-5.4 ($2.50 / $15) or Gemini 3.5 Flash ($1.50 / $9).

Why does the same open model cost different amounts on different providers?

An open-weight model (Llama 4, DeepSeek, Qwen3, Kimi K2, gpt-oss) can be hosted by anyone, so each provider sets its own rate based on its hardware and margin. DeepSeek V4-Pro runs $0.44 / $0.87 first-party, $1.30 / $2.60 on DeepInfra, $1.74 / $3.48 on Fireworks, and $2.10 / $4.40 on Together. Pick the model first, then the cheapest host that meets your latency, context, and data-residency needs.

Are open-weight models cheaper than GPT-5 or Claude?

Almost always, per token. Open models like Llama 4 Scout, Qwen3 235B, and gpt-oss run 10 to 100x cheaper than GPT-5.5 or Claude Opus for the same volume. The trade-off is that you own quality, evals, rate limits, and uptime, so the cheapest row is not always the cheapest total once reliability and engineering time are counted.

Where is the cheapest place to run Llama 4 or DeepSeek?

For Llama 4 Scout, DeepInfra and OpenRouter are around $0.08 in / $0.30 out; Groq is faster but costs $0.11 / $0.34. For DeepSeek V4-Pro, the first-party API is cheapest ($0.44 / $0.87), with DeepInfra the cheapest third-party host ($1.30 / $2.60). For DeepSeek V4-Flash, DeepInfra ($0.10 / $0.20) undercuts the first-party API ($0.14 / $0.28). Use the filters above to compare a single model across every host.

Do these prices include caching or batch discounts?

No, these are standard non-cached, on-demand rates. Most providers bill cached input at roughly 10% of the normal rate, and batch APIs run about 50% off. For repeated context (RAG, agents) your real cost can be well below the headline shown here.

Popular pricing comparisons

Head-to-head pricing breakdowns for the models people compare most: token rates, context, and real monthly cost across four workloads.

GPT-5.5 vs Claude Opus 4.8
Claude Opus 4.8 vs Gemini 3.1 Pro
GPT-5.5 vs Gemini 3.1 Pro
DeepSeek V4-Pro vs GLM-5.2
GLM-5.2 vs Qwen3-Max
DeepSeek V4-Pro vs Kimi K2.7 Code
GPT-5.5 vs DeepSeek V4-Pro
Claude Opus 4.8 vs DeepSeek V4-Pro
MiniMax M3 vs Qwen3-Max

How we calculate

Your monthly cost is your input tokens (in millions) times the model's input rate, plus your output tokens (in millions) times its output rate. Every rate is the provider's standard, on-demand, non-cached price per 1M tokens, taken from the provider's official pricing page and checked in June 2026; tap any provider to verify. A coloured dot flags confidence (amber = sourced from a launch or blog page, or from a pricing page we could not fetch cleanly).

Open-weight models appear once per host because the same model genuinely costs different amounts on each. First-party rows are the model owner's API. Rates exclude cached-input and batch discounts (usually about 90% and 50% off respectively), and exclude the context-length surcharges that some models add past a size threshold. Affiliate relationships never change the ranking; rows are sorted purely by your computed cost.
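The calculation described above can be sketched in a few lines. The rates are rows from the table; the 60M input / 12M output monthly volume is an assumption chosen because it reproduces the "Your $/mo" column shown here, and the function name is hypothetical.

```python
def monthly_cost(input_tokens_m, output_tokens_m, in_rate, out_rate):
    """Monthly bill in dollars.

    Token volumes are in millions; rates are in $ per 1M tokens,
    standard on-demand and non-cached.
    """
    return input_tokens_m * in_rate + output_tokens_m * out_rate

# (model, provider, $/1M in, $/1M out) copied from the table.
ROWS = [
    ("Claude Sonnet 4.6", "Anthropic", 3.00, 15.00),
    ("Llama 3.1 8B", "Groq", 0.050, 0.080),
    ("gpt-oss-20B", "OpenRouter", 0.029, 0.14),
]

# Assumed workload: 60M input and 12M output tokens per month.
IN_M, OUT_M = 60, 12

# Sort cheapest-first by computed cost, as the table does.
ranked = sorted(ROWS, key=lambda r: monthly_cost(IN_M, OUT_M, r[2], r[3]))
for model, provider, in_rate, out_rate in ranked:
    cost = monthly_cost(IN_M, OUT_M, in_rate, out_rate)
    print(f"{model} on {provider}: ${cost:,.2f}")
```

For this volume the three rows come out at $3.42, $3.96, and $360.00, matching the table.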

Embed this comparator on your site (free)

Drop the live pricing table into any article, blog, or docs page. It updates as we re-verify prices, so it never goes stale. Free to use with attribution.

Copy the embed code

<iframe src="https://pondero.ai/embed/llm-prices" title="LLM API Price Comparator by Pondero" width="100%" height="600" loading="lazy" style="border:1px solid #e5e7eb;border-radius:12px;max-width:100%"></iframe>
<p style="font:13px sans-serif">Live LLM API prices via <a href="https://pondero.ai/products/llm-api-price-comparator/">Pondero</a></p>

The caption link sits in your page (not the iframe), so it counts as a normal credit link. Optional: listen for a pondero-embed-height postMessage to auto-size the frame.