What an LLM API actually costs

The same model can cost 5x more depending on where you run it. Enter your monthly token volume and see every model ranked by your real bill, across 15 providers.

71 model and provider rates, per 1M tokens, standard non-cached. Checked June 2026.

Cheapest for this workload

| Model | Weights | Provider | $/1M in | $/1M out | Ctx | Your $/mo |
|---|---|---|---|---|---|---|
| gpt-oss-20B | open | OpenRouter | $0.029 | $0.14 | 131k | $3.42 |
| Llama 3.1 8B | open | Groq | $0.050 | $0.080 | 128k | $3.96 |
| gpt-oss-120B | open | OpenRouter | $0.039 | $0.18 | 131k | $4.50 |
| gpt-oss-20B | open | Together AI | $0.050 | $0.20 | 128k | $5.40 |
| Qwen3 235B | open | DeepInfra | $0.071 | $0.10 | 256k | $5.46 |
| Qwen3 235B | open | OpenRouter | $0.071 | $0.10 | 262k | $5.46 |
| Mistral Small 3.2 24B | open | DeepInfra | $0.075 | $0.20 | 125k | $6.90 |
| gpt-oss-20B | open | Fireworks AI | $0.070 | $0.30 | - | $7.80 |
| gpt-oss-20B | open | Groq | $0.075 | $0.30 | 128k | $8.10 |
| Llama 4 Scout | open | DeepInfra | $0.080 | $0.30 | 320k | $8.40 |
| Llama 4 Scout | open | OpenRouter | $0.080 | $0.30 | 10000k | $8.40 |
| DeepSeek V4-Flash | open | DeepInfra | $0.10 | $0.20 | 1024k | $8.40 |
| Mistral Small 4 | open | Mistral 1st-party | $0.10 | $0.30 | 128k | $9.60 |
| Llama 3.3 70B | open | DeepInfra | $0.10 | $0.32 | 128k | $9.84 |
| Llama 4 Scout | open | Groq | $0.11 | $0.34 | 128k | $10.7 |
| Ministral 3 8B | open | Mistral 1st-party | $0.15 | $0.15 | 128k | $10.8 |
| DeepSeek V4-Flash | open | DeepSeek 1st-party | $0.14 | $0.28 | 1000k | $11.8 |
| DeepSeek V4-Flash | open | Fireworks AI | $0.14 | $0.28 | - | $11.8 |
| gpt-oss-120B | open | Together AI | $0.15 | $0.60 | 128k | $16.2 |
| gpt-oss-120B | open | Fireworks AI | $0.15 | $0.60 | - | $16.2 |
| gpt-oss-120B | open | Groq | $0.15 | $0.60 | 128k | $16.2 |
| Llama 4 Maverick | open | DeepInfra | $0.15 | $0.60 | 1024k | $16.2 |
| Llama 4 Maverick | open | OpenRouter | $0.15 | $0.60 | 1000k | $16.2 |
| Llama 4 Scout | open | Together AI | $0.18 | $0.59 | 1000k | $17.9 |
| Qwen3 235B | open | Together AI | $0.20 | $0.60 | 262k | $19.2 |
| DeepSeek V3 | open | OpenRouter | $0.20 | $0.80 | 131k | $21.6 |
| Qwen3 32B | open | Groq | $0.29 | $0.59 | 131k | $24.5 |
| Llama 4 Maverick | open | AWS Bedrock | $0.24 | $0.97 | 1000k | $26.0 |
| Llama 4 Maverick | open | Together AI | $0.27 | $0.85 | 1048k | $26.4 |
| GPT-5.4 nano | closed | OpenAI 1st-party | $0.20 | $1.25 | 400k | $27.0 |
| Codestral | closed | Mistral 1st-party | $0.30 | $0.90 | 256k | $28.8 |
| Qwen3-Coder 480B | open | DeepInfra | $0.30 | $1 | 256k | $30.0 |
| gpt-oss-120B | open | Cerebras | $0.35 | $0.75 | 131k | $30.0 |
| Gemini 3.1 Flash-Lite | closed | Google 1st-party | $0.25 | $1.50 | - | $33.0 |
| Qwen3-Coder Flash | open | Alibaba 1st-party | $0.30 | $1.50 | 1000k | $36.0 |
| DeepSeek V4-Pro | open | DeepSeek 1st-party | $0.43 | $0.87 | 1000k | $36.5 |
| Llama 3.3 70B | open | Groq | $0.59 | $0.79 | 128k | $44.9 |
| Mistral Large 3 | open | Mistral 1st-party | $0.50 | $1.50 | 128k | $48.0 |
| Mistral Large 3 | open | AWS Bedrock | $0.50 | $1.50 | - | $48.0 |
| Qwen3 235B | open | Cerebras | $0.60 | $1.20 | 131k | $50.4 |
| Llama 3.3 70B | open | AWS Bedrock | $0.72 | $0.72 | 128k | $51.8 |
| DeepSeek-R1 | open | DeepInfra | $0.50 | $2.15 | 160k | $55.8 |
| Kimi K2 | open | OpenRouter | $0.57 | $2.30 | 131k | $61.8 |
| DeepSeek-R1 | open | OpenRouter | $0.70 | $2.50 | 164k | $72.0 |
| Llama 3.3 70B | open | Together AI | $1.04 | $1.04 | 131k | $74.9 |
| Grok Build 0.1 | closed | xAI 1st-party | $1 | $2 | 256k | $84.0 |
| Kimi K2 | open | Groq | $1 | $3 | 256k | $96.0 |
| GPT-5.4 mini | closed | OpenAI 1st-party | $0.75 | $4.50 | 400k | $99.0 |
| Grok 4.3 | closed | xAI 1st-party | $1.25 | $2.50 | 1000k | $105 |
| Kimi K2.6 | open | Moonshot 1st-party | $0.95 | $4 | 262k | $105 |
| Kimi K2.6 | open | Fireworks AI | $0.95 | $4 | - | $105 |
| DeepSeek V4-Pro | open | DeepInfra | $1.30 | $2.60 | 1024k | $109 |
| Claude Haiku 4.5 | closed | Anthropic 1st-party | $1 | $5 | 200k | $120 |
| Qwen3-Coder Plus | open | Alibaba 1st-party | $1 | $5 | 1000k | $120 |
| Kimi K2.6 | open | Together AI | $1.20 | $4.50 | 262k | $126 |
| Qwen3-Max | closed | Alibaba 1st-party | $1.20 | $6 | 262k | $144 |
| Qwen3-Coder 480B | open | Together AI | $2 | $2 | 262k | $144 |
| Qwen3-Coder 480B | open | Cerebras | $2 | $2 | 131k | $144 |
| DeepSeek V4-Pro | open | Fireworks AI | $1.74 | $3.48 | - | $146 |
| DeepSeek V4-Pro | open | Together AI | $2.10 | $4.40 | 512k | $179 |
| Kimi (moonshot-v1-128k) | closed | Moonshot 1st-party | $2 | $5 | 131k | $180 |
| Mistral Medium 3.5 | open | Mistral 1st-party | $1.50 | $7.50 | 128k | $180 |
| Mixtral 8x22B | open | Mistral 1st-party | $2 | $6 | 64k | $192 |
| Gemini 3.5 Flash | closed | Google 1st-party | $1.50 | $9 | 1000k | $198 |
| Gemini 3.1 Pro | closed | Google 1st-party | $2 | $12 | 1000k | $264 |
| GPT-5.3 Codex | closed | OpenAI 1st-party | $1.75 | $14 | 400k | $273 |
| GPT-5.4 | closed | OpenAI 1st-party | $2.50 | $15 | 1050k | $330 |
| Claude Sonnet 4.6 | closed | Anthropic 1st-party | $3 | $15 | 1000k | $360 |
| Claude Opus 4.8 | closed | Anthropic 1st-party | $5 | $25 | 1000k | $600 |
| GPT-5.5 | closed | OpenAI 1st-party | $5 | $30 | 1050k | $660 |
| GPT-5.5 Pro | closed | OpenAI 1st-party | $30 | $180 | 1050k | $3,960 |

Sorted cheapest-first for your token volume. Open = open-weight (runnable on multiple hosts). Coloured dot = our confidence in the rate. Tap a provider to verify at the source. Rates exclude caching and batch discounts.

What the headline rate does not tell you

The things that move your real LLM bill, and that no single pricing page will tell you.

01

The same open model is not one price

An open-weight model (Llama 4, DeepSeek, Qwen3, Kimi K2, gpt-oss) is sold by many hosts at very different rates. The input rate for DeepSeek V4-Pro ranges from about $0.44 per 1M tokens first-party to $2.10 on Together AI. Pick the model first, then the cheapest host that meets your latency and data needs.
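As a rough illustration of "pick the model first, then the host", the sketch below finds the cheapest host per model for a given workload. The rates are copied from a few rows of the table above; the function name `cheapest_host`, the `allowed_hosts` filter (a stand-in for your latency and data requirements), and the 60M input / 12M output workload are assumptions made for the example, not part of the tool.

```python
# (model, host, $/1M in, $/1M out), copied from rows of the table above.
RATES = [
    ("DeepSeek V4-Pro", "DeepSeek 1st-party", 0.43, 0.87),
    ("DeepSeek V4-Pro", "DeepInfra", 1.30, 2.60),
    ("DeepSeek V4-Pro", "Fireworks AI", 1.74, 3.48),
    ("DeepSeek V4-Pro", "Together AI", 2.10, 4.40),
    ("Llama 4 Scout", "DeepInfra", 0.08, 0.30),
    ("Llama 4 Scout", "Groq", 0.11, 0.34),
    ("Llama 4 Scout", "Together AI", 0.18, 0.59),
]


def cheapest_host(rates, in_millions, out_millions, allowed_hosts=None):
    """Return {model: (host, monthly_cost)} for the cheapest permitted host.

    allowed_hosts is a hypothetical filter: pass the set of hosts that meet
    your own latency or data-residency needs, or None to allow all of them.
    """
    best = {}
    for model, host, in_rate, out_rate in rates:
        if allowed_hosts is not None and host not in allowed_hosts:
            continue
        monthly = in_millions * in_rate + out_millions * out_rate
        if model not in best or monthly < best[model][1]:
            best[model] = (host, monthly)
    return best


# Assumed workload: 60M input and 12M output tokens per month.
unrestricted = cheapest_host(RATES, 60, 12)
restricted = cheapest_host(RATES, 60, 12, allowed_hosts={"Groq", "Together AI"})

for model, (host, monthly) in unrestricted.items():
    print(f"{model}: {host} at ${monthly:.2f}/mo")
```

Restricting the host list changes the answer: with only Groq and Together AI permitted, Llama 4 Scout moves to Groq and DeepSeek V4-Pro to Together AI, each at a higher monthly cost.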

02

First-party vs reseller

A first-party API (the model owner) often has the lowest price and the newest version, but a host like Groq or Cerebras can be far faster, and hyperscalers (Bedrock, Azure) add governance and data residency at a markup.

03

Input and output are priced differently

Output tokens usually cost 2 to 8x as much as input tokens. A chatbot (output-heavy) and a RAG pipeline (input-heavy) rank models differently, which is why this tool takes your in/out split, not a single number.
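A small sketch of how the ranking can flip with the token mix, using two rows from the table above. The two workloads (100M in / 5M out and 10M in / 50M out) are invented for illustration, and `rank` is a hypothetical helper name.

```python
# ($/1M in, $/1M out), copied from the table above.
MODELS = {
    "Llama 3.3 70B @ AWS Bedrock": (0.72, 0.72),
    "DeepSeek-R1 @ DeepInfra": (0.50, 2.15),
}


def rank(in_millions, out_millions):
    """Return model names sorted cheapest-first for the given monthly volume."""
    def monthly(name):
        in_rate, out_rate = MODELS[name]
        return in_millions * in_rate + out_millions * out_rate
    return sorted(MODELS, key=monthly)


# Assumed workloads, in millions of tokens per month.
input_heavy = rank(100, 5)    # e.g. a retrieval-heavy pipeline
output_heavy = rank(10, 50)   # e.g. a generation-heavy assistant

print("input-heavy :", input_heavy[0])
print("output-heavy:", output_heavy[0])
```

With the input-heavy mix DeepSeek-R1 on DeepInfra comes out cheaper; with the output-heavy mix its higher output rate dominates and Llama 3.3 70B on AWS Bedrock wins.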

04

Caching and batch cut the bill

Most providers bill cached input at roughly 10% of the normal rate, and batch APIs run about 50% off. For repeated context (RAG, agents) the cached rate, not the headline, is your real cost.
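To see how much these discounts can matter, here is a minimal sketch that applies the rough figures from the paragraph above (cached input at 10% of the normal rate, batch at 50% off). The function `effective_cost`, its parameter names, and the choice to apply the batch discount to the whole bill are assumptions; real providers differ in how, and whether, the two discounts combine.

```python
def effective_cost(in_millions, out_millions, in_rate, out_rate,
                   cached_fraction=0.0, cached_multiplier=0.10,
                   batch=False, batch_multiplier=0.50):
    """Estimate a monthly bill in dollars after caching and batch discounts.

    cached_fraction is the share of input tokens served from cache (0 to 1).
    The default multipliers are the approximate figures quoted in the text,
    not any specific provider's price list.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    fresh_input = in_millions * (1.0 - cached_fraction) * in_rate
    cached_input = in_millions * cached_fraction * in_rate * cached_multiplier
    total = fresh_input + cached_input + out_millions * out_rate
    # Assumption: the batch discount applies to the whole bill.
    return total * batch_multiplier if batch else total


# Claude Sonnet 4.6 at $3 in / $15 out, assumed 60M in / 12M out per month.
headline = effective_cost(60, 12, 3.0, 15.0)
with_cache = effective_cost(60, 12, 3.0, 15.0, cached_fraction=0.8)
with_batch = effective_cost(60, 12, 3.0, 15.0, batch=True)

print(f"headline ${headline:.2f}, 80% cached ${with_cache:.2f}, batch ${with_batch:.2f}")
```

In this example caching only touches the input half of the bill, so an output-heavy workload sees a smaller saving from it than from batching.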

05

Context-length price tiers

Some models charge more once a prompt crosses a size threshold. Gemini 3.1 Pro doubles input over 200k tokens; Qwen and GPT-5.x step up past their base tier. Long-context workloads can cost several times the headline rate.
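A hedged sketch of a context-length tier, using the Gemini 3.1 Pro figures mentioned above ($2 per 1M input, doubling past 200k tokens). It assumes the whole prompt is billed at the higher rate once it crosses the threshold; that billing rule, the function name, and its parameters are illustrative assumptions, so check the provider's pricing page for the exact behaviour.

```python
def tiered_input_cost(prompt_tokens, base_rate_per_m, threshold_tokens,
                      surcharge_multiplier):
    """Input cost in dollars for one request under a single size threshold.

    Assumption: a prompt above threshold_tokens is billed entirely at
    base_rate_per_m * surcharge_multiplier, not just the excess tokens.
    """
    over = prompt_tokens > threshold_tokens
    rate = base_rate_per_m * surcharge_multiplier if over else base_rate_per_m
    return prompt_tokens / 1_000_000 * rate


# $2 per 1M input, doubling past 200k tokens (figures from the text above).
short_prompt = tiered_input_cost(150_000, 2.0, 200_000, 2.0)
long_prompt = tiered_input_cost(300_000, 2.0, 200_000, 2.0)

print(f"150k-token prompt: ${short_prompt:.2f}")
print(f"300k-token prompt: ${long_prompt:.2f}")
```

Under this assumption, doubling the prompt length from 150k to 300k tokens quadruples the input cost of the request rather than doubling it.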

06

Open weights shift the cost, they do not remove it

An open model is cheap per token, but you own quality, evals, rate limits, and uptime. The cheapest row is not always the cheapest total once reliability and engineering time are counted.

Frequently asked questions

What is the cheapest LLM API in 2026?

For most workloads the cheapest options are small open-weight models on commodity hosts: OpenAI gpt-oss-20B on OpenRouter (about $0.03 in / $0.14 out per 1M), Llama 3.1 8B on Groq ($0.05 / $0.08), and Qwen3 235B on DeepInfra ($0.071 / $0.10). The cheapest frontier-class option is usually DeepSeek V4 first-party. The right answer depends on your input-to-output token mix, which is why this tool takes both.

How much does GPT-5.5 cost compared to Claude Opus 4.8?

As of June 2026, GPT-5.5 is $5 per 1M input and $30 per 1M output; Claude Opus 4.8 is $5 input and $25 output. They are identical on input, but Opus is $5 per 1M cheaper on output, so output-heavy workloads favor Claude and input-heavy ones are roughly even. Both are far above mid-tier models like GPT-5.4 ($2.50 / $15) or Gemini 3.5 Flash ($1.50 / $9).

Why does the same open model cost different amounts on different providers?

An open-weight model (Llama 4, DeepSeek, Qwen3, Kimi K2, gpt-oss) can be hosted by anyone, so each provider sets its own rate based on its hardware and margin. DeepSeek V4-Pro runs $0.44 / $0.87 first-party, $1.30 / $2.60 on DeepInfra, $1.74 / $3.48 on Fireworks, and $2.10 / $4.40 on Together. Pick the model first, then the cheapest host that meets your latency, context, and data-residency needs.

Are open-weight models cheaper than GPT-5 or Claude?

Almost always, per token. Open models like Llama 4 Scout, Qwen3 235B, and gpt-oss run 10 to 100x cheaper than GPT-5.5 or Claude Opus for the same volume. The trade-off is that you own quality, evals, rate limits, and uptime, so the cheapest row is not always the cheapest total once reliability and engineering time are counted.

Where is the cheapest place to run Llama 4 or DeepSeek?

For Llama 4 Scout, DeepInfra and OpenRouter are around $0.08 in / $0.30 out; Groq is fast at $0.11 / $0.34. For DeepSeek V4-Pro, the first-party API is cheapest ($0.44 / $0.87), with DeepInfra the cheapest third-party host ($1.30 / $2.60). For DeepSeek V4-Flash, DeepInfra ($0.10 / $0.20) undercuts the first-party API ($0.14 / $0.28). Use the filters above to compare a single model across every host.

Do these prices include caching or batch discounts?

No, these are standard non-cached, on-demand rates. Most providers bill cached input at roughly 10% of the normal rate, and batch APIs run about 50% off. For repeated context (RAG, agents) your real cost can be well below the headline shown here.

How we calculate

Your monthly cost is your input tokens (in millions) times the model's input rate, plus your output tokens (in millions) times its output rate. Every rate is the provider's standard, on-demand, non-cached price per 1M tokens, taken from the provider's official pricing page and checked in June 2026; tap any provider to verify. A coloured dot flags confidence (amber = sourced from a launch or blog page, or from a pricing page we could not fetch cleanly).
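The calculation can be sketched in a few lines. The function name `monthly_cost` and the workload of 60M input and 12M output tokens per month are assumptions for the example; that workload happens to reproduce the "Your $/mo" figures shown in the table above for these two rows ($120 and $660).

```python
def monthly_cost(input_tokens, output_tokens, in_rate_per_m, out_rate_per_m):
    """Monthly bill in dollars; both rates are dollars per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * in_rate_per_m
    output_cost = output_tokens / 1_000_000 * out_rate_per_m
    return input_cost + output_cost


# Assumed workload: 60M input and 12M output tokens per month.
workload_in, workload_out = 60_000_000, 12_000_000

claude_haiku = monthly_cost(workload_in, workload_out, 1.00, 5.00)   # Claude Haiku 4.5
gpt_55 = monthly_cost(workload_in, workload_out, 5.00, 30.00)        # GPT-5.5

print(f"Claude Haiku 4.5: ${claude_haiku:.2f}/mo")
print(f"GPT-5.5: ${gpt_55:.2f}/mo")
```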

Open-weight models appear once per host because the same model genuinely costs different amounts on each. First-party rows are the model owner's own API. Rates exclude cached-input and batch discounts (usually about 90% and 50% off respectively), and exclude context-length surcharges that some models add past a size threshold. Affiliate relationships never change the ranking; rows are sorted purely by your computed cost.
