What an LLM API actually costs
The same model can cost 5x more depending on where you run it. Enter your monthly token volume and see every model ranked by your real bill, across 15 providers.
71 model and provider rates, per 1M tokens, standard non-cached. Checked June 2026.
Cheapest for this workload (example: 60M input + 12M output tokens per month)
| Model | Provider | $/1M input | $/1M output | Context | Your $/mo |
|---|---|---|---|---|---|
| gpt-oss-20B open | OpenRouter | $0.029 | $0.14 | 131k | $3.42 |
| Llama 3.1 8B open | Groq | $0.050 | $0.080 | 128k | $3.96 |
| gpt-oss-120B open | OpenRouter | $0.039 | $0.18 | 131k | $4.50 |
| gpt-oss-20B open | Together AI | $0.050 | $0.20 | 128k | $5.40 |
| Qwen3 235B open | DeepInfra | $0.071 | $0.10 | 256k | $5.46 |
| Qwen3 235B open | OpenRouter | $0.071 | $0.10 | 262k | $5.46 |
| Mistral Small 3.2 24B open | DeepInfra | $0.075 | $0.20 | 125k | $6.90 |
| gpt-oss-20B open | Fireworks AI | $0.070 | $0.30 | - | $7.80 |
| gpt-oss-20B open | Groq | $0.075 | $0.30 | 128k | $8.10 |
| Llama 4 Scout open | DeepInfra | $0.080 | $0.30 | 320k | $8.40 |
| Llama 4 Scout open | OpenRouter | $0.080 | $0.30 | 10000k | $8.40 |
| DeepSeek V4-Flash open | DeepInfra | $0.10 | $0.20 | 1024k | $8.40 |
| Mistral Small 4 open | Mistral 1st-party | $0.10 | $0.30 | 128k | $9.60 |
| Llama 3.3 70B open | DeepInfra | $0.10 | $0.32 | 128k | $9.84 |
| Llama 4 Scout open | Groq | $0.11 | $0.34 | 128k | $10.7 |
| Ministral 3 8B open | Mistral 1st-party | $0.15 | $0.15 | 128k | $10.8 |
| DeepSeek V4-Flash open | DeepSeek 1st-party | $0.14 | $0.28 | 1000k | $11.8 |
| DeepSeek V4-Flash open | Fireworks AI | $0.14 | $0.28 | - | $11.8 |
| gpt-oss-120B open | Together AI | $0.15 | $0.60 | 128k | $16.2 |
| gpt-oss-120B open | Fireworks AI | $0.15 | $0.60 | - | $16.2 |
| gpt-oss-120B open | Groq | $0.15 | $0.60 | 128k | $16.2 |
| Llama 4 Maverick open | DeepInfra | $0.15 | $0.60 | 1024k | $16.2 |
| Llama 4 Maverick open | OpenRouter | $0.15 | $0.60 | 1000k | $16.2 |
| Llama 4 Scout open | Together AI | $0.18 | $0.59 | 1000k | $17.9 |
| Qwen3 235B open | Together AI | $0.20 | $0.60 | 262k | $19.2 |
| DeepSeek V3 open | OpenRouter | $0.20 | $0.80 | 131k | $21.6 |
| Qwen3 32B open | Groq | $0.29 | $0.59 | 131k | $24.5 |
| Llama 4 Maverick open | AWS Bedrock | $0.24 | $0.97 | 1000k | $26.0 |
| Llama 4 Maverick open | Together AI | $0.27 | $0.85 | 1048k | $26.4 |
| GPT-5.4 nano closed | OpenAI 1st-party | $0.20 | $1.25 | 400k | $27.0 |
| Codestral closed | Mistral 1st-party | $0.30 | $0.90 | 256k | $28.8 |
| Qwen3-Coder 480B open | DeepInfra | $0.30 | $1 | 256k | $30.0 |
| gpt-oss-120B open | Cerebras | $0.35 | $0.75 | 131k | $30.0 |
| Gemini 3.1 Flash-Lite closed | Google 1st-party | $0.25 | $1.50 | - | $33.0 |
| Qwen3-Coder Flash open | Alibaba 1st-party | $0.30 | $1.50 | 1000k | $36.0 |
| DeepSeek V4-Pro open | DeepSeek 1st-party | $0.435 | $0.87 | 1000k | $36.5 |
| Llama 3.3 70B open | Groq | $0.59 | $0.79 | 128k | $44.9 |
| Mistral Large 3 open | Mistral 1st-party | $0.50 | $1.50 | 128k | $48.0 |
| Mistral Large 3 open | AWS Bedrock | $0.50 | $1.50 | - | $48.0 |
| Qwen3 235B open | Cerebras | $0.60 | $1.20 | 131k | $50.4 |
| Llama 3.3 70B open | AWS Bedrock | $0.72 | $0.72 | 128k | $51.8 |
| DeepSeek-R1 open | DeepInfra | $0.50 | $2.15 | 160k | $55.8 |
| Kimi K2 open | OpenRouter | $0.57 | $2.30 | 131k | $61.8 |
| DeepSeek-R1 open | OpenRouter | $0.70 | $2.50 | 164k | $72.0 |
| Llama 3.3 70B open | Together AI | $1.04 | $1.04 | 131k | $74.9 |
| Grok Build 0.1 closed | xAI 1st-party | $1 | $2 | 256k | $84.0 |
| Kimi K2 open | Groq | $1 | $3 | 256k | $96.0 |
| GPT-5.4 mini closed | OpenAI 1st-party | $0.75 | $4.50 | 400k | $99.0 |
| Grok 4.3 closed | xAI 1st-party | $1.25 | $2.50 | 1000k | $105 |
| Kimi K2.6 open | Moonshot 1st-party | $0.95 | $4 | 262k | $105 |
| Kimi K2.6 open | Fireworks AI | $0.95 | $4 | - | $105 |
| DeepSeek V4-Pro open | DeepInfra | $1.30 | $2.60 | 1024k | $109 |
| Claude Haiku 4.5 closed | Anthropic 1st-party | $1 | $5 | 200k | $120 |
| Qwen3-Coder Plus open | Alibaba 1st-party | $1 | $5 | 1000k | $120 |
| Kimi K2.6 open | Together AI | $1.20 | $4.50 | 262k | $126 |
| Qwen3-Max closed | Alibaba 1st-party | $1.20 | $6 | 262k | $144 |
| Qwen3-Coder 480B open | Together AI | $2 | $2 | 262k | $144 |
| Qwen3-Coder 480B open | Cerebras | $2 | $2 | 131k | $144 |
| DeepSeek V4-Pro open | Fireworks AI | $1.74 | $3.48 | - | $146 |
| DeepSeek V4-Pro open | Together AI | $2.10 | $4.40 | 512k | $179 |
| Kimi (moonshot-v1-128k) closed | Moonshot 1st-party | $2 | $5 | 131k | $180 |
| Mistral Medium 3.5 open | Mistral 1st-party | $1.50 | $7.50 | 128k | $180 |
| Mixtral 8x22B open | Mistral 1st-party | $2 | $6 | 64k | $192 |
| Gemini 3.5 Flash closed | Google 1st-party | $1.50 | $9 | 1000k | $198 |
| Gemini 3.1 Pro closed | Google 1st-party | $2 | $12 | 1000k | $264 |
| GPT-5.3 Codex closed | OpenAI 1st-party | $1.75 | $14 | 400k | $273 |
| GPT-5.4 closed | OpenAI 1st-party | $2.50 | $15 | 1050k | $330 |
| Claude Sonnet 4.6 closed | Anthropic 1st-party | $3 | $15 | 1000k | $360 |
| Claude Opus 4.8 closed | Anthropic 1st-party | $5 | $25 | 1000k | $600 |
| GPT-5.5 closed | OpenAI 1st-party | $5 | $30 | 1050k | $660 |
| GPT-5.5 Pro closed | OpenAI 1st-party | $30 | $180 | 1050k | $3,960 |
Sorted cheapest-first for your token volume. Open = open-weight (runnable on multiple hosts). Rates exclude caching and batch discounts.
What the headline rate does not tell you
The things that move your real LLM bill, and that no single pricing page will tell you.
The same open model is not one price
An open-weight model (Llama 4, DeepSeek, Qwen3, Kimi K2, gpt-oss) is sold by many hosts at very different rates. DeepSeek V4-Pro input ranges from $0.44 per 1M tokens first-party to $2.10 on Together AI, almost 5x. Pick the model first, then the cheapest host that meets your latency and data needs.
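The model-first, host-second selection can be sketched as a filter followed by a minimum. This is a minimal illustration under stated assumptions, not the page's own filter logic: the `Offer` type, the `cheapest_host` helper, and the constraint values are hypothetical, and the DeepSeek V4-Pro rates are copied from the table and the paragraph above. A host with no listed context window is treated as failing any minimum-context requirement.

```python
from typing import NamedTuple, Optional


class Offer(NamedTuple):
    model: str
    provider: str
    usd_in: float          # USD per 1M input tokens
    usd_out: float         # USD per 1M output tokens
    ctx_k: Optional[int]   # context window in thousands of tokens; None if not listed


# DeepSeek V4-Pro rows from the table above (June 2026 snapshot).
OFFERS = [
    Offer("DeepSeek V4-Pro", "DeepSeek 1st-party", 0.44, 0.87, 1000),
    Offer("DeepSeek V4-Pro", "DeepInfra", 1.30, 2.60, 1024),
    Offer("DeepSeek V4-Pro", "Fireworks AI", 1.74, 3.48, None),
    Offer("DeepSeek V4-Pro", "Together AI", 2.10, 4.40, 512),
]


def cheapest_host(offers, model, input_m, output_m, min_ctx_k=0, allowed=None):
    """Return (provider, monthly USD) for the cheapest host of `model` passing the filters.

    input_m / output_m are millions of tokens per month.
    `allowed` is an optional set of providers that meet your latency or data rules (hypothetical).
    Hosts with an unlisted context window are skipped whenever min_ctx_k > 0.
    """
    candidates = []
    for o in offers:
        if o.model != model:
            continue
        if allowed is not None and o.provider not in allowed:
            continue
        if min_ctx_k > 0 and (o.ctx_k is None or o.ctx_k < min_ctx_k):
            continue
        cost = input_m * o.usd_in + output_m * o.usd_out
        candidates.append((cost, o.provider))
    if not candidates:
        return None
    cost, provider = min(candidates)
    return provider, round(cost, 2)


# Hypothetical policy that rules out the first-party API; 60M in / 12M out per month.
third_party = {"DeepInfra", "Fireworks AI", "Together AI"}
print(cheapest_host(OFFERS, "DeepSeek V4-Pro", 60, 12, allowed=third_party))
# Hypothetical need for at least 600k tokens of listed context, any host allowed.
print(cheapest_host(OFFERS, "DeepSeek V4-Pro", 60, 12, min_ctx_k=600))
```

With the third-party restriction, DeepInfra wins at about $109 a month; with only the context requirement, the first-party API wins, because Fireworks AI lists no context window and Together AI's 512k falls short.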
First-party vs reseller
A first-party API (from the model owner) often has the lowest price and the newest version, but a host like Groq or Cerebras can be far faster, and hyperscalers (Bedrock, Azure) add governance and data residency, sometimes at a markup: Llama 4 Maverick is $0.24 / $0.97 on Bedrock versus $0.15 / $0.60 on DeepInfra.
Input and output are priced differently
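Output tokens usually cost more than input, from parity on some open-model hosts (Llama 3.3 70B on Bedrock is $0.72 / $0.72) up to 8x on closed models (GPT-5.3 Codex is $1.75 / $14). A content-generation workload (output-heavy) and a RAG pipeline (input-heavy) rank tools differently, which is why this tool takes your in/out split, not a single number.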
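To see the ranking flip, here is a minimal sketch that prices the same model on two hosts for two hypothetical workloads. The rates are the Llama 3.3 70B rows for Groq and AWS Bedrock from the table; the workload sizes and the `monthly_cost` / `rank` helpers are illustrative assumptions.

```python
# USD per 1M tokens (input, output), copied from the table above.
RATES = {
    "Llama 3.3 70B on Groq": (0.59, 0.79),
    "Llama 3.3 70B on AWS Bedrock": (0.72, 0.72),
}


def monthly_cost(input_m: float, output_m: float, usd_in: float, usd_out: float) -> float:
    """Monthly USD for input_m and output_m million tokens at per-1M rates."""
    return input_m * usd_in + output_m * usd_out


def rank(input_m: float, output_m: float) -> list[tuple[str, float]]:
    """Hosts sorted cheapest-first for one workload."""
    costs = [(name, round(monthly_cost(input_m, output_m, *r), 2)) for name, r in RATES.items()]
    return sorted(costs, key=lambda pair: pair[1])


rag = rank(input_m=100, output_m=5)       # hypothetical input-heavy RAG pipeline
writer = rank(input_m=10, output_m=100)   # hypothetical output-heavy generation job
print("RAG:   ", rag)
print("Writer:", writer)
```

Groq comes out cheaper for the input-heavy job ($62.95 vs $75.60), while Bedrock's flat rate wins the output-heavy one ($79.20 vs $84.90), even though neither host is cheaper on both rates.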
Caching and batch cut the bill
Most providers bill cached input at roughly 10% of the normal rate, and batch APIs run about 50% off. For repeated context (RAG, agents) the cached rate, not the headline, is your real cost.
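A rough way to estimate the discounted bill is to split input into cached and fresh tokens. This sketch assumes the flat 90% cache discount and 50% batch discount quoted above, assumes the two discounts stack (not every provider allows that), and ignores cache-write surcharges and minimum cacheable prompt sizes that some providers apply. The `effective_cost` helper and the agent workload are hypothetical; the Claude Sonnet 4.6 rates ($3 / $15) come from the table.

```python
def effective_cost(input_m, output_m, usd_in, usd_out,
                   cached_share=0.0, cache_discount=0.90,
                   batch=False, batch_discount=0.50):
    """Monthly USD after assumed cache and batch discounts.

    input_m / output_m: millions of tokens per month.
    cached_share: fraction of input tokens served from the prompt cache (0..1).
    The discount defaults are the rough figures from the text, not any provider's exact terms.
    """
    if not 0.0 <= cached_share <= 1.0:
        raise ValueError("cached_share must be between 0 and 1")
    cached_in = input_m * cached_share
    fresh_in = input_m - cached_in
    cost = (fresh_in * usd_in
            + cached_in * usd_in * (1 - cache_discount)
            + output_m * usd_out)
    if batch:
        cost *= 1 - batch_discount  # assumes batch stacks on top of caching
    return cost


# Hypothetical agent: 200M input / 10M output per month on Claude Sonnet 4.6.
headline = effective_cost(200, 10, 3.00, 15.00)
cached = effective_cost(200, 10, 3.00, 15.00, cached_share=0.9)
print(f"headline ${headline:,.2f}, with 90% cache hits ${cached:,.2f}")
```

Under these assumptions, serving 90% of input from cache takes the estimate from $750 to $264 a month, roughly 35% of the headline figure.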
Context-length price tiers
Some models charge more once a prompt crosses a size threshold. Gemini 3.1 Pro doubles input over 200k tokens; Qwen and GPT-5.x step up past their base tier. Long-context workloads can cost several times the headline rate.
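A per-request sketch makes the step visible. It assumes the higher input rate applies to the whole prompt once the prompt exceeds the threshold, and that the output rate stays unchanged; real tier rules differ by provider (some also raise the output rate), so check the pricing page. The `request_cost` helper is hypothetical; the Gemini 3.1 Pro base rates ($2 / $12) come from the table, and the 200k threshold and 2x multiplier from the paragraph above.

```python
def request_cost(prompt_tokens: int, output_tokens: int, usd_in: float, usd_out: float,
                 threshold: int = 200_000, long_multiplier: float = 2.0) -> float:
    """USD for one request when the input rate steps up above a prompt-size threshold.

    Assumes the stepped-up rate covers the entire prompt and the output rate is unchanged.
    """
    rate_in = usd_in * long_multiplier if prompt_tokens > threshold else usd_in
    return (prompt_tokens * rate_in + output_tokens * usd_out) / 1_000_000


# Gemini 3.1 Pro base rates from the table: $2 in / $12 out per 1M tokens.
for prompt in (150_000, 250_000):
    tiered = request_cost(prompt, 2_000, 2.00, 12.00)
    flat = request_cost(prompt, 2_000, 2.00, 12.00, long_multiplier=1.0)
    print(f"{prompt:>7,} prompt tokens: ${tiered:.3f} tiered vs ${flat:.3f} at the headline rate")
```

Below the threshold both figures match ($0.324); at 250k prompt tokens the tiered estimate is $1.024 against $0.524 at the headline rate, nearly double.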
Open weights shift the cost, they do not remove it
An open model is cheap per token, but you own quality, evals, rate limits, and uptime. The cheapest row is not always the cheapest total once reliability and engineering time are counted.
Frequently asked questions
What is the cheapest LLM API in 2026?
For most workloads the cheapest options are open-weight models on commodity hosts: OpenAI gpt-oss-20B on OpenRouter (about $0.03 in / $0.14 out per 1M), Llama 3.1 8B on Groq ($0.05 / $0.08), and Qwen3 235B on DeepInfra ($0.071 / $0.10). The cheapest frontier-class option is usually DeepSeek V4-Pro on DeepSeek's own API ($0.44 / $0.87). The right answer depends on your input-to-output token mix, which is why this tool takes both.
How much does GPT-5.5 cost compared to Claude Opus 4.8?
As of June 2026, GPT-5.5 is $5 per 1M input and $30 per 1M output; Claude Opus 4.8 is $5 input and $25 output. Input costs the same, so Opus is cheaper by $5 per 1M output tokens at any mix: the gap is largest for output-heavy workloads and shrinks toward parity for input-heavy ones. Both are far above mid-tier models like GPT-5.4 ($2.50 / $15) or Gemini 3.5 Flash ($1.50 / $9).
Why does the same open model cost different amounts on different providers?
An open-weight model (Llama 4, DeepSeek, Qwen3, Kimi K2, gpt-oss) can be hosted by anyone, so each provider sets its own rate based on its hardware and margin. DeepSeek V4-Pro runs $0.44 / $0.87 first-party, $1.30 / $2.60 on DeepInfra, $1.74 / $3.48 on Fireworks, and $2.10 / $4.40 on Together. Pick the model first, then the cheapest host that meets your latency, context, and data-residency needs.
Are open-weight models cheaper than GPT-5 or Claude?
Almost always, per token. Open models like Llama 4 Scout, Qwen3 235B, and gpt-oss run roughly 10 to 200x cheaper than GPT-5.5 or Claude Opus 4.8 for the same volume, depending on the host. The trade-off is that you own quality, evals, rate limits, and uptime, so the cheapest row is not always the cheapest total once reliability and engineering time are counted.
Where is the cheapest place to run Llama 4 or DeepSeek?
For Llama 4 Scout, DeepInfra and OpenRouter are around $0.08 in / $0.30 out; Groq is fast at $0.11 / $0.34. For DeepSeek V4-Pro, the first-party API is cheapest ($0.44 / $0.87) and DeepInfra is the cheapest third-party host ($1.30 / $2.60). For V4-Flash, DeepInfra ($0.10 / $0.20) undercuts the first-party API ($0.14 / $0.28). Each host for a given model appears as its own row in the table above.
Do these prices include caching or batch discounts?
No, these are standard non-cached, on-demand rates. Most providers bill cached input at roughly 10% of the normal rate, and batch APIs run about 50% off. For repeated context (RAG, agents) your real cost can be well below the headline shown here.
How we calculate
Your monthly cost is your input tokens (in millions) times the model's input rate, plus your output tokens (in millions) times its output rate. Every rate is the provider's standard, on-demand, non-cached price per 1M tokens, taken from the provider's official pricing page and checked in June 2026. A coloured dot flags confidence (amber = sourced from a launch or blog page, or from a pricing page we could not fetch cleanly).
Open-weight models appear once per host because the same model genuinely costs different amounts on each. First-party rows are the model owner's API. Rates exclude cached-input and batch discounts (usually about 90% and 50% off, respectively), and exclude context-length surcharges that some models add past a size threshold. Affiliate relationships never change the ranking; rows are sorted purely by your computed cost.
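The calculation above fits in a few lines of Python. This is a sketch, not the site's code: the `monthly_cost` and `ranked` names are illustrative, only four table rows are included, and the 60M-input / 12M-output workload is an assumption back-calculated from the "Your $/mo" column (for example, 60 × $5 + 12 × $30 = $660 for GPT-5.5).

```python
# (model, provider, USD per 1M input, USD per 1M output), copied from the table above.
ROWS = [
    ("GPT-5.5", "OpenAI 1st-party", 5.00, 30.00),
    ("Claude Opus 4.8", "Anthropic 1st-party", 5.00, 25.00),
    ("Llama 3.1 8B", "Groq", 0.05, 0.08),
    ("gpt-oss-20B", "OpenRouter", 0.029, 0.14),
]


def monthly_cost(input_tokens: int, output_tokens: int, usd_in: float, usd_out: float) -> float:
    """Input tokens x input rate + output tokens x output rate, with rates quoted per 1M tokens."""
    return (input_tokens * usd_in + output_tokens * usd_out) / 1_000_000


def ranked(input_tokens: int, output_tokens: int):
    """Rows sorted cheapest-first by computed monthly cost."""
    rows = [(monthly_cost(input_tokens, output_tokens, i, o), m, p) for m, p, i, o in ROWS]
    return sorted(rows)


# Assumed workload: 60M input + 12M output tokens per month.
for cost, model, provider in ranked(60_000_000, 12_000_000):
    print(f"{model:<16} {provider:<20} ${cost:,.2f}")
```

The four costs it prints ($3.42, $3.96, $600.00, $660.00) match the corresponding rows of the table, which is a quick way to confirm the assumed workload.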