n8n LLM routing: send each agent request to the right model and stop overpaying

Point a Claude Fable 5 node at every request your n8n agent receives and you are paying $10 per million input tokens to answer questions a $1 model would have nailed (Anthropic models overview, June 2026). That is ten times the input cost of Claude Haiku 4.5 for the same tokens. If a big slice of your traffic is the boring stuff (is this email a refund request or a sales lead, does this ticket need a human, what category is this support message), you are buying a frontier reasoning model to do a job a fast classifier finishes in a fraction of the time and price.

We wired up a routing layer in n8n that fixes this, and the whole thing is four node types: a small-model classifier, a Switch, a model-specific call per branch, and an aggregator that puts the answer back on one rail. The build is short. The savings are not subtle. n8n shipped a routing post on June 10 that lays out the theory (n8n, June 10 2026); this is the part where you actually open the canvas and build it.

The cost problem, priced out

Here is the asymmetry that makes routing worth your afternoon. A frontier model is built to grind through long-horizon reasoning. Most agent requests are not that. They are short, structured, and decided in one pass. You are paying reasoning-model rates for classification-model work, on every request, all day.

The three Claude models that matter for a routing setup, with current per-million-token pricing pulled from Anthropic's docs:

Model	Input ($/MTok)	Output ($/MTok)	Best task type
Claude Haiku 4.5	$1	$5	Classification, routing, extraction, short structured answers
Claude Opus 4.8	$5	$25	Multi-step reasoning, agentic coding, anything that needs a real chain of thought
Claude Fable 5	$10	$50	Frontier reasoning: the hard migration, the long autonomous task, the one nobody else cracks

Pricing per Anthropic models overview, confirmed June 2026. Fable 5's $10 / $50 was set at its June 9 launch.

Read the input column top to bottom. Fable 5 costs ten times Haiku 4.5 on input and ten times on output. Opus 4.8 sits in the middle at five times Haiku's input. The job of a router is to keep Fable 5's meter off for everything that does not earn it.

What does Fable 5 actually earn? Anthropic's launch note is specific: Stripe reported it compressed months of engineering into days, finishing in one day a codebase-wide migration of a 50-million-line Ruby codebase that would have taken a team over two months by hand (Anthropic, June 9 2026). That is the kind of task that justifies the top tier. Sorting an inbox is not.

Example. Say your agent handles 100,000 requests a month and 60% of them are simple classification (roughly 1,500 input tokens and 200 output tokens each). Route all of it to Fable 5 and that classification slice alone runs about 90M input + 12M output tokens, which at $10/$50 is roughly $900 input + $600 output, or $1,500 a month. Send the same slice to Haiku 4.5 at $1/$5 and it is about $90 + $60, or $150. You keep Opus 4.8 and Fable 5 for the 40% that needs them. The math is illustrative, but the shape holds: the cheap tier is where most of your volume should live.
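The arithmetic in the example above can be checked in a few lines. This is a minimal sketch: the function name sliceCost and the object layout are illustrative, and the prices are the per-million-token figures from the table.

```javascript
// Prices in dollars per million tokens, copied from the pricing table.
const PRICES = {
  haiku: { input: 1, output: 5 },
  fable: { input: 10, output: 50 },
};

// Monthly cost of a traffic slice: requests x tokens per request x price per token.
function sliceCost(requests, inputTokens, outputTokens, price) {
  const inputCost = (requests * inputTokens / 1e6) * price.input;
  const outputCost = (requests * outputTokens / 1e6) * price.output;
  return { inputCost, outputCost, total: inputCost + outputCost };
}

// 60% of 100,000 requests, at 1,500 input and 200 output tokens each.
const onFable = sliceCost(60000, 1500, 200, PRICES.fable);
const onHaiku = sliceCost(60000, 1500, 200, PRICES.haiku);

console.log(onFable); // { inputCost: 900, outputCost: 600, total: 1500 }
console.log(onHaiku); // { inputCost: 90, outputCost: 60, total: 150 }
```

Swap in your own request counts and token sizes to see where your break-even sits.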

What the router has to do

n8n frames the LLM router as a control-plane component that sits between your workflow and the model backends, and it lists five jobs it has to cover (n8n, June 10 2026):

Request analysis. Classify the incoming request by type, complexity, or domain.
Request forwarding. Send it to the right model's endpoint.
Fallback handling. Detect a rate limit or a degraded response and reroute.
Response aggregation. Put outputs back on a single output rail so downstream nodes do not care which model answered.
Logging. Record which model handled what, at what cost.

NVIDIA's reference router describes the same split, with intent-based routing (a small model reads the query semantics) as the strategy most teams reach for first (NVIDIA LLM Router). You do not need their stack. n8n's built-in nodes cover every one of those five jobs, and you can have everything except fallback handling wired in an afternoon.

The flow you are building looks like this:

  incoming request
        |
        v
  [ Classifier ]   (Haiku 4.5: returns simple | complex | frontier)
        |
        v
  [  Switch  ]
    |    |    |
 simple complex frontier
    |    |    |
    v    v    v
[Haiku][Opus][Fable 5]
    |    |    |
    +----+----+
        |
        v
  [ Aggregator ]   (one output rail)

Step 1: build the classifier as a sub-workflow

Keep the classifier in its own sub-workflow. You want it isolated so you can swap the model, tweak the prompt, and reuse it across agents without touching the main flow. n8n's sub-workflow conversion turns a selection of nodes into a callable workflow, so this is a first-class pattern, not a hack (n8n docs).

Inside the sub-workflow, a single Haiku 4.5 call does the classifying. The trick is forcing structured output so the Switch downstream has something clean to read. Ask for one word, not a paragraph.

You are a request router. Read the user request and reply with EXACTLY ONE word:
- "simple" for classification, extraction, short factual answers, yes/no decisions
- "complex" for multi-step reasoning, code changes, anything needing a chain of thought
- "frontier" for long-horizon autonomous work or a task that has failed on complex

Request: {{ $json.userRequest }}
Reply with one word only.

Set the Haiku node's max output tokens low (5 is plenty) so a chatty model cannot wander. Then add a small Code node right after to normalize the answer, because models occasionally add punctuation or a stray capital:

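// n8n Code node (JavaScript): normalize the classifier output
// The response field name depends on the model node, so check both shapes.
const raw = ($json.text || $json.message?.content || '').toString();
// Lowercase and strip everything that is not a letter ("Simple." -> "simple").
const route = raw.toLowerCase().replace(/[^a-z]/g, '');
const allowed = ['simple', 'complex', 'frontier'];
// Anything unrecognized (including an empty reply) falls back to "complex".
return [{ json: { route: allowed.includes(route) ? route : 'complex' } }];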

The catch most people miss: default the unknown case to complex, not simple. If your classifier hiccups, you want it failing toward a capable model, not silently downgrading a hard request to Haiku and shipping a wrong answer. A slightly-too-expensive call beats a confidently wrong one.
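To see the fail-toward-capable default in action outside n8n, here is a small standalone sketch of the same normalization logic. The function name normalizeRoute is hypothetical, and the sample inputs are made up to illustrate the edge cases.

```javascript
// Standalone version of the normalizer, so it can be checked outside n8n.
const ALLOWED = ['simple', 'complex', 'frontier'];

function normalizeRoute(raw) {
  // Lowercase, then drop every character that is not a letter.
  const route = String(raw ?? '').toLowerCase().replace(/[^a-z]/g, '');
  // Unknown answers fail toward the capable tier, never toward the cheap one.
  return ALLOWED.includes(route) ? route : 'complex';
}

console.log(normalizeRoute('Simple.'));           // simple
console.log(normalizeRoute(' FRONTIER\n'));       // frontier
console.log(normalizeRoute(''));                  // complex
console.log(normalizeRoute('simple or complex')); // complex
```

The last case is the one to notice: a hedging answer collapses to a string that matches nothing, so it lands on complex rather than being guessed at.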

Step 2: the Switch node

Back in the main workflow, drop a Switch node after the classifier sub-workflow. Set it to Rules mode, which lets you define one output branch per condition rather than evaluating a single expression (n8n Switch docs). Three rules, three outputs:

Rule 1: {{ $json.route }} equals simple → output 0
Rule 2: {{ $json.route }} equals complex → output 1
Rule 3: {{ $json.route }} equals frontier → output 2

Turn on the Switch's fallback output and send it to the complex branch too. Same principle as the classifier default: an unmatched route should land on a model that can handle it, not fall off the end of the workflow into a silent failure.
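If it helps to reason about the Switch as code, the three rules plus the fallback amount to a lookup with a default. The sketch below is plain JavaScript, not something you paste into n8n; the names OUTPUT_FOR_ROUTE and switchOutput are illustrative, and it models the fallback as landing on the same destination as output 1.

```javascript
// Output indexes mirror the three Switch rules.
const OUTPUT_FOR_ROUTE = { simple: 0, complex: 1, frontier: 2 };

// The fallback is wired to the same destination as the complex branch.
const FALLBACK_OUTPUT = OUTPUT_FOR_ROUTE.complex;

function switchOutput(route) {
  // Object.hasOwn ignores inherited keys such as "toString".
  return Object.hasOwn(OUTPUT_FOR_ROUTE, route)
    ? OUTPUT_FOR_ROUTE[route]
    : FALLBACK_OUTPUT;
}

console.log(switchOutput('simple'));   // 0
console.log(switchOutput('frontier')); // 2
console.log(switchOutput('urgent'));   // 1
```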

Step 3: the model branches

Each Switch output feeds one model node. In n8n's AI layer these are the chat-model nodes you attach to an Agent or call directly; each one carries its own credential and model ID (n8n advanced AI docs). Three branches, three models:

Switch output 0 (simple)   → Claude Haiku 4.5 node   (model: claude-haiku-4-5)
Switch output 1 (complex)  → Claude Opus 4.8 node    (model: claude-opus-4-8)
Switch output 2 (frontier) → Claude Fable 5 node     (model: claude-fable-5)

Model IDs follow Anthropic's published aliases (models overview). The user's actual request (not the one-word route) flows into each model node. A clean way to carry it is to pass both fields through the classifier sub-workflow so the original prompt is sitting in {{ $json.userRequest }} when the model node fires.

Here is the shape of a direct Anthropic call if you would rather use an HTTP Request node than the built-in chat-model node. Going through HTTP puts max_tokens in plain sight on every branch:

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: <YOUR_ANTHROPIC_API_KEY>" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-haiku-4-5",
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "<USER_REQUEST>"}]
  }'

Swap model per branch. The cheap branch gets a tight max_tokens; the frontier branch gets room to think. That single field is a real cost lever, because output tokens on Fable 5 run $50 per million (models overview) and an unbounded reasoning model will use them.
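One way to keep the per-branch settings in a single place is a small lookup that builds the request body for each route. This is a sketch under assumptions: the model IDs and output prices are the ones quoted in this article, the max_tokens values for the complex and frontier branches are placeholders to tune rather than recommendations, and the names BRANCHES, buildRequestBody and maxOutputCost are hypothetical.

```javascript
// Per-branch settings. The max_tokens values are illustrative assumptions.
const BRANCHES = {
  simple:   { model: 'claude-haiku-4-5', maxTokens: 512,   outputPricePerMTok: 5 },
  complex:  { model: 'claude-opus-4-8',  maxTokens: 4096,  outputPricePerMTok: 25 },
  frontier: { model: 'claude-fable-5',   maxTokens: 16384, outputPricePerMTok: 50 },
};

// Unknown routes fail toward the complex branch.
function branchFor(route) {
  return Object.hasOwn(BRANCHES, route) ? BRANCHES[route] : BRANCHES.complex;
}

// Build the JSON body for a Messages API call on the given branch.
function buildRequestBody(route, userRequest) {
  const branch = branchFor(route);
  return {
    model: branch.model,
    max_tokens: branch.maxTokens,
    messages: [{ role: 'user', content: userRequest }],
  };
}

// Worst-case output spend (in dollars) if one call uses its whole budget.
function maxOutputCost(route) {
  const branch = branchFor(route);
  return (branch.maxTokens / 1e6) * branch.outputPricePerMTok;
}

console.log(buildRequestBody('simple', 'Is this email a refund request?'));
console.log(maxOutputCost('frontier').toFixed(4)); // 0.8192
```

With these placeholder budgets, a single frontier call can spend up to about 82 cents on output, while a simple call tops out around a quarter of a cent. That gap is why the cap belongs on every branch.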

Step 4: the aggregator

Three branches walk in; one rail walks out. After the model nodes, merge back to a single path so everything downstream (your response node, your logging, your CRM write) sees one consistent shape regardless of which model answered. A Code node on each branch normalizes the response field first:

// n8n Code node on each model branch: one output shape
return [{
  json: {
    answer: $json.content?.[0]?.text ?? $json.text ?? '',
    model: $json.model ?? 'unknown',
    route: $('Switch').item.json.route,
  }
}];

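Run a Merge node (or just let the three branches converge) and you have most of the router's logging job covered for free: every record carries the model and route that produced it, so you can see your traffic split and confirm the cheap tier is actually doing the heavy lifting. To log cost as well, copy the token counts from the response's usage field into the same record, if your model node exposes them.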
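Once records with a route field are flowing, checking the traffic split is a short tally. The sketch below assumes the record shape produced by the normalizing node above; the function name trafficSplit and the sample records are hypothetical.

```javascript
// Tally aggregated records by route and report each route's share of traffic.
function trafficSplit(records) {
  const counts = { simple: 0, complex: 0, frontier: 0 };
  for (const record of records) {
    // Ignore records whose route is not one of the three known values.
    if (Object.hasOwn(counts, record.route)) counts[record.route] += 1;
  }
  const total = records.length || 1; // avoid dividing by zero on an empty log
  return {
    counts,
    share: {
      simple: counts.simple / total,
      complex: counts.complex / total,
      frontier: counts.frontier / total,
    },
  };
}

// Made-up sample of aggregated records.
const sample = [
  { route: 'simple', model: 'claude-haiku-4-5' },
  { route: 'simple', model: 'claude-haiku-4-5' },
  { route: 'simple', model: 'claude-haiku-4-5' },
  { route: 'complex', model: 'claude-opus-4-8' },
];

console.log(trafficSplit(sample).share); // { simple: 0.75, complex: 0.25, frontier: 0 }
```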

Tested 2026-06-20 against n8n 1.107 and the Anthropic Messages API. The node names and the content[0].text response shape match n8n's advanced-AI docs and Anthropic's current Messages response format.

The fallback pattern: when the primary model errors

Routing saves money. Fallback is what keeps the agent up when a provider rate-limits you at the worst moment. n8n calls this out as a core router job, and it is the one people skip until it bites (n8n, June 10 2026).

The clean version uses n8n's error handling on the model node. On each node, open Settings and set On Error to Continue (using error output). That gives the node a second output that only fires on failure. Wire that error output into a fallback model node:

  Opus 4.8 node --(success)------> Aggregator
        |
        +--(error output)--> Haiku 4.5 fallback --> Aggregator

A 429 (rate limit) or a 529 (overloaded) from one model then reroutes to a different one instead of killing the run. Two things make this pattern hold up:

Fall sideways, not just down. If Opus rate-limits, a Haiku fallback returns something fast, which is usually better than a dead workflow. But for a request you classified as complex, consider falling over to a different provider at a similar tier rather than dropping down to Haiku, so you do not silently degrade the answer.
Cap the retries. Set a sane retry count on the node (2 is fine) before the error output fires, so a transient blip self-heals without burning a whole fallback chain.
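The control flow behind both points (retry a capped number of times, then hand off to a fallback) can be sketched in plain JavaScript. This is an illustration, not n8n's implementation: callWithFallback is a hypothetical helper, maxTries counts total attempts including the first, and the calls are synchronous stand-ins for what would be async model requests.

```javascript
// Try the primary call up to maxTries times, then hand off to the fallback.
// Synchronous for brevity; real model calls would be async.
function callWithFallback(primary, fallback, maxTries = 2) {
  let lastError = null;
  for (let attempt = 1; attempt <= maxTries; attempt++) {
    try {
      return { answer: primary(), usedFallback: false, attempts: attempt };
    } catch (err) {
      lastError = err; // keep the most recent failure for logging
    }
  }
  return {
    answer: fallback(),
    usedFallback: true,
    attempts: maxTries,
    primaryError: lastError ? lastError.message : null,
  };
}

// A transient blip: fails once, then succeeds on the retry.
let calls = 0;
const flakyOpus = () => {
  calls += 1;
  if (calls < 2) throw new Error('429 rate limited');
  return 'opus answer';
};

// A sustained outage: every attempt fails.
const overloadedOpus = () => { throw new Error('529 overloaded'); };
const haikuFallback = () => 'haiku answer';

const recovered = callWithFallback(flakyOpus, haikuFallback);
const rerouted = callWithFallback(overloadedOpus, haikuFallback);

console.log(recovered.answer, recovered.usedFallback); // opus answer false
console.log(rerouted.answer, rerouted.usedFallback);   // haiku answer true
```

Recording usedFallback on the result makes a degraded answer visible downstream instead of letting it pass as the primary model's output.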

That is the difference between an agent that shrugs off a provider hiccup and one that pages you at 11pm.

n8n vs Make for this build

n8n's routing story is stronger for this specific job: the Switch node, native sub-workflows, and per-node error outputs mean the whole router lives in one self-hostable canvas where every model ID is in plain sight. Make does conditional routing too, and its router module plus error handlers will get you there if you already live in Make. The tilt: n8n for builders who want self-hosting and code-level control, Make if you want the most polished hosted UI and are happy on someone else's infrastructure.

Optional: feed the router better task data

The classifier is only as good as the signal it reads. If your requests arrive as URLs or scraped pages instead of clean text, a structured extraction step in front of the classifier pays off, because a tidy summary classifies more accurately than raw HTML. A scraping API like Firecrawl can hand the router a clean markdown summary to route on, which keeps your cheap classifier cheap and its decisions sharper. Skip this if your inputs are already plain text.

Which setup to build, by who you are

The router pattern is the same for everyone. How far you take it depends on volume and where you run n8n.

Solo dev on the n8n free / self-hosted tier. Build the three-model Switch and stop there. Skip the fallback chain at first; add it the day a rate limit actually bites you. Your win is the cheap-tier split, and on the free self-hosted n8n the only cost you are optimizing is the model bill, which routing cuts immediately. Run the classifier on Haiku, route the long tail to Opus, and reserve Fable 5 for the handful of requests that genuinely need a frontier model.

Small team on n8n Cloud. Add the fallback pattern and the logging aggregator from day one, because a team workflow that silently fails is a support ticket you will not see coming. Use the model and route fields on every aggregated record to watch your traffic split weekly. If more than a third of your volume is hitting Opus or Fable 5, your classifier prompt is too cautious; tighten the simple definition and push more to Haiku.

Enterprise on self-hosted n8n. Treat the classifier as a real component with its own tests and version history, not an afterthought. Route sensitive-data requests to a local or in-VPC model on principle, which n8n's self-hosting makes straightforward and which turns routing from a cost optimization into a compliance control (n8n, June 10 2026). Add provider-level fallback (a second vendor at the same tier), log every route to your own store, and review the split monthly against your bill. At enterprise volume the cheap-tier split is not a nice-to-have; it is the line item that funds the next quarter.

The through-line for all three: stop sending classification work to a reasoning model. Build the four nodes, default your unknowns toward capable models, and let the cheap tier carry the volume it was built for. You can spin up the router on a free n8n instance today and watch the model bill drop on the first day the cheap branch starts doing its job.