LangGraph in production: 3 patterns we would actually ship

Drafted May 6, 2026 by Pondero Editorial

In short

LangGraph is the state-graph runtime under most non-trivial agents people are putting into production this year. As of our May 2026 review, the repo sits at over 30k stars with active weekly commits (github.com/langchain-ai/langgraph), the Python package and TypeScript package ship in lockstep, and the license is MIT (LICENSE) with no commercial-use caveat. The vendor docs cover the primitives well; what they cover less well is which combinations of those primitives survive contact with a real codebase. Below are three patterns we would build on day one, the code shape for each, and the failure modes that surface only after week three.

Why “state graph” is the right abstraction for agents (and when it isn’t)

Most agent failures we see in production are not model failures. They are state-management failures. The agent looped because nothing told it where it had been. The agent forgot a tool result because the conversation history truncated. The agent restarted from scratch on a transient error because the framework had no concept of “resume from step N.”

LangGraph reframes the agent as a directed graph of nodes and edges over a typed state object. Nodes are functions that read state and return updates. Edges are routing decisions. The runtime owns persistence, interrupts, and replay. This is the same shape a workflow engine has, with one difference: edges can be chosen by an LLM rather than hard-coded. Once that lands, three properties fall out: every step is an inspectable event, every state is a checkpoint you can resume from, and every routing decision is a separate place you can swap models or add guardrails.

The abstraction is wrong when your task is fundamentally one-shot prompt-and-respond, when latency budget is sub-second, or when you do not need persistence at all. For a chat completion behind a Slack slash command, raw LangChain or a direct SDK call is simpler and faster. LangGraph pays for itself when an agent runs longer than a single request, branches, retries, or pauses for human input.

Pattern 1: Human-in-the-loop approval graph

The agent drafts an action. The graph pauses. A human approves or rejects. The graph resumes from where it stopped, with the human decision written into state. This is the pattern teams reach for first because it is the smallest concession that turns a “demo” agent into one ops will actually deploy.

LangGraph supports this directly via the interrupt primitive (docs.langchain.com / human-in-the-loop concepts). You build a normal graph, mark a node as an interrupt point, and the runtime pauses execution, persists state to a checkpointer, and returns control to the caller. The caller surfaces the pending action to a human. When approval comes back, the caller resumes the graph with the approved (or modified) input.

# Illustrative: human-in-the-loop approval graph
# Adapted from langchain-ai.github.io/langgraph/concepts/human_in_the_loop/

from typing import TypedDict, Literal
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import interrupt, Command

class AgentState(TypedDict):
    draft: str
    approved: bool
    final: str

def draft_node(state: AgentState) -> dict:
    # Replace with your own model call.
    return {"draft": "Send refund of $42 to user 8471"}

def approval_node(state: AgentState) -> dict:
    decision = interrupt({"draft": state["draft"]})
    return {"approved": decision["approved"], "draft": decision.get("edited", state["draft"])}

def execute_node(state: AgentState) -> dict:
    if not state["approved"]:
        return {"final": "rejected"}
    # Real side-effect goes here (refund API, email send, etc.).
    return {"final": f"executed: {state['draft']}"}

graph = (
    StateGraph(AgentState)
    .add_node("draft", draft_node)
    .add_node("approval", approval_node)
    .add_node("execute", execute_node)
    .add_edge(START, "draft")
    .add_edge("draft", "approval")
    .add_edge("approval", "execute")
    .add_edge("execute", END)
    .compile(checkpointer=MemorySaver())
)

Resuming after a human decision is one call:

# Illustrative: resume after the interrupt
# Adapted from langgraph human-in-the-loop docs.

config = {"configurable": {"thread_id": "user-8471-refund"}}
graph.invoke({"draft": "", "approved": False, "final": ""}, config=config)
# ... agent pauses at approval_node; surface state to a reviewer ...
graph.invoke(Command(resume={"approved": True}), config=config)

A LangSmith trace (LangSmith tracing for LangGraph) shows the run as three discrete spans: draft, approval (with status interrupted), and execute, each with input and output payloads. That trace is what makes this pattern operable at all. Without it, “the agent is paused” is a black box.

# Illustrative LangSmith trace shape (text representation of the UI)
# Source: docs.smith.langchain.com/observability/how_to_guides/trace_with_langgraph

run: refund-flow / thread-id user-8471-refund
  span draft           ok       180 ms   input {} -> output {draft: "Send refund of $42 ..."}
  span approval        paused     -      interrupt payload {draft: "Send refund of $42 ..."}
  -- human approves via reviewer UI --
  span approval        ok        2 ms   resume {approved: true} -> output {approved: true, draft: ...}
  span execute         ok       240 ms   input {approved: true, draft: ...} -> output {final: "executed: ..."}

Failure modes to plan for. Stale interrupt state: the human takes 48 hours to respond, the model that would have continued has shipped a new revision, prompts drift. Mitigate by pinning model + prompt version in state. Double-resume: two reviewers click approve at the same time and the side-effecting node replays twice. Mitigate with idempotency keys on the executor and a state field for “already executed at.” Lost thread id: the caller does not persist the thread_id and cannot resume; treat the thread id as durable data, not session data.

Pattern 2: Map-reduce parallel sub-agents

You want one agent to fan out to many specialised sub-agents and a synthesiser to roll up their answers. Research-style workloads are the obvious case: “given this brief, ask one sub-agent per source, then summarise.” Code-review is another: one sub-agent per file, one synthesiser for the PR comment.

LangGraph ships the Send API and subgraph composition for this. Send lets a node return a list of work items, and the runtime executes them in parallel before continuing.

# Illustrative: Send + Subgraph map-reduce
# Adapted from langchain-ai.github.io/langgraph/how-tos/map-reduce/

from typing import TypedDict, Annotated
from operator import add
from langgraph.graph import StateGraph, START, END
from langgraph.types import Send

class State(TypedDict):
    topic: str
    sources: list[str]
    findings: Annotated[list[str], add]
    summary: str

def fanout(state: State) -> list[Send]:
    return [Send("research_one", {"topic": state["topic"], "source": s}) for s in state["sources"]]

def research_one(state: dict) -> dict:
    # Replace with your own model + retrieval call.
    return {"findings": [f"finding from {state['source']} on {state['topic']}"]}

def synthesise(state: State) -> dict:
    return {"summary": " | ".join(state["findings"])}

graph = (
    StateGraph(State)
    .add_node("research_one", research_one)
    .add_node("synthesise", synthesise)
    .add_conditional_edges(START, fanout, ["research_one"])
    .add_edge("research_one", "synthesise")
    .add_edge("synthesise", END)
    .compile()
)

The Annotated[list[str], add] reducer is doing the load-bearing work. It tells the runtime how to merge state updates from parallel branches. Without a reducer, parallel writes to the same key clobber each other and you lose findings silently.

Failure modes to plan for. Token bloat: 30 sub-agents each return a 4k-token finding, the synthesiser hits the context window and truncates. Cap per-branch output length and use a hierarchical reducer. Partial failure: one of 30 branches errors and the whole graph fails closed by default. Wrap the branch node with a try/except and emit a failed: True finding so the synthesiser can note it. Cost surprise: parallel fan-out is the easiest way to 10x a daily LLM bill. Set a per-graph token budget and surface it in tracing.

# AI-generated: cost-projection helper for a Send fan-out
# Mark per HIL-386 §6.5: writer-original, model-assisted, math reviewed.

def project_fanout_cost(num_branches: int, in_tok: int, out_tok: int,
                         in_per_m: float, out_per_m: float) -> dict:
    branch = (in_tok / 1_000_000) * in_per_m + (out_tok / 1_000_000) * out_per_m
    total = branch * num_branches
    return {"per_branch_usd": round(branch, 4),
            "fanout_total_usd": round(total, 4),
            "branches": num_branches}

# Sonnet 4.7 at $3/M in, $15/M out; 4k in, 1k out per branch; 30 branches:
# project_fanout_cost(30, 4000, 1000, 3.0, 15.0)
# -> {'per_branch_usd': 0.027, 'fanout_total_usd': 0.81, 'branches': 30}

The number above is the kind of estimate worth pinning to the synthesiser’s run config so a human sees it before the fan-out fires.

Pattern 3: Long-running checkpointed task graph

This is the highest-value pattern in the article and the one most teams skip because the demo without it looks fine. The pattern is straightforward: a graph that may run for an hour or more, persisting state after every node, so a process restart, a deploy, or a transient failure does not lose the work that already happened.

LangGraph supports this via checkpointers. The vendor ships MemorySaver, SqliteSaver, and PostgresSaver; production uses Postgres for the obvious reasons (durability, multi-process readers, observability via SQL).

# Illustrative: PostgresSaver checkpointed graph that survives restart
# Adapted from langchain-ai.github.io/langgraph/how-tos/persistence/

from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.postgres import PostgresSaver

class BatchState(TypedDict):
    cursor: int
    total: int
    processed: list[int]

def step(state: BatchState) -> dict:
    next_idx = state["cursor"]
    # Replace with your own per-item work; this can be slow and idempotent.
    return {"cursor": next_idx + 1, "processed": state["processed"] + [next_idx]}

def is_done(state: BatchState) -> str:
    return END if state["cursor"] >= state["total"] else "step"

POSTGRES_URL = "postgresql://user:pass@localhost:5432/agents"

with PostgresSaver.from_conn_string(POSTGRES_URL) as saver:
    saver.setup()  # one-time table creation; idempotent
    graph = (
        StateGraph(BatchState)
        .add_node("step", step)
        .add_edge(START, "step")
        .add_conditional_edges("step", is_done, {"step": "step", END: END})
        .compile(checkpointer=saver)
    )
    config = {"configurable": {"thread_id": "batch-2026-05-06"}}
    graph.invoke({"cursor": 0, "total": 30, "processed": []}, config=config)
    # If this process is killed mid-run, re-invoking with the same thread_id
    # resumes from the last checkpointed cursor, not from zero.

The Option-2 fallback for teams that do not want a Postgres dependency is SqliteSaver. The API is identical; the durability story is weaker (single-writer, file-on-disk) but for a single-process worker it is a defensible default.

Failure modes to plan for. Schema drift: you change the BatchState shape and old checkpoints no longer deserialise. Treat state as a versioned schema; gate older versions behind a migration step or a “drain old, then deploy new” rollout. Replay-divergence: a node has a non-deterministic side effect (a random id, a datetime.now() call) and replay produces different state than the original run. Make node functions deterministic relative to state, or persist the non-deterministic value into state on first write. Checkpoint thrash: a graph with thousands of nodes per run writes thousands of rows; index thread_id and prune completed threads.

# AI-generated: edge-case prompt set for Pattern 1 reviewer testing
# Mark per HIL-386 §6.5: writer-original, intended as a manual QA seed.

1. Reviewer approves a draft that contains a typo. Expected: agent executes the typoed draft as-is, because approval is binary unless the reviewer also passes `edited`.
2. Reviewer rejects with no comment. Expected: state.approved == false, execute_node returns "rejected", no side effect fires.
3. Reviewer edits the draft and approves. Expected: state.draft is replaced before execute_node runs.
4. Reviewer takes 72 hours; the underlying model has been bumped from Sonnet 4.6 to 4.7. Expected: prompt + model versions pinned in state, no behaviour drift.
5. Two reviewers approve concurrently. Expected: idempotency key in execute_node prevents double side-effect.

A TypeScript equivalent of the same shape exists in @langchain/langgraph, which mirrors the Python API closely enough that porting reads more like translation than rewrite.

When to reach for LangGraph vs CrewAI vs raw LangChain

LangGraph is the right pick when control flow matters: branching, loops, interrupts, persistence, replay. The cost is a steeper concept curve (state, nodes, edges, reducers, checkpointers) and a heavier dependency.

CrewAI (Apache-2.0) is the right pick when the work is naturally a small group of specialised role-agents talking to each other and a manager-agent coordinating. Less to wire, less to learn. The trade-off is that you give up explicit control of the routing graph; the framework decides who talks to whom based on role and goal.

Raw LangChain is the right pick when the work fits in a single chain or a small RunnablePassthrough composition and there is no persistence requirement. Anything more complex tends to grow into a hand-rolled state machine, at which point you have re-implemented LangGraph badly.

Reach for LangGraph when the agent will live longer than one HTTP request. Below that threshold the simpler tools win on operability.

FAQ

Do we need LangSmith to run LangGraph in production? Strictly no. LangGraph runs without any tracing. Practically, LangSmith pays for itself the first time an agent fails in a way that is invisible from logs alone, which on our experience is week one. The integration is one env var and one decorator (LangSmith tracing docs).

Is LangGraph the same thing as LangFlow? No. LangFlow is a visual builder; LangGraph is a runtime. LangFlow can target LangGraph as an execution backend, but you can run LangGraph from Python or TypeScript directly with no LangFlow involved.

What does the license actually permit? LangGraph is MIT (repo LICENSE). Commercial use, modification, distribution, and private use are all permitted. The only obligation is to preserve the copyright notice and license text in copies. There is no commercial-use restriction, no AGPL contagion, and no separate enterprise tier needed for the runtime itself.

How fast is LangGraph moving? The LangChain releases page ships roughly weekly minor updates to the core ecosystem; LangGraph follows a similar cadence on its own repo. Pin a version in production and update on a deliberate schedule rather than tracking main.

What is the smallest production-shaped agent we would write with LangGraph? Pattern 1 (human-in-the-loop) with a MemorySaver checkpointer in tests and PostgresSaver in production. Three nodes, one interrupt, one approve-execute path. The other two patterns are extensions of that skeleton.

Verdict

LangGraph is not the fastest way to get an agent demo running. It is the fastest way to get an agent that survives a deploy, a reviewer pause, or a partial failure. For one-shot completions stay with raw LangChain. For role-agent crews try CrewAI first. For anything that needs to pause, branch, retry, or resume, the state-graph model is what you want, and the three patterns above are where we would start.

For a broader framework comparison, see our companion piece on choosing a multi-agent framework in 2026. For the visual-builder side of the same ecosystem, see Dify vs LangFlow vs Flowise.

Try LangSmith for tracing the patterns above; the free tier covers the volume of a single-developer sandbox.