Claude Opus 4.8 lands with sharper code-flaw flagging and effort control

Anthropic shipped Claude Opus 4.8 on May 28, 2026 (per Anthropic's announcement). The model id is claude-opus-4-8 and it is available on day one through the Claude API (per Anthropic).

The headline change for coding agents

Anthropic's lead claim, and the one that matters most to developer-tool builders: Opus 4.8 is around four times less likely than its predecessor to let code flaws pass unremarked (per Anthropic). The model is also tuned to flag uncertainties more readily and to avoid unsupported claims in agentic loops (per Anthropic).

For autonomous workflows, Anthropic cites four benchmark headlines. Opus 4.8 is the only model to complete every case end-to-end on the Super-Agent benchmark (per Anthropic). It is the first model to break ten percent on the Legal Agent Benchmark all-pass standard (per Anthropic). It scores 84 percent on Online-Mind2Web (per Anthropic). And it exceeded every prior Opus model at every effort level on CursorBench (per Anthropic).

What is new in the product surfaces

Three rollouts ship alongside the model.

Dynamic workflows comes to Claude Code Enterprise, Team, and Max users. The agent picks its own plan instead of running a fixed sequence (per Anthropic).

Effort control lands for claude.ai and Cowork users. It is the knob that lets a user trade reasoning depth for latency and cost on a per-request basis (per Anthropic).

Messages API system entries. The Messages API now accepts system entries inside the messages array, which simplifies long agent loops where the system prompt evolves over multi-step tool calls (per Anthropic).

What stayed the same

Pricing did not change with the version bump. Opus 4.8 remains at five dollars per million input tokens and twenty-five dollars per million output tokens for regular mode, and ten dollars per million input tokens and fifty dollars per million output tokens for fast mode (per Anthropic). The Anthropic announcement does not call out a new context window size or a deprecation of older Opus releases (per Anthropic).

What this means for Pondero readers building on Claude

If you ship code review or agentic coding flows, the four-times-fewer-unflagged-flaws claim is the upgrade to actually evaluate against your own eval set, not just to trust on the announcement page. Run your hardest test cases on both Opus 4.7 and Opus 4.8 and check whether the flagged-flaw rate matches Anthropic's number on your domain.

For builders running Claude Code in production, dynamic workflows is the structural change to watch. A model that picks its own plan is harder to test and harder to interrupt cleanly than one that runs a fixed sequence, so the rollout to Enterprise, Team, and Max users will reveal whether the autonomy is worth the loss of determinism (per Anthropic).

For everyone else paying for Claude in claude.ai or via Cowork, effort control gives a knob that has been API-only until now. Lower the effort on routine prompts to save spend, and raise it on the prompts where you actually need depth (per Anthropic).