Table of Contents
Claude Code Adds /recap and 1-Hour Prompt Caching: Two Updates That Pay Off in Long Sessions
Claude Code v2.1.108 adds /recap and ENABLE_PROMPT_CACHING_1H. Neither matters if you use Claude Code as autocomplete. Both matter if you run it as a teammate on multi-hour tasks, and the second one is a cost lever, not a convenience. The argument of this post: turn on the 1-hour cache the moment a session reloads the same large files across more than five minutes of wall-clock, and ignore both features entirely below that. The default 5-minute TTL is tuned for hot loops, and most operators are not in a hot loop; they just have not noticed the mismatch.
What changed
Anthropic shipped a maintenance-grade Claude Code release that is more interesting than the version number lets on. Two features matter to anyone who runs Claude Code past a quick one-shot edit:
1. The /recap command and session recap feature. Come back to a session after lunch, a meeting, or a context overflow and you used to re-read your own scrollback to find where Claude was in a multi-step task. The new recap gives you a structured summary of what was happening when you stepped away. Configure it via /config, fire it on demand with /recap. Telemetry-disabled environments force the away-summary behavior with the CLAUDE_CODE_ENABLE_AWAY_SUMMARY env var.
2. Granular prompt caching controls via env vars. ENABLE_PROMPT_CACHING_1H opts your session into a 1-hour cache TTL on Anthropic's API, Bedrock, Vertex, and Foundry. FORCE_PROMPT_CACHING_5M does the inverse, pinning the 5-minute TTL when that is what you want. The defaults are unchanged. You just get to pick which one applies to your workflow now.
A few other items shipped in the same window. The Skill tool can invoke built-in slash commands like /init, /review, and /security-review. PowerShell support replaces the Git Bash hard requirement on Windows. And MCP tool output limits jumped to 500K characters in a separate upgrade.
Why it matters
Treat Claude Code as fancy autocomplete and none of this moves the needle. Run it as a junior teammate on a multi-hour task, which is increasingly the right way to use it, and both features pay off on day one.
/recap fixes a real coordination problem. Long sessions pile up state: files touched, decisions made, tests run, follow-ups deferred. Pause and come back, and the agent's "memory" is only as good as your willingness to scroll. A clean recap collapses the re-onboarding tax to a few seconds. It matters most for ops engineers and consultants bouncing between client codebases. You stand up an old session cold, no rereading.
The 1-hour cache is the one with money attached. The mechanism is worth being precise about, because the savings depend on it. Prompt caching bills a cache write once, then bills cache reads at a steep discount for every call that reuses the prefix while the entry is alive. The TTL is what decides whether your next call is a cheap read or a full-price re-write. With the 5-minute default, a refactor where you think for six minutes, then ask a follow-up, just paid the write penalty twice on an unchanged 30k-token prefix. ENABLE_PROMPT_CACHING_1H keeps that prefix in the read tier across the gap. AWS's Bedrock prompt-caching guidance documents up to a 90% input-token cost reduction on workloads that repeatedly resend a stable prefix. That figure is real only when the TTL outlives the gap between reuses, which is exactly the case the default TTL gets wrong for long engineering sessions.
How to use it
/recap
Inside any session, type /recap for a summary of the conversation so far. To control when it fires automatically, run /config and find the recap settings under session behavior.
Disabled telemetry but still want the away-summary feature? Set:
export CLAUDE_CODE_ENABLE_AWAY_SUMMARY=1
Resume with claude in the same directory. The recap fires before you type your first prompt.
1-hour prompt caching
One environment variable. Simplest setup, before launching claude:
export ENABLE_PROMPT_CACHING_1H=1
claude
This applies to all four backends Claude Code supports: Anthropic API key, Amazon Bedrock, Google Vertex, Microsoft Foundry. Routing through a proxy like LiteLLM? Check the proxy's caching config too. The env var only touches the Claude Code client side.
Want the opposite, a session pinned to the 5-minute TTL no matter the model defaults?
export FORCE_PROMPT_CACHING_5M=1
claude
Rule of thumb: turn on the 1-hour cache when the same large files (lockfiles, schemas, long config files) get read over and over across an extended window. Keep the default 5-minute cache for short sessions where you are moving fast and rotating contexts.
Pricing note
Prompt caching pricing follows Anthropic's published cache-write and cache-read rates. The 1-hour TTL does not change the per-token math. It changes how often you dodge the read penalty. Cost only climbs if you cache prefixes you never read again, and that is rare in real engineering work.
Related tools on Pondero
- Claude Code's
ultrareviewcommand is the other April 2026 release worth knowing about. - Claude Code vs Cursor explains how the two stack up for daily coding.
- Cursor 3.2 multitask and canvases covers the agent-runtime pivot from the other side of the market.
- GitHub Copilot review is for teams who want their AI assistant tied to the PR workflow.
This post is part of Pondero's daily coverage of AI tool updates. See all guides