Checkpoint, Validation, and Error Recovery
Automatic checkpointing
The previous page covered verification loops and validation gates as patterns. Now for the infrastructure that makes error recovery practical: automatic checkpoints, rewind capabilities, and generator-critic patterns that catch mistakes before they spread.
Claude Code creates checkpoints automatically. Every user prompt that results in a file edit triggers a checkpoint capturing the state before that edit. No explicit commands or configuration are required. The checkpointing system tracks all file modifications made through Claude's editing tools: Write, Edit, and NotebookEdit.
Checkpoints persist across sessions. Stored in SQLite within the `~/.claude` directory, they survive terminal closure and system restarts. The default retention period is 30 days, configurable via the `cleanupPeriodDays` setting. This creates a rolling safety net that covers recent work without unbounded storage growth.
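For example, a longer retention window could be set in the settings file (assuming `~/.claude/settings.json` and the `cleanupPeriodDays` key named above):

```json
{
  "cleanupPeriodDays": 60
}
```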
What checkpoints capture
The system captures file state at edit boundaries:
- Content of files before modification
- New files before creation
- Files before deletion (via edit tools)
Each checkpoint associates with the conversation turn that triggered it.
The interface displays checkpoints as a timeline of your prompts paired with their file impacts, shown in Git-style diff notation: `auth.ts +15 -3`.
What checkpoints miss
Not all changes flow through Claude's editing tools. Checkpoints do not capture:
- Bash command side effects (`rm`, `mv`, `cp`, shell redirects)
- Manual edits made in external editors
- Changes from concurrent Claude sessions
- External operations like `git push` or database modifications
This reflects architectural boundaries. Claude observes edits through its own tools but cannot intercept arbitrary file system operations. For operations outside the checkpointing system, Git remains the authoritative history.
Tip: When directing agents to move or delete files, prefer the Edit tool's capabilities over Bash commands when possible. This keeps operations within the checkpointing system.
The rewind feature
Checkpoints become useful through rewind. Access it three ways:
```shell
# Press Escape twice for instant access
Esc + Esc

# Or use the slash command
/rewind

# Or list checkpoints first, then rewind to a specific one
/checkpoints
/rewind abc123
```

The rewind interface presents your conversation as a timeline. Each entry shows your message and the files it changed. Select a point to restore to the state before that turn.
Three restoration modes
Rewind offers granular control over what gets restored:
| Mode | Effect | Use case |
|---|---|---|
| Conversation only | Returns to earlier conversation state; keeps current code | Agent is confused but code changes are correct |
| Code only | Reverts file changes; keeps conversation understanding | Implementation failed but agent grasps the goal |
| Both | Complete reset to checkpoint state | Everything needs to roll back |
These modes address different failure patterns. Sometimes an agent produces correct code but talks itself into confusion through excessive explanation. Conversation-only rewind clears the verbal drift while preserving working implementation.
The more common case is the reverse: the code goes wrong while the agent's understanding remains sound. Code-only rewind handles this well. The agent retains its grasp of your requirements, the codebase structure, and the approach; only the broken implementation disappears. This often enables immediate success on the next attempt because the understanding persists.
Both-mode rewind handles cases where the entire approach failed. A misunderstood requirement might produce confident but wrong code. Full restoration returns to a clean state for a fresh attempt.
Strategic rewind versus starting over
Rewind targets surgical recovery from recent mistakes. It works well when:
- The error is localized to recent turns
- The agent's earlier understanding was sound
- Context accumulated before the error remains valuable
Rewind becomes less effective when:
- Context pollution accumulated over many turns
- The fundamental approach was flawed from the start
- Earlier context has already degraded
For systemic problems, /clear or a new session often works better than rewinding to a distant checkpoint.
Rewind is for tactical recovery; session management handles strategic resets.
Generator-critic patterns
Beyond reactive recovery, generator-critic patterns prevent errors proactively. The pattern separates generation from evaluation, forcing explicit quality checks between production and acceptance.
In single-agent form, the same agent alternates roles:
```
Generate initial implementation
        ↓
Shift to critic mode: evaluate against requirements
        ↓
Identify gaps or errors
        ↓
Generate corrections
        ↓
Re-evaluate
        ↓
(repeat until satisfactory)
```

The role shift matters. When an agent generates code and immediately declares it complete, confirmation bias operates. Forcing explicit evaluation mode disrupts this pattern. The agent treats its own output as it would external content: critically rather than defensively.
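The loop above can be sketched as a small driver. This is an illustration, not Claude Code's internals: `call_model` stands in for any LLM API call, and the prompts and the `APPROVED` convention are assumptions.

```python
# Minimal single-agent generator-critic loop: the same model alternates
# between producing a draft and critiquing it.

def call_model(prompt: str) -> str:
    # Placeholder: swap in a real model call here.
    raise NotImplementedError("swap in a real model call")

def generate_with_critique(task: str, model=call_model, max_rounds: int = 3) -> str:
    draft = model(f"Implement: {task}")
    for _ in range(max_rounds):
        # Critic pass: evaluate the draft against requirements.
        critique = model(
            "Review the implementation below for edge cases, error handling, "
            "and spec deviations. Reply APPROVED if none.\n\n" + draft
        )
        if "APPROVED" in critique:  # critic is satisfied; stop iterating
            break
        # Correction pass: revise the draft against the critique.
        draft = model(f"Revise to address these issues:\n{critique}\n\n{draft}")
    return draft
```

The iteration cap matters in practice: without it, a critic that never approves produces an unbounded loop.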
Implementing critic prompts
Structure critic evaluation explicitly:
```
Review the implementation above for:

1. Edge cases not handled
2. Error conditions without recovery
3. Type mismatches or null safety issues
4. Deviations from the specification

List issues found. Do not defend the implementation.
```

The instruction to avoid defense prevents the agent from rationalizing problems. Without this constraint, agents tend to explain why apparent issues are actually acceptable.
Multi-agent critic patterns
Stronger separation uses distinct agents for generation and criticism. The generator produces code; a separate critic agent with different instructions evaluates it. The critic either approves or returns feedback for revision.
Multi-agent patterns provide:
- Role isolation: The critic never sees the generation process, only the output
- Independent judgment: No memory of why certain choices were made
- Consistent standards: The critic applies the same rubric regardless of generation complexity
The cost is additional latency and token usage. Each review cycle requires at least one more model call. For code paths where bugs would hurt, this investment pays for itself. For routine changes, the overhead may exceed the benefit.
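The multi-agent variant can be sketched similarly, with the critic invoked fresh each round so it sees only the artifact. The role names, rubric, `APPROVED` convention, and `call_model` are illustrative assumptions, not a specific API.

```python
# Multi-agent separation sketch: the generator accumulates conversation
# history, while the critic is stateless and receives only the produced code.

def call_model(messages: list) -> str:
    # Placeholder: swap in a real chat-model call here.
    raise NotImplementedError("swap in a real chat-model call")

CRITIC_RUBRIC = ("You are a code reviewer. Judge only the code you are given. "
                 "Reply APPROVED, or list concrete defects.")

def generate_then_review(task: str, model=call_model, max_rounds: int = 3) -> str:
    history = [{"role": "user", "content": f"Implement: {task}"}]
    code = model(history)
    for _ in range(max_rounds):
        # Role isolation: the critic never sees the generation history.
        verdict = model([{"role": "system", "content": CRITIC_RUBRIC},
                         {"role": "user", "content": code}])
        if verdict.strip().startswith("APPROVED"):
            return code
        # Feed the critic's verdict back to the generator as a new turn.
        history += [{"role": "assistant", "content": code},
                    {"role": "user", "content": f"Reviewer feedback:\n{verdict}"}]
        code = model(history)
    return code
```

Because the critic call contains no memory of the generation process, its judgment cannot be anchored by the generator's stated rationale.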
When critic patterns deliver value
Apply generator-critic patterns when:
- Output quality significantly outweighs speed
- Errors carry meaningful consequences
- Code requires security review or compliance verification
- The domain is unfamiliar to the generator
Skip them for:
- Trivial changes with obvious correctness
- Iterative refinement where tests provide feedback
- Time-sensitive operations where latency matters
Verification loops in practice
Page 10 introduced Boris Cherny's observation that verification loops improve quality 2-3x. Here are the implementation patterns that deliver that improvement.
The test-verify-fix cycle
The tightest verification loop connects agents directly to test execution:
1. Agent writes or modifies code
2. Test suite runs automatically (via hooks or instruction)
3. Agent receives test output
4. Agent fixes failures
5. Repeat until tests pass

This closed loop eliminates the "works for me" problem. The agent cannot claim completion while tests fail. Each iteration narrows the gap between implementation and specification.
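The cycle can be sketched as a small driver loop. Here `cmd` is whatever runs your real test suite and `fix` is whatever hands the failure output back to the agent; both are stand-ins for illustration.

```python
# Closed test-verify-fix loop: run the suite, hand failures back,
# repeat until green or the iteration budget runs out.
import subprocess

def run_suite(cmd):
    # Run the test command; report pass/fail plus combined output.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def fix_until_green(cmd, fix, max_iters=5):
    for _ in range(max_iters):
        passed, output = run_suite(cmd)
        if passed:
            return True
        fix(output)  # e.g. feed the failure output back to the agent
    # Budget exhausted: report the final state of the suite.
    return run_suite(cmd)[0]
```

The budget prevents the loop from spinning forever on a failure the agent cannot fix; the escalation strategies later in this section cover what to do when it runs out.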
Configure this loop through Claude Code hooks:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm test" }
        ]
      }
    ]
  }
}
```

With this configuration, every file edit triggers the test suite. The agent sees failures immediately rather than discovering them later when context has evolved.
The validation gate
Validation gates block progress until verification passes. Structure prompts to enforce this:
```
Implement the authentication middleware.
After implementation, run all auth tests.
Do not proceed to route implementation if any test fails.
Fix failures first.
```

The explicit blocking instruction prevents agents from optimistically continuing. Without it, agents may note failures but proceed anyway, accumulating broken dependencies.
Type checking as verification
TypeScript and other statically-typed languages provide continuous verification:
```shell
# Run type checker after edits
tsc --noEmit

# Or in watch mode for continuous feedback
tsc --noEmit --watch
```

Type errors caught immediately cost less to fix than those discovered later. Configure type checking in hooks or instruct agents to run it after structural changes.
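For example, a hook that type-checks after every edit, following the same shape as the test-suite hook shown earlier (the exact schema may vary across Claude Code versions):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx tsc --noEmit" }
        ]
      }
    ]
  }
}
```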
Browser and UI verification
For frontend work, visual verification catches what unit tests miss. Cherny describes his approach: "Claude tests every single change I land to claude.ai/code using the Claude Chrome extension. It opens a browser, tests the UI, and iterates until the code works and the UX feels good."
Automated browser testing through tools like Playwright or Cypress extends this pattern. The agent writes tests that actually interact with rendered UI, catching layout issues, interaction bugs, and visual regressions.
Error recovery strategies
When verification catches errors, recovery strategies determine what happens next.
Targeted correction
For isolated failures, targeted correction works efficiently:
```
Test X failed with error Y.
The failure is in function Z.
Fix only the code causing this specific failure.
Do not modify other functions.
```

Scope constraints prevent agents from "helpfully" refactoring adjacent code while fixing bugs. Each fix addresses one problem without introducing new surface area for issues.
Rollback and retry
When fixes compound instead of resolve, rollback becomes appropriate:
```
The last three attempts to fix this test have introduced new failures.
Rewind to before the first fix attempt.
Approach the original failure differently.
```

Recognizing when iteration produces negative progress requires judgment. Three or four attempts at the same fix without improvement often indicate that the approach, not just the implementation, needs revision.
Escalation to planning
Persistent failures may indicate insufficient planning:
```
This implementation has required six fix iterations.
Stop implementing.
Switch to plan mode and analyze why the approach keeps failing.
Return with a revised strategy before more code changes.
```

Stepping back to analysis mode breaks the fix-fail cycle. The agent reconsiders assumptions rather than repeatedly attempting the same flawed approach.
Combining recovery mechanisms
Error handling works best when these mechanisms layer together:
- Automatic checkpoints capture state continuously
- Verification loops catch errors immediately
- Generator-critic patterns prevent errors before commits
- Rewind capabilities recover from errors that slip through
- Session failure acceptance handles irrecoverable situations
Each layer addresses different failure modes. No single mechanism handles all cases. Automatic checkpoints cost nothing to maintain; use them as the foundation. Verification loops catch most implementation errors; configure them for every substantial codebase. Generator-critic patterns add cost but catch design errors that tests miss; apply them selectively. Rewind handles the failures that get through; know how to use it quickly when needed. Session failure acceptance handles systemic problems; recognize when fresh starts beat continued effort.
The point is not preventing all errors; that is neither possible nor efficient. The point is minimizing the cost of errors by catching them early and recovering from them cheaply. A checkpoint taken automatically costs nothing until needed; a rewind takes seconds; a new session starts clean. The investment in recovery infrastructure pays off in reduced time spent debugging polluted contexts and compounding mistakes.