Context Compaction
As conversations approach the context window limit, Claude Code employs a multi-layered compaction system to summarise, prune, and reconstruct conversation history. Three levels of compaction work together -- from silent background garbage collection to full conversation summarisation.
Three Levels of Compaction
Level 1: Microcompaction
Microcompaction runs silently before every API call. It targets old tool results (typically large file contents and command outputs) that are unlikely to be needed again. There are two variants:
Cached Microcompaction
Uses the API's cache_edits mechanism to delete old tool result blocks. This is the preferred variant because it preserves the prompt cache -- the API knows which parts were removed and can maintain its cached representation of the conversation prefix.
Time-Based Microcompaction
When a session has been idle for more than 60 minutes, the prompt cache has already expired. In this case, the system uses a more aggressive strategy: it clears all old tool results without worrying about cache preservation, since the cache needs to be rebuilt anyway.
If you step away for an hour and come back, your oldest tool results have been silently removed. This is why Claude sometimes says "let me re-read that file" after breaks -- the earlier read result was garbage-collected.
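The variant choice above can be sketched as a simple idle-time check. This is a hypothetical illustration, not the actual Claude Code implementation; the function and parameter names are assumptions.

```typescript
// Which microcompaction variant to run before an API call.
type MicrocompactVariant = "cached" | "time_based";

// 60-minute idle threshold: beyond this the prompt cache has expired.
const IDLE_THRESHOLD_MS = 60 * 60 * 1000;

// Prefer the cache-preserving variant while the prompt cache is still warm;
// fall back to the aggressive time-based variant once it has expired.
function chooseVariant(lastApiCallAt: number, now: number): MicrocompactVariant {
  return now - lastApiCallAt > IDLE_THRESHOLD_MS ? "time_based" : "cached";
}
```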
Level 2: Auto-Compaction
When microcompaction is not enough and context usage exceeds the auto-compact threshold, a full compaction is triggered automatically. The threshold is calculated using this formula:
| Model Context Window | Effective Window | Auto-Compact Threshold | Percentage |
|---|---|---|---|
| 200,000 tokens | 180,000 | 167,000 | ~83% |
| 1,000,000 tokens | 980,000 | 967,000 | ~97% |
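The table's arithmetic can be reproduced directly from the two constants in the Token Limits section (20K reserved for output, 13K of compaction headroom). This is a sketch; the constant and function names are illustrative assumptions.

```typescript
// Space reserved for model output.
const OUTPUT_RESERVE = 20_000;
// Headroom below the effective window that triggers auto-compaction.
const COMPACT_MARGIN = 13_000;

// Usable context after reserving space for output.
function effectiveContextWindow(modelContext: number): number {
  return modelContext - OUTPUT_RESERVE;
}

// Token count at which auto-compaction fires.
function autoCompactThreshold(modelContext: number): number {
  return effectiveContextWindow(modelContext) - COMPACT_MARGIN;
}
```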
shouldAutoCompact() Predicate
Before auto-compaction runs, the shouldAutoCompact() predicate checks several gating conditions; if any of them holds, auto-compaction is skipped.
Circuit Breaker After 3 Failures
If auto-compaction fails three times in a row, the system gives up and stops retrying. Running /compact manually resets the circuit breaker. If compaction seems stuck, this is usually the cause.
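The circuit-breaker behaviour described above can be sketched as a small counter that a manual /compact resets. The class and its API are assumptions for illustration, not the real internals.

```typescript
class CompactionCircuitBreaker {
  private failures = 0;
  private readonly limit = 3; // consecutive failures before giving up

  recordFailure(): void { this.failures++; }
  recordSuccess(): void { this.failures = 0; }
  // Running /compact manually resets the breaker.
  resetViaManualCompact(): void { this.failures = 0; }
  // When open, auto-compaction stops retrying.
  isOpen(): boolean { return this.failures >= this.limit; }
}
```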
Full Compaction Algorithm
When compaction runs (either auto or manual), it executes a three-phase process:
Phase 1: Pre-Compaction
- Run pre-compact hooks: Any registered hooks fire before compaction begins
- Strip images: All image content blocks are removed from the conversation (they are expensive tokens and cannot be summarised)
- Snapshot state: Current plan mode, active files, and pending tasks are captured
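The image-stripping step can be sketched as a filter over content blocks. The message shapes here are simplified assumptions, not the real internal types.

```typescript
interface ContentBlock { type: "text" | "image" | "tool_result"; data: string; }
interface Message { role: "user" | "assistant"; content: ContentBlock[]; }

// Remove every image block: images are expensive tokens and cannot be summarised.
function stripImages(messages: Message[]): Message[] {
  return messages.map(m => ({
    ...m,
    content: m.content.filter(b => b.type !== "image"),
  }));
}
```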
Phase 2: Summarisation
The entire conversation is sent to Claude with a summarisation prompt that requests a structured summary in nine sections.
"All User Messages" preserves every user message verbatim. This is deliberate -- user messages are the ground truth for intent, and paraphrasing them risks losing nuance. This section often dominates the summary length.
Phase 3: Prompt-Too-Long Recovery
If the summarised conversation still exceeds the context limit (rare, but possible with very long summaries), the system enters a recovery loop:
- Drop oldest conversation rounds: Remove the earliest user+assistant turn pairs
- Retry summarisation: Send the truncated conversation for re-summarisation
- Maximum 3 retries: If the summary still does not fit after 3 attempts, the system gives up and starts fresh with just the summary
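The recovery loop above can be sketched as follows. The summarise() and fitsInContext() callbacks stand in for internals we do not see; the shape of the loop is an assumption based on the three steps listed.

```typescript
const MAX_RETRIES = 3;

// Returns a fitting summary, or null if the caller must start fresh.
function recoverSummary(
  rounds: string[],
  summarise: (rounds: string[]) => string,
  fitsInContext: (summary: string) => boolean,
): string | null {
  let remaining = [...rounds];
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    const summary = summarise(remaining);
    if (fitsInContext(summary)) return summary;
    remaining = remaining.slice(1); // drop the oldest user+assistant round
  }
  return null; // gave up after 3 attempts
}
```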
Post-Compaction Reconstruction
After summarisation, the system reconstructs the working environment so Claude can continue seamlessly. This reconstruction phase is surprisingly complex:
| Step | Action | Budget |
|---|---|---|
| 1. Clear cache | Invalidate the prompt cache (the conversation structure changed) | -- |
| 2. Restore files | Re-read up to 5 recently accessed files into context | 50K tokens |
| 3. Restore skills | Re-inject any loaded skill definitions | 25K tokens |
| 4. Tool deltas | Re-apply any tool configuration changes from the session | -- |
| 5. MCP re-injection | Re-register MCP server tool definitions | -- |
| 6. Plan mode | Restore plan mode state if it was active | -- |
Only the 5 most recently accessed files are restored, with a hard cap of 50K tokens. If you were working across many files, some will be lost after compaction. Claude will need to re-read them when it encounters references to files it no longer has in context.
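The dual cap described above (5 files, 50K tokens) can be sketched as a greedy selection over a recency-ordered list. The ordering and the token estimates are assumptions for illustration.

```typescript
const MAX_FILES = 5;
const FILE_TOKEN_BUDGET = 50_000;

interface CachedFile { path: string; tokens: number; }

// Restore the most recently accessed files first, stopping at either cap.
function selectFilesToRestore(recentFirst: CachedFile[]): CachedFile[] {
  const restored: CachedFile[] = [];
  let spent = 0;
  for (const f of recentFirst) {
    if (restored.length === MAX_FILES) break;
    if (spent + f.tokens > FILE_TOKEN_BUDGET) continue; // would blow the budget
    restored.push(f);
    spent += f.tokens;
  }
  return restored;
}
```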
What's Preserved vs What's Lost
| Data | After Compaction | Notes |
|---|---|---|
| User messages | Preserved | Verbatim in Section 6 of summary |
| Primary request / intent | Preserved | Section 1 of summary |
| Pending tasks | Preserved | Section 7 of summary |
| Recent files (up to 5) | Restored | Re-read within 50K budget |
| Skills | Restored | Re-injected within 25K budget |
| Plan mode state | Preserved | Explicitly restored |
| Tool results / file contents | Lost | Only summarised, not preserved verbatim |
| Image content | Stripped | Removed in pre-compaction phase |
| Exact error messages | Summarised | May lose exact stack traces |
| Prompt cache | Invalidated | Must be rebuilt from scratch |
| Older files (beyond top 5) | Lost | Need to be re-read manually |
Partial Compaction
The system supports partial compaction with two directions, each with different trade-offs:
Direction: "from" (Compact from a point)
Keeps everything before the compaction point and summarises everything after. This preserves the prompt cache because the conversation prefix is unchanged.
Direction: "up_to" (Compact up to a point)
Summarises everything before the compaction point and keeps everything after. This breaks the prompt cache because the conversation prefix is replaced with a summary.
The "from" direction is the default for auto-compaction because cache preservation reduces API costs. The "up_to" direction is better when you have just started a new task and the old context is no longer relevant -- it keeps recent work intact at the cost of rebuilding the cache.
Session Memory Compaction
As a faster alternative to full compaction, session memory compaction only compacts the session memory portion of the conversation. This is useful when memory entries are consuming disproportionate context but the actual conversation is still within limits.
Session memory compaction is significantly cheaper because it processes a much smaller input (just the memory entries, not the entire conversation) and does not require post-compaction reconstruction.
Token Limits & Thresholds
| Constant | Value | Purpose |
|---|---|---|
| effectiveContextWindow | modelContext - 20K | Usable context after reserving space for output |
| autoCompactThreshold | effective - 13K | Token count that triggers auto-compaction |
| File restoration budget | 50,000 tokens | Maximum tokens for restoring recent files |
| File restoration count | 5 files | Maximum number of files restored post-compaction |
| Skill restoration budget | 25,000 tokens | Maximum tokens for re-injecting skill definitions |
| Time-based idle threshold | 60 minutes | Idle time before time-based microcompaction |
| Circuit breaker limit | 3 failures | Consecutive failures before auto-compact gives up |
| Prompt-too-long retries | 3 attempts | Maximum retries for over-length summaries |
Warning System
The context warning system provides user-facing indicators as the context window fills up.
The warning indicator appears in the Claude Code UI status bar, giving users a visual cue of how much context remains before compaction triggers.
Environment Variable Configuration
Setting CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=85 is a good default for 200K context sessions -- it gives compaction enough room to produce a summary without hitting the hard limit. For 1M sessions, the default threshold (~97%) works well because there is ample headroom.
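How such an override could map onto the threshold is sketched below. The variable name comes from the text; the parsing and clamping logic is an assumption, not the documented behaviour.

```typescript
// Compute the auto-compact threshold, honouring a percentage override if set.
function thresholdWithOverride(
  modelContext: number,
  env: Record<string, string | undefined>,
): number {
  const pct = Number(env["CLAUDE_AUTOCOMPACT_PCT_OVERRIDE"]);
  if (Number.isFinite(pct) && pct > 0 && pct <= 100) {
    return Math.floor(modelContext * (pct / 100));
  }
  // Default formula: (modelContext - 20K) - 13K.
  return modelContext - 20_000 - 13_000;
}
```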