Context Compaction

As conversations approach the context window limit, Claude Code employs a multi-layered compaction system to summarise, prune, and reconstruct conversation history. Three levels of compaction work together -- from silent background garbage collection to full conversation summarisation.

Three Levels of Compaction

Level 1: Microcompaction -- Continuous and non-disruptive. Runs before every API call and deletes old tool results via cache_edits or time-based pruning. Frequency: every turn | Impact: minimal | User-visible: no

Level 2: Auto-Compaction -- Threshold-based. Triggers when context usage hits ~83% (200K) or ~97% (1M) and performs a full summarisation with a 9-section output. Frequency: as needed | Impact: significant | User-visible: yes

Level 3: Manual /compact -- User-triggered, with optional custom instructions: /compact <instructions> lets you control what to preserve. Frequency: on demand | Impact: significant | User-visible: yes

The three levels increase in impact and user visibility.

Level 1: Microcompaction

Microcompaction runs silently before every API call. It targets old tool results (typically large file contents and command outputs) that are unlikely to be needed again. There are two variants:

Cached Microcompaction

Uses the API's cache_edits mechanism to delete old tool result blocks. This is the preferred variant because it preserves the prompt cache -- the API knows which parts were removed and can maintain its cached representation of the conversation prefix.

Cached Microcompaction
// Iteratively removes the oldest tool_result blocks
// until context fits within the target window
while (contextTokens > targetWindow) {
  const oldest = findOldestToolResult(messages);
  removeViaCacheEdit(oldest); // Preserves prompt cache
}

Time-Based Microcompaction

When a session has been idle for more than 60 minutes, the prompt cache has already expired. In this case, the system uses a more aggressive strategy: it clears all old tool results without worrying about cache preservation, since the cache needs to be rebuilt anyway.
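The choice between the two variants can be sketched as a simple idle-time check. This is an illustrative sketch, not the actual implementation: `pickMicrocompactionStrategy` and `IDLE_LIMIT_MS` are assumed names.

```javascript
// Sketch: pick a microcompaction variant based on idle time.
// After 60 idle minutes the prompt cache has expired anyway,
// so aggressive clearing costs nothing extra.
const IDLE_LIMIT_MS = 60 * 60 * 1000; // 60-minute idle threshold

function pickMicrocompactionStrategy(lastApiCallAt, now = Date.now()) {
  return now - lastApiCallAt > IDLE_LIMIT_MS ? "time-based" : "cached";
}
```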

Why This Matters

If you step away for an hour and come back, your oldest tool results have been silently removed. This is why Claude sometimes says "let me re-read that file" after breaks -- the earlier read result was garbage-collected.


Level 2: Auto-Compaction

When microcompaction is not enough and context usage exceeds the auto-compact threshold, a full compaction is triggered automatically. The threshold is calculated using this formula:

Auto-Compact Threshold Calculation
const effectiveContextWindow = modelContextWindow - 20000;
const autoCompactThreshold = effectiveContextWindow - 13000;

// For a 200K context model:
//   effective = 200,000 - 20,000 = 180,000
//   threshold = 180,000 - 13,000 = 167,000 (~83%)

// For a 1M context model:
//   effective = 1,000,000 - 20,000 = 980,000
//   threshold = 980,000 - 13,000 = 967,000 (~97%)
| Model Context Window | Effective Window | Auto-Compact Threshold | Percentage |
|---|---|---|---|
| 200,000 tokens | 180,000 | 167,000 | ~83% |
| 1,000,000 tokens | 980,000 | 967,000 | ~97% |
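The same arithmetic can be packaged as a small runnable helper. The constant names mirror this article's description, not necessarily the real source.

```javascript
// Sketch: compute the auto-compact threshold for a given model.
const OUTPUT_RESERVE = 20000;  // space reserved for model output
const COMPACT_MARGIN = 13000;  // headroom before the hard limit

function autoCompactThreshold(modelContextWindow) {
  const effective = modelContextWindow - OUTPUT_RESERVE;
  return effective - COMPACT_MARGIN;
}

autoCompactThreshold(200000);  // 167000 (~83% of 200K)
autoCompactThreshold(1000000); // 967000 (~97% of 1M)
```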

shouldAutoCompact() Predicate

Before auto-compaction runs, the shouldAutoCompact() predicate checks several conditions. If any of these are true, auto-compaction is skipped:

Disabled Conditions
// Auto-compaction is DISABLED when:

// 1. Environment variable override
DISABLE_AUTO_COMPACT=1
DISABLE_COMPACT=1

// 2. The trigger source is blacklisted
const BLACKLISTED_SOURCES = [
  "session_memory", // Don't compact during memory operations
  "compact",        // Don't re-compact during compaction
  "marble_origami"  // Internal test harness
];

// 3. Circuit breaker: 3 consecutive failures
if (consecutiveFailures >= 3) {
  // Stop retrying, require manual /compact
  return false;
}

Circuit Breaker After 3 Failures

If auto-compaction fails three times in a row, the system gives up and stops retrying. Running /compact manually resets the circuit breaker. If compaction seems stuck, this is usually the cause.
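The failure counter behaves like a classic circuit breaker. A minimal sketch, with assumed class and method names:

```javascript
// Sketch: circuit breaker that disables auto-compaction after
// three consecutive failures, until manually reset.
class CompactionBreaker {
  constructor(limit = 3) {
    this.limit = limit;
    this.failures = 0;
  }
  recordFailure() { this.failures += 1; }
  reset() { this.failures = 0; } // running /compact resets the breaker
  shouldAutoCompact() { return this.failures < this.limit; }
}
```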


Full Compaction Algorithm

When compaction runs (either auto or manual), it executes a three-phase process:

Phase 1: Pre-Compaction

  • Run pre-compact hooks: Any registered hooks fire before compaction begins
  • Strip images: All image content blocks are removed from the conversation (they are expensive tokens and cannot be summarised)
  • Snapshot state: Current plan mode, active files, and pending tasks are captured
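The image-stripping and snapshot steps above can be sketched as a pure function over the message list. The message shape and session field names here are assumptions for illustration:

```javascript
// Sketch of Phase 1: drop image content blocks and snapshot
// the session state that must survive compaction.
function preCompact(messages, session) {
  const stripped = messages.map(m => ({
    ...m,
    content: (m.content || []).filter(block => block.type !== "image"),
  }));
  const snapshot = {
    planMode: session.planMode,       // restored in reconstruction
    activeFiles: session.activeFiles, // candidates for file restoration
    pendingTasks: session.pendingTasks,
  };
  return { stripped, snapshot };
}
```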

Phase 2: Summarisation

The entire conversation is sent to Claude with a summarisation prompt that requests a structured 9-section summary:

The 9 Summarisation Sections
  1. Primary Request: What the user originally asked for
  2. Key Technical Concepts: Architecture decisions, patterns, and technical context
  3. Files and Code Sections: Which files were read/edited and their current state
  4. Errors and Fixes: Errors encountered and how they were resolved
  5. Problem Solving Efforts: Approaches tried, what worked and what did not
  6. All User Messages: Every user message preserved verbatim (critical for intent)
  7. Pending Tasks: Work that was agreed upon but not yet completed
  8. Current Work: What was being actively worked on when compaction triggered
  9. Optional Next Step: Suggested next action if the conversation was interrupted
Section 6 is Critical

"All User Messages" preserves every user message verbatim. This is deliberate -- user messages are the ground truth for intent, and paraphrasing them risks losing nuance. This section often dominates the summary length.

Phase 3: Prompt-Too-Long Recovery

If the summarised conversation still exceeds the context limit (rare, but possible with very long summaries), the system enters a recovery loop:

  1. Drop oldest conversation rounds: Remove the earliest user+assistant turn pairs
  2. Retry summarisation: Send the truncated conversation for re-summarisation
  3. Maximum 3 retries: If the summary still does not fit after 3 attempts, the system gives up and starts fresh with just the summary
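The recovery loop above can be sketched as follows. `summarise` and `fits` stand in for the real summarisation call and the context-limit check; both are assumptions for illustration:

```javascript
// Sketch of Phase 3: drop the oldest round and retry until the
// summary fits, giving up after a bounded number of attempts.
function summariseWithRecovery(rounds, summarise, fits, maxRetries = 3) {
  let remaining = rounds.slice();
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const summary = summarise(remaining);
    if (fits(summary)) return summary;
    remaining = remaining.slice(1); // drop oldest user+assistant pair
  }
  return summarise([]); // give up: start fresh with just the summary
}
```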

Post-Compaction Reconstruction

After summarisation, the system reconstructs the working environment so Claude can continue seamlessly. This reconstruction phase is surprisingly complex:

| Step | Action | Budget |
|---|---|---|
| 1. Clear cache | Invalidate the prompt cache (the conversation structure changed) | -- |
| 2. Restore files | Re-read up to 5 recently accessed files into context | 50K tokens |
| 3. Restore skills | Re-inject any loaded skill definitions | 25K tokens |
| 4. Tool deltas | Re-apply any tool configuration changes from the session | -- |
| 5. MCP re-injection | Re-register MCP server tool definitions | -- |
| 6. Plan mode | Restore plan mode state if it was active | -- |

File Restoration Budget

Only the 5 most recently accessed files are restored, with a hard cap of 50K tokens. If you were working across many files, some will be lost after compaction. Claude will need to re-read them when it encounters references to files it no longer has in context.
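The dual cap (file count and token budget) can be sketched like this; `tokens` per file and the selection order are assumptions, with files listed most-recent first:

```javascript
// Sketch: restore at most 5 files within a 50K-token budget.
const MAX_FILES = 5;
const FILE_TOKEN_BUDGET = 50000;

function selectFilesToRestore(recentFiles) {
  const restored = [];
  let spent = 0;
  for (const file of recentFiles) {           // most recent first
    if (restored.length >= MAX_FILES) break;
    if (spent + file.tokens > FILE_TOKEN_BUDGET) continue; // too big, skip
    restored.push(file.path);
    spent += file.tokens;
  }
  return restored;
}
```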


What's Preserved vs What's Lost

| Data | After Compaction | Notes |
|---|---|---|
| User messages | Preserved | Verbatim in Section 6 of summary |
| Primary request / intent | Preserved | Section 1 of summary |
| Pending tasks | Preserved | Section 7 of summary |
| Recent files (up to 5) | Restored | Re-read within 50K budget |
| Skills | Restored | Re-injected within 25K budget |
| Plan mode state | Preserved | Explicitly restored |
| Tool results / file contents | Lost | Only summarised, not preserved verbatim |
| Image content | Stripped | Removed in pre-compaction phase |
| Exact error messages | Summarised | May lose exact stack traces |
| Prompt cache | Invalidated | Must be rebuilt from scratch |
| Older files (beyond top 5) | Lost | Need to be re-read manually |

Partial Compaction

The system supports partial compaction with two directions, each with different trade-offs:

Direction: "from" (Compact from a point)

Keeps everything before the compaction point and summarises everything after. This preserves the prompt cache because the conversation prefix is unchanged.

Partial Compaction - "from"
// Direction: "from"
// [KEPT: early conversation] [SUMMARISED: recent conversation]
//  ^cache preserved^          ^compressed into summary^
//
// Best when: you want to preserve cache and the early context
// is more important than the recent context

Direction: "up_to" (Compact up to a point)

Summarises everything before the compaction point and keeps everything after. This breaks the prompt cache because the conversation prefix is replaced with a summary.

Partial Compaction - "up_to"
// Direction: "up_to"
// [SUMMARISED: early conversation] [KEPT: recent conversation]
//  ^compressed into summary^        ^preserved verbatim^
//
// Best when: recent context is more important and you don't
// mind the cache rebuild cost

Which Direction to Use

The "from" direction is the default for auto-compaction because cache preservation reduces API costs. The "up_to" direction is better when you have just started a new task and the old context is no longer relevant -- it keeps recent work intact at the cost of rebuilding the cache.
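The two directions amount to choosing which side of a pivot index is kept verbatim. A minimal sketch (the summariser itself is out of scope, and the function name is assumed):

```javascript
// Sketch: split a message list for partial compaction.
// "from": keep the prefix (cache preserved), summarise the rest.
// "up_to": summarise the prefix, keep recent messages verbatim.
function splitForPartialCompaction(messages, pivot, direction) {
  if (direction === "from") {
    return { kept: messages.slice(0, pivot), toSummarise: messages.slice(pivot) };
  }
  return { kept: messages.slice(pivot), toSummarise: messages.slice(0, pivot) };
}
```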


Session Memory Compaction

As a faster alternative to full compaction, session memory compaction only compacts the session memory portion of the conversation. This is useful when memory entries are consuming disproportionate context but the actual conversation is still within limits.

Session memory compaction is significantly cheaper because it processes a much smaller input (just the memory entries, not the entire conversation) and does not require post-compaction reconstruction.


Token Limits & Thresholds

| Constant | Value | Purpose |
|---|---|---|
| effectiveContextWindow | modelContext - 20K | Usable context after reserving space for output |
| autoCompactThreshold | effective - 13K | Token count that triggers auto-compaction |
| File restoration budget | 50,000 tokens | Maximum tokens for restoring recent files |
| File restoration count | 5 files | Maximum number of files restored post-compaction |
| Skill restoration budget | 25,000 tokens | Maximum tokens for re-injecting skill definitions |
| Time-based idle threshold | 60 minutes | Idle time before time-based microcompaction |
| Circuit breaker limit | 3 failures | Consecutive failures before auto-compact gives up |
| Prompt-too-long retries | 3 attempts | Maximum retries for over-length summaries |

Warning System

The context warning system provides user-facing indicators as the context window fills up:

Warning Thresholds
const percentLeft = 1 - (currentTokens / effectiveContextWindow);

// Warning levels:
//   percentLeft > 0.20  --> No warning (green)
//   percentLeft > 0.10  --> Moderate warning (yellow)
//   percentLeft > 0.05  --> High warning (orange)
//   percentLeft <= 0.05 --> Critical warning (red)

function isAboveWarningThreshold(currentTokens, contextWindow) {
  const effective = contextWindow - 20000;
  const percentUsed = currentTokens / effective;
  return percentUsed > 0.80;
}

The warning indicator appears in the Claude Code UI status bar, giving users a visual cue of how much context remains before compaction triggers.
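The threshold table above can be folded into a single level-mapping function. A sketch, with the level names taken from the comments rather than from any real API:

```javascript
// Sketch: map current usage to a warning level, using the same
// 20K output reserve as the threshold formula.
function warningLevel(currentTokens, contextWindow) {
  const effective = contextWindow - 20000;
  const percentLeft = 1 - currentTokens / effective;
  if (percentLeft > 0.20) return "none";     // green
  if (percentLeft > 0.10) return "moderate"; // yellow
  if (percentLeft > 0.05) return "high";     // orange
  return "critical";                         // red
}
```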


Environment Variable Configuration

Compaction Environment Variables
# Override the auto-compact threshold (absolute token count)
CLAUDE_CODE_AUTO_COMPACT_WINDOW=150000

# Trigger compaction at a specific percentage of context used
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=85

# Disable auto-compaction (keep manual /compact only)
DISABLE_AUTO_COMPACT=1

# Disable ALL compaction (auto + manual)
DISABLE_COMPACT=1

# Disable 1M context window (forces 200K model behaviour)
CLAUDE_CODE_DISABLE_1M_CONTEXT=1

# Cap maximum context tokens
CLAUDE_CODE_MAX_CONTEXT_TOKENS=500000

Practical Advice

Setting CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=85 is a good default for 200K context sessions -- it gives compaction enough room to produce a summary without hitting the hard limit. For 1M sessions, the default threshold (~97%) works well because there is ample headroom.
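In practice the override can be applied per invocation or persisted in the shell profile. A sketch, assuming the CLI is launched with a `claude` command:

```shell
# One-off override for a single session
# (the `claude` invocation name is an assumption here)
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=85 claude

# Or persist the setting in your shell profile
export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=85
```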