Context Compaction

As conversations approach the context window limit, Claude Code employs a multi-layered compaction system to summarise, prune, and reconstruct conversation history. Three levels of compaction work together -- from silent background garbage collection to full conversation summarisation.

Three Levels of Compaction

Level 1: Microcompaction -- Continuous and non-disruptive. Runs before every API call and deletes old tool results via cache_edits or time-based pruning. Frequency: every turn | Impact: minimal | User-visible: no

Level 2: Auto-Compaction -- Threshold-based. Triggers when context usage hits ~83% (200K) or ~97% (1M) and performs a full summarisation with a 9-section output. Frequency: as needed | Impact: significant | User-visible: yes

Level 3: Manual /compact -- User-triggered, with optional custom instructions: /compact <instructions> lets you control what to preserve. Frequency: on demand | Impact: significant | User-visible: yes

The three levels increase in impact and user visibility.

Level 1: Microcompaction

Microcompaction runs silently before every API call. It targets old tool results (typically large file contents and command outputs) that are unlikely to be needed again. There are two variants:

Cached Microcompaction

Uses the API's cache_edits mechanism to delete old tool result blocks. This is the preferred variant because it preserves the prompt cache -- the API knows which parts were removed and can maintain its cached representation of the conversation prefix.

Cached Microcompaction
// Iteratively removes the oldest tool_result blocks
// until context fits within the target window
while (contextTokens > targetWindow) {
  const oldest = findOldestToolResult(messages);
  removeViaCacheEdit(oldest); // Preserves prompt cache
}

Time-Based Microcompaction

When a session has been idle for more than 60 minutes, the prompt cache has already expired. In this case, the system uses a more aggressive strategy: it clears all old tool results without worrying about cache preservation, since the cache needs to be rebuilt anyway.
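The choice between the two variants can be sketched as a simple idle-time check. This is an illustrative sketch, not the actual implementation: `pickMicrocompactionStrategy` and `IDLE_LIMIT_MS` are assumed names.

```javascript
// Sketch: pick a microcompaction variant based on idle time.
// After 60 idle minutes the prompt cache has expired anyway,
// so aggressive clearing costs nothing extra.
const IDLE_LIMIT_MS = 60 * 60 * 1000; // 60-minute idle threshold

function pickMicrocompactionStrategy(lastApiCallAt, now = Date.now()) {
  return now - lastApiCallAt > IDLE_LIMIT_MS ? "time-based" : "cached";
}
```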

Why This Matters

If you step away for an hour and come back, your oldest tool results have been silently removed. This is why Claude sometimes says "let me re-read that file" after breaks -- the earlier read result was garbage-collected.


Level 2: Auto-Compaction

When microcompaction is not enough and context usage exceeds the auto-compact threshold, a full compaction is triggered automatically. The threshold is calculated using this formula:

Auto-Compact Threshold Calculation
const effectiveContextWindow = modelContextWindow - 20000;
const autoCompactThreshold = effectiveContextWindow - 13000;

// For a 200K context model:
//   effective = 200,000 - 20,000 = 180,000
//   threshold = 180,000 - 13,000 = 167,000 (~83%)

// For a 1M context model:
//   effective = 1,000,000 - 20,000 = 980,000
//   threshold = 980,000 - 13,000 = 967,000 (~97%)
| Model Context Window | Effective Window | Auto-Compact Threshold | Percentage |
|---|---|---|---|
| 200,000 tokens | 180,000 | 167,000 | ~83% |
| 1,000,000 tokens | 980,000 | 967,000 | ~97% |
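The same arithmetic can be packaged as a small runnable helper. The constant names mirror this article's description, not necessarily the real source.

```javascript
// Sketch: compute the auto-compact threshold for a given model.
const OUTPUT_RESERVE = 20000;  // space reserved for model output
const COMPACT_MARGIN = 13000;  // headroom before the hard limit

function autoCompactThreshold(modelContextWindow) {
  const effective = modelContextWindow - OUTPUT_RESERVE;
  return effective - COMPACT_MARGIN;
}

autoCompactThreshold(200000);  // 167000 (~83% of 200K)
autoCompactThreshold(1000000); // 967000 (~97% of 1M)
```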

shouldAutoCompact() Predicate

Before auto-compaction runs, the shouldAutoCompact() predicate checks several conditions. If any of these are true, auto-compaction is skipped:

Disabled Conditions
// Auto-compaction is DISABLED when:

// 1. Environment variable override
DISABLE_AUTO_COMPACT=1
DISABLE_COMPACT=1

// 2. The trigger source is blacklisted
const BLACKLISTED_SOURCES = [
  "session_memory", // Don't compact during memory operations
  "compact",        // Don't re-compact during compaction
  "marble_origami"  // Internal test harness
];

// 3. Circuit breaker: 3 consecutive failures
if (consecutiveFailures >= 3) {
  // Stop retrying, require manual /compact
  return false;
}

Circuit Breaker After 3 Failures

If auto-compaction fails three times in a row, the system gives up and stops retrying. Running /compact manually resets the circuit breaker. If compaction seems stuck, this is usually the cause.
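The failure counter behaves like a classic circuit breaker. A minimal sketch, with assumed class and method names:

```javascript
// Sketch: circuit breaker that disables auto-compaction after
// three consecutive failures, until manually reset.
class CompactionBreaker {
  constructor(limit = 3) {
    this.limit = limit;
    this.failures = 0;
  }
  recordFailure() { this.failures += 1; }
  reset() { this.failures = 0; } // running /compact resets the breaker
  shouldAutoCompact() { return this.failures < this.limit; }
}
```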


Full Compaction Algorithm

When compaction runs (either auto or manual), it executes a three-phase process:

Phase 1: Pre-Compaction

  • Run pre-compact hooks: Any registered hooks fire before compaction begins
  • Strip images: All image content blocks are removed from the conversation (they are expensive tokens and cannot be summarised)
  • Snapshot state: Current plan mode, active files, and pending tasks are captured
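The image-stripping and snapshot steps above can be sketched as a pure function over the message list. The message shape and session field names here are assumptions for illustration:

```javascript
// Sketch of Phase 1: drop image content blocks and snapshot
// the session state that must survive compaction.
function preCompact(messages, session) {
  const stripped = messages.map(m => ({
    ...m,
    content: (m.content || []).filter(block => block.type !== "image"),
  }));
  const snapshot = {
    planMode: session.planMode,       // restored in reconstruction
    activeFiles: session.activeFiles, // candidates for file restoration
    pendingTasks: session.pendingTasks,
  };
  return { stripped, snapshot };
}
```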

Phase 2: Summarisation

The entire conversation is sent to Claude with a summarisation prompt that requests a structured 9-section summary:

The 9 Summarisation Sections
  1. Primary Request: What the user originally asked for
  2. Key Technical Concepts: Architecture decisions, patterns, and technical context
  3. Files and Code Sections: Which files were read/edited and their current state
  4. Errors and Fixes: Errors encountered and how they were resolved
  5. Problem Solving Efforts: Approaches tried, what worked and what did not
  6. All User Messages: Every user message preserved verbatim (critical for intent)
  7. Pending Tasks: Work that was agreed upon but not yet completed
  8. Current Work: What was being actively worked on when compaction triggered
  9. Optional Next Step: Suggested next action if the conversation was interrupted
Section 6 is Critical

"All User Messages" preserves every user message verbatim. This is deliberate -- user messages are the ground truth for intent, and paraphrasing them risks losing nuance. This section often dominates the summary length.

Phase 3: Prompt-Too-Long Recovery

If the summarised conversation still exceeds the context limit (rare, but possible with very long summaries), the system enters a recovery loop:

  1. Drop oldest conversation rounds: Remove the earliest user+assistant turn pairs
  2. Retry summarisation: Send the truncated conversation for re-summarisation
  3. Maximum 3 retries: If the summary still does not fit after 3 attempts, the system gives up and starts fresh with just the summary
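The recovery loop above can be sketched as follows. `summarise` and `fits` stand in for the real summarisation call and the context-limit check; both are assumptions for illustration:

```javascript
// Sketch of Phase 3: drop the oldest round and retry until the
// summary fits, giving up after a bounded number of attempts.
function summariseWithRecovery(rounds, summarise, fits, maxRetries = 3) {
  let remaining = rounds.slice();
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const summary = summarise(remaining);
    if (fits(summary)) return summary;
    remaining = remaining.slice(1); // drop oldest user+assistant pair
  }
  return summarise([]); // give up: start fresh with just the summary
}
```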

Post-Compaction Reconstruction

After summarisation, the system reconstructs the working environment so Claude can continue seamlessly. This reconstruction phase is surprisingly complex:

| Step | Action | Budget |
|---|---|---|
| 1. Clear cache | Invalidate the prompt cache (the conversation structure changed) | -- |
| 2. Restore files | Re-read up to 5 recently accessed files into context | 50K tokens |
| 3. Restore skills | Re-inject any loaded skill definitions | 25K tokens |
| 4. Tool deltas | Re-apply any tool configuration changes from the session | -- |
| 5. MCP re-injection | Re-register MCP server tool definitions | -- |
| 6. Plan mode | Restore plan mode state if it was active | -- |

File Restoration Budget

Only the 5 most recently accessed files are restored, with a hard cap of 50K tokens. If you were working across many files, some will be lost after compaction. Claude will need to re-read them when it encounters references to files it no longer has in context.
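The dual cap (file count and token budget) can be sketched like this; `tokens` per file and the selection order are assumptions, with files listed most-recent first:

```javascript
// Sketch: restore at most 5 files within a 50K-token budget.
const MAX_FILES = 5;
const FILE_TOKEN_BUDGET = 50000;

function selectFilesToRestore(recentFiles) {
  const restored = [];
  let spent = 0;
  for (const file of recentFiles) {           // most recent first
    if (restored.length >= MAX_FILES) break;
    if (spent + file.tokens > FILE_TOKEN_BUDGET) continue; // too big, skip
    restored.push(file.path);
    spent += file.tokens;
  }
  return restored;
}
```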


What's Preserved vs What's Lost

| Data | After Compaction | Notes |
|---|---|---|
| User messages | Preserved | Verbatim in Section 6 of summary |
| Primary request / intent | Preserved | Section 1 of summary |
| Pending tasks | Preserved | Section 7 of summary |
| Recent files (up to 5) | Restored | Re-read within 50K budget |
| Skills | Restored | Re-injected within 25K budget |
| Plan mode state | Preserved | Explicitly restored |
| Tool results / file contents | Lost | Only summarised, not preserved verbatim |
| Image content | Stripped | Removed in pre-compaction phase |
| Exact error messages | Summarised | May lose exact stack traces |
| Prompt cache | Invalidated | Must be rebuilt from scratch |
| Older files (beyond top 5) | Lost | Need to be re-read manually |

Partial Compaction

The system supports partial compaction with two directions, each with different trade-offs:

Direction: "from" (Compact from a point)

Keeps everything before the compaction point and summarises everything after. This preserves the prompt cache because the conversation prefix is unchanged.

Partial Compaction - "from"
// Direction: "from"
// [KEPT: early conversation] [SUMMARISED: recent conversation]
//  ^cache preserved^          ^compressed into summary^
//
// Best when: you want to preserve cache and the early context
// is more important than the recent context

Direction: "up_to" (Compact up to a point)

Summarises everything before the compaction point and keeps everything after. This breaks the prompt cache because the conversation prefix is replaced with a summary.

Partial Compaction - "up_to"
// Direction: "up_to"
// [SUMMARISED: early conversation] [KEPT: recent conversation]
//  ^compressed into summary^        ^preserved verbatim^
//
// Best when: recent context is more important and you don't
// mind the cache rebuild cost

Which Direction to Use

The "from" direction is the default for auto-compaction because cache preservation reduces API costs. The "up_to" direction is better when you have just started a new task and the old context is no longer relevant -- it keeps recent work intact at the cost of rebuilding the cache.
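The two directions amount to choosing which side of a pivot index is kept verbatim. A minimal sketch (the summariser itself is out of scope, and the function name is assumed):

```javascript
// Sketch: split a message list for partial compaction.
// "from": keep the prefix (cache preserved), summarise the rest.
// "up_to": summarise the prefix, keep recent messages verbatim.
function splitForPartialCompaction(messages, pivot, direction) {
  if (direction === "from") {
    return { kept: messages.slice(0, pivot), toSummarise: messages.slice(pivot) };
  }
  return { kept: messages.slice(pivot), toSummarise: messages.slice(0, pivot) };
}
```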


Session Memory Compaction

As a faster alternative to full compaction, session memory compaction only compacts the session memory portion of the conversation. This is useful when memory entries are consuming disproportionate context but the actual conversation is still within limits.

Session memory compaction is significantly cheaper because it processes a much smaller input (just the memory entries, not the entire conversation) and does not require post-compaction reconstruction.


Token Limits & Thresholds

| Constant | Value | Purpose |
|---|---|---|
| effectiveContextWindow | modelContext - 20K | Usable context after reserving space for output |
| autoCompactThreshold | effective - 13K | Token count that triggers auto-compaction |
| File restoration budget | 50,000 tokens | Maximum tokens for restoring recent files |
| File restoration count | 5 files | Maximum number of files restored post-compaction |
| Skill restoration budget | 25,000 tokens | Maximum tokens for re-injecting skill definitions |
| Time-based idle threshold | 60 minutes | Idle time before time-based microcompaction |
| Circuit breaker limit | 3 failures | Consecutive failures before auto-compact gives up |
| Prompt-too-long retries | 3 attempts | Maximum retries for over-length summaries |

Warning System

The context warning system provides user-facing indicators as the context window fills up:

Warning Thresholds
const percentLeft = 1 - (currentTokens / effectiveContextWindow);

// Warning levels:
//   percentLeft > 0.20  --> No warning (green)
//   percentLeft > 0.10  --> Moderate warning (yellow)
//   percentLeft > 0.05  --> High warning (orange)
//   percentLeft <= 0.05 --> Critical warning (red)

function isAboveWarningThreshold(currentTokens, contextWindow) {
  const effective = contextWindow - 20000;
  const percentUsed = currentTokens / effective;
  return percentUsed > 0.80;
}

The warning indicator appears in the Claude Code UI status bar, giving users a visual cue of how much context remains before compaction triggers.
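The threshold table above can be folded into a single level-mapping function. A sketch, with the level names taken from the comments rather than from any real API:

```javascript
// Sketch: map current usage to a warning level, using the same
// 20K output reserve as the threshold formula.
function warningLevel(currentTokens, contextWindow) {
  const effective = contextWindow - 20000;
  const percentLeft = 1 - currentTokens / effective;
  if (percentLeft > 0.20) return "none";     // green
  if (percentLeft > 0.10) return "moderate"; // yellow
  if (percentLeft > 0.05) return "high";     // orange
  return "critical";                         // red
}
```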


Environment Variable Configuration

Compaction Environment Variables
# Override the auto-compact threshold (absolute token count)
CLAUDE_CODE_AUTO_COMPACT_WINDOW=150000

# Trigger compaction at a specific percentage of context used
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=85

# Disable auto-compaction (keep manual /compact only)
DISABLE_AUTO_COMPACT=1

# Disable ALL compaction (auto + manual)
DISABLE_COMPACT=1

# Disable 1M context window (forces 200K model behaviour)
CLAUDE_CODE_DISABLE_1M_CONTEXT=1

# Cap maximum context tokens
CLAUDE_CODE_MAX_CONTEXT_TOKENS=500000

Practical Advice

Setting CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=85 is a good default for 200K context sessions -- it gives compaction enough room to produce a summary without hitting the hard limit. For 1M sessions, the default threshold (~97%) works well because there is ample headroom.
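In practice the override can be applied per invocation or persisted in the shell profile. A sketch, assuming the CLI is launched with a `claude` command:

```shell
# One-off override for a single session
# (the `claude` invocation name is an assumption here)
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=85 claude

# Or persist the setting in your shell profile
export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=85
```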