Hacks & Optimisations
Practical, actionable tricks extracted from Claude Code's source. These are undocumented environment variables, hidden commands, and configuration patterns that let you control behaviour, cut costs, and work more efficiently.
Environment Variable Reference
All undocumented environment variables extracted from the source. Each variable is explained in detail below the table.
| Variable | Default | Category | Description | Impact |
|---|---|---|---|---|
| `CLAUDE_CODE_SIMPLE` | unset | Prompt | Strips system prompt to minimal version | High |
| `CLAUDE_CODE_AUTO_COMPACT_WINDOW` | ctx - 33K | Compaction | Override auto-compact threshold (tokens) | Med |
| `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` | unset | Compaction | Trigger compaction at % of context used | Med |
| `DISABLE_AUTO_COMPACT` | unset | Compaction | Disable auto-compaction, keep manual | Med |
| `DISABLE_COMPACT` | unset | Compaction | Disable all compaction entirely | Low |
| `CLAUDE_CODE_DISABLE_1M_CONTEXT` | unset | Context | Force 200K window even on 1M models | High |
| `CLAUDE_CODE_MAX_CONTEXT_TOKENS` | model max | Context | Hard-cap maximum context tokens | Med |
| `CLAUDE_CODE_DISABLE_THINKING` | unset | Thinking | Disable extended thinking | Med |
| `DISABLE_INTERLEAVED_THINKING` | unset | Thinking | Disable reasoning between tool calls | Low |
| `DISABLE_BACKGROUND_TASKS` | unset | Background | Disable all background processes | Med |
| `CLAUDE_CODE_DISABLE_AUTO_MEMORY` | unset | Background | Disable auto-memory extraction only | Low |
| `CLAUDE_CODE_COORDINATOR_MODE` | unset | Mode | Multi-agent orchestrator mode | High |
| `CLAUDE_CODE_FORK_SUBAGENT` | unset | Mode | Fork full context into background agent | Med |
Set these as environment variables before launching Claude Code. You can either export them for the whole shell session or prefix them inline for a single run (e.g., `CLAUDE_CODE_SIMPLE=1 claude`).
Detailed Variable Reference
CLAUDE_CODE_SIMPLE
What it does: Strips the system prompt down to a minimal version. The full system prompt is massive (15+ sections covering identity, security guardrails, tool usage rules, task execution guidelines, style rules, and more). This flag bypasses nearly all of it, reducing input tokens significantly and making each API call faster and cheaper.
When to use: Scripted/automated pipelines where you send structured prompts and don't need the interactive hand-holding. Also useful for batch operations where you're driving Claude with precise instructions and want maximum throughput.
Not recommended for interactive use. Without its full instruction set, Claude loses all guardrails, style guidance, and tool usage tips. It may use tools incorrectly (e.g., running grep via Bash instead of using the Grep tool), skip safety checks on destructive operations, produce inconsistent output styles, and forget conventions like reading files before editing them.
CLAUDE_CODE_AUTO_COMPACT_WINDOW
What it does: Overrides the effective context window size used to calculate when auto-compaction triggers. The default formula is modelContextWindow - 33K. Setting this lower (e.g., 150000) forces compaction to trigger earlier, keeping per-turn token costs down.
When to use: Long-running sessions where you want to stay well under the context limit to maintain fast response times and avoid hitting the blocking limit.
Setting this too low causes frequent compactions, which means you lose conversation detail more often. Each compaction replaces your full history with a 9-section summary, so nuance and intermediate reasoning get lost. You may find Claude asking to re-read files it already had in context.
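The trigger math described above can be sketched directly; this treats the doc's "33K" as 33000 tokens:

```shell
# Auto-compact trigger = effective context window minus 33000 tokens.
default_trigger() {
  echo $(( $1 - 33000 ))
}

default_trigger 200000    # standard 200K window -> 167000
default_trigger 1000000   # extended 1M window   -> 967000
```

By the same formula, setting `CLAUDE_CODE_AUTO_COMPACT_WINDOW=150000` would move the trigger down to 117000 tokens.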
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE
What it does: Triggers auto-compaction when context usage reaches a specific percentage of the window, rather than using the absolute token threshold. For example, 85 means compact when 85% of the context window is used.
When to use: When you want a predictable compaction point that scales automatically regardless of which model or context window size you're using. This is often simpler to reason about than an absolute token number.
DISABLE_AUTO_COMPACT
What it does: Disables the automatic compaction trigger entirely. Compaction will only happen when you explicitly run /compact. The silent microcompaction (which cleans old tool results before each API call) still runs.
When to use: When you want full control over when context is summarised -- useful for sessions where you need Claude to retain precise details of earlier work and you're willing to manage the context window manually.
If you forget to manually compact, the conversation will eventually hit the blocking limit and Claude will be unable to respond until you run /compact. Long uncompacted contexts also cost significantly more per turn since every API call includes the full history.
DISABLE_COMPACT
What it does: Disables all compaction -- both automatic and manual. The /compact command itself becomes a no-op. No summarisation, no microcompaction, nothing.
When to use: Short, scripted sessions where you know the conversation will stay well within the context window and you want zero overhead from compaction logic.
If the conversation exceeds the context window, Claude will hit the blocking limit with no way to recover other than restarting the session. Only use this when you are certain the session will be short-lived.
CLAUDE_CODE_DISABLE_1M_CONTEXT
What it does: Forces Claude Code to use the standard 200K context window even when the model supports the extended 1M window. This means auto-compaction triggers at ~167K tokens instead of ~967K.
When to use: When you want to cap costs. The 1M context window means each turn can include up to 5x more tokens, which directly multiplies your API costs. Using the 200K window keeps per-turn costs predictable.
You lose the ability to hold large codebases or long conversations in context. If your work requires cross-referencing many files simultaneously, the 200K limit can force frequent compactions that lose important context.
CLAUDE_CODE_MAX_CONTEXT_TOKENS
What it does: Hard-caps the maximum context token count, overriding whatever the model's native window supports. For example, setting 500000 on a 1M model means compaction triggers at ~467K instead of ~967K.
When to use: A more flexible alternative to CLAUDE_CODE_DISABLE_1M_CONTEXT. Lets you pick an exact ceiling -- useful for budgeting or when you want some extended context but not the full 1M.
CLAUDE_CODE_DISABLE_THINKING
What it does: Disables Claude's extended thinking (the internal reasoning step before generating a response). Without extended thinking, Claude skips the deliberation phase and responds immediately, saving output tokens and latency.
When to use: Straightforward, repetitive tasks where the extra reasoning doesn't improve quality -- batch file edits, simple code generation, or scripted pipelines.
Extended thinking is what gives Claude the ability to reason through complex multi-step problems, architectural decisions, and debugging. Disabling it noticeably degrades quality on anything non-trivial. Claude is more likely to make mistakes, miss edge cases, and produce shallow solutions.
DISABLE_INTERLEAVED_THINKING
What it does: Prevents Claude from reasoning between tool calls. Normally, Claude can "think" between each tool use (e.g., reason about a file's contents before editing it). This flag forces Claude to chain tool calls without intermediate reasoning.
When to use: When you want faster tool execution chains and are willing to trade reasoning quality. Works well for predictable, linear workflows where Claude doesn't need to adapt between steps.
Without interleaved thinking, Claude cannot course-correct between tool calls. If one tool returns unexpected results, Claude will blindly proceed with the next step rather than reconsidering its approach. This can lead to cascading errors in multi-step tasks.
DISABLE_BACKGROUND_TASKS
What it does: Prevents Claude Code from spawning any background processes. This includes auto-memory extraction (the forked sub-agent that analyses conversations for memories), the "dream" system (background processing during idle periods), and any other background agent spawns.
When to use: Short scripted sessions, CI/CD pipelines, or any automated workflow where background processing is pure overhead. Also useful if you're debugging and want to eliminate all non-deterministic background activity.
You lose all automatic memory extraction. Claude won't remember anything from the session for future conversations unless you explicitly ask it to save a memory. The dream system, which pre-processes tasks during idle time, is also disabled.
CLAUDE_CODE_DISABLE_AUTO_MEMORY
What it does: Specifically disables the auto-memory extraction sub-agent without affecting other background tasks. The extraction agent normally forks after every N turns to analyse the conversation for memory-worthy information (corrections, preferences, project context).
When to use: When you want the dream system and other background tasks to remain active but don't want the cost of running the memory extraction agent. Useful if you manage your memories manually or don't use the memory system at all.
CLAUDE_CODE_COORDINATOR_MODE
What it does: Activates coordinator mode, where Claude acts as an orchestrator rather than doing work directly. In this mode, Claude delegates all tasks to worker sub-agents -- it researches via workers, synthesises their findings into implementation specs, then dispatches workers to implement and verify changes. The coordinator follows four phases: research, synthesis, implementation, verification.
When to use: Large, multi-file changes that benefit from parallel execution. The coordinator spawns multiple workers simultaneously, each operating in isolated worktrees, which can significantly speed up broad refactors or multi-component features.
Coordinator mode spawns multiple sub-agents, each consuming their own API calls and tokens. Costs can multiply quickly -- a single task might spawn 3-5 workers, each running their own conversation. The synthesis step is also critical: if the coordinator writes a poor spec, all workers execute on a flawed plan. Best paired with /plan mode first to review the approach before execution begins.
CLAUDE_CODE_FORK_SUBAGENT
What it does: Enables fork-based sub-agent spawning. When Claude uses the Agent tool without specifying a subagent_type, instead of spawning a fresh agent, it forks the entire current context into a background agent. The forked agent inherits the full conversation history, system prompt, and file state byte-for-byte, including prompt cache.
When to use: When you want a sub-agent that already understands what you've been working on -- no need to re-explain the context. The fork inherits everything, so the first API call to the forked agent is cache-warm.
Forked agents are constrained by safety rules: they cannot spawn their own sub-agents (no recursive forking), must execute directly rather than suggesting next steps, and are expected to commit changes and report a hash. The full context duplication also means the forked agent's first turn is expensive -- it includes everything from the parent conversation.
CLAUDE.md Optimisation Tricks
Where Your Instructions Actually Land
The source reveals the exact prompt order Claude sees. Your CLAUDE.md lands at position 10 out of 15+ sections, after nine sections of Anthropic's own instructions.
Don't repeat what sections 1-6 already cover (e.g., "be concise", "use tools properly"). Instead, add project-specific context that Claude cannot derive from the codebase. Every byte of CLAUDE.md is injected on every turn, so 1KB of CLAUDE.md = 1KB per API call.
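To put a number on that recurring cost, here is a rough sketch; the 4-characters-per-token ratio is a rule-of-thumb assumption, not a figure from the source:

```shell
# Estimate the per-turn token overhead of CLAUDE.md
# (assumption: ~4 characters per token, a common rule of thumb).
bytes=$(wc -c < CLAUDE.md 2>/dev/null || echo 0)
echo "CLAUDE.md adds roughly $((bytes / 4)) tokens to every API call"
```

A 4KB CLAUDE.md therefore costs on the order of a thousand input tokens per turn, every turn.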
Steal Anthropic's Internal Prompt Tricks
Internal Anthropic builds include numeric length anchors that reduced output tokens by ~1.2%. You can add similar instructions to your own CLAUDE.md.
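The source doesn't expose Anthropic's exact internal wording, but a hedged illustration of the pattern (the numbers and phrasing here are invented for demonstration) might look like:

```markdown
## Output length
- Default to answers of 1-3 sentences; never exceed 10 lines unless asked for detail.
- Commit message subjects: 50 characters or fewer.
```

The idea is that a concrete number anchors the model's output length far more reliably than a vague "be concise".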
Hidden Slash Commands
| Command | What It Does | Status |
|---|---|---|
| `/rewind` | Restore code AND conversation to a previous point (full undo) | Hidden |
| `/compact <instructions>` | Custom compaction with priority instructions | Public |
| `/context` | See what's actually in your context window | Public |
| `/stats` | Token usage, costs, timing statistics | Public |
| `/cost` | Accumulated session cost breakdown | Public |
| `/diff` | See what files changed this session | Public |
| `/export` | Export conversation transcript | Public |
| `/stickers` | Easter egg -- opens StickerMule for Claude Code stickers | Hidden |
Custom Compact Instructions
One of the most useful undocumented features: when running /compact, you can tell the compactor what to prioritise.
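For example (the wording here is illustrative, not a fixed syntax):

```text
/compact Keep the full plan for the auth refactor and every unresolved error; drop file-exploration detail.
```

The instructions are passed to the compaction summariser, so anything you name explicitly is far more likely to survive the summary.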
Memory System Tricks
MEMORY.md Has a 200-Line Hard Limit
Lines after 200 in MEMORY.md are silently truncated. If your index is bloated, Claude literally cannot see entries past line 200. Prune regularly.
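A quick way to check whether you are past the limit; the MEMORY.md path is an assumption, so adjust it to wherever yours actually lives:

```shell
# Lines beyond 200 in MEMORY.md are silently ignored -- count the overflow.
memfile="${MEMORY_FILE:-$HOME/.claude/memory/MEMORY.md}"   # path is an assumption
lines=$(wc -l < "$memfile" 2>/dev/null || echo 0)
if [ "$lines" -gt 200 ]; then
  echo "MEMORY.md: $((lines - 200)) line(s) past the 200-line limit are invisible"
else
  echo "MEMORY.md: $lines/200 lines used"
fi
```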
Memory Staleness Caveats (High Impact)
Memories older than 1 day get a caveat injected into the prompt that makes Claude second-guess them. If you have stable memories that shouldn't be doubted, reset the modification time of the memory file.
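A minimal sketch; the memory path is an assumption, so point it at wherever your MEMORY.md actually lives:

```shell
# Bump the modification time to "now" so the >1-day staleness caveat stops firing.
memfile="$HOME/.claude/memory/MEMORY.md"   # path is an assumption
mkdir -p "$(dirname "$memfile")"           # harmless if the directory already exists
touch "$memfile"
```

Note that `touch` creates the file if it is missing, so double-check the path before scripting this.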
Auto-Extraction Behaviour
- Runs as a forked sub-agent after every N turns
- Has a 5-turn budget max per extraction run
- Sandboxed to only write to the memory directory
- Skips if you manually wrote to memory in the same turn
- Throttled by GrowthBook feature flags
Explicitly saying "remember this" is more reliable than hoping auto-extraction catches it, because auto-extraction has throttling and conflict detection that can cause it to skip turns.
Tool Permission Tricks
Commands That Never Prompt
These read-only commands are auto-approved in default permission mode -- no dialog, no delay:
| Category | Auto-Approved Commands |
|---|---|
| Git | git log, git show, git diff, git status, git config --get, git branch, git remote |
| Files | cat, head, tail, less, more, file, strings, od, xxd, hexdump |
| Search | grep, rg, find, fd, locate, which, type |
| Docker | docker ps, docker logs, docker inspect, docker images |
| Other | xargs (read-only), jq, pyright |
Agent Optimisation
Explore Agent Uses Haiku (Cheapest Model)
The built-in Explore agent runs on Haiku for external builds. It's read-only, skips CLAUDE.md loading, and is designed for fire-and-forget codebase searches. Use it liberally without worrying about cost.
Build Custom Agents on Haiku (Medium Impact)
Create cheap, specialised agents in `~/.claude/agents/` or `.claude/agents/`.
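A minimal agent definition might look like this, saved as e.g. `.claude/agents/test-runner.md` (a sketch using the commonly documented frontmatter fields; the filename and wording are illustrative):

```markdown
---
name: test-runner
description: Runs the test suite and reports failures concisely
tools: Bash, Read, Grep
model: haiku
---
Run the project's test suite. Report only failing tests, each with a
one-line cause, and suggest the smallest fix.
```

Pinning `model: haiku` keeps repetitive runs cheap, and restricting `tools` limits what the agent can touch.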
Full Agent Frontmatter Options
Cost Reduction Cheat Sheet
Ordered by estimated impact:
1. `CLAUDE_CODE_SIMPLE=1` for automated/scripted pipelines -- massive prompt reduction
2. Use the Explore agent for searches -- runs on Haiku, skips CLAUDE.md
3. Keep CLAUDE.md short -- it's injected on every turn
4. Custom compact instructions -- `/compact keep only X` prevents re-doing work
5. Create Haiku-based custom agents for repetitive tasks (linting, testing, searching)
6. Set `CLAUDE_CODE_MAX_CONTEXT_TOKENS` lower if you don't need the full window
7. Disable auto-memory (`CLAUDE_CODE_DISABLE_AUTO_MEMORY=1`) if you don't use it
8. Disable background tasks (`DISABLE_BACKGROUND_TASKS=1`) for short scripted sessions
Compaction Tricks
Microcompaction Runs Silently
Two types happen before every API call without you knowing:
- Cached microcompaction: deletes the oldest tool results via API `cache_edits` (preserves the prompt cache)
- Time-based microcompaction: if idle for more than 60 minutes, clears old tool results (the cache has expired anyway)
If you step away for an hour and come back, your oldest tool results have been silently removed. This is why Claude sometimes says "let me re-read that file" after breaks -- the earlier read result was garbage-collected.
Compaction Circuit Breaker
After 3 consecutive auto-compact failures, the system stops retrying. If compaction seems stuck, run /compact manually to reset the circuit breaker.