# Claude Code keeps losing context after compaction — here's how to fix it
Every Claude Code session starts fresh. After context compaction, your agent forgets the architecture, the conventions, the decisions you made 30 minutes ago. You re-explain the same things every session. The problem isn't Claude — it's that code knowledge lives in the context window, and the context window is temporary by design.
## What actually happens during compaction
Claude Code runs on a finite context window. Opus gives you roughly 200K tokens. That sounds like a lot until you start reading files, running searches, and iterating on a real codebase — which is the entire point of using Claude Code.
Here's how the budget gets spent in a typical session:
| Activity | Tokens | Cumulative |
|---|---|---|
| System prompt + CLAUDE.md | ~4K | 4K |
| You explain the task | ~1K | 5K |
| Agent reads 3 files | ~12K | 17K |
| Agent searches, reads 4 more files | ~18K | 35K |
| First edit + test cycle | ~25K | 60K |
| Second iteration (more reads, more edits) | ~40K | 100K |
| Third iteration | ~45K | 145K |
| You ask a follow-up question | ~2K | 147K |
| Agent reads more files to answer | ~20K | 167K |
At some point between 150K and 200K, Claude Code triggers compaction. Earlier messages get summarized. The summaries are lossy. And here's the part that matters: file contents and tool outputs get dropped first.
After compaction, the agent retains a general sense of what happened — "I edited auth.ts to fix a token refresh bug" — but loses the specifics. Which files did it read? What was the exact function signature? What was in that config file that constrained the solution? What other approach did it consider and reject, and why?
All of that is gone. The agent doesn't know it's gone. It continues working with a vague summary where detailed knowledge used to be.
## The five things your agent forgets
From nine months of building with Claude Code on a real codebase, these are the categories of knowledge that consistently get lost after compaction:
- Which files matter. In a repo with 1,000+ files, maybe 40 of them are structurally important — the core modules, the main entry points, the shared types. After compaction, the agent has no way to distinguish OrderProcessor.ts from OrderProcessor.test.fixtures.ts. It treats the codebase as flat.
- What was decided. "We're not using the deprecated v1 auth flow because it doesn't support refresh tokens." That decision, made 45 minutes ago after reading three files, is now a faint summary. The agent might re-discover the v1 flow and try to use it.
- What's risky to change. Some files have 30 downstream dependents. Some have zero. After compaction, the agent doesn't remember which is which. It edits a core utility with the same confidence it edits a leaf component.
- What it already tried. The agent explored an approach, hit a wall, backed off. After compaction, it may explore the same dead end again. You watch it re-discover the same constraint you both already worked through.
- The conventions. Error handling patterns, naming conventions, import ordering, test structure. These aren't in CLAUDE.md (they're too detailed and numerous). The agent learned them by reading code in the first half of the session. Now they're gone.
## Why "just read the file again" doesn't work
The obvious fix is: if the agent lost context on a file, it can just read it again. This is what happens in practice, and it's worse than it sounds.
The first problem is that the agent doesn't remember which files to read. It knows vaguely that it was working on "auth," but was the implementation in src/auth/token.ts or src/services/auth-service.ts or lib/session/refresh.ts? In a large codebase, there might be a dozen plausible files. The agent picks one. Often it picks wrong.
The second problem is that each re-read burns context window budget. You're spending tokens to re-acquire knowledge the agent already had. This accelerates the next compaction, which triggers more re-reads, which accelerates the next compaction. It's a degenerative loop.
The third problem is that the agent reads indiscriminately. Without memory of what it learned before, it can't be selective. It reads entire files when it only needed one function. It reads test fixtures thinking they're implementation. It reads generated code — migration files, lock files, build artifacts — that a human would skip instinctively.
The fourth problem is the subtle one: re-reading isn't re-understanding. The first time the agent read those files, it was building a mental model. It saw how auth-service.ts calls token.ts which calls session-store.ts. That structural understanding — the dependency chain, the data flow — came from reading multiple files in sequence. Re-reading one file doesn't reconstruct the model. The agent gets facts without structure.
## The structural fix: intelligence that lives outside the context window
The context window is temporary storage. Compaction is a garbage collector you can't control. Any knowledge that only exists in the context window has a half-life measured in minutes during an active session.
The fix is to stop storing code intelligence in the context window entirely. Store it somewhere persistent — somewhere the agent can query on demand without re-reading files, without burning tokens to re-discover things, without losing decisions across compactions.
This is what we built Sverklo to do. It's an MCP server that builds and maintains a local code intelligence index — a SQLite database with the structural knowledge that compaction destroys. The agent queries sverklo instead of re-reading files. Sverklo's answers come from the index, not from the context window, so they survive compaction.
Concretely, here's what sverklo keeps that the context window doesn't:
### PageRank over the dependency graph
Sverklo builds a file-level dependency graph and runs PageRank over it. The result is a number for every file in the repo: how structurally central is this file? src/core/types.ts with 47 importers gets a high score. test/fixtures/mock-order.ts with zero importers gets a low score. When the agent asks "which files matter for auth?", sverklo returns results ranked by structural importance, not just textual relevance.
After compaction, the agent doesn't remember that OrderProcessor.ts is a critical hub. But sverklo does. The PageRank scores are in SQLite, not in the context window.
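To make the idea concrete, here is a minimal sketch of file-level PageRank over an import graph. The graph, file names, damping factor, and iteration count are illustrative assumptions, not sverklo's actual implementation:

```typescript
// Sketch: PageRank over a file dependency graph.
// imports maps each file to the files it imports.
type Graph = Map<string, string[]>;

function pageRank(imports: Graph, damping = 0.85, iterations = 50): Map<string, number> {
  const files = [...imports.keys()];
  const n = files.length;
  let rank = new Map<string, number>(files.map((f) => [f, 1 / n]));

  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>(files.map((f) => [f, (1 - damping) / n]));
    for (const [file, deps] of imports) {
      // A file "votes" for the files it imports, splitting its rank among them.
      const share = (rank.get(file)! * damping) / (deps.length || 1);
      for (const dep of deps) {
        if (next.has(dep)) next.set(dep, next.get(dep)! + share);
      }
    }
    rank = next;
  }
  return rank;
}

// Hypothetical mini-repo: three files import the core types module.
const graph: Graph = new Map([
  ["src/core/types.ts", []],
  ["src/auth/token.ts", ["src/core/types.ts"]],
  ["src/services/auth-service.ts", ["src/core/types.ts", "src/auth/token.ts"]],
  ["test/fixtures/mock-order.ts", ["src/core/types.ts"]],
]);
const scores = pageRank(graph);
// src/core/types.ts, with three importers, ends up with the highest score;
// files nothing imports sit at the floor score.
```

The useful property is that the score is purely structural: it falls out of who imports whom, so it survives in the index no matter what the context window forgets.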
### Hybrid search that works on the first query
Sverklo combines BM25 keyword search, vector similarity (ONNX embeddings, local, no API calls), and PageRank into a single ranked list using Reciprocal Rank Fusion. The agent calls sverklo_search with a natural-language query and gets back the right code on the first try — not the fifth.
This matters after compaction because the agent's first instinct is to search. If the search returns the wrong files, the agent reads the wrong files, builds the wrong mental model, and writes the wrong code. Getting the first search right is the highest-leverage intervention.
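Reciprocal Rank Fusion itself is simple enough to sketch. Each ranked list contributes 1/(k + rank) per document, so a file that ranks well across multiple retrievers beats one that tops a single list. The k = 60 constant is the conventional choice; the file names are illustrative:

```typescript
// Sketch of Reciprocal Rank Fusion over several ranked result lists.
function rrf(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, i) => {
      // Each list contributes 1 / (k + rank); agreement across lists adds up.
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([doc]) => doc);
}

// Hypothetical rankings from the three retrievers:
const bm25 = ["src/auth/token.ts", "src/services/auth-service.ts", "docs/auth.md"];
const vector = ["src/services/auth-service.ts", "src/auth/token.ts", "lib/session/refresh.ts"];
const pagerank = ["src/core/types.ts", "src/auth/token.ts", "src/services/auth-service.ts"];

const fused = rrf([bm25, vector, pagerank]);
// token.ts ranks 1st, 2nd, and 2nd across the lists, so it rises to the top.
```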
### Bi-temporal memory
When the agent makes a decision — "we're using the v2 auth flow because v1 doesn't support refresh" — it can call sverklo_remember to persist that decision. The memory is stored in SQLite, tagged with the current git SHA. After compaction, the agent calls sverklo_recall and the decision is still there.
The "bi-temporal" part: sverklo tracks both when the decision was made and which code state it applies to. If you check out a different branch, sverklo knows which memories are valid for that branch's code. If the code changes in a way that invalidates a decision, the memory is flagged. This is strictly more useful than a flat text file of notes because the validity is tied to the codebase, not to a timestamp.
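A minimal sketch of the bi-temporal idea, assuming hypothetical field names (this is not sverklo's actual schema, and real validity checking would be smarter than exact-SHA matching):

```typescript
// Two time axes per record: when the note was stored, and which code
// state it applies to.
interface Memory {
  text: string;
  recordedAt: string;   // transaction time: when the decision was stored
  validForSha: string;  // code time: the git SHA the decision was made against
  invalidated: boolean; // flipped if later code changes contradict the note
}

const memories: Memory[] = [];

function remember(text: string, sha: string): void {
  memories.push({
    text,
    recordedAt: new Date().toISOString(),
    validForSha: sha,
    invalidated: false,
  });
}

function recall(query: string, currentSha: string): Memory[] {
  // Naive substring match stands in for real search; the point is the
  // validity filter tied to the code state, not the matching.
  return memories.filter(
    (m) => !m.invalidated && m.text.includes(query) && m.validForSha === currentSha
  );
}

remember("use v2 auth flow: v1 lacks refresh tokens", "abc123");
// Recalling against the same code state returns the decision...
recall("auth", "abc123").length; // 1
// ...but against a different SHA the memory is not treated as valid.
recall("auth", "def456").length; // 0
```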
### Impact analysis
Before the agent edits a file, it can call sverklo_impact to see the blast radius — every file that imports this one, transitively. After compaction, the agent doesn't remember the dependency graph. But sverklo has it indexed. The agent can check "if I change this function signature, what breaks?" without re-reading the entire import tree.
This is the safety net that compaction removes. A fresh agent treats every file as equally safe to edit. An agent with sverklo knows that changing src/core/types.ts touches 47 downstream files and proceeds accordingly.
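Under the hood, blast radius is a transitive walk over reverse import edges. A sketch, with an illustrative graph and hypothetical function name:

```typescript
// importers maps each file to the files that import it (reverse edges).
type Importers = Map<string, string[]>;

function blastRadius(importers: Importers, changed: string): Set<string> {
  const affected = new Set<string>();
  const queue = [changed];
  while (queue.length > 0) {
    const file = queue.pop()!;
    for (const dependent of importers.get(file) ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent); // this file sees the change...
        queue.push(dependent);   // ...and so does everything that imports it
      }
    }
  }
  return affected;
}

const importers: Importers = new Map([
  ["src/core/types.ts", ["src/auth/token.ts", "src/orders/OrderProcessor.ts"]],
  ["src/auth/token.ts", ["src/services/auth-service.ts"]],
]);

blastRadius(importers, "src/core/types.ts");
// → token.ts, OrderProcessor.ts, and (transitively) auth-service.ts
```

Because the reverse edges live in the index, this answer costs one tool call instead of re-reading the import tree.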
## What this looks like in practice
Setup takes about two minutes:
```bash
npm install -g sverklo
cd your-project && sverklo init
```
That's it. Sverklo indexes the codebase (typically 10-30 seconds for a repo under 5,000 files), builds the dependency graph, computes PageRank, and starts the MCP server. Claude Code auto-discovers it.
After compaction, the difference is concrete:
| Without sverklo | With sverklo |
|---|---|
| Agent searches with grep, gets 200 results, reads the wrong 3 files | Agent calls sverklo_search, gets 5 ranked results with PageRank boosting, reads the right file first |
| Agent forgets the v1/v2 auth decision, explores v1 again | Agent calls sverklo_recall("auth"), gets the decision back in 1 tool call |
| Agent edits a core type with no awareness of downstream impact | Agent calls sverklo_impact("src/core/types.ts"), sees 47 dependents, adjusts approach |
| Agent re-reads 6 files to rebuild context (~12K tokens burned) | Agent calls sverklo_search + sverklo_recall (~800 tokens for both responses) |
The last row is the one that compounds. Fewer tokens spent on re-acquiring context means more tokens available for actual work, which means compaction happens later, which means fewer re-acquisitions. The loop inverts from degenerative to virtuous.
## The dogfood proof
We use sverklo to develop sverklo. Across a three-session dogfooding run in which sverklo reviewed its own codebase, the persistent memory and search caught 4 bugs that fresh sessions would have missed — cases where the agent needed to remember a constraint from session 1 to recognize the bug in session 3.
One example: the agent discovered in session 1 that a particular file-watcher debounce was set to 100ms, which caused duplicate index updates on fast saves. It remembered this via sverklo_remember. In session 3, while working on a completely different feature, the agent called sverklo_recall before editing the watcher code, got the debounce constraint back, and avoided re-introducing the issue. Without persistent memory, session 3's agent would have had zero context on why that debounce existed.
This is the pattern persistent intelligence enables: knowledge that accumulates across sessions instead of resetting every time the context window fills up.
## What sverklo doesn't fix
I want to be clear about the limits.
Compaction is a real LLM limitation, not a bug. Context windows are finite. Summarization is lossy. This is the architecture. Sverklo doesn't make the context window bigger or prevent compaction from happening. It moves the most valuable knowledge out of the blast radius.
You still need CLAUDE.md. Sverklo handles dynamic knowledge — decisions, search results, structural analysis. Static instructions ("always use pnpm", "tests go in `__tests__/`", "we use Zod for validation") still belong in CLAUDE.md where they're loaded into every session's system prompt. Sverklo and CLAUDE.md are complementary, not competing.
Very small codebases don't need this. If your entire project is 20 files and fits comfortably in the context window without compaction, you won't see much benefit. The pain scales with codebase size. If you're working on a 50-file project and compaction isn't biting you, save your time.
The agent still has to query sverklo. Sverklo doesn't inject knowledge into the context window proactively. The agent has to call sverklo_search or sverklo_recall. In practice, Claude Code does this naturally when sverklo is configured as an MCP server — the tools show up alongside the built-in tools, and the agent uses them when they're relevant. But it's pull, not push.
## Why this matters now
Six months ago, most people using Claude Code were working on small projects or doing one-off tasks. Context loss wasn't a top-5 pain point because sessions were short and codebases were small.
That's changed. Teams are using Claude Code on production codebases with hundreds of thousands of lines. Sessions are getting longer as tasks get more complex. Multi-step refactors that span an hour. Architecture decisions that build on each other. Code reviews where the agent needs to understand the full picture.
In this world, compaction isn't an edge case. It's the normal state of any serious session. And without persistent intelligence, every compaction is a partial reset — the agent gets dumber as the session gets longer, which is the exact inverse of what you want.
The context window is a cache. Treat it like one. Keep the source of truth somewhere that doesn't evict under pressure.
## Try sverklo on your codebase
```bash
npm install -g sverklo
cd your-project && sverklo init
```
Next time Claude Code compacts, sverklo_recall still has your decisions. sverklo_search still finds the right code. sverklo_impact still knows the blast radius.