# Claude Code keeps losing context after compaction — here's how to fix it
Every Claude Code session starts fresh. After context compaction, your agent forgets the architecture, the conventions, the decisions you made 30 minutes ago. You re-explain the same things every session. The problem isn't Claude — it's that code knowledge lives in the context window, and the context window is temporary by design.
## What actually happens during compaction
Claude Code runs on a finite context window. Opus gives you roughly 200K tokens. That sounds like a lot until you start reading files, running searches, and iterating on a real codebase — which is the entire point of using Claude Code.
Here's how the budget gets spent in a typical session:
| Activity | Tokens | Cumulative |
|---|---|---|
| System prompt + CLAUDE.md | ~4K | 4K |
| You explain the task | ~1K | 5K |
| Agent reads 3 files | ~12K | 17K |
| Agent searches, reads 4 more files | ~18K | 35K |
| First edit + test cycle | ~25K | 60K |
| Second iteration (more reads, more edits) | ~40K | 100K |
| Third iteration | ~45K | 145K |
| You ask a follow-up question | ~2K | 147K |
| Agent reads more files to answer | ~20K | 167K |
At some point between 150K and 200K, Claude Code triggers compaction. Earlier messages get summarized. The summaries are lossy. And here's the part that matters: file contents and tool outputs get dropped first.
After compaction, the agent retains a general sense of what happened — "I edited auth.ts to fix a token refresh bug" — but loses the specifics. Which files did it read? What was the exact function signature? What was in that config file that constrained the solution? What other approach did it consider and reject, and why?
All of that is gone. The agent doesn't know it's gone. It continues working with a vague summary where detailed knowledge used to be.
## The five things your agent forgets
From nine months of building with Claude Code on a real codebase, these are the categories of knowledge that consistently get lost after compaction:
- Which files matter. In a repo with 1,000+ files, maybe 40 of them are structurally important — the core modules, the main entry points, the shared types. After compaction, the agent has no way to distinguish OrderProcessor.ts from OrderProcessor.test.fixtures.ts. It treats the codebase as flat.
- What was decided. "We're not using the deprecated v1 auth flow because it doesn't support refresh tokens." That decision, made 45 minutes ago after reading three files, is now a faint summary. The agent might re-discover the v1 flow and try to use it.
- What's risky to change. Some files have 30 downstream dependents. Some have zero. After compaction, the agent doesn't remember which is which. It edits a core utility with the same confidence it edits a leaf component.
- What it already tried. The agent explored an approach, hit a wall, backed off. After compaction, it may explore the same dead end again. You watch it re-discover the same constraint you both already worked through.
- The conventions. Error handling patterns, naming conventions, import ordering, test structure. These aren't in CLAUDE.md (they're too detailed and numerous). The agent learned them by reading code in the first half of the session. Now they're gone.
## Why "just read the file again" doesn't work
The obvious fix is: if the agent lost context on a file, it can just read it again. This is what happens in practice, and it's worse than it sounds.
The first problem is that the agent doesn't remember which files to read. It knows vaguely that it was working on "auth," but was the implementation in src/auth/token.ts or src/services/auth-service.ts or lib/session/refresh.ts? In a large codebase, there might be a dozen plausible files. The agent picks one. Often it picks wrong.
The second problem is that each re-read burns context window budget. You're spending tokens to re-acquire knowledge the agent already had. This accelerates the next compaction, which triggers more re-reads, which accelerates the next compaction. It's a degenerative loop.
The third problem is that the agent reads indiscriminately. Without memory of what it learned before, it can't be selective. It reads entire files when it only needed one function. It reads test fixtures thinking they're implementation. It reads generated code — migration files, lock files, build artifacts — that a human would skip instinctively.
The fourth problem is the subtle one: re-reading isn't re-understanding. The first time the agent read those files, it was building a mental model. It saw how auth-service.ts calls token.ts which calls session-store.ts. That structural understanding — the dependency chain, the data flow — came from reading multiple files in sequence. Re-reading one file doesn't reconstruct the model. The agent gets facts without structure.
## The structural fix: intelligence that lives outside the context window
The context window is temporary storage. Compaction is a garbage collector you can't control. Any knowledge that only exists in the context window has a half-life measured in minutes during an active session.
The fix is to stop storing code intelligence in the context window entirely. Store it somewhere persistent — somewhere the agent can query on demand without re-reading files, without burning tokens to re-discover things, without losing decisions across compactions.
This is what we built Sverklo to do. It's an MCP server that builds and maintains a local code intelligence index — a SQLite database with the structural knowledge that compaction destroys. The agent queries sverklo instead of re-reading files. Sverklo's answers come from the index, not from the context window, so they survive compaction.
Concretely, here's what sverklo keeps that the context window doesn't:
### PageRank over the dependency graph
Sverklo builds a file-level dependency graph and runs PageRank over it. The result is a number for every file in the repo: how structurally central is this file? src/core/types.ts with 47 importers gets a high score. test/fixtures/mock-order.ts with zero importers gets a low score. When the agent asks "which files matter for auth?", sverklo returns results ranked by structural importance, not just textual relevance.
After compaction, the agent doesn't remember that OrderProcessor.ts is a critical hub. But sverklo does. The PageRank scores are in SQLite, not in the context window.
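To make the idea concrete, here is a minimal sketch of file-level PageRank over an import graph. The graph, file names, damping factor, and iteration count are illustrative assumptions, not sverklo's actual implementation:

```typescript
// Sketch: PageRank over a file dependency graph.
// imports maps each file to the files it imports.
type Graph = Map<string, string[]>;

function pageRank(imports: Graph, damping = 0.85, iterations = 50): Map<string, number> {
  const files = [...imports.keys()];
  const n = files.length;
  let rank = new Map<string, number>(files.map((f) => [f, 1 / n]));

  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>(files.map((f) => [f, (1 - damping) / n]));
    for (const [file, deps] of imports) {
      // A file "votes" for the files it imports, splitting its rank among them.
      const share = (rank.get(file)! * damping) / (deps.length || 1);
      for (const dep of deps) {
        if (next.has(dep)) next.set(dep, next.get(dep)! + share);
      }
    }
    rank = next;
  }
  return rank;
}

// Hypothetical mini-repo: three files import the core types module.
const graph: Graph = new Map([
  ["src/core/types.ts", []],
  ["src/auth/token.ts", ["src/core/types.ts"]],
  ["src/services/auth-service.ts", ["src/core/types.ts", "src/auth/token.ts"]],
  ["test/fixtures/mock-order.ts", ["src/core/types.ts"]],
]);
const scores = pageRank(graph);
// src/core/types.ts, with three importers, ends up with the highest score;
// files nothing imports sit at the floor score.
```

The useful property is that the score is purely structural: it falls out of who imports whom, so it survives in the index no matter what the context window forgets.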
### Hybrid search that works on the first query
Sverklo combines BM25 keyword search, vector similarity (ONNX embeddings, local, no API calls), and PageRank into a single ranked list using Reciprocal Rank Fusion. The agent calls sverklo_search with a natural-language query and gets back the right code on the first try — not the fifth.
This matters after compaction because the agent's first instinct is to search. If the search returns the wrong files, the agent reads the wrong files, builds the wrong mental model, and writes the wrong code. Getting the first search right is the highest-leverage intervention.
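Reciprocal Rank Fusion itself is simple enough to sketch. Each ranked list contributes 1/(k + rank) per document, so a file that ranks well across multiple retrievers beats one that tops a single list. The k = 60 constant is the conventional choice; the file names are illustrative:

```typescript
// Sketch of Reciprocal Rank Fusion over several ranked result lists.
function rrf(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, i) => {
      // Each list contributes 1 / (k + rank); agreement across lists adds up.
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([doc]) => doc);
}

// Hypothetical rankings from the three retrievers:
const bm25 = ["src/auth/token.ts", "src/services/auth-service.ts", "docs/auth.md"];
const vector = ["src/services/auth-service.ts", "src/auth/token.ts", "lib/session/refresh.ts"];
const pagerank = ["src/core/types.ts", "src/auth/token.ts", "src/services/auth-service.ts"];

const fused = rrf([bm25, vector, pagerank]);
// token.ts ranks 1st, 2nd, and 2nd across the lists, so it rises to the top.
```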
### Bi-temporal memory
When the agent makes a decision — "we're using the v2 auth flow because v1 doesn't support refresh" — it can call sverklo_remember to persist that decision. The memory is stored in SQLite, tagged with the current git SHA. After compaction, the agent calls sverklo_recall and the decision is still there.
The "bi-temporal" part: sverklo tracks both when the decision was made and which code state it applies to. If you check out a different branch, sverklo knows which memories are valid for that branch's code. If the code changes in a way that invalidates a decision, the memory is flagged. This is strictly more useful than a flat text file of notes because the validity is tied to the codebase, not to a timestamp.
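A minimal sketch of the bi-temporal idea, assuming hypothetical field names (this is not sverklo's actual schema, and real validity checking would be smarter than exact-SHA matching):

```typescript
// Two time axes per record: when the note was stored, and which code
// state it applies to.
interface Memory {
  text: string;
  recordedAt: string;   // transaction time: when the decision was stored
  validForSha: string;  // code time: the git SHA the decision was made against
  invalidated: boolean; // flipped if later code changes contradict the note
}

const memories: Memory[] = [];

function remember(text: string, sha: string): void {
  memories.push({
    text,
    recordedAt: new Date().toISOString(),
    validForSha: sha,
    invalidated: false,
  });
}

function recall(query: string, currentSha: string): Memory[] {
  // Naive substring match stands in for real search; the point is the
  // validity filter tied to the code state, not the matching.
  return memories.filter(
    (m) => !m.invalidated && m.text.includes(query) && m.validForSha === currentSha
  );
}

remember("use v2 auth flow: v1 lacks refresh tokens", "abc123");
// Recalling against the same code state returns the decision...
recall("auth", "abc123").length; // 1
// ...but against a different SHA the memory is not treated as valid.
recall("auth", "def456").length; // 0
```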
### Impact analysis
Before the agent edits a file, it can call sverklo_impact to see the blast radius — every file that imports this one, transitively. After compaction, the agent doesn't remember the dependency graph. But sverklo has it indexed. The agent can check "if I change this function signature, what breaks?" without re-reading the entire import tree.
This is the safety net that compaction removes. A fresh agent treats every file as equally safe to edit. An agent with sverklo knows that changing src/core/types.ts touches 47 downstream files and proceeds accordingly.
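Under the hood, blast radius is a transitive walk over reverse import edges. A sketch, with an illustrative graph and hypothetical function name:

```typescript
// importers maps each file to the files that import it (reverse edges).
type Importers = Map<string, string[]>;

function blastRadius(importers: Importers, changed: string): Set<string> {
  const affected = new Set<string>();
  const queue = [changed];
  while (queue.length > 0) {
    const file = queue.pop()!;
    for (const dependent of importers.get(file) ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent); // this file sees the change...
        queue.push(dependent);   // ...and so does everything that imports it
      }
    }
  }
  return affected;
}

const importers: Importers = new Map([
  ["src/core/types.ts", ["src/auth/token.ts", "src/orders/OrderProcessor.ts"]],
  ["src/auth/token.ts", ["src/services/auth-service.ts"]],
]);

blastRadius(importers, "src/core/types.ts");
// → token.ts, OrderProcessor.ts, and (transitively) auth-service.ts
```

Because the reverse edges live in the index, this answer costs one tool call instead of re-reading the import tree.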
## What this looks like in practice
Setup takes about two minutes:
```bash
npm install -g sverklo
cd your-project && sverklo init
```
That's it. Sverklo indexes the codebase (typically 10-30 seconds for a repo under 5,000 files), builds the dependency graph, computes PageRank, and starts the MCP server. Claude Code auto-discovers it.
After compaction, the difference is concrete:
| Without sverklo | With sverklo |
|---|---|
| Agent searches with grep, gets 200 results, reads the wrong 3 files | Agent calls sverklo_search, gets 5 ranked results with PageRank boosting, reads the right file first |
| Agent forgets the v1/v2 auth decision, explores v1 again | Agent calls sverklo_recall("auth"), gets the decision back in 1 tool call |
| Agent edits a core type with no awareness of downstream impact | Agent calls sverklo_impact("src/core/types.ts"), sees 47 dependents, adjusts approach |
| Agent re-reads 6 files to rebuild context (~12K tokens burned) | Agent calls sverklo_search + sverklo_recall (~800 tokens for both responses) |
The last row is the one that compounds. Fewer tokens spent on re-acquiring context means more tokens available for actual work, which means compaction happens later, which means fewer re-acquisitions. The loop inverts from degenerative to virtuous.
## The dogfood proof
We use sverklo to develop sverklo. Across a three-session dogfooding run in which sverklo reviewed its own codebase, the persistent memory and search caught 4 bugs that fresh sessions would have missed — cases where the agent needed to remember a constraint from session 1 to recognize the bug in session 3.
One example: the agent discovered in session 1 that a particular file-watcher debounce was set to 100ms, which caused duplicate index updates on fast saves. It remembered this via sverklo_remember. In session 3, while working on a completely different feature, the agent called sverklo_recall before editing the watcher code, got the debounce constraint back, and avoided re-introducing the issue. Without persistent memory, session 3's agent would have had zero context on why that debounce existed.
This is the pattern persistent intelligence enables: knowledge that accumulates across sessions instead of resetting every time the context window fills up.
## What sverklo doesn't fix
I want to be clear about the limits.
Compaction is a real LLM limitation, not a bug. Context windows are finite. Summarization is lossy. This is the architecture. Sverklo doesn't make the context window bigger or prevent compaction from happening. It moves the most valuable knowledge out of the blast radius.
You still need CLAUDE.md. Sverklo handles dynamic knowledge — decisions, search results, structural analysis. Static instructions ("always use pnpm", "tests go in `__tests__/`", "we use Zod for validation") still belong in CLAUDE.md where they're loaded into every session's system prompt. Sverklo and CLAUDE.md are complementary, not competing.
Very small codebases don't need this. If your entire project is 20 files and fits comfortably in the context window without compaction, you won't see much benefit. The pain scales with codebase size. If you're working on a 50-file project and compaction isn't biting you, save your time.
The agent still has to query sverklo. Sverklo doesn't inject knowledge into the context window proactively. The agent has to call sverklo_search or sverklo_recall. In practice, Claude Code does this naturally when sverklo is configured as an MCP server — the tools show up alongside the built-in tools, and the agent uses them when they're relevant. But it's pull, not push.
## Why this matters now
Six months ago, most people using Claude Code were working on small projects or doing one-off tasks. Context loss wasn't a top-5 pain point because sessions were short and codebases were small.
That's changed. Teams are using Claude Code on production codebases with hundreds of thousands of lines. Sessions are getting longer as tasks get more complex. Multi-step refactors that span an hour. Architecture decisions that build on each other. Code reviews where the agent needs to understand the full picture.
In this world, compaction isn't an edge case. It's the normal state of any serious session. And without persistent intelligence, every compaction is a partial reset — the agent gets dumber as the session gets longer, which is the exact inverse of what you want.
The context window is a cache. Treat it like one. Keep the source of truth somewhere that doesn't evict under pressure.
## Try sverklo on your codebase
```bash
npm install -g sverklo
cd your-project && sverklo init
```
Next time Claude Code compacts, sverklo_recall still has your decisions. sverklo_search still finds the right code. sverklo_impact still knows the blast radius.