
Claude Code Troubleshooting on Large Repos — 6 Failure Modes and Fixes

2026-05-09 ~12 min read by Nikita Groshin

Claude Code is the strongest agentic coding tool I've used. On a 30-file project it's near-magical. On a 4,000-file repo it falls into specific, repeatable failure modes — files it can't find, function names it invents, tool calls it cascades, decisions it forgets, context windows it exhausts. Each failure has a name, a cause, and a fix. Six of them, in order of how often I've watched them happen, with the data behind each and links to the deeper writeups where they exist.

If you only read one section: grep accounts for 41% of Claude Code's input-token spend. The single highest-leverage fix is replacing grep cascades with a typed symbol lookup. Everything else is downstream.
In this guide
  1. Claude Code stops finding files
  2. Claude Code hallucinates function names
  3. Claude Code burns tokens on grep cascades
  4. Claude Code forgets yesterday's decisions
  5. Claude Code keeps repeating the same grep
  6. Claude Code's context window fills up

1. Claude Code stops finding files

"the file you're describing doesn't exist in this repository"

You added src/auth/refresh.ts ten minutes ago. The agent can see you typing about it. But when you ask it to "open the auth refresh file" it returns the equivalent of file not found.

Two causes, often combined:

Stale mental model. Claude Code internally maintains a soft model of your repo's structure based on the last directory listing it observed. New files added after that listing are invisible until something forces a re-read. The agent doesn't know it doesn't know.

Implicit path exclusions. The agent's earlier searches likely excluded test/, dist/, vendor/, build/, and similar directories as a noise-reduction heuristic. Once excluded, those paths stay invisible for the rest of the session — even if the file you're asking about lives there.

The fix: a live symbol index

Don't rely on the agent's cached directory state. Expose a query surface that always reflects current disk state — a file listing and a symbol lookup that hit the filesystem on every call instead of trusting a remembered snapshot.

The structural fix is exposing these as MCP tools so the agent can re-query disk on demand. Deep dive on the hallucination side of this failure: Why Claude Code Hallucinates Function Names That Don't Exist In Your Codebase.
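
A minimal sketch of what such a live lookup might look like — the function name and exclusion list are illustrative, not sverklo's actual implementation; the point is that every call re-reads the directory tree rather than a cached listing:

// Hypothetical live file lookup — re-reads the directory tree on every call,
// so a file created a minute ago is immediately visible to the agent.
// (Illustrative only; not sverklo's actual tool implementation.)
import { readdirSync } from "node:fs";
import { join } from "node:path";

const SKIP = new Set(["node_modules", ".git", "dist", "build", "vendor"]);

export function findFiles(root: string, query: string): string[] {
  const hits: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      if (SKIP.has(entry.name)) continue; // exclusions are explicit and re-applied per call, not sticky
      const full = join(dir, entry.name);
      if (entry.isDirectory()) walk(full);
      else if (full.toLowerCase().includes(query.toLowerCase())) hits.push(full);
    }
  };
  walk(root);
  return hits; // always reflects current disk state
}

// findFiles(".", "refresh")  ->  ["src/auth/refresh.ts", ...]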

2. Claude Code hallucinates function names

"getUserByEmail() is defined in src/users.ts" — except your codebase calls it findByEmail

The agent confidently writes code that calls getUserByEmail(). Your codebase has findByEmail(). Tests pass because the test mocks the dependency. The PR ships. Production breaks.

This is a generation-from-priors failure. Claude saw thousands of getUserBy* functions in training data; your codebase's actual conventions are a single context-window away. After compaction, the priors win.

The fix: ground generations in your real symbol graph

Three layers, in order of effectiveness:

  1. Install a code-intelligence MCP server. Sverklo, jcodemunch-mcp, serena, Claude-Context — pick one. The agent gets a lookup tool that verifies identifiers against a tree-sitter-parsed symbol table before writing them (sketched after this list). Measurable effect: 37% fewer hallucinated imports on our bench.
  2. Keep your CLAUDE.md concrete. Real paths, real identifiers, real conventions — not abstractions. Rules survive compaction better than examples; examples survive better than abstract guidance.
  3. Cap tool results at 2K tokens. Sessions with grep results over 8K tokens hallucinate 31% of the time vs 4% under 2K. The noise itself causes wrong answers downstream.
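
A minimal sketch of the verification step from item 1 — look the proposed name up in the indexed symbol table before emitting code, and surface near-misses when it isn't there. The types and matching heuristic here are illustrative, not any particular server's API:

// Check a proposed identifier against the repo's real symbol table before writing it.
// (Illustrative shape — a real server would back this with a tree-sitter-parsed index.)
interface SymbolEntry { name: string; file: string; kind: "function" | "class" | "type"; }

const words = (id: string) => id.split(/(?=[A-Z])/).map((w) => w.toLowerCase());

export function verifyIdentifier(name: string, symbols: SymbolEntry[]) {
  const hit = symbols.find((s) => s.name === name);
  if (hit) return { ok: true as const, hit };
  // No exact match: suggest symbols sharing camelCase words, instead of letting the prior win.
  const query = new Set(words(name));
  const near = symbols.filter((s) => words(s.name).some((w) => query.has(w))).map((s) => s.name);
  return { ok: false as const, didYouMean: near.slice(0, 5) };
}

// verifyIdentifier("getUserByEmail", index)
//   -> { ok: false, didYouMean: ["findByEmail"] }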

Deep dive: How I stopped Claude Code from hallucinating function names on a 4,000-file repo.

3. Claude Code burns tokens on grep cascades

14,200 input tokens consumed to locate one function

You ask Claude Code to find where parseConfig is defined. It runs grep -r "parseConfig". The grep returns 80 lines (definitions, calls, comments, test mocks). The agent runs three more greps to disambiguate. Each result re-feeds into context. By the time it produces an answer it has consumed 14,200 input tokens — about $0.04 on Claude Sonnet at current pricing. Multiply by 200 invocations a day across an engineering team and the math gets loud.
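
Back-of-envelope, assuming Sonnet-class input pricing of roughly $3 per million input tokens (the assumption behind the $0.04 figure above) and 22 working days a month:

// One grep cascade, priced as input tokens only.
const tokensPerCascade = 14_200;
const dollarsPerMillionInputTokens = 3;            // assumed Sonnet-class input rate

const perCascade = (tokensPerCascade / 1_000_000) * dollarsPerMillionInputTokens; // ≈ $0.043
const perDay = perCascade * 200;                   // 200 invocations/day ≈ $8.52
const perMonth = perDay * 22;                      // ≈ $187/month, before output tokens or retries

console.log({ perCascade, perDay, perMonth });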

This is the single most common failure mode and the one with the largest measurable impact:

| Sample | Tokens | Hallucination rate |
| --- | --- | --- |
| Grep results <2K tokens | ≈1,800 | 4% |
| Grep results 2K–8K tokens | ≈4,400 | 14% |
| Grep results >8K tokens | ≈11,200 | 31% |

Sample: 312 Claude Code tasks across one week, 200-file TypeScript repo. Full methodology + raw data in the field study.

The fix: typed retrieval instead of grep cascades

Replace grep with a structured symbol lookup that returns ranked results scoped by symbol type. Sverklo's hybrid retrieval (BM25 over chunk content + cosine similarity over ONNX embeddings + PageRank-weighted file ranking) returns the canonical definition with ~95% fewer tokens than grep.
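
For intuition, here's how the three signals might be combined into one ranking — the weights and normalization are illustrative, not sverklo's actual scoring function:

// Hybrid ranking: lexical (BM25) + semantic (embedding cosine) + structural (PageRank).
// Weights are illustrative.
interface Candidate {
  chunk: string;     // the code chunk that would be returned to the agent
  bm25: number;      // lexical score over chunk content
  cosine: number;    // query-vs-chunk embedding similarity, in [0, 1]
  pagerank: number;  // import-graph centrality of the containing file, in [0, 1]
}

export function rank(candidates: Candidate[]): Candidate[] {
  const maxBm25 = Math.max(...candidates.map((c) => c.bm25), 1e-9); // normalize BM25 into [0, 1]
  return candidates
    .map((c) => ({ c, score: 0.5 * (c.bm25 / maxBm25) + 0.35 * c.cosine + 0.15 * c.pagerank }))
    .sort((a, b) => b.score - a.score)
    .map(({ c }) => c);
}
// Only the top-ranked chunk or two ever reaches the agent — ~150 tokens instead of 80 grep lines.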

The single tool call that replaces a grep cascade:

# Before — grep cascade, 14,200 tokens
grep -r "parseConfig" src/
grep -r "parseConfig" --include="*.ts" src/
grep -r "parseConfig" -A 5 src/ | head -100

# After — one typed call, ~150 tokens
sverklo_lookup({ symbol: "parseConfig" })

On the public bench, sverklo's tools-per-task is 1.0; naive grep is 6.1. Same task, ~6× fewer tool calls.

Deep dive: Why Claude Code Burns So Many Tokens — A Field Study.

4. Claude Code forgets yesterday's design decisions

"Why are we using Prisma?" — when you decided that three weeks ago

Yesterday you and Claude Code spent an hour on a design decision: Prisma over Drizzle, with reasons. Today you ask a question downstream of that decision and the agent suggests the opposite. Compaction ate the rationale.

This is structural to how Claude Code's context window works. When conversation length approaches the limit, older turns get summarized into a compressed representation. Code-specific decisions — exact identifiers, file paths, type signatures, library trade-offs — are the first to get lossy because they look noisy to the compactor relative to active conversational state.

The fix: bi-temporal memory pinned to git SHAs

Persist decisions in a queryable layer that survives compaction. Sverklo's memory layer uses bi-temporal columns: every memory carries valid_from_sha + valid_until_sha + superseded_by. Updating a decision doesn't overwrite — it inserts a new row, sets valid_until_sha on the old one, and links them via superseded_by. Recall queries can ask "what's true now?" or "what was true at commit abc?" with equal precision.

# After a design decision — explicitly remember:
sverklo_remember "We chose Prisma over Drizzle for the typed-ORM surface"

# Six months later, after migrations:
sverklo_recall "ORM choice"  # returns current decision
sverklo_recall "ORM choice" --at-sha abc123  # what we believed at commit abc
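
Under the hood, the non-destructive update might look like this sketch — the column names follow the description above, but the thin Db wrapper and the SQL itself are illustrative, not sverklo's actual schema:

// Bi-temporal update: close out the old belief and link it forward; never delete or overwrite.
interface Db {
  run(sql: string, ...params: unknown[]): void;
  get<T>(sql: string, ...params: unknown[]): T | undefined;
}

export function remember(db: Db, key: string, content: string, currentSha: string): void {
  const open = db.get<{ id: number }>(
    "SELECT id FROM memories WHERE key = ? AND valid_until_sha IS NULL", key);

  // The new belief is valid from the current commit onward.
  db.run("INSERT INTO memories (key, content, valid_from_sha) VALUES (?, ?, ?)",
    key, content, currentSha);

  if (open) {
    // The old belief stays queryable ("what did we think at commit abc?") but is marked superseded.
    db.run("UPDATE memories SET valid_until_sha = ?, superseded_by = last_insert_rowid() WHERE id = ?",
      currentSha, open.id);
  }
}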

The pattern dates to relational databases in the 1990s. Applied to agent memory, it makes context compaction recoverable instead of destructive. Deep dive: Bi-temporal memory for AI coding agents and We Already Shipped Git-for-Agent-Memory — Bi-Temporal Beats Branch-Snapshot.

5. Claude Code keeps repeating the same grep

"let me search for that" — five times in a row

Watch a Claude Code session on a large repo and count how often it announces a search and then runs essentially the same grep with slightly different flags. Three to five repetitions per task is normal. Each one consumes tokens.

This is a tool-selection problem, not a search problem. The agent doesn't know which tool to reach for, so it falls back to the most general one (grep) and varies its parameters until something works. The deeper cause: too many MCP tools confuse selection (the agent freezes on which to use), too few force fallback to grep.

The fix: a slim, opinionated tool surface

Five tools cover roughly 80% of code-intel sessions.

Sverklo ships them as a named profile: SVERKLO_PROFILE=core exposes only those five, dropping the system-prompt tool-list size by 81% (8,016 → 1,522 tokens). The remaining 31 specialized tools stay hidden until you opt up.
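
The mechanism is simple enough to sketch — a named allow-list decides which tool definitions ever reach the system prompt. The profile contents here are illustrative; sverklo's actual core set may differ:

// Profile filtering: only a named subset of tool definitions is advertised to the model.
interface ToolDef { name: string; description: string; }

const PROFILES: Record<string, Set<string>> = {
  core: new Set(["overview", "lookup", "references", "remember", "recall"]), // illustrative names
};

export function visibleTools(all: ToolDef[], profile = process.env.SVERKLO_PROFILE): ToolDef[] {
  const allowed = profile ? PROFILES[profile] : undefined;
  if (!allowed) return all;                        // no profile: all 36 tools in the system prompt
  return all.filter((t) => allowed.has(t.name));   // core profile: 5 tools, ~1,522 tokens per turn
}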

Deep dive on the measurement: We Already Shipped MCP Code Mode — Sverklo's Tool Surface, Measured. Recipe page on combining profile-filtering with Anthropic's host-side defer_loading: Sverklo + Tool Search lazy-loading.

6. Claude Code's context window fills up

"To continue, please start a new conversation"

You've been working on a feature for two hours. Claude Code throws the soft-limit warning. You either start a new session and lose all the working context, or push past it and watch quality degrade as compaction lossily summarizes your past hour.

Three sources of context bloat, in descending order:

  1. Tool-call results — especially noisy grep output. See failure mode 3 above.
  2. Accumulated conversation history — every prior turn is re-sent with each request, so the prompt grows as the session runs.
  3. System prompt's tool definitions — every MCP server adds 1K–10K tokens of tool descriptions to every turn's input.
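
To see why the third source compounds, a quick back-of-envelope — the per-turn sizes are the measured ones from section 5; the 60-turn session length is an assumption:

// Tool definitions are re-sent on every turn, so their cost scales with session length.
const turns = 60;                  // assumed: a long feature session
const fullToolList = 8_016;        // tokens per turn with all 36 tools exposed
const coreProfile = 1_522;         // tokens per turn with SVERKLO_PROFILE=core

console.log(turns * fullToolList); // ≈ 481,000 tokens of tool descriptions alone
console.log(turns * coreProfile);  // ≈ 91,000 tokens for the same session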

The fix stack

Each of the previous five sections addresses a different bucket of context bloat. The cumulative effect is what matters:

| Source | Default cost | After fix |
| --- | --- | --- |
| Grep cascades (per task) | ~14,200 tokens | ~500 tokens (typed retrieval) |
| Tool-list system prompt (per turn) | ~8,016 tokens | ~1,522 tokens (SVERKLO_PROFILE=core) |
| Memory across sessions | Lost on compaction | Recoverable via bi-temporal recall |
| Hallucinated identifiers (per task) | 31% rate above 8K | 4% rate under 2K |

Combined, a typical session that used to hit the context-window soft limit around hour two now runs past hour five on the same context budget. The cost is one MCP server install and a profile env var.

What this guide is and isn't

This is a troubleshooting pillar — six of the most common failure modes Claude Code hits on real repos, with the data behind each and concrete fixes. The deep-dives are linked from each section; each one has its own measurement methodology and reproducer.

This is not a sverklo pitch. The fixes work with any code-intelligence MCP server (jcodemunch-mcp, serena, GitNexus, Claude-Context, sverklo). Sverklo is the one I maintain and have the most numbers for, so the examples lean that way; the patterns generalize. The public 5-baseline benchmark shows where each tool wins and loses, including the slices where sverklo loses to others.

Try the fix stack

npm install -g sverklo
SVERKLO_PROFILE=core sverklo init
# Then in your AI agent:
# Run sverklo_overview to see the codebase structure
# Run sverklo_lookup symbol:"parseConfig" instead of grep
# Run sverklo_remember to persist decisions across compactions

One install, one env var. Public bench · Recipe: profile + defer_loading · github.com/sverklo/sverklo
