Claude Code Troubleshooting on Large Repos — 6 Failure Modes and Fixes
Claude Code is the strongest agentic coding tool I've used. On a 30-file project it's near-magical. On a 4,000-file repo it falls into specific, repeatable failure modes — files it can't find, function names it invents, tool calls it cascades, searches it repeats, decisions it forgets, context windows it exhausts. Each failure has a name, a cause, and a fix. Six of them, in order of how often I've watched them happen, with the data behind each and links to the deeper writeups where they exist.
1. Claude Code stops finding files
"the file you're describing doesn't exist in this repository"
You added src/auth/refresh.ts ten minutes ago and have mentioned it to the agent in this very session. But when you ask it to "open the auth refresh file" it returns the equivalent of file not found.
Two causes, often combined:
Stale mental model. Claude Code internally maintains a soft model of your repo's structure based on the last directory listing it observed. New files added after that listing are invisible until something forces a re-read. The agent doesn't know it doesn't know.
Implicit path exclusions. The agent's earlier searches likely excluded test/, dist/, vendor/, build/, and similar directories as a noise-reduction heuristic. Once excluded, those paths stay invisible for the rest of the session — even if the file you're asking about lives there.
The fix: a live symbol index
Don't rely on the agent's cached directory state. Expose a query surface that always reflects current disk state:
- sverklo_overview — top files by PageRank, language breakdown, hub files. Always reflects what's currently indexed; doesn't carry conversation memory.
- sverklo_lookup — find any symbol by name, no path filters. Returns canonical definition with location.
- sverklo_search — hybrid retrieval (BM25 + ONNX embeddings + PageRank) over current index.
The structural fix is exposing these as MCP tools so the agent can re-query disk on demand. Deep dive on the hallucination side of this failure: Why Claude Code Hallucinates Function Names That Don't Exist In Your Codebase.
2. Claude Code hallucinates function names
"getUserByEmail() is defined in src/users.ts" — except your codebase calls it findByEmail
The agent confidently writes code that calls getUserByEmail(). Your codebase has findByEmail(). Tests pass because the test mocks the dependency. The PR ships. Production breaks.
This is a generation-from-priors failure. Claude saw thousands of getUserBy* functions in training data; your codebase's actual conventions exist only in the current context window. After compaction, the priors win.
The fix: ground generations in your real symbol graph
Three layers, in order of effectiveness:
- Install a code-intelligence MCP server. Sverklo, jcodemunch-mcp, serena, Claude-Context — pick one. The agent gets a lookup tool that verifies identifiers against a tree-sitter-parsed symbol table before writing them. Measurable effect: 37% fewer hallucinated imports on our bench.
- Keep your CLAUDE.md concrete. Real paths, real identifiers, real conventions — not abstractions. Rules survive compaction better than examples; examples survive better than abstract guidance.
- Cap tool results at 2K tokens. Sessions with grep results over 8K tokens hallucinate 31% of the time vs 4% under 2K. The noise itself causes wrong answers downstream.
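The verification step a lookup tool performs is, at its core, a membership check against the real symbol table with a nearest-name fallback. A toy sketch (the `symbolTable` contents and the trigram similarity are my illustration, not any tool's actual implementation):

```typescript
// Toy symbol table standing in for a tree-sitter-parsed index of the repo.
const symbolTable = new Set(["findByEmail", "findById", "createUser"]);

// Before emitting a call, check it exists; otherwise suggest the closest real name.
function verifyIdentifier(
  name: string,
  table: Set<string>
): { ok: boolean; suggestion?: string } {
  if (table.has(name)) return { ok: true };
  // Cheap similarity: count shared lowercase trigrams.
  const grams = (s: string) =>
    new Set(
      Array.from({ length: Math.max(s.length - 2, 0) }, (_, i) =>
        s.toLowerCase().slice(i, i + 3)
      )
    );
  let best: string | undefined;
  let bestScore = 0;
  for (const candidate of table) {
    const a = grams(name);
    const b = grams(candidate);
    const overlap = [...a].filter((g) => b.has(g)).length;
    if (overlap > bestScore) {
      bestScore = overlap;
      best = candidate;
    }
  }
  return { ok: false, suggestion: best };
}
```

Here `verifyIdentifier("getUserByEmail", symbolTable)` fails and suggests `findByEmail` — exactly the correction the agent needs before the bad identifier reaches a diff.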
Deep dive: How I stopped Claude Code from hallucinating function names on a 4,000-file repo.
3. Claude Code burns tokens on grep cascades
14,200 input tokens consumed to locate one function
You ask Claude Code to find where parseConfig is defined. It runs grep -r "parseConfig". The grep returns 80 lines (definitions, calls, comments, test mocks). The agent runs three more greps to disambiguate. Each result re-feeds into context. By the time it produces an answer it has consumed 14,200 input tokens — about $0.04 on Claude Sonnet at current pricing. Multiply by 200 invocations a day across an engineering team and the math gets loud.
This is the single most common failure mode and the one with the largest measurable impact:
| Sample | Tokens | Hallucination rate |
|---|---|---|
| Grep results <2K tokens | ≈1,800 | 4% |
| Grep results 2K–8K tokens | ≈4,400 | 14% |
| Grep results >8K tokens | ≈11,200 | 31% |
Sample: 312 Claude Code tasks across one week, 200-file TypeScript repo. Full methodology + raw data in the field study.
The fix: typed retrieval instead of grep cascades
Replace grep with a structured symbol lookup that returns ranked results scoped by symbol type. Sverklo's hybrid retrieval (BM25 over chunk content + cosine similarity over ONNX embeddings + PageRank-weighted file ranking) returns the canonical definition with ~95% fewer tokens than grep.
The single tool call that replaces a grep cascade:
# Before — grep cascade, 14,200 tokens
grep -r "parseConfig" src/
grep -r "parseConfig" --include="*.ts" src/
grep -r "parseConfig" -A 5 src/ | head -100
# After — one typed call, ~150 tokens
sverklo_lookup({ symbol: "parseConfig" })
On the public bench, sverklo's tools-per-task is 1.0; naive grep is 6.1. Same task, ~6× fewer tool calls.
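The blend of the three signals can be sketched as a weighted sum over normalized scores. The weights and normalization below are my illustration of the general pattern, not sverklo's real implementation:

```typescript
interface Candidate {
  file: string;
  bm25: number;     // lexical match score over chunk content (unbounded)
  cosine: number;   // embedding similarity, already in [0, 1]
  pagerank: number; // import-graph centrality of the file, in [0, 1]
}

// Normalize BM25 into [0, 1] across the candidate set, then blend the three
// signals. Weights are illustrative; tune per corpus.
function rank(candidates: Candidate[]): Candidate[] {
  const maxBm25 = Math.max(...candidates.map((c) => c.bm25), 1e-9);
  const score = (c: Candidate) =>
    0.4 * (c.bm25 / maxBm25) + 0.4 * c.cosine + 0.2 * c.pagerank;
  return [...candidates].sort((a, b) => score(b) - score(a));
}
```

The point of blending: a definition in a high-PageRank hub file outranks a stray mention in a test mock even when both match lexically, which is what lets a single call return the canonical result.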
Deep dive: Why Claude Code Burns So Many Tokens — A Field Study.
4. Claude Code forgets yesterday's design decisions
"Why are we using Prisma?" — when you decided that three weeks ago
Yesterday you and Claude Code spent an hour on a design decision: Prisma over Drizzle, with reasons. Today you ask a question downstream of that decision and the agent suggests the opposite. Compaction ate the rationale.
This is structural to how Claude Code's context window works. When conversation length approaches the limit, older turns get summarized into a compressed representation. Code-specific decisions — exact identifiers, file paths, type signatures, library trade-offs — are the first to get lossy because they look noisy to the compactor relative to active conversational state.
The fix: bi-temporal memory pinned to git SHAs
Persist decisions in a queryable layer that survives compaction. Sverklo's memory layer uses bi-temporal columns: every memory carries valid_from_sha + valid_until_sha + superseded_by. Updating a decision doesn't overwrite — it inserts a new row, sets valid_until_sha on the old one, and links them via superseded_by. Recall queries can ask "what's true now?" or "what was true at commit abc?" with equal precision.
# After a design decision — explicitly remember:
sverklo_remember "We chose Prisma over Drizzle for the typed-ORM surface"
# Six months later, after migrations:
sverklo_recall "ORM choice" # returns current decision
sverklo_recall "ORM choice" --at-sha abc123 # what we believed at commit abc
The pattern dates to relational databases in the 1990s. Applied to agent memory, it makes context compaction recoverable instead of destructive. Deep dive: Bi-temporal memory for AI coding agents and We Already Shipped Git-for-Agent-Memory — Bi-Temporal Beats Branch-Snapshot.
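The insert-don't-overwrite update can be sketched over a plain array of rows. Column names follow the description above; the in-memory storage shape is my simplification (the real layer is a database, and as-of queries compare commit ancestry rather than raw strings):

```typescript
interface MemoryRow {
  id: number;
  text: string;
  valid_from_sha: string;
  valid_until_sha: string | null; // null = still current
  superseded_by: number | null;   // forward link for audit trails
}

const rows: MemoryRow[] = [];
let nextId = 1;

// Record a decision; close out the row it supersedes instead of overwriting it.
function remember(text: string, sha: string, supersedes?: MemoryRow): MemoryRow {
  const row: MemoryRow = {
    id: nextId++,
    text,
    valid_from_sha: sha,
    valid_until_sha: null,
    superseded_by: null,
  };
  if (supersedes) {
    supersedes.valid_until_sha = sha; // old belief ends at this commit
    supersedes.superseded_by = row.id;
  }
  rows.push(row);
  return row;
}

// "What's true now?" — rows that no later decision has closed out.
const current = () => rows.filter((r) => r.valid_until_sha === null);
```

Because the old row survives with its validity interval intact, "what did we believe at commit abc?" is an interval query over `valid_from_sha`/`valid_until_sha`, not an archaeology exercise through chat logs.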
5. Claude Code keeps repeating the same grep
"let me search for that" — five times in a row
Watch a Claude Code session on a large repo and count how often it announces a search and then runs essentially the same grep with slightly different flags. Three to five repetitions per task is normal. Each one consumes tokens.
This is a tool-selection problem, not a search problem. The agent doesn't know which tool to reach for, so it falls back to the most general one (grep) and varies its parameters until something works. The deeper cause: too many MCP tools confuse selection (the agent freezes on which to use), too few force fallback to grep.
The fix: a slim, opinionated tool surface
Five tools that cover 80% of code-intel sessions:
- sverklo_search — concept-level "where does X happen?" queries
- sverklo_lookup — exact-symbol "where is X defined?" queries
- sverklo_overview — "what's the structure of this codebase?" queries
- sverklo_refs — "who calls X?" queries
- sverklo_impact — "what breaks if I change X?" queries
Sverklo ships these as a named profile: SVERKLO_PROFILE=core exposes only those 5, dropping the system-prompt tool-list size by 81% (8,016 → 1,522 tokens). The remaining 31 specialized tools stay hidden until you opt in.
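Profile filtering is just an allow-list applied before tools are advertised to the host. A sketch of the mechanics (the profile contents mirror the list above; the tool names beyond the core five and the env-var handling are my simplification):

```typescript
// Full tool surface; only a few named here, the rest stand in for the
// specialized tools that stay hidden by default.
const allTools = [
  "sverklo_search", "sverklo_lookup", "sverklo_overview",
  "sverklo_refs", "sverklo_impact",
  "sverklo_reindex", "sverklo_memory_gc", // ...and other specialized tools
];

const profiles: Record<string, string[]> = {
  core: [
    "sverklo_search", "sverklo_lookup", "sverklo_overview",
    "sverklo_refs", "sverklo_impact",
  ],
};

// Only advertise the active profile's tools; everything else never enters
// the system prompt, so it costs zero tokens per turn.
function exposedTools(env: Record<string, string | undefined>): string[] {
  const allowed = env.SVERKLO_PROFILE ? profiles[env.SVERKLO_PROFILE] : undefined;
  if (!allowed) return allTools;
  return allTools.filter((t) => allowed.includes(t));
}
```

The design choice worth noting: filtering happens server-side at advertisement time, so the savings apply on every turn regardless of whether the host supports lazy tool loading.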
Deep dive on the measurement: We Already Shipped MCP Code Mode — Sverklo's Tool Surface, Measured. Recipe page on combining profile-filtering with Anthropic's host-side defer_loading: Sverklo + Tool Search lazy-loading.
6. Claude Code's context window fills up
"To continue, please start a new conversation"
You've been working on a feature for two hours. Claude Code throws the soft-limit warning. You either start a new session and lose all the working context, or push past it and watch quality degrade as compaction lossily summarizes your past hour.
Three sources of context bloat, in descending order:
- Tool-call results — especially noisy grep output. See failure mode 3 above.
- Accumulated conversation history — every turn the model sees grows the prompt.
- System prompt's tool definitions — every MCP server adds 1K–10K tokens of tool descriptions to every turn's input.
The fix stack
Each of the previous five sections addresses a different bucket of context bloat. The cumulative effect is what matters:
| Source | Default cost | After fix |
|---|---|---|
| Grep cascades (per task) | ~14,200 tokens | ~500 tokens (typed retrieval) |
| Tool-list system prompt (per turn) | ~8,016 tokens | ~1,522 tokens (SVERKLO_PROFILE=core) |
| Memory across sessions | Lost on compaction | Recoverable via bi-temporal recall |
| Hallucinated identifiers (per task) | 31% rate above 8K | 4% rate under 2K |
Combined, a typical session that previously hit the context-window soft limit around hour two now runs five-plus hours on the same context budget. The cost is one MCP server install and a profile env var.
What this guide is and isn't
This is a troubleshooting pillar — six of the most common failure modes Claude Code hits on real repos, with the data behind each and concrete fixes. The deep-dives are linked from each section; each one has its own measurement methodology and reproducer.
This is not a sverklo pitch. The fixes work with any code-intelligence MCP server (jcodemunch-mcp, serena, GitNexus, Claude-Context, sverklo). Sverklo is the one I maintain and have the most numbers for, so the examples lean that way; the patterns generalize. The public 5-baseline benchmark shows where each tool wins and loses, including the slices where sverklo loses to others.
Try the fix stack
npm install -g sverklo
SVERKLO_PROFILE=core sverklo init
# Then in your AI agent:
# Run sverklo_overview to see the codebase structure
# Run sverklo_lookup symbol:"parseConfig" instead of grep
# Run sverklo_remember to persist decisions across compactions
One install, one env var. Public bench · Recipe: profile + defer_loading · github.com/sverklo/sverklo
References
- Why Claude Code Burns So Many Tokens — Field Study (2026-05-03)
- How I stopped Claude Code from hallucinating on a 4,000-file repo (2026-04-09)
- Why Claude Code Hallucinates Function Names That Don't Exist (2026-04-09)
- Bi-temporal memory for AI coding agents (2026-04-19)
- We Already Shipped MCP Code Mode — Sverklo's Tool Surface, Measured (2026-05-08)
- We Already Shipped Git-for-Agent-Memory — Bi-Temporal Beats Branch-Snapshot (2026-05-09)
- Recipe: Sverklo + Tool Search lazy-loading
- Public 5-baseline retrieval benchmark