Why Claude Code Burns So Many Tokens — A Field Study (14,200 Tokens to Find One Function)

I logged every tool call my Claude Code agent made on a 200-file repo for a week. Grep alone cost me $47. Here's the data, and what I changed.

May 2026 · Nikita Groshin · ~2,000 words

The setup

Most engineers using Claude Code or Cursor have a vague sense that AI agents are expensive on large repos. Few of us have actually measured it. I instrumented one week of normal work — 47 sessions, 312 tasks, all on private codebases between 200 and 4,000 files — and parsed every tool call out of the session logs.

The results are not subtle.

Where the tokens go

Across 312 tasks, the average input-token spend per task was 22,840 tokens. That includes the system prompt, conversation history, and tool-call results. The split:

| Source | % of input tokens | Notes |
| --- | --- | --- |
| grep results | 41% | Returned full lines, often hundreds per call |
| File reads | 28% | Re-reading files the agent had already touched |
| glob results | 7% | Listing files, often multiple times per session |
| Conversation history | 18% | The agent's own prior outputs |
| System + tool definitions | 6% | Fixed cost |

Grep alone — a single tool call type — accounted for 41% of the entire token spend. On the median session, the agent ran 9 grep calls. The most expensive single grep returned 14,200 tokens of output, of which only three lines turned out to be useful.

The 14,200-token grep

Here's the actual call, paraphrased to scrub identifying details:

Tool: grep
Query: "logRequest|logResponse|requestId"
Files matched: 312
Lines returned: 1,847
Output tokens: 14,184

The agent was trying to find the canonical request-logging function in a 4,000-file repo. The query was reasonable — three identifiers that might match the answer. The output was 1,847 lines of context-free regex hits, of which exactly 3 were actually useful.

The agent then spent another 8,200 tokens reading two files to disambiguate, and ultimately edited the wrong one. The task took 4 grep calls + 6 file reads. Total: 47,300 input tokens. At Claude Sonnet's $3/M input rate, that's $0.14 — for one task.
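The per-task cost is easy to sanity-check yourself. A minimal sketch of the arithmetic, using the task's 47,300 input tokens and Sonnet's $3-per-million input rate (rates change; check current pricing):

```python
# Sanity-check the per-task cost figure above.
SONNET_INPUT_RATE = 3.0 / 1_000_000  # dollars per input token

task_input_tokens = 47_300  # 4 grep calls + 6 file reads, from the session log
cost = task_input_tokens * SONNET_INPUT_RATE
print(f"${cost:.2f}")  # $0.14
```

Note this counts input tokens only; output tokens are billed separately, but for exploration-heavy tasks the input side dominates.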

I do roughly 50 of these a day. Pure grep cost on a normal week: $47.10.

The compounding problem

This isn't just expensive. It cascades.

When grep returns 1,847 lines, the agent has to read all of it as input. Its working context is now polluted with hundreds of irrelevant matches. The next tool call has worse signal-to-noise. The model's prior — that function names like logResponseTime exist in most codebases — wins over the actual evidence in your repo, which has scrolled out of attention.

This is the load-bearing failure mode of AI coding agents on large repos: expensive search → noisy context → model falls back to training-data priors → hallucinated function names → wrong edit.

You can see the chain in the data. Sessions with grep results over 8,000 tokens had a hallucination rate of 31%. Sessions with grep results under 2,000 tokens: 4%. The correlation is r = 0.74.

Why grep isn't the right tool

Grep matches identifiers lexically. It does three things wrong on code:

  1. No ranking. A grep on "request" returns 312 matches with no signal about which is load-bearing — which functions are central to the call graph and which are utility code. The agent reads the first three results and stops, which on a 4,000-file repo is almost always wrong.
  2. No semantic recall. Asking grep "what handles request timing in this repo?" doesn't work. The string "request timing" probably doesn't appear; the actual function is called recordLatency.
  3. No structure. Grep can't tell you which functions transitively call logRequest. For refactor tasks the agent needs the call graph, not the textual matches.

The honest fix is hybrid retrieval: BM25 for exact identifiers, embeddings for concepts, PageRank on the call graph for ranking, all combined and exposed as ranked results. None of that is exotic — it's the standard retrieval stack from search engines, applied to code. But no AI coding agent ships with it built in. You have to add it.
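To make the "combined and exposed as ranked results" step concrete, here is a minimal sketch of the fusion layer. It assumes the three upstream rankers (BM25 over identifiers, an embedding index for concepts, a centrality score on the call graph) already exist and are stubbed as ordered lists of symbol names — the names below are hypothetical, not from any real repo. The combiner is reciprocal rank fusion (RRF), a standard way to merge rankings without tuning per-ranker score scales:

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranker outputs for the query "request logging":
bm25       = ["logRequest", "RequestLogger.log", "requestId"]      # exact identifier hits
semantic   = ["logRequest", "recordLatency", "traceRequest"]       # concept matches
centrality = ["logRequest", "handleRequest", "RequestLogger.log"]  # call-graph rank

print(rrf([bm25, semantic, centrality]))
# "logRequest" ranks first: it appears at the top of all three lists.
```

The point of RRF over naive score-averaging is that BM25 scores, cosine similarities, and PageRank values live on incompatible scales; ranks don't.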

What I changed

I installed Sverklo, a local-first MCP server that gives Claude Code 37 extra retrieval tools. (Disclosure: I wrote it. The data above is on real private repos; you can reproduce it on your own.)

The exact change in my Claude Code config:

{
  "mcpServers": {
    "sverklo": {
      "command": "npx",
      "args": ["-y", "sverklo"]
    }
  }
}

Then cd your-project && sverklo init. Indexing took 47 seconds for the 4,000-file repo.

I re-ran the same 312 tasks against the indexed repo. Same prompts, same models, same machine. New numbers:

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Avg input tokens / task | 22,840 | 6,210 | −73% |
| Avg tool calls / task | 9.2 | 1.8 | −80% |
| Hallucinated function names | 31% of large-grep sessions | 2% | −94% |
| Weekly token cost | $47.10 | $12.83 | −73% |

Most of the gain is from sverklo_search returning roughly 300 tokens of ranked results instead of 1,847 raw lines, plus sverklo_lookup answering "where is X defined?" in a single call. The agent doesn't need to grep its way around anymore.

Where it still doesn't help

I want to be honest about the slice where this changed nothing.

If your workflow is dominated by P1/P2, your token savings will be real but smaller than mine. If it's dominated by exploration ("what does this repo do?", "what handles X?", "what calls Y transitively?"), the ratio above is roughly what you should expect.

How to measure your own session

I shipped the instrumentation as a sverklo subcommand. It's in the latest npm release (v0.20.1):

npm install -g sverklo
sverklo receipt

It parses your last week of Claude Code session logs (~/.claude/projects/**/*.jsonl) and prints the same breakdown above for your own data. Output looks like this:

sverklo receipt
──────────────────────────────────────────────────────────
Last 7 days · 134 sessions · 10,317 tool calls

Token spend
  Input (new):                       310,032
  Cache reads (cheap):         5,344,464,042
  Cache writes (full price):     162,448,306
  Output:                         13,045,579

Estimated cost
  Sonnet rates:                    $2,287.30
  Opus rates:                     $11,436.49
  Projected yearly (Sonnet):     $119,266.25

Top tool consumers
  Bash        4836 calls
  Edit        1938 calls
  Read        1435 calls
  Grep         228 calls
  …

The receipt is the cheapest experiment I can suggest. If your repo is small or your workflow doesn't include big exploration, the receipt will tell you so. If your numbers look like mine, the install command is one line and the uninstall is npm uninstall -g sverklo.

Use --since 30d to widen the window, or --format json if you want to pipe it somewhere.
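If you'd rather not install anything, the tally itself is small enough to do by hand. A minimal sketch, assuming the session-log schema I see on my machine — one JSON object per line, with entries carrying a message.usage block (input_tokens, cache_read_input_tokens, cache_creation_input_tokens, output_tokens). Treat those field names as an assumption from inspecting the logs, not a documented API:

```python
import glob
import json
import os
from collections import Counter

def tally(lines) -> Counter:
    """Sum token-usage fields across JSONL session-log lines."""
    totals: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        usage = entry.get("message", {}).get("usage")
        if usage:
            totals["input"] += usage.get("input_tokens", 0)
            totals["cache_read"] += usage.get("cache_read_input_tokens", 0)
            totals["cache_write"] += usage.get("cache_creation_input_tokens", 0)
            totals["output"] += usage.get("output_tokens", 0)
    return totals

if __name__ == "__main__":
    grand_total: Counter = Counter()
    pattern = os.path.expanduser("~/.claude/projects/**/*.jsonl")
    for path in glob.glob(pattern, recursive=True):
        with open(path) as f:
            grand_total += tally(f)
    for key, value in grand_total.items():
        print(f"{key:>12}: {value:,}")
```

This gives you the raw token counts; multiplying by your model's published rates gets you the cost lines of the receipt.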

The deeper point

The cost of running an AI coding agent is not the model's per-token rate. It's the search inefficiency baked into the agent's tool surface. When the agent's only retrieval primitive is grep, every task pays a 5–10× tax for noisy context. The model's prior fills the gaps with confident-sounding fabrication. Engineers feel this as hallucination, slowness, and a $50/week bill they can't fully account for.

The fix isn't a smarter model. It's giving the agent a retrieval stack that's roughly what humans have been using on codebases for the last twenty years — ranked search, symbol lookup, call-graph traversal — exposed as cheap MCP tools.

That's the whole post. Run sverklo receipt on your own week and tell me if the numbers match.

Reproduce the bench: all numbers in this post that come from the sverklo bench are reproducible via npm run bench in the repo. 60 tasks, 5 baselines, raw data at sverklo.com/bench. If you find a number wrong, open an issue.

Updated: May 3, 2026 · MIT-licensed prose, reuse with attribution. Related: How I stopped Claude Code from hallucinating function names · A Practical Guide to MCP Servers for Code Intelligence.