Engineering · Sverklo · 2026-05-08

We Already Shipped MCP Code Mode — Sverklo's Tool Surface, Measured

2026-05-08 ~6 min read by Nikita Groshin

A commenter on yesterday's leaderboard launch told me 37 tools is too many. They're not wrong. Q2 2026's defining MCP conversation is exactly this: Cloudflare's Code Mode cut a 2,500-endpoint API spec from 1.17M tokens to ~1K; Anthropic's MCP Tool Search lazy-load hit ~95% context reduction on Claude Code; Maxim wrote about cutting 92% at 500+ tools. Sverklo has shipped the same idea for months under SVERKLO_PROFILE. Today I sat down and measured it.

SVERKLO_PROFILE=core drops the tools-list from 8,016 tokens to 1,522 tokens. 81% reduction with one env var. No recompile, no lazy-load wiring, no API redesign.

Method

Spawn the sverklo MCP server with each SVERKLO_PROFILE value, send a JSON-RPC tools/list request, capture the response, and measure its JSON byte and token cost. Tokens are estimated as ceil(chars / 3.5), the same heuristic sverklo uses internally for its own bench. The profile logic lives in tool-overrides.ts; the script that produced these numbers is at the bottom of this post and reproduces in under 30 seconds.
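The heuristic in isolation — a sketch; only the ceil(chars / 3.5) rule is from sverklo's bench, the function name is mine:

```typescript
// The post's token estimate: ceil(chars / 3.5) over the serialized
// tools/list result. Function name is illustrative, not sverklo's API.
function estimateTokens(toolsListJson: string): number {
  return Math.ceil(toolsListJson.length / 3.5);
}

// Sanity check against the table below: 5,324 chars -> 1,522 tokens,
// 28,055 chars -> 8,016 tokens.
```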

The numbers

Profile          Tools   JSON chars   Est. tokens   Reduction vs full
core             5       5,324        1,522         81.0%
nav              8       7,952        2,272         71.7%
review           10      9,757        2,788         65.2%
lean             11      12,141       3,469         56.7%
research         18      13,824       3,950         50.7%
full (default)   36      28,055       8,016         —

The headline: 81% reduction with SVERKLO_PROFILE=core. That's competitive with Maxim's 92% (which they hit at 500+ tools, where the marginal token saving curve is steeper) and with Anthropic's lazy-load 95% (which only kicks in if the agent doesn't actually use most tools). Sverklo's profile system gets there with five hard-coded named subsets — no tool-search round-trip, no lazy-load handshake, no behavioral change at all from the agent's perspective.

What's in each profile

core (5 tools — code-intel hot path)

sverklo_search, sverklo_lookup, sverklo_overview, sverklo_refs, sverklo_impact. The five tools an agent reaches for in 80% of code-intelligence sessions: search-by-concept, lookup-by-name, file structure, find-references, blast-radius. If your agent only does code retrieval and graph navigation, this is the right pick.

nav (8 — adds dependency + context surfaces)

Core plus sverklo_deps, sverklo_context, sverklo_status. The point of nav is to handle file-level questions ("who imports this?", "what does it depend on?") without bringing in memory or audit. Useful for refactoring agents where you want the full graph but not the human-facing audit grades.

lean (11 — adds memory + diff review)

Nav plus sverklo_remember, sverklo_recall, sverklo_review_diff. This is the most common power-user pick: continuous memory of past decisions across sessions, plus the diff-review surface for PR-time. If you're using sverklo with Claude Code on a long-running project, lean is probably what you want — full graph, full memory, no specialized audit/concept tools eating context until you need them.

research (18 — open-ended exploration)

Adds sverklo_search_iterative, sverklo_investigate, sverklo_ask, sverklo_concepts, sverklo_patterns, sverklo_clusters, sverklo_verify, sverklo_critique, and the ctx_* handle ops. For agents doing onboarding, code archaeology, or methodology questions where the next tool call depends on the previous answer.

review (10 — PR/MR review)

Diff tools front-and-center: sverklo_review_diff, sverklo_diff_search, sverklo_test_map, sverklo_impact, sverklo_refs, sverklo_lookup, sverklo_search, sverklo_investigate, sverklo_verify, sverklo_status. For agents that wake up on PR-open and need to assess risk before posting a comment.

full (36 — default)

All 36 tools including memory ops (pin, unpin, promote, demote, forget, memories, wakeup), context-handle suite (ctx_grep, ctx_peek, ctx_slice, ctx_stats, head_results, grep_results), audit (audit), ast_grep, and the legacy get_indexing_status compatibility alias. The default for a reason — it's the most capable surface — but the right setting only when context is cheap and agent capability is high.

How to set it

Three options, in order of binding strength:

# 1. Process env var (per session)
SVERKLO_PROFILE=core sverklo .

# 2. .sverklo.yaml (per project, version-controlled)
profile: core

# 3. Shell rc (per machine)
export SVERKLO_PROFILE=core

The profile cache is built once at sverklo startup, so changing it requires restarting the MCP server — in practice, reloading the MCP client (e.g. the IDE window). This matches how agents already expect tool metadata to work: stable for the length of a session.

For finer control, SVERKLO_DISABLED_TOOLS=tool1,tool2 drops specific tools from any profile (useful when you want core but minus sverklo_overview, say). And SVERKLO_TOOL_<NAME>_DESCRIPTION overrides individual descriptions, which is how power users repurpose sverklo_remember as an architecture decision log without forking.
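Combining the knobs — a sketch; the override text is my example, and the <NAME> segment is assumed to be the uppercased tool name:

```shell
# Run with the core profile minus one tool (per session)
SVERKLO_PROFILE=core SVERKLO_DISABLED_TOOLS=sverklo_overview sverklo .

# Repurpose sverklo_remember as an architecture decision log
# (description text here is illustrative, not a shipped default)
SVERKLO_TOOL_SVERKLO_REMEMBER_DESCRIPTION="Record an architecture decision for this repo" \
SVERKLO_PROFILE=lean sverklo .
```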

What this is not

Sverklo's profile system is not Cloudflare's Code Mode. The Cloudflare pattern collapses N tools to two: search(query) and execute(action, args). The agent never sees the typed surface; it queries for available actions on demand. That's a 99.9% reduction at 2,500 endpoints — but it costs the agent two extra round-trips per invocation, and you lose the structured tool descriptions that help the agent pick the right verb upfront.
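For contrast, the two-tool collapse looks roughly like this — a hedged sketch of the pattern as described above, not Cloudflare's implementation; the names and the naive keyword match are illustrative:

```typescript
// Code Mode shape: the agent sees only search(query) and
// execute(action, args); the typed tool surface stays hidden.
type ToolDef = { name: string; description: string };

function makeTwoToolSurface(
  allTools: ToolDef[],
  run: (name: string, args: unknown) => unknown,
) {
  return {
    // Round-trip 1: the agent asks which actions exist.
    search(query: string): ToolDef[] {
      const q = query.toLowerCase();
      return allTools.filter(
        (t) =>
          t.name.toLowerCase().includes(q) ||
          t.description.toLowerCase().includes(q),
      );
    },
    // Round-trip 2: the agent invokes one action by name.
    execute(action: string, args: unknown): unknown {
      if (!allTools.some((t) => t.name === action)) {
        throw new Error(`unknown action: ${action}`);
      }
      return run(action, args);
    },
  };
}
```

The two round-trips per invocation, and the loss of upfront typed descriptions, fall directly out of this shape.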

Sverklo's design bet is the opposite: the typed surface is a feature for capable agents (the bench measures sverklo at 1.0 tools-per-task vs naive grep's 6.1, on the same task suite — one typed call gets the answer, no cascade). Slim the surface enough to fit the context budget, but keep it typed.

If we run into a deployment where 1,522 tokens for the core profile is still too many, the Cloudflare pattern is a real next step. For now the data says profile filtering is sufficient. Tell me if you hit a wall.

The subagent example

Yesterday I shipped agents/sverklo-explore.md as a drop-in replacement for Claude Code's built-in Explore subagent. It's the slim-surface posture in a copy-paste artifact: the subagent declares only seven sverklo tools (lookup, refs, deps, overview, impact, search, status) and explicit anti-patterns ("don't grep, don't chain, don't summarize beyond tool output"). The parent agent only sees those seven in the subagent's frontmatter, which is identical to running with a custom SVERKLO_DISABLED_TOOLS list — same effect, different mechanism.
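For reference, the shape of that subagent file — the frontmatter fields follow Claude Code's subagent format; the description wording here is illustrative, not the shipped agents/sverklo-explore.md:

```yaml
---
name: sverklo-explore
description: Read-only codebase exploration via sverklo tools only.
tools: sverklo_lookup, sverklo_refs, sverklo_deps, sverklo_overview, sverklo_impact, sverklo_search, sverklo_status
---
```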

If you're using Claude Code on a large repo today and watching tokens, that subagent file is the cheapest possible thing to copy in.

Honest tradeoffs

What you lose in core: the dependency and context surfaces (sverklo_deps, sverklo_context, sverklo_status), memory (sverklo_remember, sverklo_recall, and the pin/unpin/promote/demote/forget/memories/wakeup ops), diff review (sverklo_review_diff, sverklo_diff_search, sverklo_test_map), the research suite (sverklo_search_iterative, sverklo_investigate, sverklo_ask, sverklo_concepts, sverklo_patterns, sverklo_clusters, sverklo_verify, sverklo_critique), and the ctx_* handle ops, audit, and ast_grep.

None of these break retrieval. They just remove sverklo's specialized surfaces. The bench (which measures retrieval F1 only) is unaffected by profile choice — sverklo's F1=0.60 on the public ranking is the core number too, since P1/P2/P4 only need lookup/refs/deps.

Reproducer

npm install -g sverklo
SVERKLO_PROFILE=core sverklo /path/to/repo &
# Send tools/list via JSON-RPC stdin, count chars in result.tools

Or copy the <50-line measurement script we used here from the sverklo repo at scripts/measure-profiles.mjs. Re-run on your own machine and post a delta if the numbers don't match.
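The measurement step can be sketched as follows — a minimal sketch assuming MCP's newline-delimited JSON-RPC stdio transport; this is the shape of the script, not the repo's actual scripts/measure-profiles.mjs:

```typescript
// Request the agent-facing tool surface over stdio.
const listToolsRequest = JSON.stringify({
  jsonrpc: "2.0",
  id: 2,
  method: "tools/list",
});

// Given the raw tools/list response line from the server's stdout,
// count chars in result.tools and apply the post's token heuristic.
function measureResponse(line: string): { chars: number; tokens: number } {
  const msg = JSON.parse(line);
  const chars = JSON.stringify(msg.result.tools).length;
  return { chars, tokens: Math.ceil(chars / 3.5) };
}
```

Run once per SVERKLO_PROFILE value and you reproduce the table above.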

Why publish this now

The 37-tools complaint on the launch thread was a real signal. The Q2 2026 trend is real. Sverklo has shipped the answer for months but never measured it publicly. The cost of writing this post was about three hours including the measurement and the SoftwareApplication schema additions on the comparison pages. The cost of not writing it was that anyone evaluating sverklo against the Code Mode discourse would assume we hadn't thought about it.

The pattern keeps being right: losing on the bench teaches it more than winning does. If a maintainer of a competing MCP server wants to publish their own profile measurements, the harness is portable. Open an issue if you want me to bench against a specific tool.

Try sverklo with the slim profile

npm install -g sverklo
SVERKLO_PROFILE=core sverklo init

5 tools, 1,522 tokens of system-prompt overhead, full hybrid retrieval underneath. Public bench · Drop-in subagent for Claude Code · Source of the profile system
