Engineering · Sverklo · 2026-05-08

We Already Shipped MCP Code Mode — Sverklo's Tool Surface, Measured

2026-05-08 ~6 min read by Nikita Groshin

A commenter on yesterday's leaderboard launch told me 37 tools is too many. They're not wrong. Q2 2026's defining MCP conversation is exactly this: Cloudflare's Code Mode cut a 2,500-endpoint API spec from 1.17M tokens to ~1K; Anthropic's MCP Tool Search lazy-load hit ~95% context reduction on Claude Code; Maxim wrote about cutting 92% at 500+ tools. Sverklo has shipped the same idea for months under SVERKLO_PROFILE. Today I sat down and measured it.

SVERKLO_PROFILE=core drops the tools-list from 8,016 tokens to 1,522 tokens. 81% reduction with one env var. No recompile, no lazy-load wiring, no API redesign.

Method

Spawn the sverklo MCP server with each SVERKLO_PROFILE value, send a JSON-RPC tools/list request, capture the response, and measure its JSON byte and token cost. Tokens are estimated as ceil(chars / 3.5), the same heuristic sverklo uses internally for its own bench. The profile logic lives in tool-overrides.ts; the script that produced these numbers is at the bottom of this post and reproduces in under 30 seconds.
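The heuristic in isolation — a sketch; only the ceil(chars / 3.5) rule is from sverklo's bench, the function name is mine:

```typescript
// The post's token estimate: ceil(chars / 3.5) over the serialized
// tools/list result. Function name is illustrative, not sverklo's API.
function estimateTokens(toolsListJson: string): number {
  return Math.ceil(toolsListJson.length / 3.5);
}

// Sanity check against the table below: 5,324 chars -> 1,522 tokens,
// 28,055 chars -> 8,016 tokens.
```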

The numbers

Profile          Tools   JSON chars   Est. tokens   Reduction vs full
core             5       5,324        1,522         81.0%
nav              8       7,952        2,272         71.7%
review           10      9,757        2,788         65.2%
lean             11      12,141       3,469         56.7%
research         18      13,824       3,950         50.7%
full (default)   36      28,055       8,016         —

The headline: 81% reduction with SVERKLO_PROFILE=core. That's competitive with Maxim's 92% (which they hit at 500+ tools, where the marginal token saving curve is steeper) and with Anthropic's lazy-load 95% (which only kicks in if the agent doesn't actually use most tools). Sverklo's profile system gets there with five hard-coded named subsets — no tool-search round-trip, no lazy-load handshake, no behavioral change at all from the agent's perspective.

What's in each profile

core (5 tools — code-intel hot path)

sverklo_search, sverklo_lookup, sverklo_overview, sverklo_refs, sverklo_impact. The five tools an agent reaches for in 80% of code-intelligence sessions: search-by-concept, lookup-by-name, file structure, find-references, blast-radius. If your agent only does code retrieval and graph navigation, this is the right pick.

nav (8 — adds dependency + context surfaces)

Core plus sverklo_deps, sverklo_context, sverklo_status. The point of nav is to handle file-level questions ("who imports this?", "what does it depend on?") without bringing in memory or audit. Useful for refactoring agents where you want the full graph but not the human-facing audit grades.

lean (11 — adds memory + diff review)

Nav plus sverklo_remember, sverklo_recall, sverklo_review_diff. This is the most common power-user pick: continuous memory of past decisions across sessions, plus the diff-review surface for PR-time. If you're using sverklo with Claude Code on a long-running project, lean is probably what you want — full graph, full memory, no specialized audit/concept tools eating context until you need them.

research (18 — open-ended exploration)

Adds sverklo_search_iterative, sverklo_investigate, sverklo_ask, sverklo_concepts, sverklo_patterns, sverklo_clusters, sverklo_verify, sverklo_critique, and the ctx_* handle ops. For agents doing onboarding, code archaeology, or methodology questions where the next tool call depends on the previous answer.

review (10 — PR/MR review)

Diff tools front-and-center: sverklo_review_diff, sverklo_diff_search, sverklo_test_map, sverklo_impact, sverklo_refs, sverklo_lookup, sverklo_search, sverklo_investigate, sverklo_verify, sverklo_status. For agents that wake up on PR-open and need to assess risk before posting a comment.

full (36 — default)

All 36 tools including memory ops (pin, unpin, promote, demote, forget, memories, wakeup), context-handle suite (ctx_grep, ctx_peek, ctx_slice, ctx_stats, head_results, grep_results), audit (audit), ast_grep, and the legacy get_indexing_status compatibility alias. The default for a reason — it's the most capable surface — but the right setting only when context is cheap and agent capability is high.

How to set it

Three options, in order of binding strength:

# 1. Process env var (per session)
SVERKLO_PROFILE=core sverklo .

# 2. .sverklo.yaml (per project, version-controlled)
profile: core

# 3. Shell rc (per machine)
export SVERKLO_PROFILE=core

The profile cache is built once at sverklo startup, so changing it requires restarting the MCP server — in practice, reloading the MCP client (e.g. the IDE window). This matches how agents already expect tool metadata to work: stable for the length of a session.

For finer control, SVERKLO_DISABLED_TOOLS=tool1,tool2 drops specific tools from any profile (useful when you want core but minus sverklo_overview, say). And SVERKLO_TOOL_<NAME>_DESCRIPTION overrides individual descriptions, which is how power users repurpose sverklo_remember as an architecture decision log without forking.
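Combining the knobs — a sketch; the override text is my example, and the <NAME> segment is assumed to be the uppercased tool name:

```shell
# Run with the core profile minus one tool (per session)
SVERKLO_PROFILE=core SVERKLO_DISABLED_TOOLS=sverklo_overview sverklo .

# Repurpose sverklo_remember as an architecture decision log
# (description text here is illustrative, not a shipped default)
SVERKLO_TOOL_SVERKLO_REMEMBER_DESCRIPTION="Record an architecture decision for this repo" \
SVERKLO_PROFILE=lean sverklo .
```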

What this is not

Sverklo's profile system is not Cloudflare's Code Mode. The Cloudflare pattern collapses N tools to two: search(query) and execute(action, args). The agent never sees the typed surface; it queries for available actions on demand. That's a 99.9% reduction at 2,500 endpoints — but it costs the agent two extra round-trips per invocation, and you lose the structured tool descriptions that help the agent pick the right verb upfront.
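For contrast, the two-tool collapse looks roughly like this — a hedged sketch of the pattern as described above, not Cloudflare's implementation; the names and the naive keyword match are illustrative:

```typescript
// Code Mode shape: the agent sees only search(query) and
// execute(action, args); the typed tool surface stays hidden.
type ToolDef = { name: string; description: string };

function makeTwoToolSurface(
  allTools: ToolDef[],
  run: (name: string, args: unknown) => unknown,
) {
  return {
    // Round-trip 1: the agent asks which actions exist.
    search(query: string): ToolDef[] {
      const q = query.toLowerCase();
      return allTools.filter(
        (t) =>
          t.name.toLowerCase().includes(q) ||
          t.description.toLowerCase().includes(q),
      );
    },
    // Round-trip 2: the agent invokes one action by name.
    execute(action: string, args: unknown): unknown {
      if (!allTools.some((t) => t.name === action)) {
        throw new Error(`unknown action: ${action}`);
      }
      return run(action, args);
    },
  };
}
```

The two round-trips per invocation, and the loss of upfront typed descriptions, fall directly out of this shape.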

Sverklo's design bet is the opposite: the typed surface is a feature for capable agents (the bench measures sverklo at 1.0 tools-per-task vs naive grep's 6.1, on the same task suite — one typed call gets the answer, no cascade). Slim the surface enough to fit the context budget, but keep it typed.

If we run into a deployment where 1,522 tokens for the core profile is still too many, the Cloudflare pattern is a real next step. For now the data says profile filtering is sufficient. Tell me if you hit a wall.

The subagent example

Yesterday I shipped agents/sverklo-explore.md as a drop-in replacement for Claude Code's built-in Explore subagent. It's the slim-surface posture in a copy-paste artifact: the subagent declares only seven sverklo tools (lookup, refs, deps, overview, impact, search, status) and explicit anti-patterns ("don't grep, don't chain, don't summarize beyond tool output"). The parent agent only sees those seven in the subagent's frontmatter, which is identical to running with a custom SVERKLO_DISABLED_TOOLS list — same effect, different mechanism.
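For reference, the shape of that subagent file — the frontmatter fields follow Claude Code's subagent format; the description wording here is illustrative, not the shipped agents/sverklo-explore.md:

```yaml
---
name: sverklo-explore
description: Read-only codebase exploration via sverklo tools only.
tools: sverklo_lookup, sverklo_refs, sverklo_deps, sverklo_overview, sverklo_impact, sverklo_search, sverklo_status
---
```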

If you're using Claude Code on a large repo today and watching tokens, that subagent file is the cheapest possible thing to copy in.

Honest tradeoffs

What you lose in core: the dependency and context surfaces (sverklo_deps, sverklo_context, sverklo_status), memory (sverklo_remember, sverklo_recall, and the pin/unpin/promote/demote/forget/memories/wakeup ops), diff review (sverklo_review_diff, sverklo_diff_search, sverklo_test_map), the research suite (sverklo_search_iterative, sverklo_investigate, sverklo_ask, sverklo_concepts, sverklo_patterns, sverklo_clusters, sverklo_verify, sverklo_critique), and the ctx_* handle ops, audit, and ast_grep.

None of these break retrieval. They just remove sverklo's specialized surfaces. The bench (which measures retrieval F1 only) is unaffected by profile choice — sverklo's F1=0.60 on the public ranking is the core number too, since P1/P2/P4 only need lookup/refs/deps.

Reproducer

npm install -g sverklo
SVERKLO_PROFILE=core sverklo /path/to/repo &
# Send tools/list via JSON-RPC stdin, count chars in result.tools

Or copy the <50-line measurement script we used here from the sverklo repo at scripts/measure-profiles.mjs. Re-run on your own machine and post a delta if the numbers don't match.
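The measurement step can be sketched as follows — a minimal sketch assuming MCP's newline-delimited JSON-RPC stdio transport; this is the shape of the script, not the repo's actual scripts/measure-profiles.mjs:

```typescript
// Request the agent-facing tool surface over stdio.
const listToolsRequest = JSON.stringify({
  jsonrpc: "2.0",
  id: 2,
  method: "tools/list",
});

// Given the raw tools/list response line from the server's stdout,
// count chars in result.tools and apply the post's token heuristic.
function measureResponse(line: string): { chars: number; tokens: number } {
  const msg = JSON.parse(line);
  const chars = JSON.stringify(msg.result.tools).length;
  return { chars, tokens: Math.ceil(chars / 3.5) };
}
```

Run once per SVERKLO_PROFILE value and you reproduce the table above.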

Why publish this now

The 37-tools complaint on the launch thread was a real signal. The Q2 2026 trend is real. Sverklo has shipped the answer for months but never measured it publicly. The cost of writing this post was about three hours including the measurement and the SoftwareApplication schema additions on the comparison pages. The cost of not writing it was that anyone evaluating sverklo against the Code Mode discourse would assume we hadn't thought about it.

The pattern keeps being right: losing on the bench teaches it more than winning does. If a maintainer of a competing MCP server wants to publish their own profile measurements, the harness is portable. Open an issue if you want me to bench against a specific tool.

Try sverklo with the slim profile

npm install -g sverklo
SVERKLO_PROFILE=core sverklo init

5 tools, 1,522 tokens of system-prompt overhead, full hybrid retrieval underneath. Public bench · Drop-in subagent for Claude Code · Source of the profile system
