Recipe · Sverklo

Sverklo + Tool Search lazy-loading (defer_loading)

2026-05-09

Sverklo exposes 36 MCP tools by default, which costs about 8,016 tokens of system-prompt overhead. Claude Code 4.x and the Claude API both ship a "Tool Search" / defer_loading mechanism that loads tool definitions on demand instead of upfront. Sverklo's profiles and defer_loading compose. This recipe shows three patterns: profile-only (server-side, hard-coded), defer-only (host-side, dynamic), and hybrid (both, for the most aggressive context budget).

Sverklo's SVERKLO_PROFILE filters which tools the MCP server exposes. defer_loading filters which tools the Claude host surfaces to the model. They're orthogonal — set both for maximum reduction.

Quickstart: which pattern do I want?

Pattern	Where	Tradeoff
Profile-only	sverklo (env var or .sverklo.yaml)	Static — pick a named subset, sverklo never exposes the others. Simple. Use this if your usage is consistent.
Defer-only	Claude Code settings.json / Claude API MCP Connector	Dynamic — sverklo exposes all 36, host lazy-loads. Tools appear when needed. Use this if your usage is variable.
Hybrid	Both	Smallest possible upfront list. Profile picks the always-loaded core; defer_loading governs the rest. Use this if you want minimum context overhead.

Pattern 1: Profile-only (server-side)

Set SVERKLO_PROFILE at sverklo startup. The MCP server's tools/list response only contains the named subset; everything else is invisible to every host.

# In your shell rc (pick one; the last export wins if you paste several):
export SVERKLO_PROFILE=core      # 5 tools, 1,522 tokens (81% reduction)
# export SVERKLO_PROFILE=nav     # 8 tools
# export SVERKLO_PROFILE=lean    # 11 tools, 3,469 tokens
# export SVERKLO_PROFILE=full    # 36 tools (default)

# Or per-project, in .sverklo.yaml:
profile: core

List all profiles + their members: sverklo profile list. Suggest a profile from your real usage data: sverklo profile suggest.

Pro: works with any MCP host (Claude Code, Cursor, Windsurf, Zed, custom proxies), no host config required, deterministic. Con: hard-coded set — if you occasionally need a tool outside the profile, the host can't surface it.
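The reduction percentages follow directly from the token counts quoted in this recipe. A minimal sketch of the arithmetic, using only the figures given here for full, core, and lean (the helper name reduction_pct is illustrative, not part of sverklo):

```python
# Token counts quoted in this recipe; nav is omitted because no count is given.
FULL_TOKENS = 8016
PROFILE_TOKENS = {"core": 1522, "lean": 3469}


def reduction_pct(profile_tokens, full_tokens=FULL_TOKENS):
    """Percentage of upfront tool-definition tokens saved versus the full surface."""
    return round(100 * (1 - profile_tokens / full_tokens), 1)


for name, tokens in PROFILE_TOKENS.items():
    print(f"{name}: {tokens} tokens upfront, {reduction_pct(tokens)}% reduction")
```

With these inputs, core works out to roughly 81% and lean to roughly 57%.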

Pattern 2: Defer-only (host-side)

Sverklo exposes all 36 tools; the Claude host lazy-loads them. This is automatic in Claude Code when MCP tool definitions exceed ~10K tokens — sverklo's full surface (8,016 tokens) is just under, but adding any other MCP server crosses the threshold and Tool Search activates.
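To see how close sverklo alone sits to the activation point, here is a rough sketch of the threshold check. The 10,000 figure is the approximate value quoted above, and the second server with its 2,500-token footprint is purely hypothetical:

```python
# Approximate activation threshold quoted in this recipe (not an exact constant).
THRESHOLD_TOKENS = 10_000


def tool_search_activates(server_tokens):
    """True when the combined MCP tool definitions exceed the threshold."""
    return sum(server_tokens.values()) > THRESHOLD_TOKENS


def headroom(server_tokens):
    """Tokens left before the threshold is reached (negative once past it)."""
    return THRESHOLD_TOKENS - sum(server_tokens.values())


sverklo_only = {"sverklo": 8016}
# "other_server" and its token count are made up for illustration.
with_second_server = {"sverklo": 8016, "other_server": 2500}

print(tool_search_activates(sverklo_only), headroom(sverklo_only))
print(tool_search_activates(with_second_server), headroom(with_second_server))
```

Under these assumptions sverklo alone leaves about 1,984 tokens of headroom, so any second server whose definitions exceed that amount tips the host into lazy-loading.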

For Claude API direct, configure defer_loading per-tool via the MCP Connector:

{
  "type": "mcp_toolset",
  "mcp_server_name": "sverklo",
  "default_config": { "defer_loading": true },
  "configs": {
    "sverklo_search":   { "defer_loading": false },
    "sverklo_lookup":   { "defer_loading": false },
    "sverklo_overview": { "defer_loading": false },
    "sverklo_refs":     { "defer_loading": false },
    "sverklo_impact":   { "defer_loading": false }
  }
}

This is the same shape as SVERKLO_PROFILE=core, but the deferred tools remain accessible — Claude can search for them when needed.

Pro: nothing is missing. The agent can reach any tool dynamically; it just doesn't pay the upfront token cost. Con: requires a host that supports Tool Search (Claude Code 4.x, Claude API), so it is not portable to other MCP clients.
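If you maintain the eager list in code rather than by hand, the toolset entry can be generated. The sketch below only reproduces the JSON shape shown in this recipe; build_toolset is a hypothetical helper, not part of any SDK, and how the resulting dict is attached to an API request is left out:

```python
import json


def build_toolset(server_name, eager_tools):
    """Return an mcp_toolset entry that defers every tool except eager_tools."""
    return {
        "type": "mcp_toolset",
        "mcp_server_name": server_name,
        # Everything defers by default...
        "default_config": {"defer_loading": True},
        # ...except the tools listed here, which load upfront.
        "configs": {name: {"defer_loading": False} for name in eager_tools},
    }


# The five always-loaded tools from the example above.
CORE_TOOLS = [
    "sverklo_search",
    "sverklo_lookup",
    "sverklo_overview",
    "sverklo_refs",
    "sverklo_impact",
]

toolset = build_toolset("sverklo", CORE_TOOLS)
print(json.dumps(toolset, indent=2))
```

Keeping the eager list in one place makes it easier to keep it in step with whatever sverklo profile suggest reports.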

Pattern 3: Hybrid

The profile filters server-side, then defer_loading filters what remains host-side. Best for context-conscious deployments running long sessions.

# Server: sverklo only exposes the lean profile (11 tools, 3,469 tokens)
export SVERKLO_PROFILE=lean

# Host (Claude API): of those 11, only 5 load eagerly; the rest defer.
{
  "type": "mcp_toolset",
  "mcp_server_name": "sverklo",
  "default_config": { "defer_loading": true },
  "configs": {
    "sverklo_search":   { "defer_loading": false },
    "sverklo_lookup":   { "defer_loading": false },
    "sverklo_overview": { "defer_loading": false },
    "sverklo_refs":     { "defer_loading": false },
    "sverklo_impact":   { "defer_loading": false }
  }
}

Result: ~1,522 tokens upfront for the 5 always-loaded tools, with the other 6 (deps, context, status, remember, recall, review_diff) reachable via Tool Search when relevant. That is an ~81% reduction vs full. Nothing in the lean profile is lost, but the 25 tools outside it stay hidden from the host.
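One thing to watch in the hybrid setup is that a host-side override only matters for tools the profile still exposes. The following sketch splits an exposed list into eager and deferred tools and rejects eager names the server would never send. The sverklo_ prefix on the six deferred names is assumed from the short names listed above, and split_eager_deferred is a hypothetical helper:

```python
# Eager tools from the config above.
EAGER = [
    "sverklo_search", "sverklo_lookup", "sverklo_overview",
    "sverklo_refs", "sverklo_impact",
]
# Lean profile: the eager five plus six more (full names are assumed).
LEAN = EAGER + [
    "sverklo_deps", "sverklo_context", "sverklo_status",
    "sverklo_remember", "sverklo_recall", "sverklo_review_diff",
]


def split_eager_deferred(exposed, eager):
    """Partition server-exposed tools into (eager, deferred) lists.

    Raises ValueError if an eager tool is not exposed by the server,
    since the host cannot load a tool the profile has filtered out.
    """
    hidden = sorted(set(eager) - set(exposed))
    if hidden:
        raise ValueError(f"not exposed by the server profile: {hidden}")
    eager_set = set(eager)
    loaded = [t for t in exposed if t in eager_set]
    deferred = [t for t in exposed if t not in eager_set]
    return loaded, deferred


loaded, deferred = split_eager_deferred(LEAN, EAGER)
print(len(loaded), "eager,", len(deferred), "deferred")
```

Passing an eager name such as sverklo_audit, which the sample usage report further down implies is outside lean, raises the error instead of silently producing a config entry that can never take effect.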

How to find your right profile from real usage

After ~1 week of normal sverklo use, sverklo's activity log will have enough data to recommend a profile. Run:

sverklo profile suggest
# Or with a custom window:
sverklo profile suggest --days 14

Output looks like:

Sverklo profile suggestion — based on 498 tool calls in the last 30 days

----------------------------------------------------------------------
tool                                       calls     share
----------------------------------------------------------------------
sverklo_lookup                               150     30.1%
sverklo_refs                                 150     30.1%
sverklo_deps                                  75     15.1%
sverklo_audit                                 75     15.1%
sverklo_status                                48      9.6%
----------------------------------------------------------------------

Profile coverage on your usage:
core        5 tools   covers  60.2%
nav         8 tools   covers  84.9%   ~ close
review     10 tools   covers  69.9%
lean       11 tools   covers  84.9%   ~ close
research   18 tools   covers  84.9%   ~ close

Closest match: SVERKLO_PROFILE=nav (8 tools, 84.9% coverage)
But you also use sverklo_audit (15.1% of calls) which aren't in this profile.

The command reads ~/.sverklo/<project>/activity.jsonl, aggregates per-tool counts, and computes coverage for each named profile. If no profile is a clean fit, there are two paths: accept the gap (use nav; sverklo_audit stays unavailable until you bump to full), or keep full and use SVERKLO_DISABLED_TOOLS to drop the long-tail tools you never call.
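The coverage column is just the share of calls that land on tools inside a profile. The sketch below is an illustration of that calculation, not sverklo's actual implementation. The call counts are hard-coded from the sample output because the activity.jsonl schema is not shown here, and the second profile is a hypothetical set (core plus deps and status), not the real nav membership:

```python
# Per-tool call counts copied from the sample output above.
CALLS = {
    "sverklo_lookup": 150,
    "sverklo_refs": 150,
    "sverklo_deps": 75,
    "sverklo_audit": 75,
    "sverklo_status": 48,
}

# The five always-loaded tools used throughout this recipe.
CORE = {
    "sverklo_search", "sverklo_lookup", "sverklo_overview",
    "sverklo_refs", "sverklo_impact",
}


def coverage_pct(calls, profile):
    """Share of all calls (in percent) that hit tools inside the profile."""
    total = sum(calls.values())
    if total == 0:
        return 0.0
    covered = sum(n for tool, n in calls.items() if tool in profile)
    return round(100 * covered / total, 1)


def uncovered(calls, profile):
    """Tools that were called but are not part of the profile."""
    return sorted(tool for tool in calls if tool not in profile)


# Hypothetical superset of core, used only to show how coverage grows.
core_plus = CORE | {"sverklo_deps", "sverklo_status"}

print("core:", coverage_pct(CALLS, CORE), uncovered(CALLS, CORE))
print("core + deps + status:", coverage_pct(CALLS, core_plus), uncovered(CALLS, core_plus))
```

With the sample counts, core covers 60.2% of calls and the extended set covers 84.9%, leaving sverklo_audit as the only uncovered tool, which matches the report.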

References

Sverklo's Code Mode measurement post — full per-profile token breakdown
Claude API: Tool Search Tool
Claude API: MCP Connector with defer_loading
Anthropic Engineering: Code execution with MCP
sverklo's profile filter source