Sverklo, on the record.
A local-first MCP code-intelligence server, two reproducible benchmarks, and an 8-page preprint that publishes both. Independent research, CC BY 4.0, on Zenodo with a permanent DOI.
The benchmarks are the point. We ship the harness, the inputs, the scoring, and the runs, including the categories where a tuned grep baseline beats us and the aggregate F1 where it comes out ahead. If you’re going to trust a code-intelligence layer with your repo, you should be able to reproduce its numbers in an afternoon. So you can.
What’s in it
The system
Sverklo’s architecture, the indexing pipeline, hybrid retrieval via channelized RRF (reciprocal rank fusion) over BM25 + embeddings + symbol & path channels, bi-temporal memory pinned to git SHAs, and the 37-tool MCP surface.
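For intuition, here is a minimal sketch of channelized RRF under the standard reciprocal-rank formula with a fixed constant k; the channel names, weights, and k value are illustrative assumptions, not sverklo's actual internals.

// Sketch of channelized reciprocal rank fusion (RRF).
// Channel names, weights, and k are illustrative, not sverklo's internals.
type Channel = { name: string; ranked: string[]; weight?: number };

function rrfFuse(channels: Channel[], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const { ranked, weight = 1 } of channels) {
    ranked.forEach((id, i) => {
      // Standard RRF term: weight / (k + rank), with 1-based rank.
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + i + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]); // best first
}

// Example: fuse per-channel rankings of candidate files.
rrfFuse([
  { name: "bm25", ranked: ["src/router.ts", "src/app.ts"] },
  { name: "embeddings", ranked: ["src/app.ts", "docs/routing.md"] },
  { name: "symbols", ranked: ["src/router.ts"] },
  { name: "paths", ranked: ["src/router.ts", "src/router.test.ts"] },
]);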
bench:primitives
A deterministic 60-task suite measuring definition lookup, reference finding, file dependencies, and dead-code detection on two repos. Sverklo vs naive grep vs a tuned smart-grep, scored on F1, precision, recall, and input tokens.
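For reference, scoring along these lines treats each task's expected and returned items as sets; the exact item granularity in the harness (files, symbols, locations) is an assumption here.

// Set-based scoring sketch: precision, recall, and F1 for one task.
// The harness's exact item granularity is an assumption, not documented here.
function scoreTask(expected: Set<string>, returned: Set<string>) {
  const hits = [...returned].filter((x) => expected.has(x)).length;
  const precision = returned.size ? hits / returned.size : 0;
  const recall = expected.size ? hits / expected.size : 0;
  const f1 = precision + recall ? (2 * precision * recall) / (precision + recall) : 0;
  return { precision, recall, f1 };
}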
bench:swe
The harder one. 65 hand-curated research questions across five popular open-source projects (Express, NestJS, Vite, Prisma, FastAPI) in two ecosystems. Required-evidence-file scoring. Pull requests welcome.
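One plausible reading of required-evidence-file scoring: a question counts as answered only when every required evidence file appears among the files the answer cites. The sketch below is that interpretation, not necessarily the harness's exact rule.

// Interpretation of required-evidence-file scoring: a question passes only
// if every required evidence file is among the files the answer cites.
// This is an illustrative reading, not necessarily the harness's exact rule.
type SweQuestion = { id: string; requiredEvidence: string[] };

function answersQuestion(q: SweQuestion, citedFiles: string[]): boolean {
  const cited = new Set(citedFiles);
  return q.requiredEvidence.every((file) => cited.has(file));
}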
The honest result
What the paper actually says. On structural questions — “where is this defined?”, “what depends on this file?” — sverklo wins decisively (F1 0.75 and 0.86 vs smart-grep’s 0.60 and 0.63) while using 65% fewer input tokens (255 vs 731). On reference-finding and dead-code detection, a tuned grep baseline beats us. Aggregate F1 across all four categories: 0.58 (sverklo) vs 0.67 (smart-grep).
We published that number on purpose. A benchmark you only release when you win is marketing; a benchmark you release when you lose is a benchmark. The contribution is the harness, not the leaderboard — and the harness is what you’ll use to evaluate the next code-intelligence MCP, including ours when we ship the next version.
Reproduce in 90 seconds
Numbers in the paper come from sverklo v0.17.1 and the results directory benchmark/results/2026-04-07T23-07-14-211Z. To reproduce locally:
$ npm install -g sverklo
$ git clone https://github.com/sverklo/sverklo
$ cd sverklo && npm run bench:primitives
Output is written to benchmark/results/. The bench:swe harness clones each pinned repo into benchmark/.cache/, indexes it, and writes per-question results to a timestamped output directory. Reference environment: Apple M-series, 16 GB RAM, macOS 14, Node.js 20.11.
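Schematically, the bench:swe loop does the following; every identifier below is hypothetical, and only the clone-into-cache, checkout-pinned-SHA, index, answer-per-question, timestamped-output sequence comes from the description above.

// Schematic of the bench:swe flow. All names here are hypothetical;
// only the overall sequence comes from the harness description.
import { execSync } from "node:child_process";
import { mkdirSync } from "node:fs";
import { join } from "node:path";

function runSweBench(repos: { name: string; url: string; sha: string }[]) {
  const stamp = new Date().toISOString().replace(/[:.]/g, "-");
  const outDir = join("benchmark", "results", stamp);
  mkdirSync(outDir, { recursive: true });
  for (const repo of repos) {
    const cache = join("benchmark", ".cache", repo.name);
    execSync(`git clone ${repo.url} ${cache}`); // pinned repo into the cache
    execSync(`git -C ${cache} checkout ${repo.sha}`); // exact SHA from the suite
    // index the checkout, run each question against it, collect scores...
  }
  // write one result file per question under outDir
}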
Cite this
The canonical citation:
Groshin, N. (2026). Sverklo: A Local-First Code Intelligence MCP Server and a Cross-Repository Software Engineering Benchmark. Zenodo. https://doi.org/10.5281/zenodo.19802051
BibTeX:
@misc{groshin2026sverklo,
  author    = {Groshin, Nikita},
  title     = {{Sverklo}: A Local-First Code Intelligence {MCP} Server
               and a Cross-Repository Software Engineering Benchmark},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19802051},
  url       = {https://doi.org/10.5281/zenodo.19802051}
}
Try sverklo
If you reached this page from the paper, install and setup are two commands:
$ npm install -g sverklo
$ cd your-project && sverklo init
That writes .mcp.json, appends sverklo instructions to your CLAUDE.md, and runs sverklo doctor to verify the MCP handshake. No API keys, no cloud, no telemetry by default.
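For orientation, a project-scoped .mcp.json registers the server under an mcpServers key; the exact command and args written for sverklo are a guess here, so check the generated file rather than copying this.

{
  "mcpServers": {
    "sverklo": {
      "command": "sverklo",
      "args": ["serve"]
    }
  }
}

The sverklo entry above is illustrative only; the file that sverklo init generates is authoritative.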