Reverse-chronological feed of what shipped in sverklo. Releases, blog posts, bench refreshes, methodology fixes, and the negative results we publish alongside the wins. Updated weekly.
2026-05-09
fix
v0.20.6
Audit JSON gets structured fields (format 1.0.0)
sverklo audit --format json now emits structured grade, numeric_score, and dimensions: [{name, grade, score, detail}] fields directly. The 0.4.0 schema only had a markdown content blob; consumers had to parse the headline and table to extract grades. Discovered while writing the dogfood workflow, when the published GitHub Action's PR-comment builder was posting "Overall: ?" because the structured fields were missing. The content field stays for backwards compatibility.
commit 65806c0·bin/sverklo.ts
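A minimal sketch of what a consumer of the 1.0.0 format can now do. The field names (grade, numeric_score, dimensions, content) come from the entry above; the grade values, scores, and dimension names in the sample payload are invented for illustration.

```python
import json

# Hypothetical audit payload illustrating the 1.0.0 shape described above;
# real output comes from `sverklo audit --format json`. Values are made up.
payload = json.loads("""
{
  "grade": "B+",
  "numeric_score": 87,
  "dimensions": [
    {"name": "docs",  "grade": "A", "score": 94, "detail": "README covers install + usage"},
    {"name": "tests", "grade": "C", "score": 71, "detail": "no coverage gate"}
  ],
  "content": "## Overall: B+ ..."
}
""")

# Grades are now direct field reads -- no markdown parsing of `content`.
overall = payload["grade"]
per_dimension = {d["name"]: d["grade"] for d in payload["dimensions"]}
print(overall, per_dimension)
```

Under the old 0.4.0 schema, the equivalent consumer had to regex the headline and table out of the markdown blob.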
2026-05-09
distrib
Sverklo Audit GitHub Action published to Marketplace
Drop one line into your workflow and get a graded PR comment without uploading code: - uses: sverklo/sverklo@main. Local-first by design — the audit runs on your own GitHub Actions runner; no SaaS round-trip. Listing went live this afternoon.
marketplace listing·action.yml
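A minimal workflow wiring this up might look like the following sketch. Only the uses: sverklo/sverklo@main line comes from the listing; the workflow name, trigger, checkout step, and permissions block are assumptions (a comment-posting Action typically needs pull-requests: write).

```yaml
# Illustrative workflow; only the final `uses:` line is from the listing above.
name: sverklo-audit
on: [pull_request]
permissions:
  pull-requests: write   # assumed: required for the graded PR comment
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: sverklo/sverklo@main
```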
2026-05-09
ship
Self-audit dogfood workflow on every PR
New .github/workflows/audit-self.yml runs sverklo's own audit against the local PR build (not the published npm version) and posts a sticky comment with the grade. The marketplace Action installs from npm, so PRs that change audit logic would grade themselves with stale code; the dogfood workflow uses the freshly-built binary. First PR opened against main gets the first comment.
commit 08405e4
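The shape of the dogfood idea, as a sketch rather than the real audit-self.yml: build the PR's own code, then run that binary instead of the npm release. The build and run commands below are assumptions, not the actual workflow contents.

```yaml
# Sketch of the dogfood workflow: grade the PR with the freshly-built binary,
# not the published npm version. Build/run commands are illustrative.
name: audit-self
on: [pull_request]
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build            # assumed build step
      # assumed invocation: the point is running the local build, not npm
      - run: node dist/sverklo.js audit --format json > audit.json
```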
2026-05-08
post
Q2 2026's MCP discourse landed on tool-list bloat. Cloudflare cut a 1.17M-token spec to ~1K with Code Mode; Anthropic shipped Tool Search lazy-load. Sverklo's SVERKLO_PROFILE env var has shipped the same idea for months; this is the first time we measured it publicly: 8,016 → 1,522 tokens, an 81% reduction with one env var. Per-profile breakdown table.
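In an MCP client config, the env var from the post slots into the standard per-server env block. Only SVERKLO_PROFILE itself comes from the post; the command, args, and the profile value "lean" are placeholders, not documented names.

```json
{
  "mcpServers": {
    "sverklo": {
      "command": "npx",
      "args": ["sverklo"],
      "env": { "SVERKLO_PROFILE": "lean" }
    }
  }
}
```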
2026-05-08
ship
Hub schemas on /vs/ and /blog/ index
Added ItemList JSON-LD enumerating all 13 comparison pages and all 14 blog posts. Pages were SEO orphans — child pages had no canonical hub feeding internal PageRank. Compounding lift expected across every /vs/* and /blog/* URL.
vs hub·blog hub
2026-05-08
ship
Drop-in subagent for Claude Code users
agents/sverklo-explore.md ships as a curl-installable replacement for Claude Code's built-in Explore subagent. The default uses a Read + Grep cascade (~14,200 tokens to find one function); this version uses sverklo's typed MCP tools (~150-800 tokens, single tool call). Tools-per-task on the bench: sverklo 1.0 vs naive grep 6.1.
subagent definition·agents/ directory
2026-05-07
bench
5 baselines (sverklo, smart-grep, jcodemunch-mcp, naive-grep, gitnexus) compared on 120 hand-verified retrieval tasks across 4 OSS codebases (express, lodash, requests, sverklo). Sverklo is the overall leader at F1=0.60; jcodemunch wins P1 def-lookup outright at 0.78, and we publish that loss visibly. The methodology repo at sverklo-bench accepts new baseline submissions via PR.
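For readers new to the metric, a bench F1 like 0.60 reduces to set-based precision and recall per task. The sketch below macro-averages over tasks; the averaging scheme is an assumption here, and sverklo-bench documents the real methodology.

```python
# Minimal sketch: set-based F1 per retrieval task, macro-averaged.
# The averaging scheme is an assumption; see sverklo-bench for the real one.
def f1(retrieved: set, relevant: set) -> float:
    if not retrieved or not relevant:
        return 0.0
    tp = len(retrieved & relevant)
    if tp == 0:
        return 0.0
    precision = tp / len(retrieved)
    recall = tp / len(relevant)
    return 2 * precision * recall / (precision + recall)

tasks = [
    ({"a.ts", "b.ts"}, {"a.ts"}),   # one false positive: P=0.5, R=1.0
    ({"c.ts"}, {"c.ts", "d.ts"}),   # one miss: P=1.0, R=0.5
]
macro_f1 = sum(f1(r, g) for r, g in tasks) / len(tasks)
print(round(macro_f1, 2))  # 0.67
```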
2026-05-07
fix
v0.20.3
Cascade bug — dependency-graph data integrity (sv-p4-04)
FileStore.upsert was using INSERT OR REPLACE, which on conflict deleted the row before re-inserting — triggering ON DELETE CASCADE on every dependency edge involving that file (both as source and target). buildGraph only restored outgoing edges, so incoming edges from cached source files were silently lost on every modification. Fix: INSERT … ON CONFLICT(path) DO UPDATE. Migration v8→v10 repairs corrupted DBs. Bench impact: sverklo P4 0.51 → 0.72.
issue thread·commit b3458c5
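The failure mode reproduces in plain SQLite. The sketch below uses an illustrative two-table schema, not sverklo's actual one: with foreign keys on, INSERT OR REPLACE resolves the conflict by deleting the old row, which fires ON DELETE CASCADE on every edge touching it, while the ON CONFLICT ... DO UPDATE upsert updates in place and leaves the edges intact.

```python
# Reproduces the cascade bug described above. Table and column names
# are illustrative, not sverklo's real schema.
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, mtime INTEGER)")
    db.execute("""CREATE TABLE deps (
        src TEXT REFERENCES files(path) ON DELETE CASCADE,
        dst TEXT REFERENCES files(path) ON DELETE CASCADE)""")
    db.execute("INSERT INTO files VALUES ('a.ts', 1), ('b.ts', 1)")
    db.execute("INSERT INTO deps VALUES ('a.ts', 'b.ts')")  # incoming edge for b.ts
    return db

# Buggy path: on conflict, REPLACE deletes the old row first, so
# ON DELETE CASCADE silently drops every edge touching b.ts.
buggy = make_db()
buggy.execute("INSERT OR REPLACE INTO files VALUES ('b.ts', 2)")
print(buggy.execute("SELECT COUNT(*) FROM deps").fetchone()[0])  # 0

# Fixed path: a true upsert updates in place; the edge survives.
fixed = make_db()
fixed.execute(
    "INSERT INTO files VALUES ('b.ts', 2) "
    "ON CONFLICT(path) DO UPDATE SET mtime = excluded.mtime")
print(fixed.execute("SELECT COUNT(*) FROM deps").fetchone()[0])  # 1
```

Note that the cascade only fires when PRAGMA foreign_keys is on, which is why the loss was silent rather than an error.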
2026-05-05
post
Negative-result writeup. Wired a poor-man's late-interaction reranker into sverklo_lookup and sverklo_refs, then ran the A/B against the bench three times deterministically; F1 dropped from 0.5847 to 0.5551 (-3pp overall; -7.5pp on P1). Diagnosis: SQL match-quality (exact > prefix > substring) is already optimal for symbol-name queries; semantic alignment dilutes the signal instead of sharpening it. Promotion gate published for the next ColBERT v2 attempt.
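The match-quality ordering the diagnosis credits can be sketched as a tiered key: exact matches first, then prefix, then substring. The function name, tier values, and candidate symbols below are illustrative, not sverklo's implementation.

```python
# Sketch of the exact > prefix > substring ordering from the diagnosis above.
# Names and tier values are illustrative only.
def match_tier(query: str, symbol: str) -> int:
    if symbol == query:
        return 0          # exact symbol match ranks first
    if symbol.startswith(query):
        return 1          # then prefix matches
    if query in symbol:
        return 2          # then substring matches
    return 3              # non-matches last

candidates = ["buildGraphNode", "buildGraph", "rebuildGraph", "parse"]
ranked = sorted(candidates, key=lambda s: match_tier("buildGraph", s))
print(ranked)  # ['buildGraph', 'buildGraphNode', 'rebuildGraph', 'parse']
```

For symbol-name queries this discrete ordering is already near-lossless, which is why blending in a continuous semantic score pushed F1 down rather than up.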
2026-05-03
post
Instrumented a week of Claude Code sessions across 312 tasks. Grep accounts for 41% of input-token spend. Sessions with grep results over 8K tokens hallucinate 31% of the time vs 4% under 2K. The fix is hybrid retrieval exposed as MCP tools, with a measured ~60% token reduction.
2026-05-03
post
The honest "best of" landscape doc. 12 MCP servers compared on license, hosting, language coverage, tool count, and retrieval substrate. Includes sverklo's own gaps. Updated to current bench numbers.