The fix that wasn't

2026-05-26 · Nikita Groshin · ~10 min read

Our growth campaign Day 1 was supposed to be Saturday. Instead we shipped six npm versions in five days closing user-reported bugs. The most useful moment was the one we got wrong: we shipped a fix to npm, watched our own test pass, and then the user came back because the failure still reproduced.

This post is the log: what broke, what we shipped, and what we changed about how we ship after the false positive caught us. If you maintain anything that other people run on Windows, the second half is the operationally useful part.

Five days, six versions

Date	Version	Reason
2026-05-22	v0.24.0	cross-repo MCP search + dashboard error banner + git-branch in /api/status
2026-05-22	v0.25.0	five fixes for one Windows tester (#53 MCP probe, #58 reindex EBUSY, #59 embedding provider wiring, #60 coverage diagnostic, #61 hybrid lane attribution)
2026-05-22	v0.25.1	dashboard.js had a literal `SyntaxError` from a May 14 template-literal split — silently broken for 8 days
2026-05-24	v0.25.2	v0.25.0 #59 didn't actually fix the user-visible failure — second pass
2026-05-24	v0.26.0	#69 provider-change detection: refuse to mix vector spaces
2026-05-25	v0.26.1, v0.27.0	four new community issues (#71 tool-name double-prefix, #72 `init --global`, #73 `unregister --by-path`, #74 registry timestamp) — two closed same-day

None of this was the plan. The plan was a calibrated content launch — bench-v2 announce on r/mcp Saturday morning, an X thread, the comparison page going live. Instead we got a 48-hour stress test from one careful user and we kept finding things to fix.

The fix that wasn't

Day two of the campaign-that-wasn't, we shipped v0.25.0 with five fixes, including #59: the Ollama embedding provider was configured for 1024-dimensional vectors, but the index stored 384-dimensional ones. We traced it to a two-line wiring gap in the factory: createEmbeddingProvider() was called with no arguments, so the YAML config was a silent no-op. We patched it, wrote a regression test that asserted the indexer selected the Ollama provider when the config requested it, ran the test, watched it pass, and shipped.

Twelve hours later the user came back. sverklo doctor still reported the dim mismatch. He'd run sverklo reindex --force --timing, watched the embed phase take 151 seconds (so the configured provider really was being called — not a silent fallback), and the index still contained 384-dim vectors.

The agent's test passed because it had answered the wrong question. "Did the indexer select the configured provider?" — yes. "Did the configured provider produce vectors of the configured dimension and store them?" — no. OllamaProvider claimed dimensions = 1024 in its API contract because that's what the user's YAML said, but it never validated the actual response length from Ollama against that number. So if Ollama returned 384-dim vectors despite the config — which is what was happening — the provider faithfully wrote 384-dim vectors into the index while claiming to its caller that it was producing 1024-dim ones.

The test had asserted on the wrong join.
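The missing check is small. Here is a minimal sketch of it, assuming the provider receives its vectors as plain number arrays; assertDimensions and its parameters are hypothetical names for illustration, not sverklo's actual code:

```typescript
// Hypothetical helper: compares what the model actually returned against
// what the config claims, instead of trusting the config.
function assertDimensions(
  vectors: number[][],
  configured: number,
  model: string,
): void {
  for (const vector of vectors) {
    if (vector.length !== configured) {
      throw new Error(
        `model '${model}' returned ${vector.length}-dim vectors ` +
          `but the provider was configured for ${configured}-dim`,
      );
    }
  }
}

// A 384-dim response checked against a 1024-dim config should fail loudly.
try {
  assertDimensions([new Array(384).fill(0)], 1024, "fake-model");
} catch (err) {
  console.log((err as Error).message);
}
```

A regression test built on a check like this asserts on the stored vector length, which is the thing the user observed, and not on which provider the factory selected.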

What we changed

We sat down mid-firefight and amended the project constitution with a sixth principle: Validate the Fix, Then Ship. The required sequence:

1. Reproduce first: confirm the bug actually fails on the current code, on the user's reported platform when possible. "Looks like the bug" is not reproduction.
2. Regression test before code: write a test that fails on the unpatched code. A test that passes against the broken code provides no protection.
3. Patch then green test: apply the fix and confirm the regression test passes.
4. Validate against the artifact, not the source: run the user's original reproduction against the built sverklo (post-npm pack + global install, or post-tag npm publish). A green source-tree test that hasn't been exercised against the actual binary doesn't count as "fixed."
5. Close against a verified version: when closing the issue, reference a specific shipped version (v0.X.Y such that npm view sverklo version matches), not an unmerged branch or unbuilt commit.

Step 4 was the change we hadn't internalized. We'd been validating against the source tree, and source-tree validation is necessary but insufficient: a unit test passing under vitest run doesn't tell you whether the published npm install -g sverklo@0.25.2 binary actually does the right thing when the user runs their original command against it. We caught one false positive only because the user came back. The next user who hits one may not.

What "validate against the artifact" looks like

For v0.25.2 (the real fix for #59), we installed the published binary globally on a fresh machine, configured a project with a deliberately-mismatched embeddings.dimensions: 1024 in .sverklo.yaml, and ran a tiny mock Ollama server on a non-standard port that returned 384-dimensional vectors regardless of input. Then we ran sverklo reindex --force against that setup. The output:

$ sverklo reindex --force
Clearing index at /tmp/sverklo-vi4-XXXX…
Reindexing from scratch…
Error: Ollama model 'fake-model' returned 384-dim vectors but the
provider was configured for 1024-dim. Update embeddings.dimensions
in .sverklo.yaml to 384, or switch to a model whose output matches
the configured dimension. (sverklo/sverklo#66)
    at OllamaProvider.embed (.../dist/src/indexer/embedding-providers.js:217:27)

That's the binary that npm install ships, exercising the actual failure path the user reported, throwing the actual error message we wrote — not a unit-test mock. Three more bugs (the fingerprintOf wiring in v0.26.0, the registry timestamp in v0.26.1, the init --global feature in v0.27.0) got the same treatment before being declared closed.
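The mock server in a setup like this can be very small. The sketch below is an assumption-laden illustration, not the script we ran: fakeEmbedding and startMockOllama are hypothetical names, and it answers every request with both the embedding and embeddings response shapes because which Ollama endpoint the client calls is not shown here.

```typescript
import { createServer, Server } from "node:http";

// Deterministic vector of the requested length; the values do not matter,
// only the dimension does.
function fakeEmbedding(dim: number): number[] {
  return Array.from({ length: dim }, (_, i) => (i % 7) / 7);
}

// Starts a server that ignores the request body and always answers with
// a vector of `dim` entries, whatever the caller's config expects.
function startMockOllama(port: number, dim = 384): Server {
  const server = createServer((req, res) => {
    req.resume(); // drain the body without parsing it
    req.on("end", () => {
      const vector = fakeEmbedding(dim);
      res.writeHead(200, { "content-type": "application/json" });
      // Assumption: the client reads one of these two response fields.
      res.end(JSON.stringify({ embedding: vector, embeddings: [vector] }));
    });
  });
  return server.listen(port, "127.0.0.1");
}
```

Calling startMockOllama on an unused port and pointing the project's Ollama URL at it gives the published binary a provider that contradicts the configured dimension on every call.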

The eight-day silent SyntaxError

Adjacent failure, different lesson. On May 14 we extracted the dashboard's inline JavaScript out of a template literal in dashboard-html.ts into a standalone dashboard.js file (Tier 2.3 of a refactor cleanup). The extraction doubled all the single-quote escapes: what was \' inside the template literal got serialized as \\' in the standalone file. JavaScript reads \\' as an escaped backslash followed by the closing quote, so the string ends early and whatever follows it is a SyntaxError.
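The difference between the two escapes is easy to reproduce in isolation. A minimal sketch follows; the two sample lines are made up for illustration and are not taken from dashboard.js:

```typescript
// Returns true if `src` parses as a JavaScript function body.
// Only a SyntaxError counts as a parse failure; anything else is rethrown.
function parses(src: string): boolean {
  try {
    new Function(src);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// String.raw keeps backslashes exactly as typed, so each constant holds
// the literal text that would sit in the .js file.
const escapedOnce = String.raw`var label = 'it\'s fine';`;
const escapedTwice = String.raw`var label = 'it\\'s fine';`;

console.log(parses(escapedOnce)); // true
console.log(parses(escapedTwice)); // false
```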

The whole file failed to parse. The whole dashboard was blank for 8 days. Nobody noticed, including us, and v0.24.0's error-banner code couldn't help because it lived in the same dashboard.js that failed to parse.

The TypeScript build was green. The 600-some vitest tests were green. Nothing in CI ever loaded dashboard.js in a JavaScript parser because the test suite doesn't spin up a browser. So we shipped two minor versions on top of a broken UI before a real user ran sverklo ui . and reported the SyntaxError.

The fix took one PR. The second fix — making sure this class of bug can't ship silently again — added scripts/lint-assets.mjs to CI: it runs node --check on every .js file under src/server/assets/ as part of the test job. One additional second of CI, one entire class of silent regression closed.
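A parse check of this kind fits in a few lines. The sketch below is an illustration of the idea rather than the contents of scripts/lint-assets.mjs: lintAssets and demo are hypothetical names, and for brevity it only looks at files directly inside the directory, not in subdirectories.

```typescript
import { spawnSync } from "node:child_process";
import { mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Runs `node --check` on each .js file directly under `dir` and returns
// the names of the files that fail to parse.
function lintAssets(dir: string): string[] {
  const failures: string[] = [];
  for (const name of readdirSync(dir)) {
    if (!name.endsWith(".js")) continue;
    const result = spawnSync(process.execPath, ["--check", join(dir, name)]);
    if (result.status !== 0) failures.push(name);
  }
  return failures.sort();
}

// Demo on a throwaway directory: one valid file, one with a doubled escape.
function demo(): string[] {
  const dir = mkdtempSync(join(tmpdir(), "lint-assets-"));
  writeFileSync(join(dir, "ok.js"), String.raw`var s = 'it\'s';` + "\n");
  writeFileSync(join(dir, "broken.js"), String.raw`var s = 'it\\'s';` + "\n");
  return lintAssets(dir);
}
```

In CI, a wrapper would call lintAssets on the assets directory and exit non-zero when the returned list is not empty.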

What it cost vs what it bought

Cost: five days that were supposed to be growth and were instead firefighting. Six npm releases that the launch plan didn't budget for. A new constitutional principle and a CI gate.

What we bought:

The cleanest base we've shipped. v0.27.0 has 711 tests on three operating systems, regression tests for every closed bug, and a CI parse-check on browser assets. We'd been planning to launch publicly on v0.23.1 — which had the silent embedding fallback, the false-success reindex, the SyntaxError dashboard, and the broken Windows MCP probe all latent. That launch would have been a disaster.
A discipline we didn't have before. Principle VI is the kind of rule that's obvious in hindsight and hard to follow without ratifying. Three subsequent bug fixes (v0.25.2, v0.26.0, v0.27.0) used the artifact-validation step explicitly. None of them shipped a false-positive.
A live tester who came back. The user who surfaced the original five bugs filed four new issues on Monday. That's a retention signal that doesn't show up in npm download counts.

What's next

The bench-v2 comparison still hasn't shipped publicly — that's the original campaign Day 1 content, blocked on running the bench against v0.27.0 numbers. Yesterday's community issues opened up the priority queue: #71 (the MCP tool-name double-prefix — every doc and skill in the wild has the wrong name in it) is the v0.28.0 work in progress. #72 (init --global) shipped in v0.27.0; #73 and #74 in v0.26.1, both with VI-compliant regression tests.

If you maintain something that ships to npm and you don't have a "run the published binary against the user's original reproduction" step in your release flow, the operationally useful takeaway here is: add one. It's the cheapest insurance against the test-passes-but-bug-persists class of failure. The unit test asserts your model of the code. The artifact validation asserts the code.

Sverklo is local-first MCP code intelligence — symbol graph, blast-radius, git-pinned memory. The repo is MIT. The public bench is reproducible from the published binary. The constitution is in the repo. Install with npm install -g sverklo; run sverklo doctor to check setup.

Discussion: r/mcp · GitHub issues · @sverklo on X