Best MCP Servers for Code Intelligence — Honest Comparison of 12 Options (2026)

An opinionated landscape of 12 servers, with a comparison matrix you can actually use. Honest about where Sverklo (the project that wrote this guide) loses.

May 2026 · Nikita Groshin · ~3,200 words

Contents

  1. What MCP actually is (60 seconds)
  2. A taxonomy: four kinds of code-intel MCP
  3. The comparison matrix (12 servers)
  4. Decision tree: which one for you?
  5. Security PSA (read this even if you skip everything else)
  6. Glossary
  7. Where Sverklo loses (honest section)

What MCP actually is (60 seconds)

Model Context Protocol (MCP) is a JSON-RPC spec that lets a language-model client (Claude Code, Cursor, Windsurf, Cline, Zed, the Cursor SDK, the Anthropic SDK) call external tools the model author didn't ship. Tools live in servers — separate processes the client launches over stdio or talks to over HTTP. Each tool has a name, a schema, and a description; the model picks them up at conversation start and treats them like first-class functions.

The practical effect: instead of waiting for Anthropic to add a "search my repo" tool to Claude, you run a local MCP server that exposes my_repo_search and the agent uses it. The server can do anything — query a database, hit an internal API, run a tree-sitter parser, whatever.
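To make "name, schema, description" concrete, here is a minimal sketch of such a server in TypeScript using the official @modelcontextprotocol/sdk. The tool name my_repo_search comes from the example above; searchIndex is a hypothetical stand-in for whatever retrieval pipeline you actually run.

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Placeholder: wire this up to your real retrieval pipeline (BM25, embeddings, ...).
async function searchIndex(query: string, maxResults: number): Promise<string[]> {
  return [`(stub) would search the index for: ${query}`].slice(0, maxResults);
}

const server = new McpServer({ name: "my-repo-tools", version: "0.1.0" });

// Name + description + schema: this is everything the model sees at conversation start.
server.tool(
  "my_repo_search",
  "Search this repository's index and return matching snippets.",
  { query: z.string(), maxResults: z.number().optional() },
  async ({ query, maxResults }) => {
    const hits = await searchIndex(query, maxResults ?? 10);
    return { content: [{ type: "text", text: hits.join("\n") }] };
  },
);

// Stdio transport: the client spawns this process and talks over stdin/stdout.
await server.connect(new StdioServerTransport());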

For code intelligence specifically, MCP closes a loop that was previously broken: the agent stops being limited to its built-in grep / read / glob tools and starts seeing the codebase through your retrieval pipeline. That's the whole pitch.

A taxonomy: four kinds of code-intel MCP

The category is less unified than the marketing makes it sound. There are four meaningfully different things people call "code-intel MCP," and mixing them up causes most of the confusion in side-by-side comparisons.

1. Search servers

Index your repo (BM25 / embeddings / both) and answer "find me code about X." Examples: Claude Context (Zilliz), Local Code Search MCP, grepai, Seroost. Strength: cheap exploration. Weakness: no graph, no impact analysis, no memory.

2. Knowledge-graph / structural servers

Build a call graph or symbol graph and let the agent traverse it. Examples: GitNexus, CodeGraphContext, code-graph-mcp, codesight-mcp. Strength: structural queries ("what calls X transitively?"). Weakness: graph-only is bad at semantic recall and weak at ranking; many require Neo4j or Kuzu.
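For the graph-averse: "what calls X transitively?" is just a reverse traversal of the call graph. In GitNexus or CodeGraphContext you would express it as a Cypher query; below is the same operation as a plain TypeScript sketch over a toy adjacency map (the graph shape is hypothetical; real servers build it from tree-sitter or compiler output).

// Toy call graph: calls.get(f) = list of functions f calls directly.
type CallGraph = Map<string, string[]>;

// "What calls X transitively?" = breadth-first search over inverted edges.
function transitiveCallers(calls: CallGraph, target: string): Set<string> {
  // Invert the edges once: callersOf.get(g) = functions that call g directly.
  const callersOf = new Map<string, string[]>();
  for (const [caller, callees] of calls) {
    for (const callee of callees) {
      if (!callersOf.has(callee)) callersOf.set(callee, []);
      callersOf.get(callee)!.push(caller);
    }
  }
  const seen = new Set<string>();
  const queue = [target];
  while (queue.length > 0) {
    const fn = queue.shift()!;
    for (const caller of callersOf.get(fn) ?? []) {
      if (!seen.has(caller)) {
        seen.add(caller);
        queue.push(caller);
      }
    }
  }
  return seen; // this set is the "blast radius" of a change to `target`
}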

3. LSP-backed semantic servers

Wrap a Language Server Protocol implementation and expose its rename / refs / definition signals as MCP tools. Examples: Serena. Strength: precision (real renames, not regex). Weakness: requires a working LSP per language; configuration-heavy.

4. Hybrid / "all-of-the-above" intelligence servers

Combine search, structure, ranking, and memory. Examples: Sverklo (this site), jcodemunch-mcp (search + tree-sitter symbols), codebase-memory-mcp. Strength: covers more failure modes. Weakness: heavier installs and more complex to evaluate (which is why we wrote a benchmark).

None of these is wrong. They're answers to different questions. The decision tree below walks through which one matches your workload.

The comparison matrix (12 servers)

Numbers verified May 2026. Star counts and license terms change; for a value as load-bearing as the license, click through and read the actual LICENSE file before relying on this table.

| Server | Category | License | Stars | Hosting | Languages | Tools | Retrieval substrate | Memory layer |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sverklo | Hybrid | MIT | growing | Local (SQLite + ONNX) | 12 | 37 | BM25 + embeddings + PageRank (RRF) | Bi-temporal, git-SHA pinned |
| GitNexus | Knowledge graph | PolyForm Noncommercial 1.0 | ~28–35K | Local (KuzuDB) + browser UI | 14 | ~12 | Cypher graph queries | None |
| Serena | LSP-backed | MIT | ~24K | Local (LSP processes) | 40+ (via LSP) | ~15 | LSP refs/defs/renames | None |
| jcodemunch-mcp | Search + symbols | Dual: free non-commercial; $79–$1,999 commercial | ~1.7K | Local (Python) | 10+ | ~6 | Tree-sitter symbol index | None |
| Claude Context | Search | Apache 2.0 | ~6K | Local + external Milvus | broad | ~5 | BM25 + dense vectors (Milvus) | None |
| codebase-memory-mcp | Hybrid (light) | Open source | ~1.6K | Local (single static binary) | 66 | ~10 | Symbol KG + git-aware diff | Shallow |
| CodeGraphContext | Knowledge graph | MIT | ~3K | Local (Neo4j) | broad | ~8 | Neo4j graph queries | None |
| code-graph-mcp | Knowledge graph | MIT | smaller | Local | 5+ | ~6 | Tree-sitter symbol graph | None |
| Code Pathfinder | Search + AST | Apache 2.0 | smaller | Local | ~6 | ~7 | AST + structural queries | None |
| Local Code Search MCP | Search | MIT | smaller | Local | broad | ~3 | Lexical + simple ranking | None |
| code-review-graph | Hybrid | MIT | ~13.5K | Local (SQLite + FTS5) | 23 | 28 | FTS5 + RRF + optional embeddings | Lite |
| Greptile (not strictly MCP) | Cloud PR review | Closed / paid | n/a | Cloud (their infra) | broad | n/a | Hosted hybrid | Hosted |

Greptile is included for completeness: it's the named cloud incumbent in the category, even though it doesn't ship as an MCP server. It's listed here so the comparison answers the literal question many engineers ask: "open-source alternative to Greptile."

Decision tree: which one for you?

The right question isn't "which is best" — there is no best. It's "which trade-offs do I want to make." Walk this tree top to bottom; the first matching answer is the practical recommendation.

Q1. Can your code legally leave the machine?

No (compliance, regulated industry, customer contract): you need local-first. Skip Greptile entirely, and skip Claude Context if you don't want to run Milvus.

Yes: Greptile becomes a viable option for hosted PR review specifically, and you have the full local set as well.

Q2. Is the project commercial?

Yes (any agency, contractor, SaaS, or company that pays anyone to use the tool): you cannot use GitNexus without a separate commercial license from Akon Labs (PolyForm Noncommercial 1.0 forbids commercial use), and you cannot use the free tier of jcodemunch-mcp; its commercial tiers run $79–$1,999 depending on team size. The MIT-licensed servers (Sverklo, Serena, CodeGraphContext, code-review-graph) are unrestricted.

No (personal use, OSS contribution, evaluation): all options are open.

Q3. What's your dominant retrieval workload?

Mostly "find me code about X" (lexical/semantic recall): a search server is enough; start with Claude Context or Local Code Search MCP. Mostly structural questions ("what calls X?", blast radius): a graph server; GitNexus (non-commercial only) or CodeGraphContext. Mostly precise refactoring (renames, reference finding): Serena. A genuine mix of all three, plus memory across sessions: a hybrid; Sverklo or code-review-graph.

Q4. What's the install pain you can absorb?

Lowest: a single static binary (codebase-memory-mcp) or a plain npm install (Sverklo). Medium: a Python environment (jcodemunch-mcp), an embedded KuzuDB (GitNexus), or one working LSP per language (Serena). Highest: an external database you operate yourself; Neo4j for CodeGraphContext, Milvus for Claude Context.

Q5. Do you need a published, reproducible benchmark to defend the choice to your team?

Sverklo is the one entry in the matrix above that ships a peer-reviewable benchmark and reproducible eval harness (sverklo.com/bench): 90 tasks, 5 baselines, raw data downloadable. The methodology repo at github.com/sverklo/sverklo-bench is open for new baseline submissions. If "we picked X because the vendor said it's best" isn't a defensible answer in your org, having a shared eval matters.

Security PSA (read this even if you skip everything else)

MCP's stdio transport spawns child processes. If the MCP client passes the server command string through a shell, an attacker who controls part of that string can execute arbitrary commands. Anthropic's stance is that this is by design — stdio is intended for local trusted use. That's defensible, but "by design" is also where real users get burned because client implementers shell out via sh -c for convenience.

Four rules:

  1. Never pass server commands through a shell. Use exec-style spawn (see the sketch after this list).
  2. Treat any user-controlled config as untrusted input.
  3. Pin server binaries by absolute path or content hash.
  4. Sandbox at the OS level (containers, sandbox-exec).
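Rule 1 in Node terms. The binary path and flags below are placeholders, not any real server's CLI; the point is the spawn discipline, which is the standard node:child_process API.

import { spawn } from "node:child_process";

// UNSAFE: the command string gets re-parsed by a shell, so any
// attacker-influenced piece of it (env var, workspace config) can
// inject extra commands:
//   spawn("sh", ["-c", `${serverCmd} --stdio`]);

// SAFE: exec-style spawn. Binary pinned by absolute path, each argument
// a discrete array element, no shell involved. (Path and flags are placeholders.)
const child = spawn("/usr/local/bin/some-mcp-server", ["--stdio"], {
  shell: false, // the Node default, but state the intent explicitly
  stdio: ["pipe", "pipe", "inherit"],
});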

Full writeup with the 30-second audit grep one-liners: MCP STDIO command-injection audit.

Glossary

MCP (Model Context Protocol)
JSON-RPC spec for tool servers that LLM clients (Claude Code, Cursor, etc.) call at runtime. Open spec; no Anthropic-only dependency.
BM25
Lexical ranking function from classical information retrieval. Fast, deterministic, strong on rare identifiers. Bad at concepts.
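For reference, the standard Okapi scoring function (k1 and b are tuning constants, typically k1 ≈ 1.2–2.0 and b ≈ 0.75):

score(D, Q) = Σ_{qi ∈ Q} IDF(qi) · f(qi, D) · (k1 + 1) / (f(qi, D) + k1 · (1 − b + b · |D| / avgdl))

where f(qi, D) is the term frequency of query term qi in document D and avgdl is the average document length in the corpus.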
RRF (Reciprocal Rank Fusion)
Standard way to combine multiple ranked result lists (e.g., BM25 + embeddings) without tuning weights. Simple and surprisingly competitive.
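A sketch of the fusion step in TypeScript; k = 60 is the conventional constant from the original RRF paper.

// Reciprocal Rank Fusion: fuse N ranked lists with no tuned weights.
function rrf(rankings: string[][], k = 60): string[] {
  const score = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, i) => {
      // rank is 1-based, so position i contributes 1 / (k + i + 1)
      score.set(doc, (score.get(doc) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...score.entries()].sort((a, b) => b[1] - a[1]).map(([doc]) => doc);
}

// Usage: rrf([bm25Results, embeddingResults]) returns the fused ordering.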
PageRank on a call graph
Treats functions as nodes and calls as edges; central functions get higher scores. Used to rank retrieval results so load-bearing code surfaces first.
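A toy version of the computation, assuming the same call-graph shape as earlier; real implementations handle dangling-node mass and convergence checks, which this sketch skips.

// Toy PageRank over a call graph (functions = nodes, calls = edges).
function pageRank(calls: Map<string, string[]>, d = 0.85, iters = 20): Map<string, number> {
  const nodes = [...new Set([...calls.keys(), ...[...calls.values()].flat()])];
  const n = nodes.length;
  let rank = new Map<string, number>(nodes.map((v) => [v, 1 / n]));
  for (let step = 0; step < iters; step++) {
    const next = new Map<string, number>(nodes.map((v) => [v, (1 - d) / n]));
    for (const [src, outs] of calls) {
      if (outs.length === 0) continue; // simplification: dangling mass dropped
      const share = (d * rank.get(src)!) / outs.length;
      for (const dst of outs) next.set(dst, next.get(dst)! + share);
    }
    rank = next;
  }
  return rank; // higher score = more structurally central function
}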
Tree-sitter
Incremental parser library. The de facto AST parser for code-intel servers because it handles incomplete / broken code gracefully.
Bi-temporal memory
Memory store that tracks both when a fact was true in the codebase (git SHA) and when the agent learned it (wall clock). Keeps agent recall from drifting when you git checkout back to an older state.
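A sketch of what such a record might look like; the field names are illustrative, not sverklo's actual schema.

// Illustrative record shape only.
interface MemoryFact {
  fact: string;        // e.g. "parseConfig() owns all env-var reads"
  validAtSha: string;  // git SHA at which this was observed to be true
  learnedAt: number;   // wall-clock timestamp when the agent recorded it
}

// One plausible recall rule: when the working tree sits at an older SHA,
// filter by whether validAtSha is an ancestor of HEAD rather than by
// learnedAt, so facts observed "in the future" of that checkout don't leak in.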
Blast radius
Set of all functions that could be affected (transitively) by a change to a given function. Computed from the call graph.
P1, P2, P4, P5
Sverklo bench category labels: P1 = symbol definition lookup, P2 = reference finding, P4 = file dependencies, P5 = dead code detection. Referenced wherever this guide cites bench numbers.
Stdio transport
MCP's local transport: client spawns server as a subprocess and talks over stdin/stdout. Default for local-first servers. See security PSA above.

Where Sverklo loses (honest section)

This guide is published on the sverklo blog, so the right thing to do is name the cases where sverklo is not the right choice. From the public bench and from real usage:

  1. If what you need is LSP-grade precision (real renames, exact reference finding), Serena is the sharper tool; sverklo's index is retrieval-first, not an LSP.
  2. If you want to explore a call graph visually and write arbitrary graph queries, GitNexus does that specific job better (license permitting).
  3. If install footprint is the binding constraint, codebase-memory-mcp's single static binary is lighter than sverklo's npm + ONNX setup.
  4. If you want hosted, zero-maintenance PR review and your code is allowed to leave the machine, Greptile is purpose-built for it.

The matrix and decision tree above try to be honest about all of this. The case for sverklo isn't "it wins every category" — it doesn't. The case is: if you need local-first + commercial-friendly + multi-tool surface + persistent memory + an actually-published benchmark, no other server in the matrix offers all five.

What to try first

If you've read this far and want to actually run one of these on a real repo today:

npm install -g sverklo
cd your-project
sverklo init
sverklo receipt   # see what your agent has been spending tokens on

sverklo init auto-detects your AI coding agent (Claude Code, Cursor, Windsurf, Zed) and writes the right MCP config. sverklo receipt parses your last week of Claude Code session logs and prints a Spotify-Wrapped-style breakdown of where your tokens went — useful as a baseline before/after.

If sverklo isn't the right pick after reading the decision tree, try one of the others linked in the matrix. Honest landscape post; I'm not pretending the answer is always us.

Reproduce the bench: all numbers in this guide that come from the sverklo bench are reproducible via npm run bench in the repo. 90 tasks, 5 baselines, raw data at sverklo.com/bench. If you find a number wrong, open an issue.

Updated: May 3, 2026 · 3,200 words · MIT-licensed prose, reuse with attribution