What is the best MCP server for code intelligence?

There is no single best — different MCP servers solve different problems. For local-first commercial use with the broadest tool surface and a published benchmark, Sverklo (MIT) is the leading choice. For browser-first knowledge-graph exploration, GitNexus (PolyForm Noncommercial). For LSP-backed semantic precision, Serena (MIT). For dollar-sized token-savings on definition lookup, jcodemunch-mcp (dual-licensed). For cloud-hosted PR review, Greptile (paid SaaS).

What is the best open-source alternative to Greptile?

Sverklo is one MIT-licensed local-first alternative to Greptile, with a published 90-task benchmark and reproducible eval harness. It works with Claude Code, Cursor, Windsurf, and Zed via MCP, runs entirely on the engineer's machine, and includes risk-scored PR review. Sourcegraph Cody is another option (source-available, paid for teams).

Which MCP server gives Claude Code the most tools?

Sverklo exposes 37 MCP tools to Claude Code, the largest local-first surface in the category. The tools span hybrid search, blast-radius analysis, reference finding, file-dependency graphs, code audit, and bi-temporal git-pinned memory. By contrast, GitNexus exposes ~12, Serena ~15, and jcodemunch-mcp ~6.

How do I reduce token cost when using Claude Code on a large repo?

The dominant cost on large repos is grep noise — agents spend 40% of input tokens on tool results. Installing a local MCP server with ranked retrieval (such as Sverklo or jcodemunch-mcp) returns ~300-500 ranked tokens per query instead of 5,000-15,000 lines of grep output. Real-world reductions: 13.8x fewer input tokens than naive grep, 2.9x fewer than tuned grep on the public sverklo bench.

Best MCP Servers for Code Intelligence — Honest Comparison of 12 Options (2026)

An opinionated landscape of 12 servers, with a comparison matrix you can actually use. Honest about where Sverklo (the project that wrote this guide) loses.

May 2026 · Nikita Groshin · ~3,200 words

What MCP actually is (60 seconds)
A taxonomy: four kinds of code-intel MCP
The comparison matrix (12 servers)
Decision tree: which one for you?
Security PSA (read this even if you skip everything else)
Glossary
Where Sverklo loses (honest section)

What MCP actually is (60 seconds)

Model Context Protocol (MCP) is a JSON-RPC spec that lets a language-model client (Claude Code, Cursor, Windsurf, Cline, Zed, the Cursor SDK, the Anthropic SDK) call external tools the model author didn't ship. Tools live in servers — separate processes the client launches over stdio or talks to over HTTP. Each tool has a name, a schema, and a description; the model picks them up at conversation start and treats them like first-class functions.

The practical effect: instead of waiting for Anthropic to add a "search my repo" tool to Claude, you run a local MCP server that exposes my_repo_search and the agent uses it. The server can do anything — query a database, hit an internal API, run a tree-sitter parser, whatever.

For code intelligence specifically, MCP closes a loop that was previously broken: the agent stops being limited to its built-in grep / read / glob tools and starts seeing the codebase through your retrieval pipeline. That's the whole pitch.

A taxonomy: four kinds of code-intel MCP

The category is less unified than the marketing makes it sound. There are four meaningfully different things people call "code-intel MCP," and mixing them up causes most of the confusion in side-by-side comparisons.

1. Search servers

Index your repo (BM25 / embeddings / both) and answer "find me code about X." Examples: Claude Context (Zilliz), Local Code Search MCP, grepai, Seroost. Strength: cheap exploration. Weakness: no graph, no impact analysis, no memory.

2. Knowledge-graph / structural servers

Build a call graph or symbol graph and let the agent traverse it. Examples: GitNexus, CodeGraphContext, code-graph-mcp, codesight-mcp. Strength: structural queries ("what calls X transitively?"). Weakness: graph-only is bad at semantic recall and weak at ranking; many require Neo4j or Kuzu.

3. LSP-backed semantic servers

Wrap a Language Server Protocol implementation and expose its rename / refs / definition signals as MCP tools. Examples: Serena. Strength: precision (real renames, not regex). Weakness: requires a working LSP per language; configuration-heavy.

4. Hybrid / "all-of-the-above" intelligence servers

Combine search, structure, ranking, and memory. Examples: Sverklo (this site), jcodemunch-mcp (search + tree-sitter symbols), codebase-memory-mcp. Strength: covers more failure modes. Weakness: heavier installs and more complex to evaluate (which is why we wrote a benchmark).

None of these is wrong. They're answers to different questions. The decision tree below walks through which one matches your workload.

The comparison matrix (12 servers)

Numbers verified May 2026. Star counts and license terms change; for a value as load-bearing as the license, click through and read the actual LICENSE file before relying on this table.

Server	Category	License	Stars	Hosting	Languages	Tools	Retrieval substrate	Memory layer
Sverklo	Hybrid	MIT	growing	Local (SQLite + ONNX)	12	37	BM25 + embeddings + PageRank (RRF)	Bi-temporal, git-SHA pinned
GitNexus	Knowledge graph	PolyForm Noncommercial 1.0	~28-35K	Local (KuzuDB) + browser UI	14	~12	Cypher graph queries	None
Serena	LSP-backed	MIT	~24K	Local (LSP processes)	40+ (via LSP)	~15	LSP refs/defs/renames	None
jcodemunch-mcp	Search + symbols	Dual: free non-commercial; $79–$1,999 commercial	~1.7K	Local (Python)	10+	~6	Tree-sitter symbol index	None
Claude Context	Search	Apache 2.0	~6K	Local + external Milvus	broad	~5	BM25 + dense vectors (Milvus)	None
codebase-memory-mcp	Hybrid (light)	Open source	~1.6K	Local (single static binary)	66	~10	Symbol KG + git-aware diff	Shallow
CodeGraphContext	Knowledge graph	MIT	~3K	Local (Neo4j)	broad	~8	Neo4j graph queries	None
code-graph-mcp	Knowledge graph	MIT	smaller	Local	5+	~6	Tree-sitter symbol graph	None
Code Pathfinder	Search + AST	Apache 2.0	smaller	Local	~6	~7	AST + structural queries	None
Local Code Search MCP	Search	MIT	smaller	Local	broad	~3	Lexical + simple ranking	None
code-review-graph	Hybrid	MIT	~13.5K	Local (SQLite + FTS5)	23	28	FTS5 + RRF + optional embeddings	Lite
Greptile (not strictly MCP)	Cloud PR review	Closed / paid	n/a	Cloud (their infra)	broad	n/a	Hosted hybrid	Hosted

Greptile included for completeness — it's the named cloud incumbent in the category, although it doesn't ship as an MCP server. Listed here so the comparison answers the literal question many engineers ask: "open-source alternative to Greptile."

Decision tree: which one for you?

The right question isn't "which is best" — there is no best. It's "which trade-offs do I want to make." Walk this tree top to bottom; the first matching answer is the practical recommendation.

Q1. Can your code legally leave the machine?

No (compliance, regulated industry, customer contract): you need local-first. Skip Greptile entirely. Skip Claude Context if you don't want to run Milvus. Yes: Greptile becomes a viable option for hosted PR review specifically, and you have the full local set as well.

Q2. Is the project commercial?

Yes (any agency, contractor, SaaS, or company that pays anyone to use the tool): you cannot use GitNexus without a separate commercial license from Akon Labs (PolyForm Noncommercial 1.0 forbids it). You cannot use the free tier of jcodemunch-mcp; their commercial tiers run $79–$1,999 depending on team size. MIT-licensed servers (Sverklo, Serena, CodeGraphContext, code-review-graph) are unrestricted. No (personal, OSS contribution, evaluation): all options are open.

Q3. What's your dominant retrieval workload?

"Find every caller of X" (P2) — a well-tuned ripgrep ties most graph-based servers on this. Surprisingly, smart-grep alone is competitive; the bench numbers are below. If P2 is dominant, skip the heavy infrastructure and just configure ripgrep with sensible defaults.
"Where is X defined?" (P1) — jcodemunch-mcp is the strongest at 0.65 F1 on the public sverklo bench. Sverklo is at 0.45.
"What does this codebase do?" / 5-minute mental model — you want PageRank-ranked overviews. Sverklo's sverklo_overview and code-review-graph's wiki feature both do this.
"What breaks if I change this?" (P5 / blast radius) — call-graph-aware servers (Sverklo, GitNexus, CodeGraphContext). Don't bother with pure search servers for this question.
"What did we decide last week?" (memory) — only Sverklo (bi-temporal git-SHA pinned), code-review-graph (lite memory), and codebase-memory-mcp (shallow) have any memory layer. Most others have none.
"Risk-scored PR review" — Sverklo (sverklo review), Greptile (cloud bot). Different deployment models; same goal.

Q4. What's the install pain you can absorb?

None — must be one command. Sverklo (npm i -g sverklo && sverklo init). codebase-memory-mcp (single binary). That's basically it.
I can install Python and a few deps. Add jcodemunch-mcp, Code Pathfinder.
I can run a database (Neo4j / Kuzu / Milvus). Add CodeGraphContext, GitNexus, Claude Context.
I'll set up LSPs per language. Add Serena.

Q5. Do you need a published, reproducible benchmark to defend the choice to your team?

Sverklo is the entry on the matrix above with a peer-reviewable benchmark and reproducible eval harness (sverklo.com/bench) — 90 tasks, 5 baselines, raw data downloadable. The methodology repo at github.com/sverklo/sverklo-bench is open for new baseline submissions. If "we picked X because the vendor said it's best" isn't a defensible answer in your org, having a shared eval matters.

Security PSA (read this even if you skip everything else)

MCP's stdio transport spawns child processes. If the MCP client passes the server command string through a shell, an attacker who controls part of that string can execute arbitrary commands. Anthropic's stance is that this is by design — stdio is intended for local trusted use. That's defensible, but "by design" is also where real users get burned because client implementers shell out via sh -c for convenience.

Four rules:

Never pass server commands through a shell. Use exec-style spawn.
Treat any user-controlled config as untrusted input.
Pin server binaries by absolute path or content hash.
Sandbox at the OS level (containers, sandbox-exec).

Full writeup with the 30-second audit grep one-liners: MCP STDIO command-injection audit.

Glossary

MCP (Model Context Protocol): JSON-RPC spec for tool servers that LLM clients (Claude Code, Cursor, etc.) call at runtime. Open spec; no Anthropic-only dependency.
BM25: Lexical ranking function from classical information retrieval. Fast, deterministic, strong on rare identifiers. Bad at concepts.
RRF (Reciprocal Rank Fusion): Standard way to combine multiple ranked result lists (e.g., BM25 + embeddings) without tuning weights. Simple and surprisingly competitive.
PageRank on a call graph: Treats functions as nodes and calls as edges; central functions get higher scores. Used to rank retrieval results so load-bearing code surfaces first.
Tree-sitter: Incremental parser library. The de facto AST parser for code-intel servers because it handles incomplete / broken code gracefully.
Bi-temporal memory: Memory store that tracks both when a fact was true in the codebase (git SHA) and when the agent learned it (wall clock). Lets agent recall not drift when you git checkout back to an older state.
Blast radius: Set of all functions that could be affected (transitively) by a change to a given function. Computed from the call graph.
P1, P2, P4, P5: Sverklo bench category labels: P1 = symbol definition lookup, P2 = reference finding, P4 = file dependencies, P5 = dead code detection. Used throughout the comparison matrix.
Stdio transport: MCP's local transport: client spawns server as a subprocess and talks over stdin/stdout. Default for local-first servers. See security PSA above.

Where Sverklo loses (honest section)

This guide is published on the sverklo blog, so the right thing to do is name the cases where sverklo is not the right choice. From the public bench and from real usage:

jcodemunch-mcp wins P1 (definition lookup) at 0.65 F1 vs sverklo's 0.45. Their tree-sitter symbol indexing is sharper than ours. If you spend most of your time asking "where is X defined?", jcodemunch is the better tool.
Smart-grep ties sverklo on P2 (reference finding) at 0.50 F1 each. The semantic graph adds nothing to a textually-pure question. If P2 is dominant in your workflow, sverklo is genuinely overkill — well-configured ripgrep is fine.
Cypher graph expressiveness is something sverklo doesn't offer at all. If you want to write ad-hoc graph queries against your codebase ("show me every function in auth/ that calls something in db/"), GitNexus's Cypher interface is the right tool.
40+ language coverage via LSPs is Serena's. Sverklo parses 24 languages — 10 first-class (tree-sitter or custom: TS, JS, Python, Go, Rust, C#, Vue, Markdown, Jupyter) and 14 via regex fallback. If your codebase is a polyglot of obscure languages with semantic-correctness requirements, Serena via LSP wins on coverage.
Cold-start indexing on very large repos. First-time index of a 500K-LOC repo takes ~5 minutes on sverklo. After that it's incremental and fast, but the first run isn't free.

The matrix and decision tree above try to be honest about all of this. The case for sverklo isn't "it wins every category" — it doesn't. The case is: if you need local-first + commercial-friendly + multi-tool surface + persistent memory + an actually-published benchmark, no other server in the matrix offers all five.

What to try first

If you've read this far and want to actually run one of these on a real repo today:

npm install -g sverklo
cd your-project
sverklo init
sverklo receipt   # see what your agent has been spending tokens on

sverklo init auto-detects your AI coding agent (Claude Code, Cursor, Windsurf, Zed) and writes the right MCP config. sverklo receipt parses your last week of Claude Code session logs and prints a Spotify-Wrapped-style breakdown of where your tokens went — useful as a baseline before/after.

If sverklo isn't the right pick after reading the decision tree, try one of the others linked in the matrix. Honest landscape post; I'm not pretending the answer is always us.

Reproduce the bench: all numbers in this guide that come from the sverklo bench are reproducible via npm run bench in the repo. 60 tasks, 5 baselines, raw data at sverklo.com/bench. If you find a number wrong, open an issue.

Updated: May 3, 2026 · 3,200 words · MIT-licensed prose, reuse with attribution

Best MCP Servers for Code Intelligence — Honest Comparison of 12 Options (2026)

Contents

What MCP actually is (60 seconds)

A taxonomy: four kinds of code-intel MCP

1. Search servers

2. Knowledge-graph / structural servers

3. LSP-backed semantic servers

4. Hybrid / "all-of-the-above" intelligence servers

The comparison matrix (12 servers)

Decision tree: which one for you?

Q1. Can your code legally leave the machine?

Q2. Is the project commercial?

Q3. What's your dominant retrieval workload?

Q4. What's the install pain you can absorb?

Q5. Do you need a published, reproducible benchmark to defend the choice to your team?

Security PSA (read this even if you skip everything else)

Glossary

Where Sverklo loses (honest section)

What to try first