Engineering · Sverklo · 2026-04-29

Bi-temporal memory for AI coding agents — the 1990s database pattern that fixes "my agent forgot what we decided yesterday"

2026-04-29 ~9 min read by Nikita Groshin

Most "memory" features for AI coding agents are flat key-value stores: when you update a memory, the old value is gone. That's the wrong abstraction for a codebase, where the team's beliefs change and the question "what did we think about auth at commit abc123?" is a real one. Sverklo borrows a 30-year-old database pattern — bi-temporal memory — and pins it to git SHAs. Here's how it works, why it matters, and the SQLite schema.

The failure mode

Last month I was debugging an auth flow with Claude Code. We made a deliberate decision: JWT verification happens in the middleware, not the route handler. The reason was specific — we wanted token validation to short-circuit before the route's body parser ran, which mattered for a CSRF protection we'd added two weeks earlier.

Three days later, in a fresh session, Claude proposed moving JWT verification into the route handler. It hadn't seen the previous decision. Context was compacted. The agent had no way to retrieve "we decided X for reason Y."

The conventional fix is a CLAUDE.md file with project invariants. That works for stable rules ("never use any in TypeScript") but not for evolving design decisions: either the file keeps only the latest decision, silently discarding the history, or it accumulates every revision until it bloats past what the agent reads carefully.

Neither option answers "what did we believe at this specific commit?" — and that's the question that comes up when you're triaging a bug introduced in some past version.

Bi-temporal databases — the 1990s pattern

This problem isn't new. Database researchers solved a closely related version of it 30 years ago for transactional systems where you needed both "what is true now" and "what did we think was true on date X." The textbook is Richard Snodgrass's Developing Time-Oriented Database Applications in SQL (1999) — bi-temporal modelling, still readable, still applicable.

The core idea: every row in a memory table has two time dimensions:

  1. Valid time: when the fact was true in the modelled world ("we believed X from March to April")
  2. Transaction time: when the database recorded that belief

Updating a fact doesn't overwrite — it inserts a new row with new valid-time bounds, and sets valid_until on the previous row. Queries can ask "current truth" (where valid_until IS NULL) or "what we believed at time T" (where valid_from <= T < COALESCE(valid_until, infinity)).
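The update and query semantics can be sketched in a few lines. This is a minimal illustration of the classic wall-clock version (before the git-SHA substitution below), using Python's stdlib sqlite3; the table and dates are hypothetical:

```python
import sqlite3

# Toy valid-time table: updating closes the old row and inserts a new one.
db = sqlite3.connect(":memory:")
db.execute("""
CREATE TABLE facts (
  id          INTEGER PRIMARY KEY,
  content     TEXT NOT NULL,
  valid_from  TEXT NOT NULL,   -- ISO-8601 date
  valid_until TEXT             -- NULL = still current
)""")

# Initial belief, then a revision: no row is ever overwritten.
db.execute("INSERT INTO facts (content, valid_from) VALUES ('JWT in middleware', '2026-03-15')")
db.execute("UPDATE facts SET valid_until = '2026-04-22' WHERE id = 1")
db.execute("INSERT INTO facts (content, valid_from) VALUES ('JWT in route handler', '2026-04-22')")

# "Current truth": rows not yet superseded.
current = db.execute("SELECT content FROM facts WHERE valid_until IS NULL").fetchone()[0]

# "What did we believe on 2026-04-01?": valid_from <= T < valid_until.
as_of = db.execute("""
  SELECT content FROM facts
  WHERE valid_from <= :t AND :t < COALESCE(valid_until, '9999-12-31')
""", {"t": "2026-04-01"}).fetchone()[0]

print(current)  # JWT in route handler
print(as_of)    # JWT in middleware
```

Both query shapes hit the same table; the only difference is whether you filter on NULL or on a point-in-time predicate.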

The git-SHA substitution

Wall-clock time is the wrong axis for a codebase. Code changes don't happen continuously — they happen at commits. The right "valid time" axis for an AI coding agent's memory is the git commit graph, not the wall clock.

So sverklo's memory schema replaces valid_from and valid_until with valid_from_sha and valid_until_sha:

CREATE TABLE memories (
  id              INTEGER PRIMARY KEY,
  category        TEXT NOT NULL,    -- decision, preference, pattern, ...
  content         TEXT NOT NULL,
  tier            TEXT NOT NULL,    -- core, project, archive
  kind            TEXT,             -- episodic, semantic, procedural

  -- Bi-temporal columns
  valid_from_sha  TEXT NOT NULL,    -- commit at which this memory was created
  valid_until_sha TEXT,             -- commit at which this memory was superseded; NULL = current
  superseded_by   INTEGER REFERENCES memories(id),  -- pointer to the new memory

  -- Provenance
  created_at      INTEGER NOT NULL,
  last_accessed   INTEGER NOT NULL,
  access_count    INTEGER NOT NULL DEFAULT 0,
  confidence      REAL NOT NULL DEFAULT 1.0,

  -- Retrieval
  embedding_id    INTEGER REFERENCES memory_embeddings(rowid),
  pins            TEXT,             -- JSON array of file paths / symbols
  tags            TEXT              -- JSON array
);

CREATE INDEX idx_memories_current ON memories(valid_until_sha)
  WHERE valid_until_sha IS NULL;

The partial index on valid_until_sha IS NULL makes "current truth" queries fast — most queries only want active memories.
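You can verify the planner picks the partial index with a quick EXPLAIN QUERY PLAN check (illustrative; the table here is trimmed to the relevant columns):

```python
import sqlite3

# Confirm SQLite uses the partial index for the "current truth" query shape.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, valid_until_sha TEXT)")
db.execute("""CREATE INDEX idx_memories_current ON memories(valid_until_sha)
              WHERE valid_until_sha IS NULL""")

plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM memories WHERE valid_until_sha IS NULL"
).fetchall()
print(plan)  # the detail column should mention idx_memories_current
```

SQLite only considers a partial index when the query's WHERE clause implies the index's WHERE clause, which is exactly the case here.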

How updates work

When the agent calls sverklo_remember and the memory already exists (matched by category + similar embedding), sverklo doesn't update the row. It:

  1. Inserts a new row with the new content, valid_from_sha = HEAD
  2. Sets valid_until_sha = HEAD on the old row
  3. Sets superseded_by on the old row to point at the new row's id

This preserves the lineage. sverklo_recall queries naturally filter to valid_until_sha IS NULL by default — but the --timeline flag opens the supersession history.
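The three steps above can be sketched as one transaction. This is a hypothetical helper over a trimmed-down schema, not sverklo's actual implementation (which is TypeScript); `head_sha` stands in for `git rev-parse HEAD`:

```python
import sqlite3, time

db = sqlite3.connect(":memory:")
db.execute("""
CREATE TABLE memories (
  id              INTEGER PRIMARY KEY,
  category        TEXT NOT NULL,
  content         TEXT NOT NULL,
  valid_from_sha  TEXT NOT NULL,
  valid_until_sha TEXT,
  superseded_by   INTEGER REFERENCES memories(id),
  created_at      INTEGER NOT NULL
)""")

def supersede(db, old_id, new_content, head_sha):
    with db:  # one transaction: all three steps happen or none do
        # 1. Insert the new row, valid from HEAD, inheriting the old category.
        cur = db.execute(
            """INSERT INTO memories (category, content, valid_from_sha, created_at)
               SELECT category, ?, ?, ? FROM memories WHERE id = ?""",
            (new_content, head_sha, int(time.time()), old_id))
        new_id = cur.lastrowid
        # 2 + 3. Close the old row at HEAD and point it at its successor.
        db.execute(
            "UPDATE memories SET valid_until_sha = ?, superseded_by = ? WHERE id = ?",
            (head_sha, new_id, old_id))
    return new_id

db.execute("INSERT INTO memories (category, content, valid_from_sha, created_at)"
           " VALUES ('decision', 'JWT in middleware', 'abc123', 0)")
new_id = supersede(db, 1, 'JWT in route handler', 'def456')
print(db.execute("SELECT content FROM memories WHERE valid_until_sha IS NULL").fetchone()[0])
# -> JWT in route handler
```

Wrapping the insert and the update in a single transaction matters: a crash between them would otherwise leave two "current" memories for the same decision.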

The query that justifies the design

The reason this matters is the query "what did we believe at commit abc123?" Conventional flat-overwrite memory can't answer this. Bi-temporal memory can:

-- Memories that were active at commit abc123:
-- (created at or before abc123, AND not superseded before abc123)
WITH commit_ancestry AS (
  -- precomputed closure table: every ancestor of abc123, plus abc123
  -- itself at depth 0 (populated via `git rev-list`)
  SELECT ancestor_sha AS sha, depth
  FROM commit_graph
  WHERE descendant_sha = 'abc123'
)
SELECT m.*
FROM memories m
WHERE m.valid_from_sha IN (SELECT sha FROM commit_ancestry)       -- created at or before abc123
  AND (m.valid_until_sha IS NULL
       OR m.valid_until_sha NOT IN                                -- not superseded strictly before abc123
           (SELECT sha FROM commit_ancestry WHERE depth > 0));

It's messier in SQL than in pseudocode because git's commit graph is a DAG, not a line. The trick: precompute commit ancestry once as a closure table, store as commit_graph, then the query is straightforward set-membership.
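Building that closure is a breadth-first walk over the parent DAG. A sketch (in sverklo the parent map would come from `git rev-list --parents`; here `parents` is a toy DAG with a merge commit, so one ancestor is reachable by two paths):

```python
from collections import deque

# Toy commit DAG: def456 is a merge of two branches off abc123.
parents = {
    "abc123": [],               # root
    "br1":    ["abc123"],
    "br2":    ["abc123"],
    "def456": ["br1", "br2"],   # merge commit, two parents
}

def ancestry_closure(sha, parents):
    """BFS over the parent DAG; returns {ancestor_sha: min depth from sha}."""
    depths = {sha: 0}
    queue = deque([sha])
    while queue:
        cur = queue.popleft()
        for p in parents[cur]:
            if p not in depths:          # keep the shortest path's depth
                depths[p] = depths[cur] + 1
                queue.append(p)
    return depths

closure = ancestry_closure("def456", parents)
print(sorted(closure.items()))
# -> [('abc123', 2), ('br1', 1), ('br2', 1), ('def456', 0)]
# Rows (descendant='def456', ancestor, depth) would then be bulk-inserted
# into the commit_graph closure table.
```

Note that abc123 gets a single row despite being reachable along two paths; the closure deduplicates, which is what keeps the set-membership query simple.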

For the auth-flow example earlier: at commit abc123 (when the JWT-in-middleware decision was made), the relevant memory says "JWT verification in middleware, reason: CSRF protection ordering." At commit def456 (after we revisited and changed our minds), the relevant memory says something different. sverklo_recall --at abc123 returns the first; sverklo_recall alone returns the second.

Why not just use git's own log?

Two reasons.

First, git's log is a record of code changes, not a record of beliefs about code. Many decisions are made without code changes ("we decided to migrate to Postgres next quarter, but we haven't started"). Git doesn't capture those. Bi-temporal memory does.

Second, git commit messages are unstructured prose. Searching them for "what did we decide about X?" is a needle-in-haystack problem that grep handles badly (which is what got us into this mess in the first place — see /bench/). Sverklo's memory layer is structured: each memory has a category, a confidence, a tier, embeddings for semantic search, and the lineage. Retrieval is hybrid (FTS5 + cosine over an ONNX embedding) and pinned to the git SHA where the question is being asked.
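To make "hybrid (FTS5 + cosine)" concrete, here is a toy two-channel retrieval sketch using stdlib sqlite3. The 3-dimensional vectors and the additive score combination are made up for illustration; sverklo uses an ONNX embedding model and its own scoring:

```python
import sqlite3, math

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE mem_fts USING fts5(content)")
vectors = {}  # rowid -> embedding (sverklo stores these in memory_embeddings)

for rowid, (text, vec) in enumerate([
    ("JWT verification in middleware", [0.9, 0.1, 0.0]),
    ("Postgres migration planned",     [0.0, 0.8, 0.6]),
], start=1):
    db.execute("INSERT INTO mem_fts(rowid, content) VALUES (?, ?)", (rowid, text))
    vectors[rowid] = vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def recall(query_text, query_vec, k=1):
    # Keyword channel: FTS5 bm25() is lower-is-better, so negate it.
    scores = {rid: -rank for rid, rank in db.execute(
        "SELECT rowid, bm25(mem_fts) FROM mem_fts WHERE mem_fts MATCH ?", (query_text,))}
    # Semantic channel: cosine similarity over all stored vectors.
    for rid, vec in vectors.items():
        scores[rid] = scores.get(rid, 0.0) + cosine(query_vec, vec)
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recall("middleware", [1.0, 0.0, 0.0]))  # -> [1]
```

The point of the two channels: exact identifiers ("middleware", a function name) reward keyword match, while paraphrased questions ("where do we check tokens?") reward the embedding channel.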

Lineage matters more than current truth

The pattern that's underrated in conventional memory systems: knowing why a previous decision was overturned is often more useful than knowing the current decision.

Concrete example: an engineer joins the team six months from now. They look at the auth middleware and see JWT verification in the route handler. Why? They run sverklo_recall --timeline auth and see:

2026-03-15  abc123  "JWT in middleware — CSRF ordering"
2026-04-22  def456  "JWT in route handler — middleware ran for SSE endpoints unnecessarily,
                     causing 12% latency overhead. CSRF ordering moved to a separate guard."

The current decision (JWT in route handler) makes sense only when paired with the previous decision and the reason for the change. Flat-overwrite memory deletes the why.

Constraints and tradeoffs

The DAG problem

Git is a DAG, not a chain. Branches diverge, merges create commits with multiple parents, cherry-picks copy commits across branches. "What did we believe at commit X?" requires walking the ancestry of X, which can span multiple ancestor paths when merges are involved.

Sverklo handles this by precomputing the ancestry closure for any commit you query — typically a small operation (<10ms even on repos with 100k commits) because git itself stores the ancestry compactly. The query plan is O(1) lookup against the precomputed table, not O(commits) per query.

Storage cost

Bi-temporal storage is more expensive than flat storage by a factor of ~2-5×, depending on how often memories are superseded. In practice, sverklo's per-project memory database is a few megabytes for active projects — the storage cost is irrelevant compared to the query power.

Confidence decay

Memories don't get equally trustworthy as they age. A 6-month-old "we decided X" memory should be flagged for review when retrieved. Sverklo applies a decay function based on access patterns and commit-distance from HEAD; staleness shows up in the recall output as [STALE] for memories that haven't been accessed in 90+ days.
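One plausible shape for such a decay function (illustrative only; not sverklo's exact formula — the half-life and drift constants here are invented) combines wall-clock age since last access with how far HEAD has moved since the memory was created:

```python
import math, time

STALE_AFTER_DAYS = 90  # matches the [STALE] threshold mentioned above

def effective_confidence(base, last_accessed, commit_distance, now=None):
    """Decay base confidence by access recency and by commit drift from HEAD."""
    now = now or time.time()
    age_days = (now - last_accessed) / 86400
    time_factor = math.exp(-age_days / 180)          # fades over ~months
    drift_factor = 1 / (1 + commit_distance / 500)   # penalize code drift
    return base * time_factor * drift_factor

def is_stale(last_accessed, now=None):
    now = now or time.time()
    return (now - last_accessed) / 86400 >= STALE_AFTER_DAYS

now = time.time()
fresh = effective_confidence(1.0, now, commit_distance=0, now=now)
old   = effective_confidence(1.0, now - 120 * 86400, commit_distance=400, now=now)
print(round(fresh, 2), round(old, 2), is_stale(now - 120 * 86400, now=now))
# -> 1.0 0.29 True
```

The two factors are deliberately independent: a memory can be stale because nobody asked about it, or because the code underneath it has churned, and both should lower trust in retrieval.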

What this enables that you couldn't do before

  1. Point-in-time recall: sverklo_recall --at <sha> answers "what did we believe at this commit?" while triaging a bug introduced in a past version.
  2. Decision lineage: sverklo_recall --timeline shows each decision alongside the one it replaced and the reason for the change.
  3. Staleness signals: old, unaccessed memories are flagged [STALE] instead of being silently treated as current truth.

Borrowed from databases, applied to LLMs

Bi-temporal modelling isn't novel. The novelty is the application surface: AI coding agents have an ephemeral context window that compacts every few hours, and they need a memory layer that survives compaction and tracks belief change.

Most existing memory MCPs (the ones I've benchmarked, see comparison) are wrappers around an external vector DB. They treat memory as a flat searchable bag. The bi-temporal pattern adds a second axis (time), and pinning that axis to git SHAs (rather than wall-clock time) aligns with how engineering teams actually think about decisions.

It's an old idea applied to a new problem. The implementation lives in src/storage/memory-store.ts and src/memory/prune.ts — about 800 lines of TypeScript over SQLite.

Try it

Sverklo is MIT-licensed and runs on your laptop. Memory is local — no cloud, no external services, no API keys.

npm install -g sverklo
cd your-project
sverklo init

Memory tools available immediately: sverklo_remember, sverklo_recall, sverklo_memories, sverklo_pin, sverklo_promote, sverklo_demote. The agent calls them; you don't have to.

GitHub: sverklo/sverklo · 60-task retrieval benchmark · Comparisons

Cite this

@misc{sverklo_bench_primitives_2026,
  title  = {Sverklo bench:primitives — a 60-task retrieval evaluation for AI coding agents},
  author = {Groshin, Nikita},
  year   = {2026},
  doi    = {10.5281/zenodo.19802051},
  url    = {https://sverklo.com/bench/}
}

References

Snodgrass, Richard T. Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann, 1999.