Persistent memory for Claude Code & Codex CLI. Auto-extracted knowledge graph, multi-representation embeddings, 3D WebGL visualization.
The only memory layer that learns how you work — not just what you said. Persistent, local memory for AI coding agents: Claude Code, Codex CLI, Cursor, any MCP client. Temporal knowledge graph · procedural memory · AST codebase ingest · cross-project analogy · 3D WebGL visualization.
Why this, not mem0 / Letta / Zep / Supermemory / Cognee? → docs/vs-competitors.md
Bugfix release. Four v11 W3 MCP tools — memory_recall_iterative,
memory_temporal_query, memory_entity_resolve, memory_consolidate_status —
were silently broken on main: the dispatcher forwarded an out-of-scope
args symbol, the resulting NameError was swallowed by call_tool's
exception handler, and clients saw "Error: name 'args' is not defined".
Fix passes the per-call args, with regression coverage via
tests/test_v11_dispatch_args.py.
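To make the failure mode concrete, here is a minimal, hypothetical reproduction. The handler table and the names dispatch_buggy, dispatch_fixed and call_tool are illustrative, not the project's actual dispatcher. The buggy dispatcher forwards a name that does not exist in its scope, and the broad except in the entry point turns the resulting NameError into exactly the string clients saw.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], str]


def _recall_iterative(arguments: Dict[str, Any]) -> str:
    # Stand-in for a real tool handler.
    return f"recall_iterative(query={arguments['query']!r})"


_HANDLERS: Dict[str, Handler] = {"memory_recall_iterative": _recall_iterative}


def dispatch_buggy(name: str, arguments: Dict[str, Any]) -> str:
    # BUG: `args` is not defined in this scope, so this line raises NameError.
    return _HANDLERS[name](args)  # noqa: F821


def dispatch_fixed(name: str, arguments: Dict[str, Any]) -> str:
    # FIX: forward the per-call arguments this function actually received.
    return _HANDLERS[name](arguments)


def call_tool(dispatch: Callable[[str, Dict[str, Any]], str],
              name: str, arguments: Dict[str, Any]) -> str:
    try:
        return dispatch(name, arguments)
    except Exception as exc:  # broad handler: the NameError becomes a plain string
        return f"Error: {exc}"
```

This is also why the bug stayed silent: the server never crashed, it simply returned an error string for every call to the affected tools.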
Also aligns the Codex installer env with the .tam memory layout
(TAM_MEMORY_DIR canonical, CLAUDE_MEMORY_DIR kept as compatibility alias,
MEMORY_MODE=fast default) and isolates install tests from real
launchctl / systemctl / XDG directories. Full notes in
CHANGELOG.md.
Claude Code v2.1.139+ emits subagent IDs on every API request
(x-claude-code-agent-id / x-claude-code-parent-agent-id HTTP headers,
plus the same fields as agent_id / parent_agent_id attributes on the
claude_code.tool and claude_code.llm_request OTEL spans). v12.1 wires
these through end-to-end:

- Migration 028_agent_lineage.sql: nullable agent_id and parent_agent_id columns on knowledge, plus partial indexes (WHERE … IS NOT NULL) so lineage filters are free.
- memory_save and memory_save_fast accept two new optional inputs: agent_id and parent_agent_id. Old callers see no behaviour change.
- extract_transcript.py reads agent_id / agentId / parent_agent_id / parentAgentId from .jsonl when Claude Code writes them, and falls back to isSidechain=true as a proxy: sessions with any sidechain activity get agent_id = `"session-<id>"` plus a has-subagent-work tag on their auto-extracted rows.
- spawned_by: when memory_save carries both ids, the store auto-records TemporalKG.add_fact(agent, "spawned_by", parent, source="agent-lineage", invalidate_previous=False). Idempotent.
- kg_at(timestamp) and kg_timeline() can now reconstruct the subagent lineage tree at any past moment.

A reconnect of the MCP memory server is required for clients to see the updated inputSchema. Full notes in CHANGELOG.md.
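The isSidechain fallback can be sketched as follows. This is a hypothetical helper (resolve_lineage is not the name used in extract_transcript.py), assuming each transcript line is a JSON object that may carry either the snake_case or the camelCase ID fields listed above.

```python
import json
from typing import Iterable, List, Optional, Tuple

Lineage = Tuple[Optional[str], Optional[str], List[str]]


def _first(record: dict, *keys: str) -> Optional[str]:
    # Accept either spelling (snake_case or camelCase) of the same field.
    for key in keys:
        value = record.get(key)
        if value:
            return str(value)
    return None


def resolve_lineage(lines: Iterable[str], session_id: str) -> Lineage:
    """Return (agent_id, parent_agent_id, extra_tags) for one transcript."""
    agent_id: Optional[str] = None
    parent_id: Optional[str] = None
    saw_sidechain = False
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        agent_id = agent_id or _first(record, "agent_id", "agentId")
        parent_id = parent_id or _first(record, "parent_agent_id", "parentAgentId")
        saw_sidechain = saw_sidechain or record.get("isSidechain") is True
    if agent_id:
        # Explicit IDs written by Claude Code always win.
        return agent_id, parent_id, []
    if saw_sidechain:
        # Proxy: no IDs recorded, but sub-agent (sidechain) activity was seen.
        return f"session-{session_id}", None, ["has-subagent-work"]
    return None, None, []
```

In this sketch the session-prefixed proxy ID and the has-subagent-work tag are used only when no explicit ID was ever written.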
total-agent-memory (2026-05-16)

The project was renamed from claude-total-memory to total-agent-memory to
reflect that it works with every MCP client, not just Claude Code (Cursor,
Codex CLI, Cline, Continue, Aider, Windsurf, Gemini CLI, OpenCode — all
covered).
Nothing breaks. The old PyPI package (claude-total-memory==11.3.0) is now a
deprecation shim that auto-resolves to total-agent-memory>=12.0.0. Legacy
imports, CLI binaries, env vars, and the ~/.claude-memory/ directory keep
working through automatic migration:
| Old | New | Backward-compat |
|---|---|---|
| pip install claude-total-memory | pip install total-agent-memory | old name still works (shim + warning) |
| from claude_total_memory import … | from total_agent_memory import … | old import still works (sys.modules alias + warning) |
| claude-total-memory CLI | total-agent-memory (alias tam) | old CLI still ships in v12 wheel |
| CLAUDE_MEMORY_DIR env | TAM_MEMORY_DIR env | old env still respected (deprecation warning) |
| ~/.claude-memory/ dir | ~/.tam/ dir | auto-migrated on first run; ~/.claude-memory becomes a symlink to ~/.tam/ so pinned scripts keep working |
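Resolving the memory directory under the compatibility rules in the table might look like the sketch below. It assumes TAM_MEMORY_DIR takes precedence when both variables are set and that ~/.tam is the default; resolve_memory_dir is a hypothetical helper, not part of the package's public API.

```python
import os
import warnings
from pathlib import Path
from typing import Mapping, Optional


def resolve_memory_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    env = os.environ if env is None else env
    if env.get("TAM_MEMORY_DIR"):
        # Canonical variable (assumed to win when both are set).
        return Path(env["TAM_MEMORY_DIR"]).expanduser()
    if env.get("CLAUDE_MEMORY_DIR"):
        # Legacy alias: still honoured, but warn.
        warnings.warn("CLAUDE_MEMORY_DIR is deprecated; use TAM_MEMORY_DIR",
                      DeprecationWarning, stacklevel=2)
        return Path(env["CLAUDE_MEMORY_DIR"]).expanduser()
    return Path("~/.tam").expanduser()
```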
Six install paths — pick one:
```shell
npx -y total-agent-memory connect claude-code   # Node, zero-install
uvx total-agent-memory                          # Python via uv (fast)
pipx install total-agent-memory                 # Python via pipx (isolated)
brew install vbcherepanov/tap/total-memory      # Homebrew (macOS / Linuxbrew)
docker run -p 37737:37737 -v ~/.tam:/data \
  ghcr.io/vbcherepanov/total-agent-memory:12.2.0   # Docker (multi-arch amd64+arm64)
git clone https://github.com/vbcherepanov/total-agent-memory \
  ~/total-agent-memory && cd ~/total-agent-memory && ./install.sh   # manual
```
The npx path also wires the MCP entry into the IDE you pass to connect <ide>:
claude-code, codex, cursor, cline, continue, aider, windsurf,
gemini-cli, opencode.
Project URLs: totalmemory.dev · PyPI · npm · Docker GHCR · GitHub Release
Full migration notes (Docker volume names kept for backward-compat, brew formula
changes, etc.) live in CHANGELOG.md. The historical sections
below (v11.1, v11.0, …) are preserved for reference.
Two client-reported bugs fixed (2026-05-14):
Bug #1 — orphan + duplicate graph_nodes. The graph accumulated
case-variant duplicates (Vue / vue / VUE) and type-collision
duplicates (vue/concept vs vue/technology created by different
extractors), plus orphan nodes when an edge insert failed after both
nodes were already committed. Fixed by migration 026_graph_nodes_dedup
(name_norm column, triggers, indexes), a case-insensitive UPSERT
rewrite of add_node with type-collision detection, a new atomic
GraphStore.link_pair() helper, and a one-shot cleanup tool
src/tools/merge_duplicate_nodes.py (dry-run by default).
```shell
# After upgrade, migration 026 applies automatically. Then, optionally:
.venv/bin/python src/tools/merge_duplicate_nodes.py --dry-run
.venv/bin/python src/tools/merge_duplicate_nodes.py --apply --add-unique
```
Verified on a real production DB (8304 nodes): 102 duplicates merged, 1472 stale edges cleaned, UNIQUE constraint installed.
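The dedup approach above (a normalized name key, a case-insensitive UPSERT, and a single transaction for node + node + edge writes) can be illustrated with the standard-library sqlite3 module. The schema, the strip()/lower() normalization, and the function bodies are assumptions made for illustration; only the names add_node and link_pair come from the release notes, and the real GraphStore code may differ.

```python
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE graph_nodes (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    type      TEXT NOT NULL,
    name_norm TEXT NOT NULL UNIQUE  -- case-folded lookup key
);
CREATE TABLE graph_edges (
    src INTEGER NOT NULL REFERENCES graph_nodes(id),
    dst INTEGER NOT NULL REFERENCES graph_nodes(id),
    rel TEXT NOT NULL,
    UNIQUE (src, dst, rel)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_node(conn: sqlite3.Connection, name: str, node_type: str) -> Tuple[int, Optional[str]]:
    """Case-insensitive upsert. Returns (node_id, stored_type_if_it_differs)."""
    norm = name.strip().lower()
    conn.execute(
        "INSERT INTO graph_nodes (name, type, name_norm) VALUES (?, ?, ?) "
        "ON CONFLICT(name_norm) DO NOTHING",
        (name, node_type, norm),
    )
    node_id, existing_type = conn.execute(
        "SELECT id, type FROM graph_nodes WHERE name_norm = ?", (norm,)
    ).fetchone()
    # A different stored type signals a type collision (e.g. concept vs technology).
    return node_id, (existing_type if existing_type != node_type else None)


def link_pair(conn: sqlite3.Connection, a: str, a_type: str,
              b: str, b_type: str, rel: str) -> Tuple[int, int]:
    """Write both nodes and the edge in one transaction."""
    with conn:  # commits on success, rolls back (and re-raises) on any exception
        a_id, _ = add_node(conn, a, a_type)
        b_id, _ = add_node(conn, b, b_type)
        conn.execute(
            "INSERT INTO graph_edges (src, dst, rel) VALUES (?, ?, ?) "
            "ON CONFLICT(src, dst, rel) DO NOTHING",
            (a_id, b_id, rel),
        )
    return a_id, b_id
```

Because both nodes and the edge are written inside one with-block, a failing edge insert rolls the nodes back too, which is the property that prevents orphans.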
Bug #2 — model never calls memory_save on its own. Sonnet/Haiku
skip the priority-10 save rule when SessionStart context fades. v11.1
adds in-session nudges: a counter in ~/.claude-memory/state/
tracks writes-vs-saves per session, and hooks/post-tool-use.{sh,ps1}
emits a stdout line that Claude reads as system context on the next
turn. Soft nudge at 3 edits with 0 saves, hard at 7, and a
MEMORY_FINAL_WARNING on session stop. A new priority-10 rule
instructs the model to treat MEMORY_NUDGE as an immediate command.
Tunables: MEMORY_NUDGE_DISABLE=1 to silence; MEMORY_NUDGE_SOFT /
_HARD / _STEP to retune (defaults 3 / 7 / 3).
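A rough model of the nudge counter, written in Python for readability even though the real hooks are shell/PowerShell scripts. Several details are assumptions rather than documented behaviour: which tool names count as edits, that the counter means "edits since the last memory_save", that _STEP is the interval at which soft nudges repeat, and the exact nudge wording.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: tool names that count as "edits" for the nudge counter.
WRITE_TOOLS = {"Edit", "Write", "MultiEdit"}


@dataclass
class SessionCounter:
    edits_since_save: int = 0


def _knob(env: Mapping[str, str], name: str, default: int) -> int:
    return int(env.get(name, str(default)))


def on_tool_use(state: SessionCounter, tool: str,
                env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Update the per-session counter; return a nudge line for stdout, or None."""
    env = os.environ if env is None else env
    if tool == "memory_save":
        state.edits_since_save = 0  # a save resets the pressure
        return None
    if tool not in WRITE_TOOLS:
        return None
    state.edits_since_save += 1
    if env.get("MEMORY_NUDGE_DISABLE") == "1":
        return None
    soft = _knob(env, "MEMORY_NUDGE_SOFT", 3)
    hard = _knob(env, "MEMORY_NUDGE_HARD", 7)
    step = _knob(env, "MEMORY_NUDGE_STEP", 3)
    n = state.edits_since_save
    if n >= hard:
        return f"MEMORY_NUDGE (hard): {n} edits since the last memory_save. Save now."
    if n >= soft and (n - soft) % step == 0:
        return f"MEMORY_NUDGE (soft): {n} edits since the last memory_save."
    return None
```

With the defaults, this sketch nudges softly at edits 3 and 6 and hard from edit 7 onward.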
Test coverage: +24 graph tests, +12 nudge tests. Full details in
CHANGELOG.md.
v11.0 = production memory engine: fast deterministic memory core + async AI enrichment layer. Default mode is fast: zero LLM, zero Ollama, zero network in the save/search/recall hot path.
The codebase is now split into two layers:
- src/memory_core/*: deterministic facade modules (storage, embeddings, vector_store, classifier, chunker, dedup, cache, graph_links, telemetry, health, embedding_spaces). No LLM imports allowed; enforced by tests/test_no_llm_hot_path.py.
- src/ai_layer/*: every LLM-touching path (enrichment_worker, summarizer, keyword_extractor, question_generator, relation_extractor, contradiction_detector, reflection, self_improve, plus thin shims for quality_gate / coref_resolver / reranker / query_rewriter). Off-limits to memory_core.

Architecture details and full hot-path audit: docs/v11/audit.md.
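One way to enforce "no LLM imports in memory_core" is to parse each module with the standard ast module and reject forbidden imports. This is a sketch of the idea behind tests/test_no_llm_hot_path.py, not its actual contents; the FORBIDDEN_PREFIXES list is an assumption.

```python
import ast
from typing import List

# Assumption: modules that must never be imported from src/memory_core/*.
FORBIDDEN_PREFIXES = ("ai_layer", "src.ai_layer", "openai", "anthropic", "ollama")


def _is_forbidden(module: str) -> bool:
    return any(module == p or module.startswith(p + ".") for p in FORBIDDEN_PREFIXES)


def forbidden_imports(source: str, filename: str = "<memory_core>") -> List[str]:
    """Return 'file:line: module' for every forbidden import in `source`."""
    hits: List[str] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # `from x.y import z` checks x.y; `from .. import ai_layer` checks the names.
            modules = [node.module] if node.module else [a.name for a in node.names]
        else:
            continue
        hits.extend(f"{filename}:{node.lineno}: {m}" for m in modules if _is_forbidden(m))
    return hits
```

In a test you would run forbidden_imports over every file from pathlib.Path("src/memory_core").rglob("*.py") and assert that the combined list is empty.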
MEMORY_MODE selects the runtime profile. Default is fast.
| Mode | Hot-path LLM | Async enrichment | Reranker | Embed fallback | Use when |
|---|---|---|---|---|---|
| ultrafast | off | off | off | FastEmbed only (vector index off, FTS-only) | Throughput stress / CI |
| fast (default) | off | off | off | FastEmbed only, Ollama fallback gated | Production coding-agent loop |
| balanced | off (sync) | on | off | FastEmbed only | You want LLM-derived facets, but never on the critical path |
| deep | on (sync) | on | on (when rerank=true) | FastEmbed → Ollama ladder | v10.5 behaviour: quality gate / contradiction / coref / HyDE inline |
deep mode reproduces v10.5.0 defaults exactly. Set MEMORY_MODE=deep if you depended on synchronous quality_gate, contradiction_detector, or coref. balanced keeps the same ergonomics but moves enrichment off-thread.
Migration from v10.5: docs/v11/MIGRATION-FROM-V10.md.
Warm, in-memory SQLite, MacBook M-series, MEMORY_MODE=fast, MEMORY_ALLOW_OLLAMA_IN_HOT_PATH=false:
| metric | p50 (ms) | p95 (ms) | p99 (ms) |
|---|---|---|---|
| save_fast | 6.5 | 9.0 | 27.8 |
| save_fast cached | 0.3 | 0.4 | 1.1 |
| search_fast | 3.7 | 4.0 | 6.2 |
| cached_search | 0.0 | 0.0 | 0.0 |
llm_calls = 0, network_calls = 0 across the entire hot path. Reproduce: bin/memory-bench. CI gate: bin/memory-perf-gate. Raw artifact: docs/v11/benchmark.md.
The v10.5 native bench (benchmarks/v10_5_latency.py) re-run on v11 fast against the recorded v10.5 baseline (benchmarks/results/v10_5_latency.json):
| metric | v10.5 sync (with LLM) | v11.0 fast | speedup |
|---|---|---|---|
| save p95 | 2150.51 ms | 8.51 ms | 252× |
| save p99 | 2178.98 ms | 11.09 ms | 196× |
| recall p95 | 1424.26 ms | 5.81 ms | 245× |
| recall p99 | 1771.70 ms | 6.75 ms | 262× |
| LLM calls / save | 2-4 | 0 | gate |
| Network calls / save | 1-3 | 0 | gate |
Even versus v10.5 without LLM (23.3 ms p95), v11 fast is 2.7× faster — the deterministic-only stages (quality_gate probe, contradiction candidate fetch, episodic event creation, project_wiki refresh) are now fully bypassed in fast mode and queued only when MEMORY_ENRICHMENT_ENABLED=true.
Recall quality is preserved: LongMemEval R@5 = 100% on a 30-question sample; hybrid retrieval (FTS5 + dense + RRF + base graph) is identical to v10.5 except for HyDE / analyze_query LLM expansion which is opt-in via MEMORY_MODE=deep. See docs/v11/benchmark.md for the full table including LoCoMo and per-space embedding load characteristics.
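Reciprocal Rank Fusion (RRF), the fusion step named above, scores each document by summing 1 / (k + rank) over every ranked list it appears in. A minimal sketch, assuming k = 60 (a common default; the project's constant isn't stated here) and retrievers that return document IDs best-first:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several best-first rankings of document IDs with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


fts = ["a", "b", "c"]     # e.g. BM25 / FTS5 order
dense = ["b", "a", "d"]   # e.g. embedding similarity order
graph = ["b", "e"]        # e.g. graph-expansion order
fused = rrf_fuse([fts, dense, graph])
```

Because only ranks are used, RRF needs no score calibration between retrievers, and a document ranked well by several retrievers can outrank one that a single retriever put first.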
memory_save_fast · memory_search_fast · memory_explain_search · memory_warmup · memory_perf_report · memory_rebuild_fts · memory_rebuild_embeddings · memory_eval_locomo · memory_eval_recall · memory_eval_temporal · memory_eval_entity_consistency · memory_eval_contradictions · memory_eval_long_context
All previous tool names (memory_save, memory_recall, ...) continue to work unchanged.
Every vector row now records embedding_provider / embedding_model / embedding_dimension / embedding_space / content_type / language. Spaces: text / code / log / config. Single Chroma backend; per-space model swap is one env flip:
```shell
MEMORY_TEXT_EMBED_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
MEMORY_CODE_EMBED_MODEL=jinaai/jina-embeddings-v2-base-code   # optional
MEMORY_LOG_EMBED_MODEL=      # falls back to TEXT
MEMORY_CONFIG_EMBED_MODEL=   # falls back to TEXT
```
Old chunks stay searchable in their space; new chunks pick up the swapped model. Backfill one space at a time via memory_rebuild_embeddings.
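The per-space fallback rule can be expressed as a tiny resolver. It assumes the environment variable names shown above and treats an unset or empty per-space variable as "use the TEXT model". The default TEXT model constant is an assumption taken from the example value, and embed_model_for is a hypothetical name.

```python
import os
from typing import Mapping, Optional

SPACES = ("text", "code", "log", "config")
# Assumption: TEXT model used when MEMORY_TEXT_EMBED_MODEL is unset.
DEFAULT_TEXT_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"


def embed_model_for(space: str, env: Optional[Mapping[str, str]] = None) -> str:
    env = os.environ if env is None else env
    if space not in SPACES:
        raise ValueError(f"unknown embedding space: {space!r}")
    text_model = env.get("MEMORY_TEXT_EMBED_MODEL") or DEFAULT_TEXT_MODEL
    # Unset or empty per-space variable: fall back to the TEXT model.
    return env.get(f"MEMORY_{space.upper()}_EMBED_MODEL") or text_model
```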
v10.x sections below are preserved as legacy v10.5 behaviour — still available via
MEMORY_MODE=deep. The numbers, screenshots, and benchmark blocks dated 2026-04-19 / 2026-04-25 / 2026-04-27 (v10) describe the deep-mode pipeline. v11 replaces defaults, not capabilities.
AI coding agents have amnesia. Every new Claude Code / Codex / Cursor session starts from zero. Yesterday's architectural decisions, bug fixes, stack choices, and hard-won lessons vanish the moment you close the terminal. You re-explain the same things, re-discover the same solutions, paste the same context into every new chat.
total-agent-memory gives the agent a persistent brain — on your machine, not in someone else's cloud.
Every decision, solution, error, fact, file change, and session summary is:
- … memory_save or implicitly via hooks on file edits / bash errors / session end

```text
You:    "remember we picked pgvector over ChromaDB because of multi-tenant RLS"
Claude: ✓ memory_save(type=decision, content="Chose pgvector over ChromaDB",
                      context="WHY: single Postgres, per-tenant RLS")

[3 days later, different session, possibly different project directory:]

You:    "why did we pick pgvector again?"
Claude: ✓ memory_recall(query="vector database choice")
        → "Chose pgvector over ChromaDB for multi-tenant RLS. Single DB
           instance, row-level security per tenant."
```
It's not just retrieval. It's procedural too:
```text
You:    "migrate auth middleware to JWT-only session tokens"
Claude: ✓ workflow_predict(task_description="migrate auth middleware...")
        → confidence 0.82, predicted steps:
          1. read src/auth/middleware.go + tests
          2. update session fixtures in tests/
          3. run migration 0042
          4. regenerate OpenAPI spec
          similar past: wf#118 (success), wf#93 (success)
```
Public LongMemEval benchmark (xiaowu0162/longmemeval-cleaned, 470 questions, the dataset everyone publishes against):
R@5 (recall_any) on public LongMemEval:

| System | R@5 | Notes |
|---|---|---|
| total-agent-memory v7.0 | 96.2% | local, 38.8 ms, MIT |
| Mastra "Observational" | 95.0% | cloud |
| Supermemory | 85.4% | cloud, $0.01/1k tok |

Reproducible: evals/longmemeval-2026-04-17.json · Runner: benchmarks/longmemeval_bench.py
| Question type | Count | Our R@5 |
|---|---|---|
| knowledge-update | 72 | 100.0% |
| single-session-user | 64 | 100.0% |
| multi-session | 121 | 96.7% |
| single-session-assistant | 56 | 96.4% |
| temporal-reasoning | 127 | 95.3% ← bi-temporal KG pays off |
| single-session-preference | 30 | 80.0% ← weakest spot |
| TOTAL | 470 | 96.2% |
Public LoCoMo benchmark (snap-research/locomo, 1986 QA across 10 long-running conversations, the dataset Mem0 / Memobase / Zep / MemMachine publish against):
Figure: LoCoMo Acc (overall, no adversarial) per system.
| Rank | System | Overall (no adv) | License |
|---|---|---|---|
| 1 | MemMachine | 0.849 | Commercial |
| 2 | Memobase | 0.758 | Apache-2.0 |
| 3 | Zep / Graphiti | 0.751 | Apache-2.0 |
| 4 | Mem0 | 0.669 | Apache-2.0 |
| 5 | total-agent-memory v9.0 | 0.596 | MIT |
| 6 | LangMem | 0.581 | MIT |
Per-category breakdown (v9.0, gpt-4o-mini gen + judge):
| Category | N | Acc | R@5 |
|---|---|---|---|
| 1 — single-hop | 282 | 0.443 | 0.514 |
| 2 — temporal | 321 | 0.564 | 0.717 |
| 3 — multi-hop | 96 | 0.490 | 0.385 |
| 4 — open-domain | 841 | 0.661 | 0.601 |
| 5 — adversarial | 446 | 0.998 ← we lead | 0.421 |
| Overall (no adv) | 1540 | 0.596 | 0.622 |
We lead on adversarial (0.998 vs Memobase 0.90) thanks to judge-weighted ensemble + abstain logic. Top-3 leaders win on cat 1/2 via subject-aware profile retrieval — that's our v10 target.
Reproducible: benchmarks/results/v9_diag_v1_*.json · Runner: benchmarks/locomo_bench_llm.py (15 ablation flags). Cost on gpt-4o-mini: ~$5 for full 1986 QA run with ensemble=3.
| Measurement | Latency | Note |
|---|---|---|
| p50 (warm) | 0.065 ms | |
| p95 (warm) | 2.97 ms | |
| LongMemEval | 38.8 ms/query | includes embedding + CrossEncoder rerank |
| p50 (cold) | 1333 ms | first query after process start |

Warm / cold reproducible from evals/results-2026-04-17.json.
We're not replacing chatbot memory — we're occupying the coding-agent + MCP + local niche.
| | mem0 | Letta | Zep | Supermemory | Cognee | LangMem | total-agent-memory |
|---|---|---|---|---|---|---|---|
| Funding / status | $24M YC | $10M seed | $12M seed | $2.6M seed | $7.5M seed | in LangChain | self-funded OSS |
| Runs 100% local | 🟡 | ✅ | 🟡 | ❌ | 🟡 | 🟡 | ✅ |
| MCP-native | via SDK | ❌ | 🟡 Graphiti | 🟡 | ❌ | ❌ | ✅ 60+ tools |
| Knowledge graph | 🔒 $249/mo | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ |
| Temporal facts (kg_at) | ❌ | ❌ | ✅ | ❌ | 🟡 | ❌ | ✅ |
| Procedural memory | ❌ | ❌ | ❌ | ❌ | ❌ | 🟡 | ✅ workflow_predict |
| Cross-project analogy | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ analogize |
| Self-improving rules | ❌ | ❌ | ❌ | ❌ | 🟡 | ❌ | ✅ learn_error |
| AST codebase ingest | ❌ | ❌ | ❌ | ❌ | 🟡 | ❌ | ✅ tree-sitter 9 lang |
| Pre-edit risk warnings | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ file_context |
| 3D WebGL graph viewer | ❌ | ❌ | 🟡 | ✅ | ❌ | ❌ | ✅ |
| Price for graph features | $249/mo | free | cloud | usage | free | free | free |
Full side-by-side with pricing, latency, accuracy, "when to pick each" → docs/vs-competitors.md.
| Capability | Tool | One-liner |
|---|---|---|
| 🧠 Procedural memory | workflow_predict / workflow_track | "How did I solve this last time?" — predicts steps with confidence |
| 🔗 Cross-project analogy | analogize | "Was there something like this in another repo?" — Jaccard + Dempster-Shafer |
| ⚠️ Pre-edit risk warnings | file_context | Surfaces past errors / hot spots on the file you're about to edit |
| 🛡 Self-improving rules | learn_error + self_rules_context | Bash failures → patterns → auto-consolidated behavioral rules at N≥3 |
| 🕰 Temporal facts | kg_add_fact / kg_at | Append-only KG with valid_from/valid_to — query what was true at any point |
| 🎯 Task workflow phases | classify_task / phase_transition | Automatic L1-L4 complexity classification, state machine across van/plan/creative/build/reflect/archive |
| 🧩 Structured decisions | save_decision | Options + criteria matrix + rationale + discarded → searchable decision records with per-criterion embeddings |
| 💸 Token-efficient retrieval | memory_recall(mode="index") + memory_get | 3-layer workflow: compact IDs → timeline → batched full fetch. ~83% token saving on typical queries |
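The temporal-facts idea (valid_from / valid_to, "what was true at a point in time") can be sketched with an in-memory list. ToyTemporalKG is an illustrative stand-in, not the project's TemporalKG class. It closes the previous open fact for the same subject and predicate instead of deleting it, and it keeps both facts when invalidate_previous=False, the mode the v12.1 spawned_by edges use.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still true"


class ToyTemporalKG:
    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def add_fact(self, subject: str, predicate: str, obj: str, at: datetime,
                 invalidate_previous: bool = True) -> None:
        if invalidate_previous:
            # Close (never delete) the currently open fact for this subject+predicate.
            self.facts = [
                replace(f, valid_to=at)
                if (f.subject, f.predicate) == (subject, predicate) and f.valid_to is None
                else f
                for f in self.facts
            ]
        self.facts.append(Fact(subject, predicate, obj, at))

    def kg_at(self, when: datetime) -> List[Fact]:
        """Facts that were true at `when`: valid_from <= when < valid_to."""
        return [f for f in self.facts
                if f.valid_from <= when and (f.valid_to is None or when < f.valid_to)]
```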
- … memory_save → LaunchAgent file-watch → graph edges appear ~30 s later
- design-explore skill: drop-in Claude Code skill that walks L3-L4 tasks through options → criteria matrix → save_decision before code (see examples/skills/design-explore/SKILL.md)
- `<private>...</private>` inline redaction in any saved content
- activeContext.md Obsidian projection for human-readable session state
- … self_rules_context(phase="build")): ~70% token reduction

```text
┌─────────────────────────────────────────────────┐
│ Your AI coding agent │
│ (Claude Code · Codex CLI · Cursor · any MCP) │
└──────────────────────┬──────────────────────────┘
│ MCP (stdio or HTTP)
│ 60+ tools
┌──────────────────────▼──────────────────────────┐
│ total-agent-memory server │
│ ┌──────────────┐ ┌────────────────────┐ │
│ │ memory_save │ │ memory_recall │ │
│ │ memory_upd │ │ 6-stage pipeline: │ │
│ │ kg_add_fact │ │ BM25 (FTS5) │ │
│ │ learn_error │ │ + dense (FastEmbed)│ │
│ │ file_context │ │ + fuzzy │ │
│ │ workflow_* │ │ + graph expansion │ │
│ │ analogize │ │ + CrossEncoder † │ │
│ │ ingest_code │ │ + MMR diversity † │ │
│ └──────┬───────┘ │ → RRF fusion │ │
│ │ └──────────┬──────────┘ │
└───────────┼─────────────────────┼────────────────┘
│ │
┌───────────▼─────────────────────▼────────────────┐
│ Storage │
│ ┌────────────┐ ┌────────────┐ ┌─────────────┐ │
│ │ SQLite │ │ FastEmbed │ │ Ollama │ │
│ │ + FTS5 │ │ HNSW │ │ (optional) │ │
│ │ + KG tbls │ │ binary-q │ │ qwen2.5-7b │ │
│ └────────────┘ └────────────┘ └─────────────┘ │
└───────────────────────────────────────────────────┘
│
│ file-watch + debounce
┌───────────▼────────────────────────────────────┐
│ Auto-reflection pipeline (LaunchAgent) │
│ triple_extraction → deep_enrichment → reprs │
│ (async, 10s debounce, drains in background) │
└─────────────────────────────────────────────────┘
│
┌───────────▼─────────────────────────────────────┐
│ Dashboard (localhost:37737) │
│ / - stats, savings, queue depths │
│ /graph/live - 3D WebGL force-graph │
│ /graph/hive - D3 hive plot │
│ /graph/matrix - adjacency matrix │
└─────────────────────────────────────────────────┘
† CrossEncoder + MMR are on-demand via `rerank=true` / `diverse=true`
```
| Channel | Command | What it does |
|---|---|---|
| npx (Node) | npx -y total-agent-memory connect claude-code | Zero-install. Bootstraps a Python venv in ~/.tam/.venv via uv (or python3 fallback), pulls the PyPI server, wires the MCP entry into your IDE. Replace claude-code with codex / cursor / cline / continue / aider / windsurf / gemini-cli / opencode. |
| uvx (Python via uv) | uvx total-agent-memory | One-off run with no install. Best for trying without commitment. |
| pipx (Python isolated) | pipx install total-agent-memory | Installs the total-agent-memory, tam, tam-lookup, lookup-memory binaries on PATH in an isolated venv. |
| brew (macOS / Linuxbrew) | brew install vbcherepanov/tap/total-memory | Bottle-style install with tam and legacy claude-total-memory symlinks. |
| Docker (multi-arch) | docker run -p 37737:37737 -v ~/.tam:/data ghcr.io/vbcherepanov/total-agent-memory:12.2.0 | Containerized (linux/amd64 + linux/arm64). Dashboard on :37737. |
| Manual clone | git clone https://github.com/vbcherepanov/total-agent-memory ~/total-agent-memory && cd ~/total-agent-memory && ./install.sh --ide claude-code | Full control. Lets you hack on the server, run benchmarks, and pick which background services to enable. Detailed walkthrough below. |
All six channels land at the same MCP server. The npx and ./install.sh paths
additionally configure IDE-specific MCP entries and hooks. Other channels start
the server bare — you wire the IDE afterwards (see docs/installation.md).
Upgrade from v11.x? Whatever channel you pick will auto-migrate
~/.claude-memory/ → ~/.tam/ on first run and keep a symlink for backward
compat. No manual data move required.
Two manual paths. Same 60+ tools, same dashboard, different deployment shapes.
The same MCP server, same tools, same protocol — different installation
locations and hook wiring per IDE. The installer (install.sh --ide <name>)
automates all of it.
| IDE | Skill API | Hook API | Sub-agents | Install command |
|---|---|---|---|---|
| Claude Code | ✅ | ✅ full | ✅ | ./install.sh --ide claude-code |
| Codex CLI | ✅ | ✅ | ❌ | ./install.sh --ide codex |
| Cursor | rules-pane | ❌ | composer | ./install.sh --ide cursor |
| Cline (VS Code) | .clinerules/ | ❌ | ❌ | ./install.sh --ide cline |
| Continue | rules file | ❌ | ❌ | ./install.sh --ide continue |
| Aider | .aider.conf.yml read | ❌ ¹ | ❌ | ./install.sh --ide aider |
| Windsurf | .windsurfrules | ❌ | cascade | ./install.sh --ide windsurf |
| Gemini CLI | .gemini/rules/ | ⚠️ partial | ❌ | ./install.sh --ide gemini-cli |
| OpenCode | .opencode/skills/ | ✅ | custom | ./install.sh --ide opencode |
¹ Aider has no MCP yet — the bridge is via lookup_memory.sh /
save_memory.sh shell scripts.
Full per-IDE setup, manual fallbacks, and template snippets:
skills/memory-protocol/references/ide-setup.md.
| OS | Command | Background services |
|---|---|---|
| macOS 10.15+ | ./install.sh --ide claude-code | LaunchAgents (launchctl) |
| Linux (Ubuntu 22.04+, Debian 12+, Fedora 38+) | ./install.sh --ide claude-code | systemd --user |
| WSL2 (Windows 11 + Ubuntu/Debian) | ./install.sh --ide claude-code | systemd --user — requires /etc/wsl.conf with [boot] systemd=true; otherwise falls back to shell-loop autostart |
| Windows 10/11 native | .\install.ps1 -Ide claude-code | Task Scheduler |
Full per-platform walkthrough, WSL2 Windows-host-vs-WSL IDE nuances, the
wsl -e MCP-command pattern, IDE coverage matrix, and uninstall/diagnostic
flows: docs/installation.md.
```shell
git clone https://github.com/vbcherepanov/total-agent-memory.git ~/total-agent-memory
cd ~/total-agent-memory
bash install.sh --ide claude-code   # or: cursor | gemini-cli | opencode | codex
```
The installer:
- … ~/total-agent-memory/.venv/
- … requirements.txt and requirements-dev.txt
- … claude mcp add-json memory ... (stored in ~/.claude.json, the canonical store Claude Code actually reads)
- … session-*, user-prompt-submit.sh, post-tool-use.sh, pre-edit.sh, on-bash-error.sh, etc.) into ~/.claude/hooks/ and registers them in ~/.claude/settings.json
- … permissions.allow for 20+ mcp__memory__* tools so hook-driven calls don't prompt for confirmation
- … reflection, orphan-backfill, check-updates, dashboard) under ~/Library/LaunchAgents/
- … --user units (*.service, *.timer, *.path) under ~/.config/systemd/user/; gracefully degrades if systemd --user is unavailable (WSL without /etc/wsl.conf)
- … memory.db
- … http://127.0.0.1:37737

Restart Claude Code → /mcp → memory should show Connected with 60+ tools.
```powershell
git clone https://github.com/vbcherepanov/total-agent-memory.git $HOME\total-agent-memory
cd $HOME\total-agent-memory
powershell -ExecutionPolicy Bypass -File install.ps1 -Ide claude-code
```
Same 9 steps as Unix, but:

- `%USERPROFILE%\.claude\settings.json` (or `.cursor\mcp.json`, etc.)
- `%USERPROFILE%\.claude\hooks\`: .ps1 versions (auto-capture, memory-trigger, user-prompt-submit, post-tool-use, pre-edit, on-bash-error, session-start/end, on-stop, codex-notify)
- total-agent-memory-reflection: every 5 min (no native FileSystemWatcher equivalent)
- total-agent-memory-orphan-backfill: daily 00:00 + 6h repetition
- total-agent-memory-check-updates: weekly Mon 09:00
- TotalAgentMemoryDashboard: AtLogon

All installers preserve ~/.tam/memory.db (legacy installs: ~/.claude-memory/memory.db) and your config files; only services + hook registrations are removed.
```
./install.sh --uninstall    # macOS/Linux/WSL2: removes LaunchAgents OR systemd units
.\install.ps1 -Uninstall    # Windows: unregisters Scheduled Tasks + cleans settings.json
```
One-shot health check — prints ✓/✗ for each subsystem (OS detect, venv, MCP import, services, dashboard HTTP, Ollama, DB migrations):
```
bash scripts/diagnose.sh     # macOS / Linux / WSL2
.\scripts\diagnose.ps1       # Windows
```
Exit code 0 = all green, 1 = something broken.
```shell
git clone https://github.com/vbcherepanov/total-agent-memory.git
cd total-agent-memory
bash install-docker.sh --with-compose
```
Brings up 5 services:
| Service | Role | Exposed |
|---|---|---|
| mcp | MCP server (HTTP transport) | 127.0.0.1:3737/mcp |
| dashboard | Web UI | 127.0.0.1:37737 |
| ollama | Local LLM runtime | 127.0.0.1:11434 |
| reflection | File-watch queue drainer | internal |
| scheduler | Ofelia cron (backfill + update check) | internal |
First run pulls qwen2.5-coder:7b (~4.7 GB) + nomic-embed-text (~275 MB) — 5–10 min cold start.
GPU note: Docker Desktop on macOS doesn't forward Metal. Native install is faster on Mac. On Linux with NVIDIA Container Toolkit, uncomment the deploy.resources.reservations.devices block in docker-compose.yml.
```python
memory_save(content="install works", type="fact")
memory_stats()
```
Open http://127.0.0.1:37737/ — dashboard, knowledge graph, token savings.
v11 default is MEMORY_MODE=fast: no LLM, no Ollama, no network in the save/search/recall hot path. To restore v10.5 synchronous-LLM behaviour, run export MEMORY_MODE=deep. Mode switching: LAUNCH.md § Tuning.
Once installed, in any Claude Code / Codex CLI / Cursor session:
```python
# 1. Resume where you left off (automatic on session start, but you can also invoke it)
session_init(project="my-api")
# → {summary: "yesterday: migrated auth middleware to JWT",
#    next_steps: ["update OpenAPI spec", "notify frontend team"],
#    pitfalls: ["don't revert migration 0042: dev DB already migrated"]}

# 2. Save a decision (the agent does this automatically once hooks are registered)
memory_save(
    type="decision",
    content="Chose pgvector over ChromaDB for multi-tenant RLS",
    context="WHY: single Postgres instance, per-tenant row-level security",
    project="my-api",
    tags=["database", "multi-tenant"],
)

# 3. Recall across sessions / projects
memory_recall(query="vector database choice", project="my-api", limit=5)
# → RRF-fused results from 6 retrieval tiers

# 4. Predict the approach before starting a task
workflow_predict(task_description="migrate auth middleware to JWT-only")
# → {confidence: 0.82, predicted_steps: [...], similar_past: [...]}

# 5. Check a file's risk before editing (automatic via hook, also manual)
file_context(path="/Users/me/my-api/src/auth/middleware.go")
# → {risk_score: 0.71, warnings: ["last 3 edits caused test failures in ..."], hot_spots: [...]}

# 6. Get full stats
memory_stats()
# → {sessions: 515, knowledge: {active: 1859, ...}, storage_mb: 119.5, ...}
```
lookup-memory for sub-agents

New in v9. Bash-friendly memory search for sub-agent workflows where launching the full MCP server would be overkill (e.g. Bash(lookup-memory "fix slow Wave query") from inside a Claude Code agent prompt).
Two equivalent commands ship with the package (registered as [project.scripts] entries — installed automatically by ./install.sh or ./update.sh):
```shell
lookup-memory "Caroline researched"        # human-readable bullets
tam-lookup "Caroline researched"           # short canonical alias
ctm-lookup "Caroline researched"           # legacy alias (v11.x and earlier)
lookup-memory --project myproj --limit 5 "auth flow"
lookup-memory --type solution --tag reusable "fix bug"
lookup-memory --json "claude code hooks"   # structured stdout for piping
```
How it works: opens the same $TAM_MEMORY_DIR/memory.db (legacy: $CLAUDE_MEMORY_DIR/memory.db) the running MCP server uses → BM25 ranking via FTS5 → falls back to LIKE on older DBs. Zero deps beyond the package. No Ollama, no rag_chat.py, no ChromaDB required for the CLI path. Works on macOS, Linux, Windows.
```text
$ lookup-memory --project locomo_0 --limit 2 "adoption"
1. [synthesized_fact|locomo_0] Caroline is researching adoption agencies.
2. [synthesized_fact|locomo_0] Melanie congratulates Caroline on her adoption.
```
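The lookup path described above (FTS5 with BM25 ranking, LIKE fallback on older databases) can be sketched with the standard sqlite3 module. The knowledge / knowledge_fts schema is an assumption made for a self-contained demo; the real memory.db layout may differ, and open_demo_db and lookup are hypothetical names.

```python
import sqlite3
from typing import List, Optional, Tuple

ROWS = [
    ("locomo_0", "Caroline is researching adoption agencies."),
    ("locomo_0", "Melanie congratulates Caroline on her adoption."),
    ("locomo_0", "Melanie paints landscapes."),
]


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE knowledge (id INTEGER PRIMARY KEY, project TEXT, content TEXT)")
    conn.executemany("INSERT INTO knowledge (project, content) VALUES (?, ?)", ROWS)
    try:
        # External-content FTS5 index over knowledge.content.
        conn.execute("CREATE VIRTUAL TABLE knowledge_fts USING fts5("
                     "content, content='knowledge', content_rowid='id')")
        conn.execute("INSERT INTO knowledge_fts(knowledge_fts) VALUES ('rebuild')")
    except sqlite3.OperationalError:
        pass  # SQLite built without FTS5: lookups will use the LIKE fallback
    return conn


def lookup(conn: sqlite3.Connection, query: str, project: Optional[str] = None,
           limit: int = 5) -> List[Tuple[int, str]]:
    try:
        # bm25() is lower-is-better, so ascending order puts the best match first.
        return conn.execute(
            "SELECT knowledge.id, knowledge.content FROM knowledge_fts "
            "JOIN knowledge ON knowledge.id = knowledge_fts.rowid "
            "WHERE knowledge_fts MATCH ? AND (? IS NULL OR knowledge.project = ?) "
            "ORDER BY bm25(knowledge_fts) LIMIT ?",
            (query, project, project, limit),
        ).fetchall()
    except sqlite3.OperationalError:
        # No FTS5 table (older DB) or unparsable FTS query: plain substring match.
        return conn.execute(
            "SELECT id, content FROM knowledge WHERE content LIKE ? "
            "AND (? IS NULL OR project = ?) ORDER BY id LIMIT ?",
            (f"%{query}%", project, project, limit),
        ).fetchall()
```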
Why three names? lookup-memory matches the legacy bash script that older docs and sub-agent prompts reference (~/claude-memory-server/ollama/lookup_memory.sh, legacy install path). tam-lookup is the new project-prefixed canonical form (v12+). ctm-lookup is the v11.x prefixed name, kept as a legacy alias. All three call into total_agent_memory.lookup:main (v11.x and earlier: claude_total_memory.lookup:main, still importable via deprecation shim).
Migration note: v7/v8 docs that pointed at ~/claude-memory-server/ollama/lookup_memory.sh should be updated — the bash version still works for users with a manual install, but ./install.sh / ./update.sh clients on v9+ now get lookup-memory (and tam-lookup) on PATH directly via the package's [project.scripts] entry.
Core retrieval (9): memory_save, memory_recall, memory_get, memory_update, memory_delete, memory_history, memory_extract_session, memory_relate, memory_search_by_tag
Knowledge graph (8): kg_add_fact, kg_invalidate_fact, kg_at, kg_timeline, memory_graph, memory_graph_index, memory_graph_stats, memory_concepts
Episodic / session (6): memory_episode_save, memory_episode_recall, session_init, session_end, memory_timeline, memory_history
Procedural / workflows (4): workflow_learn, workflow_predict, workflow_track, classify_task
Task phases (4, v8.0): task_create, phase_transition, task_phases_list, complete_task
Decisions (1, v8.0): save_decision
Intents (3, v8.0): save_intent, list_intents, search_intents
Self-improvement (6): self_rules, self_rules_context, self_insight, self_patterns, self_error_log, rule_set_phase (v8.0)
Pre-edit guard / error learning (3): file_context, learn_error, self_error_log
Analogy / cross-project (2): analogize, ingest_codebase
Reflection / consolidation (4): memory_reflect_now, memory_consolidate, memory_forget, memory_observe
Stats / export (5): memory_stats, memory_export, memory_self_assess, memory_context_build, benchmark
Skills (3): memory_skill_get, memory_skill_update, file_context
Total: 60+ tools. Each is documented below with input schema and example.
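learn_error's "auto-consolidated behavioral rules at N≥3" implies a normalize-then-count step. A minimal sketch, assuming volatile parts of an error message (absolute paths and bare numbers) are masked before counting; the regexes, the threshold handling, and the names error_pattern and consolidate are assumptions, not the project's implementation.

```python
import re
from collections import Counter
from typing import Iterable, List

PATH_RE = re.compile(r"(?:/[\w.-]+)+")   # absolute-looking paths
NUM_RE = re.compile(r"\b\d+\b")          # bare numbers (counts, line numbers, PIDs)


def error_pattern(message: str) -> str:
    """Mask volatile details so repeats of the same failure collapse to one key."""
    masked = NUM_RE.sub("<n>", PATH_RE.sub("<path>", message))
    return masked.strip().lower()


def consolidate(errors: Iterable[str], threshold: int = 3) -> List[str]:
    """Return the patterns seen at least `threshold` times (candidates for a rule)."""
    counts = Counter(error_pattern(e) for e in errors)
    return sorted(p for p, n in counts.items() if n >= threshold)
```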
When you only know the topic but not which records matter, use progressive disclosure:
1. memory_recall(query="auth refactor", mode="index", limit=20) → ~2 KB of {id, title, score, type, project, created_at} per hit. No content, no cognitive expansion.
2. memory_recall(query="auth refactor", mode="timeline", limit=5, neighbors=2) → top-K hits padded with ±neighbours from the same session, sorted chronologically.
3. memory_get(ids=[3622, 3606]) → full content for ONLY the IDs you chose (max 50 per call; detail="summary" truncates to 150 chars).

Typical saving: 80-90% fewer tokens vs memory_recall(detail="full", limit=20) when you end up using 2-3 of the 20 hits.
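On the client side, the last step is mostly bookkeeping: pick IDs from the index hits, de-duplicate them, and split them into batches that respect the 50-ID limit. A sketch with hypothetical helper names (pick_ids, id_batches); each batch would then be passed to memory_get(ids=batch).

```python
from typing import Dict, Iterator, List, Sequence

MAX_IDS_PER_CALL = 50  # memory_get limit quoted above


def pick_ids(index_hits: Sequence[Dict], top_n: int) -> List[int]:
    """Choose the IDs worth fetching from a mode="index" result (highest score first)."""
    ranked = sorted(index_hits, key=lambda hit: hit["score"], reverse=True)
    return [hit["id"] for hit in ranked[:top_n]]


def id_batches(ids: Sequence[int], size: int = MAX_IDS_PER_CALL) -> Iterator[List[int]]:
    """De-duplicate (keeping order) and yield chunks no larger than `size`."""
    unique = list(dict.fromkeys(ids))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]
```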
memory_recall · memory_get · memory_save · memory_update · memory_delete · memory_search_by_tag · memory_history · memory_timeline · memory_stats · memory_consolidate · memory_export · memory_forget · memory_relate · memory_extract_session · memory_observe
memory_graph · memory_graph_index · memory_graph_stats · memory_concepts · memory_associate · memory_context_build
memory_episode_save · memory_episode_recall · memory_skill_get · memory_skill_update
memory_reflect_now · memory_self_assess · self_error_log · self_insight · self_patterns · self_reflect · self_rules · self_rules_context
kg_add_fact · kg_invalidate_fact · kg_at · kg_timeline
workflow_learn · workflow_predict · workflow_track
file_context (pre-edit risk scoring) · learn_error (auto-consolidating error capture) · session_init / session_end · ingest_codebase (AST, 9 languages) · analogize (cross-project analogy) · benchmark (regression gate)
Full JSON schemas: python -m total_agent_memory.cli tools --json or open the dashboard at localhost:37737/tools.
For Node.js / browser / any TS project that isn't an MCP-native agent:
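```bash
npm i @vbch/total-agent-memory-client
```

```typescript
import { connectStdio } from "@vbch/total-agent-memory-client";

// Connect to the memory server over stdio.
const memory = await connectStdio();

// Store a decision for the "my-api" project.
await memory.save({
  type: "decision",
  content: "Picked pgvector over ChromaDB for multi-tenant RLS",
  project: "my-api",
});

// Retrieve the top 5 matches for a query, scoped to the same project.
const hits = await memory.recallFlat({
  query: "vector database choice",
  project: "my-api",
  limit: 5,
});
```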
The package also ships a LangChain adapter example, a procedural-memory integration, and an HTTP transport (for team / serverless setups).
Package repo: github.com/vbcherepanov/total-agent-memory-client
- /: live stats, queue depths, token savings from filters, representation coverage
- /graph/live: 3D WebGL force-graph (Three.js), 3,500+ nodes / 120,000+ edges, click-to-focus, type filters, search
- /graph/hive: D3 hive plot, nodes on radial axes by type
- /graph/matrix: canvas adjacency matrix sorted by type
- /knowledge: paginated knowledge browser, tag filters
- /sessions: last 50 sessions with summaries + next steps
- /errors: consolidated error patterns
- /rules: active behavioral rules + fire counts

Screenshots → docs/screenshots/ (coming)
```bash
cd ~/total-agent-memory   # legacy clones: ~/claude-memory-server
./update.sh
```
update.sh runs 7 stages, among them:
- pip install -r requirements.txt -r requirements-dev.txt (only if the requirements hash changed)
- python src/tools/version_status.py
- /mcp → memory → Reconnect (in Claude Code)

Manual equivalent:
```bash
cd ~/total-agent-memory   # legacy clones: ~/claude-memory-server
git pull
.venv/bin/pip install -r requirements.txt -r requirements-dev.txt
.venv/bin/python src/tools/version_status.py
.venv/bin/python -m pytest tests/
# in Claude Code: /mcp → memory → Reconnect
```
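The "only if hash changed" rule means dependencies are reinstalled only when the requirements files actually change. A minimal sketch of such a check, assuming a digest is kept in a stamp file; the helper names and the stamp path are hypothetical and this is not how update.sh is necessarily implemented:

```python
import hashlib
from pathlib import Path


def requirements_digest(paths) -> str:
    """SHA-256 over the names and contents of the given requirements files."""
    h = hashlib.sha256()
    for p in paths:
        p = Path(p)
        h.update(p.name.encode() + b"\0")   # separator so file boundaries matter
        h.update(p.read_bytes())
    return h.hexdigest()


def needs_install(paths, stamp_file) -> bool:
    """True unless stamp_file already holds the digest of the current requirements."""
    stamp = Path(stamp_file)
    return not (stamp.exists() and stamp.read_text().strip() == requirements_digest(paths))


def record_install(paths, stamp_file) -> None:
    """Remember the digest after a successful pip install."""
    Path(stamp_file).write_text(requirements_digest(paths))
```

Call needs_install(["requirements.txt", "requirements-dev.txt"], ".venv/.requirements.sha256") before running pip, and record_install with the same arguments only after pip succeeds, so a failed install is retried next time.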
v9 is backward compatible. Existing v8 calls and DB schema work unchanged — v9 is an infra release that adds pluggable backends, a public CLI for sub-agents, and LoCoMo benchmark wiring. Nothing is forcibly enabled.
```bash
cd ~/total-agent-memory && ./update.sh   # legacy clones: ~/claude-memory-server
# pulls v9 src, installs new entry-points (tam, tam-lookup, lookup-memory; legacy: ctm-lookup),
# keeps existing memory.db untouched.
```
After upgrade, verify the new CLI is on PATH:
```bash
lookup-memory --limit 1 "any-query-from-your-history"
```
- lookup-memory / tam-lookup / ctm-lookup (legacy) CLI is now installed alongside the total-agent-memory MCP server. It is registered under [project.scripts], so ./install.sh and ./update.sh put it on PATH automatically. Sub-agent prompts that reference the legacy ~/claude-memory-server/ollama/lookup_memory.sh script keep working; new prompts should prefer the package-installed name.
- Embeddings: fastembed by default. Switch via V9_EMBED_BACKEND=openai-3-large (set MEMORY_EMBED_API_KEY). Re-embedding costs ~$0.10 per 5k rows; expect an R@5 lift on conversational data.
- Reranker: ce-marco by default. V9_RERANKER_BACKEND=bge-v2-m3 (or off) switches it at runtime.
- --subject-aware in benchmarks/locomo_bench_llm.py. Future: surface it as an MCP tool flag.

Re-embed after switching the embedding backend:

```bash
python -m scripts.reembed --backend openai-3-large --confirm
```
Existing calls to ~/claude-memory-server/ollama/lookup_memory.sh "query" keep working. To use the package-installed CLI instead, replace them with lookup-memory "query".

Breaking changes: none. All v8 MCP tools, env vars, hooks, and DB tables behave identically.
v8.0 is backward compatible — your existing v7 installation keeps working unchanged. All new features are opt-in via MCP tool calls or env vars.
```bash
cd ~/total-agent-memory && ./update.sh   # legacy clones: ~/claude-memory-server
# Applies migrations 011-013 idempotently, restarts LaunchAgents, updates dependencies
```
Then restart Claude Code: /mcp restart memory.
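From v8.0 on, memory_save strips <private>...</private> sections before anything is stored. A minimal sketch of that kind of redaction, assuming exact lowercase tags and no nesting; `strip_private` is a hypothetical helper, and the project's own implementation may differ (for example by leaving a placeholder):

```python
import re

# Non-greedy so two private sections on one line are removed separately;
# DOTALL so a section may span several lines. Nested tags are not handled.
_PRIVATE_SECTION = re.compile(r"<private>.*?</private>", re.DOTALL)


def strip_private(text: str) -> str:
    """Drop every <private>...</private> section before the text is stored."""
    return _PRIVATE_SECTION.sub("", text)
```

An unclosed <private> tag does not match, so the text is stored unchanged; a stricter variant could refuse to save instead.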
- memory_save calls keep working; they now additionally strip <private>...</private> sections if present.
- memory_recall calls keep working; the default mode is still "search". The new mode="index" is opt-in.
- session_end calls keep working; auto_compress=False by default. Pass auto_compress=True to opt in.
- self_rules_context calls keep working; by default it returns all rules (no phase filter).

1. Cloud providers (only if you want to replace/augment Ollama):
```bash
export MEMORY_LLM_PROVIDER=openai       # or "anthropic"
export MEMORY_LLM_API_KEY=sk-...
export MEMORY_LLM_MODEL=gpt-4o-mini     # or "claude-haiku-4-5"
```
See Cloud providers for OpenRouter / per-phase routing / Cohere examples.
2. Install additional hooks (for UserPromptSubmit capture + citation):
```bash
./install.sh --ide claude-code   # re-run the installer; it now registers the user-prompt-submit.sh hook
```
The hook is additive — existing hooks keep working.
3. activeContext.md Obsidian integration (if you want markdown projection):
```bash
export MEMORY_ACTIVECONTEXT_VAULT=~/Documents/project/Projects   # default
# Disable: export MEMORY_ACTIVECONTEXT_DISABLE=1
```
Each session_end writes <vault>/<project>/activeContext.md.
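How the two variables combine can be sketched as a small path resolver. This is an illustration only, assuming the feature is off exactly when MEMORY_ACTIVECONTEXT_DISABLE is "1"; `active_context_path` is a hypothetical helper:

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_VAULT = "~/Documents/project/Projects"   # documented default


def active_context_path(project: str,
                        env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Where session_end would write the markdown projection, or None if disabled."""
    env = os.environ if env is None else env
    if env.get("MEMORY_ACTIVECONTEXT_DISABLE") == "1":
        return None
    # expanduser() also covers a vault value passed without shell ~ expansion.
    vault = Path(env.get("MEMORY_ACTIVECONTEXT_VAULT", DEFAULT_VAULT)).expanduser()
    return vault / project / "activeContext.md"
```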
None. All v7 MCP tool signatures are preserved. New parameters are optional with safe defaults.
If you switch to a cloud embedding provider (MEMORY_EMBED_PROVIDER=openai/cohere), the server will refuse to start if existing DB embeddings have a different dimension than the new provider returns. This is deliberate — it prevents silent data corruption.
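The guard is easy to picture: the default fastembed vectors are 384-d, while text-embedding-3-small returns 1536-d, so mixing them in one index would make similarity scores meaningless. A minimal sketch of such a startup check, assuming the server knows the distinct vector lengths already stored and can embed one probe string with the new provider; the function and exception names are hypothetical:

```python
from typing import Iterable


class EmbeddingDimensionMismatch(RuntimeError):
    """Raised instead of mixing vectors of different lengths in one index."""


def check_embedding_dimension(stored_dims: Iterable[int], provider_dim: int) -> None:
    """Refuse to start when stored vectors and the new provider disagree.

    stored_dims: distinct vector lengths already in the DB (empty for a fresh DB).
    provider_dim: length of one vector returned by the configured provider.
    """
    mismatched = sorted({d for d in stored_dims if d != provider_dim})
    if mismatched:
        found = ", ".join(str(d) for d in mismatched)
        raise EmbeddingDimensionMismatch(
            f"DB holds {found}-d embeddings but the provider returns {provider_dim}-d; "
            "keep the old provider or re-embed the DB first"
        )
```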
Either:
- keep MEMORY_EMBED_PROVIDER=fastembed (default, 384d) and change only the LLM provider, or
- re-embed the existing DB with the new provider:

```bash
python src/tools/reembed.py --provider openai --model text-embedding-3-small
```

Quick reference (see the full docs in MCP tools reference):
| Tool | Purpose |
|---|---|
classify_task(description) | Returns {level 1-4, suggested_phases, estimated_tokens} |
task_create(task_id, description) | Starts state machine in "van" phase |
phase_transition(task_id, new_phase, artifacts?) | Moves task through van/plan/creative/build/reflect/archive |
task_phases_list(task_id) | Chronological phase history |
save_decision(title, options, criteria_matrix, selected, rationale, ...) | Structured decision with per-criterion indexing |
memory_get(ids, detail) | Batched full-content fetch for IDs from memory_recall(mode="index") |
save_intent / list_intents / search_intents | UserPromptSubmit-captured prompts |
rule_set_phase(rule_id, phase) | Tag a rule for phase-scoped loading |
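task_create, phase_transition and task_phases_list together describe a small state machine. A minimal in-memory sketch, assuming phases may only move forward through van/plan/creative/build/reflect/archive (skipping is allowed, since classify_task suggests a subset of phases, but going back is not); the `Task` class is hypothetical, and the real server persists this state in its DB:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

PHASES = ("van", "plan", "creative", "build", "reflect", "archive")


@dataclass
class Task:
    task_id: str
    description: str
    phase: str = "van"                       # task_create starts in "van"
    history: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self.history.append((self.phase, datetime.now(timezone.utc), None))

    def transition(self, new_phase: str,
                   artifacts: Optional[Dict[str, Any]] = None) -> None:
        """Move forward to new_phase; skipping phases is allowed, going back is not."""
        if new_phase not in PHASES:
            raise ValueError(f"unknown phase {new_phase!r}")
        if PHASES.index(new_phase) <= PHASES.index(self.phase):
            raise ValueError(f"cannot move from {self.phase!r} to {new_phase!r}")
        self.phase = new_phase
        self.history.append((new_phase, datetime.now(timezone.utc), artifacts))

    def phases_list(self) -> List[str]:
        """Chronological phase history, in the spirit of task_phases_list."""
        return [phase for phase, _, _ in self.history]
```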
Extended tools:
- memory_recall(mode="index"|"timeline", decisions_only=False, ...): 3-layer token-efficient workflow
- session_end(auto_compress=True, transcript=None, ...): LLM-generated summary
- self_rules_context(phase="build"|"plan"|...): phase filter
- save_knowledge(...): now strips <private>...</private> sections automatically

v8.0 doesn't remove any v7 functionality. If you hit an issue, you can:
Set env var to revert behaviour:
```bash
export MEMORY_LLM_PROVIDER=ollama          # revert to local LLM
export MEMORY_EMBED_PROVIDER=fastembed     # revert to local embeddings
export MEMORY_ACTIVECONTEXT_DISABLE=1      # disable markdown projection
export MEMORY_POST_TOOL_CAPTURE=0          # disable opt-in capture (default anyway)
```
Migrations 011/012/013 are additive (no DROP / ALTER on existing tables), so DB downgrade is not destructive — old code continues reading older tables.
Worst case: git checkout v7.0.0 && ./update.sh --skip-migrations.
Without Ollama: works fully — raw content is saved, retrieval via BM25 + FastEmbed dense embeddings.
With Ollama: you also get LLM-generated summaries, keywords, question-forms, compressed representations, and deep enrichment (entities, intent, topics).
```bash
brew install ollama            # or: curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
ollama pull qwen2.5-coder:7b   # default, best quality/speed on M-series
ollama pull nomic-embed-text   # optional, alternative embedder
```
Use OpenAI, Anthropic, or any OpenAI-compat endpoint (OpenRouter, Together, Groq, DeepSeek, LM Studio, llama.cpp) instead of local Ollama.
OpenAI:
```bash
export MEMORY_LLM_PROVIDER=openai
export MEMORY_LLM_API_KEY=sk-...
export MEMORY_LLM_MODEL=gpt-4o-mini
```
Anthropic:
```bash
export MEMORY_LLM_PROVIDER=anthropic
export MEMORY_LLM_API_KEY=sk-ant-...
export MEMORY_LLM_MODEL=claude-haiku-4-5
```
OpenRouter (100+ models via one endpoint):
```bash
export MEMORY_LLM_PROVIDER=openai
export MEMORY_LLM_API_BASE=https://openrouter.ai/api/v1
export MEMORY_LLM_API_KEY=sk-or-...
export MEMORY_LLM_MODEL=anthropic/claude-haiku-4.5
```
Per-phase routing (cheap model for bulk, quality for compression):
```bash
export MEMORY_TRIPLE_PROVIDER=openai
export MEMORY_TRIPLE_MODEL=gpt-4o-mini
export MEMORY_ENRICH_PROVIDER=anthropic
export MEMORY_ENRICH_MODEL=claude-haiku-4-5
```
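One way to read per-phase routing is as a lookup with fallbacks. A sketch, assuming a phase-specific MEMORY_<PHASE>_PROVIDER / _MODEL wins, then the global MEMORY_LLM_PROVIDER / MEMORY_LLM_MODEL, then local Ollama with the documented default model; the precedence and `resolve_route` are assumptions, not the project's config code. Note that provider and model fall back independently here, so set both for a phase you route elsewhere:

```python
import os
from typing import Mapping, NamedTuple, Optional


class LLMRoute(NamedTuple):
    provider: str
    model: str


def resolve_route(phase: str, env: Optional[Mapping[str, str]] = None) -> LLMRoute:
    """Pick provider/model for one pipeline phase ("triple", "enrich", ...)."""
    env = os.environ if env is None else env
    prefix = f"MEMORY_{phase.upper()}_"
    provider = env.get(prefix + "PROVIDER") or env.get("MEMORY_LLM_PROVIDER") or "ollama"
    model = env.get(prefix + "MODEL") or env.get("MEMORY_LLM_MODEL") or "qwen2.5-coder:7b"
    return LLMRoute(provider, model)
```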
Embeddings (dimension must match existing DB or re-embed required):
```bash
export MEMORY_EMBED_PROVIDER=openai
export MEMORY_EMBED_MODEL=text-embedding-3-small   # 1536d
# or Cohere:
export MEMORY_EMBED_PROVIDER=cohere
export MEMORY_EMBED_API_KEY=...
```
| Model | Size | Use case |
|---|---|---|
qwen2.5-coder:7b | 4.7 GB | default — best quality/speed ratio |
qwen2.5-coder:32b | 19 GB | highest quality, needs 32 GB+ RAM |
llama3.1:8b | 4.9 GB | general-purpose alternative |
phi3:mini | 2.3 GB | low-RAM machines |
Environment variables (all optional):
| Variable | Default | Purpose |
|---|---|---|
MEMORY_MODE | fast | ultrafast / fast / balanced / deep. Selects the hot-path profile. See Modes. |
MEMORY_USE_LLM_IN_HOT_PATH | false | Master switch for sync LLM stages in save_knowledge / Recall.search. MEMORY_MODE=deep flips this to true. |
MEMORY_ALLOW_OLLAMA_IN_HOT_PATH | false | Re-enables the silent FastEmbed → Ollama fallback ladder when FastEmbed is unavailable. |
MEMORY_RERANK_ENABLED | false | Honour caller's rerank=true. When false, CrossEncoder rerank is hard-disabled even if a tool call requests it. |
MEMORY_ENRICHMENT_ENABLED | false | Run the async enrichment worker. Default-ON in balanced / deep. |
MEMORY_TEXT_EMBED_MODEL | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | Model for embedding_space=text. |
MEMORY_CODE_EMBED_MODEL | empty → falls back to TEXT model | Model for embedding_space=code. The row still records space=code so a future swap is config-only. |
MEMORY_LOG_EMBED_MODEL | empty → TEXT | Model for embedding_space=log. |
MEMORY_CONFIG_EMBED_MODEL | empty → TEXT | Model for embedding_space=config. |
MEMORY_DEFAULT_EMBEDDING_SPACE | text | Space for unclassified content. |
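The table above implies two resolution rules: MEMORY_MODE supplies defaults for the hot-path and enrichment switches, and empty per-space embedding models fall back to the TEXT model. A sketch of both, assuming an explicitly set variable overrides its mode default and that ultrafast behaves like fast for these two flags (neither is stated above); all names are hypothetical, and the real logic lives in total_agent_memory/config.py:

```python
import os
from typing import Dict, Mapping, Optional

TEXT_MODEL_DEFAULT = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
SPACES = ("text", "code", "log", "config")

# Only the two flags the table documents per mode; other per-mode knobs are not modelled.
MODE_DEFAULTS: Dict[str, Dict[str, bool]] = {
    "ultrafast": {"MEMORY_USE_LLM_IN_HOT_PATH": False, "MEMORY_ENRICHMENT_ENABLED": False},
    "fast":      {"MEMORY_USE_LLM_IN_HOT_PATH": False, "MEMORY_ENRICHMENT_ENABLED": False},
    "balanced":  {"MEMORY_USE_LLM_IN_HOT_PATH": False, "MEMORY_ENRICHMENT_ENABLED": True},
    "deep":      {"MEMORY_USE_LLM_IN_HOT_PATH": True,  "MEMORY_ENRICHMENT_ENABLED": True},
}


def _truthy(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def resolve_flags(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Mode supplies defaults; an explicitly set, non-empty variable overrides them."""
    env = os.environ if env is None else env
    mode = env.get("MEMORY_MODE", "fast")
    if mode not in MODE_DEFAULTS:
        raise ValueError(f"unknown MEMORY_MODE {mode!r}")
    flags = dict(MODE_DEFAULTS[mode])
    for name in flags:
        if env.get(name):
            flags[name] = _truthy(env[name])
    return flags


def embed_model_for(space: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Model for an embedding space; empty per-space settings fall back to TEXT."""
    env = os.environ if env is None else env
    space = space or env.get("MEMORY_DEFAULT_EMBEDDING_SPACE", "text")
    if space not in SPACES:
        raise ValueError(f"unknown embedding space {space!r}")
    text_model = env.get("MEMORY_TEXT_EMBED_MODEL") or TEXT_MODEL_DEFAULT
    if space == "text":
        return text_model
    return env.get(f"MEMORY_{space.upper()}_EMBED_MODEL") or text_model
```

Because the row still records its space, pointing MEMORY_CODE_EMBED_MODEL at a code model later changes only configuration, not stored labels.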
| Variable | Default | Purpose |
|---|---|---|
MEMORY_DB | ~/.tam/memory.db (legacy installs: ~/.claude-memory/memory.db) | SQLite location |
MEMORY_LLM_ENABLED | auto | auto / true / false / force. LLM enrichment toggle |
MEMORY_LLM_MODEL | qwen2.5-coder:7b | Ollama model for enrichment |
MEMORY_LLM_PROBE_TTL_SEC | 60 | Cache TTL for Ollama availability probe |
MEMORY_LLM_TIMEOUT_SEC | 60 | Global fallback timeout for Ollama requests (s) |
MEMORY_TRIPLE_TIMEOUT_SEC | 30 | Timeout for deep triple extraction (s) |
MEMORY_ENRICH_TIMEOUT_SEC | 45 | Timeout for deep enrichment (s) |
MEMORY_REPR_TIMEOUT_SEC | 60 | Timeout for representation generation (s) |
MEMORY_TRIPLE_MAX_PREDICT | 2048 | num_predict cap for triple extraction |
OLLAMA_URL | http://localhost:11434 | Ollama endpoint |
MEMORY_EMBED_MODE | fastembed | fastembed / sentence-transformers / ollama |
DASHBOARD_PORT | 37737 | HTTP dashboard port |
MEMORY_MCP_PORT | 3737 | HTTP MCP transport port (Docker path) |
MEMORY_ASYNC_ENRICHMENT | false | v10.1 — move quality gate / contradiction / entity dedup / episodic / wiki to a background worker. See Performance tuning |
MEMORY_ENRICH_TICK_SEC | 0.1 | Worker tick interval (clamp 0.01..5) |
MEMORY_ENRICH_BATCH | 5 | Rows claimed per tick (clamp 1..50) |
MEMORY_ENRICH_MAX_ATTEMPTS | 3 | Retries before flipping a row to failed |
MEMORY_ENRICH_STALE_AFTER_SEC | 60 | Seconds before a processing row is reclaimed (worker crash recovery) |
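The tick and batch knobs are clamped to the documented ranges. A sketch of reading the four worker settings, assuming an unset or unparsable value falls back to its default before clamping (that fallback, and the helper names, are assumptions):

```python
import os
from typing import Callable, Mapping, Optional


def clamped(env: Mapping[str, str], name: str, default: float,
            lo: float, hi: float, cast: Callable[[str], float]) -> float:
    """Parse env[name] with cast, use default if unset/invalid, then clamp to [lo, hi]."""
    raw = env.get(name, "")
    try:
        value = cast(raw) if raw else default
    except ValueError:
        value = default
    return max(lo, min(hi, value))


def worker_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    env = os.environ if env is None else env
    return {
        "tick_sec": clamped(env, "MEMORY_ENRICH_TICK_SEC", 0.1, 0.01, 5.0, float),
        "batch": clamped(env, "MEMORY_ENRICH_BATCH", 5, 1, 50, int),
        # No documented clamp for these two; parsed as-is.
        "max_attempts": int(env.get("MEMORY_ENRICH_MAX_ATTEMPTS") or 3),
        "stale_after_sec": float(env.get("MEMORY_ENRICH_STALE_AFTER_SEC") or 60),
    }
```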
CPU-only / WSL hosts: if Ollama keeps timing out, lower MEMORY_TRIPLE_MAX_PREDICT before raising timeouts. install-codex.sh writes conservative defaults automatically. If saves take 30-40 s on WSL2, set MEMORY_ASYNC_ENRICHMENT=true (see Performance tuning below).
Full config: see total_agent_memory/config.py.
When MEMORY_MODE=fast (default):
| metric | p50 (ms) | p95 (ms) | p99 (ms) |
|---|---|---|---|
save_fast | 6.2 | 8.9 | 11.4 |
save_fast cached | 0.3 | 0.4 | 1.4 |
search_fast | 3.4 | 4.7 | 6.0 |
cached_search | 3.1 | 3.4 | 3.6 |
llm_calls=0, network_calls=0. Reproduce: ./bin/memory-bench. Regression gate: ./bin/memory-perf-gate. Architecture rationale and per-stage audit: docs/v11/audit.md. Raw bench artifact: docs/v11/benchmark.md.
If your numbers do not match the table, run ./bin/memory-bench --warmup first — cold FastEmbed import dominates the first call.
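When comparing your numbers against the tables, compute percentiles the same way each time. A sketch using the nearest-rank definition, an assumption since the bench scripts may use interpolation; it also shows why p95 can jump while p50 stays low when a few calls are slow:

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

With 20 saves where 18 take ~20 ms and 2 wait on an LLM for ~2 s, p50 stays at 20 ms while p95 and p99 both land on the slow calls.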
memory_save latency

The synchronous v10 hot path runs five LLM-bound stages inline so that a drop verdict can block the INSERT and a contradiction supersede commits in the same transaction. On macOS with a warm Ollama that's ~340 ms median; on a WSL2 box without GPU/CoreML, each LLM round-trip can stretch the same call to 30–40 seconds.
v10.1 ships an opt-in inbox/outbox worker that moves the heavy stages out of band:
```
sync   : privacy → canonical_tags → INSERT → embed → enqueue → return
worker : quality_gate → entity_dedup_audit → contradiction → episodic → wiki
```
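The split can be illustrated with an in-process queue and a daemon thread: the caller gets its record back as soon as the cheap stages and the enqueue are done, and the heavy stages run later. This is a simplification, assuming stages are plain callables; the real worker persists jobs in the enrichment_queue table rather than an in-memory queue.Queue, and `make_pipeline` is hypothetical:

```python
import queue
import threading
from typing import Callable, List, Tuple

Stage = Callable[[dict], None]


def make_pipeline(sync_stages: List[Stage],
                  worker_stages: List[Stage]) -> Tuple[Callable[[dict], dict], queue.Queue]:
    """Run sync_stages inline; hand the record to a daemon thread for worker_stages."""
    inbox: queue.Queue = queue.Queue()

    def worker() -> None:
        while True:
            record = inbox.get()
            try:
                for stage in worker_stages:
                    stage(record)
            except Exception as exc:         # keep the thread alive; real code would retry
                record["error"] = repr(exc)
            finally:
                inbox.task_done()

    threading.Thread(target=worker, daemon=True).start()

    def save(record: dict) -> dict:
        for stage in sync_stages:            # privacy → canonical_tags → INSERT → embed
            stage(record)
        inbox.put(record)                    # enqueue, then return without waiting
        return record

    return save, inbox
```

inbox.join() blocks until every queued record has been through the worker stages, which is handy in tests.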
Enable it in your env:
```bash
export MEMORY_ASYNC_ENRICHMENT=true
# Optional knobs (defaults shown):
export MEMORY_ENRICH_TICK_SEC=0.1
export MEMORY_ENRICH_BATCH=5
export MEMORY_ENRICH_MAX_ATTEMPTS=3
export MEMORY_ENRICH_STALE_AFTER_SEC=60
```
Restart the MCP server. A background daemon thread now consumes enrichment_queue; you can watch it on the dashboard panel ⚡ v10.1 enrichment worker.
memory_save latency:
| mode | min | p50 | p95 | p99 | max | mean |
|---|---|---|---|---|---|---|
| sync (default) | 17.5 ms | 25.3 ms | 2150.5 ms | 2179.0 ms | 2186.1 ms | 348.0 ms |
async (MEMORY_ASYNC_ENRICHMENT=true) | 18.1 ms | 22.3 ms | 26.7 ms | 27.4 ms | 27.5 ms | 22.7 ms |
memory_recall latency: p50 ≈ 3-5 ms in both modes (steady state),
with cold-cache p95 outliers on the first warmup hit.
p95 collapses 80× with async (2150 ms → 27 ms). On WSL2 with a
slow Ollama, the same shape holds — sync p95 of 30-40 s becomes
async p95 of ~300-1000 ms (LLM moves out of the hot path entirely).
Reproduce: ./.venv/bin/python benchmarks/v10_5_latency.py --rounds 2 --with-llm.
Full report: benchmarks/v10_5_results.md.
When async is on, a quality_gate drop no longer prevents the INSERT (we already committed in the sync path). Instead the row is marked status='quality_dropped' after the worker scores it. memory_recall ignores that status (idx_knowledge_status_quality is added in migration 020). Audit history stays in quality_gate_log so nothing is lost.
If you need strict pre-INSERT gating (e.g. compliance), keep the default sync path.
Rows stuck in processing longer than MEMORY_ENRICH_STALE_AFTER_SEC (default 60 s) are flipped back to pending automatically — covers worker process kills mid-stage. The pre-existing write_intents outbox still covers a crash before INSERT.
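The row lifecycle described above (pending → processing → done, retry, failed, stale reclaim) maps onto a few SQL updates. A single-worker sketch with sqlite3, assuming a simplified enrichment_queue schema; the real table's columns are not documented here, so the schema and helper names are hypothetical:

```python
import sqlite3
import time
from typing import List, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS enrichment_queue (
    id           INTEGER PRIMARY KEY,
    knowledge_id INTEGER NOT NULL,
    status       TEXT    NOT NULL DEFAULT 'pending',  -- pending/processing/done/failed
    attempts     INTEGER NOT NULL DEFAULT 0,
    claimed_at   REAL
)
"""


def enqueue(conn: sqlite3.Connection, knowledge_id: int) -> None:
    conn.execute("INSERT INTO enrichment_queue (knowledge_id) VALUES (?)", (knowledge_id,))


def reclaim_stale(conn: sqlite3.Connection, stale_after_sec: float,
                  now: Optional[float] = None) -> int:
    """Flip rows stuck in 'processing' (e.g. after a worker crash) back to 'pending'."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE enrichment_queue SET status = 'pending', claimed_at = NULL "
        "WHERE status = 'processing' AND claimed_at < ?",
        (now - stale_after_sec,),
    )
    return cur.rowcount


def claim_batch(conn: sqlite3.Connection, batch: int,
                now: Optional[float] = None) -> List[int]:
    """Mark up to `batch` pending rows as processing and count the attempt."""
    now = time.time() if now is None else now
    with conn:  # commit the claim as one transaction
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM enrichment_queue WHERE status = 'pending' ORDER BY id LIMIT ?",
            (batch,))]
        conn.executemany(
            "UPDATE enrichment_queue SET status = 'processing', claimed_at = ?, "
            "attempts = attempts + 1 WHERE id = ?",
            [(now, i) for i in ids])
    return ids


def finish(conn: sqlite3.Connection, item_id: int, ok: bool, max_attempts: int) -> None:
    """Success -> done; failure -> pending for a retry, or failed once attempts run out."""
    if ok:
        conn.execute("UPDATE enrichment_queue SET status = 'done' WHERE id = ?", (item_id,))
    else:
        conn.execute(
            "UPDATE enrichment_queue SET claimed_at = NULL, status = CASE "
            "WHEN attempts >= ? THEN 'failed' ELSE 'pending' END WHERE id = ?",
            (max_attempts, item_id))
```

Several concurrent workers would need an atomic claim (for example BEGIN IMMEDIATE), which this sketch leaves out.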
- MEMORY_MODE=fast by default: zero LLM, zero Ollama, zero network in the save/search/recall hot path. Set MEMORY_MODE=deep to restore v10.5 behaviour.
- src/memory_core/* is deterministic; src/ai_layer/* owns every LLM-bound code path. Enforced by tests/test_no_llm_hot_path.py.
- Four modes: ultrafast / fast / balanced / deep. Single env flag.
- Embedding spaces: text / code / log / config. Single Chroma backend; per-space model swap is config-only.
- The FastEmbed → Ollama fallback in Store.embed requires MEMORY_ALLOW_OLLAMA_IN_HOT_PATH=true.
- New MCP tools: memory_save_fast, memory_search_fast, memory_explain_search, memory_warmup, memory_perf_report, memory_rebuild_fts, memory_rebuild_embeddings, memory_eval_locomo, memory_eval_recall, memory_eval_temporal, memory_eval_entity_consistency, memory_eval_contradictions, memory_eval_long_context.
- bin/memory-bench (artifact docs/v11/benchmark.md) + bin/memory-perf-gate for CI.
- memory-protocol skill: a single canonical SKILL.md + 4 references (tool cheatsheet for all 60+ MCP tools, workflow recipes for 15 common situations, hooks reference, per-IDE setup) + 4 templates (Claude Code settings.json, Codex config.toml, Cursor .mdc, Cline .md). Same content for every IDE; only the wiring differs.
- install.sh --ide extended to 9 IDEs: claude-code, codex, cursor, cline, continue, aider, windsurf, gemini-cli, opencode. New helpers: register_mcp_cline / continue / aider / windsurf + _json_merge_mcp_nested for the dotted-key case (cline.mcpServers).
- Shell scripts pass bash -n under macOS bash 3.2 (default). Replaced the ${var,,} lowercase bashism in update.sh with tr '[:upper:]' '[:lower:]'. Verified with shellcheck.
- Sub-agent protocol for specialist agents (php-pro, golang-pro, vue-expert, etc.) with mandatory memory_recall before / memory_save after. Full template in skills/memory-protocol/references/subagent-protocol.md.
- benchmarks/v10_5_latency.py with an apples-to-apples sync vs async comparison. Demonstrates an 80× p95 reduction (2150 ms → 27 ms) when async is enabled with LLM stages on.
- MEMORY_ASYNC_ENRICHMENT=true moves quality gate / entity dedup / contradiction detector / episodic linking / wiki refresh to a background thread. Drops max save latency 5.4× on macOS, 60–100× on WSL2. See Performance tuning.
- enrichment_queue table with stale-processing recovery (rows stuck >60 s in processing flip back to pending).
- _binary_search ValueError fix: np.argpartition requires kth STRICTLY < N; tiny test projects (pool ≤ 50) used to silently break contradiction_log.
- coref_resolver RU→EN translation fix: the prompt explicitly pins the output language (Do NOT translate).
- Migrations 015–019 applied automatically on restart.
- lookup-memory / tam-lookup / ctm-lookup (legacy) CLI: bash entry-point for sub-agents, registered as [project.scripts] and installed by ./install.sh / ./update.sh (replaces manual ~/claude-memory-server/ollama/lookup_memory.sh)
- Embedding backends: openai-3-small, openai-3-large (3072d), bge-m3, e5-large, locomo-tuned-minilm (fine-tuned on user data)
- Reranker backends: ce-marco, bge-v2-m3, bge-large, off (env V9_RERANKER_BACKEND, hot-swap)
- Embedding fine-tuning (scripts/finetune_embedding.py): mine triplets from your data, train on top of MiniLM via sentence-transformers
- Few-shot mining (scripts/mine_locomo_fewshot.py): augment per-category prompts with held-in (Q,A) pairs
- urllib requests now use certifi by default
- LoCoMo benchmark wiring (benchmarks/locomo_bench_llm.py with 14 ablation flags)
- save_decision with criteria matrix + multi-representation criterion indexing
- session_end(auto_compress=True) via LLM provider
- memory_recall(mode="index") + memory_get(ids)
- activeContext.md Obsidian live-doc projection
- <private>...</private> inline redaction
- /api/knowledge/{id} + /api/session/{id}
- install.sh --ide {claude-code|cursor|gemini-cli|opencode|codex}
- has_llm() per-phase provider caching

total-agent-memory is, and will always be, free and MIT-licensed. No paid tier, no gated features, no "enterprise edition". The benchmarks on this page are the entire product.
If it's saving you hours of context-pasting every week and you want to help keep development going — or just say thanks — a donation means a lot.
| Goal | What it funds |
|---|---|
| ☕ $5 — a coffee | One evening of focused OSS work |
| 🍕 $25 — a pizza | A new MCP tool end-to-end (design, code, tests, docs) |
| 🎧 $100 — a weekend | A major feature: e.g. the preference-tracking module that closes the 80% gap on LongMemEval |
| 💎 $500+ — a sprint | A release cycle: new subsystem + migrations + docs + benchmark artifact |
vbcherepanov@gmail.com: open to contract work and partnerships.

MIT forever. No commercial-license switch, no VC money, no dark patterns. The memory layer belongs to the developers using it, not to a SaaS vendor.
Local-first is the product. If you want a cloud memory service, mem0 and Supermemory are great. If you want your data on your disk, untouched by anyone else — this.
Honest benchmarks. Every number on this page is reproducible from the artifacts in evals/ and the scripts in benchmarks/. If you can't reproduce a claim, open an issue — it's a bug.
- pytest tests/ must stay green. Add tests for new tools.
- Update evals/scenarios/*.json if you change retrieval behavior.

MIT. See LICENSE.
Built for coding agents. Runs on your machine. Free forever.
Compare to mem0 / Letta / Zep / Supermemory ·
Benchmark artifact ·
TypeScript SDK ·
Donate