mneme Architecture¶
Components¶
~/.claude/mneme/
├── capabilities.yaml # source of truth, hand-edited or scanner-discovered
├── semantic.sqlite # capability vectors (sqlite-vec)
├── procedural.jsonl # successful tool sequences (Voyager pattern)
├── reflections.jsonl # failure notes (Phase 2)
└── failures.log # raw failure log (Phase 1)
Hooks declared in ~/.claude/settings.json:
UserPromptSubmit->python -m mneme.hooks.user_prompt_submitinjects relevant capabilities at the start of the prompt.PostToolUse->python -m mneme.hooks.post_tool_userecords workflow success and failure outcomes.
Retrieval¶
Two-stage hierarchy per AnyTool (arXiv:2402.04253):
- Embed prompt with instruction prefix:
"Given this task intent, retrieve tools that enable it: <prompt>". - Cosine top-K categories (default
top_categories=10of 35 indexed at Retriever construction; tunable per registry size). - Cosine top-N capabilities filtered by those categories, threshold >= 0.65 (default
top_capabilities=5). - Optionally retrieve top-3 procedural workflows whose
situationcosine to the prompt also clears the threshold. - Format the injection block (EasyTool concise schema) and place it at the START of the augmented prompt.
If no result clears the threshold, no injection occurs (fail-safe). The prompt is never polluted with low-confidence matches.
Why these design choices¶
- Local-only. Zero cloud dependencies; runs on a developer laptop without an account.
- Instruction-tuned embeddings. Generic IR embeddings score nDCG@10 = 33.83 on tool retrieval ("Retrieval Models Aren't Tool-Savvy", ACL 2025). Instruction-tuning is the cheapest mitigation that preserves Phase 1 simplicity.
- Two-stage hierarchy. Flat retrieval at scale degrades; AnyTool reports +35.4% pass-rate over flat baselines.
- Procedural memory separate from semantic. Voyager (NeurIPS 2023) shows 3-15x improvement when successful sequences are stored as code, not as facts.
- Lost in the Middle placement. Injection at the START of the prompt (never the middle) per Liu et al., TACL 2024.
- Fail-safe everywhere. A hook crash never blocks Claude Code; missing Ollama falls back to regex matching on triggers when explicitly enabled (
MNEME_FALLBACK=1).