ADR-0014: Vector-Memory Layer for Researcher Pre-flight and Cross-Phase Recall
- Status: Accepted
- Date: 2026-05-08
- Supersedes: None
- Relates-to: ADR-0001, ADR-0002, ADR-0005, ADR-0006, ADR-0010, ADR-0011, ADR-0013
Context and Problem Statement
The Nubosloop's Pre-flight cache (ADR-0010 Step 1, ADR-0011 §Pre-flight) bypasses the Researcher-Schwarm when a cached pattern matches the current ticket at similarity ≥ swarm.research.threshold. The match is computed today by the BM25 + n-gram fingerprint inside lib/learnings.cjs::matchExistingLearning. This works for lexically close repeats but is blind to semantically close patterns expressed with different vocabulary.
Three observed failure modes:
- Vocabulary drift between phases. A learning logged in M002 about "Filament Resource policy registration" does not match the M005 ticket "Resource autorisierung in admin panel", so the swarm re-derives the same conclusion at triple Researcher cost.
- Handoff-note bloat. lib/handoff.cjs returns the entire prior-phase note set as plain text regardless of relevance. Late-project phases load increasingly large irrelevant context.
- Critic-finding rediscovery. A Critic in M005 cannot recall that the same finding category appeared in M001 with a known remediation; the routing engine re-explores the same dead ends.
A semantic memory layer addresses all three: indexed embeddings of past learnings, handoff-notes, critic-findings, and research-decisions, queryable by k-NN with optional filters, integrated into the existing Pre-flight without weakening provenance semantics (ADR-0011).
The Rule
A Vector-Memory layer at .nubos-pilot/memory/ indexes structured records (learnings, handoff-notes, critic-findings, research-decisions) by their embedding. It is queryable via lib/memory.cjs and the np:memory-* subcommands. The Pre-flight cache (ADR-0010 Step 1) consults it as a hybrid score with the existing BM25 fingerprint; the Researcher-Schwarm consults it before issuing external research; the Planner consults it for prior-phase decisions.
The layer is opt-in: disabled by default; enabled via memory.enabled = true in .nubos-pilot/config.json. When disabled, no embedding model is loaded, no index is built, and Pre-flight falls back to BM25-only as today.
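The opt-in gate can be sketched as a small config resolver. This is a minimal illustration, assuming the config file has already been parsed into an object; the helper names resolveMemoryConfig and isMemoryEnabled are hypothetical, and the defaults are the ones listed in this ADR's config schema.

```javascript
// Hypothetical helpers: merge the `memory` block of an already-parsed
// .nubos-pilot/config.json with the defaults listed in this ADR.
const MEMORY_DEFAULTS = Object.freeze({
  enabled: false,
  provider: 'local',
  alpha: 0.6,
  model: 'Xenova/bge-small-en-v1.5',
});

function resolveMemoryConfig(config) {
  const memory = (config && config.memory) || {};
  return { ...MEMORY_DEFAULTS, ...memory };
}

function isMemoryEnabled(config) {
  // Only a literal `true` enables the layer; anything else keeps BM25-only.
  return resolveMemoryConfig(config).enabled === true;
}
```

Treating only a literal true as enabled keeps a malformed or missing memory block on the BM25-only path.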
Decision Drivers
- Hybrid pre-flight: BM25 catches lexical repeats; vector catches semantic repeats. Combining the two scores raises cache-hit rate without admitting vector-only false positives.
- Runtime agnostic: must function in Claude Code, OpenAI Agents, Codex, or any host. No host-specific hooks. (Driver carried over from ADR-0007.)
- Cheap to keep fresh: index updates piggyback on existing write paths (logLearning, writeHandoffNote, post-Critic). No periodic rebuild; rebuild only on embedding-model change.
- No daemon: the index is a persistent file artefact; queries open/read it on demand. No long-running process. (ADR-0001.)
- Pluggable embedding provider: the local default is one of multiple providers; the provider interface (embed(texts) → vectors) is stable across providers. Remote / Pro providers are out-of-scope here and tracked as separate work.
Considered Options
- A: No vector memory. Status quo. Reject: BM25-only Pre-flight has a measurable false-negative rate on rephrased tickets; the cache-hit ceiling is artificially low.
- B: External vector DB (AgentDB / Weaviate / Pinecone). Reject: violates ADR-0001 (daemon shape) and ADR-0002 (heavy network dep). Inappropriate for a CLI shipping into arbitrary third-party projects.
- C: Pure-JS HNSW implementation, no native code, no WASM. Reject: 200–300 LoC of subtle index code is a maintenance burden disproportionate to the benefit; performance ceiling at O(10K) records is fragile.
- D: usearch + @huggingface/transformers for local embedding, lazy-loaded as opt-in deps. Chosen.
Decision Outcome
Chosen: Option D, local-first vector memory with usearch and @huggingface/transformers, lazy-loaded behind a config gate, because it preserves ADR-0001's no-daemon stance, integrates with the existing lib/learnings.cjs Pre-flight without weakening ADR-0011's [CACHED] provenance semantics, and leaves a clean seam for future remote providers.
Layout
.nubos-pilot/memory/             # strict sub-tree of Project-State (ADR-0005)
  index.usearch                  # binary HNSW index, written via atomicWriteFileSync
  index.usearch.keymap.json      # BigInt-key ↔ string-uuid mapping
  records.jsonl                  # 1:1 vector-id ↔ record, append-only
  manifest.json                  # embedding model, dim, version, alpha, created_at
records.jsonl schema (one record per line; shown expanded here for readability):
{ "id": "uuid",
  "type": "learning|handoff|critic|research",
  "phase": "M005-S007-T0002",
  "title": "...",
  "body": "...",
  "tags": ["feature-flags", "filament"],
  "provenance": "VERIFIED|CITED|ASSUMED|CACHED",
  "created_at": "2026-05-08T..." }
type and phase are exact-match filters at query time; tags is a set-overlap filter.
Library surface
lib/memory.cjs:
async function index(records: Record[]): Promise<void>
async function query(text: string, opts: { k=8, filter?: { type, phase, tags } }): Promise<Hit[]>
async function add(record: Record): Promise<void>
async function rebuild(): Promise<void>
async function stats(): Promise<{ count, dim, model }>
Where Hit = { id, score, record }. Provider selection happens once at module init from .nubos-pilot/config.json::memory.provider (default "local").
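The query contract above can be illustrated with a brute-force stand-in. This sketch deliberately swaps the HNSW index for an exact cosine scan over an in-memory array, so it shows only the Hit shape and the filter semantics, not the shipped implementation. The names cosine, matchesFilter and bruteForceQuery are hypothetical, and reading "set-overlap" as "at least one shared tag" is an assumption.

```javascript
// Exact cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Filter semantics from the records schema: type and phase are exact matches,
// tags pass when at least one requested tag is on the record (assumed reading).
function matchesFilter(record, filter = {}) {
  if (filter.type !== undefined && record.type !== filter.type) return false;
  if (filter.phase !== undefined && record.phase !== filter.phase) return false;
  if (Array.isArray(filter.tags) && filter.tags.length > 0) {
    const recordTags = new Set(record.tags || []);
    if (!filter.tags.some((tag) => recordTags.has(tag))) return false;
  }
  return true;
}

// Brute-force stand-in for query(): scans every entry instead of using HNSW.
// `entries` is an array of { record, vector } pairs held in memory.
function bruteForceQuery(queryVector, entries, { k = 8, filter } = {}) {
  return entries
    .filter(({ record }) => matchesFilter(record, filter))
    .map(({ record, vector }) => ({
      id: record.id,
      score: cosine(queryVector, vector),
      record,
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

An exact scan like this is useful as a correctness oracle in tests for an approximate index, since both should return the same top hits on small inputs.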
Subcommands
- np:memory-index: bulk-index from lib/learnings.cjs, lib/handoff.cjs, post-Critic findings, and RESEARCH.md decisions. Idempotent.
- np:memory-query <text>: top-k hits with score and provenance.
- np:memory-add: add a single record.
- np:memory-rebuild: force full re-embed; required on manifest.json::model change.
- np:memory-stats: count, dim, model, last-rebuild timestamp.
Integration points
1. Pre-flight (ADR-0010 Step 1, ADR-0011 §Pre-flight): hybrid score. lib/knowledge-adapter.cjs::_localAdapter.match runs BM25 first; when memory.enabled=true it queries the vector index and the combined score is α·BM25 + (1−α)·vector, default α = 0.6. The threshold semantic is unchanged: a hit at combined ≥ swarm.research.threshold (default 0.9) bypasses the swarm. Provenance of cache hits remains [CACHED] per ADR-0011.
2. Researcher-Schwarm: pre-recall. agents/np-researcher.md includes a Vector-Memory Pre-recall section. The agent's prompt instructs: query memory before issuing external research; if a [VERIFIED] or [CITED] decision matches the current ticket, surface it as part of the spawn output with provenance preserved.
3. Planner: prior-decision context. agents/np-planner.md queries memory for the parent milestone's prior phases and surfaces matching [VERIFIED] decisions as context-injection. Locked-decision conformance (ADR-0010 plan-checker route) is unaffected: memory hits are advisory; the locked-decisions file is canonical.
4. Phase-completion hook: write-back. Post-atomic-commit (ADR-0004), bin/np-tools/loop-run-round.cjs::_runCommit calls memory.add for the just-logged learning. Each call is fire-and-forget with respect to the commit: index-write failures are surfaced as memory_skip_reason in the response but do not block the commit.
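The hybrid Pre-flight score can be written out as a short sketch. It assumes both the BM25 and the vector score are already normalised to [0, 1]; the ADR does not specify how BM25 is normalised, and the function names are hypothetical.

```javascript
// Hybrid Pre-flight score: alpha * BM25 + (1 - alpha) * vector.
// Assumption: both inputs are already normalised to the range [0, 1].
function hybridScore(bm25, vector, alpha = 0.6) {
  return alpha * bm25 + (1 - alpha) * vector;
}

// A cached pattern bypasses the swarm only when the combined score
// reaches swarm.research.threshold (default 0.9 per this ADR).
function bypassesSwarm(bm25, vector, { alpha = 0.6, threshold = 0.9 } = {}) {
  return hybridScore(bm25, vector, alpha) >= threshold;
}
```

With the defaults, a perfect lexical match (BM25 = 1.0) paired with a vector score of 0.5 combines to 0.8 and does not bypass the swarm, so neither signal can clear the threshold alone.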
Embedding provider — local default
@huggingface/transformers (the successor to @xenova/transformers, maintained by the original author at Hugging Face) running Xenova/bge-small-en-v1.5 (or Xenova/bge-multilingual-base for non-English projects):
- Model size: ~70 MB downloaded on first run; cached under ~/.cache/nubos-pilot/models/.
- Vector dim: 384 (bge-small) or 768 (bge-multilingual).
- Provider interface: provider.embed(texts: string[]) → Promise<Float32Array[]>.
- First-run UX: np:memory-index prints a one-time progress indicator while the model downloads. Later runs load the cached model.
- CJS-compatible: v4 ships dual CJS+ESM, so lib/memory-provider-local.cjs keeps using require().
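Because the provider interface is just embed(texts) returning one Float32Array per text, a deterministic test double is easy to write. The sketch below is not the local model and carries no semantic signal; it hashes tokens into buckets purely to satisfy the interface shape in unit tests. The name createFakeProvider and the hashing scheme are assumptions for illustration.

```javascript
// Deterministic test double for the provider interface:
//   embed(texts: string[]) -> Promise<Float32Array[]>
// It is NOT a semantic model; it only mimics the shape and dimension.
function createFakeProvider(dim = 384) {
  // FNV-1a hash of a token, reduced to a bucket index in [0, dim).
  function bucketOf(token) {
    let h = 2166136261;
    for (let i = 0; i < token.length; i++) {
      h ^= token.charCodeAt(i);
      h = Math.imul(h, 16777619) >>> 0;
    }
    return h % dim;
  }

  return {
    dim,
    async embed(texts) {
      return texts.map((text) => {
        const v = new Float32Array(dim);
        const tokens = text.toLowerCase().split(/\W+/).filter(Boolean);
        for (const token of tokens) v[bucketOf(token)] += 1;
        // L2-normalise so cosine similarity reduces to a dot product.
        const norm = Math.hypot(...v);
        if (norm > 0) {
          for (let i = 0; i < dim; i++) v[i] /= norm;
        }
        return v;
      });
    },
  };
}
```

A double like this lets index and query code be tested without downloading a model, as long as tests only assert on shape and determinism rather than on semantic closeness.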
Index engine — usearch
usearch provides HNSW with cosine similarity:
- Why prebuilt binaries, not WASM: both @huggingface/transformers (via onnxruntime-node) and usearch ship platform-specific prebuilt binaries via node-gyp-build / @img/sharp-*-style platform packages. No node-gyp invocation, no Python, no build chain on the consumer machine: same UX as WASM, faster runtime. The deprecated prebuild-install package is not in the dependency tree of these pinned versions.
- Capacity: handles O(100K) records comfortably. Per-project memory expectation is O(1K–10K) records over the project lifetime.
- Persistence: index.usearch written via atomicWriteFileSync; corruption recovery via np:memory-rebuild from records.jsonl.
ADR-0002 amendment
This ADR introduces two new runtime dependencies in package.json:
- usearch@^2.25 (prebuilt platform binaries via node-gyp-build, no native compile on install)
- @huggingface/transformers@^4 (~5 MB package; ~70 MB model downloaded on first use; prebuilt onnxruntime-node binaries, no native compile on install)
Both are lazy-loaded: the require() lives inside lib/memory.cjs factory functions and is reached only when memory.enabled=true AND a memory operation is invoked. The package.json declares them under optionalDependencies, so a Free-tier install with memory.enabled=false never resolves them at install time either.
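The lazy-load gate can be sketched as a memoised getter. This is an illustration of the pattern only: createLazyDep is a hypothetical helper, and the actual factory wiring in lib/memory.cjs may differ.

```javascript
// Lazy-load gate: the loader runs on first use only, and only when enabled.
// `isEnabled` would typically read memory.enabled from the resolved config.
function createLazyDep(isEnabled, load) {
  let cached;
  let loaded = false;
  return function get() {
    if (!isEnabled()) {
      throw new Error('memory layer is disabled (memory.enabled=false)');
    }
    if (!loaded) {
      cached = load(); // e.g. () => require('usearch')
      loaded = true;
    }
    return cached;
  };
}
```

Passing the require() call in as a closure means a disabled install never touches the module resolver, which is what allows the packages to sit under optionalDependencies.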
This amendment is in the spirit of ADR-0006, which admitted yaml@^2.8 as the first runtime dep: a deliberately-named, version-pinned, scoped exception, not an opening of the floodgates.
Memory is never committed
Per user requirement: .nubos-pilot/memory/ is a runtime-state cache, not source-of-truth. The directory is added to the consumer-project's .gitignore by np:scan-codebase (alongside the existing .nubos-pilot/codebase/.hashes.json exclusion). Rebuild is deterministic from the source-of-truth files: lib/learnings.cjs learning-store, lib/handoff.cjs handoff-notes, milestone RESEARCH.md, and the Critic-report archive (per ADR-0010 §L5 <report_path>).
Privacy boundary
The local-default provider keeps everything on the consumer machine; no data leaves the workspace. A future Pro-tier remote provider (Jina Embeddings v3) is out-of-scope for this ADR; that ADR will specify what is sent and what is hashed-only.
Consequences
- Good, because Pre-flight cache-hit rate increases on semantic repeats: fewer redundant Researcher-Schwarm spawns, lower token cost as the project ages.
- Good, because Researcher and Planner surface prior-phase decisions without re-deriving them.
- Good, because the layer is opt-in: Free-tier users see no behaviour change, no model download, no install-time optionalDependencies resolution surprise.
- Good, because the index is rebuildable from source-of-truth; corruption is not a data-loss event.
- Bad, because two new runtime deps widen the install surface from ADR-0002's strict zero. Mitigation: lazy-loaded, opt-in, narrowly scoped, with the ADR-0006 precedent.
- Bad, because the first-run model download is ~70 MB. Mitigation: gated behind explicit memory.enabled=true; cached after first run; surfaced as one-time UX.
- Bad, because hybrid scoring introduces a tunable (α = 0.6) that affects cache-hit rate. Mitigation: default ships, override is config; cache-hit rate measured before/after on production milestones.
Update — 2026-05-17: implementation hardening
Post-acceptance hardening corrected three gaps between this ADR and the shipped code. The core decision (Option D, local-first usearch + @huggingface/transformers, opt-in, hybrid Pre-flight) stands; the following details are superseded:
- Hybrid retrieval is now actually executed. _localAdapter.match carried a synchronous guard that discarded the (async) memory.query result on every call, so Pre-flight silently ran BM25-only. match is now async and awaits the vector query, so the hybrid score described in Integration point 1 genuinely runs.
- Write-back hook replaced by a derived cache. Integration point 4 (_runCommit calling memory.add, fire-and-forget, memory_added / memory_skip_reason) is removed. The vector index is a derived cache of the learnings store: _ensureLearningsIndexed indexes missing learnings lazily at Pre-flight, keyed by fingerprint (idempotent, no duplicates). learnings.json is the single source of truth; the commit phase no longer writes the index.
- Vector-only hits and honest degradation. The "without admitting vector-only false positives" driver is refined: a vector-only hit is surfaced, but must independently clear swarm.research.threshold. Each hit gained a retrieval tag (bm25 / vector / hybrid). A result computed while the vector layer was enabled-but-unavailable carries a degraded marker instead of masquerading as a normal lexical result.
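The derived-cache behaviour (index only what is missing, keyed by fingerprint) can be sketched as follows. This is not the shipped _ensureLearningsIndexed: the function name ensureIndexed, the fingerprint field name, and the addToIndex callback are assumptions made for illustration.

```javascript
// Idempotent, fingerprint-keyed indexing of the learnings store.
// `indexedFingerprints` is a Set of fingerprints already in the index;
// `addToIndex` is a caller-supplied callback that embeds and stores one learning.
function ensureIndexed(learnings, indexedFingerprints, addToIndex) {
  let added = 0;
  for (const learning of learnings) {
    if (indexedFingerprints.has(learning.fingerprint)) continue; // already indexed
    addToIndex(learning);
    indexedFingerprints.add(learning.fingerprint);
    added += 1;
  }
  return added;
}
```

Running it twice over the same store adds nothing the second time, which is the property that lets Pre-flight call it on every run without a separate write-back hook.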
Index concurrency and crash-safety were hardened in the same pass: withFileLockAsync around index() / rebuild(), atomic usearch index writes with a rolling .bak, and explicit memory-index-desync / memory-index-keymap-corrupt errors on a torn index. See Vector-Memory for the current behaviour.
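An atomic write with a rolling backup is commonly built from a temp file plus rename. The sketch below shows that general technique, not the project's atomicWriteFileSync; the name atomicWriteWithBackup is hypothetical. It assumes the temp file and the target live on the same filesystem, and it omits fsync and file locking for brevity.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Atomic write with a single rolling backup (general technique, sketch only).
// Assumption: temp file and target share a filesystem, so rename replaces
// the target in one step and readers never observe a half-written file.
function atomicWriteWithBackup(targetPath, data) {
  const tmpPath = path.join(
    path.dirname(targetPath),
    `.${path.basename(targetPath)}.${process.pid}.tmp`
  );
  fs.writeFileSync(tmpPath, data);
  if (fs.existsSync(targetPath)) {
    // Keep the previous good copy as <target>.bak before replacing it.
    fs.copyFileSync(targetPath, `${targetPath}.bak`);
  }
  fs.renameSync(tmpPath, targetPath);
}
```

If the process dies before the rename, the target is untouched and at worst a stray temp file remains; if the target is later found torn, the .bak copy is the most recent complete version.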
More information
- Library: lib/memory.cjs, lib/memory-provider-local.cjs, lib/memory-index-usearch.cjs, lib/learnings.cjs::matchExistingLearning (extended), bin/np-tools/loop-run-round.cjs::_runCommit (write-back hook).
- Subcommands: bin/np-tools/memory-{index,query,add,rebuild,stats}.cjs plus bin/np-tools/_memory-resolve.cjs (config gate + factory wiring).
- Concept: Vector-Memory.
- Agents: agents/np-researcher.md (memory-query reference), agents/np-planner.md (prior-decision recall).
- Config schema: .nubos-pilot/config.json::memory = { enabled: false, provider: "local", alpha: 0.6, model: "Xenova/bge-small-en-v1.5" }.
This ADR specifies the local-default Vector-Memory. The Pro-tier remote provider via Jina is out-of-scope here and will be tracked under a separate task and (if accepted) a successor ADR.
