Skip to content

ADR-0014: Vector-Memory Layer for Researcher Pre-flight and Cross-Phase Recall

Context and Problem Statement

The Nubosloop's Pre-flight cache (ADR-0010 Step 1, ADR-0011 §Pre-flight) bypasses the Researcher-Schwarm when a cached pattern matches the current ticket at similarity ≥ swarm.research.threshold. The match is computed today by the BM25 + n-gram fingerprint inside lib/learnings.cjs::matchExistingLearning. This works for lexically close repeats but is blind to semantically close patterns expressed with different vocabulary.

Three observed failure modes:

  1. Vocabulary drift between phases. A learning logged in M002 about "Filament Resource policy registration" does not match the M005 ticket "Resource autorisierung in admin panel", so the swarm re-derives the same conclusion at triple Researcher cost.
  2. Handoff-note bloat. lib/handoff.cjs returns the entire prior-phase note set as plain text regardless of relevance. Late-project phases load increasingly large irrelevant context.
  3. Critic-finding rediscovery. A Critic in M005 cannot recall that the same finding category appeared in M001 with a known remediation; the routing engine re-explores the same dead ends.

A semantic memory layer addresses all three: indexed embeddings of past learnings, handoff-notes, critic-findings, and research-decisions, queryable by k-NN with optional filters, integrated into the existing Pre-flight without weakening provenance semantics (ADR-0011).

The Rule

A Vector-Memory layer at .nubos-pilot/memory/ indexes structured records (learnings, handoff-notes, critic-findings, research-decisions) by their embedding. It is queryable via lib/memory.cjs and the np:memory-* subcommands. The Pre-flight cache (ADR-0010 Step 1) consults it as a hybrid score with the existing BM25 fingerprint; the Researcher-Schwarm consults it before issuing external research; the Planner consults it for prior-phase decisions.

The layer is opt-in: disabled by default; enabled via memory.enabled = true in .nubos-pilot/config.json. When disabled, no embedding model is loaded, no index is built, and Pre-flight falls back to BM25-only as today.

Decision Drivers

  • Hybrid pre-flight: BM25 catches lexical repeats; vector catches semantic repeats. Combining the two scores raises cache-hit rate without admitting vector-only false positives.
  • Runtime agnostic: must function in Claude Code, OpenAI Agents, Codex, or any host. No host-specific hooks. (Driver carried over from ADR-0007.)
  • Cheap to keep fresh: index updates piggyback on existing write paths (logLearning, writeHandoffNote, post-Critic). No periodic rebuild; rebuild only on embedding-model change.
  • No daemon: the index is a persistent file artefact; queries open/read it on demand. No long-running process. (ADR-0001.)
  • Pluggable embedding provider: the local default is one of multiple providers; the provider interface (embed(texts) → vectors) is stable across providers. Remote / Pro providers are out-of-scope here and tracked as separate work.

Considered Options

  • A: No vector memory. Status quo. Reject: BM25-only Pre-flight has a measurable false-negative rate on rephrased tickets; the cache-hit ceiling is artificially low.
  • B: External vector DB (AgentDB / Weaviate / Pinecone). Reject: violates ADR-0001 (daemon shape) and ADR-0002 (heavy network dep). Inappropriate for a CLI shipping into arbitrary third-party projects.
  • C: Pure-JS HNSW implementation, no native code, no WASM. Reject: 200–300 LoC of subtle index code is a maintenance burden disproportionate to the benefit; performance ceiling at O(10K) records is fragile.
  • D: usearch + @huggingface/transformers for local embedding, lazy-loaded as opt-in deps. Chosen.

Decision Outcome

Chosen: Option D, local-first vector memory with usearch and @huggingface/transformers, lazy-loaded behind a config gate, because it preserves ADR-0001's no-daemon stance, integrates with the existing lib/learnings.cjs Pre-flight without weakening ADR-0011's [CACHED] provenance semantics, and leaves a clean seam for future remote providers.

Layout

.nubos-pilot/memory/                 # strict sub-tree of Project-State (ADR-0005)
  index.usearch                      # binary HNSW index, written via atomicWriteFileSync
  index.usearch.keymap.json          # BigInt-key ↔ string-uuid mapping
  records.jsonl                      # 1:1 vector-id ↔ record, append-only
  manifest.json                      # embedding model, dim, version, alpha, created_at

records.jsonl schema (one record per line):

json
{ "id": "uuid",
  "type": "learning|handoff|critic|research",
  "phase": "M005-S007-T0002",
  "title": "...",
  "body": "...",
  "tags": ["feature-flags", "filament"],
  "provenance": "VERIFIED|CITED|ASSUMED|CACHED",
  "created_at": "2026-05-08T..." }

type and phase are exact-match filters at query time; tags is a set-overlap filter.

Library surface

lib/memory.cjs:

async function index(records: Record[]): Promise<void>
async function query(text: string, opts: { k=8, filter?: { type, phase, tags } }): Promise<Hit[]>
async function add(record: Record): Promise<void>
async function rebuild(): Promise<void>
async function stats(): Promise<{ count, dim, model }>

Where Hit = { id, score, record }. Provider selection happens once at module init from .nubos-pilot/config.json::memory.provider (default "local").

Subcommands

  • np:memory-index: bulk-index from lib/learnings.cjs, lib/handoff.cjs, post-Critic findings, and RESEARCH.md decisions. Idempotent.
  • np:memory-query <text>: top-k hits with score and provenance.
  • np:memory-add: add a single record.
  • np:memory-rebuild: force full re-embed; required on manifest.json::model change.
  • np:memory-stats: count, dim, model, last-rebuild timestamp.

Integration points

  1. Pre-flight (ADR-0010 Step 1, ADR-0011 §Pre-flight) — hybrid score.lib/knowledge-adapter.cjs::_localAdapter.match runs BM25 first; when memory.enabled=true it queries the vector index and the combined score is α·BM25 + (1−α)·vector, default α = 0.6. The threshold semantic is unchanged: a hit at combined ≥ swarm.research.threshold (default 0.9) bypasses the swarm. Provenance of cache hits remains [CACHED] per ADR-0011.

  2. Researcher-Schwarm — pre-recall.agents/np-researcher.md includes a Vector-Memory Pre-recall section. The agent's prompt instructs: query memory before issuing external research; if a [VERIFIED] or [CITED] decision matches the current ticket, surface it as part of the spawn output with provenance preserved.

  3. Planner — prior-decision context.agents/np-planner.md queries memory for the parent milestone's prior phases and surfaces matching [VERIFIED] decisions as context-injection. Locked-decision conformance (ADR-0010 plan-checker route) is unaffected: memory hits are advisory; the locked-decisions file is canonical.

  4. Phase-completion hook — write-back. Post-atomic-commit (ADR-0004), bin/np-tools/loop-run-round.cjs::_runCommit calls memory.add for the just-logged learning. Each call is fire-and-forget with respect to the commit: index-write failures are surfaced as memory_skip_reason in the response but do not block the commit.

Embedding provider — local default

@huggingface/transformers (the successor to @xenova/transformers, maintained by the original author at Hugging Face) running Xenova/bge-small-en-v1.5 (or Xenova/bge-multilingual-base for non-English projects):

  • Model size: ~70 MB downloaded on first run; cached under ~/.cache/nubos-pilot/models/.
  • Vector dim: 384 (bge-small) or 768 (bge-multilingual).
  • Provider interface: provider.embed(texts: string[]) → Promise<Float32Array[]>.
  • First-run UX: np:memory-index prints a one-time progress indicator while the model downloads. Later runs load the cached model.
  • CJS-compatible: v4 ships dual CJS+ESM, so lib/memory-provider-local.cjs keeps using require().

Index engine — usearch

usearch provides HNSW with cosine similarity:

  • Why prebuilt binaries, not WASM: Both @huggingface/transformers (via onnxruntime-node) and usearch ship platform-specific prebuilt binaries via node-gyp-build / @img/sharp-*-style platform packages. No node-gyp invocation, no Python, no build chain on the consumer machine: same UX as WASM, faster runtime. The deprecated prebuild-install package is not in the dependency tree of these pinned versions.
  • Capacity: O(100K) records mühelos. Per-project memory expectation is O(1K–10K) records over the project lifetime.
  • Persistence: index.usearch written via atomicWriteFileSync; corruption recovery via np:memory-rebuild from records.jsonl.

ADR-0002 amendment

This ADR introduces two new runtime dependencies in package.json:

  • usearch@^2.25 (prebuilt platform binaries via node-gyp-build, no native compile on install)
  • @huggingface/transformers@^4 (~5 MB package; ~70 MB model downloaded on first use; prebuilt onnxruntime-node binaries, no native compile on install)

Both are lazy-loaded: the require() lives inside lib/memory.cjs factory functions and is reached only when memory.enabled=true AND a memory operation is invoked. The package.json declares them under optionalDependencies, so a Free-tier install with memory.enabled=false never resolves them at install time either.

This amendment is in the spirit of ADR-0006, which admitted yaml@^2.8 as the first runtime dep: a deliberately-named, version-pinned, scoped exception, not an opening of the floodgates.

Memory is never committed

Per User-Vorgabe: .nubos-pilot/memory/ is a runtime-state cache, not source-of-truth. The directory is added to the consumer-project's .gitignore by np:scan-codebase (alongside the existing .nubos-pilot/codebase/.hashes.json exclusion). Rebuild is deterministic from the source-of-truth files: lib/learnings.cjs learning-store, lib/handoff.cjs handoff-notes, milestone RESEARCH.md, and the Critic-report archive (per ADR-0010 §L5 <report_path>).

Privacy boundary

The local-default provider keeps everything on the consumer machine; no data leaves the workspace. A future Pro-tier remote provider (Jina Embeddings v3) is out-of-scope for this ADR; that ADR will specify what is sent and what is hashed-only.

Consequences

  • Good, because Pre-flight cache-hit rate increases on semantic repeats: fewer redundant Researcher-Schwarm spawns, lower token cost as the project ages.
  • Good, because Researcher and Planner surface prior-phase decisions without re-deriving them.
  • Good, because the layer is opt-in: Free-tier users see no behaviour change, no model download, no install-time optionalDependencies resolution surprise.
  • Good, because the index is rebuildable from source-of-truth; corruption is not a data-loss event.
  • Bad, because two new runtime deps widen the install surface from ADR-0002's strict zero. Mitigation: lazy-loaded, opt-in, narrowly scoped, with the ADR-0006 precedent.
  • Bad, because the first-run model download is ~70 MB. Mitigation: gated behind explicit memory.enabled=true; cached after first run; surfaced as one-time UX.
  • Bad, because hybrid scoring introduces a tunable (α = 0.6) that affects cache-hit rate. Mitigation: default ships, override is config; cache-hit rate measured before/after on production milestones.

Update — 2026-05-17: implementation hardening

Post-acceptance hardening corrected three gaps between this ADR and the shipped code. The core decision (Option D, local-first usearch + @huggingface/transformers, opt-in, hybrid Pre-flight) stands; the following details are superseded:

  • Hybrid retrieval is now actually executed. _localAdapter.match carried a synchronous guard that discarded the (async) memory.query result on every call, so Pre-flight silently ran BM25-only. match is now async and awaits the vector query, so the hybrid score described in Integration point 1 genuinely runs.
  • Write-back hook replaced by a derived cache. Integration point 4 (_runCommit calling memory.add, fire-and-forget, memory_added / memory_skip_reason) is removed. The vector index is a derived cache of the learnings store: _ensureLearningsIndexed indexes missing learnings lazily at Pre-flight, keyed by fingerprint (idempotent, no duplicates). learnings.json is the single source of truth; the commit phase no longer writes the index.
  • Vector-only hits and honest degradation. The "without admitting vector-only false positives" driver is refined: a vector-only hit is surfaced, but must independently clear swarm.research.threshold. Each hit gained a retrieval tag (bm25 / vector / hybrid). A result computed while the vector layer was enabled-but-unavailable carries a degraded marker instead of masquerading as a normal lexical result.

Index concurrency and crash-safety were hardened in the same pass: withFileLockAsync around index() / rebuild(), atomic usearch index writes with a rolling .bak, and explicit memory-index-desync / memory-index-keymap-corrupt errors on a torn index. See Vector-Memory for the current behaviour.

More information

  • Library: lib/memory.cjs, lib/memory-provider-local.cjs, lib/memory-index-usearch.cjs, lib/learnings.cjs::matchExistingLearning (extended), bin/np-tools/loop-run-round.cjs::_runCommit (write-back hook).
  • Subcommands: bin/np-tools/memory-{index,query,add,rebuild,stats}.cjs plus bin/np-tools/_memory-resolve.cjs (config gate + factory wiring).
  • Concept: Vector-Memory.
  • Agents: agents/np-researcher.md (memory-query reference), agents/np-planner.md (prior-decision recall).
  • Config schema: .nubos-pilot/config.json::memory = { enabled: false, provider: "local", alpha: 0.6, model: "Xenova/bge-small-en-v1.5" }.

This ADR specifies the local-default Vector-Memory. The Pro-tier remote provider via Jina is out-of-scope here and will be tracked under a separate task and (if accepted) a successor ADR.