Vector-Memory

Vector-Memory is the opt-in semantic-recall layer that augments the BM25 pre-flight cache with embedding-based similarity. It lives at .nubos-pilot/memory/ and is consulted by the Nubosloop's Pre-flight (hybrid score), the Researcher-Schwarm (pre-recall before web search), and the Planner (prior-decision context).

ADR-0014 ratifies the design. The orchestration lives in lib/memory.cjs; the local provider in lib/memory-provider-local.cjs (lazy-loaded @huggingface/transformers); the index in lib/memory-index-usearch.cjs (lazy-loaded usearch).

Why a vector layer

The BM25 / Jaccard pre-flight in lib/learnings.cjs::matchExistingLearning is blind to semantically close patterns expressed with different vocabulary. Three failure modes recur:

Vocabulary drift between phases. A learning logged in M002 about "Filament Resource policy registration" does not match the M005 ticket "Resource autorisierung in admin panel", so the swarm re-derives the same conclusion at triple Researcher cost.
Handoff-note bloat. lib/handoff.cjs returns the entire prior-phase note set as plain text regardless of relevance. Late-project phases load increasingly large irrelevant context.
Critic-finding rediscovery. A Critic in M005 cannot recall that the same finding category appeared in M001 with a known remediation, so the routing engine re-explores the same dead ends.

The hybrid α·BM25 + (1−α)·vector score (default α = 0.6) catches both lexical and semantic repeats. A missing vector signal is treated as absent, so the lexical score stands; it is never treated as a zero that would drag a strong lexical hit down. A purely semantic match is also surfaced, but only when it clears swarm.research.threshold on its own.
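The scoring rule in the paragraph above can be written as a piecewise definition. This is a restatement of the prose, not a formula taken from the code: the symbols $b$, $v$, and $\tau$ are labels introduced here, and both scoring a vector-only hit by its raw similarity and reading "clears" as $\ge$ are assumptions.

```latex
% b = lexical (BM25/Jaccard) score, v = vector similarity, both assumed in [0, 1]
% \alpha = memory.alpha (default 0.6), \tau = swarm.research.threshold
s(b, v) =
\begin{cases}
  \alpha\, b + (1 - \alpha)\, v & \text{both signals present (hybrid)} \\
  b & \text{vector signal absent (lexical only)} \\
  v & \text{lexical signal absent; surfaced only if } v \ge \tau
\end{cases}
```

With α = 0.6, a lexical score of 0.90 and a vector similarity of 0.80 combine to 0.6·0.90 + 0.4·0.80 = 0.54 + 0.32 = 0.86. A lexical hit of 0.90 with no vector signal keeps its score of 0.90; counting the missing signal as zero would instead give 0.6·0.90 = 0.54.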

Layout

.nubos-pilot/memory/                 # strict sub-tree of Project-State (ADR-0005)
  index.usearch                      # binary HNSW index (cosine, dim from provider)
  index.usearch.keymap.json          # BigInt-key ↔ string-uuid mapping
  records.jsonl                      # 1:1 vector-id ↔ record, append-only, source-of-truth
  manifest.json                      # embedding model, dim, version, alpha
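
The keymap file exists because usearch addresses vectors by 64-bit integer keys while records carry string UUIDs. The following is a minimal sketch of such a mapping, assuming keys are allocated sequentially and stored as decimal strings (JSON has no BigInt type). The helper name `createKeymap` and the on-disk shape are hypothetical and not taken from lib/memory-index-usearch.cjs.

```javascript
// Hypothetical keymap helper: the serialized shape is an assumption made for
// illustration, not the actual format of index.usearch.keymap.json.
function createKeymap(serialized = { next: '0', byKey: {} }) {
  let next = BigInt(serialized.next);
  const byKey = new Map(Object.entries(serialized.byKey)); // decimal key -> uuid
  const byId = new Map([...byKey].map(([key, id]) => [id, key])); // uuid -> decimal key

  return {
    // Return the existing BigInt key for a uuid, or allocate the next one.
    keyFor(id) {
      if (!byId.has(id)) {
        const key = (next++).toString();
        byId.set(id, key);
        byKey.set(key, id);
      }
      return BigInt(byId.get(id));
    },
    // Resolve a key returned by an index search back to the record uuid.
    idFor(key) {
      return byKey.get(key.toString());
    },
    // JSON.stringify cannot serialize BigInt, so keys are written as strings.
    toJSON() {
      return { next: next.toString(), byKey: Object.fromEntries(byKey) };
    },
  };
}

const keymap = createKeymap();
const key = keymap.keyFor('record-uuid-1');
console.log(keymap.idFor(key)); // record-uuid-1
```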

records.jsonl schema (one record per line):

```json
{ "id": "uuid",
  "type": "learning|handoff|critic|research",
  "phase": "M005-S007-T0002",
  "title": "...",
  "body": "...",
  "tags": ["feature-flags", "filament"],
  "provenance": "VERIFIED|CITED|ASSUMED|CACHED",
  "created_at": "2026-05-08T..." }
```

type and phase are exact-match filters at query time; tags is a set-overlap filter.
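
The filter semantics can be sketched as a predicate over parsed records. The function name and the filter-object shape are illustrative assumptions, and "set overlap" is read here as "at least one shared tag".

```javascript
// Hypothetical pre-filter over parsed records.jsonl entries.
function matchesFilters(record, { type, phase, tags } = {}) {
  if (type !== undefined && record.type !== type) return false; // exact match
  if (phase !== undefined && record.phase !== phase) return false; // exact match
  if (tags && tags.length > 0) {
    const recordTags = new Set(record.tags || []);
    // Set overlap: one shared tag is enough (assumed "any", not "all").
    if (!tags.some((tag) => recordTags.has(tag))) return false;
  }
  return true;
}

const records = [
  { id: 'a', type: 'learning', phase: 'M005-S007-T0002', tags: ['filament', 'auth'] },
  { id: 'b', type: 'research', phase: 'M005-S007-T0002', tags: ['filament'] },
  { id: 'c', type: 'learning', phase: 'M002-S001-T0001', tags: ['jwt'] },
];

const hits = records.filter((r) => matchesFilters(r, { type: 'learning', tags: ['filament'] }));
console.log(hits.map((r) => r.id)); // [ 'a' ]
```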

Activation

Disabled by default. Enable via .nubos-pilot/config.json:

```json
{
  "memory": {
    "enabled": true,
    "provider": "local",
    "model": "Xenova/bge-small-en-v1.5",
    "alpha": 0.6
  }
}
```

When memory.enabled = false, no embedding model is loaded, no index is built, and Pre-flight falls back to BM25-only. The optional dependencies (@huggingface/transformers, usearch) are not resolved at install time. Run npm install --include=optional to pull them when activating the layer.

Where it plugs in

1. Pre-flight hybrid score (Step 1 of the Nubosloop)

lib/knowledge-adapter.cjs::_localAdapter.match is async: it runs BM25/Jaccard first, then — when memory.enabled = true — queries memory.query(text, { type: 'learning', k: limit }) and merges via _hybridMerge(bm25Hits, vectorHits, alpha, byFp), keyed by learning fingerprint. Every hit carries a retrieval tag — bm25, vector, or hybrid. Threshold gating (swarm.research.threshold, default 0.9): lexical and hybrid hits were already gated by matchExistingLearning and pass through; a vector-only hit must clear the threshold on its own before it becomes a cache hit. If memory.enabled = true but the vector layer cannot be built or queried, match returns a lexical-only result with a non-null degraded marker rather than silently pretending the hybrid path ran. Provenance of cache hits remains [CACHED] per ADR-0011.
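
The merge step described above might look roughly like the sketch below. The hit shape (`{ fingerprint, score }`), the function name, and the sort order are assumptions; the real `_hybridMerge` takes different arguments and may differ in detail. The degraded-marker path is left out for brevity.

```javascript
// Sketch of a fingerprint-keyed merge with retrieval tags.
function hybridMergeSketch(bm25Hits, vectorHits, alpha, threshold) {
  const merged = new Map();

  // Lexical hits were already gated upstream, so they pass through unchanged.
  for (const hit of bm25Hits) {
    merged.set(hit.fingerprint, { ...hit, retrieval: 'bm25' });
  }

  for (const hit of vectorHits) {
    const lexical = merged.get(hit.fingerprint);
    if (lexical) {
      // Both signals present: blend them and retag as hybrid.
      const score = alpha * lexical.score + (1 - alpha) * hit.score;
      merged.set(hit.fingerprint, { ...lexical, score, retrieval: 'hybrid' });
    } else if (hit.score >= threshold) {
      // Vector-only hits must clear the threshold on their own.
      merged.set(hit.fingerprint, { ...hit, retrieval: 'vector' });
    }
  }

  return [...merged.values()].sort((a, b) => b.score - a.score);
}

const result = hybridMergeSketch(
  [{ fingerprint: 'fp1', score: 1 }, { fingerprint: 'fp2', score: 0.95 }],
  [{ fingerprint: 'fp1', score: 0.5 }, { fingerprint: 'fp3', score: 0.95 }, { fingerprint: 'fp4', score: 0.5 }],
  0.6,
  0.9,
);
console.log(result.map((h) => `${h.fingerprint}:${h.retrieval}`));
```

In this run `fp1` is seen by both retrievers and becomes a hybrid hit, `fp3` is a vector-only hit that clears the 0.9 threshold, and `fp4` is a vector-only hit that does not and is dropped.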

2. Researcher pre-recall

The np-researcher agent prompt instructs each spawn to query memory before issuing external research:

```bash
node .nubos-pilot/bin/np-tools.cjs memory-query --text "<ticket-summary>" --k 5 --type research
node .nubos-pilot/bin/np-tools.cjs memory-query --text "<ticket-summary>" --k 3 --type learning
```

Hits with [VERIFIED] / [CITED] provenance enter the merged RESEARCH.md as [CACHED:VERIFIED] / [CACHED:CITED], with no duplicate web round-trip. A memory-disabled response is silently ignored, because the pre-recall step is opt-in and additive.
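
The provenance rewrite can be sketched as a small mapping. The helper name is hypothetical, the sample hits are made up for illustration, and treating hits with any other provenance as non-reusable is an assumption, since only [VERIFIED] and [CITED] are named above.

```javascript
// Hypothetical helper: maps a memory hit's stored provenance to the tag used
// in RESEARCH.md, or null when the hit should not be reused (assumed policy).
const REUSABLE = new Set(['VERIFIED', 'CITED']);

function toCachedProvenance(hit) {
  return REUSABLE.has(hit.provenance) ? `[CACHED:${hit.provenance}]` : null;
}

// Sample hits, invented for this example.
const recalled = [
  { title: 'use jose for jwt', provenance: 'VERIFIED' },
  { title: 'policy auto-discovery', provenance: 'CITED' },
  { title: 'untested hunch', provenance: 'ASSUMED' },
];

const reusable = recalled
  .map((hit) => ({ title: hit.title, tag: toCachedProvenance(hit) }))
  .filter((entry) => entry.tag !== null);

console.log(reusable.map((entry) => `${entry.tag} ${entry.title}`));
```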

3. Planner context-injection

np-planner queries memory before the reality-check pass and surfaces matching [VERIFIED] decisions in the slice plan's <context> block as prior-art. Locked-decisions in M<NNN>-CONTEXT.md remain canonical — memory hits are advisory only.

4. Derived-cache indexing

The vector index is a derived cache of the learnings store, not an independently-written store. lib/knowledge-adapter.cjs::_ensureLearningsIndexed runs at Pre-flight (inside _localAdapter.match): it embeds and indexes every learning whose fingerprint is not yet present, keyed by id = fingerprint. Because the key is the fingerprint, a re-logged pattern maps to the same record, so there is no duplication and no re-embedding.

lib/learnings.cjs (learnings.json) is the single source of truth. The commit phase does not write to the vector index; if the index is lost or stale, it is rebuilt deterministically, lazily at the next Pre-flight or explicitly via np:memory-rebuild.
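
The idempotence argument can be illustrated with a sketch in which a plain `Map` stands in for the usearch index and `embed` is any function matching the provider interface. Both are stand-ins, and embedding the title together with the body is an assumption about what text gets indexed.

```javascript
// Sketch of fingerprint-keyed lazy indexing (not the real _ensureLearningsIndexed).
async function ensureLearningsIndexedSketch(learnings, index, embed) {
  // Collect learnings whose fingerprint is not indexed yet, once per fingerprint.
  const missing = new Map();
  for (const learning of learnings) {
    if (!index.has(learning.fingerprint)) missing.set(learning.fingerprint, learning);
  }
  if (missing.size === 0) return 0; // nothing new: no embedding work at all

  const batch = [...missing.values()];
  const vectors = await embed(batch.map((l) => `${l.title}\n${l.body}`));
  batch.forEach((l, i) => index.set(l.fingerprint, vectors[i])); // id = fingerprint
  return batch.length;
}

// Stand-in embedder: one-dimensional vectors, for demonstration only.
const fakeEmbed = async (texts) => texts.map((t) => Float32Array.of(t.length));
const index = new Map();
const learnings = [{ fingerprint: 'fp1', title: 'A', body: 'x' }];

ensureLearningsIndexedSketch(learnings, index, fakeEmbed)
  .then(() => ensureLearningsIndexedSketch(learnings, index, fakeEmbed))
  .then((added) => console.log(added)); // second pass indexes nothing new: 0
```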

Provider — local default

The default provider is @huggingface/transformers (the successor to @xenova/transformers, maintained by the original author at Hugging Face) running Xenova/bge-small-en-v1.5, or Xenova/bge-multilingual-base for non-English projects:

| Field | Value |
| --- | --- |
| Model size | ~70 MB downloaded on first run; cached under `~/.cache/nubos-pilot/models/` |
| Vector dim | 384 (bge-small) or 768 (bge-multilingual) |
| Provider interface | `provider.embed(texts: string[]) → Promise<Float32Array[]>` |
| First-run UX | one-time progress indicator while the model downloads; subsequent runs load the cached model |
| Runtime | dual CJS+ESM in v4; `lib/memory-provider-local.cjs` keeps `require()` |
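
A provider that satisfies the `embed` interface and defers loading until first use could be sketched as follows. This is not the contents of lib/memory-provider-local.cjs: the factory name and the injectable `loadPipeline` parameter are hypothetical, and mean pooling with normalisation is an assumed choice. The default loader relies on the transformers.js `pipeline('feature-extraction', model)` call.

```javascript
// Sketch of a lazy local provider. `loadPipeline` is injectable so the sketch
// can be exercised without downloading a model.
function createLocalProviderSketch(model, loadPipeline = defaultLoadPipeline) {
  let extractorPromise = null; // the model loads on first embed(), not at startup

  return {
    async embed(texts) {
      extractorPromise ??= loadPipeline(model);
      const extractor = await extractorPromise;
      // Pooling + normalisation yields one fixed-size vector per input text.
      const output = await extractor(texts, { pooling: 'mean', normalize: true });
      return output.tolist().map((row) => Float32Array.from(row));
    },
  };
}

async function defaultLoadPipeline(model) {
  // Resolved only when called, so a disabled layer never touches the package.
  const { pipeline } = require('@huggingface/transformers');
  return pipeline('feature-extraction', model);
}
```

Caching the promise rather than the resolved extractor means that concurrent first calls share one model load.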

A future Pro-tier remote provider (Jina Embeddings v3, multilingual + code-aware) will be specified in a successor ADR; it is out of scope here.

Index engine — `usearch`

usearch provides HNSW with cosine similarity:

Why prebuilt binaries, not WASM: both @huggingface/transformers (via onnxruntime-node) and usearch ship platform-specific prebuilt binaries via node-gyp-build / @img/sharp-*-style platform packages, so the consumer machine needs no node-gyp invocation, no Python, and no build chain. The install UX matches WASM, and the runtime is faster. The deprecated prebuild-install package is not in the dependency tree of these pinned versions.
Capacity: O(100K) records without strain. Per-project memory is expected to hold O(1K–10K) records over the project lifetime.
Persistence: index.usearch is written via atomicWriteFileSync; corruption recovery runs via np:memory-rebuild from records.jsonl.
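
A rebuild has to start by reading records.jsonl back into memory. The sketch below shows one way to do that. The function name is hypothetical, and both policies it applies (skipping unparseable lines, and letting the last record per id win) are assumptions about recovery behaviour rather than documented behaviour of np:memory-rebuild.

```javascript
// Hypothetical first step of a rebuild: parse records.jsonl text into records.
function parseRecordsJsonl(text) {
  const byId = new Map();
  let skipped = 0;
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue; // tolerate blank lines
    try {
      const record = JSON.parse(line);
      if (typeof record?.id !== 'string') {
        skipped += 1; // valid JSON, but not a record
        continue;
      }
      byId.set(record.id, record); // assumed policy: last record per id wins
    } catch {
      skipped += 1; // e.g. a line truncated by an interrupted append
    }
  }
  return { records: [...byId.values()], skipped };
}

const sample = '{"id":"a","title":"first"}\n{"id":"b","title":"second"}\n{"id":"c","ti';
const { records, skipped } = parseRecordsJsonl(sample);
console.log(records.length, skipped); // 2 1
```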

Memory is never committed

.nubos-pilot/memory/ is a runtime-state cache, not source-of-truth. The directory belongs in the consumer-project's .gitignore. Rebuild is deterministic from the source-of-truth files: lib/learnings.cjs learning-store, lib/handoff.cjs handoff-notes, milestone RESEARCH.md, and the Critic-report archive (per ADR-0010 §L5 <report_path>).

CLI

```bash
# Bulk-index from a JSON-array or JSONL file (initial seeding)
node .nubos-pilot/bin/np-tools.cjs memory-index \
  --records-file .nubos-pilot/memory/seed.jsonl

# Add a single record
node .nubos-pilot/bin/np-tools.cjs memory-add \
  --type learning --title "use jose for jwt" --body "..." \
  --tags filament,auth --provenance VERIFIED

# Query with optional filters
node .nubos-pilot/bin/np-tools.cjs memory-query \
  --text "filament resource policy" \
  --k 5 --type learning --tags filament

# Force full re-embed (after embedding-model change)
node .nubos-pilot/bin/np-tools.cjs memory-rebuild

# Print stats: count, dim, model, schema_version, created_at, rebuilt_at
node .nubos-pilot/bin/np-tools.cjs memory-stats
```

Each verb returns its primary result as JSON on stdout. memory.enabled = false makes every verb refuse with a memory-disabled envelope (S-2).
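
When calling the CLI from a Node script, building the argument list as an array avoids shell-quoting problems with free-text queries. The helper below is a hypothetical convenience, not part of np-tools, and uses only the `memory-query` flags shown in the examples above.

```javascript
// Hypothetical helper that assembles memory-query arguments for
// child_process.execFileSync('node', args, { encoding: 'utf8' }).
function buildMemoryQueryArgs({ text, k, type, tags }) {
  const args = ['.nubos-pilot/bin/np-tools.cjs', 'memory-query', '--text', text];
  if (k !== undefined) args.push('--k', String(k));
  if (type !== undefined) args.push('--type', type);
  if (tags && tags.length > 0) args.push('--tags', tags.join(','));
  return args;
}

const args = buildMemoryQueryArgs({
  text: 'filament resource policy',
  k: 5,
  type: 'learning',
  tags: ['filament'],
});
console.log(args);
```

Running `JSON.parse` on the captured stdout then yields the verb's result. The field layout of the memory-disabled envelope is not specified here, so callers should inspect the parsed object rather than assume particular fields.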

Configuration reference

```json
{
  "memory": {
    "enabled": false,
    "provider": "local",
    "model": "Xenova/bge-small-en-v1.5",
    "alpha": 0.6
  }
}
```

| Key | Default | Purpose |
| --- | --- | --- |
| `memory.enabled` | `false` | gates the entire layer; refuses every verb with `memory-disabled` when off |
| `memory.provider` | `"local"` | provider implementation; only `"local"` ships today |
| `memory.model` | `"Xenova/bge-small-en-v1.5"` | embedding model; mismatch with manifest triggers `memory-rebuild-required` |
| `memory.alpha` | `0.6` | hybrid-score weight: `α·BM25 + (1−α)·vector` |
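
How the defaults and the manifest check interact can be sketched as follows. The function name, the `'ready'` status, and the assumption that manifest.json exposes the model under a `model` key are illustrative; only `memory-disabled` and `memory-rebuild-required` come from the reference above.

```javascript
// Sketch of config resolution; DEFAULTS mirrors the documented defaults.
const DEFAULTS = {
  enabled: false,
  provider: 'local',
  model: 'Xenova/bge-small-en-v1.5',
  alpha: 0.6,
};

function resolveMemoryStatus(config, manifest) {
  const memory = { ...DEFAULTS, ...(config.memory || {}) };
  if (!memory.enabled) return { status: 'memory-disabled', memory };
  if (manifest && manifest.model !== memory.model) {
    // Vectors from different models are not comparable, so the index must be rebuilt.
    return { status: 'memory-rebuild-required', memory };
  }
  return { status: 'ready', memory }; // 'ready' is a label invented for this sketch
}

console.log(resolveMemoryStatus({}, null).status); // memory-disabled
console.log(
  resolveMemoryStatus({ memory: { enabled: true } }, { model: 'Xenova/bge-multilingual-base' }).status,
); // memory-rebuild-required
```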

Related

Nubosloop: Pre-flight (Step 1) hybrid score.
Researcher-Schwarm: pre-recall before external search.
ADR-0014: full architectural decision record.
ADR-0006: the precedent for optionalDependencies amendments to ADR-0002.
