ADR-0013: Learnings-Store Schema Evolution Policy

Status: Accepted
Date: 2026-05-03
Supersedes: None
Relates-to: ADR-0010, ADR-0011

Context and Problem Statement

The learnings store at .nubos-pilot/knowledge/learnings.json is the persistent surface ADR-0010's pre-flight cache lookup hits. It accumulates value over project lifetime: every successful task commit appends or merges a learning, and a v1-trained store is the user's only protection against re-paying the Researcher-Schwarm cost.

A STORE_VERSION constant exists (lib/learnings.cjs), but the policy around bumping it was undefined. Without a policy:

Silent wipe risk: under an earlier draft's behaviour, a future v2 reader opening a v1 store would silently treat the version mismatch as an "empty store" and overwrite the user's accumulated patterns.
No migration contract: neither the developer bumping the version nor the user upgrading nubos-pilot has a written guarantee about what happens to existing data.
No per-version recovery procedure: the user has no documented path to recover from a corrupt or out-of-version store.

The Rule

The learnings store is forward-incompatible by default. Reading a store whose version is not the running release's STORE_VERSION:

Throws learnings-store-version-mismatch (NubosPilotError) with details: { expected, got, hint }. The loader never silently wipes the store.
Before the throw, the loader runs the MIGRATORS registry from version got upward to expected. If a complete migration chain exists, the migrated store is returned. If any link is missing, the throw fires.
The user can recover by either upgrading nubos-pilot to a release that ships the missing migrator, or by backing up learnings.json and removing it (clean slate).
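
The migrate-then-throw order can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in lib/learnings.cjs: the function names migrate and mismatch, the STORE_VERSION value of 3, the two placeholder migrators, and the hint text are all hypothetical, and a plain Error with code and details properties stands in for NubosPilotError.

```javascript
'use strict';

// Hypothetical stand-ins; the real values live in lib/learnings.cjs.
const STORE_VERSION = 3;
const MIGRATORS = {
  // Keyed by the outgoing version: MIGRATORS[n] turns a vn store into v(n+1).
  1: (store) => ({ ...store, version: 2 }),
  2: (store) => ({ ...store, version: 3 }),
};

// Plain Error standing in for NubosPilotError (assumption).
function mismatch(expected, got) {
  const err = new Error('learnings-store-version-mismatch');
  err.code = 'learnings-store-version-mismatch';
  err.details = {
    expected,
    got,
    hint: 'upgrade nubos-pilot, or back up learnings.json and remove it',
  };
  return err;
}

// Walk the chain from store.version up to `expected`.
// A missing link (or a store newer than `expected`) throws instead of wiping.
function migrate(store, expected = STORE_VERSION, migrators = MIGRATORS) {
  const got = store.version;
  let current = store;
  while (current.version !== expected) {
    const step = migrators[current.version];
    if (typeof step !== 'function') throw mismatch(expected, got);
    const next = step(current);
    // Guard against a migrator that does not advance exactly one version.
    if (next.version !== current.version + 1) throw mismatch(expected, got);
    current = next;
  }
  return current;
}
```

In this sketch a store written by a newer release also ends in the throw, because no migrator runs downward.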

Bumping `STORE_VERSION`

Each version bump ships:

A new entry in MIGRATORS keyed by the outgoing version (e.g. MIGRATORS[1] = function(v1Store) { return v2Store; }).
A test case in lib/learnings.test.cjs that round-trips a frozen v1 fixture through the migrator and asserts every legacy field maps to the v2 shape.
A note in this ADR appended under "## Version History" (no rewrite; append-only).
A release note in the changelog explaining what changed and what users should do (typically nothing; migration is automatic).
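
As an illustration of the first two items, the sketch below pairs a migrator with a round-trip check over a frozen fixture. Everything specific in it is hypothetical: the v2 change (renaming occurrence to hits) is invented purely for the example, the fixture values and ID formats are made up, and unmappedFields is an illustrative helper rather than something taken from lib/learnings.test.cjs.

```javascript
'use strict';

// Hypothetical v2 change, for illustration only: `occurrence` becomes `hits`.
const RENAMED = { occurrence: 'hits' };

const MIGRATORS = {
  // Keyed by the outgoing version (v1 -> v2).
  1: (v1Store) => ({
    version: 2,
    learnings: v1Store.learnings.map(({ occurrence, ...rest }) => ({
      ...rest,
      hits: occurrence,
    })),
  }),
};

// Frozen fixture: the migrator must not mutate its input.
const V1_FIXTURE = Object.freeze({
  version: 1,
  learnings: Object.freeze([
    Object.freeze({
      fingerprint: '0123456789abcdef',
      pattern: 'Prefer atomic writes for the store',
      outcome: 'success',
      occurrence: 3,
      first_seen: '2026-05-03T09:00:00.000Z',
      last_seen: '2026-05-03T12:00:00.000Z',
      task_ids: ['T-1', 'T-2'], // made-up ID format
      milestone_ids: ['M-1'], // made-up ID format
    }),
  ]),
});

// Returns the legacy fields whose values did not arrive in the v2 shape.
function unmappedFields(v1Store, v2Store) {
  const missing = [];
  v1Store.learnings.forEach((legacy, i) => {
    const migrated = v2Store.learnings[i] || {};
    for (const [field, value] of Object.entries(legacy)) {
      const target = RENAMED[field] || field;
      if (JSON.stringify(migrated[target]) !== JSON.stringify(value)) {
        missing.push(`learnings[${i}].${field}`);
      }
    }
  });
  return missing;
}
```

A test case would then assert that unmappedFields(V1_FIXTURE, MIGRATORS[1](V1_FIXTURE)) is empty and that the migrated store reports version 2.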

Adding a field forward-compatibly

When the change is additive (e.g. the tokens cache field added in the second review), do not bump STORE_VERSION:

Add the field with sensible defaults at write time (logLearning lazy-fills it).
Make the reader tolerate its absence (Array.isArray(l.tokens) ? l.tokens : _tokenize(l.pattern)).
Document the additive field in this ADR's "## Field Index" table.

This is the cheapest evolution path and should be preferred whenever it preserves the read invariant.
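
A minimal sketch of the additive pattern, assuming a simplified tokenizer: tokenize here is a stand-in for _tokenize (whose real normalization rules are not specified in this ADR), and tokensOf and withTokens are hypothetical helper names for the read-side and write-side halves.

```javascript
'use strict';

// Stand-in for _tokenize: lowercase, split on non-alphanumerics,
// de-duplicate, sort. The real normalization may differ.
function tokenize(pattern) {
  const words = String(pattern).toLowerCase().match(/[a-z0-9]+/g) || [];
  return [...new Set(words)].sort();
}

// Read side: records written before `tokens` existed are still valid.
function tokensOf(learning) {
  return Array.isArray(learning.tokens) ? learning.tokens : tokenize(learning.pattern);
}

// Write side: lazy-fill the field so future reads skip the recomputation.
function withTokens(learning) {
  return Array.isArray(learning.tokens)
    ? learning
    : { ...learning, tokens: tokenize(learning.pattern) };
}
```

Because the reader derives the value when the field is absent, old and new records pass through the same code path and no version bump is needed.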

Decision Drivers

Preserve user data: accumulated learnings are the project's compound interest. A bug that wipes them is a permanent loss.
Loud failure: unknown versions throw with hints, not silent recovery. The user finds out immediately.
Migration as code: every breaking change ships its migrator with the release. Deferring migration to "manual cleanup" is a regression.
Additive changes stay free: skipping the version bump for additive fields keeps small improvements cheap.

Considered Options

Silent wipe on mismatch: original behaviour. Rejected (FAIL-4 second review).
Refuse to start until manual remediation: Rejected. Too coarse for additive changes.
Migrator-registry + hard-throw on missing migrator + additive-changes-no-bump policy: chosen.

Decision Outcome

Chosen as documented above. Implementation in lib/learnings.cjs (_readStore, _migrate, MIGRATORS).

Fresh-repo behaviour

When learnings.json does not exist, _readStore returns the empty current-version store ({ version: STORE_VERSION, learnings: [] }) without throwing. The next logLearning creates the file via atomicWriteFileSync. This is the expected onboarding path; every project starts cold.
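
The cold-start path might look like the sketch below. It is an assumption-laden outline rather than the actual _readStore: readStore is a hypothetical name, the version check and field validation are omitted for brevity, and a plain Error with a code property stands in for NubosPilotError.

```javascript
'use strict';
const fs = require('fs');

const STORE_VERSION = 1;

// A missing file is the normal onboarding case, not an error.
function readStore(filePath) {
  let raw;
  try {
    raw = fs.readFileSync(filePath, 'utf8');
  } catch (err) {
    if (err.code === 'ENOENT') {
      return { version: STORE_VERSION, learnings: [] };
    }
    throw err; // permission errors etc. stay loud
  }
  try {
    return JSON.parse(raw);
  } catch (cause) {
    // An existing but unparseable file is a fault, unlike a missing one.
    const err = new Error('learnings-store-corrupt');
    err.code = 'learnings-store-corrupt';
    err.cause = cause;
    throw err;
  }
}
```

Only the missing-file case returns the empty store; an existing file that cannot be parsed is never treated as empty.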

Outcome semantics

learning.outcome is last-known, not append-only. When a pattern is re-logged, the latest outcome overwrites the prior value. This is documented in lib/learnings.cjs::logLearning JSDoc and surfaced here so callers know not to treat the field as a journal. A re-log still leaves a trace even when the outcome flips: last_seen advances and occurrence increments, so those two fields ARE append-stamped (updated on every re-log).
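
The difference between the overwritten field and the stamped fields can be sketched as a pure upsert over an in-memory array. upsertLearning and the shape of its entry argument (fingerprint, pattern, outcome, task_id, milestone_id) are hypothetical; the real logLearning also handles locking, validation, and persistence, none of which is shown.

```javascript
'use strict';

// Pure upsert: returns a new array, leaves the input untouched.
function upsertLearning(learnings, entry, now = new Date().toISOString()) {
  const union = (list, id) => (id && !list.includes(id) ? [...list, id] : list);
  const existing = learnings.find((l) => l.fingerprint === entry.fingerprint);

  if (!existing) {
    const created = {
      fingerprint: entry.fingerprint,
      pattern: entry.pattern,
      outcome: entry.outcome,
      occurrence: 1,
      first_seen: now,
      last_seen: now,
      task_ids: union([], entry.task_id),
      milestone_ids: union([], entry.milestone_id),
    };
    return [...learnings, created];
  }

  const updated = {
    ...existing,
    outcome: entry.outcome, // last-known: the prior value is dropped
    occurrence: existing.occurrence + 1, // stamped on every re-log
    last_seen: now, // stamped on every re-log
    task_ids: union(existing.task_ids, entry.task_id),
    milestone_ids: union(existing.milestone_ids, entry.milestone_id),
  };
  return learnings.map((l) => (l === existing ? updated : l));
}
```

After two logs of the same fingerprint with different outcomes, only the second outcome remains, while occurrence reads 2 and first_seen still carries the original timestamp.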

Field Index (v1)

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `version` | integer | yes | matches `STORE_VERSION` |
| `learnings[]` | array | yes | learning records below |
| `learnings[].fingerprint` | string (16-hex) | yes | sha1 of normalized tokens |
| `learnings[].pattern` | string (≤ `MAX_PATTERN_BYTES`) | yes | original prescriptive sentence |
| `learnings[].tokens` | array of strings | no (additive) | normalized + sorted unique tokens; lazy-filled |
| `learnings[].outcome` | string (≤ `MAX_OUTCOME_BYTES`) | yes | last-known outcome |
| `learnings[].occurrence` | integer | yes | counter incremented on every re-log |
| `learnings[].first_seen` | ISO-8601 string | yes | |
| `learnings[].last_seen` | ISO-8601 string | yes | |
| `learnings[].task_ids[]` | array of TASK_ID strings | yes | union of every task that triggered the log |
| `learnings[].milestone_ids[]` | array of MILESTONE_ID strings | yes | union of every milestone |

Bounded Store + Eviction

The store is capped to prevent unbounded growth from auto_log_learning:

MAX_LEARNINGS = 1000: hard limit on entry count.
MAX_STORE_BYTES = 8 * 1024 * 1024 (8 MiB): hard limit on serialized size.

When logLearning would push the store past either cap, eviction runs inside the same lock window as the write:

Sort candidates by (occurrence asc, last_seen asc) so the least-used + oldest entries leave first.
Trim to MAX_LEARNINGS entries.
If the serialized size still exceeds MAX_STORE_BYTES, drop one entry at a time (in eviction order) until it fits.

The newly-added entry is always retained because it has the most-recent last_seen. High-occurrence entries are protected; they evict only after every single-use entry is gone. The eviction is silent (no error, no warning) because hitting the cap is expected steady-state behaviour, not a fault.

learning-list (CLI) returns entries sorted by (occurrence desc, last_seen desc), the inverse of the eviction order, so the most-trusted patterns surface first.
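
The two orderings and the two-stage trim can be sketched as below. This is an illustration under assumptions: evict, byEvictionOrder, byListOrder, and serializedBytes are hypothetical names, size is measured on compact JSON.stringify output (the real store may serialize differently), and timestamps are assumed to be same-format UTC ISO-8601 strings so that string comparison matches chronological order. The sketch applies the sort order literally and does not model any separate pinning of the newly added entry.

```javascript
'use strict';

const MAX_LEARNINGS = 1000;
const MAX_STORE_BYTES = 8 * 1024 * 1024;

// Eviction order: least-used first, then oldest last_seen.
function byEvictionOrder(a, b) {
  if (a.occurrence !== b.occurrence) return a.occurrence - b.occurrence;
  if (a.last_seen < b.last_seen) return -1;
  if (a.last_seen > b.last_seen) return 1;
  return 0;
}

// Listing order is the exact inverse: most-used, most-recent first.
function byListOrder(a, b) {
  return byEvictionOrder(b, a);
}

function serializedBytes(store) {
  return Buffer.byteLength(JSON.stringify(store), 'utf8');
}

function evict(store, maxLearnings = MAX_LEARNINGS, maxBytes = MAX_STORE_BYTES) {
  // Front of the queue leaves first.
  const queue = [...store.learnings].sort(byEvictionOrder);

  // Stage 1: trim to the entry cap.
  let kept = queue.slice(Math.max(0, queue.length - maxLearnings));

  // Stage 2: drop one entry at a time until the serialized store fits.
  while (kept.length > 0 && serializedBytes({ ...store, learnings: kept }) > maxBytes) {
    kept = kept.slice(1);
  }
  return { ...store, learnings: kept };
}
```

Calling evict with small caps, for example evict(store, 2), makes the ordering easy to inspect in a test without building a thousand entries.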

Locking Semantics

logLearning and clearLearnings hold withFileLock across the whole read-modify-write, so each update is atomic with respect to other writers.
matchExistingLearning and listLearnings read without holding a lock. Safe on POSIX (rename-into-place is atomic). A transient race on filesystems where rename is not atomic (NFS, some Windows shares) surfaces as learnings-store-corrupt rather than silent corruption. That is the intended behaviour: fail loud, never silently corrupt.
_writeStore overwrites destructively with no read-merge step. It is intentionally low-level and only used by clearLearnings + tests; new callers wanting upsert semantics MUST use logLearning.
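
The reason lock-free readers are safe on POSIX is the write-to-temp-then-rename pattern. The sketch below shows that pattern in isolation; atomicWrite is a hypothetical stand-in, the temp-file naming is invented, and the real atomicWriteFileSync may differ (for example by calling fsync before the rename).

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Write the full payload to a sibling temp file, then rename it into place.
// A reader sees either the old file or the new one, never a partial write,
// on filesystems where rename is atomic.
function atomicWrite(filePath, data) {
  const tmp = path.join(
    path.dirname(filePath),
    `.${path.basename(filePath)}.${process.pid}.tmp`
  );
  fs.writeFileSync(tmp, data, 'utf8');
  try {
    fs.renameSync(tmp, filePath);
  } catch (err) {
    fs.rmSync(tmp, { force: true }); // do not leave the temp file behind
    throw err;
  }
}
```

The temp file sits in the same directory as the target so that the rename stays within one filesystem.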

Version History

v1 (2026-05-03) — initial schema. Additive: tokens field (introduced same day, no bump).

More Information

Library: lib/learnings.cjs
Linter: none (the schema is enforced at parse time by _readStore).
Tests: lib/learnings.test.cjs (LRN-12..14 + LRN-VAL-1..5 cover version-mismatch, corruption, and field validation).
