Researcher-Schwarm

np:research-phase and np:plan-phase --research spawn k independent np-researcher agents in parallel (swarm.research.k, default 3), lint each output against the researcher-output schema, run a deterministic Mehrheit/Union/Schnittmenge (majority/union/intersection) merge, and then spawn np-researcher-reconciler to weigh the per-spawn reasoning traces and write the consumed M<NNN>-RESEARCH.md. A disagreement hard-gate keyed on agreement_score and contested_count blocks workflow completion via askuser when the swarm has not converged.

ADR-0011 ratifies the deterministic merge; ADR-0018 adds the per-spawn schema, the reconciler stage, the disagreement gate, and the Reasoning-Trace classification on top. The merge engine lives in lib/researcher-swarm.cjs; the reconciler helpers live in lib/researcher-reconciler.cjs.

Why a swarm

A single research pass has two failure modes:

Hallucination with confidence. The agent commits to a wrong library version, an outdated pattern, or a fictional method, and presents it with high confidence because nothing contradicts it.
Group-think on pre-existing knowledge. The agent retrieves what it already "knows" and skips searching when the topic feels familiar.

Both fail silently. A k=3 swarm with deterministic merge surfaces the disagreement as a FLAGGED decision (no Mehrheit), and the plan-checker reads the flag and routes verification accordingly.

The 7 steps (ADR-0011 deterministic core + ADR-0018 schema/reconciler/gate)

Steps 1-3 are the original ADR-0011 stack. Step 4 (per-spawn lint), Step 5 (reconciler), Step 5.6 (reconciler-output lint), and Step 5.7 (disagreement gate) come from ADR-0018.

Step 1   — Pre-flight cache lookup (bypass swarm on high-similarity hit)
Step 2   — Spawn k parallel researchers (each Writes spawn-<i>.md, schema-bound)
Step 3   — Deterministic mergeConsensus → research/merge.md (proposal)
Step 4   — Per-spawn output-lint (--enforce) ← ADR-0018 / ADR-0017 hard-gate
Step 5   — Reconciler spawn (sees all k outputs + merge + Reasoning traces)
Step 5.6 — Reconciler-output lint (--enforce) ← ADR-0018 / ADR-0017 hard-gate
Step 5.7 — Disagreement hard-gate (askuser if agreement_score < 0.5 OR contested > 2)

The 4 deterministic-merge steps

Step 1 — Pre-flight cache

lib/knowledge-adapter.cjs::match against the configured store. Defaults:

Knob	Default	Meaning
`swarm.research.threshold`	`0.9`	Jaccard similarity of token sets (combined score when Vector-Memory is on)
`swarm.research.minOccurrence`	`3`	minimum occurrence count to count as a hit
`swarm.knowledge_adapter`	`"local"`	`local` = BM25 over `.nubos-pilot/knowledge/learnings.json`; the only adapter shipped
`memory.enabled`	`false`	when `true`, the local adapter additionally queries `.nubos-pilot/memory/` and merges via `α·BM25 + (1−α)·vector` (default `α = 0.6`) — see Vector-Memory

A hit short-circuits the swarm: the cached pattern is rendered as RESEARCH.md with provenance [CACHED] and a <consensus_meta> block citing adapter, fingerprint, and occurrence.
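The two knobs can be pictured as a pair of guards on each cached entry. The following is a minimal sketch only, assuming the Jaccard reading of swarm.research.threshold and a hypothetical entry shape of `{ pattern, occurrence }`; the function names are illustrative and are not the API of lib/knowledge-adapter.cjs.

```javascript
// Hypothetical sketch; the real lookup lives in lib/knowledge-adapter.cjs and may differ.

// Lowercase the text and split it into a set of alphanumeric tokens.
function tokenSet(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 0 for two empty sets.
function jaccard(a, b) {
  const A = tokenSet(a);
  const B = tokenSet(b);
  if (A.size === 0 && B.size === 0) return 0;
  let shared = 0;
  for (const token of A) if (B.has(token)) shared += 1;
  return shared / (A.size + B.size - shared);
}

// Return the best entry that clears both knobs, or null (meaning: run the swarm).
// Assumption: the threshold comparison is inclusive.
function preflightMatch(query, entries, { threshold = 0.9, minOccurrence = 3 } = {}) {
  let best = null;
  for (const entry of entries) {
    const score = jaccard(query, entry.pattern);
    if (score >= threshold && entry.occurrence >= minOccurrence) {
      if (best === null || score > best.score) best = { entry, score };
    }
  }
  return best;
}
```

Under these assumptions, a near-identical query against an entry seen only once still misses, because both conditions must hold before the swarm is bypassed.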

Vector pre-recall (agent-side). Each spawned researcher additionally queries np:memory-query (when memory.enabled = true) before issuing external research; matching [VERIFIED] / [CITED] decisions enter the spawn output as [CACHED:VERIFIED] / [CACHED:CITED] without a duplicate web round-trip. See Vector-Memory § Researcher pre-recall.

Step 2 — Parallel spawn

k researchers spawn in parallel. Each receives:

The same <task_query>, word-for-word identical for every spawn. This is the load-bearing property: an identical question across the swarm is what makes the merge a CONSENSUS rather than a divide-and-conquer.
A unique seed_delta from lib/researcher-swarm.cjs::SEED_DELTAS, a perspectival nudge rather than a thematic preference. Nudges vary HOW the spawn investigates (methodology, evidence weighting, contrarian stance, breadth-vs-depth, gap surfacing), never WHAT the answer should prefer.

Why perspectival, not thematic. A thematic seed_delta like "prefer libraries that ship native TypeScript types" makes the spawn rank its answer along that axis. Three thematic deltas produce three rank orders over different axes; the Schnittmenge (intersection) over patterns collapses, and the consensus becomes a fiction. The Spawn Contract in ADR-0011 calls this out as the canonical bypass class: bin/researcher-merge.cjs reports agreement_score near 0 and an empty pattern intersection when it happens.

Litmus test for adding a new entry to SEED_DELTAS: rephrase it as "what does this researcher optimise FOR in their final answer?". If the answer names a concrete solution attribute (TypeScript, smallest deps, latest version), it is thematic and belongs in the planner or architect, not the swarm.

No researcher knows it is one of k. Each believes itself the sole spawn, and that belief is what prevents group-think.
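The spawn contract above can be sketched as a small input builder. This is an illustration only: the nudge texts, the `buildSpawnInputs` name, the 1-based file numbering, and the input shape are all hypothetical, and the real list is SEED_DELTAS in lib/researcher-swarm.cjs.

```javascript
// Hypothetical perspectival nudges; they vary HOW a spawn investigates, not WHAT it prefers.
const EXAMPLE_SEED_DELTAS = [
  'Weight primary sources over secondary summaries.',
  'Actively look for evidence against your first conclusion.',
  'Survey broadly before going deep on any single option.',
];

// Build one input per spawn: identical task query, unique seed delta.
function buildSpawnInputs(taskQuery, k, seedDeltas = EXAMPLE_SEED_DELTAS) {
  if (k > seedDeltas.length) {
    throw new RangeError(`need ${k} distinct seed deltas, got ${seedDeltas.length}`);
  }
  return Array.from({ length: k }, (_, i) => ({
    outputFile: `spawn-${i + 1}.md`, // assumed numbering for spawn-<i>.md
    taskQuery,                       // word-for-word identical for every spawn
    seedDelta: seedDeltas[i],        // unique per spawn
    // Deliberately no k and no sibling information: each spawn believes it is alone.
  }));
}
```

The omission of `k` from each input is the point of the sketch: nothing a spawn receives reveals that siblings exist.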

Step 3 — Merge

Each spawn produces a structured object:

json

{
  "decisions":      [{ "claim": "...", "confidence": "HIGH|MEDIUM|LOW", "provenance": "[VERIFIED]|[CITED:url]|[ASSUMED]" }],
  "risks":          [{ "description": "...", "severity": "HIGH|MEDIUM|LOW" }],
  "patterns":       [{ "name": "...", "description": "..." }],
  "open_questions": ["..." | { "question": "...", "blocking_for": "..." }],
  "sources":        [{ "url": "...", "credibility": "HIGH|MEDIUM|LOW", "note": "..." }]
}

mergeConsensus(outputs) runs four rules:

Field	Rule	Why
`decisions`	Mehrheit — `⌈k/2⌉` agreements ⇒ consensus, else `FLAGGED`	Decisions are commitments. Disagreement is a plan-checker signal, not an average to fudge.
`risks`	Union, dedupe by semantic fingerprint, severity = max	Risks are fail-open. Losing a valid risk by majority-vote is a regression.
`patterns`	Schnittmenge ≥ 2 spawns; solo → demoted `[ASSUMED]`	Patterns are fail-closed. A pattern only one spawn saw invites hallucination.
`open_questions` / `sources`	Union with dedupe; credibility = max	Questions and sources are inclusive. Max credibility wins.
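The first three rules can be sketched in a few lines. This is not the code in lib/researcher-swarm.cjs: the string-normalizing `fingerprint` is a stand-in for the semantic fingerprint, the output field names other than agreement_count are illustrative, and the open_questions / sources union is omitted for brevity.

```javascript
// Hypothetical sketch of the merge rules; lib/researcher-swarm.cjs is authoritative.
const RANK = { LOW: 1, MEDIUM: 2, HIGH: 3 };

// Stand-in fingerprint: lowercase and collapse every non-alphanumeric run to one space.
const fingerprint = (text) => text.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();

// Group items from all spawns by fingerprint and record which spawns produced each one.
function groupByFingerprint(outputs, field, keyOf) {
  const groups = new Map();
  outputs.forEach((output, spawn) => {
    for (const item of output[field] ?? []) {
      const key = fingerprint(keyOf(item));
      if (!groups.has(key)) groups.set(key, { items: [], spawns: new Set() });
      const group = groups.get(key);
      group.items.push(item);
      group.spawns.add(spawn);
    }
  });
  return [...groups.values()];
}

// Highest HIGH/MEDIUM/LOW level among the grouped items.
const maxLevel = (items, prop) =>
  items.map((item) => item[prop]).reduce((a, b) => (RANK[b] > RANK[a] ? b : a));

function mergeConsensus(outputs) {
  const quorum = Math.ceil(outputs.length / 2); // Mehrheit: ceil(k/2) agreeing spawns
  return {
    // Decisions: quorum reached => consensus, otherwise FLAGGED.
    decisions: groupByFingerprint(outputs, 'decisions', (d) => d.claim).map((g) => ({
      claim: g.items[0].claim,
      agreement_count: g.spawns.size,
      status: g.spawns.size >= quorum ? 'CONSENSUS' : 'FLAGGED',
    })),
    // Risks: union, deduplicated by fingerprint, severity = max.
    risks: groupByFingerprint(outputs, 'risks', (r) => r.description).map((g) => ({
      description: g.items[0].description,
      severity: maxLevel(g.items, 'severity'),
    })),
    // Patterns: kept when at least 2 spawns saw them, otherwise demoted to [ASSUMED].
    patterns: groupByFingerprint(outputs, 'patterns', (p) => p.name).map((g) => ({
      name: g.items[0].name,
      demoted: g.spawns.size < 2,
    })),
  };
}
```

The asymmetry is deliberate in the rules themselves: a risk survives on a single sighting, while a pattern needs a second witness.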

Step 4 — Render

lib/researcher-swarm.cjs::renderConsensusToMarkdown writes the merged output to <milestone_dir>/<milestone>-RESEARCH.md with a <consensus_meta> block:

markdown

<consensus_meta>
k: 3
agreement_score: 0.875
flagged_count: 1
</consensus_meta>

np-plan-checker reads <consensus_meta> to weight downstream verdicts. A high flagged_count triggers extra plan-checker scrutiny.
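A consumer can recover the block with a small parser. The sketch below assumes the flat `key: value` layout shown above; `parseConsensusMeta` is a hypothetical helper, not a function exported by the project.

```javascript
// Hypothetical helper: extract the <consensus_meta> block from a RESEARCH.md string.
function parseConsensusMeta(markdown) {
  const match = markdown.match(/<consensus_meta>([\s\S]*?)<\/consensus_meta>/);
  if (!match) return null; // no merge metadata present
  const meta = {};
  for (const line of match[1].split('\n')) {
    const separator = line.indexOf(':');
    if (separator === -1) continue; // skip blank or malformed lines
    const key = line.slice(0, separator).trim();
    const value = line.slice(separator + 1).trim();
    const number = Number(value);
    // Keep numeric values as numbers, everything else as the raw string.
    meta[key] = value !== '' && Number.isFinite(number) ? number : value;
  }
  return meta;
}
```

Returning `null` when the block is absent lets a caller tell "no merge metadata" apart from an empty block.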

Worked example

Three spawns research "JWT verification stack":

Spawn	Decisions	Risks	Patterns
A	use `jose@6.0.10`	"rotation breaks sessions" (HIGH)	Repository pattern
B	use `jose@6.0.10`	"rate-limit token endpoint" (MEDIUM)	Repository pattern
C	use `jsonwebtoken@9`	"rotation breaks sessions" (MEDIUM)	Service-locator pattern

Merge produces:

Decisions: use jose@6.0.10 (Mehrheit 2/3, accepted); use jsonwebtoken@9 (FLAGGED, solo).
Risks: Union: rotation breaks sessions (HIGH, seen by 2), rate-limit token endpoint (MEDIUM, seen by 1).
Patterns: Repository pattern (Schnittmenge 2/3, accepted); Service-locator (demoted, [ASSUMED]).

The plan-checker sees one accepted decision + one flagged candidate; it can either ask the user or run a follow-up research round.

k-of-1 / k-of-5

swarm.research.k = 1 is supported; it degrades to legacy single-spawn behaviour with no merge metadata.

swarm.research.k > 5 is rejected by lib/researcher-swarm.cjs (MAX_K = 5). Beyond five spawns, the marginal information gain does not justify the token cost.
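The two bounds can be expressed as one guard. This is a sketch under assumptions: how lib/researcher-swarm.cjs actually signals the rejection is not stated here, so throwing a RangeError and the `resolveK` name are illustrative choices.

```javascript
const MAX_K = 5; // upper bound named above

// Hypothetical guard: validate k and report whether the merge stage applies.
function resolveK(value = 3) {
  if (!Number.isInteger(value) || value < 1) {
    throw new RangeError('swarm.research.k must be a positive integer');
  }
  if (value > MAX_K) {
    throw new RangeError(`swarm.research.k must be <= ${MAX_K}`);
  }
  // k = 1 degrades to single-spawn behaviour with no merge metadata.
  return { k: value, mergeEnabled: value > 1 };
}
```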

Cache adapter

swarm.knowledge_adapter:

"local" (default) — lib/knowledge-adapter.cjs routes to lib/learnings.cjs. Storage: .nubos-pilot/knowledge/learnings.json. Similarity: Jaccard over token sets. This is the only adapter shipped.

lib/knowledge-adapter.cjs keeps the adapter seam so additional adapters can be added without touching the swarm logic. Unsupported values fall back to "local" silently.

Agent-native CLI

The cache is also queryable / writable from any runtime that can shell out:

bash

# Match: "have we seen this before?"
node .nubos-pilot/bin/np-tools.cjs learning-match --query "use jose for jwt" \
  --threshold 0.9 --min-occurrence 3

# List: top entries sorted by occurrence (descending)
node .nubos-pilot/bin/np-tools.cjs learning-list --limit 20

# Log: persist a verified pattern (auto-runs on commit when auto_log_learning=true)
node .nubos-pilot/bin/np-tools.cjs learning-log \
  --pattern "use jose for jwt verification" --outcome verified \
  --task-id M001-S001-T0001 --milestone-id M001

The learning-log payload carries fingerprint, was_new, and occurrence, so the agent can confirm the write outcome without re-reading the store. See CLI Commands.

Configuration

.nubos-pilot/config.json:

json

{
  "swarm": {
    "research":         { "k": 3, "threshold": 0.9, "minOccurrence": 3 },
    "knowledge_adapter": "local"
  }
}

CLI overrides per invocation are not currently supported — config is read at spawn time.
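Resolving this file against the documented defaults might look like the sketch below. It takes an already-parsed config object, and it assumes that missing keys fall back to the defaults listed under Step 1; `resolveSwarmConfig` is a hypothetical name, not a project function.

```javascript
// Documented defaults; the resolver itself is a hypothetical illustration.
const DEFAULTS = {
  research: { k: 3, threshold: 0.9, minOccurrence: 3 },
  knowledge_adapter: 'local',
};
const SUPPORTED_ADAPTERS = new Set(['local']); // the only adapter shipped

function resolveSwarmConfig(config = {}) {
  const swarm = config.swarm ?? {};
  return {
    // Per-key override: anything not set in config.json keeps its default.
    research: { ...DEFAULTS.research, ...(swarm.research ?? {}) },
    // Unsupported adapter values fall back to "local" silently.
    knowledge_adapter: SUPPORTED_ADAPTERS.has(swarm.knowledge_adapter)
      ? swarm.knowledge_adapter
      : DEFAULTS.knowledge_adapter,
  };
}
```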

Reconciler stage (ADR-0018)

After the deterministic merge produces a research/merge.md proposal, np-researcher-reconciler (tier=sonnet, READ-ONLY on inputs) gets:

All k per-spawn outputs (verbatim).
The merge.md proposal.
The structured merged JSON from node .nubos-pilot/bin/np-tools.cjs researcher-reconcile prepare <N> (so it can read from_spawns, agreement_count, and pre-computed reasoning-trace classifications without re-parsing).
The milestone CONTEXT.md for grounding.

It writes one file: M<NNN>-RESEARCH.md against the research-final schema. The frontmatter exposes agreement_score, contested_count, and reconciler_verdict ∈ {clean, issues_flagged, needs_re_spawn}. The disagreement hard-gate (Step 5.7) reads these to decide whether to askuser.

Reasoning-Trace classification

Per consensus decision, the reconciler compares the **Reasoning:** fields of the spawns that agree on the decision text:

Class	Trigger	Effect on consolidated confidence
orthogonal	distinct reasoning across spawns (Jaccard < 0.6)	promoted to `high`
overlapping	partial reasoning overlap	`max(confidences)` of cited spawns
identical	normalized reasoning matches across spawns	demoted one notch (groupthink)
unknown	< 2 spawns provided a Reasoning field	not promoted; reconciler cites missing data

The Reasoning field is mandatory in the per-spawn schema: output-lint check --schema researcher-output --enforce rejects any spawn output that omits it. The reconciler can therefore always perform the classification on consensus decisions.
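The classification table can be read as the following sketch. Several details are assumptions rather than documented behaviour: "distinct" is taken to mean that every pair of traces scores below 0.6, "demoted one notch" is applied to the highest cited confidence, and `unknown` keeps that highest confidence unchanged. The function names are hypothetical; lib/researcher-reconciler.cjs is authoritative.

```javascript
const LEVELS = ['low', 'medium', 'high'];

// Lowercase and reduce to space-separated alphanumeric tokens.
const normalize = (text) =>
  String(text ?? '').toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();

// Jaccard similarity over the token sets of two normalized, non-empty strings.
function jaccard(a, b) {
  const A = new Set(a.split(' '));
  const B = new Set(b.split(' '));
  let shared = 0;
  for (const token of A) if (B.has(token)) shared += 1;
  return shared / (A.size + B.size - shared);
}

function classifyReasoning(reasonings) {
  const traces = reasonings.map(normalize).filter((trace) => trace !== '');
  if (traces.length < 2) return 'unknown';
  if (traces.every((trace) => trace === traces[0])) return 'identical';
  let highest = 0;
  for (let i = 0; i < traces.length; i += 1) {
    for (let j = i + 1; j < traces.length; j += 1) {
      highest = Math.max(highest, jaccard(traces[i], traces[j]));
    }
  }
  // Assumption: orthogonal only when every pair stays below 0.6.
  return highest < 0.6 ? 'orthogonal' : 'overlapping';
}

// Expects confidences spelled HIGH, MEDIUM, or LOW (any letter case).
function consolidateConfidence(classification, confidences) {
  const top = Math.max(...confidences.map((c) => LEVELS.indexOf(c.toLowerCase())));
  if (classification === 'orthogonal') return 'high';                      // promoted
  if (classification === 'identical') return LEVELS[Math.max(0, top - 1)]; // one notch down
  return LEVELS[top]; // overlapping: max of cited spawns; unknown: not promoted
}
```

Demoting identical traces reflects the groupthink concern: agreement reached by the same argument counts as weaker evidence than agreement reached by independent routes.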

Disagreement hard-gate

Defaults (configurable via CLI flags):

min_agreement_score: 0.5 — below this, the reconciler-promoted decisions are too thin to trust.
max_contested: 2 — above this, the swarm is split on too many points to converge mechanically.

When either threshold is violated, node .nubos-pilot/bin/np-tools.cjs researcher-reconcile gate <N> returns needs_askuser: true and the workflow shows:

Researcher swarm is not converging. How to proceed?
  1. Re-spawn with a sharper task_query
  2. Continue with the reconciler pick (risk profile in frontmatter)
  3. Decide manually (pick per contested decision)

Silent continuation through a low-agreement merge is explicitly forbidden.
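The gate condition reduces to two strict comparisons. The sketch below is illustrative: `evaluateGate` and the `reasons` array are hypothetical, and only needs_askuser, agreement_score, contested_count, and the two default thresholds come from the description above.

```javascript
// Hypothetical sketch of the Step 5.7 check; the real gate is the researcher-reconcile gate command.
function evaluateGate(
  { agreement_score, contested_count },
  { min_agreement_score = 0.5, max_contested = 2 } = {},
) {
  const reasons = [];
  // "Below" and "above" are read as strict comparisons.
  if (agreement_score < min_agreement_score) {
    reasons.push(`agreement_score ${agreement_score} < ${min_agreement_score}`);
  }
  if (contested_count > max_contested) {
    reasons.push(`contested_count ${contested_count} > ${max_contested}`);
  }
  // Either violation alone is enough to require a user decision.
  return { needs_askuser: reasons.length > 0, reasons };
}
```

With the defaults, a result of exactly 0.5 agreement and 2 contested decisions passes; anything lower or higher respectively triggers the prompt.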

Related

Nubosloop — Step 1+2 of the loop maps to this swarm.
Vector-Memory — opt-in semantic layer that augments the BM25 pre-flight via hybrid score.
Output Schemas — strict-enforcement layer powering Step 4 and Step 5.6.
Completeness Doctrine — Rule 9 (Search before building) is enforced by the cache.
ADR-0011 — original swarm architecture.
ADR-0014 — the hybrid-score amendment to Step 1.
ADR-0017 — output-schema enforcement pattern.
ADR-0018 — per-spawn schema + reconciler + disagreement gate + Reasoning-Trace.
