Researcher-Schwarm
np:research-phase and np:plan-phase --research spawn swarm.research.k=3 independent np-researcher agents in parallel, lint each output against the researcher-output schema, run a deterministic Mehrheit/Union/Schnittmenge (majority/union/intersection) merge, and then spawn np-researcher-reconciler to weigh the per-spawn reasoning traces and write the M<NNN>-RESEARCH.md that downstream steps consume. A disagreement hard-gate keyed on agreement_score and contested_count blocks workflow completion via askuser when the swarm has not converged.
ADR-0011 ratifies the deterministic merge; ADR-0018 adds the per-spawn schema, the reconciler stage, the disagreement gate, and the Reasoning-Trace classification on top. The merge engine lives in lib/researcher-swarm.cjs; the reconciler helpers live in lib/researcher-reconciler.cjs.
Why a swarm
A single research pass has two failure modes:
- Hallucination with confidence. The agent commits to a wrong library version, an outdated pattern, or a fictional method, and presents it with high confidence because nothing contradicts it.
- Group-think on pre-existing knowledge. The agent retrieves what it already "knows" and skips searching when the topic feels familiar.
Both fail silently. A k=3 swarm with deterministic merge surfaces the disagreement as a FLAGGED decision (no Mehrheit), and the plan-checker reads the flag and routes verification accordingly.
The 7 steps (ADR-0011 deterministic core + ADR-0018 schema/reconciler/gate)
Steps 1-3 are the original ADR-0011 stack. Step 4 (per-spawn lint), Step 5 (reconciler), Step 5.6 (reconciler-output lint), and Step 5.7 (disagreement gate) come from ADR-0018.
Step 1 — Pre-flight cache lookup (bypass swarm on high-similarity hit)
Step 2 — Spawn k parallel researchers (each Writes spawn-<i>.md, schema-bound)
Step 3 — Deterministic mergeConsensus → research/merge.md (proposal)
Step 4 — Per-spawn output-lint (--enforce) ← ADR-0018 / ADR-0017 hard-gate
Step 5 — Reconciler spawn (sees all k outputs + merge + Reasoning traces)
Step 5.6 — Reconciler-output lint (--enforce) ← ADR-0018 / ADR-0017 hard-gate
Step 5.7 — Disagreement hard-gate (askuser if agreement_score < 0.5 OR contested > 2)
The 4 deterministic-merge steps
Step 1 — Pre-flight cache
lib/knowledge-adapter.cjs::match against the configured store. Defaults:
| Knob | Default | Meaning |
|---|---|---|
| swarm.research.threshold | 0.9 | Jaccard similarity of token sets (combined score when Vector-Memory is on) |
| swarm.research.minOccurrence | 3 | minimum occurrence count to count as a hit |
| swarm.knowledge_adapter | "local" | local = BM25 over .nubos-pilot/knowledge/learnings.json; the only adapter shipped |
| memory.enabled | false | when true, the local adapter additionally queries .nubos-pilot/memory/ and merges via α·BM25 + (1−α)·vector (default α = 0.6); see Vector-Memory |
A hit short-circuits the swarm: the cached pattern is rendered as RESEARCH.md with provenance [CACHED] and a <consensus_meta> block citing adapter, fingerprint, and occurrence.
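The hit test described above can be sketched as follows. This is a minimal illustration, not the shipped adapter: the tokeniser, the `entry` shape (`pattern`, `occurrence`), the inclusive `>=` comparisons, and the assumption that both hybrid inputs are already normalised to [0, 1] are all hypothetical.

```javascript
// Sketch only: tokenisation and the entry shape are assumptions, not the shipped adapter.
function tokenize(text) {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets.
function jaccard(a, b) {
  const setA = tokenize(a);
  const setB = tokenize(b);
  let shared = 0;
  for (const token of setA) if (setB.has(token)) shared += 1;
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Hybrid score when memory.enabled = true: alpha * BM25 + (1 - alpha) * vector.
// Assumes both inputs are already normalised to the range [0, 1].
function hybridScore(bm25, vector, alpha = 0.6) {
  return alpha * bm25 + (1 - alpha) * vector;
}

// A hit needs both conditions: enough similarity and enough occurrences.
function isCacheHit(query, entry, { threshold = 0.9, minOccurrence = 3 } = {}) {
  return jaccard(query, entry.pattern) >= threshold && entry.occurrence >= minOccurrence;
}
```

With the defaults, an entry that matches the query token-for-token but has been seen only twice does not count as a hit, so the swarm still runs.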
Vector pre-recall (agent-side). Each spawned researcher additionally queries np:memory-query (when memory.enabled = true) before issuing external research; matching [VERIFIED] / [CITED] decisions enter the spawn output as [CACHED:VERIFIED] / [CACHED:CITED] without a duplicate web round-trip. See Vector-Memory § Researcher pre-recall.
Step 2 — Parallel spawn
k researchers spawn in parallel. Each receives:
- The same <task_query>, word-for-word identical for every spawn. This is the load-bearing property: an identical question across the swarm is what makes the merge a CONSENSUS rather than a divide-and-conquer.
- A unique seed_delta from lib/researcher-swarm.cjs::SEED_DELTAS, a perspectival nudge rather than a thematic preference. Nudges vary HOW the spawn investigates (methodology, evidence weighting, contrarian stance, breadth-vs-depth, gap surfacing), never WHAT the answer should prefer.
Why perspectival, not thematic. A thematic seed_delta like "prefer libraries that ship native TypeScript types" makes the spawn rank its answer along that axis. Three thematic deltas produce three rank orders over different axes, the pattern Schnittmenge (intersection) collapses to nothing, and the consensus becomes a fiction. The Spawn Contract in ADR-0011 calls this out as the canonical bypass class: bin/researcher-merge.cjs reports agreement_score near 0 and an empty pattern intersection when it happens.
Litmus test for adding a new entry to SEED_DELTAS: rephrase it as "what does this researcher optimise FOR in their final answer?". If the answer names a concrete solution attribute (TypeScript, smallest deps, latest version), it is thematic and belongs in the planner or architect, not the swarm.
No researcher knows it is one of k. Each believes itself the sole spawn, and that belief is what prevents group-think.
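The spawn contract above might look roughly like this in code. It is a sketch under stated assumptions: `spawnResearcher` is a caller-supplied stand-in for the real agent launch, and the entries in `EXAMPLE_SEED_DELTAS` are invented illustrations of perspectival nudges, not the contents of SEED_DELTAS.

```javascript
// Hypothetical perspectival nudges; the real list lives in SEED_DELTAS.
const EXAMPLE_SEED_DELTAS = [
  'document your search methodology before drawing conclusions',
  'weigh primary sources above secondary summaries',
  'take a contrarian stance and look for counter-evidence',
  'survey broadly first, then deepen the strongest leads',
  'list what you could not find or verify',
];

// spawnResearcher is a stand-in supplied by the caller; it receives one request per spawn.
async function spawnSwarm(taskQuery, k, spawnResearcher) {
  if (k > EXAMPLE_SEED_DELTAS.length) {
    throw new RangeError('not enough seed deltas to give every spawn a unique one');
  }
  const spawns = Array.from({ length: k }, (_, i) =>
    spawnResearcher({
      task_query: taskQuery,              // word-for-word identical for every spawn
      seed_delta: EXAMPLE_SEED_DELTAS[i], // unique per spawn: varies HOW, never WHAT
      // Deliberately no k and no sibling index: a spawn must not learn it is one of k.
    })
  );
  return Promise.all(spawns);
}
```

Keeping `k` out of the request object is the point of the design: the swarm size is known only to the orchestrator.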
Step 3 — Merge
Each spawn produces a structured object:
```json
{
  "decisions": [{ "claim": "...", "confidence": "HIGH|MEDIUM|LOW", "provenance": "[VERIFIED]|[CITED:url]|[ASSUMED]" }],
  "risks": [{ "description": "...", "severity": "HIGH|MEDIUM|LOW" }],
  "patterns": [{ "name": "...", "description": "..." }],
  "open_questions": ["..." | { "question": "...", "blocking_for": "..." }],
  "sources": [{ "url": "...", "credibility": "HIGH|MEDIUM|LOW", "note": "..." }]
}
```
mergeConsensus(outputs) runs four rules:
| Field | Rule | Why |
|---|---|---|
| decisions | Mehrheit (majority): ⌈k/2⌉ agreements ⇒ consensus, else FLAGGED | Decisions are commitments. Disagreement is a plan-checker signal, not an average to fudge. |
| risks | Union, dedupe by semantic fingerprint, severity = max | Risks are fail-open. Losing a valid risk by majority-vote is a regression. |
| patterns | Schnittmenge (intersection) across ≥ 2 spawns; solo → demoted to [ASSUMED] | Patterns are fail-closed. A pattern only one spawn saw invites hallucination. |
| open_questions / sources | Union with dedupe; credibility = max | Questions and sources are inclusive. Max credibility wins. |
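The first three rules in the table can be sketched as below. This is an illustration, not the code in lib/researcher-swarm.cjs: the "semantic fingerprint" is approximated by normalised text, and the `status` and `provenance` labels on the result are hypothetical names.

```javascript
const SEVERITY_RANK = { LOW: 1, MEDIUM: 2, HIGH: 3 };

// Assumption: the semantic fingerprint is approximated by normalised text.
const fingerprint = (text) => text.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();

// Group items from all spawns by fingerprint, remembering which spawns saw each one.
function groupByFingerprint(outputs, field, keyOf) {
  const groups = new Map();
  outputs.forEach((output, spawnIndex) => {
    for (const item of output[field] || []) {
      const key = fingerprint(keyOf(item));
      if (!groups.has(key)) groups.set(key, { first: item, items: [], spawns: new Set() });
      const group = groups.get(key);
      group.items.push(item);
      group.spawns.add(spawnIndex);
    }
  });
  return [...groups.values()];
}

function mergeConsensus(outputs) {
  const k = outputs.length;
  const majority = Math.ceil(k / 2);

  // decisions: Mehrheit (majority), otherwise FLAGGED
  const decisions = groupByFingerprint(outputs, 'decisions', (d) => d.claim).map((g) => ({
    claim: g.first.claim,
    agreement_count: g.spawns.size,
    status: g.spawns.size >= majority ? 'consensus' : 'FLAGGED',
  }));

  // risks: union, deduplicated, with the highest severity any spawn reported
  const risks = groupByFingerprint(outputs, 'risks', (r) => r.description).map((g) => ({
    description: g.first.description,
    severity: g.items.reduce(
      (max, r) => (SEVERITY_RANK[r.severity] > SEVERITY_RANK[max] ? r.severity : max),
      'LOW'
    ),
  }));

  // patterns: Schnittmenge (intersection) across at least 2 spawns, solo ones demoted
  const patterns = groupByFingerprint(outputs, 'patterns', (p) => p.name).map((g) => ({
    name: g.first.name,
    provenance: g.spawns.size >= 2 ? 'accepted' : '[ASSUMED]',
  }));

  return { k, decisions, risks, patterns };
}
```

The three fields share one grouping helper and differ only in the acceptance rule, which keeps the fail-open (risks) and fail-closed (patterns) behaviour visible side by side.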
Step 4 — Render
lib/researcher-swarm.cjs::renderConsensusToMarkdown writes the merged output to <milestone_dir>/<milestone>-RESEARCH.md with a <consensus_meta> block:
```markdown
<consensus_meta>
k: 3
agreement_score: 0.875
flagged_count: 1
</consensus_meta>
```
np-plan-checker reads <consensus_meta> to weight downstream verdicts. A high flagged_count triggers extra plan-checker scrutiny.
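A consumer such as the plan-checker could write and read the block along these lines. Both functions are hypothetical sketches that assume the simple `key: value` layout shown above; they are not the behaviour of renderConsensusToMarkdown.

```javascript
// Render the block in the key: value layout shown above.
function renderConsensusMeta({ k, agreementScore, flaggedCount }) {
  return [
    '<consensus_meta>',
    `k: ${k}`,
    `agreement_score: ${agreementScore}`,
    `flagged_count: ${flaggedCount}`,
    '</consensus_meta>',
  ].join('\n');
}

// Read the key/value lines back out of a RESEARCH.md string.
// Numeric values become numbers; anything else stays a string.
function parseConsensusMeta(markdown) {
  const match = markdown.match(/<consensus_meta>\n([\s\S]*?)\n<\/consensus_meta>/);
  if (!match) return null;
  const meta = {};
  for (const line of match[1].split('\n')) {
    const separator = line.indexOf(':');
    if (separator === -1) continue;
    const key = line.slice(0, separator).trim();
    const raw = line.slice(separator + 1).trim();
    const asNumber = Number(raw);
    meta[key] = raw !== '' && !Number.isNaN(asNumber) ? asNumber : raw;
  }
  return meta;
}
```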
Worked example
Three spawns research "JWT verification stack":
| Spawn | Decisions | Risks | Patterns |
|---|---|---|---|
| A | use jose@6.0.10 | "rotation breaks sessions" (HIGH) | Repository pattern |
| B | use jose@6.0.10 | "rate-limit token endpoint" (MEDIUM) | Repository pattern |
| C | use jsonwebtoken@9 | "rotation breaks sessions" (MEDIUM) | Service-locator pattern |
Merge produces:
- Decisions: use jose@6.0.10 (Mehrheit 2/3, accepted); use jsonwebtoken@9 (FLAGGED, solo).
- Risks: Union: rotation breaks sessions (HIGH, seen by 2), rate-limit token endpoint (MEDIUM, seen by 1).
- Patterns: Repository pattern (Schnittmenge 2/3, accepted); Service-locator (demoted, [ASSUMED]).
The plan-checker sees one accepted decision + one flagged candidate; it can either ask the user or run a follow-up research round.
k-of-1 / k-of-5
swarm.research.k = 1 is supported; it degrades to legacy single-spawn behaviour with no merge metadata.
swarm.research.k > 5 is rejected by lib/researcher-swarm.cjs (MAX_K = 5). Beyond five spawns, the marginal information gain does not justify the token cost.
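The bounds on k can be expressed as a small guard. This is a sketch only: whether the real module throws or reports the rejection some other way is not stated here, and the `mode` labels are hypothetical.

```javascript
const MAX_K = 5; // upper bound named in the text above

// Validate k and report which behaviour applies.
// Assumption: invalid values are rejected by throwing.
function resolveK(value = 3) {
  if (!Number.isInteger(value) || value < 1) {
    throw new RangeError(`swarm.research.k must be a positive integer, got ${value}`);
  }
  if (value > MAX_K) {
    throw new RangeError(`swarm.research.k must not exceed ${MAX_K}, got ${value}`);
  }
  // k = 1 degrades to a single spawn with no merge metadata.
  return { k: value, mode: value === 1 ? 'single-spawn' : 'swarm' };
}
```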
Cache adapter
swarm.knowledge_adapter:
- "local" (default): lib/knowledge-adapter.cjs routes to lib/learnings.cjs. Storage: .nubos-pilot/knowledge/learnings.json. Similarity: Jaccard over token sets. This is the only adapter shipped.
lib/knowledge-adapter.cjs keeps the adapter seam so additional adapters can be added without touching the swarm logic. Unsupported values fall back to "local" silently.
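An adapter seam with a silent fallback can be as small as a lookup table. The registry shape and the `match` stub below are hypothetical and only illustrate the fallback behaviour described above.

```javascript
// Hypothetical registry: one entry per supported adapter name.
const ADAPTERS = {
  local: {
    name: 'local',
    match: (query) => null, // stub: a real adapter would search the learnings store
  },
};

// Unknown names fall back to "local" without raising an error.
function resolveAdapter(name) {
  return Object.prototype.hasOwnProperty.call(ADAPTERS, name) ? ADAPTERS[name] : ADAPTERS.local;
}
```

Adding an adapter then means adding one registry entry; the swarm only ever calls `resolveAdapter(...).match(...)`.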
Agent-native CLI
The cache is also queryable / writable from any runtime that can shell out:
```bash
# Match: "have we seen this before?"
node .nubos-pilot/bin/np-tools.cjs learning-match --query "use jose for jwt" \
  --threshold 0.9 --min-occurrence 3
# List: top entries sorted by occurrence (descending)
node .nubos-pilot/bin/np-tools.cjs learning-list --limit 20
# Log: persist a verified pattern (auto-runs on commit when auto_log_learning=true)
node .nubos-pilot/bin/np-tools.cjs learning-log \
  --pattern "use jose for jwt verification" --outcome verified \
  --task-id M001-S001-T0001 --milestone-id M001
```
The learning-log payload carries fingerprint, was_new, and occurrence, so the agent can confirm the write outcome without re-reading the store. See CLI Commands.
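From a Node-based runtime, the same match command could be wrapped as follows. The flags are the ones shown above; that the command prints a JSON document on stdout is an assumption, and the injectable `run` parameter exists only so the wrapper can be exercised without the CLI installed.

```javascript
const { execFileSync } = require('node:child_process');

const NP_TOOLS = '.nubos-pilot/bin/np-tools.cjs';

// Build the argument list for learning-match, mirroring the shell example above.
function learningMatchArgs(query, { threshold = 0.9, minOccurrence = 3 } = {}) {
  return [
    NP_TOOLS, 'learning-match',
    '--query', query,
    '--threshold', String(threshold),
    '--min-occurrence', String(minOccurrence),
  ];
}

// Assumption: the command prints a JSON document on stdout.
// `run` defaults to execFileSync and can be replaced with a stub in tests.
function learningMatch(query, options, run = execFileSync) {
  const stdout = run('node', learningMatchArgs(query, options), { encoding: 'utf8' });
  return JSON.parse(stdout);
}
```

Passing arguments as an array through `execFileSync` avoids shell quoting issues when the query contains spaces or quotes.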
Configuration
.nubos-pilot/config.json:
```json
{
  "swarm": {
    "research": { "k": 3, "threshold": 0.9, "minOccurrence": 3 },
    "knowledge_adapter": "local"
  }
}
```
CLI overrides per invocation are not currently supported; config is read at spawn time.
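Reading this file with per-key defaults might look like the sketch below. The default values come from the tables in this document; the per-key fallback behaviour and the function names are assumptions.

```javascript
const fs = require('node:fs');

// Defaults as documented above.
const DEFAULTS = {
  research: { k: 3, threshold: 0.9, minOccurrence: 3 },
  knowledge_adapter: 'local',
};

// Assumption: any key missing from the file falls back to its documented default.
function withDefaults(config = {}) {
  const swarm = config.swarm || {};
  return {
    swarm: {
      research: { ...DEFAULTS.research, ...(swarm.research || {}) },
      knowledge_adapter: swarm.knowledge_adapter ?? DEFAULTS.knowledge_adapter,
    },
  };
}

// Read the config once, at spawn time; a missing file yields pure defaults.
function loadSwarmConfig(path = '.nubos-pilot/config.json') {
  const raw = fs.existsSync(path) ? JSON.parse(fs.readFileSync(path, 'utf8')) : {};
  return withDefaults(raw);
}
```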
Reconciler stage (ADR-0018)
After the deterministic merge produces a research/merge.md proposal, np-researcher-reconciler (tier=sonnet, READ-ONLY on inputs) gets:
- All k per-spawn outputs (verbatim).
- The merge.md proposal.
- The structured merged JSON from node .nubos-pilot/bin/np-tools.cjs researcher-reconcile prepare <N> (so it can read from_spawns, agreement_count, and pre-computed reasoning-trace classifications without re-parsing).
- The milestone CONTEXT.md for grounding.
It writes one file: M<NNN>-RESEARCH.md against the research-final schema. The frontmatter exposes agreement_score, contested_count, and reconciler_verdict ∈ {clean, issues_flagged, needs_re_spawn}. The disagreement hard-gate (Step 5.7) reads these to decide whether to askuser.
Reasoning-Trace classification
Per consensus decision, the reconciler compares the **Reasoning:** fields of the spawns that agree on the decision text:
| Class | Trigger | Effect on consolidated confidence |
|---|---|---|
| orthogonal | distinct reasoning across spawns (Jaccard < 0.6) | promoted to high |
| overlapping | partial reasoning overlap | max(confidences) of cited spawns |
| identical | normalized reasoning matches across spawns | demoted one notch (groupthink) |
| unknown | < 2 spawns provided a Reasoning field | not promoted; reconciler cites missing data |
The Reasoning field is mandatory in the per-spawn schema: output-lint check --schema researcher-output --enforce rejects any spawn output that omits it. The reconciler can therefore always perform the classification on consensus decisions.
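The four classes in the table can be approximated with a token-set comparison. This sketch is not the reconciler's implementation: the normalisation, the use of the highest pairwise Jaccard similarity, and the treatment of mixed cases (some spawns identical, others not) as `overlapping` are all assumptions.

```javascript
// Assumption: "normalized reasoning" means lowercased text with punctuation collapsed.
const normalize = (text) => text.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
const tokens = (text) => new Set(normalize(text).split(' ').filter(Boolean));

function jaccard(a, b) {
  const setA = tokens(a);
  const setB = tokens(b);
  let shared = 0;
  for (const token of setA) if (setB.has(token)) shared += 1;
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Classify the Reasoning fields of the spawns that agree on one decision.
function classifyReasoning(reasonings, threshold = 0.6) {
  const present = reasonings.filter((r) => typeof r === 'string' && r.trim() !== '');
  if (present.length < 2) return 'unknown';

  const normalized = present.map(normalize);
  if (normalized.every((r) => r === normalized[0])) return 'identical';

  // Highest similarity between any two spawns decides orthogonal vs overlapping.
  let maxSimilarity = 0;
  for (let i = 0; i < present.length; i += 1) {
    for (let j = i + 1; j < present.length; j += 1) {
      maxSimilarity = Math.max(maxSimilarity, jaccard(present[i], present[j]));
    }
  }
  return maxSimilarity < threshold ? 'orthogonal' : 'overlapping';
}
```

Two spawns that reach the same decision through unrelated arguments land in `orthogonal`, which is the case the table promotes; word-for-word matching reasoning lands in `identical`, the groupthink signal.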
Disagreement hard-gate
Defaults (configurable via CLI flags):
- min_agreement_score: 0.5. Below this, the reconciler-promoted decisions are too thin to trust.
- max_contested: 2. Above this, the swarm is split on too many points to converge mechanically.
When either threshold is violated, node .nubos-pilot/bin/np-tools.cjs researcher-reconcile gate <N> returns needs_askuser: true and the workflow shows:
Researcher-Schwarm konvergiert nicht. Wie weiter?
1. Re-spawn mit schärferer task_query
2. Fortfahren mit Reconciler-Pick (Risikoprofil in Frontmatter)
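3. Manuell entscheiden (per Contested Decision picken)
In English, the prompt says the researcher swarm is not converging and offers three options: re-spawn with a sharper task_query, continue with the reconciler's pick (risk profile in frontmatter), or decide manually per contested decision. Silent continuation through a low-agreement merge is explicitly forbidden.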
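The gate decision itself reduces to two comparisons. The sketch below uses the documented defaults and the `needs_askuser` field name; the `reasons` array and the option names are hypothetical additions for illustration.

```javascript
// Evaluate the disagreement gate from the RESEARCH.md frontmatter values.
function evaluateGate(
  { agreement_score, contested_count },
  { minAgreementScore = 0.5, maxContested = 2 } = {}
) {
  const reasons = [];
  if (agreement_score < minAgreementScore) {
    reasons.push(`agreement_score ${agreement_score} is below ${minAgreementScore}`);
  }
  if (contested_count > maxContested) {
    reasons.push(`contested_count ${contested_count} is above ${maxContested}`);
  }
  // Either violation alone is enough to stop and ask the user.
  return { needs_askuser: reasons.length > 0, reasons };
}
```

Both comparisons are strict, so a swarm sitting exactly on the defaults (0.5 agreement, 2 contested decisions) passes the gate.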
Related
- Nubosloop — Step 1+2 of the loop maps to this swarm.
- Vector-Memory — opt-in semantic layer that augments the BM25 pre-flight via hybrid score.
- Output Schemas — strict-enforcement layer powering Step 4 and Step 5.6.
- Completeness Doctrine — Rule 9 (Search before building) is enforced by the cache.
- ADR-0011 — original swarm architecture.
- ADR-0014 — the hybrid-score amendment to Step 1.
- ADR-0017 — output-schema enforcement pattern.
- ADR-0018 — per-spawn schema + reconciler + disagreement gate + Reasoning-Trace.
