ADR-0018: Researcher-Schwarm — Schema-Bound Outputs, Reconciler Stage, Disagreement Hard-Gate, Reasoning-Trace
- Status: Accepted
- Date: 2026-05-12
- Supersedes: None
- Amends: ADR-0011
- Relates-to: ADR-0001, ADR-0002, ADR-0010, ADR-0012, ADR-0017
Context and Problem Statement
ADR-0011 defines the Researcher-Schwarm: k parallel np-researcher spawns over the same task_query with deterministic seed_delta nudges, merged by a deterministic consensus pass (majority for decisions, union for risks/sources, intersection for patterns). The doctrine: robust investigations converge on the same conclusions; the intersection picks the robust answers and the workflow proceeds.
In production this didn't hold. Three failure modes, observed by users running real research phases:
- Topic-split, not seed-nudge. Each researcher came back with different decisions, different risks, different sources. The intended seed-nudge turned into three spawns each ranking a different axis: intersection ≈ 0, consensus a fiction.
- No same-shape contract. The deterministic merge operates on parsed objects (decisions[], risks[], …), but the parse-from-markdown step was the orchestrator's job and was never enforced. Spawns that returned prose paragraphs, structured headings, or tables were silently bucketed into hashes that didn't match. The merge appeared to run; the intersection appeared zero; nobody knew whether the swarm disagreed or whether the parser failed.
- No deliberation, no disagreement signal. The deterministic merge produced a result regardless. Low agreement ended up as a buried meta-field. Downstream consumers (the planner) trusted the merged output as if it were unanimous.
The user's observation (translated from the German original): "Every researcher in the swarm does something different. But all three are supposed to do the same thing, so that the best solution gets found. Ideally they would also discuss with one another."
These are two related complaints: same-shape investigation and a discussion mechanism. ADR-0011's seed_delta was supposed to handle the first via prompt nudges alone; that wasn't enough. The second wasn't even attempted.
The Rule
The Researcher-Schwarm runs in two stages. Stage 1 produces k schema-bound spawn-<i>.md files (researcher-output schema, every Decision / Risk / Pattern carries a mandatory **Reasoning:** field). Stage 2 spawns np-researcher-reconciler, which sees all k spawns plus the deterministic merge proposal and writes the final M<NNN>-RESEARCH.md (research-final schema). A disagreement hard-gate keyed on agreement_score and contested_count blocks workflow completion via askuser when the swarm has not converged.
Together: same-shape inputs (Stage 1), deliberated outputs (Stage 2), visible disagreement (gate). Reasoning-traces let the reconciler classify agreement as orthogonal (strong) vs identical (groupthink).
Decision Drivers
- Fail at the cause. Drift in per-spawn output must break at the spawn site, not at merge. output-lint --enforce on each spawn-<i>.md is the only way the merge gets honest inputs (ADR-0017).
- Visible disagreement. The workflow must surface low convergence as an askuser, not as a buried meta-field. Silent merges of disagreeing investigations are worse than no merge.
- Reasoning over conclusions. Two researchers agreeing on a conclusion via different evidence chains is a stronger signal than two agreeing via the same chain (groupthink). The reconciler classifies and the final artefact records the classification.
- Single Write per stage. Stage 1 is one Write per spawn (spawn-<i>.md). Stage 2 is one Write (M<NNN>-RESEARCH.md). No agent writes outside its scope.
- No daemon, no chat-stream. The requested "discussion" is implemented as one structured second pass that sees all the file evidence, not as a real-time dialogue. Compatible with ADR-0001.
- Zero deps. All new code uses node:fs plus the existing lib/frontmatter.cjs. The schema enforcement reuses the output-lint engine from ADR-0017. No new package.json entry.
Considered Options
- A: Status quo, ADR-0011 alone. Reject: documented production failures.
- B: Schema-only (Stage 1). Add the researcher-output schema; enforce per spawn. Reject as incomplete: fixes same-shape but leaves silent disagreement merges unaddressed.
- C: Reconciler-only (Stage 2). Add a second pass without schema enforcement. Reject: reconciler reasoning collapses if inputs are not parseable; groupthink is undetectable without per-entry reasoning fields.
- D: Schema + Reconciler + Disagreement-Gate + Reasoning-Trace. Chosen.
Decision Outcome
Chosen: Option D, all four mechanisms in one stack, because:
- The schema (Stage 1) is the only way to get honest merge inputs and the only place to enforce the Reasoning-Trace field.
- The reconciler (Stage 2) is the user's requested "discussion": file-based, structured, READ-ONLY on per-spawn outputs, single Write on M<NNN>-RESEARCH.md.
- The hard-gate is the only way silent disagreement becomes visible: an askuser with three options.
- Reasoning-trace classification is what makes the reconciler's verdict trustworthy. Without it, agreement is mere word-overlap; with it, the reconciler distinguishes robust evidence (orthogonal) from groupthink (identical).
Layout
lib/
  researcher-reconciler.cjs         # parseSpawnOutput, classifyReasoningAgreement,
                                    # reconcileSpawns, prepareReconcilerInput,
                                    # disagreementGate, gateFromFinalFrontmatter
  researcher-reconciler.test.cjs    # parse, classify, reconcile, gate cases
  schemas/
    researcher-output.cjs           # per-spawn contract (Stage 1)
    research-final.cjs              # reconciler contract (Stage 2)
bin/np-tools/
  researcher-reconcile.cjs          # CLI driver
agents/
  np-researcher.md                  # cites the researcher-output schema + Reasoning field
  np-researcher-reconciler.md       # reads all spawns, writes the final RESEARCH.md
workflows/
  research-phase.md                 # Stage-1 → lint → merge → Stage-2 → lint → gate
Per-milestone artefacts live under .nubos-pilot/milestones/M<NNN>/research/: spawn-<i>.md (one per spawn) and merge.md (the deterministic proposal). The final M<NNN>-RESEARCH.md lives one level up, in the milestone directory.
Per-spawn output contract (researcher-output schema)
Each spawn writes spawn-<i>.md with frontmatter schema_version, agent (enum np-researcher), spawn_index, seed_delta, task_query_hash, and the five count fields decision_count / risk_count / pattern_count / open_question_count / source_count. The body carries the sections ## Decisions, ## Risks, ## Patterns, ## Open Questions, ## Sources (empty sections use the _None._ marker). Each ### D-N / ### R-N / ### P-N block must contain a **Reasoning:** field, mechanically required by the schema.
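The mandatory Reasoning field can be illustrated with a small body check. This is a minimal sketch, not the researcher-output schema itself: the helper name findBlocksMissingReasoning, the heading regexes, and the sample text are assumptions made for illustration.

```javascript
// Hypothetical helper, not the project's actual schema code.
// Returns the IDs of D-N / R-N / P-N blocks that lack a non-empty **Reasoning:** field.
function findBlocksMissingReasoning(markdown) {
  const missing = [];
  let currentId = null;
  let hasReasoning = false;
  const flush = () => {
    if (currentId !== null && !hasReasoning) missing.push(currentId);
  };
  for (const line of markdown.split(/\r?\n/)) {
    const entry = /^###\s+([DRP]-\d+)\b/.exec(line);
    if (entry) {
      // A new entry heading closes the previous block and opens a new one.
      flush();
      currentId = entry[1];
      hasReasoning = false;
    } else if (/^#{1,3}\s/.test(line)) {
      // Any other heading (e.g. "## Risks") closes the open block.
      flush();
      currentId = null;
    } else if (currentId !== null && /^\s*(?:[-*]\s+)?\*\*Reasoning:\*\*\s*\S/.test(line)) {
      hasReasoning = true;
    }
  }
  flush();
  return missing;
}

// Made-up sample body: D-2 has no field, R-1 has an empty one.
const sample = [
  '## Decisions',
  '### D-1 Use SQLite',
  '**Reasoning:** Single-writer workload, no server needed.',
  '### D-2 Cache results',
  'Some prose without the field.',
  '## Risks',
  '### R-1 Lock contention',
  '**Reasoning:**',
  '## Patterns',
  '_None._',
].join('\n');

console.log(findBlocksMissingReasoning(sample)); // [ 'D-2', 'R-1' ]
```

A check of this kind fails at the producing spawn, which is the point of enforcing the field mechanically rather than trusting the prompt.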
The Reasoning field is the crux. classifyReasoningAgreement compares normalized Reasoning text across the spawns that share a bucket:
- identical: same normalized text across all traces → groupthink risk.
- overlapping: pairwise Jaccard > 0.6 → the default classification.
- orthogonal: distinct reasoning, no pair over the Jaccard threshold → strongest signal.
- unknown: fewer than two spawns provided Reasoning → cannot promote.
- single: fewer than two entries in the bucket.
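The five classes above can be sketched as follows. Only the class names and the 0.6 Jaccard threshold come from this ADR; the normalization, the word-level tokenization, the entry shape ({ reasoning }), the order of the checks, and the reading of "overlapping" as "at least one pair over the threshold" are assumptions, so the real classifyReasoningAgreement may differ.

```javascript
// Sketch only: normalization, tokenization, and check order are assumptions.
const JACCARD_THRESHOLD = 0.6; // threshold stated in the ADR

function normalize(text) {
  // Lowercase, replace punctuation with spaces, collapse whitespace.
  return text.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, ' ').replace(/\s+/g, ' ').trim();
}

function jaccard(a, b) {
  const setA = new Set(a.split(' '));
  const setB = new Set(b.split(' '));
  let shared = 0;
  for (const token of setA) if (setB.has(token)) shared += 1;
  const unionSize = setA.size + setB.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

// entries: one object per spawn in the bucket, e.g. { reasoning: '...' }.
function classifyReasoningAgreement(entries) {
  if (entries.length < 2) return 'single';
  const traces = entries
    .map((entry) => normalize(entry.reasoning || ''))
    .filter((trace) => trace.length > 0);
  if (traces.length < 2) return 'unknown';
  if (traces.every((trace) => trace === traces[0])) return 'identical';
  for (let i = 0; i < traces.length; i += 1) {
    for (let j = i + 1; j < traces.length; j += 1) {
      if (jaccard(traces[i], traces[j]) > JACCARD_THRESHOLD) return 'overlapping';
    }
  }
  return 'orthogonal';
}

console.log(classifyReasoningAgreement([
  { reasoning: 'Benchmarks show low latency' },
  { reasoning: 'The license permits commercial use' },
])); // orthogonal
```

Word-level Jaccard is cheap and deterministic, which suits a zero-dependency module, but it measures wording rather than evidence; two traces citing the same source in different words would still score as orthogonal under this sketch.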
Reconciler output contract (research-final schema)
np-researcher-reconciler writes M<NNN>-RESEARCH.md with frontmatter schema_version, milestone, type (enum research), agent (enum np-researcher-reconciler), k, agreement_score, contested_count, reconciler_verdict (enum clean | issues_flagged | needs_re_spawn), and the five count fields. The body carries ## Reconciler Summary, ## Final Decisions, ## Contested Decisions (_None._ if all agree), ## Final Risks, ## Final Patterns, ## Final Open Questions, ## Sources.
Reconciliation logic
reconcileSpawns buckets every parsed entry by normalized text. A bucket seen by ≥ min(2, k) spawns is consolidated; the rest are contested. Decisions split into final_decisions (consolidated) and contested; risks, open questions, and sources keep both consolidated and contested entries; patterns keep only the consolidated set (intersection semantics, per ADR-0011). The agreement_score for decisions is consolidated / (consolidated + contested), or 1 when there are no decisions.
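For decisions, the bucketing and the agreement_score formula can be sketched like this. The function name reconcileDecisions, the input shape (each spawn as { decisions: [{ text }] }), the normalization, and the return field names are assumptions for illustration; only the min(2, k) threshold and the score formula are taken from the paragraph above.

```javascript
// Sketch only: the input shape and normalization are assumptions.
function normalizeText(text) {
  return text.toLowerCase().replace(/\s+/g, ' ').trim();
}

// spawns: array of k parsed spawns, each { decisions: [{ text: '...' }] }.
function reconcileDecisions(spawns) {
  const k = spawns.length;
  const threshold = Math.min(2, k);
  const buckets = new Map(); // normalized text -> { text, spawnIndexes }
  spawns.forEach((spawn, spawnIndex) => {
    for (const decision of spawn.decisions) {
      const key = normalizeText(decision.text);
      if (!buckets.has(key)) {
        buckets.set(key, { text: decision.text, spawnIndexes: new Set() });
      }
      // A Set counts each spawn once, even if it repeats a decision.
      buckets.get(key).spawnIndexes.add(spawnIndex);
    }
  });
  const finalDecisions = [];
  const contested = [];
  for (const bucket of buckets.values()) {
    const target = bucket.spawnIndexes.size >= threshold ? finalDecisions : contested;
    target.push(bucket.text);
  }
  const total = finalDecisions.length + contested.length;
  const agreementScore = total === 0 ? 1 : finalDecisions.length / total;
  return { finalDecisions, contested, agreementScore, contestedCount: contested.length };
}

// Three made-up spawns: one shared decision, two seen by a single spawn each.
const result = reconcileDecisions([
  { decisions: [{ text: 'Use SQLite' }, { text: 'Cache results' }] },
  { decisions: [{ text: 'use sqlite' }, { text: 'Shard by tenant' }] },
  { decisions: [{ text: 'Use  SQLite' }] },
]);
console.log(result.finalDecisions, result.contested);
```

In this example one of three buckets is consolidated, so the score is 1/3 with two contested decisions.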
Disagreement hard-gate
disagreementGate is keyed on two thresholds (DEFAULTS: min_agreement_score = 0.5, max_contested = 2). It raises needs_askuser when agreement_score < min_agreement_score (violation agreement-score-low) OR contested_count > max_contested (violation too-many-contested). The workflow then presents an askuser with three options:
- Re-spawn with a sharper task_query: the user refines the query and re-runs /np:research-phase.
- Continue with the reconciler picks: the workflow proceeds; the agreement metrics stay in frontmatter for the plan-checker to weight.
- Decide manually: the workflow surfaces each contested decision; the user picks per item.
gateFromFinalFrontmatter runs the same gate against an already-written M<NNN>-RESEARCH.md, reading agreement_score and contested_count straight from frontmatter, used by the workflow's post-write step and by np:doctor.
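The gate logic is small enough to sketch in full. The default thresholds and the violation names are taken from this section; the return shape ({ needs_askuser, violations }), the overrides parameter, and the assumption that frontmatter arrives as an already-parsed plain object are illustrative guesses rather than the module's actual signatures.

```javascript
// Defaults and violation names as stated in the ADR; the return shape is an assumption.
const DEFAULTS = { min_agreement_score: 0.5, max_contested: 2 };

function disagreementGate(metrics, overrides = {}) {
  const thresholds = { ...DEFAULTS, ...overrides };
  const violations = [];
  if (metrics.agreement_score < thresholds.min_agreement_score) {
    violations.push('agreement-score-low');
  }
  if (metrics.contested_count > thresholds.max_contested) {
    violations.push('too-many-contested');
  }
  return { needs_askuser: violations.length > 0, violations };
}

// Same gate against frontmatter already parsed into a plain object.
// Number() tolerates values that a frontmatter parser left as strings.
function gateFromFinalFrontmatter(frontmatter, overrides = {}) {
  return disagreementGate(
    {
      agreement_score: Number(frontmatter.agreement_score),
      contested_count: Number(frontmatter.contested_count),
    },
    overrides,
  );
}

console.log(disagreementGate({ agreement_score: 0.4, contested_count: 3 }));
// { needs_askuser: true, violations: [ 'agreement-score-low', 'too-many-contested' ] }
```

Both comparisons are strict, so a run sitting exactly on the defaults (score 0.5, two contested decisions) passes without an askuser.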
Cache-bypass interaction
When the Pre-flight learning-cache hits at high similarity, Stages 1–2 are skipped entirely: the cache hit is rendered straight into M<NNN>-RESEARCH.md. The schema enforcement still applies via that rendering path, which emits research-final-conforming frontmatter (reconciler_verdict: clean, agreement_score: 1.0, contested_count: 0, k: 0) and a [CACHED] provenance marker in the body, preserving ADR-0011's provenance semantics.
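As an illustration of what the cache-hit rendering path might emit, the sketch below builds the frontmatter block. The four cache-specific values come from the paragraph above; the function name renderCachedFrontmatter, the schema_version placeholder, the agent value (inferred from the research-final enum), and the sample milestone and counts are assumptions.

```javascript
// Sketch only: function name, schema_version value, and count handling are assumptions.
function renderCachedFrontmatter(milestone, counts) {
  const fields = {
    schema_version: 1, // placeholder; the real version is not given in the ADR
    milestone,
    type: 'research',
    agent: 'np-researcher-reconciler', // assumed, to satisfy the research-final enum
    k: 0, // no spawns ran
    agreement_score: (1).toFixed(1), // rendered as 1.0
    contested_count: 0,
    reconciler_verdict: 'clean',
    ...counts,
  };
  const lines = Object.entries(fields).map(([key, value]) => `${key}: ${value}`);
  return ['---', ...lines, '---', ''].join('\n');
}

// Hypothetical milestone ID and counts.
const cachedFrontmatter = renderCachedFrontmatter('M007', {
  decision_count: 3,
  risk_count: 1,
  pattern_count: 0,
  open_question_count: 2,
  source_count: 4,
});
console.log(cachedFrontmatter);
```

With agreement_score at 1.0 and contested_count at 0, a cached artefact never trips the disagreement gate; the [CACHED] marker in the body is what tells a reader that no swarm ran.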
Consequences
- Good, because real-world drift fails at the producing spawn, not at merge or at a downstream consumer.
- Good, because disagreement is visible: users see the askuser dialog instead of a silently-merged inconsistent RESEARCH.md.
- Good, because groupthink is detectable: reasoning-trace classification flags identical-evidence consensus as weak; the planner gets a calibrated signal.
- Good, because contested decisions are first-class: they get their own section with per-spawn verdicts; contested_count is machine-readable in frontmatter.
- Good, because per-spawn outputs persist under research/spawn-<i>.md, an audit surface for retrospectives.
- Bad, because one additional spawn per research phase (the reconciler) is a real cost increment. Mitigated by cache-bypass when applicable.
- Bad, because two new schemas (researcher-output, research-final) must evolve in lockstep with the agent prompts. Test fixtures cover the common shapes.
- Bad, because the reconciler prompt is bigger: it sees all k per-spawn outputs plus the merge proposal. Mitigated by running the reconciler at a lower tier than the researchers.
- Neutral, because the deterministic merge stays in the loop. It is now the proposal, not the final answer; its agreement score is one input to the reconciler's verdict and to the gate.
Pattern Conformance
- S-2 NubosPilotError envelope: researcher-spawn-missing, researcher-spawn-frontmatter, researcher-reconcile-no-research-dir, researcher-reconcile-no-spawn-files.
- S-5 sandboxed tests: lib/researcher-reconciler.test.cjs uses in-memory spawn fixtures; no shared state.
- S-6 CJS module footer: lib/researcher-reconciler.cjs and both new lib/schemas/*.cjs end with module.exports.
Migration plan
- Land the schemas, lib/researcher-reconciler.cjs, the CLI, the agent, and the workflow updates.
- Existing in-flight research phases (old single-spawn merged RESEARCH.md, no per-spawn artefacts) remain valid. Schema enforcement applies to new spawns only.
- np:doctor surfaces old RESEARCH.md files lacking the new frontmatter as output-schema-violation (covered by ADR-0017).
- min_agreement_score and max_contested are exposed via researcher-reconcile CLI flags; persistent overrides via .nubos-pilot/config.json come in a follow-up if real-world data shows the defaults need tuning.
More Information
- Library: lib/researcher-reconciler.cjs, tests lib/researcher-reconciler.test.cjs.
- Schemas: lib/schemas/researcher-output.cjs, lib/schemas/research-final.cjs (registered in lib/schemas/index.cjs).
- CLI verb: bin/np-tools/researcher-reconcile.cjs.
- Agents: agents/np-researcher.md (updated), agents/np-researcher-reconciler.md (new).
- Workflow: workflows/research-phase.md (Stage-1 spawn → per-spawn lint → deterministic merge → reconciler spawn → research-final lint → disagreement gate).
- Related ADRs:
- ADR-0011: amended, so the deterministic consensus is now the proposal, not the final answer.
- ADR-0017: the researcher-output and research-final schemas run on its validator engine.
- ADR-0001: the "discussion" is a structured file-based second pass, not a live dialogue.
- ADR-0002: preserved; pure Node built-ins.
Origin: user feedback 2026-05-12 (translated from German): "Every researcher in the swarm does something different … Ideally they would also discuss with one another." Observed after running real research phases under ADR-0011. The root cause split into four mechanisms; all four are addressed in this single ADR to avoid leaving half-fixes that re-introduce the same drift.
