ADR-0018: Researcher-Schwarm — Schema-Bound Outputs, Reconciler Stage, Disagreement Hard-Gate, Reasoning-Trace

Status: Accepted
Date: 2026-05-12
Supersedes: None
Amends: ADR-0011
Relates-to: ADR-0001, ADR-0002, ADR-0010, ADR-0012, ADR-0017

Context and Problem Statement

ADR-0011 defines the Researcher-Schwarm: k parallel np-researcher spawns over the same task_query with deterministic seed_delta nudges, merged by a deterministic consensus pass (majority for decisions, union for risks/sources, intersection for patterns). The doctrine: robust investigations converge on the same conclusions; the intersection picks the robust answers and the workflow proceeds.

In production this didn't hold. Three failure modes, observed by users running real research phases:

Topic-split, not seed-nudge. Each researcher came back with different decisions, different risks, different sources. The intended seed-nudge turned into three spawns each ranking a different axis: intersection ≈ 0, consensus a fiction.
No same-shape contract. The deterministic merge operates on parsed objects (decisions[], risks[], …), but parsing them out of markdown was left to the orchestrator and never enforced. Spawns that returned prose paragraphs, structured headings, or tables were silently bucketed into hashes that didn't match. The merge appeared to run and the intersection appeared to be zero; nobody could tell whether the swarm disagreed or the parser had failed.
No deliberation, no disagreement signal. The deterministic merge produced a result regardless of agreement. Low agreement ended up as a buried meta-field, and downstream consumers (the planner) trusted the merged output as if it were unanimous.

The user's observation, translated from German: "Every researcher in the swarm does something different. But all three should be doing the same thing, so that you find the best solution. Ideally they would also discuss it with one another."

These are two related complaints: same-shape investigation, AND a discussion mechanism. ADR-0011's seed_delta was supposed to handle the first through prompt nudges alone, and that wasn't enough. The second wasn't even attempted.

The Rule

The Researcher-Schwarm runs in two stages. Stage 1 produces k schema-bound spawn-.md files (researcher-output schema, every Decision / Risk / Pattern carries a mandatory **Reasoning:** field). Stage 2 spawns np-researcher-reconciler, which sees all k spawns plus the deterministic merge proposal and writes the final M<NNN>-RESEARCH.md (research-final schema). A disagreement hard-gate keyed on agreement_score and contested_count blocks workflow completion via askuser when the swarm has not converged.

Together: same-shape inputs (Stage 1), deliberated outputs (Stage 2), visible disagreement (gate). Reasoning-traces let the reconciler classify agreement as orthogonal (strong) vs identical (groupthink).
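The two-stage flow can be sketched as a small orchestration function in the CommonJS JavaScript the project uses. Every function on the hypothetical deps object (spawnResearcher, lint, merge, spawnReconciler, gate, askUser) stands in for the real agent spawns and np-tools calls. Passing the spawn index in place of seed_delta is also an assumption.

```javascript
// Hypothetical sketch of the Researcher-Schwarm flow; all deps.* functions
// are stand-ins, not the project's real API.
async function runResearchPhase(taskQuery, k, deps) {
  // Stage 1: k parallel spawns; the index stands in for seed_delta here.
  const spawns = await Promise.all(
    Array.from({ length: k }, (_, i) => deps.spawnResearcher(taskQuery, i)),
  );
  // Lint every spawn before merging, so drift fails at the producing spawn.
  for (const spawn of spawns) deps.lint('researcher-output', spawn);
  // Deterministic merge (ADR-0011), now only a proposal.
  const proposal = deps.merge(spawns);
  // Stage 2: one reconciler pass sees all k spawns plus the proposal.
  const final = await deps.spawnReconciler(spawns, proposal);
  deps.lint('research-final', final);
  // Hard-gate: low convergence becomes an askuser, not a buried field.
  const gate = deps.gate(final);
  return gate.needs_askuser ? deps.askUser(final, gate) : final;
}
```

Injecting the stages keeps the control flow visible: a lint failure rejects the whole phase before the merge ever sees the spawn.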

Decision Drivers

Fail at the cause. Drift in per-spawn output must break at the spawn site, not at merge. output-lint --enforce on each spawn-.md is the only way the merge gets honest inputs (ADR-0017).
Visible disagreement. The workflow must surface low convergence as an askuser, not as a buried meta-field. Silent merges of disagreeing investigations are worse than no merge.
Reasoning over conclusions. Two researchers agreeing on a conclusion via different evidence chains is a stronger signal than two agreeing via the same chain (groupthink). The reconciler classifies and the final artefact records the classification.
Single Write per stage. Stage 1 is one Write per spawn (spawn-.md). Stage 2 is one Write (M<NNN>-RESEARCH.md). No agent writes outside its scope.
No daemon, no chat-stream. The "discussion" is implemented as one structured second pass that sees all the file evidence, not as a real-time dialogue. Compatible with ADR-0001.
Zero deps. All new code uses node:fs plus the existing lib/frontmatter.cjs. The schema enforcement reuses the output-lint engine from ADR-0017. No new package.json entry.

Considered Options

A: Status quo, ADR-0011 alone. Reject: documented production failures.
B: Schema-only (Stage 1). Add the researcher-output schema; enforce per spawn. Reject as incomplete: fixes same-shape but leaves silent disagreement merges unaddressed.
C: Reconciler-only (Stage 2). Add a second pass without schema enforcement. Reject: reconciler reasoning collapses if inputs are not parseable; groupthink is undetectable without per-entry reasoning fields.
D: Schema + Reconciler + Disagreement-Gate + Reasoning-Trace. Chosen.

Decision Outcome

Chosen: Option D, all four mechanisms in one stack, because:

The schema (Stage 1) is the only way to get honest merge inputs and the only place to enforce the Reasoning-Trace field.
The reconciler (Stage 2) is the user's requested "discussion": file-based, structured, READ-ONLY on per-spawn outputs, single Write on M<NNN>-RESEARCH.md.
The hard-gate is the only way silent disagreement becomes visible: an askuser with three options.
Reasoning-trace classification is what makes the reconciler's verdict trustworthy. Without it, agreement is mere word-overlap; with it, the reconciler distinguishes robust evidence (orthogonal) from groupthink (identical).

Layout

lib/
  researcher-reconciler.cjs       # parseSpawnOutput, classifyReasoningAgreement,
                                  #   reconcileSpawns, prepareReconcilerInput,
                                  #   disagreementGate, gateFromFinalFrontmatter
  researcher-reconciler.test.cjs  # parse, classify, reconcile, gate cases
  schemas/
    researcher-output.cjs         # per-spawn contract (Stage 1)
    research-final.cjs            # reconciler contract (Stage 2)
bin/np-tools/
  researcher-reconcile.cjs        # CLI driver
agents/
  np-researcher.md                # cites the researcher-output schema + Reasoning field
  np-researcher-reconciler.md     # reads all spawns, writes the final RESEARCH.md
workflows/
  research-phase.md               # Stage-1 → lint → merge → Stage-2 → lint → gate

Per-milestone artefacts live in two places. Under .nubos-pilot/milestones/M<NNN>/research/ sit spawn-.md (one per spawn) and merge.md (the deterministic proposal). The final M<NNN>-RESEARCH.md sits one level up, directly in the milestone directory.
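The layout can be sketched as a lookup helper. locateResearchArtefacts is a hypothetical name. It matches spawn files only by the spawn- prefix and .md suffix, without assuming the full filename pattern. It throws plain Error objects carrying the researcher-reconcile-* codes listed under Pattern Conformance, not the real NubosPilotError envelope.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper; milestoneDir is e.g. .nubos-pilot/milestones/M<NNN>.
function locateResearchArtefacts(milestoneDir) {
  const researchDir = path.join(milestoneDir, 'research');
  if (!fs.existsSync(researchDir)) {
    throw Object.assign(new Error(`no research dir: ${researchDir}`), {
      code: 'researcher-reconcile-no-research-dir',
    });
  }
  const spawnFiles = fs
    .readdirSync(researchDir)
    .filter((name) => name.startsWith('spawn-') && name.endsWith('.md'))
    .sort()
    .map((name) => path.join(researchDir, name));
  if (spawnFiles.length === 0) {
    throw Object.assign(new Error(`no spawn files in ${researchDir}`), {
      code: 'researcher-reconcile-no-spawn-files',
    });
  }
  return {
    spawnFiles,
    mergeFile: path.join(researchDir, 'merge.md'),
    // The final artefact sits one level up, named after the milestone dir.
    finalFile: path.join(milestoneDir, `${path.basename(milestoneDir)}-RESEARCH.md`),
  };
}
```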

Per-spawn output contract (`researcher-output` schema)

Each spawn writes spawn-.md with frontmatter schema_version, agent (enum np-researcher), spawn_index, seed_delta, task_query_hash, and the five count fields decision_count / risk_count / pattern_count / open_question_count / source_count. The body carries the sections ## Decisions, ## Risks, ## Patterns, ## Open Questions, ## Sources (empty sections use the _None._ marker). Each ### D-N / ### R-N / ### P-N block must contain a **Reasoning:** field, mechanically required by the schema.
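A sketch of two body-level checks the per-spawn contract implies. First, every ### D-N / ### R-N / ### P-N block must carry **Reasoning:**. Second, the frontmatter count fields should agree with the number of blocks found. checkSpawnBody is hypothetical, and the count cross-check is an assumption about what the schema enforces. The real check runs inside the ADR-0017 output-lint engine.

```javascript
// Hypothetical body check for one spawn output.
function checkSpawnBody(frontmatter, body) {
  const problems = [];
  const counts = { D: 0, R: 0, P: 0 };
  // Each chunk after a "### " heading starts with its id, e.g. "D-1 ...".
  for (const block of body.split(/^### /m).slice(1)) {
    const id = block.split('\n', 1)[0].trim();
    const kind = /^([DRP])-\d+/.exec(id);
    if (!kind) continue; // not a Decision / Risk / Pattern block
    counts[kind[1]] += 1;
    if (!/\*\*Reasoning:\*\*/.test(block)) problems.push(`${id}: missing **Reasoning:**`);
  }
  // Assumed cross-check: frontmatter counts must match the blocks found.
  const fields = { D: 'decision_count', R: 'risk_count', P: 'pattern_count' };
  for (const [letter, field] of Object.entries(fields)) {
    if (Number(frontmatter[field]) !== counts[letter]) {
      problems.push(`${field}=${frontmatter[field]} but found ${counts[letter]} block(s)`);
    }
  }
  return problems;
}
```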

The Reasoning field is the crux. classifyReasoningAgreement compares normalized Reasoning text across the spawns that share a bucket:

identical: same normalized text across all traces → groupthink risk.
overlapping: at least one pair of traces has Jaccard similarity > 0.6 → the default classification.
orthogonal: distinct reasoning, no pair over the Jaccard threshold → strongest signal.
unknown: fewer than two spawns provided Reasoning → cannot promote.
single: fewer than two entries in the bucket.
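The classification rules above can be sketched as code. Several details are assumptions: the input shape (one object per spawn entry in a bucket), the normalisation (lower-casing, punctuation replaced by spaces), the whitespace tokenisation used for Jaccard, and the order in which the cases are checked. Only the five labels and the 0.6 threshold come from the list.

```javascript
// Hypothetical sketch of classifyReasoningAgreement.
const JACCARD_THRESHOLD = 0.6;

function normalize(text) {
  return text.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, ' ').replace(/\s+/g, ' ').trim();
}

// Jaccard similarity of the two token sets.
function jaccard(a, b) {
  const sa = new Set(a.split(' '));
  const sb = new Set(b.split(' '));
  let shared = 0;
  for (const token of sa) if (sb.has(token)) shared += 1;
  const union = sa.size + sb.size - shared;
  return union === 0 ? 1 : shared / union;
}

// entries: [{ spawn_index, reasoning }] for one bucket.
function classifyReasoningAgreement(entries) {
  if (entries.length < 2) return 'single';
  const traces = entries
    .filter((e) => typeof e.reasoning === 'string' && e.reasoning.trim() !== '')
    .map((e) => normalize(e.reasoning));
  if (traces.length < 2) return 'unknown';
  if (traces.every((t) => t === traces[0])) return 'identical';
  for (let i = 0; i < traces.length; i++) {
    for (let j = i + 1; j < traces.length; j++) {
      if (jaccard(traces[i], traces[j]) > JACCARD_THRESHOLD) return 'overlapping';
    }
  }
  return 'orthogonal';
}
```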

Reconciler output contract (`research-final` schema)

np-researcher-reconciler writes M<NNN>-RESEARCH.md with frontmatter schema_version, milestone, type (enum research), agent (enum np-researcher-reconciler), k, agreement_score, contested_count, reconciler_verdict (enum clean | issues_flagged | needs_re_spawn), and the five count fields. The body carries ## Reconciler Summary, ## Final Decisions, ## Contested Decisions (_None._ if all agree), ## Final Risks, ## Final Patterns, ## Final Open Questions, ## Sources.
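A minimal sketch of the frontmatter half of the research-final contract. checkFinalFrontmatter is hypothetical. It assumes the five count fields use the same names as in researcher-output, and it leaves out the body-section checks.

```javascript
// Hypothetical frontmatter check for M<NNN>-RESEARCH.md.
const REQUIRED = [
  'schema_version', 'milestone', 'type', 'agent', 'k', 'agreement_score',
  'contested_count', 'reconciler_verdict', 'decision_count', 'risk_count',
  'pattern_count', 'open_question_count', 'source_count',
];
const ENUMS = {
  type: ['research'],
  agent: ['np-researcher-reconciler'],
  reconciler_verdict: ['clean', 'issues_flagged', 'needs_re_spawn'],
};

function checkFinalFrontmatter(fm) {
  const problems = REQUIRED.filter((key) => !(key in fm)).map((key) => `missing ${key}`);
  for (const [key, allowed] of Object.entries(ENUMS)) {
    if (key in fm && !allowed.includes(fm[key])) {
      problems.push(`${key}=${fm[key]} not in {${allowed.join(', ')}}`);
    }
  }
  return problems;
}
```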

Reconciliation logic

reconcileSpawns buckets every parsed entry by normalized text. A bucket seen by ≥ min(2, k) spawns is consolidated; the rest are contested. Decisions split into final_decisions (consolidated) and contested; risks, open questions, and sources keep both consolidated and contested entries; patterns keep only the consolidated set (intersection semantics, per ADR-0011). The agreement_score for decisions is consolidated / (consolidated + contested), or 1 when there are no decisions.
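The decision half of this logic can be sketched as follows. The sketch assumes each parsed spawn exposes a decisions array of { text } objects and that normalisation means lower-casing plus whitespace collapsing. reconcileDecisions is a hypothetical name, and treating contested_count as the number of contested decision buckets is an assumption.

```javascript
function normalizeText(text) {
  return text.toLowerCase().replace(/\s+/g, ' ').trim();
}

// Hypothetical sketch of the decision half of reconcileSpawns.
function reconcileDecisions(spawns) {
  const threshold = Math.min(2, spawns.length); // min(2, k)
  const buckets = new Map(); // normalized text -> { text, seenBy: Set<spawn_index> }
  for (const spawn of spawns) {
    for (const decision of spawn.decisions) {
      const key = normalizeText(decision.text);
      if (!buckets.has(key)) buckets.set(key, { text: decision.text, seenBy: new Set() });
      buckets.get(key).seenBy.add(spawn.spawn_index);
    }
  }
  const final_decisions = [];
  const contested = [];
  for (const bucket of buckets.values()) {
    (bucket.seenBy.size >= threshold ? final_decisions : contested).push(bucket.text);
  }
  const total = final_decisions.length + contested.length;
  return {
    final_decisions,
    contested,
    contested_count: contested.length,
    // consolidated / (consolidated + contested), or 1 when there are no decisions
    agreement_score: total === 0 ? 1 : final_decisions.length / total,
  };
}
```

With k = 1 the threshold min(2, k) drops to 1, so a lone spawn's decisions are all consolidated and the score is 1 rather than 0.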

Disagreement hard-gate

disagreementGate is keyed on two thresholds (DEFAULTS: min_agreement_score = 0.5, max_contested = 2). It raises needs_askuser when agreement_score < min_agreement_score (violation agreement-score-low) OR contested_count > max_contested (violation too-many-contested). The workflow then presents an askuser with three options:

Re-spawn with a sharper task_query: the user refines the query and re-runs /np:research-phase.
Continue with the reconciler picks: the workflow proceeds; the agreement metrics stay in frontmatter for the plan-checker to weight.
Decide manually: the workflow surfaces each contested decision; the user picks per item.
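The gate condition, sketched with the documented defaults. The return shape ({ needs_askuser, violations }) and the way overrides are merged over DEFAULTS are assumptions.

```javascript
// Hypothetical sketch of disagreementGate.
const DEFAULTS = { min_agreement_score: 0.5, max_contested: 2 };

function disagreementGate({ agreement_score, contested_count }, thresholds = {}) {
  const t = { ...DEFAULTS, ...thresholds };
  const violations = [];
  if (agreement_score < t.min_agreement_score) violations.push('agreement-score-low');
  if (contested_count > t.max_contested) violations.push('too-many-contested');
  return { needs_askuser: violations.length > 0, violations };
}
```

Both comparisons are strict. With the default thresholds, an agreement_score of exactly 0.5 or a contested_count of exactly 2 still passes the gate.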

gateFromFinalFrontmatter runs the same gate against an already-written M<NNN>-RESEARCH.md, reading agreement_score and contested_count straight from its frontmatter. The workflow's post-write step and np:doctor both use it.
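A sketch of the post-write variant, assuming it receives the file's text. lib/frontmatter.cjs is not shown in this ADR, so a hypothetical flat key: value parser stands in for it. The thresholds are inlined to keep the example self-contained. Missing or non-numeric fields throw: NaN fails every comparison, so without that check it would slip through the gate.

```javascript
// Hypothetical stand-in for lib/frontmatter.cjs: flat `key: value` pairs only.
function readFlatFrontmatter(markdown) {
  const match = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!match) return null;
  const fm = {};
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(':');
    if (colon > 0) fm[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return fm;
}

function gateFromFinalFrontmatter(markdown, { min_agreement_score = 0.5, max_contested = 2 } = {}) {
  const fm = readFlatFrontmatter(markdown);
  if (!fm) throw new Error('M<NNN>-RESEARCH.md has no frontmatter');
  const agreement = Number(fm.agreement_score);
  const contested = Number(fm.contested_count);
  // Reject missing fields instead of letting NaN pass every comparison.
  if (!Number.isFinite(agreement) || !Number.isFinite(contested)) {
    throw new Error('agreement_score / contested_count missing or not numeric');
  }
  const violations = [];
  if (agreement < min_agreement_score) violations.push('agreement-score-low');
  if (contested > max_contested) violations.push('too-many-contested');
  return { needs_askuser: violations.length > 0, violations };
}
```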

Cache-bypass interaction

When the Pre-flight learning-cache hits at high similarity, Stages 1–2 are skipped entirely: the cache hit is rendered straight into M<NNN>-RESEARCH.md. The schema enforcement still applies via that rendering path, which emits research-final-conforming frontmatter (reconciler_verdict: clean, agreement_score: 1.0, contested_count: 0, k: 0) and a [CACHED] provenance marker in the body, preserving ADR-0011's provenance semantics.
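A sketch of the cache-hit rendering path. Only the fixed reconciler_verdict, agreement_score, contested_count and k values come from the ADR. Everything else is illustrative: the shape of the cached entry, the schema_version value (1), placing the [CACHED] marker in the Reconciler Summary, and setting agent to np-researcher-reconciler so the enum check passes. renderCachedResearch is a hypothetical name.

```javascript
// Hypothetical sketch: render a cache hit as research-final-conforming markdown.
function renderCachedResearch(milestone, cached) {
  const count = (xs) => (Array.isArray(xs) ? xs.length : 0);
  const fm = {
    schema_version: 1,
    milestone,
    type: 'research',
    agent: 'np-researcher-reconciler',
    k: 0,
    agreement_score: 1.0,
    contested_count: 0,
    reconciler_verdict: 'clean',
    decision_count: count(cached.decisions),
    risk_count: count(cached.risks),
    pattern_count: count(cached.patterns),
    open_question_count: count(cached.open_questions),
    source_count: count(cached.sources),
  };
  // Empty sections use the _None._ marker.
  const list = (xs) => (count(xs) ? xs.map((x) => `- ${x}`).join('\n') : '_None._');
  const body = [
    '## Reconciler Summary', '', '[CACHED] Rendered from the learning cache; Stages 1-2 skipped.', '',
    '## Final Decisions', '', list(cached.decisions), '',
    '## Contested Decisions', '', '_None._', '',
    '## Final Risks', '', list(cached.risks), '',
    '## Final Patterns', '', list(cached.patterns), '',
    '## Final Open Questions', '', list(cached.open_questions), '',
    '## Sources', '', list(cached.sources), '',
  ].join('\n');
  const yaml = Object.entries(fm).map(([key, value]) => `${key}: ${value}`).join('\n');
  return `---\n${yaml}\n---\n\n${body}`;
}
```

JavaScript prints the number 1.0 as 1, which a YAML reader parses as the same value.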

Consequences

Good, because real-world drift fails at the producing spawn, not at merge or at a downstream consumer.
Good, because disagreement is visible: users see the askuser dialog instead of a silently-merged inconsistent RESEARCH.md.
Good, because groupthink is detectable: reasoning-trace classification flags identical-evidence consensus as weak; the planner gets a calibrated signal.
Good, because contested decisions are first-class: they get their own section with per-spawn verdicts; contested_count is machine-readable in frontmatter.
Good, because per-spawn outputs persist under research/spawn-.md, an audit surface for retrospectives.
Bad, because one additional spawn per research phase (the reconciler) is a real cost increment. Mitigated by cache-bypass when applicable.
Bad, because two new schemas (researcher-output, research-final) must evolve in lockstep with the agent prompts. Test fixtures cover the common shapes.
Bad, because the reconciler prompt is bigger: it sees all k per-spawn outputs plus the merge proposal. Mitigated by running the reconciler at a lower tier than the researchers.
Neutral, because the deterministic merge stays in the loop. It is now the proposal, not the final answer; its agreement score is one input to the reconciler's verdict and to the gate.

Pattern Conformance

S-2 NubosPilotError envelope — researcher-spawn-missing, researcher-spawn-frontmatter, researcher-reconcile-no-research-dir, researcher-reconcile-no-spawn-files.
S-5 sandboxed tests — lib/researcher-reconciler.test.cjs uses in-memory spawn fixtures; no shared state.
S-6 CJS module footer — lib/researcher-reconciler.cjs and both new lib/schemas/*.cjs end with module.exports.

Migration plan

Land the schemas, lib/researcher-reconciler.cjs, the CLI, the agent, and the workflow updates.
Existing in-flight research phases (old single-spawn merged RESEARCH.md, no per-spawn artefacts) remain valid. Schema enforcement applies to new spawns only.
np:doctor surfaces old RESEARCH.md files lacking the new frontmatter as output-schema-violation (covered by ADR-0017).
min_agreement_score and max_contested are exposed via the researcher-reconcile CLI flags; persistent overrides via .nubos-pilot/config.json come in a follow-up if real-world data shows the defaults need tuning.

More Information

Library: lib/researcher-reconciler.cjs, tests lib/researcher-reconciler.test.cjs.
Schemas: lib/schemas/researcher-output.cjs, lib/schemas/research-final.cjs (registered in lib/schemas/index.cjs).
CLI verb: bin/np-tools/researcher-reconcile.cjs.
Agents: agents/np-researcher.md (updated), agents/np-researcher-reconciler.md (new).
Workflow: workflows/research-phase.md — Stage-1 spawn → per-spawn lint → deterministic merge → reconciler spawn → research-final lint → disagreement gate.
Related ADRs:
- ADR-0011: amended, so the deterministic consensus is now the proposal, not the final answer.
- ADR-0017: the researcher-output and research-final schemas run on its validator engine.
- ADR-0001: the "discussion" is a structured file-based second pass, not a live dialogue.
- ADR-0002: preserved; pure Node built-ins.

Origin: user feedback on 2026-05-12, translated from German: "Every researcher in the swarm does something different … Ideally they would also discuss it with one another." The feedback came after running real research phases under ADR-0011. The root cause split into four mechanisms. All four are addressed in this single ADR to avoid leaving half-fixes that reintroduce the same drift.
