Skip to content

ADR-0011: Researcher-Schwarm with Deterministic Consensus Merge

Context and Problem Statement

np-researcher is a single-agent spawn that produces RESEARCH.md for a milestone. A single LLM research pass has two well-documented failure modes:

  1. Halluzination with confidence: the agent commits to a library version, an architectural pattern, or a "best practice" that is wrong, and presents it with high confidence because no other voice contradicts it.
  2. Group-think on pre-existing knowledge: the agent retrieves what it already "knows" and skips searching, especially when the topic feels familiar.

Both fail silently: the planner consumes the bad RESEARCH.md and writes a milestone plan around the wrong claim. The plan-checker can spot inconsistencies but cannot validate factual claims; it has the same training data as the researcher.

The Nubos AI agent-harness brief specifies the cure: spawn k=3 independent researchers in parallel, then merge their outputs deterministically. Mehrheit for decisions; Union for risks; Schnittmenge for patterns. Independence is enforced by withholding the swarm-membership fact from each spawn, so each researcher believes itself the sole spawn.

The Rule

np:research-phase and np:plan-phase --research invoke lib/researcher-swarm.cjs spawnSwarm({ k }). The default is k=3. The orchestrator:

  1. Pre-flight — call lib/learnings.cjs::matchExistingLearning (via lib/knowledge-adapter.cjs) against .nubos-pilot/knowledge/learnings.json. If a hit at similarity ≥ swarm.research.threshold (default 0.9) and occurrence ≥ swarm.research.minOccurrence (default 3) exists, the cached pattern is rendered as RESEARCH.md with provenance [CACHED] and the swarm is bypassed.
  2. Parallel Spawn — spawn k np-researcher instances in parallel; each receives the same Ticket / CONTEXT / Stack input plus a seed_delta field that nudges its prompt without disclosing the swarm. Each spawn produces a structured output matching the existing RESEARCH.md schema plus a <consensus_meta> block carrying seed_delta and agreement_score.
  3. MergemergeConsensus(outputs) runs three deterministic merge rules over the parsed outputs:
    • Decisions: Mehrheit, at least ⌈k/2⌉ outputs agree on the same prescriptive decision (library, version, pattern). Disagreement → FLAGGED with all candidates listed for plan-checker review.
    • Risks: Union, every risk surfaced by any spawn enters the merged list, deduplicated by semantic-fingerprint normalisation.
    • Patterns: Schnittmenge, only patterns mentioned by ≥ 2 spawns enter the merged output. Solo-spawn patterns are demoted to [ASSUMED] in the merged output.
    • Open Questions / Sources: Union with credibility = max(seen).
  4. Render — the merged output is written to <milestone_dir>/<milestone>-RESEARCH.md with a <consensus_meta> block listing k, agreement_score, flagged_decisions[], and per-spawn seed_delta[] for audit.

agreement_score is the mean of per-decision agreement ratios across all merged decisions: for each decision bucket, compute min(1, agreement / k), then average over all buckets. 1.0 means every decision was unanimous; 1/k means every decision was solo-held. The plan-checker reads this as a single quality scalar; low agreement triggers extra scrutiny on flagged_decisions[].

The k-of-1 case (swarm.research.k = 1) is supported and degrades gracefully to the legacy single-spawn behaviour. k > 5 is rejected by the library: beyond 5 the cost outweighs the marginal information gain.

Decision Drivers

  • Halluzination Resistance: three independent spawns are unlikely to halluzinate the same wrong fact. Disagreement is the signal that triggers Plan-Checker review.
  • Determinism: same input + same k + same merge rules produce the same merged output. Not stochastic.
  • Bounded Cost: k=3 triples the research-token cost over single-spawn but is bounded; the cache-bypass at Pre-flight reclaims most of the cost as the project ages.
  • Independence: each spawn does not know about the others. No shared scratchpad. No iterative refinement. That is the property that prevents group-think.

Considered Options

  • Single-spawn Researcher: current behaviour. Rejected. Hallucinates with confidence; no cross-check.
  • Sequential refinement (Researcher A → B → C, each refining the prior): Rejected. Group-think; later spawns anchor on the first.
  • k=3 with consensus voting on every finding: Rejected. Risks are not opinions; voting drops valid solo findings. Use Union with prefix-tagging instead.
  • k=3 with Mehrheit / Union / Schnittmenge merge: chosen.

Decision Outcome

Chosen: k=3 parallel researchers with the Mehrheit / Union / Schnittmenge merge rules, because it produces a deterministic, halluzination-resistant RESEARCH.md whose <consensus_meta> block is a first-class plan-checker input.

Cache Adapter

The cache lookup goes through lib/knowledge-adapter.cjs. The local adapter (BM25 index over .nubos-pilot/knowledge/) is the only one shipped. The swarm.knowledge_adapter seam keeps the door open for additional adapters without touching the swarm logic.

Consequences

  • Good, because every prescriptive decision in RESEARCH.md carries explicit Mehrheit-vs-FLAGGED provenance.
  • Good, because the cache short-circuits the swarm for repeat patterns. Token cost decays asymptotically.
  • Good, because <consensus_meta> is a deterministic plan-checker input, not a guess.
  • Bad, because k=3 triples first-time research cost. Accepted; cache reclaim + Rule 9 (Search before building) make the trade-off favourable.
  • Bad, because spawn coordination requires the orchestrator to hold three concurrent agent contexts. Accepted; this is exactly the kind of orchestration ADR-0001 frames as in-session.

More Information