ADR-0010: Nubosloop — Build / Verify / Critic / Route

Context and Problem Statement

np:execute-phase historically runs each task as a single Executor → Verifier hop. That is too generous to the Executor: the only adversarial check happens after the commit is in flight, and the verifier audits at milestone scope, not at task scope. Two failure modes follow:

  1. Single-blink failures: the Executor produces code that compiles and verifies in isolation but violates conventions, misses acceptance criteria, or silently introduces dangling threads. There is no per-task adversarial reviewer to catch this before the commit.
  2. No Build → Critic loop: once a task verifies green, the workflow advances. There is no recovery path inside the same task that closes critic findings without rolling the whole milestone back.

The Nubos AI whitepaper (v6.0) and the agent-harness brief specify the cure: every task runs through a Nubosloop. The Executor builds, mechanical checks run, a Critic reviews along three orthogonal axes, findings are routed back to the right agent, and the loop iterates until the Critic reports zero findings or the loop hits a stuck threshold.

The Rule

Every task in np:execute-phase runs through a 6-step Nubosloop. The loop terminates only on (a) zero Critic findings followed by an Atomic Commit, or (b) the orchestrator-enforced maxRounds cap, in which case the task transitions to stuck state and the orchestrator escalates via askuser.

The loop's six steps are:

  1. Pre-flight: lib/learnings.cjs::matchExistingLearning (reached via lib/knowledge-adapter.cjs) checks for cached patterns at similarity ≥ swarm.research.threshold (default 0.9) with occurrence ≥ swarm.research.minOccurrence (default 3). A hit short-circuits the Researcher-Schwarm. Note: lib/knowledge.cjs is a separate BM25 index over project Markdown, a different concern and a different store.
  2. Researcher-Schwarm (on demand): when no cached pattern exists, the orchestrator spawns swarm.research.k=3 independent np-researcher agents and merges their outputs through lib/researcher-swarm.cjs (majority for decisions, union for risks, intersection for patterns). The merged consensus enters the Executor's prompt.
  3. Executor (or Build-Fixer on Round ≥ 2): single np-executor spawn writes code in scope. Round 2+ uses np-build-fixer with the prior Critic findings and verify output appended to its prompt.
  4. Mechanical Checks: the orchestrator runs the task's verify command and stack-specific linters (phpstan / pint / tsc / etc.), then forwards each spawn's tool-use log to node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use --agent <name> --tool-use-log <json>. The audit (lib/nubosloop.cjs::auditToolUse) emits a rule-9-violation finding when an np-researcher/np-executor/np-build-fixer spawn shipped without invoking any of SEARCH_TOOLS (search-knowledge, match-existing-learning, …). Red verify or any rule-9-violation loops back to Step 3.
  5. Critic: one np-critic agent (sonnet) spawns and emits a single structured findings JSON covering all three axes: style (naming, conventions, dead code, dangling imports), tests (coverage, edge cases, assertion quality), and acceptance (success_criteria from the task plan satisfied with concrete evidence). The full findings JSON is written to disk by the agent; its final-message envelope is small. See §Single-Critic Revision and §Cost Layer L5.
  6. Route + Loop or Commit: lib/nubosloop.cjs runs the routing engine over the merged Critic findings:
    • Style / Bug findings → Executor (np-build-fixer on Round ≥ 2).
    • Information-missing findings → Researcher-Schwarm (next round).
    • Customer-facing question findings → askuser (Temporal-style signal-wait when integrated).
     Loop until zero findings (→ Atomic Commit per ADR-0004) or loop.maxRounds=3 (→ stuck, escalate, write STATE.md stuck marker).
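
The Step 2 merge rules (majority for decisions, union for risks, intersection for patterns) can be sketched as below. This is a minimal illustration, not the code in lib/researcher-swarm.cjs: the function name mergeResearcherOutputs and the output shape ({ decision, risks, patterns }) are assumptions, and returning null when no decision reaches a majority is a guess at tie handling.

```javascript
// Hypothetical sketch of the Step 2 merge; names and field shapes are assumed.
function mergeResearcherOutputs(outputs) {
  // Majority vote: a decision wins only with more than half of the votes.
  const votes = new Map();
  for (const o of outputs) {
    votes.set(o.decision, (votes.get(o.decision) || 0) + 1);
  }
  let decision = null;
  for (const [candidate, count] of votes) {
    if (count > outputs.length / 2) decision = candidate;
  }

  // Union of risks: keep anything any researcher raised.
  const risks = [...new Set(outputs.flatMap((o) => o.risks))];

  // Intersection of patterns: keep only what every researcher reported.
  const patterns = outputs.length === 0
    ? []
    : [...new Set(outputs[0].patterns)].filter((p) =>
        outputs.every((o) => o.patterns.includes(p)));

  return { decision, risks, patterns };
}

const merged = mergeResearcherOutputs([
  { decision: 'use-queue', risks: ['retry-storm'], patterns: ['repository', 'dto'] },
  { decision: 'use-queue', risks: ['lock-contention'], patterns: ['repository'] },
  { decision: 'use-cron', risks: ['retry-storm'], patterns: ['repository', 'dto'] },
]);
console.log(merged);
```

The asymmetry is deliberate in the rule itself: a risk raised once is kept, while a pattern must be confirmed by every researcher before it reaches the Executor's prompt.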

Auto-log-learning runs on Commit: the merged consensus + the Executor's final diff are persisted via lib/learnings.cjs (through lib/knowledge-adapter.cjs) so future similar tasks bypass the swarm via Step 1.
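
The Step 1 gate that decides whether a cached pattern may bypass the swarm reduces to two threshold comparisons. The sketch below assumes a candidate object with similarity and occurrence fields; that shape and the name isCacheHit are illustrative, not the signature of matchExistingLearning.

```javascript
// Hypothetical sketch of the Pre-flight gate; the candidate shape is assumed.
const PREFLIGHT_DEFAULTS = { threshold: 0.9, minOccurrence: 3 };

function isCacheHit(candidate, cfg = PREFLIGHT_DEFAULTS) {
  if (!candidate) return false;
  // Both conditions must hold: similar enough AND seen often enough.
  return candidate.similarity >= cfg.threshold
    && candidate.occurrence >= cfg.minOccurrence;
}

console.log(isCacheHit({ similarity: 0.93, occurrence: 4 })); // true
console.log(isCacheHit({ similarity: 0.93, occurrence: 2 })); // false
console.log(isCacheHit({ similarity: 0.85, occurrence: 9 })); // false
```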

Decision Drivers

  • Mechanically enforced completeness: Rule 5 / Rule 10 / Rule 12 of the Completeness Doctrine require evidence-backed verdicts and "no silent downgrades". The 3-round loop with stuck escalation enforces this.
  • Orthogonal review axes: three single-axis personas in one critic prompt, surfaced as one structured JSON. Routing is driven by the union of findings + per-criterion verdicts.
  • Determinism: the routing engine maps finding categories to next-spawn destinations by lookup table, not heuristic. The same finding in two different tasks routes the same way.
  • Cost control: the Pre-flight cache bypass, the loop.maxRounds=3 cap, and the L5 Verdict-Only Contract keep the loop's token budget bounded.
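
The determinism driver amounts to routing by table lookup. A minimal sketch, assuming category strings such as 'style' and 'information-missing' (the ADR fixes the destinations, not these exact names) and assuming the loop declares stuck once the current round reaches maxRounds with findings still open:

```javascript
// Hypothetical category names; only the destinations come from the ADR.
const ROUTES = new Map([
  ['style', 'executor'],
  ['bug', 'executor'],
  ['information-missing', 'researcher'],
  ['customer-question', 'askuser'],
]);

// nextRound is the round in which the routed spawn will run.
function routeFinding(finding, nextRound) {
  const destination = ROUTES.get(finding.category);
  if (destination === undefined) {
    throw new Error(`unroutable finding category: ${finding.category}`);
  }
  if (destination !== 'executor') return destination;
  // From Round 2 onward the Build-Fixer stands in for the Executor.
  return nextRound >= 2 ? 'np-build-fixer' : 'np-executor';
}

// Loop termination: zero findings commits; hitting the cap escalates.
function loopOutcome(findings, round, maxRounds = 3) {
  if (findings.length === 0) return 'commit';
  if (round >= maxRounds) return 'stuck';
  return 'loop';
}

console.log(routeFinding({ category: 'bug' }, 2)); // np-build-fixer
console.log(loopOutcome([], 1)); // commit
console.log(loopOutcome([{ category: 'style' }], 3)); // stuck
```

Because the table has no fallback branch, an unknown category fails loudly instead of being routed by guesswork.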

Considered Options

  • Single-pass Executor → Verifier: historical behaviour. Rejected. Insufficient adversarial pressure per task; relies on milestone-scope Verifier to catch task-scope flaws.
  • 3-Critic Schwarm with consensus voting: three Critics vote on each finding, majority wins. Rejected. Findings are not opinions; a single Critic's true positive is still a true positive even if the others miss it. We use union-with-priority instead.
  • 3-Critic Schwarm with axis split (np-critic-style haiku, np-critic-tests sonnet, np-critic-acceptance sonnet, parallel): chosen 2026-05-03, superseded 2026-05-05. Production runs against a 46-task milestone showed three parallel spawns added latency without proportional finding-quality gains; the haiku style-critic stalled at the 600s watchdog repeatedly; most findings overlapped across axes and were deduplicated anyway.
  • Single-Critic with three-axis prompt + Verdict-Only Contract: chosen 2026-05-05.
  • Nubosloop with maxRounds=3: chosen.

Decision Outcome

Chosen: Nubosloop with maxRounds=3 and one np-critic agent (sonnet) per round that covers all three audit axes via three module files (agents/np-critic-{style,tests,acceptance}.md, module: true, not spawnable). The loop provides per-task adversarial review, deterministic routing, bounded cost, and a clean escalation path when it cannot converge.

Defaults

  • loop.maxRounds = 3 (configurable in .nubos-pilot/config.json, range [1, 100]).
  • swarm.research.k = 3, swarm.research.threshold = 0.9, swarm.research.minOccurrence = 3.
  • swarm.critic.tier = sonnet (replaces the deprecated style_tier / tests_tier / acceptance_tier triple; legacy keys still honoured by resolve-model for back-compat).
  • spawn.headless.enabled = false, spawn.headless.agents = ['np-critic', 'np-researcher'] (see §L6).
  • auto_log_learning = true.
  • Knowledge source: .nubos-pilot/knowledge/, resolved through the local knowledge adapter (lib/knowledge-adapter.cjs).
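
As an illustration of how the loop.maxRounds default and its [1, 100] range might be enforced when reading .nubos-pilot/config.json, consider the sketch below. The function name resolveMaxRounds, the nested object layout (mirroring the dotted key), and the integer requirement are assumptions.

```javascript
// Hypothetical resolver; assumes the dotted key maps to nested objects.
function resolveMaxRounds(config) {
  const value = config?.loop?.maxRounds ?? 3; // documented default
  if (!Number.isInteger(value) || value < 1 || value > 100) {
    throw new RangeError(`loop.maxRounds must be an integer in [1, 100], got ${value}`);
  }
  return value;
}

console.log(resolveMaxRounds({})); // 3
console.log(resolveMaxRounds({ loop: { maxRounds: 8 } })); // 8
```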

Failure Mode (amended 2026-05-05)

When loop.maxRounds is hit, the doctrine is "this task may be mis-planned, discuss with the user", not "give up". The orchestrator's stuck handler in execute-phase.md calls askuser with four concrete options:

  1. Continue (+5 rounds): the loop cap is raised by 5 in flight, giving the Critic more attempts. Persisted as nubosloop.max_rounds_override (T3) so that the operator's decision survives a crash followed by /np:resume-work.
  2. Re-plan the task (plan-checker): the task is marked as a plan bug, plan-checker is invoked, PLAN.md is corrected, and the task is restarted.
  3. Mark the task as stuck: the task is persisted as stuck in STATE.md and the wave is aborted.
  4. Fix manually, then resume: the workflow pauses here. The operator edits the code or plan and invokes /np:execute-phase again.

next_action=plan-checker (locked-decision-violation or infrastructure-mismatch) gets the same askuser treatment with three recovery options (plan-checker re-run, stuck, manual-fix).

Consequences

  • Good, because every task carries an in-loop adversarial check before commit. Rule 5 / Rule 10 / Rule 12 are mechanically enforceable.
  • Good, because cached patterns short-circuit the swarm, so token cost decays as the project ages.
  • Good, because routing is deterministic; the same finding category always lands in the same agent's queue.
  • Good, because stuck is a first-class state, not a silent downgrade.
  • Good, because the L5 Verdict-Only Contract drops critic-axis token cost by ~95% without weakening any audit invariant.
  • Bad, because per-task token cost grows compared to the single-pass model. Accepted: that cost is the price of completeness, and the cache + cap + L5 bound it.
  • Bad, because the orchestrator must coordinate 1 Executor + 1 Critic + occasional Researcher-Schwarm per task. Accepted: that coordination is what makes per-task adversarial review possible.

Single-Critic Revision (amended 2026-05-05)

Earlier revisions specified a Critic-Schwarm of three parallel single-axis critics. Production runs showed three parallel spawns added latency without proportional finding-quality gains. Effective 2026-05-05, the schwarm collapses to one np-critic agent (sonnet) per round that covers all three axes (style + tests + acceptance) in a single structured JSON output.

Modular Critic — four files, one spawn

The single-critic refactor preserves the original three audit-surface specifications as modules (not spawnable agents):

  • agents/np-critic.md (spawnable, sonnet): thin entry point. Defines role, output schema, completeness mandate, Trust-Layer audit.
  • agents/np-critic-style.md (module: true, haiku-tier metadata): Style-axis audit surface.
  • agents/np-critic-tests.md (module: true, sonnet-tier metadata): Tests-axis audit surface.
  • agents/np-critic-acceptance.md (module: true, sonnet-tier metadata): Acceptance-axis audit surface.

The orchestrator's spawn for np-critic MUST inject all three modules into the spawn's <files_to_read> block. The critic loads them via Read, treats their content as canonical audit-truth, and emits ONE merged findings JSON.

The module: true frontmatter flag distinguishes audit-surface modules from spawnable agents. lib/agents.cjs::loadAgent rejects module files with agent-not-spawnable; loadAgentModule rejects non-module files with agent-not-a-module. Tests AG-26..AG-32 enforce the split. loop-audit-tool-use --agent <module-name> is also rejected; modules cannot be audited as if they were spawnable.
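
The spawnable/module split can be pictured as two loaders that each reject the other kind. The sketch below uses an in-memory table in place of parsed frontmatter; the table, the helper codedError, and the agent-not-found code are hypothetical, while agent-not-spawnable and agent-not-a-module are the codes named above.

```javascript
// Hypothetical stand-in for frontmatter parsed from agents/*.md.
const FRONTMATTER = new Map([
  ['np-critic', { module: false }],
  ['np-critic-style', { module: true }],
  ['np-critic-tests', { module: true }],
  ['np-critic-acceptance', { module: true }],
]);

function codedError(code) {
  const err = new Error(code);
  err.code = code;
  return err;
}

function lookup(name) {
  const meta = FRONTMATTER.get(name);
  if (!meta) throw codedError('agent-not-found'); // hypothetical code
  return meta;
}

// Spawnable agents only: module files are rejected.
function loadAgent(name) {
  const meta = lookup(name);
  if (meta.module) throw codedError('agent-not-spawnable');
  return { name, ...meta };
}

// Audit-surface modules only: spawnable agents are rejected.
function loadAgentModule(name) {
  const meta = lookup(name);
  if (!meta.module) throw codedError('agent-not-a-module');
  return { name, ...meta };
}

console.log(loadAgent('np-critic').name); // np-critic
console.log(loadAgentModule('np-critic-style').name); // np-critic-style
```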

Trust Layer (amended 2026-05-04)

The original spec assumed a cooperative orchestrator: each loop-run-round --phase X call was treated as evidence that the corresponding work happened. Multiple production runs proved that assumption wrong. Three failure modes were observed in the wild:

  1. Single-pass bypass: executor → commit-task directly, skipping the loop. (Closed by commit-task Layer-A gate.)
  2. Stamp bypass: loop-run-round --phase commit invoked directly without prior phases. (Closed by Layer-B precondition in _runCommit.)
  3. Synthetic-evidence bypass: orchestrator invokes every loop-run-round phase but with hand-written --critic-outputs '[…]' JSON, never actually spawning the critic agent. (Closed by Layer-C audit-trail gate.)

Layer-C — Spawn-evidence audit-trail

Each LLM spawn (researcher, executor, build-fixer, critic) MUST be stamped into the per-task nubosloop.tool_use_audit log via loop-audit-tool-use --task-id … --agent <name> --tool-use-log <json>. The round number is sourced automatically from nubosloop.round.

Phase verbs that consult the log:

  • loop-run-round --phase post-researcher requires swarm.research.k np-researcher audit entries for the current round (k-of-k gate; default k=3). Refuses with loop-post-researcher-missing-spawn-audit otherwise.
  • loop-run-round --phase post-executor requires an audit entry for np-executor (round 1) or np-build-fixer (round ≥ 2). Refuses with loop-post-executor-missing-spawn-audit.
  • loop-run-round --phase post-critics requires an audit entry for np-critic in the current round (single-critic, not the legacy three). Refuses with loop-post-critics-missing-critic-audit.
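
The three gates above share one shape: count the audit entries for an agent in the current round and refuse if too few exist. A minimal sketch, assuming audit entries of the form { agent, round } and result objects of the form { ok, code }; only the refusal codes and agent names are taken from this ADR.

```javascript
// Hypothetical audit-entry shape: { agent, round }.
function countAudits(auditLog, agent, round) {
  return auditLog.filter((e) => e.agent === agent && e.round === round).length;
}

function gate(ok, code) {
  return ok ? { ok: true } : { ok: false, code };
}

// k-of-k: every researcher of the swarm must have been audited this round.
function checkPostResearcher(auditLog, round, k = 3) {
  return gate(countAudits(auditLog, 'np-researcher', round) >= k,
    'loop-post-researcher-missing-spawn-audit');
}

// Round 1 expects the Executor; later rounds expect the Build-Fixer.
function checkPostExecutor(auditLog, round) {
  const agent = round >= 2 ? 'np-build-fixer' : 'np-executor';
  return gate(countAudits(auditLog, agent, round) >= 1,
    'loop-post-executor-missing-spawn-audit');
}

function checkPostCritics(auditLog, round) {
  return gate(countAudits(auditLog, 'np-critic', round) >= 1,
    'loop-post-critics-missing-critic-audit');
}

const log = [
  { agent: 'np-researcher', round: 1 },
  { agent: 'np-researcher', round: 1 },
  { agent: 'np-executor', round: 1 },
];
console.log(checkPostResearcher(log, 1)); // refused: only 2 of 3 researcher audits
console.log(checkPostExecutor(log, 1)); // { ok: true }
```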

All four phases (post-researcher, post-executor, post-critics, commit) accept explicit overrides (--force-post-* / --force-commit-phase) for legitimate test fixtures and migration. Each override stamps a corresponding flag on the checkpoint so dashboards can count them.

Round-scope of audits. Layer-C matches agent + round exactly. The checkpoint round advances mechanically:

  • _runPostExecutor stamps round + 1 on verify-red.
  • _runPostCritics stamps round + 1 when next_action ∈ {executor, researcher, askuser}.
  • next_action ∈ {commit, plan-checker, stuck} does NOT advance the round.
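
The round-advance rules can be written as two small pure functions. This sketch assumes that "verify-red" means a non-zero verify exit code; the function names are illustrative and do not mirror _runPostExecutor or _runPostCritics.

```javascript
// next_action values that bump the checkpoint round after the critic phase.
const ADVANCING_ACTIONS = new Set(['executor', 'researcher', 'askuser']);

// Assumption: verify-red is any non-zero exit code.
function roundAfterPostExecutor(round, verifyExitCode) {
  return verifyExitCode !== 0 ? round + 1 : round;
}

function roundAfterPostCritics(round, nextAction) {
  return ADVANCING_ACTIONS.has(nextAction) ? round + 1 : round;
}

console.log(roundAfterPostExecutor(1, 1)); // 2
console.log(roundAfterPostCritics(2, 'askuser')); // 3
console.log(roundAfterPostCritics(2, 'commit')); // 2
```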

Cache-hit auto-log skip (L4). _runCommit skips autoLogLearning when (a) the checkpoint carries cache_hit: true, or (b) --learning-pattern is a sentinel placeholder (<...>), or (c) --learning-pattern is empty.
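
The three skip conditions combine into a single predicate. In this sketch the sentinel is assumed to be any value wrapped entirely in angle brackets, and whitespace is trimmed before the check; both details, and the name shouldSkipAutoLog, are assumptions.

```javascript
// Hypothetical predicate for the L4 skip; sentinel shape is assumed.
function shouldSkipAutoLog(checkpoint, learningPattern) {
  if (checkpoint.cache_hit === true) return true; // (a) nothing new was learned
  const pattern = (learningPattern ?? '').trim();
  if (pattern === '') return true; // (c) empty pattern
  return /^<.*>$/.test(pattern); // (b) placeholder such as <...>
}

console.log(shouldSkipAutoLog({ cache_hit: true }, 'retry with backoff')); // true
console.log(shouldSkipAutoLog({}, '<...>')); // true
console.log(shouldSkipAutoLog({}, 'retry with backoff')); // false
```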

max_rounds_override (T3). The operator's "+5 rounds" ("+5 Runden") choice persists nubosloop.max_rounds_override so that the decision survives a crash and the subsequent /np:resume-work. _runCommit clears the override unconditionally; _runStuck clears it only when --reason ∈ {user-requested-replan, manual-fix-pending}.

Rule-9 audit carry-forward (Gap #1). A Rule-9 violation stamped during a verify-red round is now carried forward by auditFindingsForRound until consumed; markAuditsRoutedForRound stamps routed_in_round after consumption to keep idempotency.

askuser advances the round (Gap #2). next_action=askuser bumps the checkpoint round so the post-reply executor re-spawn must produce a fresh round-N+1 audit.

Defense-in-depth summary

| Layer | Where | What it proves | Bypass cost |
|---|---|---|---|
| A | commit-task.cjs | The full sequence signature is on the checkpoint | Lie at all five evidence fields |
| B | _runCommit | Verify-green AND a post-critics findings array preceded the commit phase | Pre-write fake verify_exit_code=0 and findings: [] to the checkpoint manually |
| C | _runPostExecutor + _runPostCritics + _runPostResearcher | Each declared spawn appears in the per-round audit log; researcher gate requires swarm.research.k entries | Issue extra loop-audit-tool-use calls naming agents that didn't actually run (one bash line per missing agent) |

What the Trust Layer cannot prove

Layer C still cannot prove that the agent named in an audit entry actually ran. The orchestrator could call loop-audit-tool-use --agent np-critic … without spawning the critic. Closing this gap requires runtime instrumentation; that is "Stufe 2" (Stage 2) and is tracked separately.

Cost Layer (added 2026-05-05)

The Trust Layer raises the price of dishonesty; the Cost Layer lowers the price of honesty. Two failure modes were observed alongside the Trust Layer rollout:

  1. Verbose critic returns dominate the per-round token bill. The critic's structured findings JSON (criteria + findings + per-finding remediation prose) routinely runs 2–5 kB. Returning it as the spawn's final message replays it into the parent context every round.
  2. Sub-agent "context isolation" is not context offloading. The runtime's native Agent tool isolates the child's context window, but the agent's final message lands verbatim in the parent history.

The Cost Layer addresses both without weakening the Trust Layer: spawn-evidence auditing is unchanged, the routing engine is unchanged, only the transport of critic/researcher output between child and parent contexts changes.

Layer L5 — Verdict-Only Critic Contract

Critics now emit their full findings JSON to a path the orchestrator hands them in the spawn prompt (<report_path>, typically ${TMPDIR}/nubos-pilot/critic-reports/critic-<task-id>-r<round>.json). The spawn's final message, the artefact that lands in parent context, is a small envelope:

```json
{ "critic": "critic", "task_id": "M001-S001-T0001", "round": 1,
  "verdict": "passed | issues_found", "blockers_count": 0,
  "report_path": "...", "run_id": "..." }
```

bin/np-tools/loop-run-round.cjs::_runPostCritics accepts a new --critic-outputs-path <file> flag that reads the on-disk findings JSON directly. Inline --critic-outputs <json> remains accepted (legacy fallback for runtimes without Write capability and migration fixtures), but exactly one of the two MUST be passed; both at once is loop-run-round-post-critics-conflicting-outputs. _runStuck gets the symmetric --findings-path for the stuck-with-findings escalation paths.
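
The "exactly one of the two" rule for --critic-outputs and --critic-outputs-path is a standard mutually-exclusive-flag check. A sketch under stated assumptions: flags arrive as a plain object keyed by flag name, and the code used when neither flag is present (loop-run-round-post-critics-missing-outputs) is invented for illustration; only the conflicting-outputs code appears in this ADR.

```javascript
function codedError(code) {
  const err = new Error(code);
  err.code = code;
  return err;
}

// Hypothetical flag resolver for the post-critics phase.
function selectCriticOutputsSource(flags) {
  const hasInline = flags['critic-outputs'] !== undefined;
  const hasPath = flags['critic-outputs-path'] !== undefined;
  if (hasInline && hasPath) {
    throw codedError('loop-run-round-post-critics-conflicting-outputs');
  }
  if (!hasInline && !hasPath) {
    // Hypothetical code: the ADR does not name the "neither" case.
    throw codedError('loop-run-round-post-critics-missing-outputs');
  }
  return hasPath
    ? { kind: 'path', value: flags['critic-outputs-path'] }
    : { kind: 'inline', value: flags['critic-outputs'] };
}

console.log(selectCriticOutputsSource({ 'critic-outputs-path': '/tmp/report.json' }));
```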

The critic's tools frontmatter gains Write. Write is only permitted on the orchestrator-supplied <report_path>. Touching anything else is a Layer-A bypass class.

Failure modes:

  • <report_path> missing in prompt OR Write fails → envelope sets report_path: null, verdict: "issues_found", blockers_count: 1, with an error field. Routing engine treats this as critic-error → stuck.
  • File written but unreadable / not valid JSON / shape mismatch → typed loop-run-round-critic-outputs-path-{unreadable,invalid-json,invalid-shape} codes.
  • Inline-JSON-with-no-file fallback is still routable.
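
The typed failure codes for an on-disk report map onto three successive checks: read, parse, validate. The sketch below injects the file reader so it stays self-contained, and it assumes the minimal valid shape is an object carrying a findings array; the real shape check is not specified here.

```javascript
const CODE_PREFIX = 'loop-run-round-critic-outputs-path-';

// readText is injected; the shape check (object with a findings array) is assumed.
function loadCriticReport(path, readText) {
  let raw;
  try {
    raw = readText(path);
  } catch {
    return { ok: false, code: CODE_PREFIX + 'unreadable' };
  }
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, code: CODE_PREFIX + 'invalid-json' };
  }
  if (parsed === null || typeof parsed !== 'object' || !Array.isArray(parsed.findings)) {
    return { ok: false, code: CODE_PREFIX + 'invalid-shape' };
  }
  return { ok: true, report: parsed };
}

// In-memory reader standing in for the file system.
const files = { '/tmp/ok.json': '{"findings": []}', '/tmp/bad.json': 'not json' };
const reader = (p) => {
  if (!(p in files)) throw new Error('ENOENT');
  return files[p];
};
console.log(loadCriticReport('/tmp/ok.json', reader).ok); // true
console.log(loadCriticReport('/tmp/bad.json', reader).code);
console.log(loadCriticReport('/tmp/missing.json', reader).code);
```

In Node, the injected reader could be (p) => require('node:fs').readFileSync(p, 'utf8').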

Layer-C audit semantics are unchanged: the orchestrator still calls loop-audit-tool-use --agent np-critic after the spawn returns. The audit doesn't care whether the spawn delivered its findings inline or via file.

Layer L6 — Headless-Subprocess Mode (opt-in)

When the runtime's native Agent tool is the wrong shape, the orchestrator can route critic and researcher spawns through bin/np-tools/spawn-headless.cjs instead. This shells out to claude -p --output-format json as a child process; the spawn's conversation lives entirely outside the parent session and only the final-message JSON is captured to disk.

Config:

```json
{
  "spawn": {
    "headless": {
      "enabled": false,
      "agents": ["np-critic", "np-researcher"],
      "timeout_ms": 600000,
      "fallback_on_error": true
    }
  }
}
```

enabled defaults to false so existing installs see no behaviour change. fallback_on_error: true makes a failed claude -p spawn fall back to the runtime's Agent tool; the fallback is stamped on the checkpoint (nubosloop.spawn_headless_fallbacks[]) so dashboards can count fallback rate.
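
How enabled, agents, and fallback_on_error interact can be sketched as a small dispatcher. Both spawners are injected stubs, so nothing here shells out to the claude CLI; the function name spawnAgent, the synchronous call style, and the shape of each spawn_headless_fallbacks entry are assumptions.

```javascript
// Hypothetical dispatcher; spawners.headless and spawners.native are injected stubs.
function spawnAgent(agent, prompt, config, spawners, checkpoint) {
  const headless = config.spawn?.headless ?? {};
  const useHeadless = headless.enabled === true
    && (headless.agents ?? []).includes(agent);
  if (!useHeadless) return spawners.native(agent, prompt);

  try {
    return spawners.headless(agent, prompt, headless.timeout_ms);
  } catch (err) {
    if (!headless.fallback_on_error) throw err;
    // Record the fallback on the checkpoint so dashboards can count it.
    checkpoint.nubosloop.spawn_headless_fallbacks.push({ agent, error: String(err.message) });
    return spawners.native(agent, prompt);
  }
}

const config = { spawn: { headless: {
  enabled: true, agents: ['np-critic', 'np-researcher'],
  timeout_ms: 600000, fallback_on_error: true,
} } };
const checkpoint = { nubosloop: { spawn_headless_fallbacks: [] } };
const spawners = {
  headless: () => { throw new Error('timeout'); },
  native: (agent) => `native:${agent}`,
};
console.log(spawnAgent('np-critic', 'review', config, spawners, checkpoint)); // native:np-critic
console.log(checkpoint.nubosloop.spawn_headless_fallbacks.length); // 1
```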

Trade-offs (intentionally accepted): no shared prompt cache with parent; separate auth (the claude CLI must be on $PATH and authenticated independently, with the NUBOS_PILOT_CLAUDE_BIN env var overriding the binary path); cold-start latency per spawn; no streaming feedback.

Trust-Layer compatibility: loop-audit-tool-use stamp is identical in both paths. The orchestrator MUST call it after spawn-headless returns.

What L6 deliberately does NOT do: it does not headless-spawn the executor (file mutations would not surface through the parent runtime's diff/edit telemetry), it does not move the audit log (audits are still appended by the parent orchestrator), it does not collapse the multiple researcher spawns of the swarm.

Cost Layer summary

| Layer | Where | What it removes | Cost |
|---|---|---|---|
| L5 | agents/np-critic.md + loop-run-round.cjs | Verbatim findings JSON in parent context every round | Critic now requires Write on its report path |
| L6 | bin/np-tools/spawn-headless.cjs + workflow dispatcher | Whole spawn conversation in parent context for critic/researcher | Cold-start per spawn, no shared prompt cache, separate auth |

L5 alone is a ~95% reduction on the critic axis with no operational cost; it is the recommended default. L6 stacks on top for installs where parent context is the binding constraint despite L5, and is opt-in.

More Information