ADR-0010: Nubosloop — Build / Verify / Critic / Route
- Status: Accepted
- Date: 2026-05-03 (last amended 2026-05-05)
- Supersedes: None
- Relates-to: ADR-0001, ADR-0004, ADR-0011, ADR-0012
Context and Problem Statement
np:execute-phase historically runs each task as a single Executor → Verifier hop. That is too generous to the Executor: the only adversarial check happens after the commit is in flight, and the verifier audits at milestone scope, not at task scope. Two failure modes follow:
- Single-blink failures: the Executor produces code that compiles and verifies in isolation but violates conventions, misses acceptance criteria, or silently introduces dangling threads. There is no per-task adversarial reviewer to catch this before the commit.
- No Build → Critic loop: once a task verifies green, the workflow advances. There is no recovery path inside the same task that closes critic findings without rolling the whole milestone back.
The Nubos AI whitepaper (v6.0) and the agent-harness brief specify the cure: every task runs through a Nubosloop. The Executor builds, mechanical checks run, a Critic reviews along three orthogonal axes, findings are routed back to the right agent, and the loop iterates until the Critic reports zero findings or the loop hits a stuck threshold.
The Rule
Every task in np:execute-phase runs through a 6-step Nubosloop. The loop terminates only on (a) zero Critic findings followed by an Atomic Commit, or (b) the orchestrator-enforced maxRounds cap, in which case the task transitions to stuck state and the orchestrator escalates via askuser.
The loop's six steps are:
1. Pre-flight: lib/learnings.cjs::matchExistingLearning (reached via lib/knowledge-adapter.cjs) checks for cached patterns at similarity ≥ swarm.research.threshold (default 0.9) with occurrence ≥ swarm.research.minOccurrence (default 3). A hit short-circuits the Researcher-Schwarm. Note: lib/knowledge.cjs is a separate BM25 index over project Markdown, a different concern and a different store.
2. Researcher-Schwarm (on demand): when no cached pattern exists, the orchestrator spawns swarm.research.k=3 independent np-researcher agents and merges their outputs through lib/researcher-swarm.cjs (majority for decisions, union for risks, intersection for patterns). The merged consensus enters the Executor's prompt.
3. Executor (or Build-Fixer on Round ≥ 2): a single np-executor spawn writes code in scope. Round 2+ uses np-build-fixer with the prior Critic findings and verify output appended to its prompt.
4. Mechanical Checks: the orchestrator runs the task's verify command and stack-specific linters (phpstan / pint / tsc / etc.), then forwards each spawn's tool-use log to node .nubos-pilot/bin/np-tools.cjs loop-audit-tool-use --agent <name> --tool-use-log <json>. The audit (lib/nubosloop.cjs::auditToolUse) emits a rule-9-violation finding when an np-researcher / np-executor / np-build-fixer spawn shipped without invoking any of SEARCH_TOOLS (search-knowledge, match-existing-learning, …). Red verify or any rule-9-violation loops back to Step 3.
5. Critic: one np-critic agent (sonnet) spawns and emits a single structured findings JSON covering all three axes: style (naming, conventions, dead code, dangling imports), tests (coverage, edge cases, assertion quality), and acceptance (success_criteria from the task plan satisfied with concrete evidence). The full findings JSON is written to disk by the agent; its final-message envelope is small. See §Single-Critic Revision and §Cost Layer L5.
6. Route + Loop or Commit: lib/nubosloop.cjs runs the routing engine over the merged Critic findings:
   - Style / Bug findings → Executor (np-build-fixer on Round ≥ 2).
   - Information-missing findings → Researcher-Schwarm (next round).
   - Customer-facing question findings → askuser (Temporal-style signal-wait when integrated).

   Loop until zero findings (→ Atomic Commit per ADR-0004) or loop.maxRounds=3 (→ stuck, escalate, write STATE.md stuck marker).
Auto-log-learning runs on Commit: the merged consensus + the Executor's final diff are persisted via lib/learnings.cjs (through lib/knowledge-adapter.cjs) so future similar tasks bypass the swarm via Step 1.
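The Pre-flight check and the swarm merge (Steps 1 and 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in lib/learnings.cjs or lib/researcher-swarm.cjs: the function names isCacheHit and mergeResearcherOutputs, the record fields (similarity, occurrence, decision, risks, patterns), and the strict-majority tie handling are all invented for the example.

```javascript
// Hypothetical sketch of Pre-flight (Step 1) and the swarm merge (Step 2).
// Function names and record fields are illustrative assumptions.

// A cached learning counts as a hit only when both thresholds are met.
function isCacheHit(learning, { threshold = 0.9, minOccurrence = 3 } = {}) {
  return learning.similarity >= threshold && learning.occurrence >= minOccurrence;
}

// Merge k researcher outputs: majority for the decision, union for risks,
// intersection for patterns.
function mergeResearcherOutputs(outputs) {
  const votes = new Map();
  for (const o of outputs) {
    votes.set(o.decision, (votes.get(o.decision) || 0) + 1);
  }
  // Strict majority (assumption): null when no decision has more than half.
  let decision = null;
  for (const [candidate, count] of votes) {
    if (count > outputs.length / 2) decision = candidate;
  }

  // Union of all risks, de-duplicated, in first-seen order.
  const risks = [...new Set(outputs.flatMap((o) => o.risks))];

  // Intersection: keep only patterns every researcher reported.
  const patterns = outputs.length === 0
    ? []
    : outputs
        .map((o) => new Set(o.patterns))
        .reduce((acc, s) => new Set([...acc].filter((p) => s.has(p))));

  return { decision, risks, patterns: [...patterns] };
}
```

Union for risks errs on the side of surfacing every concern, while intersection for patterns keeps only what all researchers agree on, which is the asymmetry the merge rule above describes.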
Decision Drivers
- Mechanically enforced completeness: Rule 5 / Rule 10 / Rule 12 of the Completeness Doctrine require evidence-backed verdicts and "no silent downgrades". The 3-round loop with stuck escalation enforces this.
- Orthogonal review axes: three single-axis personas in one critic prompt, surfaced as one structured JSON. Routing is driven by the union of findings + per-criterion verdicts.
- Determinism: the routing engine maps finding categories to next-spawn destinations by lookup table, not heuristic. The same finding in two different tasks routes the same way.
- Cost control: the Pre-flight cache bypass, the loop.maxRounds=3 cap, and the L5 Verdict-Only Contract keep the loop's token budget bounded.
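The determinism driver can be illustrated with a small lookup-table router. This is a sketch, not the engine in lib/nubosloop.cjs: the category keys, the priority order used when one round yields several destinations, and the rule that findings at the cap route to stuck are assumptions made for the example. The destination names mirror the next_action values used elsewhere in this ADR.

```javascript
// Hypothetical routing table. Category keys are illustrative assumptions.
const ROUTES = new Map([
  ["style", "executor"],
  ["bug", "executor"],
  ["information-missing", "researcher"],
  ["customer-question", "askuser"],
]);

// Assumed priority when findings in one round point to several destinations.
const PRIORITY = ["askuser", "researcher", "executor"];

function routeFindings(findings, round, maxRounds = 3) {
  if (findings.length === 0) return "commit";
  if (round >= maxRounds) return "stuck";

  const destinations = new Set(
    findings.map((f) => {
      const dest = ROUTES.get(f.category);
      if (!dest) throw new Error(`unroutable finding category: ${f.category}`);
      return dest;
    })
  );
  // Pure table lookup: the same set of categories always yields the same action.
  return PRIORITY.find((d) => destinations.has(d));
}
```

Because the function depends only on the finding categories and the round counter, two tasks that produce the same categories in the same round receive the same next action.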
Considered Options
- Single-pass Executor → Verifier: historical behaviour. Rejected. Insufficient adversarial pressure per task; relies on milestone-scope Verifier to catch task-scope flaws.
- 3-Critic Schwarm with consensus voting: three Critics vote on each finding, majority wins. Rejected. Findings are not opinions; a single Critic's true positive is still a true positive even if the others miss it. We use union-with-priority instead.
- 3-Critic Schwarm with axis split (np-critic-style haiku, np-critic-tests sonnet, np-critic-acceptance sonnet, parallel): chosen 2026-05-03, superseded 2026-05-05. Production runs against a 46-task milestone showed three parallel spawns added latency without proportional finding-quality gains; the haiku style-critic stalled at the 600s watchdog repeatedly; most findings overlapped across axes and were deduplicated anyway.
- Single-Critic with three-axis prompt + Verdict-Only Contract: chosen 2026-05-05.
- Nubosloop with maxRounds=3: chosen.
Decision Outcome
Chosen: Nubosloop with maxRounds=3 and one np-critic agent (sonnet) per round that covers all three audit axes via three module files (agents/np-critic-{style,tests,acceptance}.md, module: true, not spawnable). The loop provides per-task adversarial review, deterministic routing, bounded cost, and a clean escalation path when it cannot converge.
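The termination behaviour of the chosen design can be sketched as a bounded loop. The driver below is a deliberately simplified assumption: the build, verify, and critique callbacks are injected stubs, the Researcher and askuser routes are omitted, and the returned state names are illustrative rather than taken from the orchestrator.

```javascript
// Hypothetical loop driver. The callbacks stand in for agent spawns and
// mechanical checks; the real orchestrator is not shown here.
function runNubosloop({ build, verify, critique, maxRounds = 3 }) {
  for (let round = 1; round <= maxRounds; round++) {
    // Round 1 uses the executor, later rounds the build-fixer.
    const agent = round === 1 ? "np-executor" : "np-build-fixer";
    build(agent, round);

    // A red verify consumes the round and loops back to the build step.
    if (!verify(round)) continue;

    const findings = critique(round);
    if (findings.length === 0) return { state: "committed", round };
  }
  // Cap reached without a clean Critic pass: escalate instead of downgrading.
  return { state: "stuck", round: maxRounds };
}
```

The loop has exactly two exits, a clean Critic pass or the cap, which is what keeps the cost bounded and makes stuck an explicit outcome.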
Defaults
- loop.maxRounds = 3 (configurable in .nubos-pilot/config.json, range [1, 100]).
- swarm.research.k = 3, swarm.research.threshold = 0.9, swarm.research.minOccurrence = 3.
- swarm.critic.tier = sonnet (replaces the deprecated style_tier / tests_tier / acceptance_tier triple; legacy keys still honoured by resolve-model for back-compat).
- spawn.headless.enabled = false, spawn.headless.agents = ['np-critic', 'np-researcher'] (see §L6).
- auto_log_learning = true.
- Knowledge source: .nubos-pilot/knowledge/, resolved through the local knowledge adapter (lib/knowledge-adapter.cjs).
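A sketch of how these defaults and the range check on loop.maxRounds might be applied. It assumes the dotted keys map to nested objects in .nubos-pilot/config.json; the resolveConfig and deepMerge helpers are hypothetical and are not part of the documented tooling.

```javascript
// Hypothetical resolver: overlays user config on the documented defaults and
// enforces the documented range for loop.maxRounds. Key nesting is assumed.
const DEFAULTS = {
  loop: { maxRounds: 3 },
  swarm: {
    research: { k: 3, threshold: 0.9, minOccurrence: 3 },
    critic: { tier: "sonnet" },
  },
  spawn: { headless: { enabled: false, agents: ["np-critic", "np-researcher"] } },
  auto_log_learning: true,
};

function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Recursively overlay `override` onto `base` without mutating either input.
function deepMerge(base, override) {
  const out = { ...base };
  for (const [key, value] of Object.entries(override || {})) {
    out[key] = isPlainObject(value) && isPlainObject(base[key])
      ? deepMerge(base[key], value)
      : value;
  }
  return out;
}

function resolveConfig(userConfig = {}) {
  const cfg = deepMerge(DEFAULTS, userConfig);
  const n = cfg.loop.maxRounds;
  if (!Number.isInteger(n) || n < 1 || n > 100) {
    throw new RangeError(`loop.maxRounds must be an integer in [1, 100], got ${n}`);
  }
  return cfg;
}
```

For example, resolveConfig({ loop: { maxRounds: 5 } }) keeps every swarm default while raising the cap, and a value of 0 or 101 is rejected rather than silently clamped.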
Failure Mode (amended 2026-05-05)
When loop.maxRounds is hit, the doctrine is "this task may be mis-planned, discuss with the user", not "give up". The orchestrator's stuck handler in execute-phase.md calls askuser with four concrete options:
- Continue (+5 rounds): the loop cap is raised by 5 in flight, giving the Critic more attempts. Persisted as nubosloop.max_rounds_override (T3) so that the operator's decision survives a crash followed by /np:resume-work.
- Re-plan the task (plan-checker): the task is marked as a plan bug, plan-checker is invoked, PLAN.md is corrected, and the task is restarted.
- Mark the task as stuck: the task is persisted as stuck in STATE.md and the wave is aborted.
- Fix manually, then resume: the workflow pauses here. The operator edits the code or plan and calls /np:execute-phase again.
next_action=plan-checker (locked-decision-violation or infrastructure-mismatch) gets the same askuser treatment with three recovery options (plan-checker re-run, stuck, manual-fix).
Consequences
- Good, because every task carries an in-loop adversarial check before commit. Rule 5 / Rule 10 / Rule 12 are mechanically enforceable.
- Good, because cached patterns short-circuit the swarm, so token cost decays as the project ages.
- Good, because routing is deterministic; the same finding category always lands in the same agent's queue.
- Good, because stuck is a first-class state, not a silent downgrade.
- Good, because the L5 Verdict-Only Contract drops critic-axis token cost by ~95% without weakening any audit invariant.
- Bad, because per-task token cost grows compared to the single-pass model. Accepted: that cost is the price of completeness, and the cache + cap + L5 bound it.
- Bad, because the orchestrator must coordinate 1 Executor + 1 Critic + occasional Researcher-Schwarm per task. Accepted: that coordination is what makes per-task adversarial review possible.
Single-Critic Revision (amended 2026-05-05)
Earlier revisions specified a Critic-Schwarm of three parallel single-axis critics. Production runs showed three parallel spawns added latency without proportional finding-quality gains. Effective 2026-05-05, the schwarm collapses to one np-critic agent (sonnet) per round that covers all three axes (style + tests + acceptance) in a single structured JSON output.
Modular Critic — four files, one spawn
The single-critic refactor preserves the original three audit-surface specifications as modules (not spawnable agents):
- agents/np-critic.md (spawnable, sonnet): thin entry point. Defines role, output schema, completeness mandate, Trust-Layer audit.
- agents/np-critic-style.md (module: true, haiku-tier metadata): Style-axis audit surface.
- agents/np-critic-tests.md (module: true, sonnet-tier metadata): Tests-axis audit surface.
- agents/np-critic-acceptance.md (module: true, sonnet-tier metadata): Acceptance-axis audit surface.
The orchestrator's spawn for np-critic MUST inject all three modules into the spawn's <files_to_read> block. The critic loads them via Read, treats their content as canonical audit-truth, and emits ONE merged findings JSON.
The module: true frontmatter flag distinguishes audit-surface modules from spawnable agents. lib/agents.cjs::loadAgent rejects module files with agent-not-spawnable; loadAgentModule rejects non-module files with agent-not-a-module. Tests AG-26..AG-32 enforce the split. loop-audit-tool-use --agent <module-name> is also rejected; modules cannot be audited as if they were spawnable.
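The spawnable/module split can be pictured with two loaders that reject each other's inputs. The sketch below is an assumption-laden stand-in for lib/agents.cjs: it uses an in-memory registry instead of parsing frontmatter, and the agent-not-found code is hypothetical. Only the two rejection codes agent-not-spawnable and agent-not-a-module come from this ADR.

```javascript
// Hypothetical in-memory registry standing in for parsed agent frontmatter.
const AGENTS = new Map([
  ["np-critic", { module: false }],
  ["np-critic-style", { module: true }],
  ["np-critic-tests", { module: true }],
  ["np-critic-acceptance", { module: true }],
]);

// Build an Error that carries a machine-readable code.
function agentError(code, name) {
  const err = new Error(`${code}: ${name}`);
  err.code = code;
  return err;
}

// Loads a spawnable agent; module files are rejected.
function loadAgent(name) {
  const agent = AGENTS.get(name);
  if (!agent) throw agentError("agent-not-found", name); // hypothetical code
  if (agent.module) throw agentError("agent-not-spawnable", name);
  return agent;
}

// Loads an audit-surface module; spawnable agents are rejected.
function loadAgentModule(name) {
  const agent = AGENTS.get(name);
  if (!agent) throw agentError("agent-not-found", name); // hypothetical code
  if (!agent.module) throw agentError("agent-not-a-module", name);
  return agent;
}
```

Keeping the two loaders symmetric means a module can never be spawned by accident and a spawnable agent can never be injected as audit truth.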
Trust Layer (amended 2026-05-04)
The original spec assumed a cooperative orchestrator: each loop-run-round --phase X call was treated as evidence that the corresponding work happened. Multiple production runs proved that assumption wrong. Three failure modes observed in the wild:
- Single-pass bypass: executor → commit-task directly, skipping the loop. (Closed by commit-task Layer-A gate.)
- Stamp bypass: loop-run-round --phase commit invoked directly without prior phases. (Closed by Layer-B precondition in _runCommit.)
- Synthetic-evidence bypass: orchestrator invokes every loop-run-round phase but with hand-written --critic-outputs '[…]' JSON, never actually spawning the critic agent. (Closed by Layer-C audit-trail gate.)
Layer-C — Spawn-evidence audit-trail
Each LLM spawn (researcher, executor, build-fixer, critic) MUST be stamped into the per-task nubosloop.tool_use_audit log via loop-audit-tool-use --task-id … --agent <name> --tool-use-log <json>. The round number is sourced automatically from nubosloop.round.
Phase verbs that consult the log:
- loop-run-round --phase post-researcher requires swarm.research.k np-researcher audit entries for the current round (k-of-k gate; default k=3). Refuses with loop-post-researcher-missing-spawn-audit otherwise.
- loop-run-round --phase post-executor requires an audit entry for np-executor (round 1) or np-build-fixer (round ≥ 2). Refuses with loop-post-executor-missing-spawn-audit.
- loop-run-round --phase post-critics requires an audit entry for np-critic in the current round (single-critic, not the legacy three). Refuses with loop-post-critics-missing-critic-audit.
All four phases (post-researcher, post-executor, post-critics, commit) accept explicit overrides (--force-post-* / --force-commit-phase) for legitimate test fixtures and migration. Each override stamps a corresponding flag on the checkpoint so dashboards can count them.
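The three Layer-C gates and their force overrides can be sketched as one shared check. The audit-entry shape ({ agent, round }), the helper names, and the returned result objects are assumptions for illustration; only the refusal codes and the agent names are taken from the text above.

```javascript
// Hypothetical gate: counts audit entries matching agent + round exactly.
function requireAudits(auditLog, { agent, round, count = 1, code, force = false }) {
  // A forced pass is flagged so it can be counted later.
  if (force) return { ok: true, forced: true };
  const seen = auditLog.filter((e) => e.agent === agent && e.round === round).length;
  return seen >= count ? { ok: true, forced: false } : { ok: false, code };
}

// k-of-k gate: every researcher of the swarm must have an audit entry.
function checkPostResearcher(auditLog, round, k = 3, force = false) {
  return requireAudits(auditLog, {
    agent: "np-researcher", round, count: k, force,
    code: "loop-post-researcher-missing-spawn-audit",
  });
}

// Round 1 expects the executor, later rounds the build-fixer.
function checkPostExecutor(auditLog, round, force = false) {
  const agent = round === 1 ? "np-executor" : "np-build-fixer";
  return requireAudits(auditLog, {
    agent, round, force, code: "loop-post-executor-missing-spawn-audit",
  });
}

// Single-critic gate: one np-critic entry in the current round.
function checkPostCritics(auditLog, round, force = false) {
  return requireAudits(auditLog, {
    agent: "np-critic", round, force, code: "loop-post-critics-missing-critic-audit",
  });
}
```

Returning the forced flag alongside the verdict is one way to let the caller stamp the override on the checkpoint, as the paragraph above requires.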
Round-scope of audits. Layer-C matches agent + round exactly. The checkpoint round advances mechanically:
- _runPostExecutor stamps round + 1 on verify-red.
- _runPostCritics stamps round + 1 when next_action ∈ {executor, researcher, askuser}.
- next_action ∈ {commit, plan-checker, stuck} does NOT advance the round.
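The round-advancement rules reduce to two small pure functions. This is a sketch: the function names are hypothetical, and treating a non-zero verify exit code as verify-red is an assumption based on the verify_exit_code=0 convention mentioned in the defense-in-depth table.

```javascript
// next_action values that advance the checkpoint round (from the rules above).
const ADVANCING_ACTIONS = new Set(["executor", "researcher", "askuser"]);

// Hypothetical helper: verify-red (assumed: non-zero exit code) bumps the round.
function roundAfterPostExecutor(round, verifyExitCode) {
  return verifyExitCode === 0 ? round : round + 1;
}

// Hypothetical helper: commit, plan-checker, and stuck leave the round as is.
function roundAfterPostCritics(round, nextAction) {
  return ADVANCING_ACTIONS.has(nextAction) ? round + 1 : round;
}
```

Because Layer-C matches agent and round exactly, any action that leads to another spawn must advance the round so the next spawn needs a fresh audit entry.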
Cache-hit auto-log skip (L4). _runCommit skips autoLogLearning when (a) the checkpoint carries cache_hit: true, or (b) --learning-pattern is a sentinel placeholder (<...>), or (c) --learning-pattern is empty.
max_rounds_override (T3). The operator's "+5 rounds" ("+5 Runden") choice persists nubosloop.max_rounds_override so that the decision survives a crash followed by /np:resume-work. _runCommit clears the override unconditionally; _runStuck clears only on --reason ∈ {user-requested-replan, manual-fix-pending}.
Rule-9 audit carry-forward (Gap #1). A Rule-9 violation stamped during a verify-red round is now carried forward by auditFindingsForRound until consumed; markAuditsRoutedForRound stamps routed_in_round after consumption to keep idempotency.
askuser advances the round (Gap #2). next_action=askuser bumps the checkpoint round so the post-reply executor re-spawn must produce a fresh round-N+1 audit.
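The auto-log skip (L4) and the override-clearing rule (T3) are both simple predicates, sketched below. The function names are hypothetical, and the sentinel regex (any text wrapped in angle brackets) is an assumption about what counts as a placeholder; the source only shows the form <...>.

```javascript
// Assumed sentinel form: any text wrapped in angle brackets, e.g. "<pattern>".
const SENTINEL = /^<.*>$/;

// L4: skip autoLogLearning on a cache hit, or on a placeholder or empty pattern.
function shouldSkipAutoLog(checkpoint, learningPattern) {
  const pattern = (learningPattern || "").trim();
  return checkpoint.cache_hit === true || pattern === "" || SENTINEL.test(pattern);
}

// Reasons for which the stuck phase clears max_rounds_override.
const STUCK_CLEAR_REASONS = new Set(["user-requested-replan", "manual-fix-pending"]);

// T3: commit always clears the override; stuck clears it only for these reasons.
function shouldClearOverride(phase, reason) {
  if (phase === "commit") return true;
  if (phase === "stuck") return STUCK_CLEAR_REASONS.has(reason);
  return false;
}
```

Under these assumptions, a stuck transition for any other reason keeps the operator's extra rounds available after a resume.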
Defense-in-depth summary
| Layer | Where | What it proves | Bypass cost |
|---|---|---|---|
| A | commit-task.cjs | The full sequence signature is on the checkpoint | Lie at all five evidence fields |
| B | _runCommit | Verify-green AND a post-critics findings array preceded the commit phase | Pre-write fake verify_exit_code=0 and findings: [] to the checkpoint manually |
| C | _runPostExecutor + _runPostCritics + _runPostResearcher | Each declared spawn appears in the per-round audit log; researcher gate requires swarm.research.k entries | Issue extra loop-audit-tool-use calls naming agents that didn't actually run — one bash line per missing agent |
What the Trust Layer cannot prove
Layer C still cannot prove that the agent named in an audit entry actually ran. The orchestrator could call loop-audit-tool-use --agent np-critic … without spawning the critic. Closing this gap requires runtime instrumentation; that is "Stufe 2" (stage 2) and tracked separately.
Cost Layer (added 2026-05-05)
The Trust Layer raises the price of dishonesty; the Cost Layer lowers the price of honesty. Two failure modes observed alongside the Trust Layer rollout:
- Verbose critic returns dominate the per-round token bill. The critic's structured findings JSON (criteria + findings + per-finding remediation prose) routinely runs 2–5 kB. Returning it as the spawn's final message replays it into the parent context every round.
- Sub-agent "context isolation" is not context offloading. The runtime's native Agent tool isolates the child's context window, but the agent's final message lands verbatim in the parent history.
The Cost Layer addresses both without weakening the Trust Layer: spawn-evidence auditing is unchanged, the routing engine is unchanged, only the transport of critic/researcher output between child and parent contexts changes.
Layer L5 — Verdict-Only Critic Contract
Critics now emit their full findings JSON to a path the orchestrator hands them in the spawn prompt (<report_path>, typically ${TMPDIR}/nubos-pilot/critic-reports/critic-<task-id>-r<round>.json). The spawn's final message, the artefact that lands in parent context, is a small envelope:
```json
{ "critic": "critic", "task_id": "M001-S001-T0001", "round": 1,
  "verdict": "passed | issues_found", "blockers_count": 0,
  "report_path": "...", "run_id": "..." }
```

bin/np-tools/loop-run-round.cjs::_runPostCritics accepts a new --critic-outputs-path <file> flag that reads the on-disk findings JSON directly. Inline --critic-outputs <json> remains accepted (legacy fallback for runtimes without Write capability and migration fixtures), but exactly one of the two MUST be passed; both at once is loop-run-round-post-critics-conflicting-outputs. _runStuck gets the symmetric --findings-path for the stuck-with-findings escalation paths.
The critic's tools frontmatter gains Write. Write is only permitted on the orchestrator-supplied <report_path>. Touching anything else is a Layer-A bypass class.
Failure modes:
- <report_path> missing in prompt OR Write fails → envelope sets report_path: null, verdict: "issues_found", blockers_count: 1, with an error field. Routing engine treats this as critic-error → stuck.
- File written but unreadable / not valid JSON / shape mismatch → typed loop-run-round-critic-outputs-path-{unreadable,invalid-json,invalid-shape} codes.
- Inline-JSON-with-no-file fallback is still routable.
Layer-C audit semantics are unchanged: the orchestrator still calls loop-audit-tool-use --agent np-critic after the spawn returns. The audit doesn't care whether the spawn delivered its findings inline or via file.
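The exactly-one-of rule and the typed path errors can be sketched as a small resolver. This is an illustration, not _runPostCritics itself: the function name, the missing-critic-outputs code for the case where neither input is passed, and the expected shape (an object with a findings array) are assumptions. The other error codes are the ones named in this section.

```javascript
const fs = require("node:fs");

// Build an Error that carries a machine-readable code.
function codedError(code) {
  const err = new Error(code);
  err.code = code;
  return err;
}

// Exactly one of `inline` (JSON string) or `path` (file) must be provided.
function resolveCriticOutputs({ inline, path } = {}) {
  const hasInline = inline !== undefined;
  const hasPath = path !== undefined;
  if (hasInline && hasPath) {
    throw codedError("loop-run-round-post-critics-conflicting-outputs");
  }
  if (!hasInline && !hasPath) {
    throw codedError("missing-critic-outputs"); // hypothetical code
  }
  if (hasInline) return JSON.parse(inline); // legacy inline fallback

  const prefix = "loop-run-round-critic-outputs-path-";
  let raw;
  try {
    raw = fs.readFileSync(path, "utf8");
  } catch {
    throw codedError(prefix + "unreadable");
  }
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw codedError(prefix + "invalid-json");
  }
  // Assumed shape: an object carrying a `findings` array.
  if (parsed === null || typeof parsed !== "object" || !Array.isArray(parsed.findings)) {
    throw codedError(prefix + "invalid-shape");
  }
  return parsed;
}
```

Separating the three path failures keeps a broken transport (unreadable file, malformed JSON) distinguishable from a critic that wrote well-formed but unexpected output.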
Layer L6 — Headless-Subprocess Mode (opt-in)
When the runtime's native Agent tool is the wrong shape, the orchestrator can route critic and researcher spawns through bin/np-tools/spawn-headless.cjs instead. This shells out to claude -p --output-format json as a child process; the spawn's conversation lives entirely outside the parent session and only the final-message JSON is captured to disk.
Config:
```json
{
  "spawn": {
    "headless": {
      "enabled": false,
      "agents": ["np-critic", "np-researcher"],
      "timeout_ms": 600000,
      "fallback_on_error": true
    }
  }
}
```

enabled defaults to false so existing installs see no behaviour change. fallback_on_error: true makes a failed claude -p spawn fall back to the runtime's Agent tool; the fallback is stamped on the checkpoint (nubosloop.spawn_headless_fallbacks[]) so dashboards can count fallback rate.
Trade-offs (intentionally accepted): no shared prompt cache with parent; separate auth (the claude CLI must be on $PATH and authenticated independently, with the NUBOS_PILOT_CLAUDE_BIN env var overriding the binary path); cold-start latency per spawn; no streaming feedback.
Trust-Layer compatibility: loop-audit-tool-use stamp is identical in both paths. The orchestrator MUST call it after spawn-headless returns.
What L6 deliberately does NOT do: it does not headless-spawn the executor (file mutations would not surface through the parent runtime's diff/edit telemetry), it does not move the audit log (audits are still appended by the parent orchestrator), it does not collapse the multiple researcher spawns of the swarm.
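The opt-in dispatch and the fallback stamp can be sketched as below. This is not bin/np-tools/spawn-headless.cjs: the function name, the injected runHeadless and runAgentTool callbacks, and the shape of each spawn_headless_fallbacks entry are assumptions, chosen so the sketch runs without the claude CLI.

```javascript
// Hypothetical dispatcher. `runHeadless` and `runAgentTool` are injected
// stand-ins for the child-process spawn and the runtime's Agent tool.
function spawnWithFallback(agent, config, checkpoint, { runHeadless, runAgentTool }) {
  const headless = config.spawn.headless;

  // Only listed agents go headless, and only when the mode is enabled,
  // so the executor always stays on the Agent tool.
  const eligible = headless.enabled && headless.agents.includes(agent);
  if (!eligible) return { via: "agent-tool", result: runAgentTool(agent) };

  try {
    return { via: "headless", result: runHeadless(agent, headless.timeout_ms) };
  } catch (err) {
    if (!headless.fallback_on_error) throw err;
    // Stamp the fallback on the checkpoint so dashboards can count it.
    // The entry shape is an assumption.
    checkpoint.nubosloop.spawn_headless_fallbacks.push({
      agent,
      error: String(err.message),
    });
    return { via: "agent-tool", result: runAgentTool(agent) };
  }
}
```

Whichever path is taken, the caller still has to stamp the spawn with loop-audit-tool-use afterwards; the dispatcher deliberately does not touch the audit log.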
Cost Layer summary
| Layer | Where | What it removes | Cost |
|---|---|---|---|
| L5 | agents/np-critic.md + loop-run-round.cjs | Verbatim findings JSON in parent context every round | Critic now requires Write on its report path |
| L6 | bin/np-tools/spawn-headless.cjs + workflow dispatcher | Whole spawn conversation in parent context for critic/researcher | Cold-start per spawn, no shared prompt cache, separate auth |
L5 alone is a ~95% reduction on the critic axis with no operational cost; it is the recommended default. L6 stacks on top for installs where parent context is the binding constraint despite L5, and is opt-in.
More Information
- Concept: Nubosloop, Findings Routing.
- Library: lib/nubosloop.cjs in the source tree.
- Source ADR: docs/adr/0010-nubosloop.md.
- Related: ADR-0011, ADR-0012.
