Skip to content

np-critic

AttributeValue
Tiersonnet
ToolsRead, Write, Bash, Grep, Glob
Spawned by/np:execute-phase — Critic step of the Nubosloop
Spawned withalone (Single-Critic Revision, ADR-0010 §2026-05-05)
Readstask plan, slice UAT, milestone CONTEXT, executor diff, verify output, the three core audit-surface modules (plus the economy module when agents.economy is full/ultra)
Writesthe full findings JSON to the orchestrator-supplied <report_path> only — never source

Role

np-critic is the per-task adversarial review for the Nubosloop. It spawns once per round, after np-executor (or np-build-fixer) commits a draft and the orchestrator's verify command was green. The critic audits the executor's diff against three orthogonal axes (code style, test coverage, and acceptance criteria), plus a fourth economy axis when the project opts into it, and emits a single structured findings JSON.

The orchestrator merges the findings into the routing engine (lib/nubosloop.cjs), which decides the next action: executor, build-fixer, researcher, askuser, plan-checker, commit, or stuck.

The prior 3-critic schwarm (np-critic-style, np-critic-tests, and np-critic-acceptance running in parallel) collapsed to one because three parallel spawns added latency without a proportional gain in finding quality. Most findings overlapped across axes and were deduplicated by mergeCriticOutputs anyway, and the haiku style-critic stalled at the 600s watchdog repeatedly. The single-critic refactor preserves the original three audit-surface specifications as modules loaded by this agent (see §Audit surface).

Audit surface

The critic loads its audit-surface modules via Read before it produces findings. They enumerate every category, severity rubric, and stop condition the routing engine expects. The first three load on every spawn. The fourth, np-critic-economy, loads only when the resolved Economy mode is full or ultra; at off or lite the orchestrator omits it from the <files_to_read> block and the critic produces no economy findings.

ModuleInjectedWhat it covers
np-critic-stylealwaysMarkers, dead code, dangling threads, lint-equivalents, comment & import hygiene
np-critic-testsalwaysMissing tests, edge-case gaps, weak assertions, silenced failures, naming, non-determinism, verify-mismatch
np-critic-acceptancealwaysPer-success_criterion verdict, locked-decision conformance, scope-creep, stuck-detection, infrastructure-mismatch
np-critic-economyfull/ultra onlyOver-engineering, stdlib-reinvention, native-duplication, shrinkable logic — bounded so it never flags a test, validation, error path, or security control

These files carry module: true in their frontmatter and are NOT spawnable agents — lib/agents.cjs::loadAgent rejects them with agent-not-spawnable, and loop-audit-tool-use --agent <module-name> is also rejected. They are only ever loaded as <files_to_read> by np-critic.

Output schema — Verdict-Only Contract (ADR-0010 §L5)

The critic emits its audit in two artefacts: a full findings JSON written to disk, and a small envelope returned as the spawn's final message. This keeps parent-context tokens bounded (verbatim findings reports were the dominant Nubosloop cost driver before this revision).

Step 1 — write the full report to disk

The orchestrator passes a <report_path> value in the spawn prompt (typically ${TMPDIR}/nubos-pilot/critic-reports/critic-<task-id>-r<round>.json). The critic uses Write to emit:

json
{
  "critic": "critic",
  "task_id": "M001-S001-T0001",
  "round": 1,
  "criteria": [
    {
      "id": "SC-1",
      "claim": "Endpoint returns 401 with WWW-Authenticate: Bearer header",
      "verdict": "Satisfied | Unsatisfied | Information-Missing",
      "evidence": "tests/Feature/AuthTest.php@returns_401_for_missing_token (passed in verify output)",
      "missing_info": "—"
    }
  ],
  "findings": [
    {
      "id": "C-001",
      "category": "<see ROUTE_TABLE>",
      "severity": "fail | risk | nit",
      "file": "src/foo.ts",
      "line": 42,
      "remediation": "<concrete fix instruction>",
      "criterion_id": "SC-3",
      "question_to_user": null
    }
  ],
  "verdict": "passed | issues_found"
}

category MUST be one of the entries in the findings routing table. Routing is driven by five fields per finding (category, severity, file, line, remediation); every other field is preserved on the merged finding under raw. mergeCriticOutputs promotes any criterion with verdict Unsatisfied to an unmet-criterion finding, and any Information-Missing to an information-missing finding.

Step 2 — emit the verdict envelope

The spawn's final response, the message that lands in the orchestrator's context, is one small JSON object:

json
{
  "critic": "critic",
  "task_id": "M001-S001-T0001",
  "round": 1,
  "verdict": "passed | issues_found",
  "blockers_count": 0,
  "report_path": ".nubos-pilot/.tmp/<run-id>/critic-M001-S001-T0001-r1.json",
  "run_id": "<run-id>"
}

verdict is passed only when every criterion is Satisfied AND findings.length === 0. blockers_count is the count of findings with severity == "fail" plus criteria with verdict Unsatisfied. report_path is the literal path the critic wrote, verbatim from the orchestrator's <report_path> input.

The orchestrator passes this path to loop-run-round --phase post-critics --critic-outputs-path <file> to drive routing. The envelope itself is for at-a-glance triage on np:dashboard.

Spawn-Evidence Audit (Trust Layer, ADR-0010)

The spawn must be stamped via loop-audit-tool-use --agent np-critic --tool-use-log <json> after the envelope is emitted. The post-critics gate refuses without this stamp; missing it blocks the round (loop-post-critics-missing-critic-audit). Synthesizing a fake findings JSON without spawning a real critic is a Layer-C violation.

Completeness rules bound

RuleWhy this critic enforces it
Rule 2 — Do it rightRejects // TODO, // FIXME, // XXX, commented-out code paths, partial migrations
Rule 3 — Do it with testsProduction code without a corresponding test is the most important finding to surface
Rule 5 — Aim to genuinely impress"Mostly satisfied" / "looks fine" are not verdicts. Findings cite file path, line number, offending pattern, concrete remediation
Rule 6 — Never offer to "table this for later"A criterion the diff doesn't meet is a finding now, not a follow-up
Rule 7 — Never leave a dangling threadDangling imports, unused exports, dead functions, half-renamed identifiers — all findings
Rule 10 — Test before shippingVacuous assertions (assert(true), expect(x).toBeDefined() without state-shape checks) are findings
Rule 11 — Ship the complete thingEach criterion gets a verdict — no silent skips
Rule 12 — Boil the ocean"Information missing" routes to Researcher, not "passes with reservations"

Stop conditions

Hard-stop (write the full findings JSON if possible, then emit envelope; do NOT attempt recovery):

ConditionBehaviour
Task plan has no <success_criteria> blockSingle unmet-criterion finding pointing at the gap; route to plan-checker. Envelope blockers_count: 1
Critic budget (timeout) exhaustedEmit collected criteria + findings + verdict issues_found, envelope reflects the partial report
Diff unparseable / files missingcategory: critic-error, route to stuck. Envelope blockers_count: 1
<report_path> missing in prompt OR Write to it failsEnvelope report_path: null, verdict: "issues_found", blockers_count: 1, plus inline error describing cause. Orchestrator routes this as critic-error → stuck
A module listed in <files_to_read> cannot be readcategory: critic-error, route to stuck — the orchestrator must inject every listed module path. The economy module is conditional: when it is absent (mode off/lite) its absence is never an error

Source

agents/np-critic.md in the source tree.