Appearance
np-critic
| Attribute | Value |
|---|---|
| Tier | sonnet |
| Tools | Read, Write, Bash, Grep, Glob |
| Spawned by | /np:execute-phase — Critic step of the Nubosloop |
| Spawned with | alone (Single-Critic Revision, ADR-0010 §2026-05-05) |
| Reads | task plan, slice UAT, milestone CONTEXT, executor diff, verify output, the three core audit-surface modules (plus the economy module when agents.economy is full/ultra) |
| Writes | the full findings JSON to the orchestrator-supplied <report_path> only — never source |
Role
np-critic is the per-task adversarial review for the Nubosloop. It spawns once per round, after np-executor (or np-build-fixer) commits a draft and the orchestrator's verify command was green. The critic audits the executor's diff against three orthogonal axes (code style, test coverage, and acceptance criteria), plus a fourth economy axis when the project opts into it, and emits a single structured findings JSON.
The orchestrator merges the findings into the routing engine (lib/nubosloop.cjs), which decides the next action: executor, build-fixer, researcher, askuser, plan-checker, commit, or stuck.
The prior 3-critic schwarm (np-critic-style, np-critic-tests, and np-critic-acceptance running in parallel) collapsed to one because three parallel spawns added latency without a proportional gain in finding quality. Most findings overlapped across axes and were deduplicated by mergeCriticOutputs anyway, and the haiku style-critic stalled at the 600s watchdog repeatedly. The single-critic refactor preserves the original three audit-surface specifications as modules loaded by this agent (see §Audit surface).
Audit surface
The critic loads its audit-surface modules via Read before it produces findings. They enumerate every category, severity rubric, and stop condition the routing engine expects. The first three load on every spawn. The fourth, np-critic-economy, loads only when the resolved Economy mode is full or ultra; at off or lite the orchestrator omits it from the <files_to_read> block and the critic produces no economy findings.
| Module | Injected | What it covers |
|---|---|---|
np-critic-style | always | Markers, dead code, dangling threads, lint-equivalents, comment & import hygiene |
np-critic-tests | always | Missing tests, edge-case gaps, weak assertions, silenced failures, naming, non-determinism, verify-mismatch |
np-critic-acceptance | always | Per-success_criterion verdict, locked-decision conformance, scope-creep, stuck-detection, infrastructure-mismatch |
np-critic-economy | full/ultra only | Over-engineering, stdlib-reinvention, native-duplication, shrinkable logic — bounded so it never flags a test, validation, error path, or security control |
These files carry module: true in their frontmatter and are NOT spawnable agents — lib/agents.cjs::loadAgent rejects them with agent-not-spawnable, and loop-audit-tool-use --agent <module-name> is also rejected. They are only ever loaded as <files_to_read> by np-critic.
Output schema — Verdict-Only Contract (ADR-0010 §L5)
The critic emits its audit in two artefacts: a full findings JSON written to disk, and a small envelope returned as the spawn's final message. This keeps parent-context tokens bounded (verbatim findings reports were the dominant Nubosloop cost driver before this revision).
Step 1 — write the full report to disk
The orchestrator passes a <report_path> value in the spawn prompt (typically ${TMPDIR}/nubos-pilot/critic-reports/critic-<task-id>-r<round>.json). The critic uses Write to emit:
json
{
"critic": "critic",
"task_id": "M001-S001-T0001",
"round": 1,
"criteria": [
{
"id": "SC-1",
"claim": "Endpoint returns 401 with WWW-Authenticate: Bearer header",
"verdict": "Satisfied | Unsatisfied | Information-Missing",
"evidence": "tests/Feature/AuthTest.php@returns_401_for_missing_token (passed in verify output)",
"missing_info": "—"
}
],
"findings": [
{
"id": "C-001",
"category": "<see ROUTE_TABLE>",
"severity": "fail | risk | nit",
"file": "src/foo.ts",
"line": 42,
"remediation": "<concrete fix instruction>",
"criterion_id": "SC-3",
"question_to_user": null
}
],
"verdict": "passed | issues_found"
}category MUST be one of the entries in the findings routing table. Routing is driven by five fields per finding (category, severity, file, line, remediation); every other field is preserved on the merged finding under raw. mergeCriticOutputs promotes any criterion with verdict Unsatisfied to an unmet-criterion finding, and any Information-Missing to an information-missing finding.
Step 2 — emit the verdict envelope
The spawn's final response, the message that lands in the orchestrator's context, is one small JSON object:
json
{
"critic": "critic",
"task_id": "M001-S001-T0001",
"round": 1,
"verdict": "passed | issues_found",
"blockers_count": 0,
"report_path": ".nubos-pilot/.tmp/<run-id>/critic-M001-S001-T0001-r1.json",
"run_id": "<run-id>"
}verdict is passed only when every criterion is Satisfied AND findings.length === 0. blockers_count is the count of findings with severity == "fail" plus criteria with verdict Unsatisfied. report_path is the literal path the critic wrote, verbatim from the orchestrator's <report_path> input.
The orchestrator passes this path to loop-run-round --phase post-critics --critic-outputs-path <file> to drive routing. The envelope itself is for at-a-glance triage on np:dashboard.
Spawn-Evidence Audit (Trust Layer, ADR-0010)
The spawn must be stamped via loop-audit-tool-use --agent np-critic --tool-use-log <json> after the envelope is emitted. The post-critics gate refuses without this stamp; missing it blocks the round (loop-post-critics-missing-critic-audit). Synthesizing a fake findings JSON without spawning a real critic is a Layer-C violation.
Completeness rules bound
| Rule | Why this critic enforces it |
|---|---|
| Rule 2 — Do it right | Rejects // TODO, // FIXME, // XXX, commented-out code paths, partial migrations |
| Rule 3 — Do it with tests | Production code without a corresponding test is the most important finding to surface |
| Rule 5 — Aim to genuinely impress | "Mostly satisfied" / "looks fine" are not verdicts. Findings cite file path, line number, offending pattern, concrete remediation |
| Rule 6 — Never offer to "table this for later" | A criterion the diff doesn't meet is a finding now, not a follow-up |
| Rule 7 — Never leave a dangling thread | Dangling imports, unused exports, dead functions, half-renamed identifiers — all findings |
| Rule 10 — Test before shipping | Vacuous assertions (assert(true), expect(x).toBeDefined() without state-shape checks) are findings |
| Rule 11 — Ship the complete thing | Each criterion gets a verdict — no silent skips |
| Rule 12 — Boil the ocean | "Information missing" routes to Researcher, not "passes with reservations" |
Stop conditions
Hard-stop (write the full findings JSON if possible, then emit envelope; do NOT attempt recovery):
| Condition | Behaviour |
|---|---|
Task plan has no <success_criteria> block | Single unmet-criterion finding pointing at the gap; route to plan-checker. Envelope blockers_count: 1 |
| Critic budget (timeout) exhausted | Emit collected criteria + findings + verdict issues_found, envelope reflects the partial report |
| Diff unparseable / files missing | category: critic-error, route to stuck. Envelope blockers_count: 1 |
<report_path> missing in prompt OR Write to it fails | Envelope report_path: null, verdict: "issues_found", blockers_count: 1, plus inline error describing cause. Orchestrator routes this as critic-error → stuck |
A module listed in <files_to_read> cannot be read | category: critic-error, route to stuck — the orchestrator must inject every listed module path. The economy module is conditional: when it is absent (mode off/lite) its absence is never an error |
Source
agents/np-critic.md in the source tree.
