ADR-0019: Plan-side Trust Layer — Mechanical PLAN.md Validation Before Promote
- Status: Accepted
- Date: 2026-05-05
- Supersedes: None
- Related: ADR-0010 (Execute-side Trust Layer, Layers A/B/C)
Numbering note. This ADR was originally authored as 0013-plan-trust-layer.md, colliding with the earlier 0013-learnings-store-schema-evolution.md (authored 2026-05-03, two days before this one). The collision was resolved by keeping 0013 for the learnings-store ADR and renumbering this one to 0019. The Layer-D naming used throughout the body is internal to this ADR and is unaffected by the renumber.
Context and Problem Statement
ADR-0010 closed the Execute-side Trust gaps (Layers A, B, C): commit-task and loop-run-round now refuse to advance unless the per-task evidence and audit-trail are intact. Real spawns must demonstrably have happened. That solved one half of the problem.
The other half surfaced in production runs: the plans themselves were buggy. Three failure classes recurred across milestones M002 and M004:
- Phantom CLI verbs in <verify> blocks. A plan would specify <verify>node .nubos-pilot/bin/np-tools.cjs codebase doc-lint</verify>, but codebase is not a registered np-tools verb. The verify command is mechanically unexecutable. The Nubosloop catches this at execute time (verify-red → build-fixer → verify-red → build-fixer → stuck), but the cost is roughly 3 executor + 3 build-fixer + 9 critic spawns for a deterministically-failing outcome that a 30-line lint check would have caught at plan time.
- False parallel-safety claims. Tasks marked depends_on: [] (parallel-safe) where one task's <verify> reads working-tree state (update-docs --check, phpstan analyse, git diff) against files another sibling task modifies. Filesystem-race at runtime; parallel-safe in name only.
- Implementation over-specification. Plans bake in framework-controlled details: exact migration filenames (0001_01_01_000004_create_customer_columns_table.php), schema DDL (Schema::create('subscriptions', function (Blueprint $table) { ... })), code-style edicts (use Cashier::calculateTaxes() inline in boot()). These are not the planner's territory. The framework decides migration shapes, the executor reads codebase docs for style, the publish step decides filenames. A plan that pretends to know these is making falsifiable claims it cannot verify.
In all three classes, the orchestrator fielded the consequences at execute time when the planner should have prevented them.
Decision Drivers
- The Nubosloop is expensive (1 executor + 3 critics per task per round). Burning a full Nubosloop on a plan-bug is wasteful when mechanical detection is cheap.
- Plan-checker is an LLM-judgment agent (opus tier). LLM judgment is unreliable for syntactic checks (verb-existence, regex-pattern-detection); those are mechanical-checker territory.
- Planners under user pressure rationalize over-specification ("being thorough"). Doctrine alone doesn't hold; mechanical refusal does.
Considered Options
- Status quo (LLM-judgment plan-checker only). Rejected. It fails on all three classes, observed in M002 + M004.
- Add the three checks to the np-plan-checker agent prompt. Rejected. LLM-judgment is non-deterministic. The same plan, checked twice, can produce different verdicts. For mechanical violations the project needs deterministic refusal.
- New mechanical lint verb plus Layer-D enforcement in the plan-phase workflow. Chosen.
Decision Outcome
Chosen: Plan-side Trust Layer with three mechanical linters wrapped in a CLI verb (np-tools.cjs plan-lint), called from the plan-phase workflow before each verification-loop iteration. Critical findings are merged into the LLM-checker verdict and force iteration-2.
Layer-D — three deterministic linters
D1, lintVerifyCommands (severity: critical). Every <verify> block is parsed; the first command per non-comment line is validated against:
- Known np-tools verbs (read from _commands.cjs::COMMANDS)
- Declared composer scripts (composer.json::scripts)
- Declared npm / pnpm / yarn scripts (package.json::scripts)
- vendor/bin/* and node_modules/.bin/* paths (lint-time existence + conventional-bin-dir tolerance)
- The POSIX baseline (echo, test, [, sed, grep, find, …)
- Interpreter-prefixed calls (node, php, composer, npm, npx, …)
Unknown commands emit verify-command-unknown with a concrete raw.reason (np-tools-unknown-verb, composer-script-not-declared, npm-script-not-declared, path-not-found).
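A minimal sketch of how the D1 per-line classification might look. It is not the code in lib/plan-lint.cjs: the function name classifyVerifyLine, the ctx shape, the contents of the two sets, and the fallback that treats any other command as a path are all assumptions made for illustration.

```javascript
'use strict';

// Assumed subsets; the real baseline and interpreter lists may be longer.
const POSIX_BASELINE = new Set(['echo', 'test', '[', 'sed', 'grep', 'find']);
const INTERPRETERS = new Set(['node', 'php', 'composer', 'npm', 'npx', 'pnpm', 'yarn']);
const SCRIPT_RUNNERS = new Set(['npm', 'pnpm', 'yarn']);

// Returns null when the line's first command is recognised, otherwise one of
// the raw.reason strings named in this ADR. `ctx` is a hypothetical shape.
function classifyVerifyLine(line, ctx) {
  const trimmed = line.trim();
  if (trimmed === '' || trimmed.startsWith('#')) return null; // blank or comment

  const [head, ...rest] = trimmed.split(/\s+/);

  // node <path>/np-tools.cjs <verb>: the verb must be registered.
  if (head === 'node' && rest[0] && rest[0].endsWith('np-tools.cjs')) {
    return ctx.npToolsVerbs.has(rest[1]) ? null : 'np-tools-unknown-verb';
  }
  // composer run-script <name> (alias: composer run <name>).
  if (head === 'composer' && (rest[0] === 'run-script' || rest[0] === 'run')) {
    return ctx.composerScripts.has(rest[1]) ? null : 'composer-script-not-declared';
  }
  // npm | pnpm | yarn run <name>.
  if (SCRIPT_RUNNERS.has(head) && rest[0] === 'run') {
    return ctx.npmScripts.has(rest[1]) ? null : 'npm-script-not-declared';
  }
  // Other interpreter-prefixed calls and baseline utilities pass as-is.
  if (INTERPRETERS.has(head) || POSIX_BASELINE.has(head)) return null;

  // Assumption: anything else is treated as a path that must exist at lint time.
  return ctx.pathExists(head) ? null : 'path-not-found';
}

// Example context; in practice the sets would be loaded from the files above.
const exampleCtx = {
  npToolsVerbs: new Set(['plan-lint']),
  composerScripts: new Set(),
  npmScripts: new Set(['lint']),
  pathExists: (p) => p === 'vendor/bin/phpstan',
};

console.log(
  classifyVerifyLine('node .nubos-pilot/bin/np-tools.cjs codebase doc-lint', exampleCtx)
);
// prints: np-tools-unknown-verb
```

Injecting pathExists and the script sets keeps the classifier a pure function, which matches the "no I/O outside file reads" goal and makes each reason easy to unit-test.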
D2, lintParallelTaskRaces (severity: critical). For every slice with multiple tasks marked depends_on: [], it computes whether any sibling's <verify> matches the working-tree-reader pattern (update-docs, phpstan analyse, pint, eslint, tsc, git diff/status/ls-files/log, find -newer, pre-commit run). If yes AND another sibling has a non-empty files_modified, it emits parallel-task-implicit-dependency naming the conflict. The hint includes the exact depends_on array the planner should have written.
D3, lintOverSpecification (severity: major, advisory). A heuristic regex scan for:
- Schema DDL (CREATE TABLE, ALTER COLUMN, Schema::create, Schema::table, common Eloquent column-builder calls)
- Framework-timestamped filenames (\d{4}_\d{2}_\d{2}_\d{6}_*.php)
- Inline code blocks over 200 characters
It emits plan-over-specifies-implementation. Severity is major (advisory) so it surfaces without blocking the gate; heuristic false-positives are tolerated.
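The D2 sibling comparison can be illustrated with a short sketch. The task shape (id, depends_on, verify, files_modified), the function name findParallelRaces, the suggestedDependsOn field, and the exact regular expression are assumptions; only the pattern list, the finding category, and the severity come from this ADR.

```javascript
'use strict';

// Assumed approximation of the working-tree-reader pattern listed for D2.
const WORKING_TREE_READER =
  /\b(update-docs|phpstan analyse|pint|eslint|tsc|git (diff|status|ls-files|log)|find .*-newer|pre-commit run)\b/;

// `tasks` is the hypothetical list of tasks in one slice.
function findParallelRaces(tasks) {
  // Only tasks that claim to be parallel-safe take part in the comparison.
  const parallel = tasks.filter((t) => t.depends_on.length === 0);
  if (parallel.length < 2) return [];

  const findings = [];
  for (const reader of parallel) {
    if (!WORKING_TREE_READER.test(reader.verify)) continue;

    // Any other parallel sibling that writes files can race with this verify.
    const writers = parallel.filter(
      (t) => t.id !== reader.id && t.files_modified.length > 0
    );
    if (writers.length === 0) continue;

    findings.push({
      category: 'parallel-task-implicit-dependency',
      severity: 'critical',
      task: reader.id,
      // The depends_on array the planner should have written.
      suggestedDependsOn: writers.map((t) => t.id),
    });
  }
  return findings;
}

const exampleSlice = [
  { id: 'T1', depends_on: [], verify: 'vendor/bin/phpstan analyse', files_modified: [] },
  { id: 'T2', depends_on: [], verify: 'test -f app/Models/Customer.php', files_modified: ['app/Models/Customer.php'] },
];
console.log(findParallelRaces(exampleSlice).map((f) => `${f.task} -> ${f.suggestedDependsOn}`));
// prints: [ 'T1 -> T2' ]
```

The sketch is deliberately conservative: it flags any writer sibling without checking whether the reader's verify actually touches the written files, which mirrors the rule as stated above.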
Granularity Doctrine — propagated to planner agents
agents/np-planner.md gains a <plan_granularity> section codifying that plans specify intent + boundary + acceptance, not implementation. Concrete prohibitions are enumerated, including:
- Schema DDL belongs to the executor.
- Framework-timestamped filenames are publish-time output; use globs or an empty files_modified.
- Code-style is codebase-state; the executor reads .nubos-pilot/codebase/<module>.md.
- Library-internal claims must be [VERIFIED] via the researcher or stay above that level.
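The first two prohibitions are the ones the D3 heuristic targets mechanically. A rough sketch of such a scan follows; the function name scanOverSpecification, the pattern field, and both regular expressions are illustrative assumptions rather than the patterns shipped in lib/plan-lint.cjs, and the inline-code-length check is omitted.

```javascript
'use strict';

// Assumed patterns; the real heuristics may be broader or stricter.
const OVER_SPEC_PATTERNS = [
  { name: 'schema-ddl', re: /\b(CREATE TABLE|ALTER COLUMN)\b|Schema::(create|table)\(/ },
  { name: 'timestamped-filename', re: /\b\d{4}_\d{2}_\d{2}_\d{6}_\w+\.php\b/ },
];

// Returns one advisory finding per pattern that matches the plan text.
function scanOverSpecification(planText) {
  return OVER_SPEC_PATTERNS
    .filter((p) => p.re.test(planText))
    .map((p) => ({
      category: 'plan-over-specifies-implementation',
      severity: 'major', // advisory: surfaced, never blocking
      pattern: p.name,
    }));
}

const overSpecified =
  "Create 0001_01_01_000004_create_customer_columns_table.php using Schema::create('subscriptions', ...)";
console.log(scanOverSpecification(overSpecified).map((f) => f.pattern));
// prints: [ 'schema-ddl', 'timestamped-filename' ]
```

An intent-level phrasing such as "add a subscriptions table; the executor decides the columns" produces no findings under these patterns.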
agents/np-architect.md gains a granularity reminder: architecture decisions are intent-level (which library, which boundary, which protocol), not implementation prescriptions.
agents/np-plan-checker.md gains the three new canonical finding categories. The plan-checker agent mirrors the mechanical findings into its YAML verdict so the verification loop treats them uniformly with semantic findings.
Workflow integration
workflows/plan-phase.md calls plan-lint --milestone $milestone_id between the plan-checker spawn and the status-pass check, in each iteration of the verification loop. Critical findings are merged into the verdict JSON; the loop forces iteration-2 (and rejects the plan if iteration-2 still has critical findings).
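The merge-and-gate behaviour can be sketched as two small functions. The verdict shape ({ status, findings }), the status strings, and the step names returned by decideNextStep are hypothetical; the ADR only fixes the rule that critical findings force iteration-2 and that a plan still critical at iteration-2 is rejected.

```javascript
'use strict';

// Merge mechanical findings into the (hypothetical) plan-checker verdict.
function mergeLintIntoVerdict(verdict, lintFindings) {
  const findings = [...verdict.findings, ...lintFindings];
  const hasCritical = findings.some((f) => f.severity === 'critical');
  // A critical finding overrides an LLM "pass"; otherwise the status is kept.
  return { ...verdict, findings, status: hasCritical ? 'fail' : verdict.status };
}

// Decide what the verification loop does after the merge.
function decideNextStep(verdict, iteration) {
  const hasCritical = verdict.findings.some((f) => f.severity === 'critical');
  if (!hasCritical) return 'continue'; // fall through to the status-pass check
  return iteration >= 2 ? 'reject' : 'force-iteration-2';
}

const merged = mergeLintIntoVerdict(
  { status: 'pass', findings: [] },
  [{ category: 'verify-command-unknown', severity: 'critical' }]
);
console.log(merged.status, decideNextStep(merged, 1), decideNextStep(merged, 2));
// prints: fail force-iteration-2 reject
```

Because D3 findings carry severity major, they are merged into the verdict but never change the outcome of decideNextStep.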
Defaults / configuration
No new configuration. The linter is always-on; severity is fixed (critical for D1+D2, major for D3). Future configurability would live in .nubos-pilot/config.json::plan_lint if needed (for example, a project-specific allowlist for vendor binaries).
Consequences
- Good, because all three plan-bug classes from M002+M004 are now caught at plan time, before any executor spawns. By the M004 numbers, that saves roughly 190 agent invocations per milestone (27 tasks × ~7 spawns/task ≈ 189 spawns that would have hit the deterministic failure).
- Good, because the mechanical layer is deterministic and auditable. Same plan, same verdict, every time.
- Good, because plan-checker (LLM, opus) can now focus on semantic checks (success-criterion coverage, decision fidelity) where its judgment adds value.
- Good, because doctrine in planner agents reinforces the lesson; the mechanical layer enforces it when doctrine slips.
- Bad, because D3 (over-specification) is heuristic and can false-positive on legitimate plans (for example, a plan that must prescribe a schema for a custom CRUD-builder task). Mitigated by severity: major (advisory, not blocking).
- Bad, because plan-lint adds about 50 ms to each plan-phase iteration. Acceptable, since it saves orders of magnitude more on the execute side.
- Bad, because the Granularity Doctrine constrains planner output style. Existing plans in flight may fail D3 until rewritten. Migration path: D3 is advisory only; D1+D2 catch the actual blockers.
More Information
- Library: lib/plan-lint.cjs (pure-function linters, no I/O outside file reads).
- CLI verb: bin/np-tools/plan-lint.cjs, invoked as plan-lint <path> or plan-lint --milestone M<NNN>. Exit 2 on critical findings, 0 otherwise. Output is JSON.
- Tests: lib/plan-lint.test.cjs (25 unit tests), bin/np-tools/plan-lint.test.cjs (10 CLI integration + e2e tests).
- Workflow integration: workflows/plan-phase.md § Verification Loop. plan-lint runs each iteration, findings merged into the verdict.
- Related ADRs:
  - ADR-0010: Execute-side Trust Layer (A/B/C). Layer-D (this ADR) is the planner-side counterpart.
  - ADR-0011: the researcher-schwarm runs at plan time too; D1's verify-command-unknown is structurally similar to a researcher's [VERIFIED] provenance check, applied to the verify-command surface.
  - ADR-0012: Layer-D enforces Rule 3 (do it with tests) at plan time: the verify command must be runnable.
