ADR-0019: Plan-side Trust Layer — Mechanical PLAN.md Validation Before Promote

  • Status: Accepted
  • Date: 2026-05-05
  • Supersedes: None
  • Related: ADR-0010 (Execute-side Trust Layer, Layers A/B/C)

Numbering note. This ADR was originally authored as 0013-plan-trust-layer.md, colliding with the earlier 0013-learnings-store-schema-evolution.md (authored 2026-05-03, two days before this one). The collision was resolved by keeping 0013 for the learnings-store ADR and renumbering this one to 0019. The Layer-D naming used throughout the body is internal to this ADR and is unaffected by the renumber.

Context and Problem Statement

ADR-0010 closed the Execute-side Trust gaps (Layers A, B, C): commit-task and loop-run-round now refuse to advance unless the per-task evidence and audit-trail are intact. Real spawns must demonstrably have happened. That solved one half of the problem.

The other half surfaced in production runs: the plans themselves were buggy. Three failure classes recurred across milestones M002 and M004:

  1. Phantom CLI verbs in <verify> blocks. A plan would specify <verify>node .nubos-pilot/bin/np-tools.cjs codebase doc-lint</verify>, but codebase is not a registered np-tools verb. The verify command is mechanically unexecutable. The Nubosloop catches this at execute time (verify-red → build-fixer → verify-red → build-fixer → stuck), but the cost is roughly 3 executor + 3 build-fixer + 9 critic spawns for a deterministically-failing outcome that a 30-line lint check would have caught at plan time.

  2. False parallel-safety claims. Tasks were marked depends_on: [] (parallel-safe) even though one task's <verify> reads working-tree state (update-docs --check, phpstan analyse, git diff) against files that a sibling task modifies. The result is a filesystem race at runtime: parallel-safe in name only.

  3. Implementation over-specification. Plans bake in framework-controlled details: exact migration filenames (0001_01_01_000004_create_customer_columns_table.php), schema DDL (Schema::create('subscriptions', function (Blueprint $table) { ... })), code-style edicts (use Cashier::calculateTaxes() inline in boot()). These are not the planner's territory. The framework decides migration shapes, the executor reads codebase docs for style, the publish step decides filenames. A plan that pretends to know these is making falsifiable claims it cannot verify.

In all three classes, the orchestrator fielded the consequences at execute time when the planner should have prevented them.
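
The first failure class shows how little machinery plan-time detection needs. The following is a minimal sketch, not the project's implementation: the verb set, the function names, and the result shape are all hypothetical, and the real registry is not reproduced here.

```javascript
// Hypothetical registry for illustration only; the real verb list is not shown in this ADR.
const KNOWN_VERBS = new Set(['plan-lint', 'commit-task', 'loop-run-round']);

// Return the token that follows np-tools.cjs in a verify command,
// or null when the command does not invoke np-tools at all.
function extractNpToolsVerb(command) {
  const tokens = command.trim().split(/\s+/);
  const idx = tokens.findIndex((t) => t.endsWith('np-tools.cjs'));
  if (idx === -1 || idx + 1 >= tokens.length) return null;
  return tokens[idx + 1];
}

// Flag np-tools calls whose verb is not registered. Commands that do not
// call np-tools are out of scope for this sketch and pass through.
function checkVerifyCommand(command, knownVerbs = KNOWN_VERBS) {
  const verb = extractNpToolsVerb(command);
  if (verb === null || knownVerbs.has(verb)) return { ok: true };
  return { ok: false, reason: 'np-tools-unknown-verb', verb };
}
```

Applied to the phantom command from class 1, checkVerifyCommand reports verb codebase as unknown without spawning a single agent.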

Decision Drivers

  • The Nubosloop is expensive (1 executor + 3 critics per task per round). Burning a full Nubosloop on a plan-bug is wasteful when mechanical detection is cheap.
  • Plan-checker is an LLM-judgment agent (opus tier). LLM judgment is unreliable for syntactic checks (verb-existence, regex-pattern-detection); those are mechanical-checker territory.
  • Planners under user pressure rationalize over-specification ("being thorough"). Doctrine alone doesn't hold; mechanical refusal does.

Considered Options

  • Status quo (LLM-judgment plan-checker only). Rejected. It fails on all three classes, as observed in M002 and M004.
  • Add the three checks to the np-plan-checker agent prompt. Rejected. LLM-judgment is non-deterministic. The same plan, checked twice, can produce different verdicts. For mechanical violations the project needs deterministic refusal.
  • New mechanical lint verb plus Layer-D enforcement in the plan-phase workflow. Chosen.

Decision Outcome

Chosen: a Plan-side Trust Layer with three mechanical linters wrapped in a CLI verb (np-tools.cjs plan-lint), called from the plan-phase workflow in each verification-loop iteration, before the plan can pass. Critical findings are merged into the LLM-checker verdict and force iteration-2.

Layer-D — three deterministic linters

  • D1, lintVerifyCommands (severity: critical). Every <verify> block is parsed; the first command per non-comment line is validated against:

    • Known np-tools verbs (read from _commands.cjs::COMMANDS)
    • Declared composer scripts (composer.json::scripts)
    • Declared npm / pnpm / yarn scripts (package.json::scripts)
    • vendor/bin/* and node_modules/.bin/* paths (lint-time existence + conventional-bin-dir tolerance)
    • The POSIX baseline (echo, test, [, sed, grep, find, …)
    • Interpreter-prefixed calls (node, php, composer, npm, npx, …)

    Unknown commands emit verify-command-unknown with a concrete raw.reason (np-tools-unknown-verb, composer-script-not-declared, npm-script-not-declared, path-not-found).

  • D2, lintParallelTaskRaces (severity: critical). For every slice with multiple tasks marked depends_on: [], the linter checks whether any sibling's <verify> matches the working-tree-reader pattern (update-docs, phpstan analyse, pint, eslint, tsc, git diff/status/ls-files/log, find -newer, pre-commit run). If so, and another sibling has a non-empty files_modified, it emits parallel-task-implicit-dependency naming the conflict. The hint includes the exact depends_on array the planner should have written.

  • D3, lintOverSpecification (severity: major, advisory). A heuristic regex scan for:

    • Schema DDL (CREATE TABLE, ALTER COLUMN, Schema::create, Schema::table, common Eloquent column-builder calls)
    • Framework-timestamped filenames (\d{4}_\d{2}_\d{2}_\d{6}_.*\.php)
    • Inline code blocks over 200 characters

    It emits plan-over-specifies-implementation. Severity is major (advisory) so it surfaces without blocking the gate; heuristic false-positives are tolerated.
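
The D2 rule is the most algorithmic of the three, so a sketch may help. This is an illustration under stated assumptions, not the code in lib/plan-lint.cjs: the slice and task object shapes (id, depends_on, verify, files_modified as plain fields), the function names, and the exact regexes are hypothetical. Only the list of reader commands and the finding name come from the description above.

```javascript
// Working-tree-reader commands as listed in the D2 description.
// The regex spellings are this sketch's own.
const WORKING_TREE_READERS = [
  /\bupdate-docs\b/,
  /\bphpstan\s+analyse\b/,
  /\bpint\b/,
  /\beslint\b/,
  /\btsc\b/,
  /\bgit\s+(diff|status|ls-files|log)\b/,
  /\bfind\b.*-newer\b/,
  /\bpre-commit\s+run\b/,
];

function readsWorkingTree(verify) {
  return WORKING_TREE_READERS.some((re) => re.test(verify));
}

// Hypothetical slice shape: { tasks: [{ id, depends_on, verify, files_modified }] }.
function findParallelTaskRaces(slice) {
  // Only tasks that claim to be parallel-safe are candidates.
  const parallel = slice.tasks.filter((t) => (t.depends_on || []).length === 0);
  if (parallel.length < 2) return [];

  const findings = [];
  for (const reader of parallel) {
    if (!readsWorkingTree(reader.verify || '')) continue;
    // Any other parallel sibling that modifies files can race with this reader.
    const writers = parallel.filter(
      (t) => t.id !== reader.id && (t.files_modified || []).length > 0
    );
    if (writers.length === 0) continue;
    const writerIds = writers.map((t) => t.id);
    findings.push({
      category: 'parallel-task-implicit-dependency',
      severity: 'critical',
      task: reader.id,
      conflicts_with: writerIds,
      hint: `depends_on: [${writerIds.join(', ')}]`,
    });
  }
  return findings;
}
```

For a slice where T1 verifies with git diff --exit-code and its sibling T2 modifies src/A.php, both with depends_on: [], the sketch yields one finding on T1 with the hint depends_on: [T2].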

Granularity Doctrine — propagated to planner agents

agents/np-planner.md gains a <plan_granularity> section codifying that plans specify intent + boundary + acceptance, not implementation. Concrete prohibitions are enumerated, including:

  • Schema DDL belongs to the executor.
  • Framework-timestamped filenames are publish-time output; use globs or an empty files_modified.
  • Code-style is codebase-state; the executor reads .nubos-pilot/codebase/<module>.md.
  • Library-internal claims must be [VERIFIED] via the researcher or stay above that level.

agents/np-architect.md gains a granularity reminder: architecture decisions are intent-level (which library, which boundary, which protocol), not implementation prescriptions.

agents/np-plan-checker.md gains the three new canonical finding categories. The plan-checker agent mirrors the mechanical findings into its YAML verdict so the verification loop treats them uniformly with semantic findings.

Workflow integration

workflows/plan-phase.md calls plan-lint --milestone $milestone_id between the plan-checker spawn and the status-pass check, in each iteration of the verification loop. Critical findings are merged into the verdict JSON; the loop forces iteration-2 (and rejects the plan if iteration-2 still has critical findings).
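
The merge-and-gate step described above can be sketched as two small pure functions. The verdict shape (status, findings), the status strings, and the function names are assumptions made for illustration; the workflow's real verdict schema is not reproduced in this ADR.

```javascript
// Merge mechanical findings into the LLM-checker verdict.
// Assumed verdict shape: { status: 'pass' | 'fail', findings: [{ category, severity }] }.
function mergeLintFindings(verdict, lintFindings) {
  const findings = [...(verdict.findings || []), ...lintFindings];
  const hasCritical = findings.some((f) => f.severity === 'critical');
  // A critical finding overrides a passing LLM verdict; otherwise the status is kept.
  return { ...verdict, findings, status: hasCritical ? 'fail' : verdict.status };
}

// Decide what the loop does with a merged verdict.
// 'continue' means: fall through to the workflow's existing status-pass check.
function lintGate(iteration, mergedVerdict) {
  const hasCritical = mergedVerdict.findings.some((f) => f.severity === 'critical');
  if (!hasCritical) return 'continue';
  return iteration >= 2 ? 'reject' : 'iterate';
}
```

Keeping the gate separate from the merge means advisory (major) findings still appear in the verdict without changing the loop's control flow.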

Defaults / configuration

No new configuration. The linter is always-on; severity is fixed (critical for D1+D2, major for D3). Future configurability would live in .nubos-pilot/config.json::plan_lint if needed (for example, a project-specific allowlist for vendor binaries).

Consequences

  • Good, because all three plan-bug classes from M002+M004 are now caught at plan time, before any executor spawns. It saves roughly 190 agent invocations per milestone (M004 had 27 tasks × ~7 spawns/task that would have hit the deterministic failure).
  • Good, because the mechanical layer is deterministic and auditable. Same plan, same verdict, every time.
  • Good, because plan-checker (LLM, opus) can now focus on semantic checks (success-criterion coverage, decision fidelity) where its judgment adds value.
  • Good, because doctrine in planner agents reinforces the lesson; the mechanical layer enforces it when doctrine slips.
  • Bad, because D3 (over-specification) is heuristic and can false-positive on legitimate plans (for example, a plan that must prescribe a schema for a custom CRUD-builder task). Mitigated by severity: major (advisory, not blocking).
  • Bad, because plan-lint adds about 50 ms to each plan-phase iteration. Acceptable, since it saves orders of magnitude more on the execute side.
  • Bad, because the Granularity Doctrine constrains planner output style. Existing plans in flight may fail D3 until rewritten. Migration path: D3 is advisory only; D1+D2 catch the actual blockers.
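
To make the D3 false-positive trade-off concrete, here is a sketch of a heuristic scan in the spirit of D3. The patterns are this sketch's own reading of the categories listed earlier (in particular, $table->...( as a stand-in for "column-builder calls" and a backtick-delimited span as an "inline code block"); the function name and finding fields are hypothetical.

```javascript
// Heuristic patterns; each is an assumption about how the listed categories might be matched.
const OVER_SPEC_PATTERNS = [
  { kind: 'schema-ddl', re: /\b(CREATE\s+TABLE|ALTER\s+COLUMN)\b/i },
  { kind: 'schema-ddl', re: /\bSchema::(create|table)\b/ },
  { kind: 'schema-ddl', re: /\$table->\w+\(/ },
  { kind: 'timestamped-filename', re: /\b\d{4}_\d{2}_\d{2}_\d{6}_\w+\.php\b/ },
  { kind: 'long-inline-code', re: /`[^`]{201,}`/ },
];

// Report at most one finding per pattern, with a short excerpt of the first match.
function scanOverSpecification(planText) {
  const findings = [];
  for (const { kind, re } of OVER_SPEC_PATTERNS) {
    const match = re.exec(planText);
    if (match) {
      findings.push({
        category: 'plan-over-specifies-implementation',
        severity: 'major',
        kind,
        excerpt: match[0].slice(0, 60),
      });
    }
  }
  return findings;
}
```

A plan line such as "The CRUD builder must generate a CREATE TABLE statement from the form definition" trips the schema-ddl pattern even though it describes intent, which is exactly the kind of false positive that the advisory severity is meant to absorb.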

More Information

  • Library: lib/plan-lint.cjs — pure-function linters, no I/O outside file reads.
  • CLI verb: bin/np-tools/plan-lint.cjs, invoked as plan-lint <path> or plan-lint --milestone M<NNN>. Exit 2 on critical findings, 0 otherwise. Output is JSON.
  • Tests: lib/plan-lint.test.cjs (25 unit tests), bin/np-tools/plan-lint.test.cjs (10 CLI integration + e2e tests).
  • Workflow integration: workflows/plan-phase.md § Verification Loop. plan-lint runs each iteration, findings merged into the verdict.
  • Related ADRs:
    • ADR-0010: Execute-side Trust Layer (A/B/C). Layer-D (this ADR) is the planner-side counterpart.
    • ADR-0011: the researcher-schwarm runs at plan time too; D1's verify-command-unknown is structurally similar to a researcher's [VERIFIED] provenance check, applied to the verify-command surface.
    • ADR-0012: Layer-D enforces Rule 3 (do it with tests) at plan time: the verify command must be runnable.
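
The CLI contract listed above (exit 2 on critical findings, 0 otherwise, JSON output) suggests how a caller might interpret a plan-lint run. The sketch below is an assumption-laden illustration: the function name is hypothetical, the JSON is treated as opaque because its schema is not given here, and treating any other exit status as an error is this sketch's choice rather than documented behavior.

```javascript
// Interpret the exit status and stdout of a plan-lint run.
// status: process exit code; stdout: the JSON text the verb printed.
function interpretPlanLintResult(status, stdout) {
  if (status !== 0 && status !== 2) {
    // Assumption: anything other than 0 or 2 means the linter itself failed.
    throw new Error(`unexpected plan-lint exit status: ${status}`);
  }
  return { blocked: status === 2, report: JSON.parse(stdout) };
}
```

A caller could obtain status and stdout from Node's child_process.spawnSync and pass them straight through; the gate decision then depends only on the exit code, not on parsing the report.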