Troubleshooting

How-to. Symptom-driven entry point. Find your symptom in the first column, follow the recovery in the second. The error-code reference is the Error Codes page; this page tells you what to do when one of them shows up.

Quickest first

You see / hear	Try first
Anything weird right after install	`npx nubos-pilot doctor --fix`
The host CLI doesn't expose `np:*` commands	Re-open the host CLI; the managed block in `CLAUDE.md` / `AGENTS.md` is read only once, at startup.
`not-in-project` thrown anywhere	You are running outside a directory that has `.nubos-pilot/`. `cd` into the project root.

Execution / commit problems

Symptom	First diagnosis	Recovery
`commit-all-paths-gitignored` thrown	Every path in the task's `files_modified` is gitignored.	Either fix `.gitignore` or open `T<NNNN>-PLAN.md` and correct `files_modified`. Re-run `/np:execute-phase`.
`commit-no-paths`	Task's `files_modified` is empty.	Edit the task PLAN frontmatter to declare paths, then re-run.
Executor hung on a task	Network / model timeout.	`/np:resume-work` to classify (`resume` / `orphan` / `clean`). If `orphan`, follow with `/np:reset-slice` to clear in-flight state.
Executor crashed mid-task, file system left dirty	Working tree has un-committed edits from the crash.	`/np:reset-slice` (no commit, restores `files_modified` from HEAD, drops checkpoint).
One committed task is wrong, but the rest of the slice is fine	Need to undo just that commit.	`/np:undo-task M001-S001-T0003` — forward `git revert`, sets task status back to `pending`.
Whole milestone is wrong, want to start over	Need bulk revert.	`/np:undo 1` — reverts every task commit of milestone 1, newest-first. Plans + CONTEXT.md remain.
Verify command keeps failing inside a task	Test or build is genuinely broken.	Inside `/np:execute-phase` the Nubosloop already routes a verify failure to `np-build-fixer` in round 2 and later. If the loop reaches `stuck` and the failure is in the task's scope, fix manually then `/np:resume-work`. If the failure reveals a planning gap, run `/np:undo-task`, edit the task plan, then re-run.
`pending_todos` counter in `STATE.md` looks wrong	Race or crash during `add-todo`.	Run `/np:state` to inspect, then edit `STATE.md` manually if needed — `pending_todos` is single-writer-locked but a forced kill mid-write can theoretically skew it.

Plan-phase problems

Symptom	First diagnosis	Recovery
Plan-checker keeps returning `issues_found` after 2 iterations	Loop is bounded at 2 iterations. The planner couldn't satisfy the checker.	Read `M<NNN>-PLAN-REVIEW.md` to see the findings. Most often the gap is in CONTEXT — re-run `/np:discuss-phase` to add the missing decision, then `/np:plan-phase --repromote` (or a full re-plan).
`PHASE SPLIT RECOMMENDED` from planner	Milestone is too big for one plan.	Use `/np:propose-milestones` to re-shape the open pipeline, splitting `M<NNN>` into two smaller milestones.
`np-researcher` only writes `## Research Coverage` annotation	WebFetch + Context7 both unavailable; the offline-confirm protocol kicked in.	Either make Context7 available in the host CLI (see Installation § Research tools) or accept the local-only research and proceed.
`## CONTEXT CONFLICT` from `np-architect`	Architect detected a decision that would violate `M<NNN>-CONTEXT.md` or `RULES.md`.	Re-open `/np:discuss-phase <N>` with `Append-update` to either adjust the conflicting decision or remove the architectural ambition.
Scaffolder silently dropped a task	`<task>` block in slice PLAN missing one of `id`, `depends_on`, `wave`, `tier` attributes.	Open the slice's `S<NNN>-PLAN.md`, fix the opening tag, re-run `/np:plan-phase <N> --repromote`.

Verify / validate problems

Symptom	First diagnosis	Recovery
`/np:verify-work` reports `Fail` for an SC	The shipped code doesn't satisfy the success_criterion (SC).	Read `M<NNN>-VERIFICATION.md` to see the cited evidence. Three options: fix it in the next milestone, add a new task to the current milestone, or accept it and update the CONTEXT to defer.
Verifier classifies several SCs as `Needs-User-Confirm`	Subjective/UX criteria need a human eye.	The Pass-2 `askuser` gate fires automatically. Answer `Pass` / `Fail` / `Defer` per SC.
`/np:validate-phase` reports many `UNDER_SAMPLED`	Tests exist but don't directly observe the requirement.	Read the `## Remediation Guidance` section in `M<NNN>-VALIDATION.md` for suggested test names and assertion shapes. Add tests; re-run validate.
Verify says `Pass`, validate says `UNCOVERED`	Code works but no test covers it — silent regression risk.	Same as above — add direct test assertions. Don't merge until both agree.

Worktree problems (`workflow.worktree_isolation: true`)

Symptom	First diagnosis	Recovery
FF-merge fails at slice end with non-FF	The base branch advanced while the slice ran.	Decide whether to rebase the slice branch (manually, then re-run `worktree-ff-merge`) or accept the divergence and merge manually. ADR-0008 D-8.7 surfaces this rather than silently rewriting commits.
Stale worktree directory under `.nubos-pilot/worktrees/`	A previous slice crashed without cleanup.	`node np-tools.cjs worktree-list` to enumerate; `worktree-remove --force <slice-id>` to clean up.
`.nubos-pilot/worktrees/` not gitignored	Pre-flight check fails before any worktree is created.	Add `.nubos-pilot/worktrees/` to `.gitignore` and re-run.

Install problems

Symptom	First diagnosis	Recovery
`target-is-symlink`	Payload directory is a symlink — refused by the installer.	Inspect `.claude/nubos-pilot/`; replace the symlink with a real directory or move the payload destination.
`manifest-path-traversal`	Manifest contains a `..` or absolute path.	Likely a corrupted payload. `npx nubos-pilot uninstall` (preserves Project-State) and reinstall.
Codex `[features]` table broken in `~/.codex/config.toml`	Known Codex-side bug.	`npx nubos-pilot doctor --fix` repairs it idempotently.
`npx nubos-pilot update` fights with hand-edits in payload files	The diff said "user-modified" and backed up to `.bak.<n>` before overwriting.	Compare the `.bak.<n>` against the new file and merge by hand.

When the Nubosloop is stuck

The Nubosloop terminates in the stuck state when loop.maxRounds (default 3, configurable range [1, 100]) is reached without convergence. Stuck is a first-class state, surfaced explicitly in STATE.md and on /np:dashboard; it is never a silent downgrade.
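The documented bounds can be written as a small shell predicate, which is handy as a sanity check before editing the round limit by hand. This is a sketch only: `is_valid_max_rounds` is a hypothetical helper, not part of nubos-pilot, and it encodes nothing beyond the 1 to 100 range stated above.

```shell
# Hypothetical helper (not shipped with nubos-pilot): succeeds when the
# argument is a whole number inside the documented loop.maxRounds range.
is_valid_max_rounds() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 100 ]
}

# Example:
#   is_valid_max_rounds 5 && echo "5 is within range"
```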

Diagnose

Read the task's checkpoint:

```bash
cat .nubos-pilot/checkpoints/M001-S001-T0001.json
```

The nubosloop block holds the loop history with the last Critic-Schwarm findings.

Three patterns of stuck-ness, with different recovery paths:

Pattern 1 — Same finding survives all rounds

The Executor or Build-Fixer can't address a particular finding. The cause is usually one of two:

The plan didn't contemplate the constraint the Critic surfaced. Run /np:undo-task <task-id>, then /np:discuss-phase <N> to lock the gap as a decision and re-run /np:plan-phase <N> to refresh the plan.
The Critic itself is wrong. Read the captured finding text; if it cites a non-existent file or a misread of the code, file an issue and re-run with /np:reset-slice <task-id>.

Pattern 2 — Findings cycle (fixing one introduces another)

The Executor is making local-optima moves. Override the loop:

```bash
# Manual fix: edit the files yourself
vim <files>

# Mark the task as resumed (clears the stuck flag)
/np:resume-work

# Re-run from the next slice
/np:execute-phase <N>
```

Pattern 3 — Findings include `information-missing` or `question-to-user`

The loop pauses for human input.

information-missing → re-run /np:research-phase <N> with the missing topic added to scope.
question-to-user → answer in the prompt; the loop resumes automatically when integrated with Temporal-style signal-wait.

Reset path (last resort)

```bash
/np:reset-slice M001-S001-T0001     # discards working tree, drops checkpoint
/np:discuss-phase 1                 # lock the discovered gap as a decision
/np:plan-phase 1                    # re-plan with the gap in CONTEXT
/np:execute-phase 1                 # re-run from the affected slice
```

Tuning to reduce stuck-ness

If stuck happens often:

loop.maxRounds: 5 — more rounds for convergence at higher token cost.
swarm.research.k: 5 — broader research at start reduces information-missing later.
swarm.research.threshold: 0.7 — more aggressive cache hits, fewer first-time researches.

See Swarm + Loop Config for the full knob list.
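Before changing any of the knobs above, it can help to see what is currently set. A minimal sketch, assuming that `node np-tools.cjs config-get <key>` (listed under "When in doubt") accepts the dotted key names used here. The wrapper `show_tuning_knobs` and the `NP_TOOLS` override variable are hypothetical, and the output format of `config-get` is not specified on this page.

```shell
# Hypothetical wrapper: prints each tuning knob followed by whatever
# config-get reports for it. Override NP_TOOLS if np-tools.cjs lives
# somewhere else (the default location is an assumption).
NP_TOOLS="${NP_TOOLS:-node np-tools.cjs}"

show_tuning_knobs() {
  for key in loop.maxRounds swarm.research.k swarm.research.threshold; do
    printf '%s: ' "$key"
    # Left unquoted on purpose so "node np-tools.cjs" splits into words.
    $NP_TOOLS config-get "$key"
  done
}

# Example (run from the directory that contains np-tools.cjs):
#   show_tuning_knobs
```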

Doctor checks

npx nubos-pilot doctor runs 12 checks:

1. Manifest integrity: every shipped file exists and matches its recorded SHA-256.
2. Version mismatch: installed payload version differs from package.json version.
3. Hooks presence: when the manifest declares hooks/ entries, the hooks/ directory exists on disk.
4. Codex trapped-features: known broken [features] table in ~/.codex/config.toml.
5. Askuser runtime: the prompt capability is callable.
6. Codebase docs freshness: modules whose hashes drifted since /np:scan-codebase, plus _TBD module docs.
7. Milestone layout: every roadmap.yaml entry has a matching milestones/M<NNN>/ directory; legacy phases/ is flagged.
8. Nubosloop Critic agents: the spawnable np-critic plus the three axis modules (np-critic-style, np-critic-tests, np-critic-acceptance) are present in the payload.
9. Nubosloop knowledge store: .nubos-pilot/knowledge/learnings.json is present and well-formed.
10. Nubosloop config: swarm.knowledge_adapter is local, and loop.maxRounds is in range.
11. Orphan tmp-file sweep: stale .tmp files (older than one hour) from a hard-killed process are swept.
12. Output-schema drift: every M<NNN>-VERIFICATION.md and M<NNN>-VALIDATION.md lints clean against its schema (ADR-0017). Surfaces legacy files written before write-time enforcement existed. Fix path: re-run /np:verify-work <N> / /np:validate-phase <N> for the offending milestone.

--fix applies the auto-safe repairs (mainly checks 4 and 11). Schema-drift issues (check 12) are not auto-fixable; they require re-spawning the producing workflow with the current schema in the spawn prompt.
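To preview what the orphan tmp-file sweep might touch, stale files can be listed without deleting anything. This sketch assumes the tmp files live under `.nubos-pilot/` and end in `.tmp`; the doctor's actual search roots are not documented here, and `list_stale_tmp` is a hypothetical name. It relies on `find -mmin`, which GNU and BSD find both support but POSIX does not require.

```shell
# Hypothetical dry run: list *.tmp files last modified more than 60 minutes
# ago under a directory (default: .nubos-pilot, an assumption).
# Nothing is deleted.
list_stale_tmp() {
  find "${1:-.nubos-pilot}" -type f -name '*.tmp' -mmin +60
}

# Example:
#   list_stale_tmp            # scan .nubos-pilot
#   list_stale_tmp some/dir   # scan another directory
```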

Schema violations after a spawn

When a workflow exits with output-schema-violation:

[np:verify-work] VERIFICATION.md violates output schema — re-spawn np-verifier with the violation list above as feedback. Do NOT hand-edit.

The agent's output doesn't conform to its artefact schema (a missing frontmatter key, a wrong enum, [object Object] in a heading, a missing **Reasoning:** field per Decision/Risk/Pattern, and so on). Do not hand-edit the file to satisfy the linter; that hides the real bug, which is either an agent-prompt drift or a schema mismatch. Instead:

Read the violation list. It cites the exact path and the missing or wrong field.
Re-spawn the producing agent (/np:verify-work, /np:validate-phase, /np:research-phase). The schema is injected into the new spawn prompt via output-lint prompt --schema <name>.
If the same violation keeps appearing across re-spawns, the agent prompt itself may be out of sync with the schema; file an issue with the violation list and the spawn output.

See Output Schemas and ADR-0017 for the full enforcement model.

Researcher swarm doesn't converge

If /np:research-phase fires the disagreement hard-gate (Step 5.7):

Researcher-Schwarm konvergiert nicht. Wie weiter?
  1. Re-spawn mit schärferer task_query
  2. Fortfahren mit Reconciler-Pick
  3. Manuell entscheiden

In English, the prompt reads: "Researcher swarm does not converge. How to proceed?" with the options (1) re-spawn with a sharper task_query, (2) continue with the Reconciler pick, (3) decide manually.

The gate fires when agreement_score < min_agreement_score (default 0.5) or contested_count > max_contested (default 2). The most common cause is a task_query that the spawns interpreted along different axes. Pick option 1, refine the query (more specific scope, named alternatives to compare, explicit exclusion list), and re-run. Tune the defaults in config.json → swarm.research.{min_agreement_score, max_contested}, or per invocation via researcher-reconcile gate <N> --min-agreement-score 0.7. See Researcher-Schwarm and ADR-0018.
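The gate condition above can be read as a simple predicate. The sketch below mirrors it, using awk for the floating-point comparison. `gate_fires` is a hypothetical helper for illustration, not the tool's implementation, and its defaults are the ones quoted above.

```shell
# Hypothetical illustration of the hard-gate rule: succeed (exit 0) when
# agreement_score < min_agreement_score OR contested_count > max_contested.
# Usage: gate_fires <agreement_score> <contested_count> [min_score] [max_contested]
gate_fires() {
  awk -v score="$1" -v contested="$2" \
      -v min_score="${3:-0.5}" -v max_contested="${4:-2}" \
      'BEGIN { exit !((score + 0 < min_score + 0) || (contested + 0 > max_contested + 0)) }'
}

# Example:
#   gate_fires 0.62 1 || echo "no gate with the defaults"
#   gate_fires 0.62 1 0.7 && echo "gate fires with a 0.7 minimum"
```

Both comparisons are strict, so a score exactly equal to the minimum, or a contested count exactly equal to the maximum, does not trip the gate in this sketch.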

Phantom blockers in /np:close-project

If /np:close-project reports UNCOVERED counts or "milestone status is X" blockers that don't match the underlying VERIFICATION/VALIDATION files, those artefacts predate ADR-0017. Run npx nubos-pilot doctor to surface the drift (reported as output-schema-violation issues), then re-run /np:verify-work <N> / /np:validate-phase <N> for the offending milestones. Current spawns write schema-bound frontmatter, which the aggregator reads as the canonical signal; only files written before ADR-0017 fall back to the legacy body-grep path that produced the phantoms.

When in doubt

/np:state — what does the tool think the current milestone/task is?
/np:dashboard — visual overview of every slice's task statuses.
node np-tools.cjs config-get <key> — what's actually in config.json?
node np-tools.cjs detect-runtime — which runtime is the host resolving to?
git log --oneline | grep '^[a-f0-9]\{7,\} task(' (every per-task commit, newest first; pass --reverse to git log for chronological order).
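The last command in the list can be wrapped as a reusable filter. A small sketch: `filter_task_commits` is a hypothetical name, and it assumes task commit subjects start with `task(`, as the grep pattern above implies. `--reverse` is a standard `git log` option that prints the oldest commit first.

```shell
# Hypothetical helper: keep only "<short-hash> task(...)" lines from
# `git log --oneline` output supplied on stdin.
filter_task_commits() {
  grep -E '^[a-f0-9]{7,} task\('
}

# Oldest task commit first (run inside the project repository):
#   git log --oneline --reverse | filter_task_commits
```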

If all else fails, the Error Codes reference lists every NubosPilotError code with its source module. Match on err.code, not on the message.
