
Troubleshooting

How-to. Symptom-driven entry point. Find your symptom in the first column of a table and follow the recovery in the last column. The error-code reference is the Error Codes page; this page tells you what to do when one of those codes shows up.

Quickest first

| You see / hear | Try first |
| --- | --- |
| Anything weird right after install | npx nubos-pilot doctor --fix |
| The host CLI doesn't expose np:* commands | Re-open the host CLI; managed-block in CLAUDE.md / AGENTS.md is read once at start. |
| not-in-project thrown anywhere | You are running outside a directory that has .nubos-pilot/. cd into the project root. |
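
The not-in-project recovery can be sketched as a small shell helper. It assumes only that the project root is the nearest ancestor directory containing .nubos-pilot/; find_project_root is a hypothetical name and not part of nubos-pilot.

```bash
# Walk upward from a starting directory (default: current directory) until a
# .nubos-pilot/ directory is found. Prints the project root and returns 0,
# or returns 1 once the filesystem root is reached without a match.
# Hypothetical helper for illustration only.
find_project_root() (
  dir=$(cd "${1:-.}" && pwd -P) || exit 1
  while :; do
    if [ -d "$dir/.nubos-pilot" ]; then
      printf '%s\n' "$dir"
      exit 0
    fi
    if [ "$dir" = "/" ]; then
      exit 1
    fi
    dir=$(dirname "$dir")
  done
)
```

Typical use would be cd "$(find_project_root)" before re-running the failing command.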

Execution / commit problems

| Symptom | First diagnosis | Recovery |
| --- | --- | --- |
| commit-all-paths-gitignored thrown | Every path in the task's files_modified is gitignored. | Either fix .gitignore or open T<NNNN>-PLAN.md and correct files_modified. Re-run /np:execute-phase. |
| commit-no-paths | Task's files_modified is empty. | Edit the task PLAN frontmatter to declare paths, then re-run. |
| Executor hung on a task | Network / model timeout. | /np:resume-work to classify (resume / orphan / clean). If orphan, follow with /np:reset-slice to clear in-flight state. |
| Executor crashed mid-task, file system left dirty | Working tree has un-committed edits from the crash. | /np:reset-slice (no commit, restores files_modified from HEAD, drops checkpoint). |
| One committed task is wrong, but the rest of the slice is fine | Need to undo just that commit. | /np:undo-task M001-S001-T0003 (forward git revert; sets task status back to pending). |
| Whole milestone is wrong, want to start over | Need bulk revert. | /np:undo 1 (reverts every task commit of milestone 1, newest-first). Plans + CONTEXT.md remain. |
| Verify command keeps failing inside a task | Test or build is genuinely broken. | Inside /np:execute-phase the Nubosloop already routes a verify failure to np-build-fixer in round 2 and later. If the loop reaches stuck and the failure is in the task's scope, fix manually then /np:resume-work. If the failure reveals a planning gap, run /np:undo-task, edit the task plan, then re-run. |
| pending_todos counter in STATE.md looks wrong | Race or crash during add-todo. | Run /np:state to inspect, then edit STATE.md manually if needed. pending_todos is single-writer-locked, but a forced kill mid-write can theoretically skew it. |
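
Before editing .gitignore or files_modified for commit-all-paths-gitignored, it can help to see which declared paths git actually ignores. The sketch below uses git check-ignore; all_paths_gitignored is a hypothetical helper, and the paths are passed by hand rather than read from the task PLAN.

```bash
# Return 0 only when every given path is gitignored (the situation behind
# commit-all-paths-gitignored). Prints each path that is NOT ignored.
# Returns 2 when no paths are given (compare commit-no-paths).
# Hypothetical helper; run it from inside the repository.
all_paths_gitignored() (
  [ "$#" -gt 0 ] || exit 2
  status=0
  for p in "$@"; do
    # -q sets only the exit status: 0 = ignored, 1 = not ignored.
    if ! git check-ignore -q -- "$p"; then
      printf 'committable: %s\n' "$p"
      status=1
    fi
  done
  exit "$status"
)
```

If the helper prints nothing and returns 0, every path is ignored and the commit has nothing to stage, which matches the symptom in the table.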

Plan-phase problems

| Symptom | First diagnosis | Recovery |
| --- | --- | --- |
| Plan-checker keeps returning issues_found after 2 iterations | Loop is bounded at 2 iterations. The planner couldn't satisfy the checker. | Read M<NNN>-PLAN-REVIEW.md to see the findings. Most often the gap is in CONTEXT: re-run /np:discuss-phase to add the missing decision, then /np:plan-phase --repromote (or a full re-plan). |
| PHASE SPLIT RECOMMENDED from planner | Milestone is too big for one plan. | Use /np:propose-milestones to re-shape the open pipeline, splitting M<NNN> into two smaller milestones. |
| np-researcher only writes ## Research Coverage annotation | WebFetch + Context7 both unavailable; the offline-confirm protocol kicked in. | Either make Context7 available in the host CLI (see Installation § Research tools) or accept the local-only research and proceed. |
| ## CONTEXT CONFLICT from np-architect | Architect detected a decision that would violate M<NNN>-CONTEXT.md or RULES.md. | Re-open /np:discuss-phase <N> with Append-update to either adjust the conflicting decision or remove the architectural ambition. |
| Scaffolder silently dropped a task | <task> block in slice PLAN missing one of id, depends_on, wave, tier attributes. | Open the slice's S<NNN>-PLAN.md, fix the opening tag, re-run /np:plan-phase <N> --repromote. |
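
For the dropped-task symptom, a quick scan of the slice PLAN can point at the offending opening tag. This is a rough sketch under two assumptions that the source does not confirm: each <task ...> opening tag sits on a single line, and attributes are written as name="value". check_task_tags is a hypothetical helper.

```bash
# Report <task ...> opening tags that lack one of the four required
# attributes (id, depends_on, wave, tier).
# Assumes one opening tag per line and name="value" attribute syntax.
check_task_tags() {
  awk '
    BEGIN { n = split("id depends_on wave tier", req, " ") }
    /<task[ >]/ {
      for (i = 1; i <= n; i++)
        if ($0 !~ ("[ \t]" req[i] "="))
          printf "line %d: missing %s\n", NR, req[i]
    }
  ' "$1"
}
```

Empty output means every opening tag carries all four attributes; otherwise fix the reported lines and re-run /np:plan-phase <N> --repromote.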

Verify / validate problems

| Symptom | First diagnosis | Recovery |
| --- | --- | --- |
| /np:verify-work reports Fail for an SC | Code shipped doesn't satisfy the success_criterion. | Read M<NNN>-VERIFICATION.md to see the evidence cite. Three options: fix in next milestone, add a new task to current milestone, or accept and update the CONTEXT to defer. |
| Verifier classifies several SCs as Needs-User-Confirm | Subjective/UX criteria need human eye. | Pass-2 askuser gate fires automatically. Answer Pass / Fail / Defer per SC. |
| /np:validate-phase reports many UNDER_SAMPLED | Tests exist but don't directly observe the requirement. | Read the ## Remediation Guidance section in M<NNN>-VALIDATION.md for suggested test names and assertion shapes. Add tests; re-run validate. |
| Verify says Pass, validate says UNCOVERED | Code works but no test covers it, which is a silent regression risk. | Same as above: add direct test assertions. Don't merge until both agree. |

Worktree problems (workflow.worktree_isolation: true)

| Symptom | First diagnosis | Recovery |
| --- | --- | --- |
| FF-merge fails at slice end with non-FF | The base branch advanced while the slice ran. | Decide whether to rebase the slice branch (manually, then re-run worktree-ff-merge) or accept the divergence and merge manually. ADR-0008 D-8.7 surfaces this rather than silently rewriting commits. |
| Stale worktree directory under .nubos-pilot/worktrees/ | A previous slice crashed without cleanup. | node np-tools.cjs worktree-list to enumerate; worktree-remove --force <slice-id> to clean up. |
| .nubos-pilot/worktrees/ not gitignored | Pre-flight check fails before any worktree is created. | Add .nubos-pilot/worktrees/ to .gitignore and re-run. |
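
Adding the ignore entry can be done idempotently. The helper below is a sketch that matches the exact line .nubos-pilot/worktrees/; the tool's own pre-flight check may accept other equivalent patterns, which this does not try to detect. ensure_worktrees_ignored is a hypothetical name.

```bash
# Append .nubos-pilot/worktrees/ to a .gitignore file (default: ./.gitignore)
# unless that exact line is already present. Safe to run repeatedly.
ensure_worktrees_ignored() (
  gitignore=${1:-.gitignore}
  entry='.nubos-pilot/worktrees/'
  if [ -f "$gitignore" ] && grep -qxF -- "$entry" "$gitignore"; then
    exit 0   # already present, nothing to do
  fi
  # If the file is non-empty and lacks a trailing newline, add one first
  # so the new entry does not fuse with the last existing line.
  if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
    printf '\n' >> "$gitignore"
  fi
  printf '%s\n' "$entry" >> "$gitignore"
)
```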

Install problems

| Symptom | First diagnosis | Recovery |
| --- | --- | --- |
| target-is-symlink | Payload directory is a symlink, which the installer refuses. | Inspect .claude/nubos-pilot/; replace the symlink with a real directory or move the payload destination. |
| manifest-path-traversal | Manifest contains a .. or absolute path. | Likely a corrupted payload. npx nubos-pilot uninstall (preserves Project-State) and reinstall. |
| Codex [features] table broken in ~/.codex/config.toml | Known Codex-side bug. | npx nubos-pilot doctor --fix repairs it idempotently. |
| npx nubos-pilot update fights with hand-edits in payload files | The diff said "user-modified" and backed up to .bak.<n> before overwriting. | Compare the .bak.<n> against the new file and merge by hand. |
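
To make the manifest-path-traversal condition concrete, the following sketch classifies a path the way the table describes it: absolute, or containing a .. segment. It is an illustration of the rule only, not the installer's code, and is_unsafe_manifest_path is a hypothetical name.

```bash
# Return 0 when a manifest path would count as traversal: it is absolute,
# or one of its slash-separated segments is exactly "..".
# Names that merely contain dots (e.g. "a..b") are not flagged.
is_unsafe_manifest_path() {
  case "$1" in
    /*|..|../*|*/..|*/../*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_unsafe_manifest_path "../outside.md" succeeds, while a relative path without .. segments does not.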

When the Nubosloop is stuck

The Nubosloop terminates in the stuck state when loop.maxRounds (default 3, configurable range [1, 100]) is reached without convergence. This is a first-class state, surfaced explicitly in STATE.md and on /np:dashboard, never a silent downgrade.

Diagnose

Read the task's checkpoint:

bash
cat .nubos-pilot/checkpoints/M001-S001-T0001.json

The nubosloop block holds the loop history with the last Critic-Schwarm findings.

Three patterns of stuck-ness, with different recovery paths:

Pattern 1 — Same finding survives all rounds

The Executor or Build-Fixer can't address a particular finding. The cause is usually one of two:

  • The plan didn't contemplate the constraint the Critic surfaced. Run /np:undo-task <task-id>, then /np:discuss-phase <N> to lock the gap as a decision and re-run /np:plan-phase <N> to refresh the plan.
  • The Critic itself is wrong. Read the captured finding text; if it cites a non-existent file or a misread of the code, file an issue and re-run with /np:reset-slice <task-id>.

Pattern 2 — Findings cycle (fixing one introduces another)

The Executor is making local-optima moves. Override the loop:

bash
# Manual fix: edit the files yourself
vim <files>

# Mark the task as resumed (clears the stuck flag)
/np:resume-work

# Re-run from the next slice
/np:execute-phase <N>

Pattern 3 — Findings include information-missing or question-to-user

The loop pauses for human input.

  • information-missing → re-run /np:research-phase <N> with the missing topic added to scope.
  • question-to-user → answer in the prompt; the loop resumes automatically when integrated with Temporal-style signal-wait.

Reset path (last resort)

bash
/np:reset-slice M001-S001-T0001     # discards working tree, drops checkpoint
/np:discuss-phase 1                 # lock the discovered gap as a decision
/np:plan-phase 1                    # re-plan with the gap in CONTEXT
/np:execute-phase 1                 # re-run from the affected slice

Tuning to reduce stuck-ness

If stuck happens often:

  • loop.maxRounds: 5 — more rounds for convergence at higher token cost.
  • swarm.research.k: 5 — broader research at start reduces information-missing later.
  • swarm.research.threshold: 0.7 — more aggressive cache hits, fewer first-time researches.

See Swarm + Loop Config for the full knob list.
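
Before raising loop.maxRounds, a value can be checked against the documented range [1, 100]. This is a minimal sketch of that range check; valid_max_rounds is a hypothetical helper and does not read or write config.json.

```bash
# Return 0 when the argument is a whole number within the documented
# loop.maxRounds range [1, 100]; return 1 otherwise.
valid_max_rounds() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, negative, or not purely digits
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 100 ]
}
```

A value that fails this check would presumably also trip doctor check 10, which verifies that loop.maxRounds is in range.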

Doctor checks

npx nubos-pilot doctor runs 12 checks:

  1. Manifest integrity — every shipped file exists and matches its recorded SHA-256.
  2. Version mismatch — installed payload version differs from package.json version.
  3. Hooks presence — when the manifest declares hooks/ entries, the hooks/ directory exists on disk.
  4. Codex trapped-features — known broken [features] table in ~/.codex/config.toml.
  5. Askuser runtime — the prompt capability is callable.
  6. Codebase docs freshness — modules whose hashes drifted since /np:scan-codebase, plus _TBD module docs.
  7. Milestone layout — every roadmap.yaml entry has a matching milestones/M<NNN>/ directory; legacy phases/ is flagged.
  8. Nubosloop Critic agents — the spawnable np-critic plus the three axis modules (np-critic-style, np-critic-tests, np-critic-acceptance) are present in the payload.
  9. Nubosloop knowledge store — .nubos-pilot/knowledge/learnings.json is present and well-formed.
  10. Nubosloop config — swarm.knowledge_adapter is local, and loop.maxRounds is in range.
  11. Orphan tmp-file sweep — stale .tmp files (older than one hour) from a hard-killed process are swept.
  12. Output-schema drift — every M<NNN>-VERIFICATION.md and M<NNN>-VALIDATION.md lints clean against its schema (ADR-0017). Surfaces legacy files written before write-time enforcement existed. Fix path: re-run /np:verify-work <N> / /np:validate-phase <N> for the offending milestone.

--fix applies the auto-safe checks (mainly 4 and 11). Schema-drift issues (12) are not auto-fixable; they require re-spawning the producing workflow with the current schema in the spawn prompt.
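
To preview what an orphan tmp-file sweep (check 11) might touch, stale files can be listed without deleting anything. The sketch assumes the stale files live under .nubos-pilot/ and end in .tmp; the directory is an assumption, and list_stale_tmp is a hypothetical helper rather than the doctor's own logic.

```bash
# Print regular files named *.tmp under a directory (default: .nubos-pilot)
# whose last modification is more than 60 minutes old. Lists only; it does
# not delete anything. Relies on find's -mmin, available in GNU and BSD find.
list_stale_tmp() {
  find "${1:-.nubos-pilot}" -type f -name '*.tmp' -mmin +60 -print
}
```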

Schema violations after a spawn

When a workflow exits with output-schema-violation:

[np:verify-work] VERIFICATION.md violates output schema — re-spawn np-verifier with the violation list above as feedback. Do NOT hand-edit.

The agent's output doesn't conform to its artefact schema (a missing frontmatter key, a wrong enum, [object Object] in a heading, a missing **Reasoning:** field per Decision/Risk/Pattern, and so on). Do not hand-edit the file to satisfy the linter; that hides the real bug, which is either an agent-prompt drift or a schema mismatch. Instead:

  1. Read the violation list. It cites the exact path and the missing or wrong field.
  2. Re-spawn the producing agent (/np:verify-work, /np:validate-phase, /np:research-phase). The schema is injected into the new spawn prompt via output-lint prompt --schema <name>.
  3. If the same violation keeps appearing across re-spawns, the agent prompt itself may be out of sync with the schema; file an issue with the violation list and the spawn output.

See Output Schemas and ADR-0017 for the full enforcement model.

Researcher swarm doesn't converge

If /np:research-phase fires the disagreement hard-gate (Step 5.7):

Researcher-Schwarm konvergiert nicht. Wie weiter?
  1. Re-spawn mit schärferer task_query
  2. Fortfahren mit Reconciler-Pick
  3. Manuell entscheiden

In English the prompt reads "Researcher swarm does not converge. How to proceed?", with the options (1) re-spawn with a sharper task_query, (2) continue with the Reconciler pick, (3) decide manually. It fires when agreement_score < min_agreement_score (default 0.5) OR contested_count > max_contested (default 2). The most common cause is a task_query that the spawns interpreted along different axes. Pick option 1, refine the query (more specific scope, named alternatives to compare, explicit exclusion list), and re-run. Tune the defaults in config.json under swarm.research.{min_agreement_score, max_contested}, or per invocation via researcher-reconcile gate <N> --min-agreement-score 0.7. See Researcher-Schwarm and ADR-0018.
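
The gate condition stated above can be worked through with concrete numbers. The sketch below evaluates agreement_score < min_agreement_score OR contested_count > max_contested with the documented defaults 0.5 and 2; gate_fires is a hypothetical helper, not the researcher-reconcile implementation.

```bash
# Usage: gate_fires <agreement_score> <contested_count> [min_agreement] [max_contested]
# Returns 0 when the hard-gate would fire, 1 otherwise.
# awk handles the floating-point comparison that plain sh cannot do.
gate_fires() {
  awk -v s="$1" -v c="$2" -v min="${3:-0.5}" -v max="${4:-2}" \
    'BEGIN { exit !((s + 0 < min + 0) || (c + 0 > max + 0)) }'
}
```

With the defaults, a score of 0.4 fires the gate regardless of the contested count, and a score of 0.6 fires it only once more than 2 findings are contested. Both comparisons are strict, so a score of exactly 0.5 with exactly 2 contested findings passes.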

Phantom blockers in /np:close-project

If /np:close-project reports UNCOVERED counts or "milestone status is X" blockers that don't match the underlying VERIFICATION/VALIDATION files, those artefacts predate ADR-0017. Run np:doctor to surface the drift (output-schema-violation issues), then re-run /np:verify-work <N> / /np:validate-phase <N> for the offending milestones. Current spawns write schema-bound frontmatter the aggregator reads as canonical signal; only pre-rule files hit the legacy body-grep fallback that produced the phantoms.

When in doubt

  • /np:state — what does the tool think the current milestone/task is?
  • /np:dashboard — visual overview of every slice's task statuses.
  • node np-tools.cjs config-get <key> — what's actually in config.json?
  • node np-tools.cjs detect-runtime — which runtime is the host resolving to?
  • git log --oneline | grep '^[a-f0-9]\{7,\} task(' lists every per-task commit, newest first (git log prints in reverse chronological order; add --reverse for oldest first).
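
The commit filter from the last bullet can be wrapped so that it also works on saved log output. This sketch reuses the grep pattern from the list above unchanged; both function names are hypothetical.

```bash
# Keep only per-task commits from `git log --oneline` output read on stdin.
# The pattern is the one used in the list above: an abbreviated hash of
# at least 7 hex characters, a space, then a subject starting with "task(".
filter_task_commits() {
  grep '^[a-f0-9]\{7,\} task('
}

# Per-task commits of the current branch, oldest first.
task_commits_oldest_first() {
  git log --oneline --reverse | filter_task_commits
}
```

Because git log prints newest first by default, --reverse is what gives the chronological view.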

If everything else fails: the Error Codes reference lists every NubosPilotError code with its source module. Match on err.code, not on the message.