Troubleshooting
How-to. Symptom-driven entry point: find your symptom in the first column and follow the recovery in the last column. For what each error code means, see the Error Codes page; this page tells you what to do when one of them shows up.
Quickest first
| You see / hear | Try first |
|---|---|
| Anything weird right after install | npx nubos-pilot doctor --fix |
| The host CLI doesn't expose np:* commands | Restart the host CLI; the managed block in CLAUDE.md / AGENTS.md is read only once, at startup. |
| not-in-project thrown anywhere | You are running outside a directory that contains .nubos-pilot/. cd into the project root. |
Execution / commit problems
| Symptom | First diagnosis | Recovery |
|---|---|---|
| commit-all-paths-gitignored thrown | Every path in the task's files_modified is gitignored. | Either fix .gitignore or open T<NNNN>-PLAN.md and correct files_modified, then re-run /np:execute-phase. |
| commit-no-paths thrown | The task's files_modified is empty. | Edit the task PLAN frontmatter to declare paths, then re-run. |
| Executor hung on a task | Network / model timeout. | /np:resume-work to classify (resume / orphan / clean). If orphan, follow with /np:reset-slice to clear in-flight state. |
| Executor crashed mid-task, file system left dirty | Working tree has un-committed edits from the crash. | /np:reset-slice (no commit, restores files_modified from HEAD, drops checkpoint). |
| One committed task is wrong, but the rest of the slice is fine | Need to undo just that commit. | /np:undo-task M001-S001-T0003 — forward git revert, sets task status back to pending. |
| Whole milestone is wrong, want to start over | Need bulk revert. | /np:undo 1 — reverts every task commit of milestone 1, newest-first. Plans + CONTEXT.md remain. |
| Verify command keeps failing inside a task | Test or build is genuinely broken. | Inside /np:execute-phase the Nubosloop already routes a verify failure to np-build-fixer in round 2 and later. If the loop reaches stuck and the failure is in the task's scope, fix manually then /np:resume-work. If the failure reveals a planning gap, run /np:undo-task, edit the task plan, then re-run. |
| pending_todos counter in STATE.md looks wrong | Race or crash during add-todo. | Run /np:state to inspect, then edit STATE.md manually if needed. pending_todos is single-writer-locked, but a forced kill mid-write can in principle skew it. |
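When commit-all-paths-gitignored fires, git itself can tell you which rule ignores each path. A minimal sketch: the helper name is hypothetical, you paste the task's files_modified paths in by hand, and the only real work is done by git check-ignore, a standard git command.

```bash
# Hypothetical helper: report which of the given paths are gitignored and
# by which rule. Pass the paths listed in the task's files_modified.
# Prints "source:line:pattern<TAB>path" for every ignored path.
# Exit status: 0 if at least one path is ignored, 1 if none are.
explain_ignored_paths() {
  git check-ignore --verbose -- "$@"
}
```

For example, explain_ignored_paths src/app.ts dist/app.js. Paths that print nothing are not ignored; for the others, fix either the printed .gitignore rule or the files_modified entry.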
Plan-phase problems
| Symptom | First diagnosis | Recovery |
|---|---|---|
| Plan-checker keeps returning issues_found after 2 iterations | The loop is bounded at 2 iterations; the planner couldn't satisfy the checker. | Read M<NNN>-PLAN-REVIEW.md to see the findings. Most often the gap is in CONTEXT: re-run /np:discuss-phase <N> to add the missing decision, then /np:plan-phase <N> --repromote (or a full re-plan). |
| PHASE SPLIT RECOMMENDED from planner | The milestone is too big for one plan. | Use /np:propose-milestones to reshape the open pipeline, splitting M<NNN> into two smaller milestones. |
| np-researcher only writes a ## Research Coverage annotation | WebFetch and Context7 are both unavailable, so the offline-confirm protocol kicked in. | Either make Context7 available in the host CLI (see Installation § Research tools) or accept the local-only research and proceed. |
| ## CONTEXT CONFLICT from np-architect | The architect detected a decision that would violate M<NNN>-CONTEXT.md or RULES.md. | Re-open /np:discuss-phase <N> with Append-update to either adjust the conflicting decision or drop the architectural ambition. |
| Scaffolder silently dropped a task | <task> block in slice PLAN missing one of id, depends_on, wave, tier attributes. | Open the slice's S<NNN>-PLAN.md, fix the opening tag, re-run /np:plan-phase <N> --repromote. |
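To find the offending tag quickly, a rough lint over the slice plan helps. This is a hypothetical helper, not part of nubos-pilot; it assumes every opening <task ...> tag sits on a single line and uses name="value" attributes, which may not hold for every plan.

```bash
# Hypothetical lint for a slice plan (S<NNN>-PLAN.md): flag opening
# <task ...> tags that lack one of id, depends_on, wave, tier.
# Assumes each opening tag is on one line with name="value" attributes.
check_task_tags() {
  local plan=$1 line attr attr_re n=0 status=0
  local open_re='<task([[:space:]]|>)'   # matches "<task ..." but not "</task>"
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    [[ $line =~ $open_re ]] || continue
    for attr in id depends_on wave tier; do
      attr_re="[[:space:]]${attr}="
      if ! [[ $line =~ $attr_re ]]; then
        echo "$plan:$n: <task> missing $attr"
        status=1
      fi
    done
  done < "$plan"
  return "$status"
}
```

Pass it the path to the slice's S<NNN>-PLAN.md; each printed line names the plan line and the missing attribute, and a non-zero exit means at least one tag needs fixing.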
Verify / validate problems
| Symptom | First diagnosis | Recovery |
|---|---|---|
| /np:verify-work reports Fail for an SC | The shipped code doesn't satisfy the success_criterion. | Read M<NNN>-VERIFICATION.md for the cited evidence. Three options: fix it in the next milestone, add a new task to the current milestone, or accept it and update the CONTEXT to defer. |
| Verifier classifies several SCs as Needs-User-Confirm | Subjective or UX criteria need human judgment. | The Pass-2 askuser gate fires automatically. Answer Pass / Fail / Defer per SC. |
| /np:validate-phase reports many UNDER_SAMPLED | Tests exist but don't directly observe the requirement. | Read the ## Remediation Guidance section in M<NNN>-VALIDATION.md for suggested test names and assertion shapes. Add tests, then re-run validate. |
| Verify says Pass, validate says UNCOVERED | The code works but no test covers it: a silent regression risk. | Same as above: add direct test assertions. Don't merge until both agree. |
Worktree problems (workflow.worktree_isolation: true)
| Symptom | First diagnosis | Recovery |
|---|---|---|
| FF-merge fails at slice end with non-FF | The base branch advanced while the slice ran. | Decide whether to rebase the slice branch (manually, then re-run worktree-ff-merge) or accept the divergence and merge manually. ADR-0008 D-8.7 surfaces this rather than silently rewriting commits. |
| Stale worktree directory under .nubos-pilot/worktrees/ | A previous slice crashed without cleanup. | Run node np-tools.cjs worktree-list to enumerate, then worktree-remove --force <slice-id> to clean up. |
| .nubos-pilot/worktrees/ not gitignored | The pre-flight check fails before any worktree is created. | Add .nubos-pilot/worktrees/ to .gitignore and re-run. |
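The fix for the last row can be scripted so that running it twice is harmless. A minimal sketch, assuming you run it inside the project's git repository; the helper name is hypothetical, and it relies only on git rev-parse and git check-ignore.

```bash
# Hypothetical helper: make sure .nubos-pilot/worktrees/ is gitignored
# before running slices with workflow.worktree_isolation: true.
ensure_worktrees_ignored() {
  local root
  root=$(git rev-parse --show-toplevel) || return 1
  (
    cd "$root" || exit 1
    # Directory-only patterns ("dir/") match existing directories, so make
    # sure the (possibly empty) directory exists before asking git.
    mkdir -p .nubos-pilot/worktrees
    if git check-ignore -q .nubos-pilot/worktrees; then
      echo "already ignored"
      exit 0
    fi
    # If .gitignore's last line has no trailing newline, add one first.
    if [ -s .gitignore ] && [ -n "$(tail -c1 .gitignore)" ]; then
      echo >> .gitignore
    fi
    echo '.nubos-pilot/worktrees/' >> .gitignore
    echo "added .nubos-pilot/worktrees/ to .gitignore"
  )
}
```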
Install problems
| Symptom | First diagnosis | Recovery |
|---|---|---|
| target-is-symlink | The payload directory is a symlink, which the installer refuses. | Inspect .claude/nubos-pilot/; replace the symlink with a real directory or move the payload destination. |
| manifest-path-traversal | The manifest contains a .. or absolute path. | Likely a corrupted payload. Run npx nubos-pilot uninstall (preserves Project-State) and reinstall. |
| Codex [features] table broken in ~/.codex/config.toml | Known Codex-side bug. | npx nubos-pilot doctor --fix repairs it idempotently. |
| npx nubos-pilot update fights with hand-edits in payload files | The update's diff flagged the file as "user-modified" and backed it up to .bak.<n> before overwriting it. | Compare the .bak.<n> copy against the new file and merge by hand. |
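Before merging by hand, it helps to diff against the newest backup. A sketch under two assumptions this page does not confirm: the backups sit next to the payload file, and a higher <n> means a newer backup. latest_backup is a hypothetical helper.

```bash
# Hypothetical helper: print the highest-numbered "<file>.bak.<n>" backup
# next to a payload file. Returns 1 if no numeric backup exists.
latest_backup() {
  local file=$1 candidate n best="" best_n=-1
  for candidate in "$file".bak.*; do
    [ -e "$candidate" ] || continue          # glob matched nothing
    n=${candidate##*.bak.}
    [[ $n =~ ^[0-9]+$ ]] || continue         # skip non-numeric suffixes
    if (( 10#$n > best_n )); then            # numeric compare: 10 > 2
      best_n=$((10#$n))
      best=$candidate
    fi
  done
  [ -n "$best" ] || return 1
  printf '%s\n' "$best"
}
```

Then diff -u "$(latest_backup path/to/file)" path/to/file shows your hand-edits against the freshly installed version.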
When the Nubosloop is stuck
The Nubosloop terminates in the stuck state when loop.maxRounds (default 3, configurable range [1, 100]) is reached without convergence. This is a first-class state, surfaced explicitly in STATE.md and on /np:dashboard, never a silent downgrade.
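The termination rule behaves like a bounded retry loop. The sketch below restates it under stated assumptions; it is not the Nubosloop implementation, and run_bounded_loop plus the per-round command are hypothetical stand-ins for one Executor/Critic round that succeeds when the round converges.

```bash
# Sketch of the termination rule: run at most max_rounds rounds, stop as
# soon as one converges, otherwise end in an explicit "stuck" result.
# $1 = max_rounds (integer in [1, 100]); $2 = command for one round,
# called with the 1-based round number (hypothetical stand-in).
run_bounded_loop() {
  local max_rounds=$1 round_cmd=$2 round
  if ! [[ $max_rounds =~ ^[0-9]+$ ]]; then
    echo "loop.maxRounds must be an integer: $max_rounds" >&2
    return 2
  fi
  max_rounds=$((10#$max_rounds))             # normalise e.g. "08" -> 8
  if (( max_rounds < 1 || max_rounds > 100 )); then
    echo "loop.maxRounds out of range [1, 100]: $max_rounds" >&2
    return 2
  fi
  for (( round = 1; round <= max_rounds; round++ )); do
    if "$round_cmd" "$round"; then
      echo "converged after $round round(s)"
      return 0
    fi
  done
  echo "stuck after $max_rounds round(s)"    # explicit terminal state, not a silent success
  return 1
}
```

The design point is the last branch: running out of rounds yields a distinct stuck result and exit code instead of quietly passing.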
Diagnose
Read the task's checkpoint:
```bash
cat .nubos-pilot/checkpoints/M001-S001-T0001.json
```
The nubosloop block holds the loop history with the last Critic-Schwarm findings.
Three patterns of stuck-ness, with different recovery paths:
Pattern 1 — Same finding survives all rounds
The Executor or Build-Fixer can't address a particular finding. The cause is usually one of two:
- The plan didn't account for the constraint the Critic surfaced. Run /np:undo-task <task-id>, then /np:discuss-phase <N> to lock the gap as a decision, and re-run /np:plan-phase <N> to refresh the plan.
- The Critic itself is wrong. Read the captured finding text; if it cites a non-existent file or misreads the code, file an issue and re-run with /np:reset-slice <task-id>.
Pattern 2 — Findings cycle (fixing one introduces another)
The Executor is making locally optimal moves that never converge. Override the loop:
```bash
# Manual fix (shell): edit the files yourself
vim <files>
# In the host CLI: mark the task as resumed (clears the stuck flag)
/np:resume-work
# In the host CLI: re-run from the next slice
/np:execute-phase <N>
```
Pattern 3 — Findings include information-missing or question-to-user
The loop pauses for human input.
- information-missing → re-run /np:research-phase <N> with the missing topic added to its scope.
- question-to-user → answer in the prompt; the loop resumes automatically when integrated with Temporal-style signal-wait.
Reset path (last resort)
```bash
/np:reset-slice M001-S001-T0001   # restores files_modified from HEAD, drops checkpoint
/np:discuss-phase 1               # lock the discovered gap as a decision
/np:plan-phase 1                  # re-plan with the gap in CONTEXT
/np:execute-phase 1               # re-run from the affected slice
```
Tuning to reduce stuck-ness
If stuck happens often:
- loop.maxRounds: 5 gives more rounds to converge, at a higher token cost.
- swarm.research.k: 5 broadens the research up front, which reduces information-missing findings later.
- swarm.research.threshold: 0.7 makes cache hits more aggressive, so fewer topics are researched from scratch.
See Swarm + Loop Config for the full knob list.
Doctor checks
npx nubos-pilot doctor runs 12 checks:
1. Manifest integrity: every shipped file exists and matches its recorded SHA-256.
2. Version mismatch: the installed payload version differs from the package.json version.
3. Hooks presence: when the manifest declares hooks/ entries, the hooks/ directory exists on disk.
4. Codex trapped-features: the known broken [features] table in ~/.codex/config.toml.
5. Askuser runtime: the prompt capability is callable.
6. Codebase docs freshness: modules whose hashes drifted since /np:scan-codebase, plus _TBD module docs.
7. Milestone layout: every roadmap.yaml entry has a matching milestones/M<NNN>/ directory; a legacy phases/ directory is flagged.
8. Nubosloop Critic agents: the spawnable np-critic plus the three axis modules (np-critic-style, np-critic-tests, np-critic-acceptance) are present in the payload.
9. Nubosloop knowledge store: .nubos-pilot/knowledge/learnings.json is present and well-formed.
10. Nubosloop config: swarm.knowledge_adapter is local, and loop.maxRounds is in range [1, 100].
11. Orphan tmp-file sweep: stale .tmp files (older than one hour) left by a hard-killed process are swept.
12. Output-schema drift: every M<NNN>-VERIFICATION.md and M<NNN>-VALIDATION.md lints clean against its schema (ADR-0017). This surfaces legacy files written before write-time enforcement existed. Fix path: re-run /np:verify-work <N> or /np:validate-phase <N> for the offending milestone.
--fix applies the auto-safe repairs (mainly checks 4 and 11). Schema-drift issues (check 12) are not auto-fixable; they require re-spawning the producing workflow with the current schema in the spawn prompt.
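To preview what the orphan tmp-file sweep would remove, you can approximate it with find. This is a dry-run sketch that deletes nothing, assuming orphans end in .tmp and live under .nubos-pilot/ (the exact directories doctor scans are not listed on this page); list_stale_tmp is a hypothetical name, and -mmin is supported by GNU and BSD find but is not POSIX.

```bash
# Dry run: list *.tmp files under a directory whose last modification is
# more than 60 minutes old ("older than one hour"). Deletes nothing.
list_stale_tmp() {
  local dir=${1:-.nubos-pilot}
  find "$dir" -type f -name '*.tmp' -mmin +60 -print
}
```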
Schema violations after a spawn
When a workflow exits with output-schema-violation:
```
[np:verify-work] VERIFICATION.md violates output schema — re-spawn np-verifier with the violation list above as feedback. Do NOT hand-edit.
```
This means the agent's output doesn't conform to its artefact schema (a missing frontmatter key, a wrong enum value, [object Object] in a heading, a missing **Reasoning:** field on a Decision/Risk/Pattern entry, and so on). Do not hand-edit the file to satisfy the linter; that hides the real bug, which is either agent-prompt drift or a schema mismatch. Instead:
- Read the violation list. It cites the exact path and the missing or wrong field.
- Re-run the producing workflow (/np:verify-work, /np:validate-phase, /np:research-phase), which re-spawns the agent. The schema is injected into the new spawn prompt via output-lint prompt --schema <name>.
- If the same violation keeps appearing across re-spawns, the agent prompt itself may be out of sync with the schema; file an issue with the violation list and the spawn output.
See Output Schemas and ADR-0017 for the full enforcement model.
Researcher swarm doesn't converge
If /np:research-phase fires the disagreement hard-gate (Step 5.7):
```
Researcher-Schwarm konvergiert nicht. Wie weiter?
1. Re-spawn mit schärferer task_query
2. Fortfahren mit Reconciler-Pick
3. Manuell entscheiden
```
In English: "The researcher swarm is not converging. How do you want to proceed? 1. Re-spawn with a sharper task_query. 2. Continue with the reconciler's pick. 3. Decide manually."
The gate fires when agreement_score < min_agreement_score (default 0.5) OR contested_count > max_contested (default 2). The most common cause is a task_query that the spawns interpreted along different axes. Pick option 1, refine the query (a more specific scope, named alternatives to compare, an explicit exclusion list), and re-run. Tune the defaults in config.json under swarm.research.{min_agreement_score, max_contested}, or per invocation via researcher-reconcile gate <N> --min-agreement-score 0.7. See Researcher-Schwarm and ADR-0018.
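The gate condition can be restated as a small predicate, which is handy when deciding how far to move min_agreement_score. A sketch only: research_gate is a hypothetical name, not the researcher-reconcile code, and awk does the floating-point comparison because bash arithmetic is integer-only.

```bash
# Restates the disagreement hard-gate: it fires when
#   agreement_score < min_agreement_score  OR  contested_count > max_contested.
# Defaults mirror the documented ones (0.5 and 2).
# Prints "hard-gate" (exit 1) or "converged" (exit 0).
research_gate() {
  local score=$1 contested=$2 min_agreement=${3:-0.5} max_contested=${4:-2}
  # awk exits 0 when the gate condition holds, 1 otherwise.
  if awk -v s="$score" -v m="$min_agreement" -v c="$contested" -v x="$max_contested" \
       'BEGIN { exit !((s + 0 < m + 0) || (c + 0 > x + 0)) }'; then
    echo "hard-gate"
    return 1
  fi
  echo "converged"
}
```

Note the strict comparisons: a score exactly at min_agreement_score, or exactly max_contested contested items, does not fire the gate.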
Phantom blockers in /np:close-project
If /np:close-project reports UNCOVERED counts or "milestone status is X" blockers that don't match the underlying VERIFICATION/VALIDATION files, those artefacts predate ADR-0017. Run np:doctor to surface the drift (output-schema-violation issues), then re-run /np:verify-work <N> / /np:validate-phase <N> for the offending milestones. Current spawns write schema-bound frontmatter that the aggregator reads as the canonical signal; only files written before the rule fall back to the legacy body-grep, which is what produced the phantom blockers.
When in doubt
- /np:state shows which milestone/task the tool thinks is current.
- /np:dashboard gives a visual overview of every slice's task statuses.
- node np-tools.cjs config-get <key> shows what is actually in config.json.
- node np-tools.cjs detect-runtime shows which runtime the host is resolving to.
- git log --oneline | grep '^[a-f0-9]\{7,\} task(' lists every per-task commit, newest first (add --reverse after --oneline for chronological order).
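If you only need the commits of one milestone, for example to see roughly what /np:undo <N> would revert, the same git log output can be filtered. A sketch assuming task commit subjects start with task(<task-id> and task ids follow the M001-S001-T0003 shape used on this page; list_task_commits is hypothetical.

```bash
# List per-task commits, newest first (git log's default order),
# optionally only those of one milestone number (1 -> "task(M001-").
list_task_commits() {
  local milestone=${1:-} prefix='task(' line
  if [ -n "$milestone" ]; then
    prefix="task($(printf 'M%03d' "$((10#$milestone))"))-"
    prefix=${prefix%)-}-                     # drop the stray ")" -> "task(M001-"
  fi
  git log --format='%h %s' | while IFS= read -r line; do
    case ${line#* } in                       # match on the subject only
      "$prefix"*) printf '%s\n' "$line" ;;
    esac
  done
}
```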
If everything else fails: the Error Codes reference lists every NubosPilotError code with its source module. Match on err.code, not on the message.
