Appearance
ADR-0021: Model-Agnostic Agent Dispatch — Provider Abstraction + Own Tool-Use Loop
- Status: Accepted. The decision (provider abstraction, a nubos-pilot-owned one-shot tool-use loop, and guard parity) is implemented and proven end-to-end through its most demanding consumer: the audited, file-writing, Bash-running off-host executor wired into
execute-phase. The read-only off-host roles (critic, researcher) reuse the same seam; they are downstream work, not a separate decision (see Slice 4c). - Date: 2026-06-16
- Supersedes: None
- Related: ADR-0001 (No-Daemon), ADR-0002 (Zero Runtime Deps), ADR-0010 (Execute-side Trust Layer), ADR-0020 (In-Session Security Review)
Context and Problem Statement
Today every agent nubos-pilot spawns runs on a Claude model. The resolution chain is clean but Claude-only:
agent frontmatter `tier` (haiku|sonnet|opus)
→ config.model_profile (frontier|quality|balanced|budget|inherit)
→ resolveAlias(tier, profile) [lib/model-profiles.cjs]
→ MODEL_ALIAS_MAP[alias] [Claude model id]
→ host runtime spawns the subagent [bin/np-tools resolve-model]agents/*.md may declare only tier; model and model_profile are forbidden frontmatter fields. The single seam where a tier becomes a concrete model is resolveFromConfig() in bin/np-tools/resolve-model.cjs.
Users want a model-agnostic workspace: choose, per agent, which model runs it — including non-Anthropic hosted models (OpenAI, xAI/Grok) and local models (Ollama, vLLM, LM Studio). Concretely: "planner on Claude opus, critic on OpenAI gpt-4o, executor on local Ollama qwen."
There are two structurally different ways to deliver this, and the project already contains the seam for one of them:
- Host-delegated. nubos-pilot is primarily a generator/orchestrator that emits artifacts for a host coding agent. The
lib/runtime/adapters already model this: each host (claude,codex,gemini,opencode,qwen, …) carries acapabilities.modelResolutionfield (claude='profile',qwen='inherit'). Several hosts (OpenCode, Qwen Code) are already multi-provider and run local models natively. For those, model-agnosticism is the host's job; nubos-pilot need only emit the right per-agent model hint in that host's format. - Own dispatch layer. When the active host cannot route a specific agent to a non-native model — most importantly Claude Code, whose Agent tool only accepts Claude tiers — nubos-pilot must run the agent loop itself against an OpenAI-compatible endpoint.
The requirement explicitly includes routing an individual agent (e.g. the executor) to a local model while the rest of the run stays on the default host. Host-delegation cannot cover that case, so the own dispatch layer is required. The hard part: an executor is not a single prompt, it is a tool-use loop (Read, Edit, Bash, Grep, Glob). Claude Code supplies that loop today. Off-host, nubos-pilot has to own it.
Decision Drivers
- Per-agent granularity is the point. "Executor on Ollama, planner on Claude, in the same run" cannot be expressed by switching the host globally; it must be a per-agent routing decision nubos-pilot resolves.
- The default must not regress. The overwhelming common case is Claude (or the setup-chosen host). It must keep working byte-for-byte; model-agnosticism is opt-in via config.
- [ADR-0001] No daemon. Whatever runs a local model must be a one-shot invocation per spawn — no background server owned by nubos-pilot. (The model server itself, e.g.
ollama serve, is the user's, not ours.) - [ADR-0002] Zero runtime dependencies. No
openai/@anthropic-aiSDK. The provider client must be built on Node's built-infetch. - Guard parity is non-negotiable. The Nubosloop trust layer (ADR-0010), output-schema enforcement (ADR-0017), working-tree safety, commit-policy, and in-session security review (ADR-0020) all wrap Claude subagents today. An off-host executor that bypassed them would be a hole in every one of those guarantees.
- Local tool-use is unreliable. Small local models are materially weaker at multi-step function-calling than frontier Claude. The design must fail loudly and early rather than silently produce garbage.
Considered Options
- Host-delegation only (recommend OpenCode/Qwen as the host for the model-agnostic scenario; emit per-agent model hints). Rejected as the sole answer. It is the cheaper, lower-risk path and remains the right tool when the user is willing to switch host — but it cannot route a single agent off-model while keeping the default host, which is the stated requirement. Kept as a complementary path (see Consequences).
- Session-wide proxy (
ANTHROPIC_BASE_URL→ LiteLLM translating to Ollama). Rejected. It is session-global, not per-agent, and routes all agents — including ones that must stay on Claude — through a translation layer, with no per-agent control and degraded tool-use fidelity across the board. - Forbid off-host execution for tool-using agents; allow only single-shot agents (critics, researchers). Rejected for this ADR (though it is exactly Slice 2's intermediate state). The requirement explicitly includes the executor.
- Provider abstraction + own one-shot tool-use loop, engaged only when config routes an agent to a non-native provider. Chosen.
Decision Outcome
Chosen: a provider dimension on top of the existing tier/profile resolution, plus a nubos-pilot-owned, one-shot, zero-dependency tool-use loop that engages only for agents routed to a non-native provider. The default path (native host, Claude tiers) is unchanged.
Configuration surface
Two new top-level config keys. Both optional; absent ⇒ today's behaviour exactly.
jsonc
{
"model_providers": {
"default": "claude", // = setup-chosen host; provider used when no routing entry matches
"claude": { "kind": "native" }, // delegate to the host runtime (Claude Code Agent tool, etc.)
"openai": {
"kind": "openai-compat",
"base_url": "https://api.openai.com/v1",
"api_key_env": "OPENAI_API_KEY",
"models": { "haiku": "gpt-4o-mini", "sonnet": "gpt-4o", "opus": "gpt-4.1" }
},
"grok": { "kind": "openai-compat", "base_url": "https://api.x.ai/v1", "api_key_env": "XAI_API_KEY",
"models": { "sonnet": "grok-2", "opus": "grok-2" } },
"ollama": { "kind": "openai-compat", "base_url": "http://localhost:11434/v1",
"models": { "haiku": "qwen2.5-coder:7b", "sonnet": "qwen2.5-coder:32b", "opus": "qwen2.5-coder:32b" } }
},
"agent_routing": {
"np-planner": { "provider": "claude", "model": "claude-opus-4-7" },
"np-critic*": { "provider": "openai", "model": "gpt-4o" },
"np-executor": { "provider": "ollama", "model": "qwen2.5-coder:32b" }
}
}kind: "native"⇒ the host runtime executes the agent (status quo; forclaudethat is Claude Code's Agent tool).tierresolution is untouched.kind: "openai-compat"⇒ a single fetch-based client covers OpenAI, xAI/Grok, Ollama, vLLM, LM Studio, LiteLLM — they all speakPOST /v1/chat/completionswith tool-calling.base_urlis the only locator;api_key_envnames the env var (omitted for keyless local servers).
Resolution precedence (extends resolveFromConfig)
For an agent, resolve-model now returns { tier, profile, provider, model, kind }:
agent_routing[<exact-agent>]— exact name wins.agent_routing[<glob>]— e.g.np-critic*matches all four critic agents.model_providers.defaultprovider, with the agent'stiermapped through that provider'smodelstable.
Within a matched routing entry: an explicit model pins the model; otherwise the provider's models[tier] is used (so the agent's intrinsic tier still drives size — 7b vs 32b — without pinning). tier remains the agent's intrinsic difficulty property (ADR-respecting); agent_routing is the orthogonal where-it-runs layer.
Loud, not silent (per the hardening doctrine). A routing entry referencing an undefined provider is a hard config error at load time — never a silent fallback to Claude:
config error: agent_routing["np-executor"] references provider "ollama",
but model_providers.ollama is not defined.Dispatch seam
At spawn time, after resolution:
kind === "native"→ emit the host spawn exactly as today (model id / alias / inherit into the host's Agent tool).kind === "openai-compat"→ invokelib/runtime/agent-loop.cjsas a one-shot subprocess: it builds messages from the agent's system prompt (agents/<name>.md) + the task, advertises the agent's declaredtoolsas OpenAI function schemas, POSTs tobase_url, executes returned tool-calls against the workspace, appends results, and loops until a final answer or an iteration/budget cap. Output is consumed by the workflow exactly as a Claude subagent's report is.
Provider abstraction, tool-runtime, guard parity (planned layout)
lib/runtime/providers/openai-compat.cjs— zero-depfetchclient:chat({ messages, tools, model, signal }) → { content, tool_calls }. No SDK. No daemon — process exits when the loop returns.lib/runtime/agent-loop.cjs— the one-shot harness that drives the loop.lib/runtime/tools/— workspace implementations of Read/Grep/Glob (Slice 2) and Write/Edit/Bash (Slice 3), all routed through the existinglib/safe-path.cjs.- Guard parity. The off-host executor path runs through the same
commit-policy, working-tree guard (no autonomous stash/checkout/restore),output-lint, Nubosloop Rule-9 audit, and security-review (ADR-0020) as the Claude path. These guards move behind a shared seam so both paths call them; they are not reimplemented per provider. - Preflight (before any off-host spawn): daemon reachable (
GET /api/tagsfor Ollama, or a cheap models call), requested model present, tool-calling supported. A failed preflight aborts with an actionable message (run: ollama pull <model>), never mid-task.
Compliance with standing invariants
- ADR-0001 (No-Daemon): the loop is one-shot per spawn. nubos-pilot never owns a long-lived process; the model server is external and user-managed.
- ADR-0002 (Zero Runtime Deps): the client uses global
fetchonly. No provider SDK is added.
Consequences
- Good, because per-agent model routing — including local models — works under any host, including Claude Code, without switching the host or routing unrelated agents off-model.
- Good, because the default (native host, Claude tiers) is untouched: absent the new config keys, resolution and spawning are byte-for-byte as before.
- Good, because the host-delegation path (OpenCode/Qwen as a multi-provider host) remains available and is the cheaper answer when the user is willing to switch host — this ADR does not remove it; it adds the per-agent capability the host axis cannot provide.
- Good, because guard parity is explicit: the off-host executor is held to the same trust/security/commit guarantees as the Claude path, by design rather than by accident.
- Bad, because nubos-pilot now owns an agent/tool-use harness — roughly doubling the runtime surface and adding a long-term maintenance burden (tool-call dialect drift across providers, schema quirks).
- Bad, because local-model tool-use reliability is materially lower than frontier Claude; executor quality degrades and the Nubosloop critic-swarm only partially compensates. Mitigated by loud preflight, capped iterations, and the recommendation to keep high-risk agents (security-reviewer, planner) on Claude by default.
- Bad, because a self-driven Bash/Edit loop steered by a local model is a new footgun/attack surface absent from the pure host path. Mitigated by mandatory
safe-pathrouting and full security-review parity, but it raises the security bar for the off-host path specifically. - Bad, because
model_providersadds a config schema that must be validated and documented; a misconfigured provider is a new failure class (caught loudly at load time per the doctrine). - Bad, because the Slice 1–3 runtime is a tool-execution primitive with only per-tool guards — orchestration guard parity (checkpoint, metrics, commit-policy, output-schema validation, in-session review-as-gate, Rule-9 audit, worktree) does not exist until the Slice 4 dispatch seam wires it. Until then off-host routing is deliberately non-dispatchable (the
resolve-modelCLI refuses it loudly) so the gap cannot silently misroute. Off-host Bash specifically cannot be filesystem-confined by a string denylist and is OFF by default pending worktree isolation; a critic-swarm review confirmed the denylist is bypassable and is a UX guardrail, not a boundary.
More Information
- Resolution seam:
bin/np-tools/resolve-model.cjs(resolveFromConfig) — extended to return{ provider, model, kind };lib/model-profiles.cjsgains the provider-aware mapping;lib/config-schema.cjs+lib/config-defaults.cjsgainmodel_providers+agent_routing(defaultmodel_providers.default = "claude", no routing entries ⇒ status quo). - Own loop:
lib/runtime/agent-loop.cjs,lib/runtime/providers/openai-compat.cjs,lib/runtime/tools/. - Delivery slices:
- (done) Provider routing seam:
lib/model-providers.cjs, config schema and defaults, andresolve-modelreturning{provider, kind, model}. An undefined provider is a loud error, and theresolve-modelCLI refusesopenai-compat(off-host-dispatch-not-wired) so a routed model id never leaks into aclaudespawn. Default-preserving, fully green. - (done)
openai-compatclient + read-only agent-loop (Read/Grep/Glob) + preflight —lib/runtime/{providers/openai-compat,preflight,agent-loop,tools}.cjs. Zero-dep, one-shot. - (done — tool-execution primitive, NOT orchestration parity) Write/Edit/Bash tool-runtime with per-tool guards:
safe-pathconfinement + symlink-leaf rejection on Write/Edit, advisory security scan (scanContent) surfaced into the result, Bash viaspawnSyncwith a denylist + timeout + output cap. Bash is opt-in (allowBash:true), OFF by default, because a denylist is not a sandbox and off-host Bash is not filesystem-confined. The mutating runtime exists; it is a primitive, not yet wired to any spawn path. 4a. (done) Callable dispatch seam + refuse-loud gates.lib/runtime/dispatch.cjs(dispatchOffHost) + CLInp-tools spawn-offhost: resolve provider (resolveFromConfignow also returnsbaseUrl/apiKeyEnv) → requireopenai-compat→assertPreflight(hard gate) → load agent prompt + tools (loadAgentSource) →runAgentLoop→ metricsbuildRecord/appendRecord(runtime parity; tokens still claude-only). A critic-swarm review hardened the callable surface so it never outruns the deferred guards: Rule-9-audited agents (np-executor/np-researcher/np-build-fixer) are refused off-host (offhost-audited-agent-unsupported);--allow-bashis refused (offhost-bash-requires-sandbox) pending worktree isolation; the native-pathresolve-modelrefusal was reworded to point atspawn-offhost(off-host-not-on-native-path). So off-host runs work today for non-audited, read/write (no-Bash) agents. 4b-i. (done) Rule-9 at the dispatch level.lib/runtime/tools/index.cjsgains a nativeknowledge-searchtool (the off-host equivalent of the Claude path's Bashnp-tools knowledge-search) that records the search-evidence ledger;toolsetFor({withSearch})injects it for audited agents (off-host Bash is gated, so this is their only path to comply).dispatchOffHostlifts the audited-agent refusal when a canonical--task-idis present (ids.cjs::TASK_ID_RE), injects the search tool, and runsauditToolUseafter the loop — the verdict ridesenvelope.rule9and is persisted to the checkpoint for the orchestrator to route, matching the native non-blocking audit. A critic-swarm confirmed the two-channel (tool-log + ledger) audit is forge-resistant and at parity with the Claude path. 4b-ii. Worktree-gated Bash (done) + execute-phase auto-wiring (remaining). Off-host Bash is now permitted only inside a real slice worktree —dispatchOffHostverifies the cwd is alib/worktree.cjsslice worktree and otherwise refuses (offhost-bash-requires-sandbox); the blanket CLI refusal is replaced by this confinement check, so model-driven shell can never touch the live tree. Executor auto-wiring (done):execute-phasebrancheskind === 'openai-compat'(detected viaresolve-model --json) intospawn-offhost --cwd <slice-worktree> --allow-bash --no-auditinstead of the host Agent spawn. Off-host requiresworkflow.worktree_isolation=true— it reuses the per-wave worktree (so model-driven edits are confined and the slice-end ff-merge lands the work; force-creating a worktree out of band would strand commits and break the orchestrator's cwd convention) and refuses loud otherwise.spawn-offhostruns--no-audit; the orchestrator runs the single canonical Step-4loop-audit-tool-usewith the returned tool-log, so the Layer-C stamp is produced exactly once (no double-routing). Commit-policy + in-session security review are already agent-agnostic (they run on the diff orchestrator-side) and apply unchanged.
- (done) Provider routing seam:
Resolved (not gaps): off-host metrics rows record runtime/duration/status but null tokens_in/out — the same rule [D-09] applies to every non-Claude runtime (codex, gemini, …), so off-host follows it by design, not omission. model_providers.default is the implicit claude-native default already (the setup-chosen environment is the host runtime, which is exactly that default).
Slice 4c (done — read-only roles, reuse the seam): off-host branches for np-critic/np-researcher in execute-phase, plus a generic output_lint hook on the dispatch envelope (spawn-offhost --output-schema <name>). (a) The critic runs --read-only; because the off-host Write tool confines to cwd it cannot write $TMPDIR/.../critic-*.json, so it emits the { "critic":"critic", "findings":[…] } object as its final message — the orchestrator writes that to $CRITIC_REPORT_PATH only after asserting critic ∈ {critic,style,tests,acceptance} (any other value is silently dropped by mergeCriticOutputs — project_np_critic_field_schema_bug), failing loud otherwise. (b) The researcher swarm runs $SWARM_K spawn-offhost --read-only --no-audit calls, each emitting the per-spawn consensus JSON {decisions,risks,patterns,open_questions,sources} that researcher-merge consumes (NOT the researcher-output markdown artifact — a different contract); a spawn whose output is not that JSON is substituted with an empty {} so the merge degrades gracefully instead of aborting the wave. The orchestrator stamps one loop-audit-tool-use per spawn (satisfying the post-researcher SKIP-GUARD). (c) np-researcher is Rule-9-audited, satisfied by the same injected knowledge-search tool the executor uses (--task-id required; the search tool is injected after the read-only filter, so read-only does not strip it). 4d. (done — rollout to non-execute workflows; same seam, no new decision). The dispatch seam is generic, so wiring the remaining workflows is a per-agent contract mapping, not new architecture. The dividing line is what the agent writes, and it collapses to three cases: live-code editors (executor/build-fixer — worktree + --allow-bash, execute-phase only); artefact writers that write M<NNN>-*.md under .nubos-pilot/ inside the repo cwd (planner, plan-checker, nyquist-auditor, architect, researcher-reconciler — default cwd, Write enabled, NO worktree, NO Bash, no emit-and-persist contract, because the target is already inside cwd — this is the key simplification over execute-phase, where the critic/researcher had to emit-as-message only because $TMPDIR is outside cwd); and read-only emitters (critic, verifier — --read-only, emit the verdict as the final message, orchestrator persists). Full pipeline coverage, mechanically enforced. Every workflow agent-spawn now has an off-host branch: execute-phase (executor, build-fixer, researcher, critic), plan-phase (planner, plan-checker), discuss-phase (sc-extractor), research-phase (researcher, reconciler), architect-phase (architect, researcher, critic), validate-phase (nyquist-auditor), verify-work (verifier), scan-codebase (codebase-documenter). bin/np-tools/resolve-model --kind gives each site a one-line native|openai-compat detector. scripts/check-offhost-coverage.cjs (a tests/ test) discovers every spawn-indicator and fails the suite if one lacks an off-host branch — drift-proof, so a new agent-spawn cannot ship native-only by omission. The audited np-researcher runs off-host with a synthetic canonical task-id ${MILESTONE_ID}-S000-T0000 (milestone-level "no slice/task" convention) so its Rule-9 ledger + audit apply — the earlier offhost-audited-agent-unsupported refusal is resolved, not relaxed (the gate still requires a TASK_ID_RE-valid id). np-security-reviewer / np-learnings-extractor spawn via spawn-headless (a claude -p subprocess in lib code), not a workflow orchestrator spawn. Their off-host routing is handled centrally inside spawn-headless: run() is now async and, when the agent routes to an openai-compat provider, calls dispatchOffHost instead of claude -p, writing the identical {result} envelope so review.cjs/extract.cjs parse it unchanged (the CLI dispatcher already awaits promises; both in-process callers — runReview/runExtract — were made async). One bounded constraint remains, surfaced loudly: the off-host toolset has no WebFetch/context7, so off-host np-researcher covers offline (knowledge-search) research only — online mode routes native. Net: every agent in every step can run on any openai-compat provider, native stays the default, and the coverage check (tests/check-offhost-coverage.test.cjs) fails the suite on any new native-only spawn site.
