ADR-0021: Model-Agnostic Agent Dispatch — Provider Abstraction + Own Tool-Use Loop

Status: Accepted. The decision (provider abstraction, a nubos-pilot-owned one-shot tool-use loop, and guard parity) is implemented and proven end-to-end through its most demanding consumer: the audited, file-writing, Bash-running off-host executor wired into execute-phase. The read-only off-host roles (critic, researcher) reuse the same seam; they are downstream work, not a separate decision (see Slice 4c).
Date: 2026-06-16
Supersedes: None
Related: ADR-0001 (No-Daemon), ADR-0002 (Zero Runtime Deps), ADR-0010 (Execute-side Trust Layer), ADR-0020 (In-Session Security Review)

Context and Problem Statement

Today every agent nubos-pilot spawns runs on a Claude model. The resolution chain is clean but Claude-only:

agent frontmatter `tier` (haiku|sonnet|opus)
  → config.model_profile (frontier|quality|balanced|budget|inherit)
  → resolveAlias(tier, profile)                       [lib/model-profiles.cjs]
  → MODEL_ALIAS_MAP[alias]                             [Claude model id]
  → host runtime spawns the subagent                  [bin/np-tools resolve-model]

agents/*.md may declare only tier; model and model_profile are forbidden frontmatter fields. The single seam where a tier becomes a concrete model is resolveFromConfig() in bin/np-tools/resolve-model.cjs.

Users want a model-agnostic workspace: choose, per agent, which model runs it — including non-Anthropic hosted models (OpenAI, xAI/Grok) and local models (Ollama, vLLM, LM Studio). Concretely: "planner on Claude opus, critic on OpenAI gpt-4o, executor on local Ollama qwen."

There are two structurally different ways to deliver this, and the project already contains the seam for one of them:

Host-delegated. nubos-pilot is primarily a generator/orchestrator that emits artifacts for a host coding agent. The lib/runtime/ adapters already model this: each host (claude, codex, gemini, opencode, qwen, …) carries a capabilities.modelResolution field (claude='profile', qwen='inherit'). Several hosts (OpenCode, Qwen Code) are already multi-provider and run local models natively. For those, model-agnosticism is the host's job; nubos-pilot need only emit the right per-agent model hint in that host's format.
Own dispatch layer. When the active host cannot route a specific agent to a non-native model — most importantly Claude Code, whose Agent tool only accepts Claude tiers — nubos-pilot must run the agent loop itself against an OpenAI-compatible endpoint.

The requirement explicitly includes routing an individual agent (e.g. the executor) to a local model while the rest of the run stays on the default host. Host-delegation cannot cover that case, so the own dispatch layer is required. The hard part: an executor is not a single prompt, it is a tool-use loop (Read, Edit, Bash, Grep, Glob). Claude Code supplies that loop today. Off-host, nubos-pilot has to own it.

Decision Drivers

Per-agent granularity is the point. "Executor on Ollama, planner on Claude, in the same run" cannot be expressed by switching the host globally; it must be a per-agent routing decision nubos-pilot resolves.
The default must not regress. The overwhelming common case is Claude (or the setup-chosen host). It must keep working byte-for-byte; model-agnosticism is opt-in via config.
[ADR-0001] No daemon. Whatever runs a local model must be a one-shot invocation per spawn — no background server owned by nubos-pilot. (The model server itself, e.g. ollama serve, is the user's, not ours.)
[ADR-0002] Zero runtime dependencies. No openai/@anthropic-ai SDK. The provider client must be built on Node's built-in fetch.
Guard parity is non-negotiable. The Nubosloop trust layer (ADR-0010), output-schema enforcement (ADR-0017), working-tree safety, commit-policy, and in-session security review (ADR-0020) all wrap Claude subagents today. An off-host executor that bypassed them would be a hole in every one of those guarantees.
Local tool-use is unreliable. Small local models are materially weaker at multi-step function-calling than frontier Claude. The design must fail loudly and early rather than silently produce garbage.

Considered Options

Host-delegation only (recommend OpenCode/Qwen as the host for the model-agnostic scenario; emit per-agent model hints). Rejected as the sole answer. It is the cheaper, lower-risk path and remains the right tool when the user is willing to switch host — but it cannot route a single agent off-model while keeping the default host, which is the stated requirement. Kept as a complementary path (see Consequences).
Session-wide proxy (ANTHROPIC_BASE_URL → LiteLLM translating to Ollama). Rejected. It is session-global, not per-agent, and routes all agents — including ones that must stay on Claude — through a translation layer, with no per-agent control and degraded tool-use fidelity across the board.
Forbid off-host execution for tool-using agents; allow only single-shot agents (critics, researchers). Rejected for this ADR (though it is exactly Slice 2's intermediate state). The requirement explicitly includes the executor.
Provider abstraction + own one-shot tool-use loop, engaged only when config routes an agent to a non-native provider. Chosen.

Decision Outcome

Chosen: a provider dimension on top of the existing tier/profile resolution, plus a nubos-pilot-owned, one-shot, zero-dependency tool-use loop that engages only for agents routed to a non-native provider. The default path (native host, Claude tiers) is unchanged.

Configuration surface

Two new top-level config keys. Both optional; absent ⇒ today's behaviour exactly.

jsonc

{
  "model_providers": {
    "default": "claude",                       // = setup-chosen host; provider used when no routing entry matches
    "claude": { "kind": "native" },            // delegate to the host runtime (Claude Code Agent tool, etc.)
    "openai": {
      "kind": "openai-compat",
      "base_url": "https://api.openai.com/v1",
      "api_key_env": "OPENAI_API_KEY",
      "models": { "haiku": "gpt-4o-mini", "sonnet": "gpt-4o", "opus": "gpt-4.1" }
    },
    "grok":   { "kind": "openai-compat", "base_url": "https://api.x.ai/v1", "api_key_env": "XAI_API_KEY",
                "models": { "sonnet": "grok-2", "opus": "grok-2" } },
    "ollama": { "kind": "openai-compat", "base_url": "http://localhost:11434/v1",
                "models": { "haiku": "qwen2.5-coder:7b", "sonnet": "qwen2.5-coder:32b", "opus": "qwen2.5-coder:32b" } }
  },

  "agent_routing": {
    "np-planner":  { "provider": "claude", "model": "claude-opus-4-7" },
    "np-critic*":  { "provider": "openai", "model": "gpt-4o" },
    "np-executor": { "provider": "ollama", "model": "qwen2.5-coder:32b" }
  }
}

kind: "native" ⇒ the host runtime executes the agent (status quo; for claude that is Claude Code's Agent tool). tier resolution is untouched.
kind: "openai-compat" ⇒ a single fetch-based client covers OpenAI, xAI/Grok, Ollama, vLLM, LM Studio, LiteLLM — they all speak POST /v1/chat/completions with tool-calling. base_url is the only locator; api_key_env names the env var (omitted for keyless local servers).

Resolution precedence (extends `resolveFromConfig`)

For an agent, resolve-model now returns { tier, profile, provider, model, kind }:

agent_routing[<exact-agent>] — exact name wins.
agent_routing[<glob>] — e.g. np-critic* matches all four critic agents.
model_providers.default provider, with the agent's tier mapped through that provider's models table.

Within a matched routing entry: an explicit model pins the model; otherwise the provider's models[tier] is used (so the agent's intrinsic tier still drives size — 7b vs 32b — without pinning). tier remains the agent's intrinsic difficulty property (ADR-respecting); agent_routing is the orthogonal where-it-runs layer.

Loud, not silent (per the hardening doctrine). A routing entry referencing an undefined provider is a hard config error at load time — never a silent fallback to Claude:

config error: agent_routing["np-executor"] references provider "ollama",
              but model_providers.ollama is not defined.

Dispatch seam

At spawn time, after resolution:

kind === "native" → emit the host spawn exactly as today (model id / alias / inherit into the host's Agent tool).
kind === "openai-compat" → invoke lib/runtime/agent-loop.cjs as a one-shot subprocess: it builds messages from the agent's system prompt (agents/<name>.md) + the task, advertises the agent's declared tools as OpenAI function schemas, POSTs to base_url, executes returned tool-calls against the workspace, appends results, and loops until a final answer or an iteration/budget cap. Output is consumed by the workflow exactly as a Claude subagent's report is.

Provider abstraction, tool-runtime, guard parity (planned layout)

lib/runtime/providers/openai-compat.cjs — zero-dep fetch client: chat({ messages, tools, model, signal }) → { content, tool_calls }. No SDK. No daemon — process exits when the loop returns.
lib/runtime/agent-loop.cjs — the one-shot harness that drives the loop.
lib/runtime/tools/ — workspace implementations of Read/Grep/Glob (Slice 2) and Write/Edit/Bash (Slice 3), all routed through the existing lib/safe-path.cjs.
Guard parity. The off-host executor path runs through the same commit-policy, working-tree guard (no autonomous stash/checkout/restore), output-lint, Nubosloop Rule-9 audit, and security-review (ADR-0020) as the Claude path. These guards move behind a shared seam so both paths call them; they are not reimplemented per provider.
Preflight (before any off-host spawn): daemon reachable (GET /api/tags for Ollama, or a cheap models call), requested model present, tool-calling supported. A failed preflight aborts with an actionable message (run: ollama pull <model>), never mid-task.

Compliance with standing invariants

ADR-0001 (No-Daemon): the loop is one-shot per spawn. nubos-pilot never owns a long-lived process; the model server is external and user-managed.
ADR-0002 (Zero Runtime Deps): the client uses global fetch only. No provider SDK is added.

Consequences

Good, because per-agent model routing — including local models — works under any host, including Claude Code, without switching the host or routing unrelated agents off-model.
Good, because the default (native host, Claude tiers) is untouched: absent the new config keys, resolution and spawning are byte-for-byte as before.
Good, because the host-delegation path (OpenCode/Qwen as a multi-provider host) remains available and is the cheaper answer when the user is willing to switch host — this ADR does not remove it; it adds the per-agent capability the host axis cannot provide.
Good, because guard parity is explicit: the off-host executor is held to the same trust/security/commit guarantees as the Claude path, by design rather than by accident.
Bad, because nubos-pilot now owns an agent/tool-use harness — roughly doubling the runtime surface and adding a long-term maintenance burden (tool-call dialect drift across providers, schema quirks).
Bad, because local-model tool-use reliability is materially lower than frontier Claude; executor quality degrades and the Nubosloop critic-swarm only partially compensates. Mitigated by loud preflight, capped iterations, and the recommendation to keep high-risk agents (security-reviewer, planner) on Claude by default.
Bad, because a self-driven Bash/Edit loop steered by a local model is a new footgun/attack surface absent from the pure host path. Mitigated by mandatory safe-path routing and full security-review parity, but it raises the security bar for the off-host path specifically.
Bad, because model_providers adds a config schema that must be validated and documented; a misconfigured provider is a new failure class (caught loudly at load time per the doctrine).
Bad, because the Slice 1–3 runtime is a tool-execution primitive with only per-tool guards — orchestration guard parity (checkpoint, metrics, commit-policy, output-schema validation, in-session review-as-gate, Rule-9 audit, worktree) does not exist until the Slice 4 dispatch seam wires it. Until then off-host routing is deliberately non-dispatchable (the resolve-model CLI refuses it loudly) so the gap cannot silently misroute. Off-host Bash specifically cannot be filesystem-confined by a string denylist and is OFF by default pending worktree isolation; a critic-swarm review confirmed the denylist is bypassable and is a UX guardrail, not a boundary.

More Information

Resolution seam: bin/np-tools/resolve-model.cjs (resolveFromConfig) — extended to return { provider, model, kind }; lib/model-profiles.cjs gains the provider-aware mapping; lib/config-schema.cjs + lib/config-defaults.cjs gain model_providers + agent_routing (default model_providers.default = "claude", no routing entries ⇒ status quo).
Own loop: lib/runtime/agent-loop.cjs, lib/runtime/providers/openai-compat.cjs, lib/runtime/tools/.
Delivery slices:
1. (done) Provider routing seam: lib/model-providers.cjs, config schema and defaults, and resolve-model returning {provider, kind, model}. An undefined provider is a loud error, and the resolve-model CLI refuses openai-compat (off-host-dispatch-not-wired) so a routed model id never leaks into a claude spawn. Default-preserving, fully green.
2. (done) openai-compat client + read-only agent-loop (Read/Grep/Glob) + preflight — lib/runtime/{providers/openai-compat,preflight,agent-loop,tools}.cjs. Zero-dep, one-shot.
3. (done — tool-execution primitive, NOT orchestration parity) Write/Edit/Bash tool-runtime with per-tool guards: safe-path confinement + symlink-leaf rejection on Write/Edit, advisory security scan (scanContent) surfaced into the result, Bash via spawnSync with a denylist + timeout + output cap. Bash is opt-in (allowBash:true), OFF by default, because a denylist is not a sandbox and off-host Bash is not filesystem-confined. The mutating runtime exists; it is a primitive, not yet wired to any spawn path. 4a. (done) Callable dispatch seam + refuse-loud gates. lib/runtime/dispatch.cjs (dispatchOffHost) + CLI np-tools spawn-offhost: resolve provider (resolveFromConfig now also returns baseUrl/apiKeyEnv) → require openai-compat → assertPreflight (hard gate) → load agent prompt + tools (loadAgentSource) → runAgentLoop → metrics buildRecord/appendRecord (runtime parity; tokens still claude-only). A critic-swarm review hardened the callable surface so it never outruns the deferred guards: Rule-9-audited agents (np-executor/np-researcher/np-build-fixer) are refused off-host (offhost-audited-agent-unsupported); --allow-bash is refused (offhost-bash-requires-sandbox) pending worktree isolation; the native-path resolve-model refusal was reworded to point at spawn-offhost (off-host-not-on-native-path). So off-host runs work today for non-audited, read/write (no-Bash) agents. 4b-i. (done) Rule-9 at the dispatch level. lib/runtime/tools/index.cjs gains a native knowledge-search tool (the off-host equivalent of the Claude path's Bash np-tools knowledge-search) that records the search-evidence ledger; toolsetFor({withSearch}) injects it for audited agents (off-host Bash is gated, so this is their only path to comply). dispatchOffHost lifts the audited-agent refusal when a canonical --task-id is present (ids.cjs::TASK_ID_RE), injects the search tool, and runs auditToolUse after the loop — the verdict rides envelope.rule9 and is persisted to the checkpoint for the orchestrator to route, matching the native non-blocking audit. A critic-swarm confirmed the two-channel (tool-log + ledger) audit is forge-resistant and at parity with the Claude path. 4b-ii. Worktree-gated Bash (done) + execute-phase auto-wiring (remaining). Off-host Bash is now permitted only inside a real slice worktree — dispatchOffHost verifies the cwd is a lib/worktree.cjs slice worktree and otherwise refuses (offhost-bash-requires-sandbox); the blanket CLI refusal is replaced by this confinement check, so model-driven shell can never touch the live tree. Executor auto-wiring (done): execute-phase branches kind === 'openai-compat' (detected via resolve-model --json) into spawn-offhost --cwd <slice-worktree> --allow-bash --no-audit instead of the host Agent spawn. Off-host requires workflow.worktree_isolation=true — it reuses the per-wave worktree (so model-driven edits are confined and the slice-end ff-merge lands the work; force-creating a worktree out of band would strand commits and break the orchestrator's cwd convention) and refuses loud otherwise. spawn-offhost runs --no-audit; the orchestrator runs the single canonical Step-4 loop-audit-tool-use with the returned tool-log, so the Layer-C stamp is produced exactly once (no double-routing). Commit-policy + in-session security review are already agent-agnostic (they run on the diff orchestrator-side) and apply unchanged.

Resolved (not gaps): off-host metrics rows record runtime/duration/status but null tokens_in/out — the same rule [D-09] applies to every non-Claude runtime (codex, gemini, …), so off-host follows it by design, not omission. model_providers.default is the implicit claude-native default already (the setup-chosen environment is the host runtime, which is exactly that default).

Slice 4c (done — read-only roles, reuse the seam): off-host branches for np-critic/np-researcher in execute-phase, plus a generic output_lint hook on the dispatch envelope (spawn-offhost --output-schema <name>). (a) The critic runs --read-only; because the off-host Write tool confines to cwd it cannot write $TMPDIR/.../critic-*.json, so it emits the { "critic":"critic", "findings":[…] } object as its final message — the orchestrator writes that to $CRITIC_REPORT_PATH only after asserting critic ∈ {critic,style,tests,acceptance} (any other value is silently dropped by mergeCriticOutputs — project_np_critic_field_schema_bug), failing loud otherwise. (b) The researcher swarm runs $SWARM_K spawn-offhost --read-only --no-audit calls, each emitting the per-spawn consensus JSON {decisions,risks,patterns,open_questions,sources} that researcher-merge consumes (NOT the researcher-output markdown artifact — a different contract); a spawn whose output is not that JSON is substituted with an empty {} so the merge degrades gracefully instead of aborting the wave. The orchestrator stamps one loop-audit-tool-use per spawn (satisfying the post-researcher SKIP-GUARD). (c) np-researcher is Rule-9-audited, satisfied by the same injected knowledge-search tool the executor uses (--task-id required; the search tool is injected after the read-only filter, so read-only does not strip it). 4d. (done — rollout to non-execute workflows; same seam, no new decision). The dispatch seam is generic, so wiring the remaining workflows is a per-agent contract mapping, not new architecture. The dividing line is what the agent writes, and it collapses to three cases: live-code editors (executor/build-fixer — worktree + --allow-bash, execute-phase only); artefact writers that write M<NNN>-*.md under .nubos-pilot/ inside the repo cwd (planner, plan-checker, nyquist-auditor, architect, researcher-reconciler — default cwd, Write enabled, NO worktree, NO Bash, no emit-and-persist contract, because the target is already inside cwd — this is the key simplification over execute-phase, where the critic/researcher had to emit-as-message only because $TMPDIR is outside cwd); and read-only emitters (critic, verifier — --read-only, emit the verdict as the final message, orchestrator persists). Full pipeline coverage, mechanically enforced. Every workflow agent-spawn now has an off-host branch: execute-phase (executor, build-fixer, researcher, critic), plan-phase (planner, plan-checker), discuss-phase (sc-extractor), research-phase (researcher, reconciler), architect-phase (architect, researcher, critic), validate-phase (nyquist-auditor), verify-work (verifier), scan-codebase (codebase-documenter). bin/np-tools/resolve-model --kind gives each site a one-line native|openai-compat detector. scripts/check-offhost-coverage.cjs (a tests/ test) discovers every spawn-indicator and fails the suite if one lacks an off-host branch — drift-proof, so a new agent-spawn cannot ship native-only by omission. The audited np-researcher runs off-host with a synthetic canonical task-id ${MILESTONE_ID}-S000-T0000 (milestone-level "no slice/task" convention) so its Rule-9 ledger + audit apply — the earlier offhost-audited-agent-unsupported refusal is resolved, not relaxed (the gate still requires a TASK_ID_RE-valid id). np-security-reviewer / np-learnings-extractor spawn via spawn-headless (a claude -p subprocess in lib code), not a workflow orchestrator spawn. Their off-host routing is handled centrally inside spawn-headless: run() is now async and, when the agent routes to an openai-compat provider, calls dispatchOffHost instead of claude -p, writing the identical {result} envelope so review.cjs/extract.cjs parse it unchanged (the CLI dispatcher already awaits promises; both in-process callers — runReview/runExtract — were made async). One bounded constraint remains, surfaced loudly: the off-host toolset has no WebFetch/context7, so off-host np-researcher covers offline (knowledge-search) research only — online mode routes native. Net: every agent in every step can run on any openai-compat provider, native stays the default, and the coverage check (tests/check-offhost-coverage.test.cjs) fails the suite on any new native-only spawn site.

Related ADRs:
- ADR-0001: one-shot loop, no nubos-pilot-owned process.
- ADR-0002: fetch-only client, no SDK.
- ADR-0010: the off-host executor runs through the same Execute-side Trust Layer.
- ADR-0020: security-review parity applies to the off-host path, where the footgun is larger.

ADR-0021: Model-Agnostic Agent Dispatch — Provider Abstraction + Own Tool-Use Loop ​

Context and Problem Statement ​

Decision Drivers ​

Considered Options ​

Decision Outcome ​

Configuration surface ​

Resolution precedence (extends resolveFromConfig) ​

Dispatch seam ​

Provider abstraction, tool-runtime, guard parity (planned layout) ​

Compliance with standing invariants ​

Consequences ​

More Information ​