ADR-0015: Named-Agent-Messaging for Inter-Agent Loop Coordination
Context and Problem Statement
The Nubosloop (ADR-0010) coordinates per-task spawns through file artefacts: nubosloop.checkpoint.json, the per-task tool_use_audit log (Layer C), the Critic's <report_path> (L5), and the orchestrator's stuck-state file. Within a round, the routing engine reads these artefacts and dispatches the next spawn.
Two coordination patterns are not addressed by these artefacts:
- Per-finding addressed handback between agents. When the Critic emits N findings, all of category style → executor are merged into a single Build-Fixer prompt. The Executor cannot acknowledge per-finding ("understood; fixed; ready for re-check"); the Critic re-runs against the next round's diff and re-evaluates from scratch. Findings the Build-Fixer cannot resolve (genuinely ambiguous remediation) cannot be deferred or re-routed without falling out to askuser.
- Critic ↔ Executor dialogue inside a round. Today the round structure is fixed: Executor → Critic → Routing → either commit or re-loop. There is no mechanism for the Critic to ask the Executor a clarifying question ("did you intend to delete FoobarService, or was that a side-effect?") without falling out to askuser.
A persistent, addressed, file-based message channel inside .nubos-pilot/messages/ lets agents within a round (and within a task across rounds) carry on a structured dialogue. Each message carries from, to, kind, subject, body, and an optional in_reply_to thread-id. The mechanism is not a daemon and not a network bus. It is append-only filesystem state, identical in spirit to lib/handoff.cjs and the per-task audit log.
The Rule
Agents in np:execute-phase may exchange addressed messages via lib/messaging.cjs and the np:messages-* subcommands. Messages persist under .nubos-pilot/messages/ as JSON files. The Nubosloop loop-termination condition is extended: a round may not commit while any expects_reply: true message in the current task is unarchived.
Decision Drivers
- Addressed re-check loops: the Critic-to-Executor handback today is per-round, not per-finding. Per-finding addressing reduces re-evaluation cost and surfaces genuinely-stuck findings earlier (a finding that bounces back and forth for ≥ 2 rounds is a stuck signal, not an executor-quality signal).
- Audit-trail richness: the message log is a Layer-C-adjacent artefact: same threat model (synthetic evidence), same defence (orchestrator-stamped, filesystem-witnessed). Critics auditing prior phases see a thread, not a state-snapshot.
- No daemon, no bus: every message is a file. Reads are readdir + readFile. No coordination process. (ADR-0001.)
- Zero runtime deps: pure Node fs + crypto. No new package.json entry. (ADR-0002.)
- Three-tree orthogonality: .nubos-pilot/messages/ is a strict Project-State sub-tree. (ADR-0005.)
Considered Options
- A: No messaging, status quo. Reject: per-finding handback impossible; round-level granularity is the only available unit.
- B: In-process message bus (e.g. EventEmitter inside the orchestrator). Reject: violates ADR-0001 (in-session daemon-shaped state) and is invisible to Layer-C audit.
- C: Network message broker (Redis / NATS / etc.). Reject: violates ADR-0001 hard, requires service install on consumer machine.
- D: File-based addressed messaging in .nubos-pilot/messages/, append-only manifest, archive-on-ack. Chosen.
Decision Outcome
Chosen: Option D, file-based addressed messaging, because it preserves the no-daemon stance, ships zero deps, integrates with the existing audit-trail philosophy of ADR-0010, and gives the routing engine a concrete loop-termination predicate (no unarchived expects_reply messages).
Layout
.nubos-pilot/messages/                        # Project-State sub-tree
  inbox/<agent-name>/<msg-id>.json            # unread, addressed at <agent-name>
  archive/<msg-id>.json                       # processed (acked / replied)
  archive/by-task/<task-id>/<msg-id>.json     # post-task historical archive
  manifest.jsonl                              # append-only audit log, all events

<msg-id> is a UUIDv4 plus a millisecond timestamp prefix for sort-stability: <unix-ms>-<uuid>. Filenames sort chronologically, and no two messages share an id even on the same millisecond.
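The id scheme above can be sketched in a few lines. This is a minimal illustration, not the code in lib/messaging.cjs: the helper names makeMessageId and parseMessageId are hypothetical, and only crypto.randomUUID from the Node standard library is assumed.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: builds "<unix-ms>-<uuid>" as described in the layout.
function makeMessageId(nowMs = Date.now()) {
  return `${nowMs}-${crypto.randomUUID()}`;
}

// Hypothetical helper: splits an id back into its timestamp and UUID parts.
// The first dash separates the timestamp; the UUID keeps its own dashes.
function parseMessageId(id) {
  const dash = id.indexOf('-');
  return { unixMs: Number(id.slice(0, dash)), uuid: id.slice(dash + 1) };
}

const a = makeMessageId(1730000000123);
const b = makeMessageId(1730000000123); // same millisecond, different UUID
const c = makeMessageId(1730000000124);

// A plain lexicographic sort is chronological as long as all timestamps
// have the same number of digits (13 digits until the year 2286).
const sorted = [c, a].sort();
```

Messages created within the same millisecond sort by their UUID part, so the order among them is stable but arbitrary rather than causal.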
Message schema
{
"id": "1730000000123-9b3e...",
"from": "np-critic",
"to": "np-executor",
"phase": "M005-S007-T0002",
"round": 2,
"kind": "request|response|notify",
"subject": "filament-resource-policy-missing",
"body": "...",
"expects_reply": true,
"in_reply_to": "1729999999999-...|null",
"created_at": "2026-05-08T..."
}

kind semantics:
- request: expects_reply: true; receiver must archive with a reply or escalate.
- response: expects_reply: false; carries in_reply_to pointing at the request it answers; archives the request as a side-effect.
- notify: expects_reply: false; no reply required; archived by receiver after read.
subject is a kebab-case finding-category or topic id, matching the Critic finding-category taxonomy where applicable (style, dead-code, missing-test, weak-assertion, unmet-criterion, scope-creep, …) so the routing engine can index by it.
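The kind semantics and the kebab-case subject rule lend themselves to a small validator. The sketch below is an illustration under assumptions: validateMessage is a hypothetical name, the error strings are invented for the example, and requiring the five core fields to be non-empty strings is a guess at what the library enforces.

```javascript
const KINDS = ['request', 'response', 'notify'];

// Hypothetical validator: returns a list of violations,
// which is empty when the message is well-formed.
function validateMessage(msg) {
  const errors = [];
  for (const field of ['from', 'to', 'kind', 'subject', 'body']) {
    if (typeof msg[field] !== 'string' || msg[field] === '') {
      errors.push(`missing ${field}`);
    }
  }
  if (!KINDS.includes(msg.kind)) {
    errors.push('unknown kind');
    return errors;
  }
  // request expects a reply; response and notify do not.
  if (msg.kind === 'request' && msg.expects_reply !== true) {
    errors.push('request must set expects_reply: true');
  }
  if (msg.kind !== 'request' && msg.expects_reply !== false) {
    errors.push(`${msg.kind} must set expects_reply: false`);
  }
  // A response has to point at the request it answers.
  if (msg.kind === 'response' && !msg.in_reply_to) {
    errors.push('response must carry in_reply_to');
  }
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(msg.subject || '')) {
    errors.push('subject must be kebab-case');
  }
  return errors;
}

const request = {
  from: 'np-critic',
  to: 'np-executor',
  kind: 'request',
  subject: 'filament-resource-policy-missing',
  body: '...',
  expects_reply: true,
  in_reply_to: null,
};
const problems = validateMessage(request); // empty list for this message
```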
Library surface
lib/messaging.cjs:
async function send(opts: { from, to, kind, subject, body, expects_reply, in_reply_to? }): Promise<MessageId>
async function inbox(agent: string, opts?: { kind?, since? }): Promise<Message[]>
async function archive(msg_id: string): Promise<void>
async function thread(msg_id: string): Promise<Message[]>
async function pendingReplies(task_id: string): Promise<Message[]>
async function sweepTaskOnCommit(task_id: string): Promise<number>

Each send() writes to inbox/<to>/<id>.json and appends a sent event to manifest.jsonl. Each archive() moves inbox/<...>/<id>.json to archive/<id>.json and appends an archived event. All file mutations go through atomicWriteFileSync / rename; no partial-write window.
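The write path described above (write, rename, append to the manifest) might look roughly like the following. This is a simplified synchronous sketch, not the actual lib/messaging.cjs: atomicWrite stands in for atomicWriteFileSync, the explicit root and agent parameters are added for the demo, and the manifest field names (event, id, at) are assumptions.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const crypto = require('node:crypto');

// Write to a temp file next to the target, then rename into place.
// rename() within one filesystem is atomic on POSIX, so a reader that
// filters on the ".json" suffix never sees a half-written message.
function atomicWrite(file, data) {
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, data);
  fs.renameSync(tmp, file);
}

// Append one event line to manifest.jsonl (field names are assumed).
function logEvent(root, event, id) {
  const line = JSON.stringify({ event, id, at: new Date().toISOString() });
  fs.appendFileSync(path.join(root, 'manifest.jsonl'), line + '\n');
}

function send(root, fields) {
  const id = `${Date.now()}-${crypto.randomUUID()}`;
  const dir = path.join(root, 'inbox', fields.to);
  fs.mkdirSync(dir, { recursive: true });
  const msg = { id, ...fields, created_at: new Date().toISOString() };
  atomicWrite(path.join(dir, `${id}.json`), JSON.stringify(msg, null, 2));
  logEvent(root, 'sent', id);
  return id;
}

function archive(root, agent, id) {
  const dest = path.join(root, 'archive');
  fs.mkdirSync(dest, { recursive: true });
  fs.renameSync(path.join(root, 'inbox', agent, `${id}.json`), path.join(dest, `${id}.json`));
  logEvent(root, 'archived', id);
}

// Usage against a throwaway directory instead of .nubos-pilot/messages/.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'np-messages-'));
const id = send(root, {
  from: 'np-critic', to: 'np-executor', kind: 'notify',
  subject: 'style', body: 'see report', expects_reply: false,
});
archive(root, 'np-executor', id);
const events = fs.readFileSync(path.join(root, 'manifest.jsonl'), 'utf8')
  .trim().split('\n').map((line) => JSON.parse(line).event);
```

Because the manifest is only ever appended to, the event order in manifest.jsonl mirrors the order in which the mutations were applied.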
Subcommands
- np:messages-send: emit a message; prints the new id.
- np:messages-inbox: list unread messages addressed to an agent.
- np:messages-archive: mark processed; refuses with messages-archive-without-reply when expects_reply: true AND no reply was sent.
- np:messages-thread: print the reply-chain in causal order.
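Two of the subcommand behaviours above can be illustrated with pure functions over in-memory messages. Both helpers are hypothetical: checkArchive models the messages-archive-without-reply refusal, and replyChain shows one possible reading of "causal order" (it walks in_reply_to back to the root and returns oldest first, without collecting sibling replies).

```javascript
// Hypothetical guard: `sent` is the list of messages already sent in the task.
function checkArchive(msg, sent) {
  if (msg.expects_reply !== true) return { ok: true };
  const replied = sent.some(
    (m) => m.kind === 'response' && m.in_reply_to === msg.id
  );
  return replied
    ? { ok: true }
    : { ok: false, error: 'messages-archive-without-reply' };
}

// Hypothetical thread walk: follows in_reply_to links up to the root.
// `byId` is a Map from message id to message; `seen` guards against cycles.
function replyChain(msgId, byId) {
  const chain = [];
  const seen = new Set();
  let cur = byId.get(msgId);
  while (cur && !seen.has(cur.id)) {
    seen.add(cur.id);
    chain.unshift(cur); // prepend so the root ends up first
    cur = cur.in_reply_to ? byId.get(cur.in_reply_to) : undefined;
  }
  return chain;
}

const question = { id: 'q', kind: 'request', expects_reply: true, in_reply_to: null };
const answer = { id: 'r', kind: 'response', expects_reply: false, in_reply_to: 'q' };
const byId = new Map([[question.id, question], [answer.id, answer]]);

const refused = checkArchive(question, []);       // no reply sent yet
const allowed = checkArchive(question, [answer]); // reply exists
const chainIds = replyChain('r', byId).map((m) => m.id);
```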
Nubosloop integration
- Critic emits findings → optional addressed messages. When the single-Critic spawn (ADR-0010 §Single-Critic Revision) wants to address a specific Executor question or per-finding clarification, it calls messages-send --to np-executor --kind request --subject <finding-category> --expects-reply from inside the spawn. The aggregate findings JSON still travels via <report_path> (L5); messages carry the dialogue layer, not the findings layer.
- Executor / Build-Fixer reads inbox before writing. Round-2+ Build-Fixer's prompt includes messages-inbox --agent np-executor in the read-list. Replies are sent via messages-send --kind response --in-reply-to <id>.
- Loop-termination predicate. bin/np-tools/loop-run-round.cjs::_runCommit Layer-B precondition is extended: in addition to verify_exit_code=0 and findings: [], pendingReplies(task_id).length === 0 MUST hold. A pending reply means a Critic question is unanswered; commit is blocked, the round routes back to Executor with the open inbox surfaced via details.pending_subjects.
- Phase-completion sweep. After a successful commit, _runCommit calls sweepTaskOnCommit(task_id) which moves every message with the matching phase from inbox/ and archive/ into archive/by-task/<task_id>/ and emits a task-swept event in manifest.jsonl. Future tasks see clean inboxes; the per-task audit-trail stays accessible.
- Stuck escalation. If pendingReplies(task_id) does not shrink across two consecutive rounds (the Executor's reply doesn't satisfy the Critic and a new request bounces back), the routing engine treats this as a messaging-stalemate finding and triggers the standard askuser four-options dialog (ADR-0010 §Failure Mode). (Routing-engine entry tracked separately; the surface is in place.)
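The extended Layer-B precondition and the stuck-escalation rule can be expressed as two small pure functions. This is a sketch under assumptions: the function names and the reason strings are hypothetical, and isMessagingStalemate encodes one reading of "does not shrink across two consecutive rounds" by comparing the pending counts of the previous and the current round.

```javascript
// Commit is allowed only when verify passed, no findings remain,
// and no expects_reply message is still waiting for an answer.
function commitPrecondition({ verify_exit_code, findings, pending }) {
  if (verify_exit_code !== 0) return { commit: false, reason: 'verify-failed' };
  if (findings.length > 0) return { commit: false, reason: 'open-findings' };
  if (pending.length > 0) {
    return {
      commit: false,
      reason: 'pending-replies',
      // Surfaced to the Executor when the round routes back.
      details: { pending_subjects: pending.map((m) => m.subject) },
    };
  }
  return { commit: true };
}

// prevCount / currCount: pendingReplies(task_id).length at the end of the
// previous and the current round. A count that was already non-zero and
// has not gone down is treated as a stalemate signal in this sketch.
function isMessagingStalemate(prevCount, currCount) {
  return prevCount > 0 && currCount >= prevCount;
}

const clean = commitPrecondition({ verify_exit_code: 0, findings: [], pending: [] });
const blocked = commitPrecondition({
  verify_exit_code: 0,
  findings: [],
  pending: [{ subject: 'scope-creep' }],
});
```

Comparing counts is deliberately coarse: a routing engine that tracks threads by subject could tell a bounced request apart from an unrelated new one, which a bare count cannot.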
Layer-C compatibility
Messaging events do not replace the Layer-C audit; they augment it. A Critic that emits messages-send --kind request from inside its spawn still has its loop-audit-tool-use --agent np-critic stamp applied by the orchestrator after spawn-return. The message itself is an additional artefact, not a substitute for the audit entry.
A hostile orchestrator that wants to fake messaging can write inbox files directly (filesystem is unguarded). This is the same threat model as Layer-C today (the audit log is appended by the orchestrator, not by the runtime) and the same mitigation applies: future Stage-2 runtime instrumentation will stamp message provenance the orchestrator cannot forge.
Cleanup and lifecycle
- Per-task scope. Messages are scoped to the task that produced them (phase field). At task-commit time, all messages with that phase move from inbox/ and archive/ into archive/by-task/<phase>/ for historical lookup but not active routing.
- Manifest is append-only. manifest.jsonl is never truncated; it is the audit-trail.
- TTL: none in v1. If inbox/ accumulates due to abandoned tasks (operator-killed mid-task), np:doctor surfaces messaging-orphan-inbox with --fix semantics that move orphans to archive/orphan/. (Doctor finding tracked separately.)
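The per-task move at commit time could be implemented along these lines. This is a synchronous sketch, not the real sweepTaskOnCommit: the function names, the explicit root parameter, and the shape of the task-swept manifest line are assumptions made for the example.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// List message files directly under archive/ and under each inbox/<agent>/.
// Sub-directories such as archive/by-task/ are skipped by the isFile() check.
function activeMessageFiles(root) {
  const dirs = [path.join(root, 'archive')];
  const inboxDir = path.join(root, 'inbox');
  if (fs.existsSync(inboxDir)) {
    for (const ent of fs.readdirSync(inboxDir, { withFileTypes: true })) {
      if (ent.isDirectory()) dirs.push(path.join(inboxDir, ent.name));
    }
  }
  const files = [];
  for (const dir of dirs) {
    if (!fs.existsSync(dir)) continue;
    for (const ent of fs.readdirSync(dir, { withFileTypes: true })) {
      if (ent.isFile() && ent.name.endsWith('.json')) files.push(path.join(dir, ent.name));
    }
  }
  return files;
}

// Move every message whose phase matches the task into archive/by-task/<task>/.
function sweepTask(root, taskId) {
  const dest = path.join(root, 'archive', 'by-task', taskId);
  let moved = 0;
  for (const file of activeMessageFiles(root)) {
    const msg = JSON.parse(fs.readFileSync(file, 'utf8'));
    if (msg.phase !== taskId) continue;
    fs.mkdirSync(dest, { recursive: true });
    fs.renameSync(file, path.join(dest, path.basename(file)));
    moved += 1;
  }
  const event = JSON.stringify({ event: 'task-swept', task_id: taskId, moved });
  fs.appendFileSync(path.join(root, 'manifest.jsonl'), event + '\n');
  return moved;
}

// Demo in a throwaway directory: two messages for one task, one for another.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'np-sweep-'));
function put(relDir, id, phase) {
  const dir = path.join(root, relDir);
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, `${id}.json`), JSON.stringify({ id, phase }));
}
put('inbox/np-executor', '1-a', 'M005-S007-T0002');
put('archive', '2-b', 'M005-S007-T0002');
put('inbox/np-critic', '3-c', 'M005-S007-T0003');
const moved = sweepTask(root, 'M005-S007-T0002');
```

Messages that belong to other tasks stay where they are, so a sweep for one task does not disturb a dialogue that is still in flight elsewhere.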
Messaging is never committed
Per the same user directive as ADR-0014: .nubos-pilot/messages/ is runtime-state, not source-of-truth. The directory is added to the consumer-project's .gitignore. The audit-trail value lives in the directory during a project's life; replay from manifest.jsonl is possible but not load-bearing.
Consequences
- Good, because per-finding Critic ↔ Executor dialogue is now expressible without falling out to askuser.
- Good, because pendingReplies is a deterministic loop-termination predicate, not a heuristic.
- Good, because the message log is a Layer-C-adjacent audit artefact: Critics in later phases can replay the dialogue for stuck-detection and recurring-pattern analysis.
- Good, because zero new runtime deps: pure fs + crypto.randomUUID.
- Bad, because the messaging surface is one more thing to teach agents; agent prompts grow. Mitigation: optional surface, so agents that don't send continue to work as today.
- Bad, because filesystem-message-races on parallel-spawn topologies (Researcher swarm k=3, future parallel critics) require care. Mitigation: filenames are timestamp-prefixed UUIDs; readers sort by filename.
- Bad, because a wave of orphan inbox messages from killed tasks is possible. Mitigation: np:doctor covers it; archive/by-task/ retention is bounded by task-archive policy.
More information
- Library: lib/messaging.cjs; bin/np-tools/loop-run-round.cjs::_runCommit (extended Layer-B precondition + sweep hook).
- Subcommand: bin/np-tools/messages-{send,inbox,archive,thread}.cjs.
- Concept: Named-Agent-Messaging.
- Agents: agents/np-critic.md (Inter-Agent Messaging section), agents/np-executor.md (Round 2+ inbox-read), agents/np-build-fixer.md (Step 0 inbox-read).
This ADR specifies the in-task addressed-messaging surface. Inter-task and inter-phase messaging (operator-to-agent notes, mid-phase escalation tickets) are out-of-scope and remain handled by lib/handoff.cjs and the existing askuser path.
