ADR-0020: In-Session Security Review Layer — Catch Vulnerabilities While the Agent Writes

Status: Accepted
Date: 2026-06-02
Supersedes: None
Related: ADR-0010 (Execute-side Trust Layer), ADR-0019 (Plan-side Trust Layer)

Context and Problem Statement

Security feedback on agent-written code historically arrived at code-review time — too late to fold larger fixes back in cheaply. nubos-pilot already ships a post-milestone np-security-reviewer audit (/np:validate-phase), but that runs once per milestone, after many tasks have landed. There was no layer catching issues as the code is written, in any Claude Code session, regardless of the milestone workflow.

Anthropic ships exactly this as the official security-guidance Claude Code plugin: a hook-based, always-on, non-blocking layer that reviews the agent's own changes at three depths and feeds findings back into the same session. We want that capability as a first-class part of nubos-pilot — not an external add-on the customer must install separately, and not a Python/venv/marketplace dependency that breaks our language-agnostic, node-only install promise.

Decision Drivers

The earliest fix is the cheapest: catch issues in the editor, before the PR.
Must be always-on with nothing to invoke — installed with the rest of nubos-pilot.
Must never block a write or commit. It is one layer of defense in depth, not a gate.
The reviewer must be independent — never the same instance that wrote the code grading itself.
Stay in the nubos-pilot stack: node/cjs, spawn-headless, the existing np-security-reviewer agent, the config.json toggle system. No Python, no venv, no marketplace, no CI/Action.

Considered Options

Bundle / auto-enable the official plugin. Rejected: pulls a Python 3.8+ runtime, a pip-installed Agent SDK and the Anthropic marketplace onto every customer machine, and the behaviour/cadence is Anthropic's, not ours. Breaks the language-agnostic install.
Build it as a CI / GitHub Action (like Anthropic's PR-time claude-code-security-review). Rejected: that is a different layer (PR-time), and out of scope for the in-session product feature.
Reimplement in the nubos-pilot stack, architecturally aligned to the plugin. Chosen.

Decision Outcome

Chosen: a node-only, hook-based in-session security layer that mirrors the official plugin's architecture (five lifecycle hooks, three review depths, the same caps and dedup rules), built on spawn-headless and a new session/diff mode of the existing np-security-reviewer agent, toggled through config.json.

Five Claude Code lifecycle hooks (one DRY script, verb-parameterized)

A single payload script, np-security-hook.cjs, is registered five times (PostToolUse twice, with different matchers), and each registration pipes the hook payload to np-tools security <verb> --stdin:

| Hook event | Verb | Purpose |
| --- | --- | --- |
| `SessionStart` | `session-start` | Initialize the session ledger |
| `UserPromptSubmit` | `baseline` | Capture the git working-tree baseline the turn-diff is measured against |
| `PostToolUse` (`Edit\|Write\|MultiEdit\|NotebookEdit`) | `scan` | Layer 1: deterministic pattern scan, no model call |
| `Stop` | `review` | Layer 2: harvest prior findings, spawn the background turn-diff review |
| `PostToolUse` (`Bash`, filtered to `git commit`/`git push`) | `commit` | Layer 3: deeper background review of the agent's own commit |
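The registrations in the table can be sketched as plain data plus two small helpers. This is only an illustration under stated assumptions: the names `REGISTRATIONS`, `forwardArgs` and `shouldForward` are hypothetical, the `tool_input.command` field is assumed to be where Claude Code places the Bash command in the hook payload, and where the `git commit`/`git push` filter actually runs (hook script or CLI verb) is not stated in this ADR.

```javascript
// Hypothetical data form of the five registrations from the table above.
const REGISTRATIONS = [
  { event: 'SessionStart', matcher: null, verb: 'session-start' },
  { event: 'UserPromptSubmit', matcher: null, verb: 'baseline' },
  { event: 'PostToolUse', matcher: 'Edit|Write|MultiEdit|NotebookEdit', verb: 'scan' },
  { event: 'Stop', matcher: null, verb: 'review' },
  { event: 'PostToolUse', matcher: 'Bash', verb: 'commit' },
];

// Assumed filter: `git commit` or `git push` at the start of the command
// or directly after a shell separator (`;`, `&&`, `|`).
const GIT_COMMIT_OR_PUSH = /(?:^|[;&|])\s*git\s+(?:commit|push)\b/;

// Arguments a registration would pass to np-tools for a given verb.
function forwardArgs(verb) {
  if (!REGISTRATIONS.some((r) => r.verb === verb)) {
    throw new Error(`unknown security verb: ${verb}`);
  }
  return ['security', verb, '--stdin'];
}

// Every verb forwards unconditionally except `commit`, which only fires
// for Bash commands that look like a commit or push.
function shouldForward(verb, payload) {
  if (verb !== 'commit') return true;
  const command = (payload.tool_input && payload.tool_input.command) || '';
  return GIT_COMMIT_OR_PUSH.test(command);
}
```

With this shape, a Bash call such as `npm test` never reaches the `commit` verb, while `npm test && git push` does.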

Three review depths

Layer 1 — on each edit. A deterministic regex/substring scan of the new content for known risky patterns (dynamic code execution, unsafe deserialization, DOM injection, workflow-file edits, hardcoded secrets). No model call, no cost. Each finding fires once per pattern per file per session.
Layer 2, end of turn (Stop). Computes the git diff of everything the turn changed (capped at 30 files) and spawns an independent np-security-reviewer against it, detached in the background so the reply is not delayed. Findings land in the ledger and are surfaced on the next Stop as a block decision: the Stop hook re-prompts the agent to fix them as a follow-up, but no write or commit is blocked. After at most 3 consecutive rounds, surfacing stops and control returns to the user.
Layer 3, on commit/push (PostToolUse on Bash). A deeper, surrounding-context review of the commit the agent just made, capped at 20 reviews per rolling hour and deduplicated against Layer-2 findings so a clean commit produces no output. Only the agent's own Bash commits are reviewed, never a user's shell or ! escape.
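The Layer 1 behaviour described above (deterministic match, one finding per pattern per file per session) might look roughly like the following. The three patterns are illustrative stand-ins, not the contents of lib/security/patterns.cjs, and `scanContent` and the `seen` set are hypothetical names; in the real implementation the "already reported" state presumably lives in the session ledger.

```javascript
// Illustrative patterns only; the shipped rule set is not reproduced here.
const PATTERNS = [
  { id: 'dynamic-eval', regex: /\beval\s*\(/ },
  { id: 'dom-inner-html', regex: /\.innerHTML\s*=/ },
  {
    id: 'hardcoded-secret',
    regex: /(?:api[_-]?key|secret|password)\s*[:=]\s*['"][^'"]{8,}['"]/i,
  },
];

// Scans the new content of one file. `seen` is a Set that persists for the
// session, so each pattern is reported at most once per file.
function scanContent(filePath, content, seen) {
  const findings = [];
  for (const { id, regex } of PATTERNS) {
    const key = `${id}::${filePath}`;
    if (seen.has(key)) continue; // already reported for this file
    if (regex.test(content)) {
      seen.add(key);
      findings.push({ file: filePath, pattern: id });
    }
  }
  return findings;
}
```

Because the regexes carry no `g` flag, `test` is stateless and the scan stays deterministic across calls.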

Non-blocking by construction

No layer denies a tool. Layer 3 runs after the commit (PostToolUse, not PreToolUse). Layer 2's Stop block decision only re-prompts for the follow-up fix; it never prevents a write or commit. Findings reach the writing agent as instructions, and the review model can miss things, so this is defense in depth, not a guarantee.
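As a rough sketch of the Layer 2 surfacing logic, the function below turns pending findings into a Stop-hook `block` decision and yields after the configured number of consecutive rounds. The `{ decision, reason }` output shape is assumed to match what Claude Code accepts from a Stop hook; `stopDecision`, the finding fields and the choice to reset the counter when yielding are assumptions made for illustration.

```javascript
// Decides what the Stop hook should emit for this turn.
// Returns the hook output (or null for "say nothing") plus the updated
// count of consecutive surfacing rounds to store in the ledger.
function stopDecision(pendingFindings, consecutiveRounds, maxRounds = 3) {
  if (pendingFindings.length === 0) {
    return { output: null, consecutiveRounds: 0 };
  }
  if (consecutiveRounds >= maxRounds) {
    // Cap reached: yield back to the user instead of re-prompting again.
    return { output: null, consecutiveRounds: 0 };
  }
  const lines = pendingFindings.map((f) => `- ${f.file}: ${f.message}`);
  const reason = [
    'Security review found issues in this turn. Please fix them as a follow-up:',
    ...lines,
  ].join('\n');
  return {
    output: { decision: 'block', reason },
    consecutiveRounds: consecutiveRounds + 1,
  };
}
```

Note that `block` here only keeps the turn going with a new instruction; nothing in this path can reject an edit or a commit.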

Independence

Layer 1 is a pure string match with no model involved. Layers 2 and 3 spawn a separate claude -p call (spawn-headless) with a fresh context and a security-only prompt that starts from the diff, so the reviewer never wrote the code it reviews. To support this, the np-security-reviewer agent gains a session/diff input mode (Modus B): it returns a JSON findings envelope as its final message instead of writing M<NNN>-SECURITY.md, and stays read-only.
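The "detached in the background" part can be illustrated with Node's standard `child_process.spawn`. This is not the spawn-headless implementation; `spawnDetached` and `reviewerArgs` are hypothetical helpers, the prompt wording is invented, and how the reviewer's findings envelope is written back to the ledger is deliberately left out.

```javascript
const { spawn } = require('node:child_process');

// Starts a process that outlives the hook: detached, no inherited stdio,
// and unref'd so the parent's event loop does not wait for it.
function spawnDetached(command, args, options = {}) {
  const child = spawn(command, args, {
    ...options,
    detached: true,
    stdio: 'ignore',
  });
  child.unref();
  return child;
}

// Hypothetical argument builder for a fresh-context, security-only review.
// Only the `-p` flag comes from the ADR; the prompt text is an assumption.
function reviewerArgs(diff) {
  const prompt = [
    'You are a security reviewer. Review only the following diff.',
    'Return a JSON findings envelope as your final message.',
    '',
    diff,
  ].join('\n');
  return ['-p', prompt];
}
```

A caller would combine the two as `spawnDetached('claude', reviewerArgs(diff))`, so the hook can return immediately while the review continues.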

Configuration (`config.json`, always-on default)

A new security block, validated by the config schema and merged from defaults like every other nested toggle:

```json
{
  "security": {
    "enabled": true,
    "scan_on_write": true,
    "review_on_stop": true,
    "review_on_commit": true,
    "custom_rules_path": null,
    "guidance_path": null,
    "review_timeout_ms": 180000,
    "max_stop_reviews_in_a_row": 3,
    "max_commit_reviews_per_hour": 20,
    "max_files_per_review": 30
  }
}
```

Built-in checks cannot be disabled individually — only whole layers or the feature. custom_rules_path (a JSON pattern file) and guidance_path (a markdown "what to watch for" file) are additive: they extend the built-ins and never suppress them. RULES.md/CONTEXT.md continue to authorize/neutralize findings as in the milestone audit. Toggling a flag takes effect at runtime; no reinstall is needed — disabled layers no-op.
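The toggle semantics above (defaults merged in, whole layers switchable, custom rules additive) could be sketched as follows. The default values are copied from the block above; `resolveSecurityConfig`, `layerEnabled` and `effectiveRules` are hypothetical names and do not describe the actual config schema code.

```javascript
// Defaults as listed in this ADR.
const SECURITY_DEFAULTS = Object.freeze({
  enabled: true,
  scan_on_write: true,
  review_on_stop: true,
  review_on_commit: true,
  custom_rules_path: null,
  guidance_path: null,
  review_timeout_ms: 180000,
  max_stop_reviews_in_a_row: 3,
  max_commit_reviews_per_hour: 20,
  max_files_per_review: 30,
});

// User values override defaults key by key; missing keys fall back.
function resolveSecurityConfig(userConfig = {}) {
  return { ...SECURITY_DEFAULTS, ...(userConfig.security || {}) };
}

// A layer runs only if both the feature and that layer's flag are on.
function layerEnabled(config, layerFlag) {
  return config.enabled === true && config[layerFlag] === true;
}

// Additive merge: custom rules are appended and can never remove or
// replace a built-in rule.
function effectiveRules(builtIns, customRules = []) {
  return [...builtIns, ...customRules];
}
```

For example, a project that sets only `"review_on_commit": false` keeps Layers 1 and 2 and all default caps.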

Consequences

Good: security issues are found and fixed in the same session, not days later in review, with zero setup.
Good: no Python/venv/marketplace dependency — installs uniformly with the rest of nubos-pilot, into any language's project.
Good: reuses spawn-headless, the np-security-reviewer agent, the ledger/lock primitives and the config schema — one testable source of truth (lib/security/), no payload drift.
Bad: Layers 2 and 3 spend model usage like any other Claude request. Mitigated by the caps and by Layer 1 being free.
Bad: Claude-runtime only — the hooks are a Claude Code mechanism, so Codex/Gemini/OpenCode are out of scope for now.
Bad: per-edit Layer 1 adds a short node subprocess per write. Acceptable; writes are infrequent relative to reads and the scan is deterministic and fast.
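The rolling-hour cap that bounds Layer 3 cost can be pictured as a sliding-window check over recent review timestamps. This is a minimal sketch, assuming the ledger stores one timestamp per spawned commit review; `tryAcquireReviewSlot` is a hypothetical name.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Sliding-window limiter: keeps only timestamps from the last hour and
// admits a new review while fewer than `maxPerHour` remain.
// Returns the decision plus the timestamp list to persist.
function tryAcquireReviewSlot(timestamps, now, maxPerHour = 20) {
  const recent = timestamps.filter((t) => now - t < HOUR_MS);
  if (recent.length >= maxPerHour) {
    return { allowed: false, timestamps: recent };
  }
  return { allowed: true, timestamps: [...recent, now] };
}
```

Because the window slides, a burst of 20 commit reviews blocks further reviews only until the oldest one ages past one hour.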

More Information

Library: lib/security/patterns.cjs, lib/security/scan.cjs, lib/security/ledger.cjs, lib/security/review.cjs — pure functions plus the ledger (lock + atomic writes).
CLI verb: bin/np-tools/security.cjs — security session-start|baseline|scan|review|commit|run-review.
Hook script: templates/claude/payload/hooks/np-security-hook.cjs (one script, five registrations).
Hook registration: lib/install/claude-hooks.cjs (which: 'all'), wired from bin/install.js.
Agent: agents/np-security-reviewer.md — Modus B (session/diff) added; Modus A (milestone) unchanged.
Tests: lib/security/*.test.cjs, bin/np-tools/security.test.cjs, lib/install/claude-hooks.test.cjs.
Related ADRs:
- ADR-0010: Execute-side Trust Layer. This layer is orthogonal — a runtime safety net independent of the Nubosloop.
- ADR-0019: the plan-side counterpart in the same defense-in-depth spirit.
