Planning, dispatch, and oversight

Fuze Agent ships three runtime surfaces that turn an evidence-grade audit story into a usable agent framework: a plan tool that the model commits to before touching sensitive data, capability envelopes for typed sub-agent dispatch, and an Article 14 oversight primitive that suspends a run durably until a human reviewer resolves it. All three feed the same hash-chained evidence ledger.

The model in one paragraph

A Fuze project has top-level agents (defined via defineAgent) and roles (defined via defineAgentRole). Agents carry full compliance metadata — purpose, lawfulBasis, annexIIIDomain, producesArt22Decision, art14OversightPlan. Roles are capability envelopes: they declare what tools, data classes, and EU residency a child can operate under, without claiming a specific task. Agents dispatch to roles at runtime with a freeform task brief. The runtime auto-injects three plan tools when planning is enabled, and one typed dispatch_<role> tool per envelope listed in canDispatch.

Folder convention

```text
my-app/
  agents/
    roles/
      researcher.ts          ← defineAgentRole(...)
      personal-data-researcher.ts
    loan-orchestrator/
      agent.ts               ← defineAgent(...) imports the markdown
      instructions.md        ← short behavioral prompt
      context/
        underwriting-policy.md
        edge-cases.md
```

The convention is documented but not required: defineAgent and defineAgentRole also work programmatically. fromMarkdown(path) reads a file and SHA-256-hashes it at module-load time, so both the resolved string and its hash become part of the agent fingerprint.
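Below is a minimal sketch of the read-and-hash step, assuming Node's fs and crypto modules. loadHashedMarkdown and the HashedMarkdown shape are hypothetical names, not the fromMarkdown implementation. The sketch only shows the idea that the digest is taken over the exact bytes loaded at module-load time.

```ts
import { mkdtempSync, readFileSync, writeFileSync } from 'node:fs'
import { createHash } from 'node:crypto'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

// Hypothetical shape: the resolved text plus the digest that would feed a fingerprint.
interface HashedMarkdown {
  path: string
  content: string
  sha256: string // hex digest of the raw file bytes
}

// Hypothetical helper (not fromMarkdown itself): a synchronous read, so the result
// exists as soon as the importing module finishes loading.
function loadHashedMarkdown(path: string): HashedMarkdown {
  const bytes = readFileSync(path)
  const sha256 = createHash('sha256').update(bytes).digest('hex')
  return { path, content: bytes.toString('utf8'), sha256 }
}

// Usage: write a throwaway instructions file and load it.
const dir = mkdtempSync(join(tmpdir(), 'fuze-sketch-'))
const instructionsPath = join(dir, 'instructions.md')
writeFileSync(instructionsPath, '# Loan orchestrator\nBe concise.\n')
const instructions = loadHashedMarkdown(instructionsPath)
```

The hash covers the file's bytes. Editing instructions.md therefore changes the fingerprint even when agent.ts itself is untouched.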

Planning

When def.annexIIIDomain !== 'none' or def.producesArt22Decision === true, the runtime by default requires a committed plan before any tool with dataClassification 'personal' or 'special-category' can fire. Override this per agent with planning: { required: false }; the override itself is recorded as evidence.

```ts
import { defineAgent } from '@fuze-ai/agent'

defineAgent({
  // ...
  planning: { required: 'auto-when-high-risk', minSteps: 2, maxSteps: 10 },
})
```
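The default rule above can be read as a small predicate. This is a sketch under assumptions. The field names mirror the ones in the text, but AgentRiskProfile and planRequiredFor are hypothetical. The sketch also assumes that required: true gates only personal and special-category tools, as the automatic mode does.

```ts
type DataClassification = 'public' | 'personal' | 'special-category'
type PlanningRequired = boolean | 'auto-when-high-risk'

// Hypothetical subset of an agent definition relevant to the planning gate.
interface AgentRiskProfile {
  annexIIIDomain: string // 'none' when the agent is outside Annex III
  producesArt22Decision: boolean
  planning?: { required?: PlanningRequired }
}

function isHighRisk(def: AgentRiskProfile): boolean {
  return def.annexIIIDomain !== 'none' || def.producesArt22Decision === true
}

// True when a committed plan must exist before a tool of this class may run.
function planRequiredFor(def: AgentRiskProfile, toolClass: DataClassification): boolean {
  const touchesPersonalData = toolClass === 'personal' || toolClass === 'special-category'
  if (!touchesPersonalData) return false
  const required = def.planning?.required ?? 'auto-when-high-risk'
  if (required === 'auto-when-high-risk') return isHighRisk(def)
  return required // explicit true/false override
}
```

In the real runtime, the required: false branch is also where the override evidence mentioned above would be produced.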

Three tools are auto-injected and visible to the model under their declared names:

  • commit_plan({ steps: [{ content, active_form, parent_step_id? }] }) — version 1
  • update_plan_step({ step_id, status, evidence_refs?, unlink_refs?, note? }) — append-only
  • revise_plan({ add_steps?, remove_steps?, reorder?, rationale }) — produces v2..n

Step IDs are minted on creation (step_<sha256-prefix>) and never reassigned across revisions. Splits create new IDs with derived_from: [old_id] edges. Removed steps get status: 'superseded', and their evidence stays linked. Once a step transitions to done, its status is locked.
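The ID and status rules above can be sketched as a tiny plan store. Everything here is illustrative: MiniPlan is not Fuze's PlanState. The hash input used to mint step IDs (content plus a creation counter) is an assumption, since the text only says IDs are SHA-256-prefixed and never reassigned. Treating a split as "supersede the old step, add derived steps" is also an assumption.

```ts
import { createHash } from 'node:crypto'

type StepStatus = 'pending' | 'in_progress' | 'done' | 'superseded'

interface MiniStep {
  id: string
  content: string
  status: StepStatus
  derived_from: string[]
}

class MiniPlan {
  version = 0
  private readonly steps = new Map<string, MiniStep>()
  private minted = 0

  // Assumption: hash a creation counter plus content, so identical text still gets a fresh ID.
  private mint(content: string): string {
    const digest = createHash('sha256').update(`${this.minted++}:${content}`).digest('hex')
    return `step_${digest.slice(0, 12)}`
  }

  private add(content: string, derivedFrom: string[]): string {
    const id = this.mint(content)
    this.steps.set(id, { id, content, status: 'pending', derived_from: derivedFrom })
    return id
  }

  commit(contents: string[]): string[] {
    if (this.version !== 0) throw new Error('plan already committed; revise it instead')
    this.version = 1
    return contents.map((c) => this.add(c, []))
  }

  get(id: string): MiniStep {
    const step = this.steps.get(id)
    if (!step) throw new Error(`unknown step ${id}`)
    return step
  }

  setStatus(id: string, status: StepStatus): void {
    const step = this.get(id)
    if (step.status === 'done') throw new Error(`${id} is done; its status is locked`)
    step.status = status
  }

  // Removed steps are superseded, never deleted, so their evidence stays linked.
  remove(ids: string[]): void {
    ids.forEach((id) => this.setStatus(id, 'superseded'))
    this.version++
  }

  // A split removes the old step and mints new IDs that point back to it.
  split(id: string, parts: string[]): string[] {
    this.remove([id])
    return parts.map((p) => this.add(p, [id]))
  }
}
```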

Auto-capture. Evidence rows emitted while a step is in_progress are auto-linked to that step. Each transition records linkage_source: 'auto' | 'explicit' | 'corrected' so auditors see which links the agent declared vs. which the runtime inferred. Use evidence_refs to add links the runtime missed; use unlink_refs to correct auto-captured mistakes.
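Auto-capture and its corrections can be modeled as an append-only log of link records, each tagged with how it got there. The three linkage_source values come from the text. EvidenceLinker and its methods are hypothetical. Recording an unlink as a 'corrected' row that cancels an earlier link is an assumption about what "corrected" means.

```ts
type LinkageSource = 'auto' | 'explicit' | 'corrected'

interface LinkRecord {
  stepId: string
  evidenceId: string
  linkage_source: LinkageSource
}

class EvidenceLinker {
  readonly log: LinkRecord[] = [] // append-only; rows are never deleted
  private inProgress: string | null = null

  start(stepId: string): void { this.inProgress = stepId }
  finish(): void { this.inProgress = null }

  // Runtime path: evidence emitted while a step is in_progress is linked automatically.
  onEvidence(evidenceId: string): void {
    if (this.inProgress !== null) {
      this.log.push({ stepId: this.inProgress, evidenceId, linkage_source: 'auto' })
    }
  }

  // evidence_refs: the agent declares links the runtime missed.
  addRefs(stepId: string, evidenceIds: string[]): void {
    for (const evidenceId of evidenceIds) this.log.push({ stepId, evidenceId, linkage_source: 'explicit' })
  }

  // unlink_refs (assumption): a 'corrected' row cancels an earlier link for the same pair.
  unlinkRefs(stepId: string, evidenceIds: string[]): void {
    for (const evidenceId of evidenceIds) this.log.push({ stepId, evidenceId, linkage_source: 'corrected' })
  }

  // Replays the log to find the links currently in force for a step.
  activeLinks(stepId: string): string[] {
    const active = new Set<string>()
    for (const rec of this.log) {
      if (rec.stepId !== stepId) continue
      if (rec.linkage_source === 'corrected') active.delete(rec.evidenceId)
      else active.add(rec.evidenceId)
    }
    return Array.from(active)
  }
}
```

Keeping the log append-only lets an auditor see both the runtime's original inference and the agent's correction, not just the final state.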

Per-step lifecycle timestamps. createdAt, startedAt, suspendedAt, resumedAt, endedAt are populated automatically on transitions. Regulators care how long humans were in the loop.
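Populating the timestamps amounts to a lookup from each transition to a field. The sketch assumes a 'suspended' status for steps waiting on a human; the text names the timestamps, not the status set. It measures human-in-the-loop time as resumedAt minus suspendedAt. stamp and humanWaitMs are hypothetical. The sketch keeps only the latest suspension, whereas a real ledger would keep every interval.

```ts
type Status = 'pending' | 'in_progress' | 'suspended' | 'done'

interface StepTimes {
  createdAt: number
  startedAt?: number
  suspendedAt?: number
  resumedAt?: number
  endedAt?: number
}

// Returns a copy with the timestamp implied by moving from `from` to `to`.
function stamp(times: StepTimes, from: Status, to: Status, now: number): StepTimes {
  const next = { ...times }
  if (to === 'in_progress' && from === 'pending') next.startedAt = now
  if (to === 'in_progress' && from === 'suspended') next.resumedAt = now
  if (to === 'suspended') next.suspendedAt = now
  if (to === 'done') next.endedAt = now
  return next
}

// How long the step waited on a human; undefined if it never suspended and resumed.
function humanWaitMs(times: StepTimes): number | undefined {
  if (times.suspendedAt === undefined || times.resumedAt === undefined) return undefined
  return times.resumedAt - times.suspendedAt
}
```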

Dispatch — capability envelopes

defineAgentRole produces a typed envelope. The role's roleHash covers name, instructions hash, context manifest, tools, data classification, residency, and view names — so changes are auditable.

```ts
import { z } from 'zod'
import { defineAgentRole } from '@fuze-ai/agent'

// searchPolicies and getPrecedent are tools defined elsewhere in the app.
const researcher = defineAgentRole({
  name: 'researcher',
  instructions: 'Answer with citations.',
  tools: [searchPolicies, getPrecedent],
  dataClassification: 'public',
  outputSchema: z.object({ summary: z.string() }),
  outputViews: {
    citations: z.object({ sources: z.array(z.object({ url: z.string(), quote: z.string() })) }),
    table: z.object({ rows: z.array(z.record(z.string(), z.unknown())) }),
  },
  maxSteps: 8,
})
```
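The sketch below shows how a content hash over the fields roleHash covers makes any envelope change detectable. The field names, the canonical-JSON encoding, and sorting tool and view names before hashing are all assumptions. The text lists which inputs roleHash covers, not how they are serialized.

```ts
import { createHash } from 'node:crypto'

// Hypothetical inputs mirroring the fields listed above.
interface RoleHashInput {
  name: string
  instructionsSha256: string
  contextManifest: Record<string, string> // context file path -> sha256
  toolNames: string[]
  dataClassification: string
  residency: string
  viewNames: string[]
}

// Deterministic JSON: object keys are sorted recursively so key order cannot change the hash.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const fields = Object.keys(obj).sort().map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`)
    return `{${fields.join(',')}}`
  }
  return JSON.stringify(value)
}

function roleHash(input: RoleHashInput): string {
  // Assumption: tool and view order carries no meaning, so sort before hashing.
  const normalized = {
    ...input,
    toolNames: [...input.toolNames].sort(),
    viewNames: [...input.viewNames].sort(),
  }
  return createHash('sha256').update(canonical(normalized)).digest('hex')
}
```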

The orchestrator declares which envelopes it can dispatch into:

```ts
import { defineAgent } from '@fuze-ai/agent'

const orchestrator = defineAgent({
  // ...
  canDispatch: [researcher, personalDataResearcher, computational],
})
```

The runtime synthesizes one tool per role: dispatch_researcher, dispatch_personal_data_researcher, etc. Each accepts { task, view?, forward_context?, forward? }. The model picks view from the role's outputViews enum at call time; the return type narrows accordingly.
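The naming rule and the view-dependent return type can be expressed with ordinary TypeScript generics. In this sketch, the snake_case conversion, the DispatchArgs and ViewOutput helper types, and the stub dispatcher are assumptions about how the synthesized tools behave, not Fuze's generated types. The role name 'personal-data-researcher' follows the folder listing earlier.

```ts
// Hypothetical naming rule: 'personal-data-researcher' -> 'dispatch_personal_data_researcher'.
function dispatchToolName(roleName: string): string {
  return `dispatch_${roleName.replace(/[^A-Za-z0-9]+/g, '_').toLowerCase()}`
}

// No view: the role's outputSchema type. A view: that view's type.
type ViewOutput<Default, Views, V> = V extends keyof Views ? Views[V] : Default

interface DispatchArgs<Views, V extends keyof Views | undefined> {
  task: string
  view?: V
  forward_context?: unknown // shape not specified in the text
  forward?: string[]
}

// Types mirroring the researcher role's outputSchema and outputViews.
type ResearcherDefault = { summary: string }
type ResearcherViews = {
  citations: { sources: Array<{ url: string; quote: string }> }
  table: { rows: Array<Record<string, unknown>> }
}

// Stub standing in for a synthesized dispatch_researcher tool; returns canned data.
function dispatchResearcher<V extends keyof ResearcherViews | undefined = undefined>(
  args: DispatchArgs<ResearcherViews, V>,
): ViewOutput<ResearcherDefault, ResearcherViews, V> {
  const canned: Record<keyof ResearcherViews, unknown> = {
    citations: { sources: [] },
    table: { rows: [] },
  }
  const out: unknown = args.view
    ? canned[args.view as keyof ResearcherViews]
    : { summary: `stub answer for: ${args.task}` }
  return out as ViewOutput<ResearcherDefault, ResearcherViews, V>
}
```

Because V is inferred from the view literal, reading .sources compiles only when view: 'citations' was passed. That is the narrowing described above.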

No metadata inheritance. A child does not operate under the parent's lawfulBasis or annexIIIDomain. Roles either declare their own (when their data classification requires one) or declare none (and then can't process the relevant data classes). Annex IV documentation reads the envelope, not the call site.
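One way to enforce "declare your own basis or stay out of those data classes" is a definition-time check that reads only the envelope. This sketch assumes that personal and special-category data both require a role-level lawfulBasis. RoleEnvelope, assertRoleCanProcess, and the error text are hypothetical. The function deliberately takes no parent agent, so a call site cannot lend the child a basis.

```ts
type DataClass = 'public' | 'personal' | 'special-category'

interface RoleEnvelope {
  name: string
  dataClassification: DataClass
  lawfulBasis?: string // declared on the role itself; never copied from the parent
}

const NEEDS_BASIS: ReadonlySet<DataClass> = new Set<DataClass>(['personal', 'special-category'])

// Only the envelope is consulted; there is no parent parameter to inherit from.
function assertRoleCanProcess(role: RoleEnvelope): void {
  if (NEEDS_BASIS.has(role.dataClassification) && !role.lawfulBasis) {
    throw new Error(`role ${role.name} handles ${role.dataClassification} data but declares no lawfulBasis`)
  }
}
```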

Forwarding is opt-in, with role-level pull.

  • requiresTenant: true on a role auto-forwards tenant when the parent has one. If the parent has no tenant, dispatch fails closed.
  • requiresPrincipal: true does the same for principal.
  • The parent can additionally pass forward: ['principal', 'subjectRef'] to flow more context per call.
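The forwarding rules above compose into a small resolution step that runs before the child starts. ParentContext, RoleRequirements, and resolveForwarded are hypothetical names. A missing required value throws, which is the fail-closed behavior. Silently skipping an explicitly forwarded value the parent lacks is an assumption, since the text does not say.

```ts
interface ParentContext {
  tenant?: string
  principal?: string
  subjectRef?: string
}

type Forwardable = keyof ParentContext

interface RoleRequirements {
  requiresTenant?: boolean
  requiresPrincipal?: boolean
}

function resolveForwarded(
  parent: ParentContext,
  role: RoleRequirements,
  forward: Forwardable[] = [],
): ParentContext {
  // Role-level pull plus per-call push, de-duplicated.
  const keys = new Set<Forwardable>(forward)
  if (role.requiresTenant) keys.add('tenant')
  if (role.requiresPrincipal) keys.add('principal')

  const child: ParentContext = {}
  for (const key of Array.from(keys)) {
    const value = parent[key]
    const required =
      (key === 'tenant' && role.requiresTenant === true) ||
      (key === 'principal' && role.requiresPrincipal === true)
    if (value === undefined) {
      if (required) throw new Error(`dispatch refused: role requires ${key} but parent has none`)
      continue // assumption: optional forwards that are absent are skipped
    }
    child[key] = value
  }
  return child
}
```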

The runChild callback you provide to runAgent({ runChild }) receives a fully-typed RunChildInput and returns a DispatchResult<T>:

```ts
type DispatchResult<T> =
  | { ok: true;  output: T;            runId: RunId; chainRoot: string }
  | { ok: false; failure: AgentRunFailure; runId: RunId; chainRoot: string }
```

Children's exceptions never bubble to the parent — they become typed failures. The parent sees { ok: false, failure: { category, message, attribution, retriable, attempt, childFailure? } } and decides whether to retry.
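A wrapper that converts child exceptions into values is what keeps them from bubbling up. The DispatchResult<T> union is copied from above. RunId is a plain string here. The AgentRunFailure fields are reduced, the category strings stand in for the real AgentErrorCategory values, and the "timeout means retriable" heuristic is only for illustration. settleChild is a hypothetical name.

```ts
type RunId = string

// Reduced stand-in for AgentRunFailure (childFailure omitted).
interface AgentRunFailure {
  category: string // stand-in for AgentErrorCategory
  message: string
  attribution: 'child' | 'parent' // hypothetical values
  retriable: boolean
  attempt: number
}

type DispatchResult<T> =
  | { ok: true; output: T; runId: RunId; chainRoot: string }
  | { ok: false; failure: AgentRunFailure; runId: RunId; chainRoot: string }

// Runs a child and converts anything it throws into a typed failure value.
async function settleChild<T>(
  runId: RunId,
  chainRoot: string,
  attempt: number,
  run: () => Promise<T>,
): Promise<DispatchResult<T>> {
  try {
    return { ok: true, output: await run(), runId, chainRoot }
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err)
    // Hypothetical heuristic: timeouts are retriable, everything else is final.
    const retriable = /timeout/i.test(message)
    const category = retriable ? 'timeout' : 'child_error'
    return {
      ok: false,
      failure: { category, message, attribution: 'child', retriable, attempt },
      runId,
      chainRoot,
    }
  }
}
```

The parent then branches on result.ok, and TypeScript narrows to output or failure accordingly. No try/catch is needed at the call site.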

Article 14 oversight (requestOversight)

ctx.requestOversight() (or the standalone requestOversight helper) suspends the run durably through a DurableExecutionAdapter. Restate is the intended production substrate; an InMemoryDurableAdapter ships for tests and local dev.

```ts
import { requestOversight, InMemoryDurableAdapter } from '@fuze-ai/agent'

// emitter, runId, redactedRef, useArgs, and halt come from your application.
const adapter = new InMemoryDurableAdapter()

const decision = await requestOversight(
  {
    adapter,
    emitSuspendEvent: (req, hash) => emitter.emit({ /* ... */ }),
    emitResumeEvent: (req, dec, entryHash) => emitter.emit({ /* ... */ }),
  },
  {
    runId,
    reason: 'tool_high_risk',
    evidence: { tool: 'send_email', to: redactedRef },
    reviewerHint: 'team-compliance',
    timeoutMs: 60 * 60_000, // one hour
    proposedArgs: { subject: 'original', body: '...' },
  },
)

switch (decision.decision) {
  case 'approve': /* proceed with the proposed args */ break
  case 'modify':  useArgs(decision.modifiedArgs); break
  case 'reject':  halt(); break
  case 'timeout': halt(); break // fail closed: a timeout is treated as a deny
}
```

Two distinct evidence entries chain across the suspend gap:

  • oversight_suspend carries evidencePayloadHash and the awakeable id.
  • oversight_resume carries humanInputEntryHash plus the reviewer's signature.

A modify decision creates a chain fork: downstream tool calls reference the modify event as their parent, not the model's original args. Without this fork, the chain would misrepresent what actually executed.
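A minimal hash chain shows why the fork matters. In this sketch each entry hashes its type, its parent's hash, and its payload. The entry type names come from the text. The Entry shape, append, verify, the JSON encoding, and modeling "the modify event" as the oversight_resume entry carrying decision: 'modify' are assumptions.

```ts
import { createHash } from 'node:crypto'

interface Entry {
  type: string
  parentHash: string | null
  payload: Record<string, unknown>
  hash: string
}

const ledger = new Map<string, Entry>()

function digest(type: string, parentHash: string | null, payload: Record<string, unknown>): string {
  return createHash('sha256').update(JSON.stringify({ type, parentHash, payload })).digest('hex')
}

function append(type: string, parent: Entry | null, payload: Record<string, unknown>): Entry {
  const parentHash = parent ? parent.hash : null
  const entry: Entry = { type, parentHash, payload, hash: digest(type, parentHash, payload) }
  ledger.set(entry.hash, entry)
  return entry
}

// Walks parent links back to the root, re-deriving every hash along the way.
function verify(tip: Entry): boolean {
  let e: Entry | undefined = tip
  while (e) {
    if (digest(e.type, e.parentHash, e.payload) !== e.hash) return false
    e = e.parentHash ? ledger.get(e.parentHash) : undefined
  }
  return true
}

const proposed = append('tool_proposed', null, { tool: 'send_email', args: { subject: 'original' } })
const suspend = append('oversight_suspend', proposed, { awakeableId: 'awk_1' })
const resume = append('oversight_resume', suspend, {
  decision: 'modify',
  modifiedArgs: { subject: 'reviewer-edited' },
})
// The fork: the executed call hangs off the reviewer's decision, not off `proposed`.
const executed = append('tool_call', resume, { tool: 'send_email', args: resume.payload.modifiedArgs })
```

Tampering with any ancestor, including the model's original proposal, breaks verification from the executed call's entry.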

The dashboard resolves a pending oversight via:

```ts
import { resolveOversight } from '@fuze-ai/agent'

await resolveOversight(adapter, {
  awakeableId,
  decision: 'modify',
  modifiedArgs: { subject: 'reviewer-edited', body: '...' },
  reviewerId: 'reviewer-42',
  reviewerSignature: ed25519DetachedSignature,
})
```
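The toy awakeable registry below shows mechanically how "suspend, then resolve from the dashboard" works, with a timeout. It is not InMemoryDurableAdapter. The class, its methods, and the Decision union are assumptions modeled on the requestOversight and resolveOversight calls above. A real durable adapter would persist the pending entry so the wait survives a process restart, instead of holding a Promise in memory.

```ts
type Decision =
  | { decision: 'approve'; reviewerId: string }
  | { decision: 'modify'; reviewerId: string; modifiedArgs: Record<string, unknown> }
  | { decision: 'reject'; reviewerId: string }
  | { decision: 'timeout' }

class ToyAwakeables {
  private pending = new Map<string, (d: Decision) => void>()
  private counter = 0

  // Registers a waiter; the promise settles on a reviewer decision or on timeout.
  create(timeoutMs: number): { id: string; result: Promise<Decision> } {
    const id = `awk_${++this.counter}`
    const result = new Promise<Decision>((settlePromise) => {
      const timer = setTimeout(() => this.settle(id, { decision: 'timeout' }), timeoutMs)
      this.pending.set(id, (d) => {
        clearTimeout(timer)
        settlePromise(d)
      })
    })
    return { id, result }
  }

  // Dashboard side. Returns false if the id is unknown or already settled.
  resolve(id: string, decision: Exclude<Decision, { decision: 'timeout' }>): boolean {
    return this.settle(id, decision)
  }

  private settle(id: string, decision: Decision): boolean {
    const waiter = this.pending.get(id)
    if (!waiter) return false
    this.pending.delete(id) // first decision wins; later ones are rejected
    waiter(decision)
    return true
  }
}
```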

Drift refusal on resume

A run paused for review at 9am Monday and resumed Thursday may resume against a different model. OpenAI rotates system_fingerprint routinely, and customers bump pinned snapshots such as gpt-4o-2024-08-06 to gpt-4o-2024-11-20. Fuze's resumeRun enforces the following:

  • Snapshot drift + producesArt22Decision: true → refuse with ModelDriftAtResumeError. Reviewer can pass allowModelDrift: true to override after explicit acknowledgment (which itself becomes evidence).
  • Snapshot drift + non-Art22 → warn, record drift in evidence, proceed.
  • System-fingerprint-only drift → never blocks. Fingerprints rotate on infra changes, not model changes.
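The three rules above read naturally as a pure function over the model identity recorded at suspend and at resume. ModelIdentity, DriftVerdict, and checkDriftAtResume are hypothetical names. The plain Error thrown here only stands in for Fuze's ModelDriftAtResumeError, and the fingerprint values in the usage are placeholders.

```ts
interface ModelIdentity {
  snapshot: string // e.g. 'gpt-4o-2024-08-06'
  systemFingerprint?: string
}

type DriftVerdict =
  | { action: 'proceed'; drift: 'none' | 'fingerprint-only' }
  | { action: 'proceed-with-warning'; drift: 'snapshot' }

function checkDriftAtResume(
  atSuspend: ModelIdentity,
  atResume: ModelIdentity,
  producesArt22Decision: boolean,
  allowModelDrift = false,
): DriftVerdict {
  if (atSuspend.snapshot !== atResume.snapshot) {
    if (producesArt22Decision && !allowModelDrift) {
      // Stand-in for ModelDriftAtResumeError.
      throw new Error(`model drift at resume: ${atSuspend.snapshot} -> ${atResume.snapshot}`)
    }
    // Non-Art22, or an acknowledged override: proceed, but the drift must be recorded.
    return { action: 'proceed-with-warning', drift: 'snapshot' }
  }
  if (atSuspend.systemFingerprint !== atResume.systemFingerprint) {
    // Fingerprints rotate on infrastructure changes; never block on them alone.
    return { action: 'proceed', drift: 'fingerprint-only' }
  }
  return { action: 'proceed', drift: 'none' }
}
```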

Soft-cancel on operator stop

Tools declare their I/O profile via softCancelTimeoutMs (default 10s):

```ts
defineTool.personal({
  name: 'send_email',
  // ...
  softCancelTimeoutMs: 30000,  // wait up to 30s for SMTP I/O
})

defineTool.public({
  name: 'search_docs',
  // ...
  softCancelTimeoutMs: 2000,
})
```

When the operator hits stop, the in-flight tool gets up to its declared grace period. After that, it is hard-killed and a cancellation_truncated evidence row records the truncation. A separate emergency-abort path skips the grace period entirely; that decision is also evidenced.
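A soft cancel with a grace period maps onto AbortController plus a race against a timer. The sketch assumes the tool honors an AbortSignal. runWithSoftCancel and the CancelOutcome values are hypothetical, and an emergency abort would correspond to a grace of zero. JavaScript cannot forcibly kill a pending promise, so "hard-kill" here only means the runtime stops waiting and records the truncation.

```ts
type CancelOutcome = 'completed' | 'cancelled_gracefully' | 'cancellation_truncated'

async function runWithSoftCancel(
  tool: (signal: AbortSignal) => Promise<unknown>,
  stopRequested: Promise<void>,
  softCancelTimeoutMs: number,
): Promise<CancelOutcome> {
  const controller = new AbortController()
  // Settle either way, so a rejection caused by the abort never goes unhandled.
  const work = tool(controller.signal).then(() => undefined, () => undefined)

  const first = await Promise.race([
    work.then(() => 'finished' as const),
    stopRequested.then(() => 'stop' as const),
  ])
  if (first === 'finished') return 'completed' // tool finished (either way) before any stop

  controller.abort() // operator pressed stop: ask the tool to wind down
  let timer: ReturnType<typeof setTimeout> | undefined = undefined
  const grace = new Promise<'expired'>((resolve) => {
    timer = setTimeout(() => resolve('expired'), softCancelTimeoutMs)
  })
  const outcome = await Promise.race([work.then(() => 'stopped' as const), grace])
  clearTimeout(timer)
  // 'expired' is where a runtime would hard-kill and emit a cancellation_truncated row.
  return outcome === 'stopped' ? 'cancelled_gracefully' : 'cancellation_truncated'
}
```

With this shape, a tool's softCancelTimeoutMs is the only per-tool input the cancel path needs. I/O-heavy tools like send_email simply get a longer window.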

What ships in @fuze-ai/agent today

  • defineAgent, defineAgentRole, fromMarkdown (+ fromMarkdown.dir)
  • runAgent, resumeRun, ModelDriftAtResumeError
  • PlanState, buildPlanTools, synthesizeDispatchTool, buildDispatchTools
  • requestOversight, resolveOversight, InMemoryDurableAdapter
  • Type exports: AgentRoleDefinition, DispatchResult<T>, AgentErrorCategory, PlanEvent, PlanStep, ReplayMode, OversightDecision, full ledger entry types

What's deferred

  • Real Restate adapter — the DurableExecutionAdapter interface is stable; the production package binds to restate.send / ctx.awakeable.
  • run.replay() execution — types are in place (ReplayMode, ReplayResult, drift shapes). Runtime needs a tool-output cache substrate.
  • OTel/OpenInference exporter — adapter, not contract; can ship anytime.
  • Four-eyes mode for biometric Annex III §1(a) — until a design partner needs it.