# Fuze: Full documentation

Generated 2026-05-22T14:53:17.990Z. Site: https://fuze-ai.tech

---

# Agent

---

## Audit walkthrough

Source: https://fuze-ai.tech/docs/agent/guides/audit/

# Audit walkthrough

Walk an external auditor through inspecting a production agent run. The Fuze CLI (`@fuze-ai/agent-cli`) is the system of record; the dashboard exposes the same data.

## Query by subject

Find every run that touched a given subject's data:

```bash
fuze-agent audit query \
  --subject "user:alice@example.org" \
  --since "2026-01-01" \
  --tenant my-tenant
```

The output lists `runId`, `agentPurpose`, `lawfulBasis`, `evidenceHashChainHead`, and `signedRunRoot`. Subject lookup uses the indexed `subjectRef` field, which is present on every span where the tool's classification is non-public.

## Replay a run

Replay reconstructs the run deterministically from the evidence stream:

```bash
fuze-agent audit replay --run --out ./replay.json
```

This rehydrates the spans, re-orders them by `sequence`, and writes a JSON trace. The replay does not re-execute the model; it shows the recorded inputs, outputs, decisions, and timings.

## Verify the hash chain

```bash
fuze-agent audit verify --run
```

This walks every record in the run, checks each `prevHash` against the canonical hash of the previous record (RFC 8785 JSON canonicalization, then SHA-256), and confirms the final hash matches `evidenceHashChainHead`. The command exits with code 0 if the chain is valid and 1 otherwise.

In code:

```ts
import { verifyChain } from '@fuze-ai/agent'

const ok = verifyChain(records) // boolean
```

## Verify the signed run-root

The run-root is signed by `@fuze-ai/agent-signing`. Verify the signature against the agent's public key:

```bash
fuze-agent audit verify-signature \
  --run \
  --pubkey ./agent.pub
```

The signer is pluggable: `LocalKeySigner` for Dev/Cloud, KMS-backed signers via `@fuze-ai/agent-signing-kms` for Sovereign.
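
To make the signature check concrete, here is a minimal sketch of what verifying a signed run-root could look like, assuming an Ed25519 key (the API reference describes `@fuze-ai/agent-signing` as an Ed25519 signer interface). It uses only Node's built-in `node:crypto`. The `verifyRunRoot` helper, the demo key pair, and the choice to sign the hex string of the run-root are illustrative assumptions, not the CLI's actual implementation.

```ts
import { createHash, generateKeyPairSync, sign, verify, type KeyObject } from 'node:crypto'

// Hypothetical stand-in for a run-root: a SHA-256 chain head rendered as hex.
const runRoot = createHash('sha256').update('demo chain head').digest('hex')

// Demo key pair. In a real audit only the public half (agent.pub) is available.
const { publicKey, privateKey } = generateKeyPairSync('ed25519')

// Ed25519 has no separate digest step, so the algorithm argument is null.
const signature = sign(null, Buffer.from(runRoot, 'utf8'), privateKey)

// Hypothetical helper: true only if `sig` was produced over `root` by the key pair.
function verifyRunRoot(root: string, sig: Buffer, key: KeyObject): boolean {
  return verify(null, Buffer.from(root, 'utf8'), key, sig)
}

const signatureValid = verifyRunRoot(runRoot, signature, publicKey)

// Flip the first hex character to simulate a tampered run-root.
const tamperedRoot = runRoot.replace(/^./, (c) => (c === '0' ? '1' : '0'))
const tamperedValid = verifyRunRoot(tamperedRoot, signature, publicKey)

console.log({ signatureValid, tamperedValid })
```

Any change to the run-root invalidates the signature, which is why the auditor needs only the public key and the exported run-root to perform this step.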
## Verify the transparency log proof

```bash
fuze-agent audit verify-anchor --run
```

Each signed run-root is anchored to a transparency log (Sigstore-style). The CLI fetches the inclusion proof and confirms the run-root is in the log under the witnessed checkpoint.

## Export the evidence bundle

```bash
fuze-agent audit export --run --out ./bundle.zip
```

The bundle contains: the canonical record stream, the run-root, the run-root signature, the transparency log proof, the agent definition snapshot, the policy bundle hash, and the model fingerprint. This is the deliverable for AI Act Art. 26 deployer obligations.

## Dashboard equivalent

The dashboard exposes the same operations through a UI. Filtering, replay, and chain-verification status are first-class.

---

## Compliance mapping

Source: https://fuze-ai.tech/docs/agent/guides/compliance/

# Compliance mapping

Map regulatory obligations to Fuze Agent fields, code paths, and span attributes.

## GDPR

### Art. 6, Lawfulness of processing

Each agent declares a `lawfulBasis` on `defineAgent`. Each tool's `retention` carries the set of lawful bases under which records may be retained. At run start, the loop refuses to proceed if the agent's lawful basis is not in the retention's allowed set. This is fail-stop.

```ts
defineAgent({
  lawfulBasis: 'consent', // Art. 6(1)(a)
  // ...
})
```

Allowed values map to Art. 6(1) bases: `consent`, `contract`, `legal_obligation`, `vital_interests`, `public_task`, `legitimate_interests`.

### Art. 9, Processing of special categories

Tools handling special-category data must be defined with `defineTool.specialCategory` and supply `art9Basis`. The compiler rejects `defineTool.public` for these payloads. Allowed values map to Art. 9(2)(a)-(j).

### Art. 12 / 15, Information and access

The evidence pipeline records every span with `subjectRef` when the data classification is non-public.
The `@fuze-ai/agent-cli` exposes `query --subject`, which lists every run that touched the subject's data, sourced from the hash-chained log.

### Art. 17, Right to erasure

The evidence backend honors `cascade(subjectRef)`, which walks the chain by subject reference and removes the full-content payload while retaining the hash + decision record. `fullContentTtlDays`, `hashTtlDays`, and `decisionTtlDays` are independent retention windows for this reason.

### Art. 33 / 34, Personal data breach notification

See the breach-notification workflow in the [Sovereign tier guide](/docs/agent/guides/sovereign).

### Cross-border transfers

`FuzeModel.residency` is one of `'eu' | 'eea' | 'us' | 'other'`. The loop refuses to run if a tool requires `egressDomains: 'none'` but the model residency is non-EU, or if a tool's retention forbids the model's residency. This prevents silent third-country transfer.

## EU AI Act

### Art. 6 / Annex III, High-risk classification

`defineAgent` requires `annexIIIDomain`. Allowed values cover the eight Annex III domains plus `'none'`. Setting any non-`'none'` value triggers the AI Act high-risk obligations below.

### Art. 14, Human oversight

`producesArt22Decision: true` or any non-`'none'` `annexIIIDomain` requires the HITL primitive on at least one tool. The loop's suspend/resume branch produces a `human.oversight.decision` span recording the overseer's principal ID, decision, and timestamp. See `evaluateApproval` in `@fuze-ai/agent`.

### Art. 22 GDPR + AI Act, Solely automated decisions

`producesArt22Decision: true` enforces that an oversight tool is present and that no run can complete without an `evaluateApproval` span. Without it, the loop halts with `fuze.run.missing_oversight=true`.

### Art. 26, Deployer obligations

The evidence bundle exported by `@fuze-ai/agent-cli export` packages: the run-root, the hash-chained record stream, the model fingerprint, the policy bundle hash, the signing public key, and the transparency log proof.
This is the deployer's "logs and traceability" deliverable.

### Art. 73, Serious incident reporting

Any run that halts with `fuze.policy.engine_error=true`, `fuze.guardrail.hard_block=true`, or an unhandled provider error is flagged in the dashboard's incident view.

## Annex IV, Technical documentation

`@fuze-ai/agent-annex-iv` generates the Annex IV technical file from the agent definition: purpose, intended deployment, training/data sources, evaluation, oversight measures.

## Span attributes (audit-grade signals)

| Attribute | Set by | Meaning |
|---|---|---|
| `fuze.policy.engine_error` | Cerbos gate | Engine error; run is halted (fail-stop). |
| `fuze.policy.decision` | Cerbos gate | `allow` / `deny` / `error`. |
| `fuze.guardrail.phase` | Guardrail runner | `input` / `toolResult` / `output`. |
| `fuze.guardrail.hard_block` | Guardrail runner | Run halted by a guardrail. |
| `fuze.run.lawful_basis` | Loop start | The declared basis at run start. |
| `fuze.tool.classification` | Tool dispatch | `public` / `personal` / `special_category` / `confidential`. |
| `fuze.model.residency` | Model dispatch | EU / EEA / US / other. |
| `fuze.run.missing_oversight` | Loop end | Required oversight span never recorded. |
| `human.oversight.decision` | HITL resume | Overseer principal, decision, timestamp. |

---

## Operations

Source: https://fuze-ai.tech/docs/agent/guides/operations/

# Operations

## Deployment tiers

Pick a tier by data residency requirements. The public surface is identical across tiers.

### Dev tier

`evidenceSink` writes to an in-memory buffer. `StaticPolicyEngine` substitutes for Cerbos. No signing. Used for local iteration and unit tests.

```ts
import { StaticPolicyEngine } from '@fuze-ai/agent'

const records: unknown[] = []
const policy = new StaticPolicyEngine([
  { id: 'allow.greet', toolName: 'greet', effect: 'allow' },
])
```

### Cloud tier

`evidenceSink` ships records to the Fuze cloud daemon over HTTPS.
Cerbos runs as embedded WASM via `@fuze-ai/agent-policy-cerbos`. Run-roots are signed via `@fuze-ai/agent-signing` (`LocalKeySigner`) and anchored to a transparency log. All ingest is hosted in the EU.

### Sovereign tier

Customer-operated Kubernetes + Postgres + Cerbos + KMS, deployed via `@fuze-ai/agent-sovereign-terraform`. Signing keys come from the customer's KMS (`@fuze-ai/agent-signing-kms`). No data leaves the customer perimeter. See the [Sovereign tier guide](/docs/agent/guides/sovereign).

## Monitor

The tracer emits OTel-shaped spans for every loop iteration, model call, tool execution, guardrail phase, and policy decision. Span names are stable and namespaced under `fuze.*`.

| Span | Emitted by | Notable attributes |
|---|---|---|
| `fuze.run` | Loop entry | `fuze.run.id`, `fuze.run.lawful_basis`, `fuze.run.tenant` |
| `fuze.model` | Model dispatch | `fuze.model.residency`, `fuze.model.tokens_in/out` |
| `fuze.tool` | Tool dispatch | `fuze.tool.classification`, `fuze.tool.name` |
| `fuze.policy` | Cerbos gate | `fuze.policy.decision`, `fuze.policy.engine_error` |
| `fuze.guardrail` | Guardrail runner | `fuze.guardrail.phase`, `fuze.guardrail.hard_block` |
| `fuze.evidence.append` | Hash-chain emitter | `fuze.evidence.seq`, `fuze.evidence.head` |

## Scale

The loop is single-process per run. Separate runs share nothing through `Ctx`. Suspend/resume goes through `@fuze-ai/agent-suspend-store` (SQLite locally; Postgres in production). Providers run with `maxRetries: 0`, so rate-limit errors surface to the loop, which retries only within its own retry budget.

## Troubleshoot

### Engine error halts the run

```
fuze.policy.engine_error=true
```

The Cerbos engine threw or returned malformed output. The loop is fail-stop on this signal; there is no allow-on-error path. Check Cerbos pod logs, then the `policy-bundle` hash referenced in the run's evidence bundle.
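
The fail-stop behavior can be sketched as a small gate function. This is an illustration under stated assumptions, not the loop's real code: `policyGate`, `Evaluate`, and `GateOutcome` are hypothetical names, and the actual `PolicyEngine` interface may look different. Only the decision values (`allow` / `deny` / `error`) and the two span attributes come from this documentation.

```ts
type PolicyDecision = 'allow' | 'deny' | 'error'

type GateOutcome =
  | { proceed: true }
  | { proceed: false; attributes: Record<string, string | boolean> }

// Hypothetical evaluator signature standing in for the policy engine call.
type Evaluate = (toolName: string) => PolicyDecision

function policyGate(evaluate: Evaluate, toolName: string): GateOutcome {
  let decision: PolicyDecision
  try {
    decision = evaluate(toolName)
  } catch {
    // A throwing engine is treated exactly like an explicit 'error' decision.
    decision = 'error'
  }
  if (decision === 'allow') return { proceed: true }

  // Both 'deny' and 'error' halt; there is deliberately no allow-on-error branch.
  const attributes: Record<string, string | boolean> = { 'fuze.policy.decision': decision }
  if (decision === 'error') attributes['fuze.policy.engine_error'] = true
  return { proceed: false, attributes }
}

const allowed = policyGate(() => 'allow', 'greet')
const denied = policyGate(() => 'deny', 'greet')
const crashed = policyGate(() => {
  throw new Error('engine unreachable')
}, 'greet')

console.log({ allowed, denied, crashed })
```

The point of the shape is that an engine failure can never be mistaken for an allow: the only path to `proceed: true` is an explicit `allow` decision.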
### Lawful-basis mismatch at run start

```
LawfulBasisMismatch: agent declared 'legitimate_interests' but tool 'lookup_user' retention 'pii.v2' permits ['consent','contract']
```

Either change the agent's `lawfulBasis` or remove the tool from the agent.

### Missing oversight

```
fuze.run.missing_oversight=true
```

`producesArt22Decision: true` or a non-`'none'` `annexIIIDomain` requires an oversight tool path that records `evaluateApproval`. Add the HITL primitive; see the [HITL tutorial](/docs/agent/tutorial/02-hitl).

### Hash chain verification fails

`verifyChain(records)` returned `false`. Records are out of order, a record was dropped, or a byte was flipped. Re-fetch the record stream from the canonical sink.

## Upgrade

Patch versions are drop-in. Minor versions: read the CHANGELOG. Major versions: see [v0 to v1 migration](/docs/agent/migration/v0-to-v1).

---

## EU Sovereign tier

Source: https://fuze-ai.tech/docs/agent/guides/sovereign/

# EU Sovereign tier

Deploy Fuze Agent entirely inside the customer's EU perimeter. The operator runs the full stack; Fuze ships Terraform modules, signing wiring, and a verification command.
## What gets deployed

- Agent runtime pods (`@fuze-ai/agent` + `@fuze-ai/agent-api-server`)
- Cerbos policy engine pods (`@fuze-ai/agent-policy-cerbos`)
- Postgres (suspend store, evidence index, signed-root ledger)
- Object storage (raw evidence stream, retention-tiered)
- KMS integration (signing keys; never extracted)
- Transparency log (private witness or public Sigstore-style)
- EU model providers (Mistral, Aleph Alpha, OpenAI EU residency)

## Terraform setup

```bash
git clone https://github.com/fuze-ai/fuze
cd fuze/packages/agent-sovereign-terraform/examples/aws-eu-west-1
terraform init
terraform apply -var-file=./customer.tfvars
```

The module provisions: a VPC with no internet egress on the agent subnet, an EKS cluster, IAM roles bound to the KMS signing key with deny-on-export, an RDS Postgres with at-rest encryption, an S3 bucket with object lock for the evidence stream, and a private CloudHSM-backed transparency log witness.

The default region set is `eu-west-1`, `eu-central-1`, `eu-north-1`. The Terraform module refuses to apply outside the EU; see `validation.tf`.

## KMS bootstrap

```bash
fuze-agent kms bootstrap \
  --provider aws \
  --key-arn arn:aws:kms:eu-west-1:123456789012:key/abcd-1234 \
  --output ./agent.pub
```

This creates the signer binding, registers the public key, and writes the public half to disk. The private key never leaves KMS. The `LocalKeySigner` is replaced by `@fuze-ai/agent-signing-kms` `KmsSigner` at this point.

## EU model providers

Configure model residency to `'eu'`:

```ts
import { mistralModel } from '@fuze-ai/agent-providers/mistral'

const model = mistralModel({
  apiKey: process.env.MISTRAL_API_KEY!,
  residency: 'eu',
  endpoint: 'https://api.mistral.ai',
})
```

The loop refuses to dispatch to a non-EU model when any tool's `egressDomains` is `'none'` or when `annexIIIDomain` is non-`'none'`.

## Breach-notification workflow

GDPR Art. 33 requires notification to the supervisory authority within 72 hours.
The sovereign tier emits a `fuze.incident` span on:

- `fuze.policy.engine_error=true` (potential systemic failure)
- An export of full-content evidence outside the configured residency
- A failed signature verification on a run-root from the ledger
- An anomalous spike in `evaluateApproval` denials

Each `fuze.incident` triggers the configured notification handler:

```yaml
# fuze.sovereign.yaml
incident:
  notification:
    handler: webhook
    url: https://internal.example.org/dpo/incident
    timeout_seconds: 10
    retry: { attempts: 3, backoff_seconds: 30 }
  ticket:
    system: jira
    project: DPO
```

The DPO confirms the incident and classifies its severity, and the dashboard records the Art. 33 timer.

## Verification

The sovereign module ships with `fuze-agent verify-deployment`, which checks: KMS deny-on-export is set, RDS encryption is on, S3 object lock is enabled, the transparency log witness is reachable, and the agent runtime pods cannot reach `0.0.0.0/0`.

```bash
fuze-agent verify-deployment --config ./fuze.sovereign.yaml
```

Exit code 0 means the deployment matches the documented invariants. This command is the basis of the Art. 26 deployer self-assessment.

---

## Fuze Agent

Source: https://fuze-ai.tech/docs/agent/

# Fuze Agent

Fuze Agent is a TypeScript framework for building AI agents that produce EU AI Act and GDPR-grade runtime evidence. Compliance fields are type invariants on every tool: a tool that handles special-category data without an `art9Basis` field will not compile. Every model call, tool execution, and policy decision is recorded as a span in a hash-chained audit log whose head is the run-root. The loop is non-bypassable: tools cannot call siblings directly, providers run with `maxRetries: 0`, and a policy engine error halts the run with `fuze.policy.engine_error=true`.
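
The "will not compile" claim rests on a TypeScript discriminated union. The following is a minimal sketch of that pattern, not the real `FuzeTool` type: `SketchTool`, `basisOf`, and the two `Art9Basis` values are hypothetical and chosen only for illustration.

```ts
// Hypothetical subset of Art. 9(2) bases; the real allowed values may be named differently.
type Art9Basis = 'explicit_consent' | 'vital_interests'

interface PublicTool {
  name: string
  dataClassification: 'public'
}

interface SpecialCategoryTool {
  name: string
  dataClassification: 'special_category'
  art9Basis: Art9Basis // required on this variant only
}

// The tag `dataClassification` selects which fields are mandatory.
type SketchTool = PublicTool | SpecialCategoryTool

const triage: SketchTool = {
  name: 'triage',
  dataClassification: 'special_category',
  art9Basis: 'explicit_consent',
}

// @ts-expect-error art9Basis is required for the special_category variant
const rejected: SketchTool = { name: 'triage', dataClassification: 'special_category' }

// Narrowing on the tag gives type-safe access to variant-specific fields.
function basisOf(tool: SketchTool): string {
  return tool.dataClassification === 'special_category' ? tool.art9Basis : 'n/a'
}

console.log(basisOf(triage), basisOf({ name: 'greet', dataClassification: 'public' }))
```

Removing the `@ts-expect-error` comment makes the type checker reject `rejected`, which is the compile-time behavior described above.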
## Install

```bash
npm install @fuze-ai/agent zod
```

## Run an agent

```ts
import { z } from 'zod'
import {
  defineAgent,
  defineTool,
  inMemorySecrets,
  runAgent,
  StaticPolicyEngine,
  verifyChain,
  makeTenantId,
  makePrincipalId,
  Ok,
  type ThreatBoundary,
} from '@fuze-ai/agent'

const threatBoundary: ThreatBoundary = {
  trustedCallers: ['agent-loop'],
  observesSecrets: false,
  egressDomains: 'none',
  readsFilesystem: false,
  writesFilesystem: false,
}

const greet = defineTool.public({
  name: 'greet',
  description: 'returns a greeting for the given name',
  input: z.object({ name: z.string() }),
  output: z.object({ greeting: z.string() }),
  threatBoundary,
  retention: { id: 'demo.v1', hashTtlDays: 30, fullContentTtlDays: 7, decisionTtlDays: 90 },
  run: async (input) => Ok({ greeting: `hello, ${input.name}` }),
})

const agent = defineAgent({
  purpose: 'demo-greeter',
  lawfulBasis: 'consent',
  annexIIIDomain: 'none',
  producesArt22Decision: false,
  model: yourFuzeModel, // replace with a model from @fuze-ai/agent-providers
  tools: [greet],
  output: z.object({ final: z.string() }),
  maxSteps: 5,
  retryBudget: 0,
  deps: {},
})

const records: any[] = []
const policy = new StaticPolicyEngine([
  { id: 'allow.greet', toolName: 'greet', effect: 'allow' },
])

const result = await runAgent(
  { definition: agent, policy, evidenceSink: (r) => records.push(r) },
  {
    tenant: makeTenantId('demo-tenant'),
    principal: makePrincipalId('demo-user'),
    secrets: inMemorySecrets({}),
    userMessage: 'please greet world',
  },
)

console.log({
  status: result.status,
  hashChainHead: result.evidenceHashChainHead,
  hashChainValid: verifyChain(records),
})
```

A complete runnable version lives at `examples/typescript/agent-hello-world/index.ts`.

## Next steps

- [Quickstart](/docs/agent/quickstart), ship a working agent in five minutes.
- [How it works](/docs/agent/reference/how-it-works), the loop, evidence pipeline, and policy gate explained.
- [Planning, dispatch, and oversight](/docs/agent/reference/planning-and-dispatch), the auto-injected plan tool, capability envelopes for sub-agents, and Article 14 suspend/resume.
- [First agent tutorial](/docs/agent/tutorial/01-first-agent), build an agent end-to-end with verified evidence.
- [Architecture](/docs/agent/reference/architecture), primitives, evidence pipeline, deployment tiers.

---

## Quickstart

Source: https://fuze-ai.tech/docs/agent/quickstart/

Ship a Fuze Agent in five minutes. Sensible defaults, full compliance evidence, no ceremony.

# Quickstart

Ship a working agent in five minutes. The defaults are safe: every model call, tool execution, and policy decision still goes through the hash-chained evidence pipeline. Swap in production policies and retention when you're ready; the API doesn't change.

## 1. Install

```bash
npm install @fuze-ai/agent @fuze-ai/agent-providers zod
```

## 2. Get a Mistral key

Sign in at [console.mistral.ai](https://console.mistral.ai), create an API key, and put it in `MISTRAL_API_KEY`. The free tier is enough to follow this page.

## 3. Write the agent

Save as `agent.ts`:

```ts
import { z } from 'zod'
import { quickAgent, quickTool } from '@fuze-ai/agent/quickstart'
import { mistralModel } from '@fuze-ai/agent-providers'

const wordCount = quickTool({
  name: 'word_count',
  description: 'count words in a string',
  input: z.object({ text: z.string() }),
  output: z.object({ count: z.number() }),
  run: ({ text }) => ({ count: text.trim().split(/\s+/).filter(Boolean).length }),
})

const agent = quickAgent({
  model: mistralModel({ apiKey: process.env.MISTRAL_API_KEY ?? '' }),
  tools: [wordCount],
})

const result = await agent.run('How many words are in "the quick brown fox"?')
console.log(result.status, result.output)
console.log('evidence spans:', agent.records().length)
```

## 4. Run it

```bash
MISTRAL_API_KEY=sk-... node --experimental-strip-types agent.ts
```

You should see the run status, the structured output, and the number of evidence spans recorded.

## 5. What just happened

- **Hash-chained evidence**: each model call, tool execution, and policy decision was recorded as a span. `agent.records()` returns the full chain; every record's `prevHash` matches the previous record's `hash`.
- **Policy gate ran**: the default `quickstart` configuration uses an allow-all `StaticPolicyEngine`. You'll see a one-time warning on stderr reminding you to define a real policy before production.
- **Same evidence shape as production**: the only difference between quickstart and a hand-wired agent is who picks the defaults. Tools are classified `public`, retention is the short `fuze.quickstart.v1` policy, lawful basis is `consent`. Switch any of these by dropping back to `defineAgent` / `defineTool`.

## Next steps

- [How it works](/docs/agent/reference/how-it-works), the loop, evidence pipeline, and policy gate explained.
- [First agent tutorial](/docs/agent/tutorial/01-first-agent), define your own tool and verify the chain.
- [API reference](/docs/agent/reference/api), the full `defineAgent` / `defineTool` surface for production use.

---

## API reference

Source: https://fuze-ai.tech/docs/agent/reference/api/

# API reference

Generate the full TypeDoc reference:

```bash
npm run docs:typedoc
```

This runs `typedoc --plugin typedoc-plugin-markdown` against `packages/agent/src/index.ts` and writes markdown into `docs/reference/api/`. The output mirrors the export shape of `@fuze-ai/agent`; the canonical listing of public symbols is in the package's `src/index.ts` and the matching `.d.ts`.
## Public surface (high-level)

Top-level factories and runners:

- `defineTool` (`.public`, `.personal`, `.specialCategory`, `.confidential`)
- `defineAgent`
- `defineAgentRole`, capability envelope for sub-agent dispatch
- `fromMarkdown` (+ `fromMarkdown.dir`), co-locate instructions/context with code, hashed at module load
- `runAgent`
- `resumeRun` (and `ModelDriftAtResumeError`)
- `verifyChain`
- `Ok`, `Err`, `Suspend`
- `inMemorySecrets`
- `makeTenantId`, `makePrincipalId`
- `StaticPolicyEngine`

Planning + dispatch + oversight:

- `PlanState`, `buildPlanTools`, `PLAN_TOOL_NAMES`
- `synthesizeDispatchTool`, `synthesizeDispatchTools`, `buildDispatchTools`, `dispatchManifestHash`
- `requestOversight`, `resolveOversight`, `InMemoryDurableAdapter`
- Helpers: `NEVER_RETRY_CATEGORIES`, `isRetriableCategory`, `planStepStatusForFailure`

Types:

- `FuzeTool`, `FuzeAgent`, `FuzeModel`, `ModelStep`, `Ctx`
- `ThreatBoundary`, `Retention`, `Result`
- `Art9Basis`, `LawfulBasis`, `AnnexIIIDomain`
- `EvidenceRecord`
- `AgentRoleDefinition`, `DefineAgentRoleInput`, `RetryPolicy`, `OutputViews`
- `DispatchResult`, `AgentRunFailure`, `AgentErrorCategory`, `FailureAttribution`
- `PlanStep`, `PlanStepStatus`, `PlanStepLifecycle`, `PlanVersion`, `PlanEvent`, `PlanCommitInput`, `PlanStepUpdateInput`, `PlanReviseInput`, `LinkageSource`, `PlanRequirement`, `PlanningConfig`
- `LedgerEntry` and variants (`ToolCallLedgerEntry`, `ModelCallLedgerEntry`, `HumanInputLedgerEntry`, `Dispatch{Committed,Completed}LedgerEntry`, `Oversight{Suspend,Resume}LedgerEntry`, `ExpectedDeterminism`)
- `ReplayMode`, `ReplayResult`, `ReplayInput`, `DeterminismVerdict`, `ToolCallDrift`, `ModelCallDrift`, `PlanDrift`, `OutputDrift`
- `DurableExecutionAdapter`, `OversightRequest`, `OversightDecision`, `OversightDecisionKind`, `OversightReason`, `ReviewerSignature`

Sibling packages:

- `@fuze-ai/agent-policy-cerbos`, production policy engine
- `@fuze-ai/agent-mcp`, MCP host with admission policy
- `@fuze-ai/agent-mcp-server`, expose Fuze tools as MCP server
- `@fuze-ai/agent-sandbox-justbash`, local sandbox adapter
- `@fuze-ai/agent-sandbox-e2b`, managed sandbox adapter
- `@fuze-ai/agent-tools`, first-party tools (bash, fetch, read/write file)
- `@fuze-ai/agent-signing`, Ed25519 signer interface + LocalKeySigner
- `@fuze-ai/agent-signing-kms`, KMS-backed signer
- `@fuze-ai/agent-suspend-store`, SQLite-backed suspend store
- `@fuze-ai/agent-transparency`, transparency log anchor + verify
- `@fuze-ai/agent-providers`, Mistral, Aleph Alpha, OpenAI EU
- `@fuze-ai/agent-guardrails`, input/toolResult/output guardrail set
- `@fuze-ai/agent-redaction`, pre-export secret redactor
- `@fuze-ai/agent-eval`, eval harness
- `@fuze-ai/agent-cli`, CLI for query/replay/verify/export
- `@fuze-ai/agent-api-server`, HTTP server for hosted ingest
- `@fuze-ai/agent-annex-iv`, Annex IV technical-file generator
- `@fuze-ai/agent-compliance`, compliance helpers
- `@fuze-ai/agent-memory`, long-term agent memory
- `@fuze-ai/agent-durable`, durable execution wrappers
- `@fuze-ai/agent-legal-templates`, DPA, processor agreements
- `@fuze-ai/agent-sovereign-terraform`, Terraform modules

---

## Architecture

Source: https://fuze-ai.tech/docs/agent/reference/architecture/

# Architecture

Six primitives, an evidence pipeline, three deployment tiers. This page is the canonical description: if a behavior contradicts it, the behavior is the bug.

## Six primitives

```mermaid
flowchart LR
  Tool["Tool<br/>compile-time invariants"] --> Agent["Agent<br/>purpose + basis + tools"]
  Agent --> Loop["Loop<br/>owns retries · maxRetries=0"]
  Loop --> Guard["Guardrails<br/>input · toolResult · output"]
  Guard --> Policy["Policy<br/>Cerbos · fail-stop"]
  Policy --> Evidence["Evidence<br/>hash chain · RFC 8785 canonical"]
```

### Tool

Discriminated union over `dataClassification`: `'public' | 'personal' | 'special_category' | 'confidential'`. Each variant has different required fields. The compiler refuses constructions that omit `art9Basis` for special-category, `subjectRef` policy in `Ctx`, or `retention`.

### Agent

`defineAgent({ purpose, lawfulBasis, annexIIIDomain, producesArt22Decision, model, tools, output, maxSteps, retryBudget, deps })`. The lawful basis is checked against each tool's retention at run start.

### Loop

`runAgent` owns the model dispatch, tool dispatch, retry counter, and the suspend/resume branch. Providers are configured with `maxRetries: 0`; the loop alone decides retries against `maxSteps`. The loop is non-bypassable: there is no API for a tool to call a sibling except `ctx.invoke(name, input)`, which re-enters the loop.

### Evidence

Every span goes through `EvidenceEmitter`. Records are RFC 8785 canonicalized, hashed (SHA-256), and chained with `prevHash`. The chain head is the run-root. `verifyChain` walks the chain to confirm no record was added, removed, reordered, or modified.

### Policy

`PolicyEngine` is the interface; `CerbosPolicyEngine` is the production implementation. A policy decision is `allow | deny | error`. `error` halts the run with `fuze.policy.engine_error=true`. There is no fallback.

### Guardrails

Three phases run around the model dispatch: `input`, `toolResult`, `output`. Guardrails get a restricted model handle, never the raw provider. A guardrail can hard-block (halt with `fuze.guardrail.hard_block=true`) or annotate (record a decision and continue).

## Runtime topology

```mermaid
flowchart TD
  subgraph Customer["Customer process (Node.js)"]
    direction TB
    L["@fuze-ai/agent loop<br/>maxRetries=0 · non-bypassable"]
    P{"Policy gate<br/>(Cerbos)"}
    T["Tool execute<br/>per dispatch"]
    E["Evidence emitter<br/>RFC 8785 + redact"]
    R["ChainedRecord stream"]
    L --> P
    P -- "deny / engine_error" --> Halt(["halt"])
    P -- "allow" --> T
    T --> E
    E --> R
  end
  subgraph Sandbox["Sandbox tiers"]
    direction TB
    S1["in-process (bash)"]
    S2["vm-managed (E2B)"]
    S3["vm-self-hosted (EU)"]
  end
  T -.dispatch.-> Sandbox
  R --> Sink["sink · sign · anchor"]
```

## Evidence pipeline

```mermaid
flowchart TD
  DT["defineTool"] --> RA["runAgent (loop)"]
  DA["defineAgent"] --> RA
  RA --> EE["EvidenceEmitter"]
  EE --> Records["records"]
  EE --> Chain["hashChain"]
  Records --> Sink["evidenceSink"]
  Chain --> Head["evidenceHashChainHead"]
  Head --> Sign["Signer"]
  Sign --> TL["TransparencyLog"]
  TL --> Bundle["Bundle export"]
```

Every primitive is a discrete recordable event. The pipeline guarantees:

- monotonic sequence
- canonical serialization (RFC 8785)
- forward-linked hash chain
- pluggable sink (in-memory, daemon, sovereign object store)
- pluggable signer (`LocalKeySigner`, `KmsSigner`)
- pluggable transparency log

## Deployment tiers

| Tier | Policy engine | Evidence sink | Signer | Transparency |
|---|---|---|---|---|
| Dev | `StaticPolicyEngine` | in-memory | none | none |
| Cloud | Cerbos WASM | EU-hosted daemon | `LocalKeySigner` | EU public log |
| Sovereign | Cerbos pod | customer S3 + Postgres | `KmsSigner` | customer-witnessed log |

The public surface is identical across tiers. A run authored against the Dev tier ports to Sovereign by swapping the engine, sink, and signer. No code changes inside `defineAgent` or `defineTool`.

## Invariants

- Tools never receive sibling tools.
- Tools cannot call models.
- Guardrails get a restricted model handle.
- Providers run with `maxRetries: 0`.
- `Ctx.secrets` returns opaque `SecretRef`. Plaintext never reaches evidence.
- Cerbos engine error is fail-stop.
- Lawful-basis mismatch refuses at run start.
- `subjectRef` is required for non-public data classifications.
- Annex III domain non-`'none'` requires an oversight tool.
- Hash chain is non-bypassable.
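
The evidence guarantees above (monotonic sequence, canonical serialization, forward-linked hashes) can be illustrated with a small self-contained sketch. It is not the implementation behind `verifyChain`. Several details are assumptions made only for this example: the all-zero genesis `prevHash`, the choice to hash `{ sequence, prevHash, payload }`, and the simplified `canonicalize`, which sorts object keys but does not implement the full RFC 8785 rules for number and string serialization.

```ts
import { createHash } from 'node:crypto'

interface SketchRecord {
  sequence: number
  prevHash: string
  payload: Record<string, unknown>
  hash: string
}

// Assumption: the first record links to an all-zero hash.
const GENESIS = '0'.repeat(64)

// Simplified canonical form (sorted keys only). NOT a full RFC 8785 implementation.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`)
      .join(',')
    return `{${body}}`
  }
  return JSON.stringify(value)
}

function hashOf(sequence: number, prevHash: string, payload: Record<string, unknown>): string {
  return createHash('sha256').update(canonicalize({ sequence, prevHash, payload })).digest('hex')
}

function append(chain: SketchRecord[], payload: Record<string, unknown>): SketchRecord[] {
  const last = chain[chain.length - 1]
  const prevHash = last ? last.hash : GENESIS
  const sequence = chain.length
  return [...chain, { sequence, prevHash, payload, hash: hashOf(sequence, prevHash, payload) }]
}

// Detects added, removed, reordered, or modified records.
function verifySketchChain(chain: SketchRecord[]): boolean {
  return chain.every((record, i) => {
    const prev = chain[i - 1]
    const expectedPrev = prev ? prev.hash : GENESIS
    return (
      record.sequence === i &&
      record.prevHash === expectedPrev &&
      record.hash === hashOf(record.sequence, record.prevHash, record.payload)
    )
  })
}

let chain: SketchRecord[] = []
chain = append(chain, { span: 'fuze.run' })
chain = append(chain, { span: 'fuze.tool', name: 'greet' })
chain = append(chain, { span: 'fuze.policy', decision: 'allow' })

const intact = verifySketchChain(chain)
const modified = verifySketchChain(
  chain.map((r, i) => (i === 1 ? { ...r, payload: { span: 'fuze.tool', name: 'other' } } : r)),
)
const reordered = verifySketchChain([chain[1], chain[0], chain[2]])
const truncated = verifySketchChain(chain.slice(1))

console.log({ intact, modified, reordered, truncated })
```

Because each hash covers the previous hash, changing, reordering, or dropping any record breaks every link after it. Dropping records from the end is caught separately, by comparing the last hash against the signed run-root.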

---

## Glossary

Source: https://fuze-ai.tech/docs/agent/reference/glossary/

# Glossary

**Annex III**, The list of high-risk AI use cases in the EU AI Act. Domains include: biometrics, critical infrastructure, education, employment, essential services, law enforcement, migration, administration of justice. Encoded as the `annexIIIDomain` field on `defineAgent`. Non-`'none'` triggers AI Act Art. 14 oversight, Art. 26 deployer obligations, and Annex IV technical-file generation.

**Art. 6 (GDPR)**, Lawfulness of processing. Six bases: consent, contract, legal obligation, vital interests, public task, legitimate interests. Encoded as `lawfulBasis` on `defineAgent`. Checked against each tool's retention at run start.

**Art. 9 (GDPR)**, Special-category data. Health, biometric for identification, genetic, racial/ethnic origin, political opinions, religious beliefs, trade-union membership, sex life, sexual orientation. Encoded as `art9Basis` on tools defined with `defineTool.specialCategory`.

**Art. 14 (AI Act)**, Human oversight requirement for high-risk systems. Implemented by the HITL primitive: a tool returns `Suspend(...)`, the loop persists state, an overseer approves/denies, the loop resumes. The `human.oversight.decision` span is the audit-grade record.

**Art. 22 (GDPR)**, Solely automated decision-making with legal or similarly significant effects. Encoded as `producesArt22Decision: boolean` on `defineAgent`. Setting it true requires HITL.

**Art. 26 (AI Act)**, Deployer obligations: maintain logs, document use, monitor for serious incidents. The evidence bundle export satisfies the logs/traceability requirement.

**Bundle hash**, SHA-256 of the compiled Cerbos policy bundle (`bundle.wasm`). Included in the evidence export so the auditor can confirm the policy in force at run time.

**Ctx**, Per-run request context passed to every tool.
Carries `tenant`, `principal`, `subjectRef`, `secrets` (opaque `SecretRef`), `invoke`, the restricted model handle (for guardrails), and the run id.

**Discriminated union (FuzeTool)**, TypeScript pattern: a union of variants tagged by `dataClassification`. Each variant has a different required-field set. The compiler statically refuses constructions missing required fields.

**Egress domains**, `'none'` (no network), `'eu'`, `'eea'`, allowlist of hostnames, or `'any'`. Field on `ThreatBoundary`. Combined with model residency to enforce Art. 6/9 cross-border rules.

**Evidence record**, One element in the hash-chained log. Shape: `{ sequence, prevHash, payload, hash }`. Payload contains the span name, attributes, run id, and timestamp.

**Fail-stop**, Behavior where a specific failure halts the run rather than continuing. Cerbos engine error is fail-stop. Hash-chain validation on resume is fail-stop. Lawful-basis mismatch at run start is fail-stop.

**Fingerprint**, SHA-256 of an MCP server's identity material (binary, manifest, signing key). Used by the MCP host's admission policy to allowlist connections.

**Hash chain**, Forward-linked hash stream over the evidence records. Each record's `prevHash` is the hash of the canonical serialization of the prior record.

**HITL**, Human-in-the-loop. The Art. 14 primitive. `Suspend(...)`, `resumeRun(...)`, `evaluateApproval`.

**Lawful basis**, See Art. 6.

**MCP**, Model Context Protocol. The Anthropic-led standard for tool-server interfaces. `@fuze-ai/agent-mcp` is the host; `@fuze-ai/agent-mcp-server` exposes Fuze tools as a server.

**RFC 8785**, JSON Canonicalization Scheme. Deterministic JSON serialization. Required for the hash chain so byte-for-byte equality is preserved across implementations.

**Run-root**, The final hash in the chain for a run. Equal to `result.evidenceHashChainHead`. Signed by the `Signer` and anchored to the transparency log.
**Sovereign tier**, Customer-operated deployment with no Fuze-hosted control plane. See [the sovereign guide](/docs/agent/guides/sovereign). **Special category**, See Art. 9. **Subject reference**, Stable identifier for the natural person whose data is being processed. Field `subjectRef: string` on `Ctx`. Required for non-public classifications. Indexed in the evidence backend so Art. 15/17 queries can find every span by subject. **Suspend store**, `@fuze-ai/agent-suspend-store`. SQLite locally; Postgres in Sovereign. Holds run state across human-oversight pauses. **Threat boundary**, Per-tool declaration of capability surface: `trustedCallers`, `observesSecrets`, `egressDomains`, `readsFilesystem`, `writesFilesystem`. Drives both the policy engine input and the runtime sandbox configuration. **Transparency anchor**, The inclusion proof issued by the transparency log when a run-root is published. Lets an auditor verify the run-root was committed to a write-once log before any decision was acted upon. --- ## How it works Source: https://fuze-ai.tech/docs/agent/reference/how-it-works/ # How it works A technical walkthrough for engineers new to compliance-grade agent systems. Assumes familiarity with LLMs and TypeScript; introduces the parts most ML engineers haven't met yet, hash chains, transparency logs, KMS, EU AI Act mechanics, policy engines, sandbox tiers, and replay-protected HITL. If you read code, [`packages/agent/src/loop/loop.ts`](https://github.com/fuze-ai/fuze) is the authoritative answer for every claim here. ## 1. The big picture The 2024 EU AI Act and GDPR put concrete obligations on anyone running AI agents in the EU: - Art. 12, automatic logging for the lifetime of a high-risk system. - Art. 14, humans must be able to detect anomalies, interpret outputs, intervene, halt. - GDPR Art. 6 / 9, declare a lawful basis; special-category data has nine narrow gates. - GDPR Art. 13–22, answer "show me", "delete me", "explain" inside 30 days. 
Most frameworks treat this as someone-else's-problem. Fuze's wedge: **make compliance evidence a type-system invariant.** A tool that handles personal data cannot be defined without declaring its lawful basis, and a run cannot proceed if that basis is incompatible.

```
+------------------------------------------------------------+
|                 Customer process (Node.js)                 |
|                                                            |
|  +----------------------+                                  |
|  | @fuze-ai/agent loop  |                                  |
|  +----------+-----------+                                  |
|             |                                              |
|             v                                              |
|  +----------------------+  deny / engine_error             |
|  | Policy gate (Cerbos) +------------------> halt          |
|  +----------+-----------+                                  |
|             | allow                                        |
|             v                                              |
|  +----------------------+   +--------------------------+   |
|  | Tool execute         |   | Sandbox tier             |   |
|  | (per dispatch)       +-->| +--------------------+   |   |
|  +----------+-----------+   | | vm-self-hosted (EU)|   |   |
|             |               | +--------------------+   |   |
|             v               | | vm-managed (E2B)   |   |   |
|  +----------------------+   | +--------------------+   |   |
|  | Evidence emitter     |   | | in-process (bash)  |   |   |
|  +----------+-----------+   | +--------------------+   |   |
|             |               +--------------------------+   |
|             v                                              |
|  +----------------------+                                  |
|  | ChainedRecord stream |--> sink / sign / anchor          |
|  +----------------------+                                  |
+------------------------------------------------------------+
```

## 2. Two products

Fuze ships two coupled products in one repo family:

- **Fuze Compliance** (`fuze-ai` + `fuze-cloud-dashboard`), the safety SDK that wraps any agent framework: loop detection, budgets, side-effect tracking, hash-chained traces.
- **Fuze Agent** (`@fuze-ai/agent` + 23 sibling packages), the opinionated framework with compliance baked in.

They share a wire format. Fuze Agent emits trace events that Fuze Compliance ingests via the same hash-chain protocol the safety SDK already uses. This page is about Fuze Agent; the SDK has its own [docs section](/docs/introduction).

## 3. Runtime tiers

Where the agent code actually runs:

| Tier | Customer process | Fuze API | Where data lives |
|---|---|---|---|
| **Dev** | Anywhere | Local in-process | Local SQLite |
| **Cloud** | Anywhere | Fuze-hosted | Fuze EU region (default non-Annex-III) |
| **EU Sovereign** | Customer's EU infra | Self-hosted in customer's EU infra | Customer-owned; nothing leaves the perimeter |

The public surface is identical across tiers. Switching tiers swaps the policy engine, sink, and signer; nothing inside `defineAgent` or `defineTool` changes.

## 4. The agent loop

When you call `runAgent(deps, input)`:

```
1. Validate definition compatibility (compile-time + runtime)
   - lawfulBasis ∈ ⋂ tools.allowedLawfulBases
   - subjectRef present if any tool is non-public
   - annexIIIDomain != 'none' ⇒ art14OversightPlan required
   - model.residency compatible with tool residency
   [if any check fails, halt with status='error' before any spans]
2. Emit span: agent.invoke (genesis of hash chain)
3. Run input guardrails (PII / injection / residency) [tripwire ⇒ halt]
4. While stepsUsed < maxSteps:
   a. model.generate({messages, tools}) → model.generate span
   b. for each tool_call:
      - Cerbos.evaluate({tool, args, ctx}) → policy.evaluate span
        [deny ⇒ halt; requires-approval ⇒ suspend; engine error ⇒ fail-stop]
      - Tool.run(parsedInput, ctx) → tool.execute span
        [Result, loop owns retry, not the tool]
      - guardrail.toolResult → guardrail span
   c. Append assistant + tool messages; persist DurableRunSnapshot
5. Validate final output against zod schema
6. Run output guardrails
7. Return AgentRunResult { status, output, runId, evidenceHashChainHead }
```

A clean run produces ~8 spans. **Every path emits evidence.** There is no way to call a tool that bypasses the policy gate, and no way to call a model that doesn't get token-counted. Tools never receive sibling tools; they get `ctx.invoke(name, input)`, which re-enters the pipeline.

## 5. The evidence pipeline

We are not just *logging*.
We produce records a third party can verify without trusting us.

### 5.1 Spans

Every event is a span (OpenTelemetry GenAI conventions): `span` name, `role`, `runId`/`stepId`, `startedAt`/`endedAt`, `common` attributes (tenant, principal, lawful basis, Annex III domain, retention), `attrs`, plus `contentHash` and `contentRef`. The full payload is captured only when `captureFullContent: true`.

### 5.2 Hash chain

Spans are linked into an append-only chain:

```
Span 0 (genesis)
  prevHash: 0x000...000
  hash:     H(canonical({sequence: 0, prevHash: 0x000..., payload: span0}))

Span n
  prevHash: hash of span n-1
  hash:     H(canonical({sequence: n, prevHash: hash(n-1), payload: spanN}))
```

Structurally a blockchain without consensus: any byte change invalidates the chain from that point. `verifyChain([records])` recomputes every hash; if any linkage breaks, it returns `false`. **Tamper-evidence is a math property, not a permission.**

### 5.3 Canonicalization (RFC 8785)

`{"a":1,"b":2}` and `{"b":2,"a":1}` are the same logical object but hash differently. [RFC 8785 (JCS)](https://datatracker.ietf.org/doc/html/rfc8785) is the byte-exact JSON serialization standard: keys sorted, no whitespace, integers without `.0`, control characters escaped, undefined dropped, no trailing newline. The implementation is about 50 lines in `packages/agent/src/evidence/canonical.ts`, property-tested with [fast-check](https://fast-check.dev) over 200 random JSON values × shuffled keys.

### 5.4 Redaction

Before any payload reaches the chain it goes through redaction:

- Pattern-based: emails, phones, IBANs (mod-97), credit cards (Luhn), SSNs, IPs, JWTs, OAuth, API keys (`sk-…`, AWS, GitHub, Slack, etc.).
- Structural: walks nested objects; `SecretRef` becomes `<>`.
- Optional ML: Microsoft Presidio sidecar via JSON-RPC.

The hash is over the **canonical, redacted** form. The original payload exists only in memory; what is stored, transmitted, and chained is already redacted.
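The chaining in 5.2 and the canonicalization in 5.3 can be sketched in a few lines. This is a minimal illustration, not Fuze's implementation: `canonical`, `appendRecord`, and `verifySketch` are hypothetical names, the genesis value is assumed to be 64 hex zeros, and the canonical form here only sorts keys and strips whitespace, so it does not cover the number-formatting and escaping rules of full RFC 8785.

```typescript
import { createHash } from 'node:crypto'

type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

interface ChainedRecord {
  sequence: number
  prevHash: string
  payload: Json
  hash: string
}

// Assumption: the genesis prevHash is 32 zero bytes, hex-encoded.
const GENESIS = '0'.repeat(64)

// Simplified canonical form: sorted keys, no whitespace. NOT full RFC 8785.
function canonical(value: Json): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value)
  if (Array.isArray(value)) return `[${value.map(canonical).join(',')}]`
  const keys = Object.keys(value).sort()
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(value[k] as Json)}`).join(',')}}`
}

function sha256Hex(text: string): string {
  return createHash('sha256').update(text, 'utf8').digest('hex')
}

// Append a payload, linking it to the hash of the previous record.
function appendRecord(chain: readonly ChainedRecord[], payload: Json): ChainedRecord[] {
  const sequence = chain.length
  const prevHash = sequence === 0 ? GENESIS : chain[sequence - 1]!.hash
  const hash = sha256Hex(canonical({ sequence, prevHash, payload }))
  return [...chain, { sequence, prevHash, payload, hash }]
}

// Recompute every hash and every linkage; any mismatch invalidates the chain.
function verifySketch(chain: readonly ChainedRecord[]): boolean {
  let expectedPrev = GENESIS
  for (let i = 0; i < chain.length; i++) {
    const r = chain[i]!
    if (r.sequence !== i || r.prevHash !== expectedPrev) return false
    const recomputed = sha256Hex(
      canonical({ sequence: r.sequence, prevHash: r.prevHash, payload: r.payload }),
    )
    if (r.hash !== recomputed) return false
    expectedPrev = r.hash
  }
  return true
}

let chain: ChainedRecord[] = []
chain = appendRecord(chain, { span: 'agent.invoke', b: 2, a: 1 })
chain = appendRecord(chain, { span: 'tool.execute', a: 1, b: 2 })

// Mutate the first payload without recomputing its hash.
const tampered: ChainedRecord[] = chain.map((r, i) =>
  i === 0 ? { ...r, payload: { span: 'agent.invoke', b: 3, a: 1 } } : r,
)

console.log({ valid: verifySketch(chain), tamperedValid: verifySketch(tampered) })
```

Because the hash input is the canonical string rather than whatever `JSON.stringify` happens to emit, two payloads that differ only in key order produce the same bytes, which is the property section 5.3 relies on.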
### 5.5 Run-root signing (Ed25519, customer-managed)

At end of run (or at suspend, for HITL):

```
runRoot = Ed25519.sign(privKey, chainHead || runId || nonce)
```

The signing key is the **customer's**, held in their KMS (AWS/GCP/Azure/Vault). Fuze never sees the private key; we call `kms.sign(key, payload)` and get a signature. A compromised Fuze deployment cannot forge audit records.

### 5.6 Transparency log

Run-roots are anchored to an append-only public log. Two adapters: `SqliteTransparencyLog` (self-hosted Merkle, default for sovereign) and `RekorTransparencyLog` (Sigstore Rekor, opt-in). The log returns a Merkle inclusion proof so anyone can verify "this run-root was in the log at this position" without the full log.

```
        Root
       /    \
    N01      N23
   /  \     /  \
  L0   L1  L2   L3   ← leaves (run-roots)

inclusion proof for L0 = [L1, N23]
```

This is what lets the customer prove a run happened before time T. Without a transparency log, the auditor must trust your timestamps.

## 6. HITL, the human-oversight primitive

Art. 14 needs more than an approve button: the human must be able to **see** state up to the suspend point and **decide** with rationale (which becomes evidence), and the decision must be **non-replayable**.

### 6.1 Suspend

When a tool hits `effect: requires-approval`, the loop:

1. Records the suspended state (tool, args, current chain head).
2. Mints a resume token:

   ```
   token = {
     runId,
     suspendedAtSequence,
     chainHeadAtSuspend,
     nonce: random(16 bytes),
     signature: Ed25519.sign(customerKey, runId || sequence || chainHead || nonce),
     publicKeyId
   }
   ```

3. Persists the SuspendedRun (durable snapshot survives a restart).
4. Returns to caller with `status: 'suspended'`.

### 6.2 Resume

```
overseer reviews evidence panel → submits decision → resumeRun()
  1. Verify resume token signature (with customer's public key)
  2. Check definitionFingerprint (refuse if agent definition drifted)
  3. Consume the nonce (replay attempt → ResumeTokenReplayError)
  4. Emit oversight.decision span (action, rationale, overseerId, trainingId)
  5. Continue or halt
```

Nonces matter: an approved token is otherwise a permanent ticket. The fingerprint check closes the "ship innocuous tool, get approved, redefine before the approval lands" attack.

## 7. Compliance type system

`FuzeTool` is a **discriminated union**:

```ts
type FuzeTool =
  | PublicTool           // 'public'
  | PersonalTool         // 'personal' | 'business'
  | SpecialCategoryTool  // 'special-category'
```

- `PublicTool`, no extra requirements.
- `PersonalTool`, `allowedLawfulBases` and `residencyRequired` are required by the type.
- `SpecialCategoryTool`, also requires `art9Basis` (one of the ten Art. 9(2) gates, (a)-(j)) and forces `residencyRequired: 'eu'`.

```ts
defineTool.specialCategory({
  name: 'lookupHealthRecord',
  // ❌ TS error: Property 'art9Basis' is missing
  // ...
})
```

The compiler refuses the bad shape. This is not a lint warning; the code doesn't compile.

The framework is six primitives: `Tool`, `Model`, `Agent`, `Memory`, `Guardrail`, `Tracer`. Anything else is composition. The `Ctx` passed to tools exposes only `tenant`, `principal`, `runId`, `stepId`, `subjectRef`, `deps` (frozen), `secrets` (opaque refs), `attribute(k, v)`, and `invoke(name, input)`. No tracer access, no raw secrets, no sibling tools. Bypass tests with `// @ts-expect-error` prove the bad shapes don't typecheck.

## 8. Policy gating with Cerbos

[Cerbos](https://cerbos.dev) is open-source. Embedded WASM mode: YAML+CEL policies compile to a bundle that evaluates in-process in ~100µs.
```yaml
apiVersion: api.cerbos.dev/v1
resourcePolicy:
  resource: transfer_funds
  rules:
    - actions: ["invoke"]
      effect: EFFECT_REQUIRES_APPROVAL
      condition:
        match: { expr: R.attr.amount > 1000 }
    - actions: ["invoke"]
      effect: EFFECT_ALLOW
      condition:
        match: { expr: R.attr.amount <= 1000 && P.attr.role == "operator" }
```

[CEL](https://github.com/google/cel-spec) is deliberately not Turing-complete: comparison, arithmetic, list membership; no loops, recursion, or user-defined functions. Policies always terminate, in time bounded by the size of the expression and its inputs. Two reasons to prefer this over if-statements: compliance officers can review YAML (not TS control flow), and policies survive code refactors.

**Fail-stop:** a policy engine error halts the run with `engine_error=true`. There is no `--allow-on-engine-error` runtime flag (a build-time dev flag exists; production builds disable it). A security review flagged this Critical-1.

## 9. Sandbox tiers

| Threat | Defense |
|---|---|
| Tool args contain a payload that, if eval'd, owns the host | Run in a sandbox |
| Tool fetches a URL that returns a billion bytes | Sandbox enforces output cap |
| Tool reads `/etc/passwd` | Sandbox has its own FS; host FS not mounted |
| Tool exfiltrates via DNS | Sandbox egress allowlist |
| Multi-tenant: tool A reads tool B's secrets | Per-tenant sandbox process |

Three tiers:

- **In-process (`just-bash`)**, TypeScript bash interpreter with virtual FS. No `child_process`. Usable only with the `TrustedInputOnly` brand and a single-tenant deployment (a watchdog refuses if a second tenant ID appears within an hour).
- **vm-managed (E2B Cloud)**, each sandbox in a Firecracker microVM. ~150ms cold, <30ms from a paused snapshot. Default for Cloud tier. Caveat: managed cloud is US-region by default.
- **vm-self-hosted (Sovereign)**, E2B is Apache-2.0; the Sovereign tier runs it on customer's Hetzner / Scaleway / OVHcloud / AWS-Frankfurt. CIS-benchmark Packer image, pinned kernel, WireGuard mesh, mTLS-only control plane, deny-all-inbound default firewall.
We ship the Terraform.

Tier is recorded in every `tool.execute` span: `fuze.sandbox.tier: 'in-process' | 'vm-managed' | 'vm-self-hosted'`.

## 10. MCP, sharing tools across the ecosystem

[MCP](https://modelcontextprotocol.io) is Anthropic's open standard for "agents talk to tool servers over JSON-RPC". Fuze Agent is both host and server.

**As a host:** `@fuze-ai/agent-mcp` wraps `@modelcontextprotocol/sdk` Client. Every `tools/call` is intercepted by `RecordingTransport` and emitted as evidence. Server fingerprints are pinned at admission; rotation without re-approval throws `FingerprintMismatchError`. Tools discovered from MCP servers go through `unverifiedTool()`, which **requires the operator to supply Fuze metadata** (classification, lawful basis allowlist, retention) before the tool can be called; otherwise it defaults to `special-category` and Cerbos default-denies.

**As a server:** `serveFuzeAgent({tools, policy, transport})` exposes Fuze tools as MCP. Inbound `tools/call` gets the same evidence pipeline. Special-category tools are refused unless `allowSpecialCategory: true`. Fuze tools become usable from Claude Desktop, Cursor, Cline with audit trail intact.

## 11. EU AI Act mapping

| Article | Requirement | Fuze mechanism |
|---|---|---|
| **Art. 9** | Risk management for high-risk | `annexIIIDomain` field forces declaration; non-`'none'` requires `art14OversightPlan` |
| **Art. 12** | Automatic logging | Every span hash-chained, signed, anchored. Logs include responsible person (`fuze.principal.id`), retention (`fuze.retention.policy_id`) |
| **Art. 13** | Transparency to deployers | `definitionFingerprint` lets deployers verify the agent didn't drift |
| **Art. 14** | Human oversight | HITL with replay-protected tokens; decision rationale, overseer ID, training reference all captured |
| **Art. 22** (GDPR) | Solely-automated decisions | `producesArt22Decision: boolean` flag forces approval gate |
| **Art. 26** | Deployer obligations | DPA + sub-processor manifest + TIA in `@fuze-ai/agent-legal-templates` |
| **Art. 33/34** (GDPR) | Breach notification 72h | Incident-event generator produces Art. 33 + Art. 34 packets |
| **Art. 73** | Serious incident reporting | Same machinery; `IncidentEvent` schema flags severity |

Annex III is a finite enum: `employment | credit | education | essential-services | law-enforcement | migration | justice | democratic-processes | biometric | critical-infrastructure | none`. The Cloud tier **refuses** to start an agent whose `annexIIIDomain !== 'none'` without an explicit signed waiver; wilful blindness becomes wilful refusal at the type system.

## 12. GDPR mapping

| Article | Fuze mechanism |
|---|---|
| **Art. 5(1)(e)** retention | `RetentionPolicy` is a required type field; `@fuze-ai/agent-compliance` ships a partition function that drops expired records |
| **Art. 6** lawful basis | `lawfulBasis` is a run-level field; compatible bases per tool checked at run start |
| **Art. 9** special category | `SpecialCategoryTool` requires `art9Basis` |
| **Art. 13/14** info to subject | Every span carries `fuze.subject.ref` (HMAC of stable identifier with tenant secret) |
| **Art. 15** access | `GET /v1/subjects/:hmac/spans` |
| **Art. 17** erasure | `eraseBySubjectRef(hmac)` cascades across spans, suspend store, durable store, memory |
| **Art. 22** automated decisions | `producesArt22Decision: boolean` flag forces approval gate |
| **Art. 28** processor obligations | DPA template generator |
| **Art. 33/34** breach | Incident packets generator |
| **Art. 35** DPIA | Auto-fills DPIA from agent definition |
| **Art. 44–49** transfers | TIA generator per non-EU sub-processor; SCC selector |

## 13. Where to read the actual code

| Concept | Source |
|---|---|
| Loop | `packages/agent/src/loop/loop.ts` |
| Hash chain | `packages/agent/src/evidence/hash-chain.ts` |
| Canonicalization | `packages/agent/src/evidence/canonical.ts` |
| Discriminated `FuzeTool` | `packages/agent/src/types/tool.ts` |
| `Ctx` and `ctx.invoke` | `packages/agent/src/types/ctx.ts` |
| Resume tokens + nonces | `packages/agent/src/loop/suspend.ts` |
| Definition fingerprint | `packages/agent/src/loop/fingerprint.ts` |
| Cerbos engine | `packages/agent-policy-cerbos/` |
| just-bash sandbox | `packages/agent-sandbox-justbash/` |
| E2B sandbox | `packages/agent-sandbox-e2b/` |
| Transparency log | `packages/agent-transparency/` |
| KMS signers | `packages/agent-signing-kms/` |
| MCP host / server | `packages/agent-mcp/`, `packages/agent-mcp-server/` |
| API contract / server | `packages/agent-api/`, `packages/agent-api-server/` |
| Annex IV mapping | `packages/agent-annex-iv/` |
| Eval framework | `packages/agent-eval/` |
| Sovereign Terraform | `packages/agent-sovereign-terraform/modules/` |

Reference agents: `agent-employment-screening` (Annex III), `agent-customer-support` (PII), `agent-code-gen` (sandbox), `agent-hitl-demo` (full HITL roundtrip).

## 14. Beyond this page

- OpenTelemetry GenAI conventions, [the spec](https://github.com/open-telemetry/semantic-conventions/tree/main/docs/gen-ai), and our [architecture reference](/docs/agent/reference/architecture).
- Cerbos policy authoring, [Cerbos docs](https://docs.cerbos.dev).
- Sigstore Rekor, [Sigstore docs](https://docs.sigstore.dev).
- TypeScript discriminated unions, [TS handbook](https://www.typescriptlang.org/docs/handbook/2/narrowing.html#discriminated-unions).
- Firecracker microVMs, [the AWS paper](https://www.usenix.org/conference/nsdi20/presentation/agache).
- Operating Fuze in production, [operations guide](/docs/agent/guides/operations).
- Building your first agent, [first agent tutorial](/docs/agent/tutorial/01-first-agent).
If something here is unclear, the source is the authoritative answer. Every claim about Fuze's behavior maps to a test under `packages/*/test/`.

---

## Planning, dispatch, and oversight

Source: https://fuze-ai.tech/docs/agent/reference/planning-and-dispatch/

# Planning, dispatch, and oversight

Fuze Agent ships three runtime surfaces that turn an evidence-grade audit story into a usable agent framework: a **plan tool** that the model commits to before touching sensitive data, **capability envelopes** for typed sub-agent dispatch, and an **Article 14 oversight primitive** that suspends a run durably until a human reviewer resolves it. All three feed the same hash-chained evidence ledger.

## The model in one paragraph

A Fuze project has top-level **agents** (defined via `defineAgent`) and **roles** (defined via `defineAgentRole`). Agents carry full compliance metadata: `purpose`, `lawfulBasis`, `annexIIIDomain`, `producesArt22Decision`, `art14OversightPlan`. Roles are capability envelopes: they declare what tools, data classes, and EU residency a child can operate under, without claiming a specific task. Agents dispatch to roles at runtime with a freeform task brief. The runtime auto-injects three plan tools when planning is enabled, and one typed `dispatch_` tool per envelope listed in `canDispatch`.

## Folder convention

```
my-app/
  agents/
    roles/
      researcher.ts                ← defineAgentRole(...)
      personal-data-researcher.ts
    loan-orchestrator/
      agent.ts                     ← defineAgent(...) imports the markdown
      instructions.md              ← short behavioral prompt
      context/
        underwriting-policy.md
        edge-cases.md
```

The convention is documented but not required; `defineAgent` and `defineAgentRole` work programmatically. `fromMarkdown(path)` reads and sha256-hashes a file at module-load time so the resolved string and its hash become part of the agent fingerprint.
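To make the read-and-hash idea concrete, here is a minimal sketch of loading a prompt file together with its SHA-256 digest. `loadHashed` and `HashedText` are hypothetical names used for illustration; this is not the `fromMarkdown` implementation, and the temporary file exists only so the snippet runs on its own.

```typescript
import { createHash } from 'node:crypto'
import { mkdtempSync, readFileSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

interface HashedText {
  text: string
  sha256: string
}

// Hypothetical helper: read a file once and hash the exact bytes that were read,
// so the text and its digest can never disagree.
function loadHashed(path: string): HashedText {
  const bytes = readFileSync(path)
  return {
    text: bytes.toString('utf8'),
    sha256: createHash('sha256').update(bytes).digest('hex'),
  }
}

// Demo: any edit to the prompt file changes the digest.
const dir = mkdtempSync(join(tmpdir(), 'fuze-sketch-'))
const file = join(dir, 'instructions.md')

writeFileSync(file, 'Answer with citations.\n')
const before = loadHashed(file)

writeFileSync(file, 'Answer with citations. Be brief.\n')
const after = loadHashed(file)

console.log({ changed: before.sha256 !== after.sha256 })
```

Hashing the bytes rather than the decoded string keeps the digest stable regardless of how the text is later normalized in memory.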
## Planning

When `def.annexIIIDomain !== 'none'` or `def.producesArt22Decision === true`, the runtime defaults to requiring a plan before any tool with `dataClassification: 'personal' | 'special-category'` can fire. Override per-agent via `planning: { required: false }`. The override leaves an evidence trail.

```ts
defineAgent({
  // ...
  planning: { required: 'auto-when-high-risk', minSteps: 2, maxSteps: 10 },
})
```

Three tools are auto-injected and visible to the model under their declared names:

- `commit_plan({ steps: [{ content, active_form, parent_step_id? }] })`, version 1
- `update_plan_step({ step_id, status, evidence_refs?, unlink_refs?, note? })`, append-only
- `revise_plan({ add_steps?, remove_steps?, reorder?, rationale })`, produces v2..n

Step IDs are minted on creation (`step_`) and never reassigned across revisions. Splits create new IDs with `derived_from: [old_id]` edges. Removed steps get `status: 'superseded'`; their evidence stays linked. Once a step transitions to `done`, status is locked.

**Auto-capture.** Evidence rows emitted while a step is `in_progress` are auto-linked to that step. Each transition records `linkage_source: 'auto' | 'explicit' | 'corrected'` so auditors see which links the agent declared vs. which the runtime inferred. Use `evidence_refs` to add links the runtime missed; use `unlink_refs` to correct auto-captured mistakes.

**Per-step lifecycle timestamps.** `createdAt`, `startedAt`, `suspendedAt`, `resumedAt`, `endedAt` are populated automatically on transitions. Regulators care how long humans were in the loop.

## Dispatch, capability envelopes

`defineAgentRole` produces a typed envelope. The role's `roleHash` covers name, instructions hash, context manifest, tools, data classification, residency, and view names, so changes are auditable.
```ts
const researcher = defineAgentRole({
  name: 'researcher',
  instructions: 'Answer with citations.',
  tools: [searchPolicies, getPrecedent],
  dataClassification: 'public',
  outputSchema: z.object({ summary: z.string() }),
  outputViews: {
    citations: z.object({
      sources: z.array(z.object({ url: z.string(), quote: z.string() })),
    }),
    table: z.object({ rows: z.array(z.record(z.string(), z.unknown())) }),
  },
  maxSteps: 8,
})
```

The orchestrator declares which envelopes it can dispatch into:

```ts
const orchestrator = defineAgent({
  // ...
  canDispatch: [researcher, personalDataResearcher, computational],
})
```

The runtime synthesizes one tool per role: `dispatch_researcher`, `dispatch_personal_data_researcher`, etc. Each accepts `{ task, view?, forward_context?, forward? }`. The model picks `view` from the role's `outputViews` enum at call time; the return type narrows accordingly.

**No metadata inheritance.** A child does not operate under the parent's `lawfulBasis` or `annexIIIDomain`. Roles either declare their own (when their data classification requires one) or declare none (and then can't process the relevant data classes). Annex IV documentation reads the envelope, not the call site.

**Forwarding is opt-in, with role-level pull.**

- `requiresTenant: true` on a role auto-forwards tenant when the parent has one. If the parent has no tenant, dispatch fails closed.
- `requiresPrincipal: true` is the same for principal.
- The parent can additionally pass `forward: ['principal', 'subjectRef']` to flow more context per call.

The `runChild` callback you provide to `runAgent({ runChild })` receives a fully-typed `RunChildInput` and returns a `DispatchResult`:

```ts
type DispatchResult<T> =
  | { ok: true; output: T; runId: RunId; chainRoot: string }
  | { ok: false; failure: AgentRunFailure; runId: RunId; chainRoot: string }
```

Children's exceptions never bubble to the parent; they become typed failures.
The parent sees `{ ok: false, failure: { category, message, attribution, retriable, attempt, childFailure? } }` and decides whether to retry.

## Article 14 oversight (`requestOversight`)

`ctx.requestOversight()` (or the standalone `requestOversight` helper) suspends the run durably through a `DurableExecutionAdapter`. Restate is the intended production substrate; an `InMemoryDurableAdapter` ships for tests and local dev.

```ts
import { requestOversight, InMemoryDurableAdapter } from '@fuze-ai/agent'

const adapter = new InMemoryDurableAdapter()

const decision = await requestOversight(
  {
    adapter,
    emitSuspendEvent: (req, hash) => emitter.emit({ /* ... */ }),
    emitResumeEvent: (req, dec, entryHash) => emitter.emit({ /* ... */ }),
  },
  {
    runId,
    reason: 'tool_high_risk',
    evidence: { tool: 'send_email', to: redactedRef },
    reviewerHint: 'team-compliance',
    timeoutMs: 60 * 60_000,
    proposedArgs: { subject: 'original', body: '...' },
  },
)

if (decision.decision === 'approve') { /* proceed */ }
if (decision.decision === 'modify') { useArgs(decision.modifiedArgs) }
if (decision.decision === 'reject') { halt() }
if (decision.decision === 'timeout') { /* fall back to deny */ }
```

Two distinct evidence entries chain across the suspend gap:

- `oversight_suspend` carries `evidencePayloadHash` and the awakeable id.
- `oversight_resume` carries `humanInputEntryHash` plus the reviewer's signature.

A `modify` decision creates a chain fork: downstream tool calls reference the modify event as parent, not the model's original args. Without this the chain would misrepresent what executed.

The dashboard resolves a pending oversight via:

```ts
import { resolveOversight } from '@fuze-ai/agent'

await resolveOversight(adapter, {
  awakeableId,
  decision: 'modify',
  modifiedArgs: { subject: 'reviewer-edited', body: '...' },
  reviewerId: 'reviewer-42',
  reviewerSignature: ed25519DetachedSignature,
})
```

## Drift refusal on resume

A run paused for review at 9am Monday and resumed Thursday may face model snapshot drift: OpenAI rotates `system_fingerprint` routinely; customers bump `gpt-4o-2024-08-06` to `gpt-4o-2024-11-20`. Fuze's `resumeRun` enforces:

- **Snapshot drift + `producesArt22Decision: true` → refuse** with `ModelDriftAtResumeError`. Reviewer can pass `allowModelDrift: true` to override after explicit acknowledgment (which itself becomes evidence).
- **Snapshot drift + non-Art22 → warn**, record drift in evidence, proceed.
- **System-fingerprint-only drift → never blocks.** Fingerprints rotate on infra changes, not model changes.

## Soft-cancel on operator stop

Tools declare their I/O profile via `softCancelTimeoutMs` (default 10s):

```ts
defineTool.personal({
  name: 'send_email',
  // ...
  softCancelTimeoutMs: 30000, // wait up to 30s for SMTP I/O
})

defineTool.public({
  name: 'search_docs',
  // ...
  softCancelTimeoutMs: 2000,
})
```

When the operator hits stop, the in-flight tool gets up to its declared grace period. After that it's hard-killed with a `cancellation_truncated` evidence row noting the truncation. A separate emergency-abort path skips the grace period entirely; that decision is also evidenced.

## What ships in `@fuze-ai/agent` today

- `defineAgent`, `defineAgentRole`, `fromMarkdown` (+ `fromMarkdown.dir`)
- `runAgent`, `resumeRun`, `ModelDriftAtResumeError`
- `PlanState`, `buildPlanTools`, `synthesizeDispatchTool`, `buildDispatchTools`
- `requestOversight`, `resolveOversight`, `InMemoryDurableAdapter`
- Type exports: `AgentRoleDefinition`, `DispatchResult`, `AgentErrorCategory`, `PlanEvent`, `PlanStep`, `ReplayMode`, `OversightDecision`, full ledger entry types

## What's deferred

- **Real Restate adapter**, the `DurableExecutionAdapter` interface is stable; the production package binds to `restate.send` / `ctx.awakeable`.
- **`run.replay()` execution**, types are in place (`ReplayMode`, `ReplayResult`, drift shapes). Runtime needs a tool-output cache substrate.
- **OTel/OpenInference exporter**, adapter, not contract; can ship anytime.
- **Four-eyes mode** for biometric Annex III §1(a), until a design partner needs it.

---

## First agent

Source: https://fuze-ai.tech/docs/agent/tutorial/01-first-agent/

# 1. First agent

Build the smallest end-to-end agent: one tool, one model, one run, and a verified hash chain.

**What you'll build:** a runnable greeter agent with one custom tool whose hash chain verifies end-to-end.

**Prerequisites:** [Quickstart](/docs/agent/quickstart) for install, and a Node 20+ project with TypeScript strict mode.

**Next:** [add human-in-the-loop oversight](/docs/agent/tutorial/02-hitl).

## Setup

```bash
mkdir my-first-agent && cd my-first-agent
npm init -y
npm install @fuze-ai/agent zod
npm install -D typescript @types/node tsx
npx tsc --init --target es2022 --module nodenext --moduleResolution nodenext --strict true
```

## Define the tool

Create `src/greet.ts`:

```ts
import { z } from 'zod'
import { defineTool, Ok, type ThreatBoundary } from '@fuze-ai/agent'

const threatBoundary: ThreatBoundary = {
  trustedCallers: ['agent-loop'],
  observesSecrets: false,
  egressDomains: 'none',
  readsFilesystem: false,
  writesFilesystem: false,
}

export const greet = defineTool.public({
  name: 'greet',
  description: 'returns a greeting',
  input: z.object({ name: z.string() }),
  output: z.object({ greeting: z.string() }),
  threatBoundary,
  retention: {
    id: 'demo.v1',
    hashTtlDays: 30,
    fullContentTtlDays: 7,
    decisionTtlDays: 90,
  },
  run: async (input) => Ok({ greeting: `hello, ${input.name}` }),
})
```

## A scripted model

Real agents use a real provider. For this tutorial, use a scripted model that returns predetermined steps.
Create `src/model.ts`:

```ts
import type { FuzeModel, ModelStep } from '@fuze-ai/agent'

export const scriptedModel = (steps: readonly ModelStep[]): FuzeModel => {
  let i = 0
  return {
    providerName: 'fake',
    modelName: 'demo-1',
    residency: 'eu',
    generate: async () => {
      const s = steps[i++]
      if (!s) throw new Error('model exhausted')
      return s
    },
  }
}
```

## Define the agent and run

Create `src/index.ts`:

```ts
import { z } from 'zod'
import {
  defineAgent,
  inMemorySecrets,
  runAgent,
  StaticPolicyEngine,
  verifyChain,
  makeTenantId,
  makePrincipalId,
} from '@fuze-ai/agent'
import { greet } from './greet.js'
import { scriptedModel } from './model.js'

const agent = defineAgent({
  purpose: 'demo-greeter',
  lawfulBasis: 'consent',
  annexIIIDomain: 'none',
  producesArt22Decision: false,
  model: scriptedModel([
    {
      content: '',
      toolCalls: [{ id: 'c1', name: 'greet', args: { name: 'world' } }],
      finishReason: 'tool_calls',
      tokensIn: 10,
      tokensOut: 5,
    },
    {
      content: '{"final":"hello, world"}',
      toolCalls: [],
      finishReason: 'stop',
      tokensIn: 12,
      tokensOut: 4,
    },
  ]),
  tools: [greet],
  output: z.object({ final: z.string() }),
  maxSteps: 5,
  retryBudget: 0,
  deps: {},
})

const records: any[] = []

const policy = new StaticPolicyEngine([
  { id: 'allow.greet', toolName: 'greet', effect: 'allow' },
])

const result = await runAgent(
  { definition: agent, policy, evidenceSink: (r) => records.push(r) },
  {
    tenant: makeTenantId('demo-tenant'),
    principal: makePrincipalId('demo-user'),
    secrets: inMemorySecrets({}),
    userMessage: 'please greet world',
  },
)

console.log({
  status: result.status,
  output: result.output,
  hashChainHead: result.evidenceHashChainHead,
  hashChainValid: verifyChain(records),
})
```

## Run

```bash
npx tsx src/index.ts
```

Expected output:

```json
{
  "status": "ok",
  "output": { "final": "hello, world" },
  "hashChainHead": "",
  "hashChainValid": true
}
```

If `hashChainValid` is `false`, your records were re-ordered or mutated. Every span goes through `EvidenceEmitter`; the chain is non-bypassable.
Next: [add human-in-the-loop oversight](/docs/agent/tutorial/02-hitl).

---

## Human-in-the-loop

Source: https://fuze-ai.tech/docs/agent/tutorial/02-hitl/

# 2. Human-in-the-loop

AI Act Art. 14 requires effective human oversight for high-risk agents. The HITL primitive makes this a runtime invariant: a tool suspends the run, the loop persists state via `@fuze-ai/agent-suspend-store`, an overseer approves or denies, and the loop resumes from the suspension point.

**What you'll build:** a tool that pauses the run, captures an overseer's decision, and resumes, with a replay-protected resume token.

**Prerequisites:** [First agent](/docs/agent/tutorial/01-first-agent) so you have a working `runAgent` baseline.

**Next:** [write a Cerbos policy](/docs/agent/tutorial/03-policy) to govern which tools require approval.

## Flow

```
Agent               Suspend Store      Overseer           Resume
-----               -------------      --------           ------
|                   |                  |                  |
| Suspend(reason)   |                  |                  |
|------------------>|                  |                  |
|                   | save             |                  |
|                   | SuspendedRun     |                  |
|                   |                  |                  |
| resumeToken       |                  |                  |
|<------------------|                  |                  |
| status=suspended  |                  |                  |
|                   |                  |                  |
|                   | evidence panel   |                  |
|                   +----------------->|                  |
|                   |                  | review + decide  |
|                   |                  |                  |
|                   |                  | submit decision  |
|                   |                  +----------------->|
|                   |                  |                  |
|                   | consume nonce (replay-protected)    |
|                   |<------------------------------------|
|                   |                  |                  |
| resumeAgent()     |                  |                  |
|<------------------------------------+|                  |
| spans continue    |                  |                  |
| from suspend pt   |                  |                  |
v                   v                  v                  v
```

## When required

The framework requires HITL when any of the following holds:

- `producesArt22Decision: true`
- `annexIIIDomain` is non-`'none'`
- a tool declares `requiresOversight: true`

If required and absent, the run halts with `fuze.run.missing_oversight=true`.
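The three conditions above amount to a simple predicate. The sketch below is an illustration under assumptions: `AgentSketch`, `ToolSketch`, and `oversightRequired` are hypothetical names, and the real check lives inside the framework's loop rather than in user code.

```typescript
// Hypothetical, trimmed-down shapes; the real definitions carry many more fields.
interface ToolSketch {
  name: string
  requiresOversight?: boolean
}

interface AgentSketch {
  producesArt22Decision: boolean
  annexIIIDomain: string
  tools: readonly ToolSketch[]
}

// True when any of the three documented triggers holds.
function oversightRequired(def: AgentSketch): boolean {
  return (
    def.producesArt22Decision ||
    def.annexIIIDomain !== 'none' ||
    def.tools.some((tool) => tool.requiresOversight === true)
  )
}

const greeter: AgentSketch = {
  producesArt22Decision: false,
  annexIIIDomain: 'none',
  tools: [{ name: 'greet' }],
}

const screening: AgentSketch = {
  producesArt22Decision: false,
  annexIIIDomain: 'employment',
  tools: [{ name: 'score_candidate' }],
}

console.log({
  greeter: oversightRequired(greeter),
  screening: oversightRequired(screening),
})
```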
## Define an oversight tool

```ts
import { z } from 'zod'
import {
  defineTool,
  Suspend,
  type ThreatBoundary,
} from '@fuze-ai/agent'

const threatBoundary: ThreatBoundary = {
  trustedCallers: ['agent-loop'],
  observesSecrets: false,
  egressDomains: 'none',
  readsFilesystem: false,
  writesFilesystem: false,
}

export const requestApproval = defineTool.public({
  name: 'request_approval',
  description: 'pauses the run for human approval of the proposed action',
  input: z.object({
    action: z.string(),
    rationale: z.string(),
  }),
  output: z.object({
    approved: z.boolean(),
    overseer: z.string(),
    decidedAt: z.string(),
  }),
  threatBoundary,
  retention: {
    id: 'oversight.v1',
    hashTtlDays: 365,
    fullContentTtlDays: 365,
    decisionTtlDays: 365,
  },
  requiresOversight: true,
  run: async (input, ctx) => {
    return Suspend({
      reason: 'human_approval_required',
      payload: { action: input.action, rationale: input.rationale },
      resumeSchema: z.object({
        approved: z.boolean(),
        overseer: z.string(),
        decidedAt: z.string(),
      }),
    })
  },
})
```

`Suspend` returns a typed control-flow result that the loop recognizes. The loop persists run state to the suspend store, returns a `resumeToken`, and exits with `status: 'suspended'`.

## Resume

The overseer service receives the suspend payload, presents the decision UI, and resumes:

```ts
import { resumeAgent } from '@fuze-ai/agent'

const resumed = await resumeAgent(
  { definition: agent, policy, evidenceSink: (r) => records.push(r) },
  {
    resumeToken: result.resumeToken!,
    resumeValue: {
      approved: true,
      overseer: 'overseer:dpo@example.org',
      decidedAt: new Date().toISOString(),
    },
  },
)
```

The token is single-use and bound to the run; reuse is rejected with `fuze.suspend.replay=true`.
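The single-use, run-bound behaviour can be illustrated with a small in-memory nonce store. This is a sketch under assumptions, not the suspend store's implementation: `ResumeTokenSketch`, `InMemoryNonceStore`, and `mintToken` are hypothetical, and signature verification is deliberately left out to keep the replay check in focus.

```typescript
import { randomBytes } from 'node:crypto'

// Hypothetical, reduced token shape: a real token also carries a signature.
interface ResumeTokenSketch {
  runId: string
  nonce: string // hex-encoded random bytes
}

type ConsumeResult = 'ok' | 'replay' | 'run_mismatch'

class InMemoryNonceStore {
  private readonly consumed = new Set<string>()

  // Accept a token at most once, and only for the run it was minted for.
  consume(token: ResumeTokenSketch, expectedRunId: string): ConsumeResult {
    if (token.runId !== expectedRunId) return 'run_mismatch'
    if (this.consumed.has(token.nonce)) return 'replay'
    this.consumed.add(token.nonce)
    return 'ok'
  }
}

function mintToken(runId: string): ResumeTokenSketch {
  return { runId, nonce: randomBytes(16).toString('hex') }
}

const store = new InMemoryNonceStore()
const token = mintToken('run-1')

const firstUse = store.consume(token, 'run-1')
const secondUse = store.consume(token, 'run-1')
const wrongRun = store.consume(mintToken('run-2'), 'run-1')

console.log({ firstUse, secondUse, wrongRun })
```

A production store would persist consumed nonces durably and make the check-and-insert atomic, so two concurrent resume attempts cannot both succeed.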
## Evidence

Resumption emits a `human.oversight.decision` span:

```json
{
  "span": "human.oversight.decision",
  "attributes": {
    "human.oversight.principal": "overseer:dpo@example.org",
    "human.oversight.decision": "approved",
    "human.oversight.decided_at": "2026-04-30T12:34:56Z"
  }
}
```

`evaluateApproval` produces this span. Auditors verify AI Act Art. 14 compliance by querying the chain for `human.oversight.decision` spans on each high-risk run.

Next: [write Cerbos policies](/docs/agent/tutorial/03-policy).

---

## Cerbos policies

Source: https://fuze-ai.tech/docs/agent/tutorial/03-policy/

# 3. Cerbos policies

Cerbos is the production policy engine; `StaticPolicyEngine` is for tests. Every tool dispatch goes through Cerbos. An engine error halts the run with `fuze.policy.engine_error=true`; there is no allow-on-error path.

**What you'll build:** a YAML+CEL Cerbos policy that allows public tools, gates personal data on lawful basis, and denies special-category data.

**Prerequisites:** [Human-in-the-loop](/docs/agent/tutorial/02-hitl) so suspend/resume is wired up.

**Next:** [wire an MCP server](/docs/agent/tutorial/04-mcp) into the same evidence pipeline.

## Install

```bash
npm install @fuze-ai/agent-policy-cerbos
```

The package ships an embedded WASM build of Cerbos. Production deployments use a Cerbos pod; see [Operations](/docs/agent/guides/operations).
## Write a policy

Create `policies/agent.tools.yaml`:

```yaml
apiVersion: api.cerbos.dev/v1
resourcePolicy:
  version: default
  resource: agent.tool
  rules:
    - actions: ['invoke']
      effect: EFFECT_ALLOW
      roles: ['operator']
      condition:
        match:
          all:
            of:
              - expr: request.resource.attr.tool_name == "greet"
              - expr: request.resource.attr.classification == "public"
    - actions: ['invoke']
      effect: EFFECT_ALLOW
      roles: ['operator']
      condition:
        match:
          all:
            of:
              - expr: request.resource.attr.tool_name == "lookup_user"
              - expr: request.resource.attr.classification == "personal"
              - expr: request.principal.attr.lawful_basis in ['consent', 'contract']
    - actions: ['invoke']
      effect: EFFECT_DENY
      roles: ['operator']
      condition:
        match:
          expr: request.resource.attr.classification == "special_category"
```

The default in this bundle is deny: any tool not matched is rejected.

## Wire it in

```ts
import { CerbosPolicyEngine } from '@fuze-ai/agent-policy-cerbos'

const policy = await CerbosPolicyEngine.fromBundle({
  bundlePath: './policies/bundle.wasm',
})

await runAgent(
  { definition: agent, policy, evidenceSink },
  ctx,
)
```

The bundle is built ahead of time:

```bash
cerbos compile policies/ -o policies/bundle.wasm
```

The bundle hash is included in the evidence export.

## Decision evidence

Every dispatch emits a `fuze.policy` span:

```json
{
  "span": "fuze.policy",
  "attributes": {
    "fuze.policy.decision": "allow",
    "fuze.policy.bundle_hash": "",
    "fuze.policy.rule_id": "agent.tools.public-tools",
    "fuze.policy.engine_error": false
  }
}
```

## Engine errors are fail-stop

If Cerbos throws, returns malformed output, or times out, the loop halts:

```json
{
  "fuze.policy.engine_error": true,
  "fuze.run.status": "halted"
}
```

Do not introduce an allow-on-error path; the rule is fail-closed on engine error by default.

Next: [wire up an MCP server](/docs/agent/tutorial/04-mcp).

---

## MCP server

Source: https://fuze-ai.tech/docs/agent/tutorial/04-mcp/

# 4. MCP server

MCP (Model Context Protocol) servers expose tools to agents over a standard protocol. The Fuze MCP host (`@fuze-ai/agent-mcp`) intercepts the transport so every MCP tool call flows through the same evidence pipeline as a native tool.

**What you'll build:** a filesystem MCP server attached to your agent with fingerprint pinning, tool allowlist, and audit spans.

**Prerequisites:** [Cerbos policies](/docs/agent/tutorial/03-policy); MCP admission re-uses the same engine.

**Next:** [verify the audit chain end to end](/docs/agent/tutorial/05-evidence).

## Install

```bash
npm install @fuze-ai/agent-mcp @modelcontextprotocol/sdk
```

## Wire the filesystem MCP server

```ts
import { MCPHost } from '@fuze-ai/agent-mcp'
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js'

const host = new MCPHost({
  servers: [
    {
      name: 'filesystem',
      transport: new StdioClientTransport({
        command: 'npx',
        args: ['-y', '@modelcontextprotocol/server-filesystem', '/tmp/agent-workspace'],
      }),
      fingerprint: 'sha256:',
      allowedTools: ['read_file', 'write_file', 'list_directory'],
    },
  ],
})

await host.connect()
const tools = await host.listTools()
```

## Admission policy

The MCP host enforces an admission policy via Cerbos (`mcp.admission.yaml`). Servers not on the fingerprint allowlist are rejected at connect time. Tools not on the per-server allowlist are rejected at dispatch.

```yaml
# policies/mcp.admission.yaml
apiVersion: api.cerbos.dev/v1
resourcePolicy:
  version: default
  resource: mcp.server
  rules:
    - actions: ['connect']
      effect: EFFECT_ALLOW
      roles: ['operator']
      condition:
        match:
          all:
            of:
              - expr: request.resource.attr.fingerprint in request.principal.attr.allowed_fingerprints
              - expr: request.resource.attr.sandbox_tier == "isolated"
```

A non-isolated sandbox tier on an MCP server is refused.

## Use MCP tools in your agent

The host exposes MCP tools as `FuzeTool` instances; pass them directly to `defineAgent`:

```ts
const agent = defineAgent({
  // ...
  tools: [...host.toFuzeTools(), greet],
})
```

Each MCP-backed tool's threat boundary is derived from the server's declared capabilities and the host's policy.

## Cap the tool-list token budget

Cap the size of `tools/list` results to keep prompt tokens predictable:

```ts
const host = new MCPHost({
  servers: [/* ... */],
  toolListBudget: { maxTokens: 4000, softWarnAtPercent: 80 },
})
```

When the budget is exceeded, the host emits a `fuze.mcp.budget` warning span and returns a truncated tool list. A hard cap fails closed.

## Evidence

Every MCP dispatch emits both `fuze.tool` (as for native tools) and `fuze.mcp` spans:

```json
{
  "span": "fuze.mcp",
  "attributes": {
    "fuze.mcp.server": "filesystem",
    "fuze.mcp.fingerprint": "sha256:...",
    "fuze.mcp.tool": "read_file"
  }
}
```

Next: [verify the audit chain end to end](/docs/agent/tutorial/05-evidence).

---

## Verifying evidence

Source: https://fuze-ai.tech/docs/agent/tutorial/05-evidence/

# 5. Verifying evidence

Prove the audit chain end-to-end, from spans collected during a run, to the run-root, to the signed root, to the transparency log proof.

**What you'll build:** a verification pipeline that walks the chain, signs the run-root, anchors it, and exports an auditor-ready bundle.

**Prerequisites:** [MCP server](/docs/agent/tutorial/04-mcp), or any tutorial run that produced an `evidenceHashChainHead`.

**Next:** [run an eval suite](/docs/agent/tutorial/06-eval) to regression-test compliance behavior.
## Span lifecycle

```
 input arrives
          |
          v
+---------------------+
| policy.evaluate     |  Cerbos: allow / deny / requires-approval
+---------+-----------+
          | allow
          v
+---------------------+
| tool.execute        |  Tool.run(parsedInput, ctx)
+---------+-----------+
          |
          v
+---------------------+
| guardrail.toolResult|  PII / injection / residency scan
+---------+-----------+
          |
          v
+---------------------+
| hash-chain append   |  prevHash + RFC 8785 + sha256
+---------+-----------+
          |
          v
+---------------------+
| run-root sign       |  Ed25519 (KMS or local)
+---------+-----------+
          |
          v
+---------------------+
| transparency anchor |  inclusion proof + checkpoint
+---------------------+
```

Every step on this timeline emits a span; the chain head after the last step is `evidenceHashChainHead`.

## Collect records

```ts
import { runAgent } from '@fuze-ai/agent'

const records: any[] = []

const result = await runAgent(
  { definition: agent, policy, evidenceSink: (r) => records.push(r) },
  ctx,
)
```

Each record has a `sequence`, a `prevHash`, and a `payload`. The hash of record `n` is computed from the canonical (RFC 8785) serialization of `{ sequence, prevHash, payload }`.

## Verify the chain

```ts
import { verifyChain } from '@fuze-ai/agent'

const ok = verifyChain(records)

console.log({
  ok,
  finalHash: records.at(-1)?.payload.hash,
  runRoot: result.evidenceHashChainHead,
})
```

`verifyChain` returns `false` if:

- a record is out of order
- a `prevHash` does not match the prior record's hash
- a payload byte was modified after recording

A single byte flip is detected.

## Sign the run-root

```ts
import { LocalKeySigner } from '@fuze-ai/agent-signing'

const signer = await LocalKeySigner.fromFile('~/.fuze/agent-key')
const signature = await signer.sign(result.evidenceHashChainHead)
```

For Sovereign deployments, replace `LocalKeySigner` with `KmsSigner` from `@fuze-ai/agent-signing-kms`. The KMS-backed signer enforces deny-on-export at the key policy level.
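The chain check and the signing step described above can be sketched with Node built-ins alone. This is an illustration under stated assumptions, not the `verifyChain` or `LocalKeySigner` implementation: the `EvidenceRecord` shape, the all-zero genesis `prevHash`, and the helper names are hypothetical, and `canonicalize` is a simplified stand-in that only sorts object keys rather than implementing RFC 8785 in full.

```ts
import { createHash, generateKeyPairSync, sign, verify } from 'node:crypto'

// Hypothetical record shape, based on the fields named above.
interface EvidenceRecord {
  sequence: number
  prevHash: string
  payload: Record<string, unknown>
}

// Simplified stand-in for RFC 8785: recursively sorts object keys.
// It skips the RFC's number and string serialization rules.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((item) => canonicalize(item)).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const members = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`)
    return `{${members.join(',')}}`
  }
  return JSON.stringify(value)
}

function hashRecord(record: EvidenceRecord): string {
  return createHash('sha256').update(canonicalize(record)).digest('hex')
}

const GENESIS = '0'.repeat(64) // assumed prevHash for the first record

function appendRecord(chain: EvidenceRecord[], payload: Record<string, unknown>): void {
  const prev = chain[chain.length - 1]
  chain.push({
    sequence: chain.length,
    prevHash: prev ? hashRecord(prev) : GENESIS,
    payload,
  })
}

// Valid only if every link matches and the last record hashes to the expected head.
function checkChain(chain: EvidenceRecord[], expectedHead: string): boolean {
  const linksOk = chain.every(
    (record, i) =>
      record.sequence === i &&
      record.prevHash === (i === 0 ? GENESIS : hashRecord(chain[i - 1])),
  )
  const last = chain[chain.length - 1]
  return linksOk && last !== undefined && hashRecord(last) === expectedHead
}

const chain: EvidenceRecord[] = []
appendRecord(chain, { span: 'fuze.policy', decision: 'allow' })
appendRecord(chain, { span: 'fuze.tool', tool: 'greet' })
const head = hashRecord(chain[chain.length - 1])

// Sign the head with a throwaway Ed25519 key; Ed25519 takes no digest name, hence null.
const { publicKey, privateKey } = generateKeyPairSync('ed25519')
const signature = sign(null, Buffer.from(head, 'hex'), privateKey)

console.log({
  chainOk: checkChain(chain, head),
  signatureOk: verify(null, Buffer.from(head, 'hex'), publicKey, signature),
})
```

Comparing the last record's hash against the expected head matters: link checks alone cannot detect a change to the final record, because no later record commits to it.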
## Verify the signature

```ts
import { verifySignature } from '@fuze-ai/agent-signing'

const valid = await verifySignature({
  message: result.evidenceHashChainHead,
  signature,
  publicKey: await signer.publicKey(),
})
```

## Anchor to the transparency log

```ts
import { anchorToTransparencyLog } from '@fuze-ai/agent-transparency'

const proof = await anchorToTransparencyLog({
  runRoot: result.evidenceHashChainHead,
  signature,
  publicKey: await signer.publicKey(),
  logUrl: 'https://transparency.example.org',
})
```

The returned `proof` includes the inclusion path and the witnessed checkpoint. This is the artifact an auditor checks to confirm the run-root was published to a write-once log before any decision was acted upon.

## Verify the inclusion proof

```ts
import { verifyInclusion } from '@fuze-ai/agent-transparency'

const included = await verifyInclusion(proof)
```

## Export the bundle

```ts
import { exportEvidenceBundle } from '@fuze-ai/agent'

await exportEvidenceBundle({
  runId: result.runId,
  records,
  runRoot: result.evidenceHashChainHead,
  signature,
  proof,
  outPath: './bundle.zip',
})
```

The bundle is what an auditor receives. Every claim about the run is reproducible from this archive plus the public verifying key and the transparency log's published checkpoints.

Next: [run an eval suite against your agent](/docs/agent/tutorial/06-eval).

---

## Eval suite

Source: https://fuze-ai.tech/docs/agent/tutorial/06-eval/

# 6. Eval suite

`@fuze-ai/agent-eval` runs cases against an agent definition, capturing pass/fail plus the full evidence stream. It is the regression-test surface for compliance behavior.

**What you'll build:** a runnable eval harness with cases that prove the loop fail-stops on the right signals.

**Prerequisites:** [Verifying evidence](/docs/agent/tutorial/05-evidence); eval inspects the same span stream.

**Next:** [EU Sovereign tier](/docs/agent/guides/sovereign) for production deployment.
## Install

```bash
npm install -D @fuze-ai/agent-eval
```

## Define cases

Create `eval/cases.ts`:

```ts
import { defineCase } from '@fuze-ai/agent-eval'

export const cases = [
  defineCase({
    name: 'greets-named-user',
    userMessage: 'please greet alice',
    expect: {
      status: 'ok',
      outputMatches: { final: /hello, alice/ },
      hashChainValid: true,
    },
  }),
  defineCase({
    name: 'rejects-special-category-without-art9-basis',
    userMessage: 'lookup health record for patient 42',
    expect: {
      status: 'halted',
      spans: { contains: { 'fuze.policy.decision': 'deny' } },
    },
  }),
  defineCase({
    name: 'fail-stop-on-policy-engine-error',
    inject: { cerbosThrows: true },
    expect: {
      status: 'halted',
      spans: { contains: { 'fuze.policy.engine_error': true } },
    },
  }),
  defineCase({
    name: 'requires-oversight-for-art22-decision',
    overrides: { agent: { producesArt22Decision: true } },
    expect: {
      status: 'halted',
      spans: { contains: { 'fuze.run.missing_oversight': true } },
    },
  }),
]
```

## Run the suite

```ts
import { runEval } from '@fuze-ai/agent-eval'
import { agent, policy } from '../src/index.js'
import { cases } from './cases.js'

const report = await runEval({
  agent,
  policy,
  cases,
  outDir: './eval-out',
})

console.log({
  total: report.total,
  passed: report.passed,
  failed: report.failed,
})

if (report.failed > 0) process.exit(1)
```

The `outDir` receives one subdirectory per case with the full evidence stream and the result. Failures include the span that violated the expectation and the surrounding context.

## CI integration

```bash
npx tsx eval/run.ts
```

Wire this into CI as a separate job from unit tests. A failing eval is a regression on agent behavior, not on code shape.
## What to put in the eval suite

Promote the bypass tests from `@fuze-ai/agent`'s test suite:

- `lawful-basis-mismatch`: tools and basis disagree
- `policy-engine-error`: Cerbos throws
- `replay-attack`: resume token reuse
- `tampered-evidence`: byte flip in records
- `in-process-multi-tenant`: tenant isolation
- `dynamic-tool-no-metadata`: tool registered without metadata
- `secret-in-args`: `SecretRef` plaintext leak

Each maps to a regulatory obligation. A failure is the signal for an Art. 33/34 incident review.

---

## Full-interaction spans

Source: https://fuze-ai.tech/docs/spans/

`guard()` records tamper-evident step records for individual tool calls. The span API records the **whole interaction** (user input, retrieval, LLM messages, tool args, assistant output) into the same hash chain, with semantic roles, parent linkage, and optional inline content capture. It's the right primitive when you want a conversation timeline and aggregate optimization views (slow paths, stuck tools, retrieval quality) without adopting an agent framework.

## Three primitives

- **`run(opts, fn)`**: establishes an implicit run scope. Anything inside the callback inherits `runId` via `AsyncLocalStorage` (JS) or `contextvars` (Python). No threading required.
- **`span(opts)`**: records a leaf span at the current scope: role, optional captured content, optional `attrs` bag.
- **`traced(fn, opts)`**: wraps a function so its invocation becomes a span. Nested `traced`/`span` calls inherit `parentStepId` automatically.
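The implicit scoping described above can be illustrated with Node's `AsyncLocalStorage`. The sketch below is not the `fuze-ai` implementation: `runScope`, `recordSpan`, `tracedFn`, and the `Scope` shape are hypothetical names chosen to avoid confusion with the real `run`, `span`, and `traced` exports, and spans are pushed to an in-memory array rather than a hash chain.

```ts
import { AsyncLocalStorage } from 'node:async_hooks'
import { randomUUID } from 'node:crypto'

// Hypothetical scope and span shapes; the SDK's internals may differ.
interface Scope {
  runId: string
  parentStepId?: string
}

interface RecordedSpan {
  runId: string
  stepId: string
  parentStepId?: string
  role: string
}

const scopes = new AsyncLocalStorage<Scope>()
const recorded: RecordedSpan[] = []

// Establishes the run scope; everything awaited inside fn sees the same runId.
function runScope<T>(fn: () => Promise<T>): Promise<T> {
  return scopes.run({ runId: randomUUID() }, fn)
}

// Records a leaf span under whatever scope is current.
function recordSpan(role: string): string {
  const scope = scopes.getStore()
  if (!scope) throw new Error('recordSpan called outside runScope')
  const stepId = randomUUID()
  recorded.push({ runId: scope.runId, stepId, parentStepId: scope.parentStepId, role })
  return stepId
}

// Wraps fn so the call becomes a span and nested spans see it as their parent.
function tracedFn<A extends unknown[], R>(role: string, fn: (...args: A) => Promise<R>) {
  return (...args: A): Promise<R> => {
    const stepId = recordSpan(role)
    const scope = scopes.getStore() as Scope // recordSpan already checked it exists
    return scopes.run({ runId: scope.runId, parentStepId: stepId }, () => fn(...args))
  }
}

const search = tracedFn('retrieval', async (query: string) => {
  await new Promise((resolve) => setTimeout(resolve, 1)) // scope survives the await
  recordSpan('tool') // nested: inherits runId and parentStepId
  return [query]
})

const demo = runScope(async () => {
  recordSpan('user')
  await search('refund policy')
  return recorded
})

demo.then((spans) => console.log(spans.map((s) => s.role)))
```

No `runId` is passed as an argument anywhere: the nested `tool` span picks up both the run and its parent from the ambient store, which is the property the primitives above rely on.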
## Quickstart

### TypeScript

```typescript
import { run, span, traced, configure } from 'fuze-ai'

configure({ redactor }) // required only when capture === 'full+redact'

await run({ sessionId, userId, tenant }, async () => {
  await span({
    role: 'user',
    capture: 'full',
    content: { kind: 'text', text: userInput },
  })

  const hits = await traced(searchKnowledge, {
    role: 'retrieval',
    capture: 'full',
  })(query)

  const reply = await traced(callLLM, {
    role: 'llm',
    capture: 'full+redact',
  })(messages)

  await span({
    role: 'assistant',
    capture: 'full',
    content: { kind: 'text', text: reply },
  })
})
```

### Python

```python
from fuze_ai import run, span, traced, configure

configure({"redactor": redactor})  # required only for capture='full+redact'

async with run(session_id=..., user_id=..., tenant=...):
    await span(role='user', capture='full',
               content={'kind': 'text', 'text': user_input})

    hits = traced(search_knowledge, role='retrieval', capture='full')(query)
    reply = traced(call_llm, role='llm', capture='full+redact')(messages)

    await span(role='assistant', capture='full',
               content={'kind': 'text', 'text': reply})
```

## Roles

A span's `role` drives both dashboard rendering and the cross-run optimization queries.

| Role | When to use |
|---|---|
| `user` | The user's message that started this turn. |
| `assistant` | The model's final text reply to the user. |
| `system` | A system/setup span (e.g., context window injection). Optional; most apps don't emit these. |
| `llm` | A call to a language model. `attrs.model` is conventional. Content is typically `kind: 'messages'`. |
| `tool` | A tool invocation that isn't retrieval (writes, planners, browsers, etc.). |
| `retrieval` | Vector / hybrid / FTS / graph search. Content is `kind: 'retrieval'` with `query` and `results[]`; this is what powers the retrieval-quality dashboard view. |

## Capture modes

Capture is a deliberate per-span decision. Defaults to `hash` so existing code is unaffected.
| Mode | Behaviour |
|---|---|
| `hash` (default) | Tamper-evident only. No content stored. Same as legacy `guard()` records. |
| `full` | Inline raw content. Replayable but stored. Use for non-PII data (policy doc IDs, public planning text). |
| `full+redact` | Inline content **after** redaction. The unredacted form never enters the hash chain. **Fail-closed**: throws `FuzeError` if no `redactor` is configured. |
| `sampled` | Reserved for future sampling policies. |

The redactor is a single-method interface you supply at `configure(...)` time:

```typescript
configure({
  redactor: {
    redactContent(content) {
      // strip emails, phone numbers, etc. Return the modified shape.
      return content
    },
  },
})
```

There is no built-in redactor; the SDK stays neutral about what counts as PII for your domain.

## Content shapes

`content` is a discriminated union keyed by `kind`. Authoritative schema lives at [`data/trace-schema.json`](https://github.com/nericarcasci/fuze-ai/blob/main/data/trace-schema.json).

- **`{ kind: 'text', text }`**: for `user`, `assistant`, `system` spans.
- **`{ kind: 'messages', messages: [{ role, text }] }`**: for `llm` spans.
- **`{ kind: 'tool_call', args, result? }`**: auto-generated by `traced(fn)`; you rarely emit this manually.
- **`{ kind: 'retrieval', query, results: [{ docId, chunkId, score, cited?, snippet? }] }`**: for `retrieval` spans. The `cited` flag drives the retrieval-quality scorecard.

`attrs` is an open record for span-type-specific fields (`attrs.model` on LLM spans, `attrs.jurisdiction` on retrieval spans, etc.). Promote frequently-used keys to typed fields in a future revision rather than letting `attrs` grow load-bearing.

## What you get in the dashboard

Once spans are flowing, the cloud dashboard exposes two views you couldn't build before:

- **`/runs/:runId/timeline`**: a per-run conversation view. Spans render by role (chat bubbles for user/assistant, collapsible message arrays for LLM, score-tagged hits for retrieval).
  Indented by `parentStepId`. Errors get a red border.
- **`/optimization`**: five panels backed by SQL over the span table: stuck tool calls, runs above P95 step count, slow steps by `(role, tool_name)`, retrieval quality (cited vs uncited scores per jurisdiction), token hotspots. Each row drills into the timeline.

## Compliance gating

For tenants in Cloud or Daemon mode, the ingest endpoint **rejects** `capture !== 'hash'` unless `organisations.allow_content_capture = true`. The default is `false`, so content can never accidentally enter storage. Operators flip the gate explicitly once data-residency and DPA terms are in place.

## Common patterns

### Bracketing a conversation turn

```typescript
async function handleUserMessage(req) {
  await run({ sessionId: req.body.conversation_id, userId: req.user.id, tenant: req.org.id }, async () => {
    await span({ role: 'user', capture: 'full', content: { kind: 'text', text: req.body.message } })
    const reply = await runAgent(req.body.message) // tools + LLM inside use traced()/span()
    await span({ role: 'assistant', capture: 'full', content: { kind: 'text', text: reply } })
  })
}
```

### Recording a retrieval call with cited results

```typescript
const hits = await retrieve(query)

await span({
  role: 'retrieval',
  capture: 'full',
  attrs: { jurisdiction: 'dcc' },
  content: {
    kind: 'retrieval',
    query,
    results: hits.map((h) => ({
      docId: h.doc_id,
      chunkId: h.chunk_id,
      score: h.score,
      cited: citedChunkIds.has(h.chunk_id),
    })),
  },
})
```

### Wrapping a streaming LLM call

`traced()` records the span when the wrapped function settles. For streaming calls, record the span at stream-end:

```typescript
async function callLLM(messages) {
  const stream = await openai.chat.completions.create({ ..., stream: true })
  let text = ''
  let usage = null
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? ''
    if (chunk.usage) usage = chunk.usage
  }
  await span({
    role: 'llm',
    capture: 'full+redact',
    attrs: { model, finish_reason: 'stop' },
    content: { kind: 'messages', messages: [...messages, { role: 'assistant', text }] },
  })
  return text
}
```

## Compatibility

- `guard()`, `createRun()`, `guardMethod`, `guarded` continue to work unchanged. The span API is additive.
- Pre-v2 records (no `role`, no `capture`) validate against the new schema after defaults apply (`role='tool'`, `capture='hash'`).
- `verifyChain()` succeeds with mixed pre-v2 and v2 records in the same chain.

## Reference

- Wire schema: [`data/trace-schema.json`](https://github.com/nericarcasci/fuze-ai/blob/main/data/trace-schema.json)
- Design rationale: [`.context/proposal-full-spans.md`](https://github.com/nericarcasci/fuze-ai/blob/main/.context/proposal-full-spans.md)
- Parity rules (JS ↔ Python): [`.context/parity.md`](https://github.com/nericarcasci/fuze-ai/blob/main/.context/parity.md)

# SDK

---

## CrewAI

Source: https://fuze-ai.tech/docs/adapters/crewai/

Add Fuze protection to CrewAI tools through `FuzeToolMixin` for `BaseTool`, or batch-wrap existing tool instances.
## Installation

```bash
pip install fuze-ai crewai
```

## Usage

```python
from crewai.tools import BaseTool
from fuze_ai.adapters.crewai import FuzeToolMixin

class SearchTool(FuzeToolMixin, BaseTool):
    # BaseTool is a pydantic model, so overridden fields need type annotations
    name: str = "search"
    description: str = "Search the vector database"
    fuze_config = {
        "max_cost": 0.50,
        "max_retries": 3,
    }

    def _run(self, query: str) -> str:
        return vector_db.search(query)
```

## Side-effects

```python
class SendEmailTool(FuzeToolMixin, BaseTool):
    name: str = "send_email"
    description: str = "Send an email"
    fuze_config = {
        "side_effect": True,
    }

    def _run(self, to: str, body: str) -> str:
        return ses.send_email(to, body)

    def _compensate(self, result):
        ses.recall_email(result["message_id"])
```

## Wrapping existing tools

```python
from fuze_ai.adapters.crewai import fuze_crew_tools

tools = [SearchTool(), CalculatorTool(), EmailTool()]
guarded = fuze_crew_tools(tools, side_effects={"send_email": recall_email})
```

## Full example

```python
from crewai import Agent, Task, Crew

tools = fuze_crew_tools(
    [SearchTool(), EmailTool()],
    side_effects={"send_email": recall_email},
)

agent = Agent(
    role="Research Assistant",
    goal="Find and summarize information",
    tools=tools,
)

crew = Crew(agents=[agent], tasks=[Task(description="Research quarterly revenue", agent=agent)])
result = crew.kickoff()
```

Expected `fuze-traces.jsonl` excerpt:

```jsonl
{"event":"step.end","tool":"search","cost":0.008,"duration_ms":420}
{"event":"step.end","tool":"send_email","cost":0.0,"side_effect":true,"compensable":true}
```

---

## LangGraph

Source: https://fuze-ai.tech/docs/adapters/langgraph/

Add Fuze protection to a LangGraph project by wrapping tools registered with `ToolNode`; names, descriptions, and types are preserved.
## Installation

```bash
pip install fuze-ai langgraph
```

## Usage

```python
from fuze_ai.adapters.langgraph import fuze_tools
from langgraph.prebuilt import ToolNode

# Your existing tools
tools = [search_tool, calculator_tool, email_tool]

# Wrap all tools with Fuze protection
guarded_tools = fuze_tools(tools, config={
    "max_cost_per_step": 0.50,
    "max_iterations": 25,
})

# Use in your graph as normal
tool_node = ToolNode(guarded_tools)
```

## What it does

`fuze_tools()` wraps each tool's `_run` method with `@guard`, preserving the tool name, description, and input/output types, while adding loop detection, budget tracking, and audit logging.

## Marking side-effects

```python
guarded_tools = fuze_tools(tools, side_effects={
    "send_email": cancel_email,
    "create_record": delete_record,
})
```

## Per-tool configuration

```python
guarded_tools = fuze_tools(tools, per_tool={
    "search": {"max_cost": 0.10},
    "send_email": {"side_effect": True, "max_retries": 1},
})
```

## Full example

```python
from langgraph.graph import StateGraph, MessagesState
from langgraph.prebuilt import ToolNode
from fuze_ai.adapters.langgraph import fuze_tools

tools = [search, calculator, send_email]
guarded = fuze_tools(tools, side_effects={"send_email": recall_email})

graph = StateGraph(MessagesState)
graph.add_node("tools", ToolNode(guarded))
# ... rest of graph setup

result = graph.compile().invoke({"messages": [("user", "find recent revenue")]})
```

Expected output (from `fuze-traces.jsonl` after the run):

```jsonl
{"event":"step.start","tool":"search","run_id":"run_..."}
{"event":"step.end","tool":"search","cost":0.012,"tokens_in":18,"tokens_out":94}
{"event":"step.start","tool":"calculator","run_id":"run_..."}
{"event":"step.end","tool":"calculator","cost":0.0,"tokens_in":0,"tokens_out":0}
```

---

## Raw SDK

Source: https://fuze-ai.tech/docs/adapters/raw-sdk/

Use Fuze directly with the raw OpenAI or Anthropic SDK; no agent framework is required.
Wrap each tool with `guard()` and dispatch from the LLM's tool-call response.

## TypeScript

```typescript
import { guard } from 'fuze-ai'
import OpenAI from 'openai'

const client = new OpenAI()

const tools = {
  search: guard(async (query: string) => {
    return await vectorDb.search(query)
  }),
  send_email: guard(
    async (to: string, body: string) => {
      return await ses.sendEmail(to, body)
    },
    { sideEffect: true, compensate: recallEmail }
  ),
}

async function handleToolCall(toolCall: any) {
  const fn = tools[toolCall.function.name]
  const args = JSON.parse(toolCall.function.arguments)
  return await fn(...Object.values(args))
}
```

## Python

```python
from fuze_ai import guard
from anthropic import Anthropic

client = Anthropic()

@guard
def search(query: str):
    return vector_db.search(query)

@guard(side_effect=True, compensate=recall_email)
def send_email(to: str, body: str):
    return ses.send_email(to, body)

tools = {"search": search, "send_email": send_email}

def handle_tool_use(tool_use):
    fn = tools[tool_use.name]
    return fn(**tool_use.input)
```

## Batch wrapping

```python
from fuze_ai import guard_all

guarded = guard_all(
    {"search": search_fn, "calculate": calc_fn, "send_email": email_fn},
    side_effects=["send_email"]
)
```

```typescript
import { guardAll } from 'fuze-ai'

const guarded = guardAll(
  { search: searchFn, calculate: calcFn, sendEmail: emailFn },
  { sideEffects: ['sendEmail'] }
)
```

Run any of the snippets above and Fuze writes a JSONL trace per call to `./fuze-traces.jsonl`:

```jsonl
{"event":"step.end","tool":"search","cost":0.005,"tokens_in":12,"tokens_out":80}
{"event":"step.end","tool":"send_email","cost":0.0,"side_effect":true,"idempotency_key":"se_..."}
```

---

## Budget Enforcement

Source: https://fuze-ai.tech/docs/budget/

Fuze enforces hard budget ceilings per-step and per-run. A call that would exceed the ceiling is blocked before execution.

## How it works

1. Before each guarded call, Fuze estimates the cost based on the model and expected token usage.
2. If `estimated_cost + accumulated_cost > ceiling`, the call is **blocked before execution**.
3. After each call, the actual cost is recorded and accumulated.

```typescript
const search = guard(searchFn, {
  maxCost: 0.50, // Per-call override, or set via registerTools() defaults
})
```

```python
search = guard(search_fn, max_cost=0.50)  # Per-call override
```

## Configuration

```toml
[defaults]
max_cost_per_step = 1.00  # USD per individual call
max_cost_per_run = 10.00  # USD for the entire run
max_tokens = 100000       # Token ceiling per run
timeout = "30s"           # Time ceiling per call
```

## Budget types

| Type | Scope | What it limits |
|---|---|---|
| `max_cost_per_step` | Single function call | Prevents expensive individual operations |
| `max_cost_per_run` | Entire agent run | Total spend ceiling across all steps |
| `max_tokens` | Entire agent run | Total token usage |
| `timeout` | Single function call | Wall-clock time |

## Provider price registry

Fuze maintains a price registry for major LLM providers:

- OpenAI (GPT-4o, GPT-4, GPT-3.5)
- Anthropic (Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku)
- Google (Gemini Pro, Gemini Flash)

Override pricing for enterprise discounts:

```toml
[providers]
"openai/gpt-4o" = { input = 0.0020, output = 0.008 }
"anthropic/claude-3-5-sonnet" = { input = 0.003, output = 0.015 }
```

## Timer cleanup

When a `timeout` is configured, Fuze sets up an internal timer for each guarded call. On successful completion the timer is cleared, so no timers leak. If the timeout fires before the call completes, Fuze throws a `GuardTimeout` error.

## Error handling

Fuze throws specific error classes for budget violations so you can catch and inspect them programmatically.
```typescript
import { BudgetExceeded, GuardTimeout } from 'fuze-ai'

try {
  await guardedSearch('query')
} catch (err) {
  if (err instanceof BudgetExceeded) {
    console.log(err.accumulated) // Total spent so far
    console.log(err.ceiling)     // The ceiling that was hit
    console.log(err.estimated)   // Estimated cost of blocked call
  }
  if (err instanceof GuardTimeout) {
    // The guarded call exceeded its configured timeout
    console.log(err.message)
  }
}
```

```python
from fuze_ai import BudgetExceeded, GuardTimeout

try:
    guarded_search("query")
except BudgetExceeded as err:
    print(err.accumulated)  # Total spent so far
    print(err.ceiling)      # The ceiling that was hit
    print(err.estimated)    # Estimated cost of blocked call
except GuardTimeout as err:
    # The guarded call exceeded its configured timeout
    print(err)
```

| Error class | When thrown |
|---|---|
| `BudgetExceeded` | Estimated cost would push accumulated spend past the ceiling |
| `GuardTimeout` | A guarded call exceeded its configured `timeout` duration |

---

## Configuration

Source: https://fuze-ai.tech/docs/configuration/

Configure Fuze via `fuze.toml` in your project root, programmatically with `configure()`, or skip both: every setting has a sensible default.
## Full reference

```toml
[defaults]
max_retries = 3                       # Max retries per guarded function
timeout = "30s"                       # Per-call timeout
max_cost_per_step = 1.00              # USD ceiling per individual call
max_cost_per_run = 10.00              # USD ceiling for the entire run
max_iterations = 25                   # Hard iteration cap
on_loop = "kill"                      # "kill", "warn", or "skip"
trace_output = "./fuze-traces.jsonl"  # Path for local trace output

[loop_detection]
window_size = 5                # Number of recent outputs to compare
repeat_threshold = 3           # Consecutive identical outputs before triggering
max_flat_steps = 4             # Max steps with no cost change before flagging
cost_velocity_window = 60      # Window in seconds for cost velocity check
cost_velocity_threshold = 1.0  # USD/min threshold that triggers an alert

[cloud]
api_key = ""   # API key from app.fuze-ai.tech (or FUZE_API_KEY env var)
endpoint = ""  # Override default endpoint (self-hosted only)

[project]
project_id = "default"  # Project identifier (or FUZE_PROJECT_ID env var)

[daemon]
enabled = false                             # Enable local daemon for self-hosted deployments
socket_path = "/tmp/fuze-daemon.sock"       # Unix socket path
# socket_path = "\\\\.\\pipe\\fuze-daemon"  # Windows named pipe
api_port = 7821                             # HTTP API port for the daemon

[daemon.budget]
org_daily_budget = 100.00       # Org-wide daily spend ceiling (USD)
per_agent_daily_budget = 20.00  # Per-agent daily spend ceiling (USD)
alert_threshold = 0.80          # Alert at this fraction of ceiling

[daemon.alerts]
dedup_window_ms = 60000  # Suppress duplicate alerts within this window (ms)
webhook_urls = []        # List of webhook URLs for alert delivery

[providers]
# Override default pricing for enterprise discounts
# "openai/gpt-4o" = { input = 0.0020, output = 0.008 }

[compliance]
enabled = false
risk_level = "minimal"  # "minimal", "limited", or "high"
log_pii = false         # Anonymize personal data in traces
```

## Programmatic configuration

Use `configure()` as an alternative to `fuze.toml`.
Call it before any `guard()` or `createRun()` calls:

```typescript
import { configure } from 'fuze-ai'

configure({
  cloud: { apiKey: process.env.FUZE_API_KEY },
  project: { projectId: 'my-agent' },
  defaults: { maxCostPerRun: 5.00, maxIterations: 50 },
})
```

```python
from fuze_ai import configure
import os

configure(
    cloud={"api_key": os.getenv("FUZE_API_KEY")},
    project={"project_id": "my-agent"},
    defaults={"max_cost_per_run": 5.00, "max_iterations": 50},
)
```

Values set via `configure()` override `fuze.toml`. Per-function guard options override both.

## Configuration priority

Settings are merged in this order (last wins):

1. Built-in defaults: sensible values for all options
2. `fuze.toml`: project-level configuration
3. `configure()`: programmatic override
4. Dashboard tool config: per-tool overrides fetched from the Fuze cloud (see [Tools](/docs/tools))
5. Per-function options: `guard(fn, { maxCost: 0.50 })` has highest precedence

## Sections

### `[defaults]`

| Key | Type | Default | Description |
|---|---|---|---|
| `max_retries` | number | `3` | Maximum retry attempts |
| `timeout` | string | `"30s"` | Per-call timeout |
| `max_cost_per_step` | number | `1.00` | USD ceiling per call |
| `max_cost_per_run` | number | `10.00` | USD ceiling per run |
| `max_iterations` | number | `25` | Hard iteration cap |
| `on_loop` | string | `"kill"` | Behavior on loop detection: `"kill"`, `"warn"`, or `"skip"` |
| `trace_output` | string | `"./fuze-traces.jsonl"` | File path for local trace output |

### `[loop_detection]`

Controls how Fuze detects repetitive agent behavior.
| Key | Type | Default | Description | |---|---|---|---| | `window_size` | number | `5` | Number of recent outputs to compare | | `repeat_threshold` | number | `3` | Consecutive identical outputs before triggering | | `max_flat_steps` | number | `4` | Max steps with no cost change before flagging | | `cost_velocity_window` | number | `60` | Window in seconds for cost velocity check | | `cost_velocity_threshold` | number | `1.0` | USD/min threshold that triggers an alert | ### `[cloud]` Connects the SDK to Fuze Cloud for remote configuration and telemetry. Leave unset for free in-process-only mode. | Key | Type | Default | Description | |---|---|---|---| | `api_key` | string | `""` | API key from app.fuze-ai.tech. Can also be set via `FUZE_API_KEY` env var | | `endpoint` | string | `"https://api.fuze-ai.tech"` | Cloud API endpoint. Override only for self-hosted deployments | ### `[project]` | Key | Type | Default | Description | |---|---|---|---| | `project_id` | string | `"default"` | Project identifier shown in the dashboard. Can also be set via `FUZE_PROJECT_ID` env var | ### `[daemon]` Self-hosted only. Connects the SDK to a locally-running Fuze daemon for cross-process budget enforcement and audit logging without sending data to the cloud. 
| Key | Type | Default | Description | |---|---|---|---| | `enabled` | boolean | `false` | Enable daemon connection | | `socket_path` | string | platform default | UDS socket path (Unix) or named pipe (Windows) | | `api_port` | number | `7821` | HTTP API port for the daemon | ### `[daemon.budget]` | Key | Type | Default | Description | |---|---|---|---| | `org_daily_budget` | number | `100.00` | Organization-wide daily spend ceiling (USD) | | `per_agent_daily_budget` | number | `20.00` | Per-agent daily spend ceiling (USD) | | `alert_threshold` | number | `0.80` | Alert at this fraction of the ceiling | ### `[daemon.alerts]` | Key | Type | Default | Description | |---|---|---|---| | `dedup_window_ms` | number | `60000` | Suppress duplicate alerts within this window (ms) | | `webhook_urls` | array | `[]` | List of webhook URLs for alert delivery | ### `[compliance]` | Key | Type | Default | Description | |---|---|---|---| | `enabled` | boolean | `false` | Enable compliance features | | `risk_level` | string | `"minimal"` | AI system risk classification | | `log_pii` | boolean | `false` | Store raw args/results (GDPR warning) | --- ## Daemon Source: https://fuze-ai.tech/docs/daemon/ Run the optional Fuze daemon to add cross-run pattern detection, org-wide budget enforcement, kill switches, and persistent hash-chained audit storage. ## Starting the daemon ```bash npx fuze-ai daemon ``` The daemon listens on a Unix Domain Socket (or Windows named pipe) and exposes an HTTP API for the dashboard and external integrations. 
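Once the daemon is running, a quick liveness probe can confirm that the HTTP API is reachable. The sketch below is only an illustration: it assumes the daemon is on `localhost` at the documented default port `7821`, uses the `/api/health` path listed under API endpoints, and relies only on the HTTP status because the response body is not documented here. The names `daemonHealthUrl`, `isDaemonUp`, and `FetchLike` are hypothetical helpers, not part of the Fuze SDK.

```typescript
// Minimal shape of the response this sketch relies on: only the HTTP status flag.
type HealthResponse = { ok: boolean }
type FetchLike = (url: string) => Promise<HealthResponse>

// Build the health URL for a daemon on localhost (7821 is the documented default port).
function daemonHealthUrl(port: number = 7821): string {
  return `http://localhost:${port}/api/health`
}

// Resolves to true when the daemon answers with a 2xx status, and to false
// on any other status or on a connection error.
async function isDaemonUp(fetchImpl: FetchLike, port: number = 7821): Promise<boolean> {
  try {
    const res = await fetchImpl(daemonHealthUrl(port))
    return res.ok
  } catch {
    return false
  }
}
```

On Node 18 or later the global `fetch` can be passed in as `(url) => fetch(url)`; injecting it keeps the helper easy to test without a running daemon.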
## What the daemon adds | Capability | SDK only | SDK + Daemon | |---|---|---| | Per-run budget | Yes | Yes | | **Org-wide daily budget** | No | Yes | | **Per-agent daily budget** | No | Yes | | Loop detection | Per-run | Per-run + **cross-run patterns** | | Trace storage | JSONL file | **SQLite with hash-chained audit log** | | Kill switch | No | **Yes, via API or dashboard** | | Compensation/rollback | In-process only | **Persistent + API-triggered** | | Alerts | No | **Webhooks + dashboard WebSocket** | | Pattern analysis | No | **Repeated failures, cost spikes, reliability drops** | ## Architecture ``` ┌─────────────────┐ UDS/pipe ┌──────────────────────────────┐ │ Your Agent │ ──────────────── │ Fuze Daemon │ │ (fuze-ai) │ │ │ └─────────────────┘ │ ┌────────────┐ │ │ │ SQLite │ audit.db │ │ │ (hashed) │ │ │ └────────────┘ │ │ ┌────────────┐ │ │ │ConfigCache │ tool configs │ │ └────────────┘ │ │ REST API :7821 │ └──────────────────────────────┘ │ (optional, if FUZE_API_KEY set) ▼ api.fuze-ai.tech (hybrid mode) ``` The SDK communicates with the daemon over a Unix Domain Socket (Linux/macOS) or named pipe (Windows). The daemon stores all audit data in SQLite with SHA-256 hash chains for tamper detection. There is no embedded web dashboard. The REST API at `:7821` is for direct queries, custom dashboards, or automation. For a full web UI, use Cloud mode (set `FUZE_API_KEY`). ## Configuration ```toml [daemon] socket_path = "/tmp/fuze.sock" # Unix socket path api_port = 7821 # HTTP API port storage_path = "~/.fuze/audit.db" # SQLite database retention_days = 180 # Min 180 for EU AI Act Art.
19 [daemon.budget] org_daily_budget = 100.00 # Org-wide daily ceiling (USD) per_agent_daily_budget = 20.00 # Per-agent daily ceiling (USD) alert_threshold = 0.80 # Alert at 80% of ceiling [daemon.alerts] dedup_window_ms = 60000 # Suppress duplicate alerts within window webhook_urls = ["https://hooks.slack.com/..."] ``` On Windows, use a named pipe: ```toml [daemon] socket_path = "\\\\.\\pipe\\fuze.sock" ``` ## Tool config cache The daemon maintains a local config cache (`tool_config_cache` table in `audit.db`). When the SDK sends a `register_tools` message, the daemon stores default configs for any tool not already in the cache. The SDK reads tool configs synchronously from the cache on every `guard()` call, zero network latency on the hot path. ### Hybrid mode If `FUZE_API_KEY` is set in the daemon process environment, the daemon runs an additional background sync loop every 30 seconds that: 1. Pulls tool configs from `api.fuze-ai.tech/v1/tools/config` → writes to local cache 2. Pushes buffered telemetry to the cloud API This lets you run a local daemon (data stored on-prem) while still getting the cloud dashboard for visibility. The SDK talks only to the daemon, it never calls the cloud API directly in daemon mode. ## API endpoints | Method | Path | Description | |---|---|---| | `GET` | `/api/health` | Daemon liveness check | | `GET` | `/api/runs` | Paginated run list with filters | | `GET` | `/api/runs/:id` | Single run with steps and events | | `POST` | `/api/runs/:id/kill` | Kill an active run | | `GET` | `/api/runs/:id/compensation` | Compensation records for a run | | `POST` | `/api/runs/:id/rollback` | Trigger manual rollback | | `GET` | `/api/budget` | Org and per-agent spend | | `GET` | `/api/agents/:id/health` | Agent reliability stats | | `GET` | `/api/compliance/report/:id` | EU AI Act incident report | | `WS` | `/ws` | Live alerts stream | ## Audit integrity Every record in the SQLite database is hash-chained using SHA-256. 
The daemon maintains four independent chains: 1. **Runs**, immutable fields hashed at insertion 2. **Steps**, every tool call recorded and chained 3. **Guard events**, loop detections, budget blocks, kills 4. **Compensation records**, rollback actions and outcomes Verify chain integrity: ```bash # Via API curl http://localhost:7821/api/compliance/report/RUN_ID ``` The `verifyHashChain()` method checks all four chains and reports the first broken record if tampering is detected. ## Data retention The daemon automatically purges data older than `retention_days`. Purging cascades across all related tables: - Runs - Steps - Guard events - Compensation records - Idempotency keys This ensures no orphaned records remain after purge. ## Resource management The daemon caps in-memory ended runs at 1,000 entries (FIFO eviction) to prevent unbounded memory growth during long-running deployments. Active runs are always retained. --- ## Dashboard Source: https://fuze-ai.tech/docs/dashboard/ The Fuze dashboard at [app.fuze-ai.tech](https://app.fuze-ai.tech) gives you live run monitoring, tool analytics, workflow patterns, and compliance reporting. Set `FUZE_API_KEY` and the SDK streams telemetry automatically. ## Views ### Live Runs Real-time view of all active and recent agent runs. Shows: - **Activity-based status**, runs are shown as `active` (last event < 5 min), `idle` (< 1 hr), or `stale` (> 1 hr) without requiring an explicit `run.end()` call - Cost accumulation per run with step count - Filter by status, search by agent ID - One-click kill switch for any active run This model matches conversational agents where "completion" is not a well-defined event, a ChatGPT-style session stays active for as long as the user keeps messaging. 
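The activity-based status rule can be expressed as a small pure function. This is a sketch of the thresholds described above, not the dashboard's actual implementation; `classifyRun` and `RunStatus` are hypothetical names, and treating a gap of exactly one hour as `stale` is an assumption, since the text only specifies the ranges below and above one hour.

```typescript
type RunStatus = 'active' | 'idle' | 'stale'

const FIVE_MINUTES_MS = 5 * 60 * 1000
const ONE_HOUR_MS = 60 * 60 * 1000

// Classify a run purely from the time elapsed since its last event.
// No explicit run.end() is needed: the status follows from inactivity.
function classifyRun(lastEventAt: Date, now: Date = new Date()): RunStatus {
  const elapsedMs = now.getTime() - lastEventAt.getTime()
  if (elapsedMs < FIVE_MINUTES_MS) return 'active' // last event under 5 minutes ago
  if (elapsedMs < ONE_HOUR_MS) return 'idle'       // between 5 minutes and 1 hour
  return 'stale'                                   // 1 hour or more (boundary is assumed)
}
```

Passing `now` explicitly keeps the function deterministic, which is convenient when the same rule has to be checked in tests or applied to historical data.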
### Trace Replay Step-by-step replay of any run: - Timeline of all steps with tool names, latency, and cost - Side-effect and compensation indicators - Guard event details (loops detected, budget blocks, kill-switch activations) - Full args hash and token counts per step ### Tools Per-tool analytics and remote configuration: - **Stats table**, call count, total cost, avg cost, avg latency, P95 latency, failure rate for every registered tool - **Hero cards**, most expensive tool, most-called tool, highest failure rate - **Expand a row** to see the configured budget, retries, and timeout - **Configure button**, edit `maxRetries`, `maxBudget`, `timeout`, and `enabled` state directly. Changes take effect within 30 seconds, no redeployment required See [Tools](/docs/tools) for the SDK side of this feature. ### Workflows Server-side analysis of tool call patterns: - **Tool Chains**, single-interaction sequences detected via n-gram analysis. If your agent calls `search → retrieve → summarize` in 60% of runs, that chain appears here - **Recurring Patterns**, cross-run recurring sequences. Fuze infers interaction boundaries from timing gaps and identifies tool sequences that recur across ≥ 40% of runs - **Friction Points**, tools flagged for high failure rate (> 20%) or unusually high cost (> 3× average). These are candidates for budget or timeout tuning Trigger a fresh analysis with the **Run analysis** button, or wait for the automatic 6-hour background job. 
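To make the n-gram idea behind the Workflows view concrete, the sketch below counts how many runs contain each contiguous tool sequence of length `n` and keeps the sequences that reach a minimum share of runs. It is an illustration under stated assumptions rather than the server-side algorithm: the function names are hypothetical, each run is reduced to a plain list of tool names, and interaction-boundary inference from timing gaps is left out.

```typescript
// Collect the distinct n-grams (contiguous tool sequences of length n) in one run.
function ngramsOf(calls: string[], n: number): Set<string> {
  const grams = new Set<string>()
  for (let i = 0; i + n <= calls.length; i++) {
    grams.add(calls.slice(i, i + n).join(' → '))
  }
  return grams
}

// Return every n-gram together with the share of runs that contain it,
// keeping only those at or above minShare, sorted by share (highest first).
function recurringChains(
  runs: string[][],
  n: number,
  minShare: number
): Array<{ chain: string; share: number }> {
  const runCounts = new Map<string, number>()
  runs.forEach((run) => {
    // A chain is counted once per run, however often it repeats inside that run.
    ngramsOf(run, n).forEach((gram) => {
      runCounts.set(gram, (runCounts.get(gram) ?? 0) + 1)
    })
  })
  const result: Array<{ chain: string; share: number }> = []
  runCounts.forEach((count, chain) => {
    const share = count / runs.length
    if (share >= minShare) result.push({ chain, share })
  })
  return result.sort((a, b) => b.share - a.share)
}
```

With `n = 3` and `minShare = 0.4`, a `search → retrieve → summarize` sequence that occurs in three of five runs would be reported, while a sequence seen in a single run would not.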
### Budget Org-wide and per-agent spend tracking: - Daily spend with trend visualization - Per-agent breakdown - Budget ceiling indicators ### Agent Health Per-agent reliability metrics: - Success rate, total runs, average cost per run - Failure hotspot detection (which tool fails most for each agent) ### Compliance Panel EU AI Act compliance checklist: - Hash chain integrity status - Audit log coverage and retention period - Human oversight controls (kill switches) - Risk classification status ### Systems Per-AI-system inventory. Each registered system carries a provider-vs-deployer classification, risk tier (minimal / limited / high), intended purpose, and the data categories it processes. Evidence from runs, guard events, and questionnaires is attached to the system, which is what the Art. 12/14/15 conformity-assessment export packages up. ### Vendor VRA Vendor risk-assessment auto-responder. Paste a questionnaire (one question per line), click **Auto-respond**, and each question is matched against a built-in seed corpus plus any custom entries under **Settings → VRA corpus**. Every draft carries a confidence score and one of `approved / drafted / needs_review / unanswered`; low-confidence answers never auto-submit. Approved answers are editable inline and tracked per questionnaire (`draft / in_review / sent / archived`). ## Team and roles (RBAC) Members hold one of five roles, checked server-side on every mutating call: | Role | Scope | |---|---| | **owner** | everything, plus organisation deletion | | **admin** | team, settings, billing, compliance writes, run writes, audit read | | **member** | compliance writes, run writes, audit read | | **viewer** | audit read only | | **billing** | billing and audit read only | Role changes go through **Settings → Team** and are written to the admin audit log along with actor, target, outcome, IP, and user-agent. The client never carries authority: viewers can load the UI, but the API rejects any write they attempt. 
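The role table can be read as a permission matrix that the server consults on every mutating call. The sketch below illustrates that idea only. Apart from `settings:manage`, which appears elsewhere in this documentation, the permission strings are hypothetical labels chosen for this example, and `can` is not a Fuze API.

```typescript
type Role = 'owner' | 'admin' | 'member' | 'viewer' | 'billing'

// Hypothetical permission names, one per capability in the role table.
type Permission =
  | 'org:delete'
  | 'team:manage'
  | 'settings:manage'
  | 'billing:manage'
  | 'compliance:write'
  | 'runs:write'
  | 'audit:read'

const ROLE_PERMISSIONS: Record<Role, Permission[]> = {
  // Owners hold every permission, including organisation deletion.
  owner: ['org:delete', 'team:manage', 'settings:manage', 'billing:manage', 'compliance:write', 'runs:write', 'audit:read'],
  admin: ['team:manage', 'settings:manage', 'billing:manage', 'compliance:write', 'runs:write', 'audit:read'],
  member: ['compliance:write', 'runs:write', 'audit:read'],
  viewer: ['audit:read'],
  billing: ['billing:manage', 'audit:read'],
}

// Server-side check: the decision depends only on the role stored for the
// caller, never on anything the client claims about itself.
function can(role: Role, permission: Permission): boolean {
  return ROLE_PERMISSIONS[role].indexOf(permission) !== -1
}
```

A request handler would call `can(member.role, 'runs:write')` before performing a write and reject the request otherwise, which is how a viewer can load the UI and still have every write refused.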
## Admin audit log Every privileged action (role change, invite, retention update, OTEL configuration, account export, account erasure) is recorded to `admin_audit_log` with actor UID, org, target type/id, outcome, and request metadata. The log is append-only and surfaced to owners and admins; it is also included in the `GET /account/export` payload. ## OTEL export Forward completed runs as OTLP/HTTP JSON traces to your own observability backend (Datadog, Honeycomb, Grafana Cloud, any OTLP receiver). Configure under **Settings → OTEL export**: - Endpoint URL (plain-http endpoints are rejected in production) - Per-header auth (exporter headers are encrypted at rest with AES-256-GCM; the master key is held outside the database) - **Test connection** sends a synthetic span so you see failures before enabling export - `last_export_at` and `last_error` are surfaced on the same page Export is opt-in per organisation, and its configuration can be changed only with the `settings:manage` permission. ## Account export and erasure Under **Settings → Retention** (owners only): - **Export**, downloads a JSON bundle of the organisation, members, projects, runs, steps, guard events, retention policies, billing invoices, and the admin audit log - **Erase**, requires typing `ERASE` to confirm. Deletes guard events, steps, runs, alert deliveries, retention policies, API keys, projects, and members, then anonymises the organisation row and timestamps `erased_at`. The erasure itself is recorded in the admin audit log with per-table status. For platform-stored data, Export addresses GDPR Article 15 (right of access) and Erase addresses Article 17 (right to erasure). Managed-backup purge follows the sub-processor's schedule (documented in the trust pack). ## Deployment modes The cloud dashboard is only available in Cloud mode. Daemon mode exposes a REST API at `:7821` for direct queries but has no web UI.
| Mode | Config | Web UI | Audit storage | |---|---|---|---| | **Cloud** | `FUZE_API_KEY` env var | [app.fuze-ai.tech](https://app.fuze-ai.tech) | Supabase (cloud) | | **Daemon** | `daemon.enabled = true` | REST API at `:7821`, no web UI | SQLite on-prem | | **Standalone** | no config | None | `fuze-traces.jsonl` | All three modes preserve the same `guard()` / `createRun()` API. See [Deployment Modes](/docs/deployment-modes) for a full comparison and setup guide. ## Generating compliance reports Navigate to any run's trace replay and click **Generate Report** to produce an EU AI Act Art. 12 compliant incident report. The report includes: - Full step trace with timestamps and costs - Side-effect inventory - Compensation actions and outcomes - Guard events with severity classification - Human oversight status - Audit chain integrity verification --- ## Deployment Modes Source: https://fuze-ai.tech/docs/deployment-modes/ Pick from three deployment modes, Standalone, Daemon, or Cloud, sharing the same `guard()` / `createRun()` API. Switching modes is config, not code. ## Choosing a mode | | Standalone | Daemon | Cloud | |---|---|---|---| | **Setup** | Nothing | `npx fuze-ai daemon` | `FUZE_API_KEY` | | **Protection** (guard, budget, loop detection) | Yes | Yes | Yes | | **Audit trail** | JSONL file | SQLite (hash-chained) | Supabase (cloud) | | **Dashboard** | No | No | [app.fuze-ai.tech](https://app.fuze-ai.tech) | | **Remote tool config** | No | Via API key (hybrid) | Yes | | **Cross-run analytics** | No | SQLite | Yes | | **Kill switch** | No | Via daemon REST API | Yes (dashboard) | | **Cost** | Free | Free | Paid | **Use Standalone** when you're developing locally, running in CI, or don't need persistent storage. Traces write to `fuze-traces.jsonl`. **Use Daemon** when you need persistent cross-run storage, on-prem data (nothing leaves the machine), or air-gapped deployments. Add `FUZE_API_KEY` to get the daemon syncing to cloud as well. 
**Use Cloud** when you want the full dashboard, live run monitoring, tool analytics, workflow patterns, compliance reports, and remote tool configuration from the UI. ## How mode selection works The SDK selects a mode automatically based on config, no API to call: ```typescript import { configure } from 'fuze-ai' // Cloud mode, set FUZE_API_KEY env var, or: configure({ cloud: { apiKey: 'fz_...' } }) // Daemon mode configure({ daemon: { enabled: true } }) // Standalone, default, no configure() needed ``` ```python from fuze_ai import configure # Cloud mode, set FUZE_API_KEY env var, or: configure(cloud={"api_key": "fz_..."}) # Daemon mode configure(daemon={"enabled": True}) # Standalone, default, no configure() needed ``` Priority when multiple are configured: **Cloud > Daemon > Standalone**. ## Cloud mode ``` Your agent │ (guard / createRun) │ ▼ fuze-ai SDK (ApiService) │ HTTPS batched to api.fuze-ai.tech │ ▼ Supabase (runs, steps, tool configs) │ ▼ app.fuze-ai.tech dashboard ``` The SDK batches telemetry and ships it to `api.fuze-ai.tech`. Tool configs flow the other direction, the SDK pulls them every 30s so dashboard edits (change a budget, disable a tool) take effect without redeployment. **Setup:** 1. Sign up at [app.fuze-ai.tech](https://app.fuze-ai.tech) and create a project 2. Copy your API key 3. `export FUZE_API_KEY=fz_...` (or set in `fuze.toml`) 4. Call `registerTools()` at startup so your tools appear in the dashboard See [Tools & Remote Config](/docs/tools) and [Dashboard](/docs/dashboard). ## Daemon mode ``` Your agent │ (guard / createRun) │ ▼ fuze-ai SDK (DaemonService) │ Unix Domain Socket / Windows named pipe │ ▼ Fuze Daemon process ├── SQLite (hash-chained audit.db) ├── ConfigCache (tool configs) └── REST API at :7821 GET /api/runs GET /api/runs/:id POST /api/runs/:id/kill GET /api/budget WS /ws (alerts) ``` The daemon stores everything locally. There is no web UI, the REST API at `:7821` is for direct queries or custom dashboards. 
If you add `FUZE_API_KEY` to the daemon process environment, it also syncs configs and telemetry to the cloud API (hybrid mode). **Setup:** ```bash npx fuze-ai daemon & # or run as a service ``` ```toml # fuze.toml [daemon] enabled = true socket_path = "/tmp/fuze.sock" api_port = 7821 storage_path = "~/.fuze/audit.db" retention_days = 180 ``` See [Daemon](/docs/daemon). ## Standalone mode No daemon, no API key. The SDK runs entirely in-process. Protection (guard, budget, loop detection) works the same. Audit output goes to `fuze-traces.jsonl` in the current directory. No configuration needed: ```typescript import { createRun } from 'fuze-ai' const run = createRun('my-agent', { maxCostPerRun: 2.00 }) // Everything works, guard, budget, loop detection // Traces written to fuze-traces.jsonl ``` ```python from fuze_ai import create_run run = create_run("my-agent", max_cost_per_run=2.00) # Everything works, guard, budget, loop detection # Traces written to fuze-traces.jsonl ``` This is the right default for development and CI. --- ## Examples Source: https://fuze-ai.tech/docs/examples/ Self-contained programs that exercise each core capability. Run any example and inspect the resulting `fuze-traces.jsonl` to see what was recorded. ## TypeScript ### 01, Basic Guard The simplest possible example. Wrap a function with `guard()` and see the trace output. ```typescript import { guard } from 'fuze-ai' const search = guard(async function searchDocuments(query: string) { return await vectorDb.search(query) }) const results = await search('AI agent safety') // Check ./fuze-traces.jsonl for the trace ``` ```bash cd examples/typescript/01-basic-guard && npm install && npx tsx index.ts ``` ### 02, Budget Ceiling Set a $1.00 budget. Each step costs $0.30. The 4th call gets blocked. 
```typescript import { guard, configure, BudgetExceeded } from 'fuze-ai' configure({ defaults: { maxCostPerRun: 1.00 } }) const analyse = guard(analyseFn, { maxCost: 0.30 }) try { await analyse('doc') } catch (err) { if (err instanceof BudgetExceeded) { // Budget ceiling hit, step blocked before execution } } ``` ```bash cd examples/typescript/02-budget-ceiling && npm install && npx tsx index.ts ``` ### 03, Loop Detection Simulate an agent stuck retrying the same failed search. Fuze catches it after 3 identical calls. ```typescript import { guard, configure, LoopDetected } from 'fuze-ai' configure({ defaults: { maxIterations: 20, onLoop: 'kill' }, loopDetection: { repeatThreshold: 3 }, }) const search = guard(searchFn) // 3rd identical call throws LoopDetected ``` ```bash cd examples/typescript/03-loop-detection && npm install && npx tsx index.ts ``` ### 04, Side-Effect Tracking Create an invoice (side-effect with compensation), then fail on email send. Shows how Fuze tracks which steps need rollback. ```typescript const invoice = guard(createInvoice, { sideEffect: true, compensate: cancelInvoice, }) ``` ```bash cd examples/typescript/04-side-effects && npm install && npx tsx index.ts ``` ### 05, Multi-Agent Two agents (researcher + writer) share a single budget using `createRun()`. ```typescript import { createRun } from 'fuze-ai' const run = createRun('research-team', { maxCostPerRun: 5.00 }) const search = run.guard(webSearch, { maxCost: 0.10 }) const draft = run.guard(writeDraft, { maxCost: 1.00 }) await search('query') await draft('outline') console.log(run.getStatus()) // { totalCost, stepCount, ... } await run.end() ``` ```bash cd examples/typescript/05-multi-agent && npm install && npx tsx index.ts ``` ### 06, MCP Proxy Protect any MCP server with zero code changes. 
```bash # Before: unprotected npx @modelcontextprotocol/server-postgres postgres://localhost/mydb # After: Fuze protection npx fuze-ai proxy -- npx @modelcontextprotocol/server-postgres postgres://localhost/mydb ``` See the [full MCP Proxy docs](/docs/mcp-proxy) for configuration options. ## Python ### 01, Basic Guard ```python from fuze_ai import guard @guard async def search(query: str) -> list[str]: return await vector_db.search(query) ``` ### 02, Budget Ceiling ```python from fuze_ai import guard, configure configure(defaults={'max_cost_per_run': 1.00}) @guard(max_cost=0.30) async def analyse(text: str) -> str: return f'Analysis of {text}' ``` ### 03, Loop Detection ```python from fuze_ai import guard, configure configure( defaults={'max_iterations': 20}, loop_detection={'repeat_threshold': 3}, ) @guard async def search(query: str) -> str: return 'No results found.' ``` ### 04, Side-Effects ```python from fuze_ai import guard @guard(side_effect=True, compensate=cancel_invoice) async def create_invoice(customer_id: str, amount: float) -> dict: return {'invoice_id': f'INV-{customer_id}'} ``` ### 05, LangGraph Adapter ```python from fuze_ai.adapters.langgraph import fuze_tool @fuze_tool(max_cost=0.10) def search_web(query: str) -> str: return f'Results for: {query}' ``` ## Running all examples ```bash # TypeScript cd examples/typescript/01-basic-guard && npm install && npx tsx index.ts # Python cd examples/python/01-basic-guard && pip install fuze-ai && python main.py ``` Every example produces a `fuze-traces.jsonl` file. Each line is a JSON record with timestamps, costs, and guard events. --- ## guard() API Source: https://fuze-ai.tech/docs/guard/ `guard()` wraps any function with runtime safety: budget, loop detection, side-effect tracking, and audit logging.
## Basic usage ```typescript import { guard } from 'fuze-ai' // Basic, defaults from registerTools() and fuze.toml const search = guard(async function search(query: string) { return await vectorDb.search(query) }) // Specify the model for automatic cost tracking const generateSummary = guard( async function generateSummary(docs: string[]) { return await openai.chat.completions.create({ model: 'gpt-4o', messages: [{ role: 'user', content: docs.join('\n') }], }) }, { model: 'openai/gpt-4o' } ) // Side-effect with compensation const sendEmail = guard( async function sendEmail(to: string, body: string) { return await ses.sendEmail(to, body) }, { sideEffect: true, compensate: async (result) => { await ses.recallEmail(result.messageId) }, } ) ``` ```python from fuze_ai import guard # Basic, defaults from register_tools() and fuze.toml @guard def search(query: str): return vector_db.search(query) # Specify the model for automatic cost tracking @guard(model="openai/gpt-4o") def generate_summary(docs: list): return openai.chat.completions.create( model="gpt-4o", messages=[{"role": "user", "content": "\n".join(docs)}] ) # Side-effect with compensation @guard(side_effect=True, compensate=recall_email) def send_email(to: str, body: str): return ses.send_email(to, body) ``` Operational defaults (`maxRetries`, `timeout`, `maxBudget`) come from `registerTools()` and can be tuned from the dashboard without redeploying. See [Tools & Remote Config](/docs/tools). ## Options All options are optional. Defaults come from `fuze.toml` and `registerTools()`, then from dashboard tool config (see [Tools](/docs/tools)). Use guard options for things that describe the tool's nature (`sideEffect`, `compensate`, `model`), not for operational config that belongs in `registerTools()`. 
| Option | Type | Default | Description | |---|---|---|---| | `maxRetries` | `number` | `3` | Max retry attempts for this function | | `timeout` | `number` | `30000` | Timeout in milliseconds | | `maxCost` | `number` | `∞` | Max cost in USD for this call. Overrides dashboard config when lower | | `maxTokens` | `number` | `undefined` | Max tokens for this call | | `maxIterations` | `number` | `25` | Hard iteration cap for this call | | `onLoop` | `'kill' \| 'warn' \| 'skip'` | `'kill'` | Behavior when a loop is detected | | `model` | `string` | `undefined` | Model identifier for auto cost extraction (e.g. `'openai/gpt-4o'`) | | `costExtractor` | `Function` | `undefined` | Custom function to read `{ tokensIn, tokensOut }` from the return value | | `sideEffect` | `boolean` | `false` | Whether this call has real-world consequences | | `compensate` | `Function` | `undefined` | Rollback function called on failure | ## Remote config override When the SDK is connected to the Fuze cloud (`FUZE_API_KEY` set) or daemon, tool configurations set from the dashboard are applied at call time, **without redeploying your code**. The override logic is: - `maxRetries` and `timeout` from the dashboard replace the locally configured defaults (`fuze.toml`, `configure()`, `registerTools()`) entirely - For `maxBudget`, the effective limit is the **minimum** of the local and dashboard values (the tighter limit always wins) - If a tool is set to `enabled: false` in the dashboard, the call throws `FuzeError` immediately Per-function guard options take the highest precedence and are never overridden remotely. ## Return value `guard()` returns a new function with the same signature. Call it exactly as you would the original.
```typescript const guardedSearch = guard(originalSearch) // Same signature, same return type const results = await guardedSearch('my query') ``` ## Guard events When Fuze intervenes, it throws typed errors. Check the specific error types first so that the general `FuzeError` branch runs only when none of them matched: ```typescript import { BudgetExceeded, LoopDetected, GuardTimeout, FuzeError } from 'fuze-ai' try { await guardedSearch('query') } catch (err) { if (err instanceof BudgetExceeded) { // Budget ceiling hit } else if (err instanceof LoopDetected) { // Loop pattern detected } else if (err instanceof GuardTimeout) { // Per-call timeout exceeded } else if (err instanceof FuzeError) { // Tool disabled via remote config, or kill-switch activated } else { throw err // Not a Fuze error: rethrow } } ``` | Error | Thrown when | |---|---| | `BudgetExceeded` | Cost ceiling reached for the call or run | | `LoopDetected` | Repeated output pattern detected | | `GuardTimeout` | The guarded function exceeded its `timeout` | | `FuzeError` | Tool is disabled remotely, or a kill switch was activated | ## `createRun()` API `createRun()` creates a scoped run context that shares budget and loop state across multiple steps. Use it when a single logical task spans several guarded calls and you want aggregate limits to apply.
```typescript import { createRun } from 'fuze-ai' const run = createRun('research-agent', { maxCostPerRun: 2.00, maxIterations: 50 }) // Wrap functions using the run's own guard, they share the run's budget const search = run.guard(async function search(q: string) { return vectorDb.search(q) }) const summarize = run.guard(async function summarize(docs: string[]) { return llm.summarize(docs) }) const docs = await search('climate policy') const summary = await summarize(docs) // Check accumulated cost at any point const { totalCost, stepCount } = run.getStatus() // run.end() is optional, runs without an explicit end are shown as ongoing in the dashboard await run.end() ``` ```python from fuze_ai import create_run run = create_run("research-agent", max_cost_per_run=2.00, max_iterations=50) @run.guard def search(q: str): return vector_db.search(q) @run.guard def summarize(docs: list): return llm.summarize(docs) docs = search("climate policy") summary = summarize(docs) ``` Each step within the run draws from the shared `maxCostPerRun` and `maxIterations`. If any step pushes the run over its ceiling, a `BudgetExceeded` error is thrown. `run.end()` is optional. If omitted, the run appears as **idle** in the dashboard once activity stops for more than 5 minutes, and **stale** after an hour. This matches conversational agents where "completion" is not a defined event. ## Auto cost extraction Fuze automatically reads token counts from the return value of guarded functions. It recognises the response shapes of all major providers (OpenAI, Anthropic, Google, Cohere, Mistral, Groq, Together, AWS Bedrock). When a `model` is set, it uses the built-in price table to convert tokens to USD. 
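The token-to-USD step is simple arithmetic once a price table is available. The sketch below shows the shape of that calculation only: the function and type names are hypothetical, the model identifier and prices are made up, and quoting prices per 1,000 tokens is an assumption, since the unit of Fuze's built-in price table is not stated here.

```typescript
// Prices in USD per 1,000 tokens. The unit and the numbers are illustrative
// assumptions, not Fuze's actual price table.
interface ModelPrice { input: number; output: number }
interface TokenUsage { tokensIn: number; tokensOut: number }

const EXAMPLE_PRICES: Record<string, ModelPrice> = {
  'example-provider/example-model': { input: 0.002, output: 0.008 },
}

// Convert a token usage record to USD, or return null when the model has
// no entry in the price table.
function tokensToUsd(
  model: string,
  usage: TokenUsage,
  prices: Record<string, ModelPrice> = EXAMPLE_PRICES
): number | null {
  const price = prices[model]
  if (price === undefined) return null
  return (usage.tokensIn / 1000) * price.input + (usage.tokensOut / 1000) * price.output
}
```

Under these assumed prices, 1,000 input tokens and 500 output tokens come to roughly $0.006, which is the kind of per-step figure that accumulates against `maxCostPerRun`.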
Provide a custom `costExtractor` only when your provider returns tokens in a non-standard format: ```typescript import { guard } from 'fuze-ai' const call = guard( async (prompt: string) => myCustomLlm.generate(prompt), { model: 'my-provider/my-model', costExtractor: (result) => ({ tokensIn: result.usage.prompt_tokens, tokensOut: result.usage.completion_tokens, }), } ) ``` Return `null` from `costExtractor` to fall back to the pre-flight token estimate. ## Timer cleanup Fuze clears its internal `setTimeout` timers after every guarded call, whether it succeeds or fails, using a `.finally(() => clearTimeout(timer))` pattern. This means: - **No resource leaks**, timers are cleared whether the guarded function resolves or rejects. - **Safe for long-running processes**, you can wrap thousands of calls without accumulating dangling timers. --- ## Introduction Source: https://fuze-ai.tech/docs/introduction/ Fuze is runtime safety middleware for AI agents. Wrap any framework (LangGraph, CrewAI, Google ADK, or the raw OpenAI/Anthropic SDKs) to add loop detection, budget enforcement, side-effect tracking, and EU AI Act-compliant audit trails. ## What Fuze does Fuze is **not a framework**. It's a middleware layer that wraps your existing agent tool calls: - Loop detection: five layered strategies aimed at repetitive tool calls, semantic stalls, and ping-pong patterns (see [Loop Detection](/docs/loop-detection) for which layers are enforced today) - Budget enforcement: hard token, cost, and time ceilings per run with kill switches - Side-effect tracking: knows which tool calls changed the real world, with compensation functions for rollback - Audit trails: full replayable trace of every agent decision, Art. 12 compliant - Smart recovery: retry with modified prompt, rollback to checkpoint, fork to alternate path, or escalate to human - MCP Proxy: wrap any MCP server with zero code changes ## How it works Wrap any tool function with `guard()`, or group multiple steps into a run with `createRun()`. Fuze handles the rest.
```typescript import { createRun } from 'fuze-ai' async function search(query: string) { return await vectorDb.search(query) } const run = createRun('my-agent', { maxCostPerRun: 5.00, maxIterations: 50 }) const guardedSearch = run.guard(search) const results = await guardedSearch('quarterly revenue') ``` ```python from fuze_ai import create_run async def search(query: str): return await vector_db.search(query) run = create_run("my-agent", max_cost_per_run=5.00, max_iterations=50) guarded_search = run.guard(search) results = await guarded_search("quarterly revenue") ``` That's it. Every guarded function and every run has loop detection, budget tracking, and audit logging built in. ## Deployment modes | Mode | Dashboard | Infrastructure | |---|---|---| | **Standalone** | No | None, just `npm install` | | **Daemon** | REST API at `:7821` | One background process | | **Cloud** | [app.fuze-ai.tech](https://app.fuze-ai.tech) | `FUZE_API_KEY` env var | ## Next steps - [Quickstart](/docs/quickstart), get running in 30 seconds - [Deployment Modes](/docs/deployment-modes), choose Cloud, Daemon, or Standalone - [guard() API](/docs/guard), full API reference - [Configuration](/docs/configuration), `fuze.toml` reference - [Examples](/docs/examples), end-to-end usage examples --- ## Loop Detection Source: https://fuze-ai.tech/docs/loop-detection/ Fuze layers five detection strategies to catch stuck agents. Layers 1 and 2 are enforced today; Layers 3-5 accept configuration but are not yet active. ## Layer 1: Hard iteration cap The simplest check. If a function has been called more than `maxIterations` times in a run, kill it. ```toml [defaults] max_iterations = 25 ``` Catches: simple infinite loops, unbounded recursion. ## Layer 2: Tool+args hash dedup Fuze hashes the function name and arguments for each call. If the same hash appears `repeat_threshold` or more times within a sliding window of the last `window_size` calls, Fuze treats it as a loop.
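A minimal sketch of the Layer 2 check, under a few assumptions: the class name `RepeatDetector` is hypothetical, a serialized `[name, args]` string stands in for the real hash, and the detector flags the call on which the count reaches the threshold, which matches the Loop Detection example where the third identical call throws at `repeatThreshold: 3`. Fuze's internal implementation may differ.

```typescript
class RepeatDetector {
  private readonly windowSize: number
  private readonly repeatThreshold: number
  private recent: string[] = [] // keys of the most recent calls, oldest first

  constructor(windowSize: number, repeatThreshold: number) {
    this.windowSize = windowSize
    this.repeatThreshold = repeatThreshold
  }

  // Record one call. Returns true when this call's key has now appeared
  // repeatThreshold times within the last windowSize calls.
  record(toolName: string, args: unknown[]): boolean {
    // Stand-in for a hash: identical name and arguments give an identical key.
    const key = JSON.stringify([toolName, args])
    this.recent.push(key)
    if (this.recent.length > this.windowSize) this.recent.shift() // slide the window
    const count = this.recent.filter((k) => k === key).length
    return count >= this.repeatThreshold
  }
}
```

Because older calls fall out of the window, the same query repeated occasionally across a long run is not flagged; only repeats that cluster within the last `windowSize` calls are.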
Catches: identical repeated calls (e.g., searching for the same query over and over). ## Layer 3: Cost velocity anomaly *(Planned)* > **Coming soon**, this layer accepts configuration but is not yet enforced by the guard. Tracks cost accumulation rate. If cost is increasing faster than the configured threshold with no meaningful output, flag it. Catches: agents that are "working" but burning money without producing useful results. ## Layer 4: Progress check *(Planned)* > **Coming soon**, this layer accepts configuration but is not yet enforced by the guard. Compares the output of consecutive calls. If the last N calls produced no new tokens or identical results, the agent is stuck. Catches: agents that get slightly different inputs but produce the same output. ## Layer 5: Semantic similarity *(Planned)* > **Coming soon**, this layer accepts configuration but is not yet enforced by the guard. Optional. Uses embedding similarity to detect when an agent is rephrasing the same question or getting semantically identical responses. Catches: ping-pong patterns where two agents bounce variations of the same request back and forth. ## Configuration ```toml [defaults] max_iterations = 25 # Layer 1 kill_on_loop = true # What to do when loop detected [loop_detection] window_size = 10 # Layer 2: sliding window size repeat_threshold = 3 # Layer 2: max identical hashes before flagging cost_velocity_window = 5 # Layer 3 (planned): steps to measure progress_window = 4 # Layer 4 (planned): steps to compare ``` ## Recovery actions When a loop is detected, Fuze can: 1. **kill**, terminate the run immediately (default) 2. **warn**, log a warning but allow the call to proceed 3. 
**skip**, skip the current step and move on Configure per-function: ```typescript const search = guard(searchFn, { onLoop: 'warn' }) ``` ```python @guard(on_loop='warn') def search(query: str): return vector_db.search(query) ``` ```typescript const fetchData = guard(fetchFn, { onLoop: 'skip' }) ``` ```python @guard(on_loop='skip') def fetch_data(url: str): return requests.get(url).text ``` --- ## MCP Proxy Source: https://fuze-ai.tech/docs/mcp-proxy/ Wrap any MCP server with Fuze protection, budget, loop detection, audit logging, without touching the server's code. ## Usage ```bash # Without Fuze npx @modelcontextprotocol/server-postgres # With Fuze (same server, fully protected) npx fuze-ai proxy -- npx @modelcontextprotocol/server-postgres ``` The MCP server doesn't know Fuze exists. Fuze sits between the MCP client and the real server. ## What gets intercepted Every `tools/call` JSON-RPC request passes through Fuze: 1. Budget check, is there budget remaining for this call? 2. Loop detection, have we seen this tool+args combination before? 3. Side-effect check, is this tool known to have side-effects? 4. Audit logging, record the call in the trace If all checks pass, the request is forwarded to the real MCP server unchanged. ## CLI options ```bash npx fuze-ai proxy [options] -- ``` | Option | Description | |---|---| | `--max-cost ` | Override `max_cost_per_run` from config | | `--max-iterations ` | Override `max_iterations` from config | | `--trace` | Write all call traces to `./fuze-proxy-traces.jsonl` | | `--verbose` | Print intercepted calls and decisions to stderr | | `--daemon` | Connect to the Fuze daemon for cross-run enforcement | ## Configuration Configure the proxy in your `fuze.toml`. 
Per-tool settings use the `[proxy.tools.TOOLNAME]` table format: ```toml [defaults] max_cost_per_step = 1.00 kill_on_loop = true [proxy] max_cost_per_run = 5.00 max_iterations = 50 [proxy.tools.query] estimated_cost = 0.02 [proxy.tools.execute] estimated_cost = 0.05 side_effect = true max_calls_per_run = 10 [proxy.tools.delete_record] estimated_cost = 0.01 side_effect = true max_calls_per_run = 5 timeout = "10s" ``` Each `[proxy.tools.TOOLNAME]` section supports: | Key | Type | Description | |---|---|---| | `estimated_cost` | float | Cost attributed to each invocation of this tool | | `side_effect` | bool | Whether this tool mutates external state | | `max_calls_per_run` | int | Maximum times this tool can be called in one run | | `timeout` | string | Per-call timeout (e.g. `"30s"`, `"2m"`) | ## Graceful shutdown When the proxy receives a shutdown signal (SIGINT, SIGTERM, or the MCP client disconnects), it performs a clean teardown: 1. Calls `await router.stop()` to flush pending traces and finalize the run 2. Writes any remaining trace entries to `./fuze-proxy-traces.jsonl` (if `--trace` is enabled) 3. Closes the connection to the daemon (if running in daemon mode) 4. Exits the process This ensures no trace data is lost, even if the client disconnects abruptly. ## JSON-RPC request/response ID tracking The proxy matches JSON-RPC responses to their originating requests by ID, not by payload shape. This is important because: - MCP servers may return responses out of order - Multiple `tools/call` requests can be in flight simultaneously - Matching by ID ensures the correct budget and trace entries are updated for each response Each intercepted request's JSON-RPC `id` is stored in a pending map. When a response arrives, the proxy looks up the original request by `id`, attributes the cost, and records the result in the trace. ## Trace output When `--trace` is enabled, every tool call is logged to `./fuze-proxy-traces.jsonl` as one JSON object per line.
Traces are written only at result time (not at intercept time), so each line contains both the request and the response: ```json {"timestamp":"2026-03-28T12:00:00Z","tool":"query","args":{"sql":"SELECT ..."},"result":{"rows":42},"cost":0.02,"duration_ms":150} ``` This avoids duplicate trace entries and ensures every logged call has a known outcome. ## Transports The proxy supports all MCP transport types: - Stdio, pipes stdin/stdout between client and server - SSE, proxies HTTP Server-Sent Events streams - Streamable HTTP, proxies HTTP request/response ## Combined with the daemon ```bash # Terminal 1: Start daemon npx fuze-ai daemon # Terminal 2: Start proxied MCP server npx fuze-ai proxy -- npx @modelcontextprotocol/server-postgres ``` With the daemon running, all MCP calls participate in cross-run pattern detection and org-wide budget enforcement. ## Windows support On Windows, the daemon socket uses a named pipe instead of a Unix domain socket: ``` \\.\pipe\fuze-daemon ``` The proxy detects the platform automatically -- no configuration needed. Path traversal checks also use `path.relative()` for correct behavior on Windows paths. --- ## Quickstart Source: https://fuze-ai.tech/docs/quickstart/ Get Fuze running in 30 seconds. ## Install ```bash # TypeScript / Node.js npm install fuze-ai # Python pip install fuze-ai ``` ## Register your tools Call `registerTools()` once at startup. This sends tool metadata to the Fuze cloud so you can configure budgets and retries from the dashboard without redeploying. 
```typescript import { configure, registerTools } from 'fuze-ai' configure({ cloud: { apiKey: process.env.FUZE_API_KEY }, // from app.fuze-ai.tech project: { projectId: 'my-agent' }, }) registerTools([ { name: 'search', description: 'Vector database search', sideEffect: false, defaults: { maxRetries: 3, maxBudget: 0.10, timeout: 10_000 }, }, { name: 'sendInvoice', description: 'Create and send a Stripe invoice', sideEffect: true, defaults: { maxRetries: 1, maxBudget: 0.05, timeout: 10_000 }, }, ]) ``` ```python from fuze_ai import configure, register_tools import os configure(cloud={"api_key": os.getenv("FUZE_API_KEY")}, project={"project_id": "my-agent"}) register_tools([ {"name": "search", "description": "Vector search", "side_effect": False, "defaults": {"max_retries": 3, "max_budget": 0.10, "timeout": 10_000}}, {"name": "send_invoice", "description": "Send Stripe invoice", "side_effect": True, "defaults": {"max_retries": 1, "max_budget": 0.05, "timeout": 10_000}}, ]) ``` No API key? Skip `configure()`, Fuze works fully in-process with zero config. ## Wrap your functions ```typescript import { guard } from 'fuze-ai' // Read-only tool const search = guard(async function search(query: string) { return await vectorDb.search(query) }) // Side-effect with compensation (rollback on failure) const sendInvoice = guard( async function sendInvoice(id: string, amount: number) { return await stripe.createInvoice(id, amount) }, { sideEffect: true, compensate: cancelInvoice } ) // Use normally, Fuze protects automatically const results = await search('quarterly revenue') ``` ```python from fuze_ai import guard @guard def search(query: str): return vector_db.search(query) @guard(side_effect=True, compensate=cancel_invoice) def send_invoice(customer_id: str, amount: float): return stripe.create_invoice(customer_id, amount) results = search("quarterly revenue") ``` ## What happens automatically When you wrap a function with `guard()`, Fuze: 1. 
Tracks iterations and counts how many times the function is called within a run 2. Detects loops by hashing tool+args to catch identical repeated calls 3. Monitors budget against configured cost ceilings 4. Extracts actual token usage from LLM responses automatically 5. Records traces with timestamps, cost, and results for every call 6. Applies any per-tool config set from the Fuze dashboard (retries, budget, timeout) ## Multi-step runs with `createRun()` Use `createRun()` to group multiple tool calls into a single tracked run with shared budget and loop detection: ```typescript import { createRun } from 'fuze-ai' const run = createRun('research-agent', { maxCostPerRun: 5.00, maxIterations: 50 }) const search = run.guard(async function search(q: string) { return vectorDb.search(q) }) const summarize = run.guard(async function summarize(docs: string[]) { return llm.summarize(docs) }) const docs = await search('quarterly revenue') const summary = await summarize(docs) const { totalCost } = run.getStatus() // { totalCost, totalTokensIn, totalTokensOut, stepCount } // run.end() is optional, runs without an explicit end are treated as ongoing conversations await run.end() ``` ```python from fuze_ai import create_run run = create_run("research-agent", max_cost_per_run=5.00, max_iterations=50) @run.guard def search(q: str): return vector_db.search(q) @run.guard def summarize(docs: list): return llm.summarize(docs) docs = search("quarterly revenue") summary = summarize(docs) ``` ## MCP Proxy Wrap any MCP server with Fuze protection, no code changes needed: ```bash npx fuze-ai proxy -- # Example: protect a Postgres MCP server npx fuze-ai proxy -- npx @modelcontextprotocol/server-postgres ``` The proxy intercepts every `tools/call` request, applies budget checks and loop detection, then forwards to the real server. See [MCP Proxy](/docs/mcp-proxy) for full details. 
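To make the interception step concrete, here is a rough Python sketch of the budget part of that decision for a single JSON-RPC message. The `intercept` function, its parameters, and the `-32000` error code are assumptions chosen for illustration; the proxy's actual internals and error format are not documented here.

```python
import json


def intercept(raw_message: str, spent: float, max_cost: float,
              estimated_cost: float) -> dict:
    """Decide whether one JSON-RPC message is forwarded or blocked (sketch)."""
    message = json.loads(raw_message)
    if message.get("method") != "tools/call":
        # Only tool calls are checked; everything else passes through.
        return {"action": "forward"}
    if spent + estimated_cost > max_cost:
        # Hypothetical error payload; -32000 is in JSON-RPC's implementation-defined range.
        error = {"code": -32000, "message": "budget exceeded"}
        return {
            "action": "block",
            "response": {"jsonrpc": "2.0", "id": message.get("id"), "error": error},
        }
    return {"action": "forward", "tool": message["params"]["name"]}
```

The blocked response reuses the request's `id`, so the MCP client can match the error to the call it made.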
## Add configuration Create a `fuze.toml` in your project root: ```toml [defaults] max_retries = 3 timeout = "30s" max_cost_per_run = 10.00 max_iterations = 25 [cloud] api_key = "" # or set FUZE_API_KEY env var [project] project_id = "my-agent" ``` See [Configuration](/docs/configuration) for the full reference. ## Framework adapters Fuze works with any framework. See the adapter guides: - [LangGraph](/docs/adapters/langgraph) - [CrewAI](/docs/adapters/crewai) - [Raw SDK](/docs/adapters/raw-sdk) - [MCP Proxy](/docs/mcp-proxy) --- ## Side-Effect Tracking Source: https://fuze-ai.tech/docs/side-effects/ Mark which tools change the real world so retries and rollback know what to undo. Without this distinction, you either retry everything (duplicate writes) or retry nothing (lost progress). ## Marking side-effects ```typescript // Read, safe to retry const search = guard(searchFn) // Changes the world, needs special handling const sendInvoice = guard(invoiceFn, { sideEffect: true, compensate: cancelInvoice, }) ``` ```python from fuze_ai import guard # Read, safe to retry @guard def search(query: str): return vector_db.search(query) # Changes the world, needs special handling @guard(side_effect=True, compensate=cancel_invoice) def send_invoice(customer_id: str, amount: float): return stripe.create_invoice(customer_id, amount) ``` ## Why it matters When an agent run fails at step 5, Fuze needs to know: - Steps 1-3 were reads, safe to ignore on rollback - Step 4 sent an invoice, must call compensation function - Step 5 failed, this is where we are Without side-effect tracking, you either retry everything (duplicate invoices) or retry nothing (lost progress). 
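The failure scenario above can be sketched as a small rollback routine. This is an illustrative Python model, not Fuze's implementation: the `Step` dataclass and `rollback` function are hypothetical, and the sketch assumes completed steps are undone most recent first.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Step:
    """One completed step in a run (hypothetical record shape)."""
    name: str
    result: Any
    side_effect: bool = False
    compensate: Optional[Callable[[Any], None]] = None


def rollback(completed: list[Step]) -> list[str]:
    """Undo completed side-effects, most recent first; reads are skipped."""
    actions = []
    for step in reversed(completed):
        if not step.side_effect:
            continue  # reads changed nothing, so there is nothing to undo
        if step.compensate is None:
            # No way to undo this automatically; leave it for human review.
            actions.append(f"{step.name}: no_compensation")
            continue
        step.compensate(step.result)  # receives the original call's return value
        actions.append(f"{step.name}: compensated")
    return actions
```

Passing only the steps that completed before the failure keeps the failed step itself out of the rollback.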
## Compensation functions A compensation function undoes a side-effect: ```typescript const sendInvoice = guard( async function sendInvoice(customerId: string, amount: number) { return await stripe.createInvoice(customerId, amount) }, { sideEffect: true, compensate: async (result) => { // result is the return value of the original call await stripe.voidInvoice(result.invoiceId) }, } ) ``` ```python def cancel_invoice(result): stripe.void_invoice(result["invoice_id"]) @guard(side_effect=True, compensate=cancel_invoice) def send_invoice(customer_id: str, amount: float): # result is passed to cancel_invoice on rollback return stripe.create_invoice(customer_id, amount) ``` On rollback, Fuze calls compensation functions in **reverse order**, last side-effect first. ## Compensation timestamp accuracy When a compensation handler runs, Fuze captures `compensationEndedAt` **after** the handler completes (not before). This ensures the timestamp accurately reflects when the compensation finished, which is important for audit trails and SLA tracking. ## Compensation hash chain verification Every compensation record is included in the audit hash chain. You can verify the integrity of the full chain, including compensation entries, by calling `verifyHashChain()`. This ensures that no records have been tampered with or inserted after the fact. ```typescript const result = await sideEffectLog.verifyHashChain() // result includes both original side-effect records and compensation records ``` ## Idempotency keys Fuze generates idempotency keys for side-effect calls. If the same call is retried with the same arguments, the key ensures no duplication. Idempotency keys are automatically cleaned up when you call `purgeOlderThan()` as part of data retention. This prevents stale keys from accumulating over time. 
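One way to picture the idempotency mechanism is a key derived from the call's identity plus a store of first results. The Python sketch below is an assumption-laden illustration: `idempotency_key`, `IdempotentExecutor`, and the choice to include a run ID in the key are all hypothetical, not Fuze's documented behavior.

```python
import hashlib
import json


def idempotency_key(run_id: str, tool: str, args: dict) -> str:
    """Derive a stable key: the same run, tool, and arguments give the same key."""
    payload = json.dumps({"run": run_id, "tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class IdempotentExecutor:
    """Run a side-effect once per key and replay the stored result on retries."""

    def __init__(self):
        self._results = {}  # key -> result of the first successful call

    def execute(self, run_id: str, tool: str, args: dict, fn):
        key = idempotency_key(run_id, tool, args)
        if key in self._results:
            return self._results[key]  # retry: skip the real call
        result = fn(**args)
        self._results[key] = result  # stored only if fn did not raise
        return result
```

Because the result is stored only after `fn` returns, a call that raised can still be retried for real.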
## Rollback flow ``` Step 1: search(query) → read, skip Step 2: search(refined) → read, skip Step 3: send_invoice(cust, $) → SIDE EFFECT → call compensate() Step 4: send_email(to, body) → SIDE EFFECT → call compensate() Step 5: update_db(record) → FAILED HERE ``` Fuze walks back from step 4 to step 3, calling each compensation function in reverse. ## Non-compensable side-effects If a side-effect has no compensation function, Fuze logs it with status `no_compensation` and sets `escalated: true`. The incident is recorded in the audit trail for human review. ## Data retention The `purgeOlderThan()` method cleans up old records including side-effect entries, compensation records, and idempotency keys. Both compensation records and side-effect records are included in `verifyHashChain()`, so you should verify chain integrity before purging if you need a final audit check. --- ## Tools & Remote Config Source: https://fuze-ai.tech/docs/tools/ Register tool metadata once at startup, then tune retries, budgets, and timeouts per tool from the dashboard. Changes propagate to the SDK within 30 seconds without redeployment. 
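The 30-second propagation window is what a periodically refreshed in-memory cache would produce. The Python sketch below illustrates that idea under stated assumptions: `ToolConfigCache`, its `fetch` callback, and the injectable `clock` are hypothetical, and for brevity it refreshes lazily on read rather than in the background.

```python
import time
from typing import Callable


class ToolConfigCache:
    """Serve tool config from memory, refetching once the cached copy is stale."""

    def __init__(self, fetch: Callable[[], dict], refresh_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._configs = fetch()  # initial load at startup
        self._loaded_at = clock()

    def get(self, tool_name: str, defaults: dict) -> dict:
        """Return remote settings layered over the registered defaults for one tool."""
        if self._clock() - self._loaded_at >= self._refresh_seconds:
            self._configs = self._fetch()
            self._loaded_at = self._clock()
        return {**defaults, **self._configs.get(tool_name, {})}
```

A tool with no remote entry simply gets its registered defaults back, so reads never depend on the dashboard having been configured.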
## Register at startup ```typescript import { configure, registerTools } from 'fuze-ai' configure({ cloud: { apiKey: process.env.FUZE_API_KEY }, project: { projectId: 'my-agent' }, }) registerTools([ { name: 'search', description: 'Vector database search over company docs', sideEffect: false, defaults: { maxRetries: 3, maxBudget: 0.10, timeout: 10_000 }, }, { name: 'sendEmail', description: 'Send transactional email via SES', sideEffect: true, defaults: { maxRetries: 1, maxBudget: 0.02, timeout: 8_000 }, }, ]) ``` ```python from fuze_ai import configure, register_tools import os configure(cloud={"api_key": os.getenv("FUZE_API_KEY")}, project={"project_id": "my-agent"}) register_tools([ { "name": "search", "description": "Vector database search", "side_effect": False, "defaults": {"max_retries": 3, "max_budget": 0.10, "timeout": 10_000}, }, { "name": "send_email", "description": "Send transactional email", "side_effect": True, "defaults": {"max_retries": 1, "max_budget": 0.02, "timeout": 8_000}, }, ]) ``` Call this once during application startup, before any agents run. If no API key is configured, `registerTools()` is a no-op, nothing breaks. ## Tool name matching The `name` in `registerTools()` must match the **function name** that `guard()` wraps. Fuze uses `fn.name` automatically: ```typescript // ✅ name matches, remote config applies registerTools([{ name: 'search', ... }]) const search = guard(async function search(q: string) { ... }) // ✅ also works with named arrow functions assigned to const const search = guard(search_impl) // fn.name = 'search_impl', register as 'search_impl' // ❌ anonymous function, remote config won't apply (no name to match) const search = guard(async (q: string) => { ... 
}) ``` ## `registerTools()` API ```typescript registerTools(tools: ToolRegistration[]): void ``` | Field | Type | Required | Description | |---|---|---|---| | `name` | `string` | ✅ | Function name; must match `fn.name` in `guard()` | | `description` | `string` | No | Human-readable description shown in the dashboard | | `schema` | `object` | No | JSON Schema of the function's parameters | | `sideEffect` | `boolean` | ✅ | Whether the tool modifies external state | | `defaults.maxRetries` | `number` | ✅ | Default retry count (editable from dashboard) | | `defaults.maxBudget` | `number` | ✅ | Default USD budget cap per call (editable from dashboard) | | `defaults.timeout` | `number` | ✅ | Default timeout in ms (editable from dashboard) | ## How remote config works ``` 1. SDK starts → registerTools() → POST /v1/tools/register Creates default tool_configs in the database if absent 2. SDK starts → GET /v1/tools/config → populates in-memory config cache Refreshed automatically every 30 seconds 3. Dashboard user edits tool config → PUT /api/tools/:name/config Written to database immediately 4. Within 30 seconds → SDK's background refresh picks up new config No restart needed 5. guard(fn) is called → getToolConfig(fn.name) reads from cache synchronously Zero added latency on the execution hot path ``` ## Override precedence The tighter limit always wins: | Level | Applied | |---|---| | `guard(fn, { maxCost: X })` | Highest; never overridden remotely | | Dashboard `maxBudget` | Takes minimum of local and remote | | Dashboard `maxRetries` / `timeout` | Replaces local values entirely | | `defaults` in `registerTools()` | Baseline; used when dashboard hasn't been configured | | `fuze.toml` / `configure()` | Global baseline | ## Disabling a tool remotely Set `enabled: false` for a tool in the dashboard.
Any call to that tool will immediately throw a `FuzeError`, no LLM request is made, and the error propagates to your agent for handling: ```typescript import { FuzeError } from 'fuze-ai' try { await search('query') } catch (err) { if (err instanceof FuzeError) { // Tool was disabled from the dashboard console.log(err.message) // "Tool 'search' is disabled via remote configuration" } } ``` This is useful for emergency stops: if a tool is causing unexpected costs or failures, disable it from the dashboard without a deployment. ## Without an API key `registerTools()` is always safe to call. When no `FUZE_API_KEY` is set: - The call is a no-op, no network request is made - `guard()` uses local defaults from `fuze.toml` and per-function options - All protection logic (budget, loop detection, tracing) works exactly as normal - Dashboard tool configuration simply isn't available # Compliance --- ## Compliance Matrix Source: https://fuze-ai.tech/docs/compliance/compliance-matrix/ > **Disclaimer:** This matrix describes what Fuze supports today. Claims labelled "Partial" or "Not implemented" are on the roadmap. Labels updated: 2026-04-19. Article-by-article mapping of EU AI Act requirements to Fuze features. **Coverage legend:** Covered = Fuze directly satisfies the requirement | Partial = Fuze addresses part of the requirement; gaps noted | Not implemented = not yet available | Outside scope = provider/deployer responsibility, no runtime tool can address it ## High-Risk System Requirements (Art. 8-15) | Article | Description | Coverage | Notes | |---|---|---|---| | Art. 8 | Compliance with requirements | Outside scope | Organisational responsibility | | Art. 9 | Risk Management System | Partial | Risk questionnaire + evidence upload in dashboard; no automated risk-tracking loop yet | | Art. 10 | Data and Data Governance | Outside scope | Deployer responsibility | | Art. 
11 | Technical Documentation | Partial | Annex IV export (PDF) available; model cards not auto-generated | | Art. 12 | Automatic Logging | Covered | Full JSONL trace per guarded call; HMAC hash chain (Python); TS hash chain on roadmap | | Art. 13 | Transparency to Deployers | Partial | Trace replay with full decision context; model cards not auto-generated | | Art. 14 | Human Oversight | Partial | Kill switch via dashboard and CLI; approval gates not yet implemented | | Art. 15 | Robustness | Covered | Loop detection (iteration cap, hash dedup, stalled progress), side-effect compensation with LIFO rollback, token/step/wall-clock limits | ## Provider and Deployer Obligations (Art. 16-27) | Article | Description | Coverage | Notes | |---|---|---|---| | Art. 16 | Provider Obligations | Outside scope | Organisational responsibility | | Art. 17 | Quality Management System | Outside scope | Organisational responsibility | | Art. 18 | Documentation Keeping | Partial | Annex IV export covers technical documentation; QMS records are deployer responsibility | | Art. 19 | Auto-Generated Logs | Covered (Python) / Partial (TS) | Python: append-only store + HMAC hash chain; TS: append-only store, no hash chain yet | | Art. 20 | Corrective Actions | Partial | Guard events and traces surface issues; corrective workflow is deployer responsibility | | Art. 26 | Deployer Monitoring Obligations | Partial | Dashboard provides runs list, agent health, trace replay; audit log of dashboard actions not yet implemented | | Art. 27 | Fundamental Rights Impact Assessment | Partial | FRIA builder is on the Pro tier roadmap; not yet available | ## Post-Market and Incident Reporting | Article | Description | Coverage | Notes | |---|---|---|---| | Art. 72 | Post-Market Monitoring | Partial | Runtime metrics collected (tokens, steps, latency, guard-event rate); automated drift detection not implemented | | Art. 
73 | Serious Incident Reporting | Not implemented | Roadmap, no automated 72h/15d filing; manual process required | ## GPAI Transparency | Article | Description | Coverage | Notes | |---|---|---|---| | Art. 50 | Transparency for GPAI outputs (chatbots, deepfakes) | Not implemented | Disclosure system not yet designed or built; roadmap | ## Summary | Status | Articles | |---|---| | Covered | Art. 12, Art. 15, Art. 19 (Python) | | Partial | Art. 9, Art. 11, Art. 13, Art. 14, Art. 18, Art. 19 (TS), Art. 20, Art. 26, Art. 27, Art. 72 | | Not implemented | Art. 50, Art. 73 | | Outside scope | Art. 8, Art. 10, Art. 16, Art. 17 | ## Art. 12 in detail What Fuze logs for every guarded function call: | Data Point | Source | |---|---| | Start/end timestamps (ISO 8601) | Every `@guard` call | | Agent identity | agent_id, version, model, provider | | Tool call details | Name, args hash (raw opt-in via `log_pii`), result summary | | Token counts | Tokens in/out extracted from LLM response; USD estimate where pricing table available | | Guard decisions | proceed, loop_detected, limit_exceeded | | Human oversight events | Who intervened, what they decided | | Side-effect status | Real-world write flag, compensation status | All Python records: **append-only**, **HMAC-SHA256 hash-chained**, queryable, exportable (JSON, CSV, PDF). TypeScript records: **append-only**; hash chain on roadmap. Configurable retention minimum 6 months. ## Art. 
14 in detail | Requirement | Status | Notes | |---|---|---| | Understand capabilities, monitor operation | Covered | Dashboard with live runs, agent health, trace replay | | Correctly interpret output | Covered | Trace replay with full decision context | | Decide not to use output | Covered | Override capability via dashboard | | Intervene or interrupt (stop button) | Covered | Kill switch: dashboard and CLI | | Approval gates before agent proceeds | Not implemented | Roadmap | --- ## EU AI Act Overview Source: https://fuze-ai.tech/docs/compliance/overview/ The EU AI Act enters full enforcement on **August 2, 2026**, with a maximum penalty of **35M EUR or 7% of global annual turnover**. Per the [compliance matrix](/docs/compliance/compliance-matrix), Fuze currently covers Art. 12, Art. 15, and Art. 19 (Python; partial in TypeScript) and partially addresses nine more articles. ## Why this matters for agents AI agents that make autonomous decisions, especially those that interact with external systems, process personal data, or operate in regulated industries, may be classified as **high-risk AI systems** under the Act. High-risk systems must comply with Articles 8-27 and with the post-market obligations that follow, which include requirements for: - Automatic event logging (Art. 12) - Human oversight mechanisms (Art. 14) - Robustness and fault resilience (Art. 15) - Post-market monitoring (Art. 72) - Serious-incident reporting within the Act's deadlines (Art. 73) ## What Fuze provides ### Art. 12, Record-Keeping Fuze's `TraceRecorder` and `AuditStore` automatically log every guarded function call: - Timestamps (start and end, ISO 8601) - Agent identity (agent_id, version, model, provider) - Tool call details (name, arguments hash, result summary) - Cost (tokens in/out, USD) - Guard decisions (proceed, loop_detected, budget_exceeded) - Human oversight events All records are **append-only**. Python records also carry a **hash chain** for tamper detection; the TypeScript hash chain is on the roadmap. Minimum **6-month retention** (configurable). ### Art. 14, Human Oversight The Act literally requires a stop button.
Fuze provides: - Kill switch: dashboard, CLI, TUI - Approval gates: pause and wait for human decision (on the roadmap; not yet implemented) - Anomaly alerts: notify humans when something looks wrong - Override capability: humans can override any Fuze decision ### Art. 15, Robustness - Smart recovery (retry, rollback, fork, escalate) - Loop detection prevents stuck agents - Budget enforcement prevents resource exhaustion - Side-effect tracking prevents duplicate actions ### Art. 72, Post-Market Monitoring Continuous runtime monitoring with agent health scores, cost trend analysis, failure pattern detection, and performance tracking over time. ### Art. 73, Incident Reporting Fuze generates structured incident reports containing: system identification, full trace, timeline of events, actions taken, side-effects, and compensation status. Filing the report with the authority is not automated and remains a manual process. ## Enabling compliance mode ```toml [compliance] enabled = true risk_level = "high" # "minimal", "limited", or "high" log_pii = false # Keep false unless you have a GDPR lawful basis ``` ## Next steps See the [full compliance matrix](/docs/compliance/compliance-matrix) for article-by-article coverage details.