28 March 2026

The $1.6M Weekend

A single retry loop. One weekend. What we learned about AI agent safety the hard way.

incidents
agents
safety

It started with a contract review agent. An MCP server wrapping a legal document API. Standard setup — LangGraph, GPT-4o, a handful of tools.

The agent hit an edge case on a Friday afternoon. A malformed API response triggered a retry loop. The framework's max_iterations was set to 50, but the agent was clever — it varied its queries just enough to avoid the cap while still hitting the same failing endpoint.

The numbers

By Monday morning:

  • 4,847 API calls to the legal document provider (billed at $330/call on the enterprise tier)
  • 2.3M tokens consumed across GPT-4o
  • $1.6M in combined API and compute costs
  • One very uncomfortable board meeting

What went wrong

The framework did its job. It saved state. It managed the graph. It even logged the iterations.

What it didn't do:

  1. Detect the loop — The agent varied its queries enough to stay under max_iterations, but a cost velocity check would have caught it in minutes
  2. Track the side-effects — Every API call was billed. There was no mechanism to know which calls had financial consequences
  3. Enforce a budget — There was no ceiling on total spend. The agent had an unlimited credit card
  4. Alert anyone — The team found out Monday morning from an AWS billing alert, not from the agent framework
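The cost velocity check from point 1 doesn't need to be elaborate. A minimal sketch (all names hypothetical, not Fuze internals): track billed calls in a rolling window and abort once spend-per-minute crosses a threshold.

```typescript
// Minimal cost-velocity guard: abort when spend in a rolling window
// exceeds a dollar threshold. Hypothetical illustration, not Fuze's code.
class CostVelocityGuard {
  private events: { at: number; cost: number }[] = [];

  constructor(
    private maxDollarsPerMinute: number,
    private windowMs = 60_000,
  ) {}

  // Record a billed call; throws once the rolling-window spend exceeds the limit.
  record(cost: number, now = Date.now()): void {
    this.events.push({ at: now, cost });
    this.events = this.events.filter((e) => now - e.at <= this.windowMs);
    const spend = this.events.reduce((sum, e) => sum + e.cost, 0);
    if (spend > this.maxDollarsPerMinute) {
      throw new Error(`Cost velocity exceeded: $${spend.toFixed(2)} in window`);
    }
  }
}
```

At $330 per call, even a generous $500/minute ceiling trips on the second retry — minutes into the incident, not three days.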

The fix

import { guard } from 'fuze-ai'

// legalApi is your existing document-API client and cancelReview a
// compensation handler you define; both live elsewhere in your codebase.
const reviewContract = guard(
  async (contractId: string) => {
    return await legalApi.review(contractId)
  },
  {
    maxCost: 50.00,      // $50 ceiling per call
    sideEffect: true,    // This costs money
    compensate: cancelReview,
  }
)

With Fuze, this incident would have been caught at multiple layers:

  • Layer 2 (hash dedup): Similar queries flagged within the first 10 calls
  • Layer 3 (cost velocity): $330/call rate flagged within 5 minutes
  • Budget enforcement: $50 ceiling would have stopped the run early
  • Daemon alerts: Team notified via webhook within minutes, not days
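The normalize-and-hash idea behind Layer 2 fits in a few lines: strip the parts of a query that a looping agent tends to vary (case, whitespace, embedded numbers), hash what's left, and count repeats. This is a simplified illustration under assumed normalization rules, not Fuze's actual layer.

```typescript
import { createHash } from "node:crypto";

// Collapse queries that differ only in case, whitespace, or embedded
// numbers to one fingerprint. Simplified sketch, not Fuze's Layer 2.
function fingerprint(query: string): string {
  const normalized = query
    .toLowerCase()
    .replace(/\d+/g, "#")
    .replace(/\s+/g, " ")
    .trim();
  return createHash("sha256").update(normalized).digest("hex");
}

class DedupDetector {
  private counts = new Map<string, number>();

  constructor(private maxRepeats = 10) {}

  // Returns true once a near-identical query has recurred too often.
  check(query: string): boolean {
    const key = fingerprint(query);
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return n > this.maxRepeats;
  }
}
```

This is exactly the gap max_iterations leaves open: "review contract 123" and "Review contract 456" count as two different queries to an iteration counter, but collapse to the same fingerprint here.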

The broader problem

This isn't an isolated incident:

  • $400M in collective Fortune 500 unbudgeted cloud spend from runaway agents in 2025
  • $67.4B in global losses attributed to AI hallucinations
  • 80% failure rate for ten-step agent workflows at 85% per-step accuracy
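That last figure is just compounded per-step error: independent steps multiply, so a ten-step workflow at 85% per-step reliability succeeds only 0.85¹⁰ ≈ 19.7% of the time — roughly an 80% failure rate.

```typescript
// Success probability of an n-step workflow where each step
// independently succeeds with probability p.
function workflowSuccessRate(p: number, steps: number): number {
  return Math.pow(p, steps);
}

const success = workflowSuccessRate(0.85, 10);
console.log(`10-step success: ${(success * 100).toFixed(1)}%`); // 19.7%
```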

Every major framework — LangGraph, CrewAI, Google ADK — provides max_iterations and little else. State management is excellent. Failure management is nonexistent.

What we built

Fuze is the missing safety layer. It wraps your existing framework with:

  • 5-layer loop detection that catches patterns max_iterations misses
  • Budget enforcement that kills runs before they overspend
  • Side-effect tracking that knows what's safe to retry
  • EU AI Act compliance because enforcement is months away

One decorator. No framework migration. No infrastructure required.

npm install fuze-ai

Your agents are running. Make sure they're safe.