All posts

28 March 2026

The $1.6M Weekend

A single retry loop. One weekend. What we learned about AI agent safety the hard way.

incidents
agents
safety

It started with a contract review agent. An MCP server wrapping a legal document API. Standard setup, LangGraph, GPT-4o, a handful of tools.

The agent hit an edge case on a Friday afternoon. A malformed API response triggered a retry loop. The framework's max_iterations was set to 50, but the agent was clever, it varied its queries just enough to avoid the cap while still hitting the same failing endpoint.

The numbers

By Monday morning:

  • 4,847 API calls to the legal document provider (billed at $330/call for enterprise tier)
  • 2.3M tokens consumed across GPT-4o
  • $1.6M in combined API and compute costs
  • One very uncomfortable board meeting

What went wrong

The framework did its job. It saved state. It managed the graph. It even logged the iterations.

What it didn't do:

  1. Detect the loop, The agent varied its queries enough to stay under max_iterations, but a cost velocity check would have caught it in minutes
  2. Track the side-effects, Every API call was billed. There was no mechanism to know which calls had financial consequences
  3. Enforce a budget, There was no ceiling on total spend. The agent had an unlimited credit card
  4. Alert anyone, The team found out Monday morning from an AWS billing alert, not from the agent framework

The fix

typescript
import { guard } from 'fuze-ai'

const reviewContract = guard(
  async (contractId: string) => {
    return await legalApi.review(contractId)
  },
  {
    maxCost: 50.00,      // $50 ceiling per call
    sideEffect: true,    // This costs money
    compensate: cancelReview,
  }
)

With Fuze, this incident would have been caught at multiple layers:

  • Layer 2 (hash dedup): Similar queries flagged within the first 10 calls
  • Layer 3 (cost velocity): $330/call rate flagged within 5 minutes
  • Budget enforcement: $50 ceiling would have stopped the run early
  • Daemon alerts: Team notified via webhook within minutes, not days

The broader problem

This isn't an isolated incident:

  • $400M in collective Fortune 500 unbudgeted cloud spend from runaway agents in 2025
  • $67.4B in global losses attributed to AI hallucinations
  • 80% failure rate for multi-step agent workflows at 85% per-step accuracy

Every major framework, LangGraph, CrewAI, Google ADK, provides max_iterations and little else. State management is excellent. Failure management is nonexistent.

What we built

Fuze is the missing safety layer. It wraps your existing framework with:

  • 5-layer loop detection that catches patterns max_iterations misses
  • Budget enforcement that kills runs before they overspend
  • Side-effect tracking that knows what's safe to retry
  • EU AI Act compliance because enforcement is months away

One decorator. No framework migration. No infrastructure required.

bash
npm install fuze-ai

Your agents are running. Make sure they're safe.