28 March 2026
The $1.6M Weekend
A single retry loop. One weekend. What we learned about AI agent safety the hard way.
It started with a contract review agent. An MCP server wrapping a legal document API. Standard setup — LangGraph, GPT-4o, a handful of tools.
The agent hit an edge case on a Friday afternoon. A malformed API response triggered a retry loop. The framework's max_iterations was set to 50, but the agent was clever — it varied its queries just enough to avoid the cap while still hitting the same failing endpoint.
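Why does varying queries defeat the guard? A hypothetical sketch of a naive exact-match repeat check (illustrative code, not any framework's actual implementation) makes the failure mode concrete: if the loop detector compares query strings verbatim, trivially different phrasings of the same call never register as repeats.

```typescript
// Hypothetical illustration: an exact-match repeat check, the kind of
// guard an iteration counter effectively reduces to, misses queries
// that differ by a few characters but hit the same failing endpoint.
function isLoopingExact(history: string[], query: string): boolean {
  return history.includes(query)
}

const history = [
  'review contract 4412 clause 7',
  'review contract 4412, clause 7',   // comma added
  're-review contract 4412 clause 7', // prefix varied
]

// A fourth variation (extra space) is "new" as far as this check knows.
console.log(isLoopingExact(history, 'review contract 4412 clause  7')) // false
```

Multiply that blind spot by a weekend of unattended retries and the call count climbs unchecked.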
The numbers
By Monday morning:
- 4,847 API calls to the legal document provider (billed at $330/call for enterprise tier)
- 2.3M GPT-4o tokens consumed
- $1.6M in combined API and compute costs
- One very uncomfortable board meeting
What went wrong
The framework did its job. It saved state. It managed the graph. It even logged the iterations.
What it didn't do:
- Detect the loop — The agent varied its queries enough to stay under max_iterations, but a cost velocity check would have caught it in minutes
- Track the side-effects — Every API call was billed. There was no mechanism to know which calls had financial consequences
- Enforce a budget — There was no ceiling on total spend. The agent had an unlimited credit card
- Alert anyone — The team found out Monday morning from an AWS billing alert, not from the agent framework
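Of these, the cost velocity check is the simplest to sketch. The idea: track spend in a sliding window and halt the run when the rate crosses a ceiling. The class and parameter names below (CostTracker, maxDollarsPerMinute) are illustrative, not any particular library's API.

```typescript
// Hypothetical cost-velocity check: record per-call spend with a
// timestamp, sum spend over the last minute, and halt when the rate
// exceeds a configured ceiling.
class CostTracker {
  private events: { at: number; cost: number }[] = []

  constructor(private maxDollarsPerMinute: number) {}

  record(cost: number, now: number = Date.now()): void {
    this.events.push({ at: now, cost })
  }

  // Total spend inside the last 60 seconds.
  velocity(now: number = Date.now()): number {
    const cutoff = now - 60_000
    return this.events
      .filter(e => e.at >= cutoff)
      .reduce((sum, e) => sum + e.cost, 0)
  }

  shouldHalt(now: number = Date.now()): boolean {
    return this.velocity(now) > this.maxDollarsPerMinute
  }
}

// At $330/call, two calls inside one minute already breach a
// $500/minute ceiling. The run stops after ~$660, not ~$1.6M.
const tracker = new CostTracker(500)
tracker.record(330, 0)
tracker.record(330, 30_000)
console.log(tracker.shouldHalt(30_000)) // true
```

A rate-based ceiling catches this class of incident regardless of how the agent phrases its queries, because it watches dollars, not strings.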
The fix
import { guard } from 'fuze-ai'

const reviewContract = guard(
  async (contractId: string) => {
    return await legalApi.review(contractId)
  },
  {
    maxCost: 50.0, // $50 ceiling per call
    sideEffect: true, // This call costs money
    compensate: cancelReview, // How to undo it on failure
  }
)

With Fuze, this incident would have been caught at multiple layers:
- Layer 2 (hash dedup): Similar queries flagged within the first 10 calls
- Layer 3 (cost velocity): $330/call rate flagged within 5 minutes
- Budget enforcement: $50 ceiling would have stopped the run early
- Daemon alerts: Team notified via webhook within minutes, not days
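As a rough illustration of the hash-dedup idea behind Layer 2 (hypothetical code, not Fuze's actual implementation): normalize each query before hashing it, so trivial variations collapse to the same key. The normalization rules here are assumptions for the sketch.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical hash-dedup sketch: lowercase, strip punctuation, and
// collapse whitespace before hashing, so near-identical queries map
// to one key that a raw string comparison would treat as three.
function dedupKey(query: string): string {
  const normalized = query
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, ' ') // drop punctuation
    .replace(/\s+/g, ' ')        // collapse whitespace
    .trim()
  return createHash('sha256').update(normalized).digest('hex')
}

const seen = new Map<string, number>()
for (const q of [
  'Review contract 4412 clause 7',
  'review contract 4412, clause 7',
  'review  contract 4412 clause 7!',
]) {
  const key = dedupKey(q)
  seen.set(key, (seen.get(key) ?? 0) + 1)
}

// All three variants collapse to one key seen three times.
console.log(seen.size, Math.max(...seen.values())) // 1 3
```

With a counter per key, the repeat is visible within the first handful of calls instead of the first few thousand.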
The broader problem
This isn't an isolated incident:
- $400M in collective Fortune 500 unbudgeted cloud spend from runaway agents in 2025
- $67.4B in global losses attributed to AI hallucinations
- 80% failure rate for multi-step agent workflows at 85% per-step accuracy — at that accuracy, a ten-step workflow succeeds only about 20% of the time (0.85^10 ≈ 0.20)
Every major framework — LangGraph, CrewAI, Google ADK — provides max_iterations and little else. State management is excellent. Failure management is nonexistent.
What we built
Fuze is the missing safety layer. It wraps your existing framework with:
- 5-layer loop detection that catches patterns max_iterations misses
- Budget enforcement that kills runs before they overspend
- Side-effect tracking that knows what's safe to retry
- EU AI Act compliance because enforcement is months away
One decorator. No framework migration. No infrastructure required.
npm install fuze-ai

Your agents are running. Make sure they're safe.