28 March 2026
The $1.6M Weekend
A single retry loop. One weekend. What we learned about AI agent safety the hard way.
It started with a contract review agent: an MCP server wrapping a legal document API. Standard setup: LangGraph, GPT-4o, a handful of tools.
The agent hit an edge case on a Friday afternoon. A malformed API response triggered a retry loop. The framework's max_iterations was set to 50, but the agent was clever: it varied its queries just enough to avoid the cap while still hitting the same failing endpoint.
The numbers
By Monday morning:
- 4,847 API calls to the legal document provider (billed at $330/call for enterprise tier)
- 2.3M tokens consumed across GPT-4o
- $1.6M in combined API and compute costs
- One very uncomfortable board meeting
What went wrong
The framework did its job. It saved state. It managed the graph. It even logged the iterations.
What it didn't do:
- Detect the loop: The agent varied its queries enough to stay under max_iterations, but a cost velocity check would have caught it in minutes
- Track the side effects: Every API call was billed. There was no mechanism to know which calls had financial consequences
- Enforce a budget: There was no ceiling on total spend. The agent had an unlimited credit card
- Alert anyone: The team found out Monday morning from an AWS billing alert, not from the agent framework
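To make the cost velocity idea concrete, here is a minimal sketch of a sliding-window spend check. The class name, the $1,000 per 5 minutes threshold, and the one-call-per-minute simulation are illustrative assumptions, not details from the incident or from any framework's API.

```python
from collections import deque


class CostVelocityMonitor:
    """Flags a run when spend inside a sliding time window exceeds a limit."""

    def __init__(self, max_cost_per_window: float, window_seconds: float):
        self.max_cost_per_window = max_cost_per_window
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, cost) pairs, oldest first
        self._window_total = 0.0

    def record(self, cost: float, now: float) -> bool:
        """Record one billed call; return True if the window limit is exceeded."""
        self._events.append((now, cost))
        self._window_total += cost
        # Drop calls that have aged out of the window.
        while self._events and now - self._events[0][0] > self.window_seconds:
            _, old_cost = self._events.popleft()
            self._window_total -= old_cost
        return self._window_total > self.max_cost_per_window


# Assumed threshold: no more than $1,000 of billed calls in any 5 minutes.
monitor = CostVelocityMonitor(max_cost_per_window=1000.0, window_seconds=300.0)

# Simulate one $330 call per minute as a stand-in for the retry loop.
tripped_at = None
for call_number in range(1, 11):
    if monitor.record(cost=330.0, now=call_number * 60.0):
        tripped_at = call_number
        break

print(tripped_at)  # 4: the fourth call pushes the window total to $1,320
```

Unlike an iteration cap, this check does not care how the agent phrases its queries. It looks only at how fast money is leaving.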
The fix
With Fuze, this incident would have been caught at multiple layers:
- Layer 2 (hash dedup): Similar queries flagged within the first 10 calls
- Layer 3 (cost velocity): $330/call rate flagged within 5 minutes
- Budget enforcement: $50 ceiling would have stopped the run early
- Daemon alerts: Team notified via webhook within minutes, not days
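As a rough illustration of the hash dedup layer, the sketch below fingerprints each call by endpoint plus a normalized query and flags repeats. This is not Fuze's implementation: the normalization rule (lowercase, strip punctuation, sort unique words), the repeat threshold, and the `/v1/documents` endpoint are all assumptions chosen for the example.

```python
import hashlib
import re


def fingerprint(endpoint: str, query: str) -> str:
    """Hash the endpoint plus a normalized query so trivial variations collide."""
    # Assumed normalization: lowercase, drop punctuation, sort unique words.
    words = sorted(set(re.findall(r"[a-z0-9]+", query.lower())))
    payload = endpoint + "|" + " ".join(words)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class DedupDetector:
    """Counts repeated fingerprints and flags a call once a threshold is hit."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self._counts = {}

    def is_loop(self, endpoint: str, query: str) -> bool:
        key = fingerprint(endpoint, query)
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] >= self.max_repeats


detector = DedupDetector(max_repeats=3)
attempts = [
    "Fetch contract 1182 clauses",
    "fetch clauses, contract 1182",
    "FETCH contract 1182 clauses!",
]
# "/v1/documents" is a hypothetical endpoint used only for this example.
flags = [detector.is_loop("/v1/documents", q) for q in attempts]
print(flags)  # [False, False, True]
```

A hash only catches variations that the normalization step collapses. Queries reworded more aggressively would need a fuzzier similarity measure, which is one reason to layer it with cost-based checks.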
The broader problem
This isn't an isolated incident:
- $400M in collective Fortune 500 unbudgeted cloud spend from runaway agents in 2025
- $67.4B in global losses attributed to AI hallucinations
- 80% failure rate for multi-step agent workflows at 85% per-step accuracy
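The last figure follows from compounding. Assuming each step succeeds independently with the same probability (an assumption, since the step count and error model are not stated here), a workflow of n steps succeeds with probability 0.85^n. A short sketch shows that about ten steps is enough to reach an 80% failure rate:

```python
def workflow_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_accuracy ** steps


for steps in (5, 10, 20):
    failure = 1 - workflow_success_rate(0.85, steps)
    print(f"{steps} steps: {failure:.0%} failure")
# 5 steps: 56% failure
# 10 steps: 80% failure
# 20 steps: 96% failure
```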
Every major framework (LangGraph, CrewAI, Google ADK) provides a max_iterations-style cap and little else. State management is excellent. Failure management is nonexistent.
What we built
Fuze is the missing safety layer. It wraps your existing framework with:
- 5-layer loop detection that catches patterns max_iterations misses
- Budget enforcement that kills runs before they overspend
- Side-effect tracking that knows what's safe to retry
- EU AI Act compliance because enforcement is months away
One decorator. No framework migration. No infrastructure required.
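To show what decorator-based budget enforcement can look like in general, here is a minimal sketch. It is not Fuze's actual API: `budget_guard`, `BudgetExceeded`, the flat per-call cost, and the $10 price are hypothetical names and values chosen for illustration.

```python
import functools


class BudgetExceeded(RuntimeError):
    """Raised when a call would push total spend past the ceiling."""


def budget_guard(ceiling_usd: float, cost_per_call_usd: float):
    """Hypothetical decorator: refuse calls once the ceiling would be crossed."""

    def decorator(func):
        spent = 0.0

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal spent
            # Check before calling so the over-budget call is never billed.
            if spent + cost_per_call_usd > ceiling_usd:
                raise BudgetExceeded(
                    f"spent ${spent:.2f}; next call would exceed ${ceiling_usd:.2f}"
                )
            spent += cost_per_call_usd
            return func(*args, **kwargs)

        return wrapper

    return decorator


@budget_guard(ceiling_usd=50.0, cost_per_call_usd=10.0)
def fetch_document(doc_id: str) -> str:
    # Stand-in for a billed API call.
    return f"document {doc_id}"


completed = 0
try:
    for _ in range(100):
        fetch_document("1182")
        completed += 1
except BudgetExceeded:
    pass

print(completed)  # 5: the sixth call would take spend from $50 to $60
```

With the incident's $330 per call and a $50 ceiling, a guard like this would refuse the very first call, so in practice the ceiling has to be set with real per-call prices in mind.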
Your agents are running. Make sure they're safe.