Eval suite
@fuze-ai/agent-eval runs cases against an agent definition, capturing pass/fail plus the full evidence stream. It is the regression-test surface for compliance behavior.
What you'll build: a runnable eval harness with cases that prove the loop fail-stops on the right signals. Prerequisites: Verifying evidence, because eval inspects the same span stream. Next: EU Sovereign tier for production deployment.
Install
Define cases
Create eval/cases.ts:
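A minimal sketch of what such a file might contain. The `EvalCase` and `Span` shapes, the `haltedWith` helper, the `loop.halt` span name, and the `reason` attribute are all hypothetical and defined locally for illustration; the actual @fuze-ai/agent-eval API may differ.

```typescript
// Hypothetical shapes, defined locally. Not the real @fuze-ai/agent-eval types.
type Span = { name: string; attributes: Record<string, unknown> };

type EvalCase = {
  id: string;
  description: string;
  // Returns true when the evidence stream shows the expected fail-stop.
  expect: (spans: Span[]) => boolean;
};

// Assumed convention: a fail-stop emits a "loop.halt" span carrying a "reason".
const haltedWith =
  (reason: string) =>
  (spans: Span[]): boolean =>
    spans.some(
      (s) => s.name === "loop.halt" && s.attributes["reason"] === reason,
    );

export const cases: EvalCase[] = [
  {
    id: "policy-engine-error",
    description: "Cerbos throws",
    expect: haltedWith("policy-engine-error"),
  },
  {
    id: "replay-attack",
    description: "resume token reuse",
    expect: haltedWith("replay-attack"),
  },
];
```

Each case pairs an id with a predicate over the span stream, so a pass means the loop stopped for the expected reason rather than merely stopping.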
Run the suite
The outDir receives one subdirectory per case with the full evidence stream and the result. Failures include the span that violated the expectation and the surrounding context.
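To illustrate the idea of "the violating span plus surrounding context", here is a small sketch of how a failure window could be cut from a span stream. The `CaseResult` shape, the `violationIndex` field, and the `radius` parameter are assumptions made for this example, not the on-disk format the tool writes.

```typescript
// Hypothetical result shape. The real per-case output may be structured differently.
type Span = { name: string };

type CaseResult = {
  caseId: string;
  pass: boolean;
  spans: Span[];
  // Index of the span that violated the expectation, when the case failed.
  violationIndex?: number;
};

// Returns the violating span together with up to `radius` spans on each side.
function failureContext(result: CaseResult, radius = 2): Span[] {
  if (result.pass || result.violationIndex === undefined) return [];
  const start = Math.max(0, result.violationIndex - radius);
  const end = Math.min(result.spans.length, result.violationIndex + radius + 1);
  return result.spans.slice(start, end);
}
```

Clamping the window to the stream bounds keeps the helper safe when the violation is the first or last span.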
CI integration
Wire this into CI as a separate job from unit tests. A failing eval is a regression on agent behavior, not on code shape.
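One way to gate a CI job on the suite is to reduce the case results to a process exit code. This is a sketch under assumptions: `CaseResult` and `evalExitCode` are hypothetical names, and how results are actually obtained from @fuze-ai/agent-eval is not shown.

```typescript
// Hypothetical result shape for illustration only.
type CaseResult = { caseId: string; pass: boolean };

// Returns 0 only when at least one case ran and every case passed.
function evalExitCode(results: CaseResult[]): number {
  // An empty run is treated as a failure so a misconfigured job cannot pass silently.
  if (results.length === 0) return 1;
  return results.every((r) => r.pass) ? 0 : 1;
}

// In a Node.js CI entry point this could be used as:
//   process.exitCode = evalExitCode(results);
```

Treating an empty result set as a failure is a design choice in this sketch; it keeps the eval job from going green when no cases were discovered.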
What to put in the eval suite
Promote the bypass tests from @fuze-ai/agent's test suite:
- lawful-basis-mismatch, tools and basis disagree
- policy-engine-error, Cerbos throws
- replay-attack, resume token reuse
- tampered-evidence, byte flip in records
- in-process-multi-tenant, tenant isolation
- dynamic-tool-no-metadata, tool without metadata
- secret-in-args, SecretRef plaintext leak
Each case maps to a regulatory obligation. A failing case is the signal to open a GDPR Art. 33/34 incident review.
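Because every bypass case stands for an obligation, it can be useful to check that none of them has been dropped from the suite. The sketch below uses the seven case ids listed above; the `REQUIRED_CASES` table and the `missingCases` helper are hypothetical names introduced for illustration.

```typescript
// Case ids and descriptions taken from the list above.
const REQUIRED_CASES: Record<string, string> = {
  "lawful-basis-mismatch": "tools and basis disagree",
  "policy-engine-error": "Cerbos throws",
  "replay-attack": "resume token reuse",
  "tampered-evidence": "byte flip in records",
  "in-process-multi-tenant": "tenant isolation",
  "dynamic-tool-no-metadata": "tool without metadata",
  "secret-in-args": "SecretRef plaintext leak",
};

// Returns the required case ids that the suite does not define.
function missingCases(suiteIds: string[]): string[] {
  const present = new Set(suiteIds);
  return Object.keys(REQUIRED_CASES).filter((id) => !present.has(id));
}
```

Running this check before the suite itself turns a silently deleted case into a visible failure.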