Most natural language specifications have holes their authors didn't notice - and writing more rules tends to create more holes. I built Loophole to try a different approach: point adversarial agents at a spec until it stops breaking.

You give the system a set of natural language principles. An AI drafts a formal codified version. Two adversarial agents go to work - one finds cases the code permits but the principles forbid, the other finds cases the code forbids but the principles allow. A judge agent patches the code when it can, but only if the fix doesn't contradict any prior ruling. When a contradiction can't be resolved, it escalates to you. Every decision becomes binding precedent, so the constraint space tightens round after round.

I started with moral and legal reasoning as the demo, and on its own that's already interesting - it turns into a kind of game where you discover contradictions in your own beliefs that you didn't know were there. But the pattern generalizes well past that. The same loop works for company policies that need to survive contact with edge cases. For making chatbot system prompts adversarially robust. For stress-testing eval rubrics. And, taking the long view, for something like a smarter legislative process - where proposed laws get checked against the public's stated values before they pass, and the contradictions surface before they hit a courtroom.

The talk walks through how the harness works, the design choices that matter (especially why precedent is the load-bearing piece), what kinds of specs it handles well, where it breaks, and what it would take to push it further. All code is open source.
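
As a rough illustration of the loop described above (not the Loophole code itself), here is a minimal Python sketch. Everything in it is an assumption made for the example: cases are plain integers, the "principles" are a predicate standing in for a judgment that would really come from a model or a person, and the names `Loop`, `find_gap`, `judge`, and `pointwise_patch` are hypothetical. Escalation is only recorded, not resolved.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple

Case = int                      # toy stand-in for a scenario
Rule = Callable[[Case], bool]   # True means "permitted"


@dataclass
class Loop:
    principles: Rule            # hypothetical oracle for the natural language principles
    code: Rule                  # current codified draft
    propose_patch: Callable[[Rule, Case, bool], Rule]  # hypothetical judge strategy
    precedents: List[Tuple[Case, bool]] = field(default_factory=list)
    escalated: List[Case] = field(default_factory=list)

    def find_gap(self, cases: Iterable[Case], over_permissive: bool) -> Optional[Case]:
        """One adversary: over_permissive=True looks for cases the code permits
        but the principles forbid; False looks for the opposite."""
        for c in cases:
            if c in self.escalated:
                continue
            if self.code(c) == over_permissive and self.principles(c) != over_permissive:
                return c
        return None

    def judge(self, case: Case) -> None:
        """Accept a patch only if it fixes the case and agrees with every prior ruling."""
        ruling = self.principles(case)
        patched = self.propose_patch(self.code, case, ruling)
        consistent = patched(case) == ruling and all(
            patched(c) == r for c, r in self.precedents
        )
        if consistent:
            self.code = patched
            self.precedents.append((case, ruling))  # ruling becomes binding
        else:
            self.escalated.append(case)             # hand off to a human

    def run(self, cases: Iterable[Case], max_rounds: int = 50) -> int:
        """Run rounds until neither adversary finds a gap; return rounds used."""
        cases = list(cases)
        for round_no in range(max_rounds):
            found = False
            for over_permissive in (True, False):
                case = self.find_gap(cases, over_permissive)
                if case is not None:
                    found = True
                    self.judge(case)
            if not found:
                return round_no
        return max_rounds


def pointwise_patch(code: Rule, case: Case, ruling: bool) -> Rule:
    """Narrowest possible fix: override the rule on exactly one case."""
    return lambda x: ruling if x == case else code(x)


# Toy run: the principles permit even numbers, the first draft permits x < 5.
loop = Loop(principles=lambda x: x % 2 == 0,
            code=lambda x: x < 5,
            propose_patch=pointwise_patch)
rounds = loop.run(range(10))
```

In this toy setup the narrow patch never conflicts with an earlier ruling, so nothing escalates. A judge that proposes broader patches can contradict a precedent, and the consistency check in `judge` is what rejects those patches.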
Harness Engineering sessions at AI Engineer World's Fair 2026 in San Francisco.
Thursday, July 2, 2026
1:30 PM - 1:50 PM · 20m
Main Stage
Capacity: 4000 attendees

Brendan Rappazzo
Machine learning researcher
Morgan Stanley
@brendanh0gan
Brendan Hogan is a machine learning research scientist in Morgan Stanley's ML Research group, where he works on LLM fine-tuning, reinforcement learning, and agentic workflows for frontier models. He holds a PhD in Computer Science from Cornell, where he worked with Carla Gomes on Computational Sustainability.