AI Engineer World's Fair 2026

The Art of Building Verifiers for Computer Use Agents

Talk · Intermediate

Every team building browser agents has the same problem: you can't trust your own evals. Browser tasks are too open-ended for deterministic checks, so teams use LLM verifiers as judges, and the judges are wrong constantly. WebVoyager misses 45% of failures. WebJudge misses 22%. Use a judge like that as an RL reward and you're not training a better agent, you're training a more confident liar. This talk walks through the Universal Verifier, open-sourced with Microsoft Research, which reaches a false positive rate near zero and a Cohen's kappa matching human-human agreement. It covers four design principles, one open benchmark, and an honest account of where auto-research worked and where it plateaued.

Speakers: Miguel González Fernández (Browserbase); Corby Rosset (Microsoft Research).
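The two headline metrics in the abstract, false positive rate and Cohen's kappa, can be computed from paired human and verifier labels. The following is a minimal sketch, not the Universal Verifier's own evaluation code. It assumes binary success/failure labels and reads "misses a failure" as "the judge marked a human-labelled failure as a success". The function names and the sample labels are hypothetical.

```python
from typing import Sequence


def false_positive_rate(human: Sequence[bool], judge: Sequence[bool]) -> float:
    """Share of human-labelled failures that the judge marked as success.

    Assumption: True means "task succeeded", False means "task failed".
    """
    judged_on_failures = [j for h, j in zip(human, judge) if not h]
    if not judged_on_failures:
        raise ValueError("no failed tasks in the human labels")
    return sum(judged_on_failures) / len(judged_on_failures)


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two raters over binary labels."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label sequences must be non-empty and equal length")
    # Observed agreement: fraction of items where both raters agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal rate of "success".
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels for eight agent trajectories (illustration only).
    human = [True, True, False, False, False, False, True, False]
    judge = [True, True, True, False, False, True, True, False]
    print(f"false positive rate: {false_positive_rate(human, judge):.2f}")
    print(f"Cohen's kappa:       {cohens_kappa(human, judge):.2f}")
```

In this toy data the judge passes two of the five failed trajectories. Kappa corrects raw agreement for chance, which is why it is a common yardstick for comparing a verifier against human-human agreement.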

About the Expo Stage 1 Track

Expo Stage 1 sessions at AI Engineer World's Fair 2026 in San Francisco.
