Expo Stage 2

Designing Evals That Earn User Trust

Talk · Intermediate

Most teams measure their agent against a benchmark, ship it, and hope. But when your agent serves real users, a benchmark won't tell you whether it's actually working. This session is about building an eval suite that captures what success looks like in production, runs against real user workflows, and feeds back into product decisions. Here's the flywheel we use in practice: start with what success looks like from the user's perspective, instrument production workflows to capture those signals, diagnose where the agent falls short, and feed those insights into the next thing you build. You'll see how this loop shaped concrete product bets, turning eval results from a report card into a discovery tool.

About the Expo Stage 2 Track

Expo Stage 2 sessions at AI Engineer World's Fair 2026 in San Francisco.

When

Thursday, July 2, 2026

2:50 PM - 3:10 PM · 20 min

Where

Expo Stage 2

Capacity: 250 attendees

Speaker

Felipe Blanes

Amazon

Felipe Blanes is speaking at AI Engineer World's Fair 2026.
