It is Summer 2026 and the world is burning for token consumption, figuratively and literally. As frontier model capabilities accelerate, agents increasingly run long, highly parallelized tasks, and inference demand grows exponentially. In this talk, I explain how dynamic model routing, which directs each agent request to the best-suited model at the best price, can reduce token costs by up to 90% while maintaining or improving accuracy. I walk through how routing works, when it doesn't, and why the world (and your agent) needs routing to scale intelligence to infinity and beyond.
Sandbox & Platform Engineering sessions at AI Engineer World's Fair 2026 in San Francisco.
Wednesday, July 1, 2026
2:50 PM - 3:10 PM · 20m
Leadership 2 · Room 3020
Capacity: 550 attendees

Tomás Hernando Kofman
Not Diamond
Tomás Hernando Kofman is speaking at AI Engineer World's Fair 2026.