Agents have changed the economics of AI inference. A chatbot’s cost scales roughly linearly with the number of requests; an agent’s cost scales multiplicatively, with the number of tasks times the model calls each task makes. A single task can fan out into hundreds of model calls, each carrying a repeated context prefix and adding latency that compounds across tool calls and reasoning steps. As open-weight models keep improving and agentic workloads grow, this shift exposes the limits of traditional request-level optimization. Inference infrastructure becomes a first-class concern, one that often shapes performance and cost as much as the model itself. In this talk, we explore what changes when you optimize for the whole task rather than the individual request, and how FriendliAI is rethinking the inference cloud for the era of agentic AI.
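To make the scaling claim concrete, here is a rough back-of-the-envelope sketch, not taken from the talk. It assumes a simplified model in which every agent call resends a shared context prefix plus all context accumulated so far. The function names, parameters, and token counts are hypothetical and chosen only for illustration.

```python
def chatbot_input_tokens(requests: int, tokens_per_request: int) -> int:
    """Input tokens for independent chatbot requests: linear in request count."""
    return requests * tokens_per_request


def agent_input_tokens(
    tasks: int,
    calls_per_task: int,
    prefix_tokens: int,
    new_tokens_per_call: int,
) -> int:
    """Input tokens for agent tasks under the simplified assumption that each
    call resends the shared prefix plus everything accumulated in earlier steps."""
    per_task = sum(
        prefix_tokens + step * new_tokens_per_call
        for step in range(1, calls_per_task + 1)
    )
    return tasks * per_task


if __name__ == "__main__":
    # Hypothetical workload sizes, for illustration only.
    chat = chatbot_input_tokens(requests=100, tokens_per_request=2_000)
    agent = agent_input_tokens(
        tasks=100, calls_per_task=200, prefix_tokens=2_000, new_tokens_per_call=500
    )
    print(f"chatbot: {chat:,} input tokens")
    print(f"agent:   {agent:,} input tokens ({agent / chat:,.0f}x)")
```

Under these assumptions the prefix alone is processed calls_per_task times per task, which is the kind of repeated work that request-level optimization does not see and task-level optimization can target.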
Inference sessions at AI Engineer World's Fair 2026 in San Francisco.
Thursday, July 2, 2026
3:45 PM - 4:05 PM · 20m
Track 9 · Room 2016
Capacity: 250 attendees

Byung-Gon (Gon) Chun
Founder & CEO
FriendliAI
Founder and CEO of FriendliAI, an AI infrastructure company focused on efficient deployment and scaling of large language and multimodal models. Previously served as a professor at Seoul National University and held research roles at Facebook, Microsoft, Yahoo!, and Intel.