Every time the industry figures out how to serve tokens faster and cheaper, the appetite grows to match. Models get bigger, contexts get longer, agents start chaining thousands of calls together. The finish line keeps moving. This talk is a technical tour through everything the industry has done to keep up, led by two experts in high-performance inference. We'll start with the optimizations that made hardware work harder without changing the underlying architecture. Then we'll go up a level with techniques that work smarter across requests and across the model itself. And finally, a peek into the future with heterogeneous disaggregated inference, the architectural shift that splits prefill and decode across specialized hardware, and even more advanced forms of hardware specialization coming your way soon. Token demand is about to get a lot more insatiable. Let's see what the future has in store for us!
Speakers: Daniel Kim (Cerebras Systems); Natalie Serrino (Gimlet Labs).
Inference sessions at AI Engineer World's Fair 2026 in San Francisco.
Thursday, July 2, 2026
11:40 AM - 12:00 PM · 20m
Leadership 1 · Room 3016
Capacity: 550 attendees

Daniel Kim
Head of Growth
Cerebras Systems
@learnwdaniel
Head of Growth at Cerebras Systems; previously led Developer Relations at New Relic; based in San Francisco.

Natalie Serrino
Gimlet Labs
Natalie Serrino is speaking at AI Engineer World's Fair 2026.