Daniel Kim

Head of Growth

Cerebras Systems

Head of Growth at Cerebras Systems; previously led Developer Relations at New Relic; based in San Francisco.

Sessions (1)

All the Things We Have to Do to Satisfy Your Insatiable Need for Tokens

Every time the industry figures out how to serve tokens faster and cheaper, the appetite grows to match. Models get bigger, contexts get longer, and agents start chaining thousands of calls together. The finish line keeps moving. This talk, led by two experts in high-performance inference, is a technical tour of what the industry has done to keep up. We'll start with the optimizations that made hardware work harder without changing the underlying architecture. Then we'll go up a level to techniques that work smarter across requests and within the model itself. Finally, we'll peek into the future: heterogeneous disaggregated inference, the architectural shift that splits prefill and decode across specialized hardware, plus even more advanced forms of hardware specialization coming soon. Token demand is about to get a lot more insatiable. Let's see what the future has in store for us!

Speakers: Daniel Kim (Cerebras Systems); Natalie Serrino (Gimlet Labs).

Inference · intermediate · talk