To run state-of-the-art inference yourself, you must master the inference engine: vLLM, SGLang, TRT-LLM, or your own jawn. The inference engine manages the lifecycle of an inference request, from input to output. In this workshop, we'll examine the architecture of modern high-performance inference engines, the key techniques they use to deliver that performance, and the traces and metrics they emit.
Workshops Day 1 sessions at AI Engineer World's Fair 2026 in San Francisco.
Monday, June 29, 2026
11:05 AM - 12:05 PM · 1h
Track 8 · Room 2020
Capacity: 250 attendees

Charles Frye
AI Engineer
Modal
AI Engineer at Modal Labs focused on AI infrastructure and inference workloads. Holds a PhD in neural network optimization from UC Berkeley and previously worked at Weights & Biases; has contributed to Full Stack Deep Learning / LLM Bootcamp educational initiatives.