Charles Frye

AI Engineer

Modal

AI Engineer at Modal Labs focused on AI infrastructure and inference workloads. Holds a PhD in neural network optimization from UC Berkeley and previously worked at Weights & Biases; has contributed to Full Stack Deep Learning / LLM Bootcamp educational initiatives.

Sessions (1)

What is an Inference Engine, Anyway?

11:05 AM · Track 8 · Room 2020

To run state-of-the-art inference yourself, you must master the inference engine: vLLM, SGLang, TRT-LLM, or your own jawn. The inference engine manages the lifecycle of an inference request, from input to output. In this workshop, we'll examine the architecture of modern high-performance inference engines, the key techniques they need to deliver that performance, and the traces and metrics they emit.

Workshops Day 1 · advanced · workshop