Inference

What's New in Inference Engineering

Talk · Intermediate

More than 30,000 engineers have learned the fundamentals of inference since Inference Engineering was published. But the field keeps accelerating, so it's time for the first public addendum to the book. The past four months have seen a renewed focus on training-dependent inference optimization across the "big three" performance techniques of speculation, caching, and quantization. This talk provides structured guidance for training DFlash and EAGLE-3 draft models to accelerate LLM decode, introduces the concept of KV compaction, and explains the hype behind TurboQuant.

About the Inference Track

Inference sessions at AI Engineer World's Fair 2026 in San Francisco.

When

Thursday, July 2, 2026

1:30 PM - 1:50 PM · 20m

Where

Track 9 · Room 2016

Capacity: 250 attendees

Speaker

Philip Kiely

Developer Relations

Baseten

@philip_kiely

Philip Kiely leads Developer Relations at Baseten. Prior to joining Baseten in 2022, he worked across software engineering and technical writing for a variety of startups. Outside of work, you'll find Philip practicing martial arts, reading a new book, or cheering for his adopted Bay Area sports teams.

AI Engineer World's Fair 2026