AI Engineer World's Fair 2026

Inference performance as a competitive advantage

Talk (Intermediate)

Most AI teams focus on model quality, but production success often comes down to inference performance. In this session, FriendliAI will explore the optimization techniques behind high-performance LLM serving, including continuous batching, speculative decoding, smart caching, and efficient GPU utilization. Learn how leading AI teams reduce infrastructure costs, improve latency, and scale inference workloads without sacrificing performance. We'll share practical insights and deployment strategies that separate experimental AI projects from production-grade systems. Whether you're an ML engineer, platform engineer, MLOps practitioner, or technical founder, you'll leave with a better understanding of how inference optimization can become a competitive advantage for your AI applications.

Speakers: Alex Campos; Yunmo Koo.

About the Expo Stage 1 Track

Expo Stage 1 sessions at AI Engineer World's Fair 2026 in San Francisco.
