Standardizing performance benchmarks for production-grade Large Language Models (LLMs) remains a significant challenge across the industry. Conflicting data is prevalent, whether it originates from server developers like vLLM and SGLang or from analysts and competitive benchmarks, and published results often fail to hold up under real-world conditions. Our research into these inconsistencies identified several critical factors: the constraints of single-process benchmarking tools, specifically the Python Global Interpreter Lock (GIL); the nuances of model-level settings such as temperature; a lack of transparency about load generation parameters such as QPS and concurrency; and insufficient observability into the benchmarking clients themselves. In this talk, we share key lessons learned from our benchmarking efforts, examine the primary pitfalls that distort performance data, and offer strategies for mitigating them. We also introduce Inference Perf, an open-source, multi-process utility we developed to provide reliable stress-testing for production stacks. Our goal is to promote standardized, real-world benchmarking practices that let the community move beyond unreliable data. Join us to learn how to measure, optimize, and report LLM performance accurately.

Speakers: Ashok Chandrasekar (Google), Jason Kramberger (Google)
Inference sessions at AI Engineer World's Fair 2026 in San Francisco.
Thursday, July 2, 2026
11:40 AM - 12:00 PM · 20m
Track 9 · Room 2016
Capacity: 250 attendees

Ashok Chandrasekar
Ashok Chandrasekar is a Staff Software Engineer at Google working on AI inference performance evaluation and optimization for Google Kubernetes Engine. He is a project lead and maintainer of Inference Perf and co-lead of SIG Benchmarking in the llm-d project. He holds a Master's degree from Carnegie Mellon University. Previously, he was a Staff Engineer at VMware. His interests lie in distributed systems, with a current focus on systems for AI/ML applications.

Jason Kramberger
Jason Kramberger is speaking at AI Engineer World's Fair 2026.