AI Engineer World's Fair 2026

Two Bugs That Hid in Plain Sight: A vLLM Debugging Detective Story

Talk · Intermediate

Your model generates gibberish. Once every thousand prompts. High confidence scores. No crashes. No warnings. We hit this twice while building Jamba models. First: a request gets misclassified during scheduling, loads stale state from a previous prompt cache slot, and confidently generates nonsense. Second: logprob spikes during RL training that looked like training instability, until we noticed they tracked with rollout count, then with cache size.

In this talk, we'll walk through both debugging journeys: the false starts, how we instrumented vLLM to thread request IDs through the forward pass, and the search for variables that change failure structure rather than magnitude. Both share one lesson: distributed inference systems fail silently. No stack trace. No sanitizer warning. Just wrong answers with perfect confidence.

You'll learn how to build comparison scripts that expose logprob divergence, force memory pressure to surface rare bugs, and shrink a distributed RL training mystery into a reproducible single-script failure. Walk away knowing how to debug vLLM when it lies to you quietly.

Speakers: Asaf Gardin (AI21); Yuval Belfer (AI21 Labs).
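As a rough illustration of the comparison-script idea mentioned in the abstract (not the speakers' actual tooling), the sketch below assumes you have already collected per-token logprobs for the same prompt and the same sampled tokens from two runs, for example a trusted baseline and the configuration under suspicion. The names `Divergence`, `first_divergence`, and the `tol` threshold are hypothetical, chosen for this example only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Divergence:
    index: int        # token position where the two runs first disagree
    reference: float  # logprob from the trusted run
    candidate: float  # logprob from the run under test
    gap: float        # absolute difference between the two


def first_divergence(
    reference: Sequence[float],
    candidate: Sequence[float],
    tol: float = 1e-2,  # assumed tolerance; tune to your numeric noise floor
) -> Optional[Divergence]:
    """Return the first token position whose logprobs differ by more than tol."""
    if len(reference) != len(candidate):
        raise ValueError("logprob sequences must cover the same tokens")
    for i, (ref_lp, cand_lp) in enumerate(zip(reference, candidate)):
        gap = abs(ref_lp - cand_lp)
        if gap > tol:
            return Divergence(i, ref_lp, cand_lp, gap)
    return None  # the runs agree within tolerance everywhere
```

Reporting the first divergent position, rather than an average error, points at where state went wrong in the sequence, which is usually more useful than knowing only that something did.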

About the Inference Track

Inference sessions at AI Engineer World's Fair 2026 in San Francisco.
