Qianru Lao

Member of Technical Staff, Inference

OpenAI

Qianru Lao is a Member of Technical Staff on the Inference team at OpenAI, where she works on infrastructure for large-scale model serving. Previously, she contributed to the open-source Delta Lake project at Databricks and worked on distributed storage systems at Alibaba Cloud and infrastructure tooling at Google. She holds degrees in Computational Science and Engineering from Harvard and Computer Science from Sun Yat-sen University.

Sessions (1)

Routing LLM Inference in Production: From Engine Signals to Policy

11:10 AM · Track 9 · Room 2016

Production LLM apps need more than a fast model: they need an inference routing layer that can choose where each request should run as engines, capacity, latency, and geography cost change. This talk shares a generalized Inference Load Balancer (ILB) architecture split into a proxy and a controller. A low-latency proxy applies routing weights and request-path signals, while a controller computes source-cluster-to-engine weights from demand, capacity/performance profiles, replica state, and geography cost. We will cover the practical debugging patterns AI engineers need: reading engine signals, explaining why a request went to one backend instead of another, handling retries and load shedding, and keeping routing behavior observable without exposing OpenAI-specific internals or non-public metrics. Speakers: Qianru Lao (OpenAI); Lu Zhang (OpenAI).
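The proxy/controller split described in the abstract can be illustrated with a minimal sketch. It is not the speakers' implementation: the Engine fields, the scoring formula (capacity discounted by geography cost), and the function names compute_weights and pick_engine are all assumptions made up for illustration. The controller function turns engine state into normalized weights, and the proxy function does a cheap weighted random pick on the request path.

```python
import random
from dataclasses import dataclass


@dataclass
class Engine:
    # All fields are hypothetical stand-ins for the signals named in the abstract.
    name: str
    capacity: float       # assumed: requests/sec the engine can absorb
    geo_cost: float       # assumed: relative cost of reaching it from the source cluster (>= 0)
    healthy: bool = True  # assumed: summary of replica state


def compute_weights(engines, geo_penalty=1.0):
    """Controller side: map engine state to normalized routing weights.

    The score (capacity discounted by geography cost) is an illustrative
    assumption, not a formula taken from the talk.
    """
    scores = {
        e.name: e.capacity / (1.0 + geo_penalty * e.geo_cost)
        for e in engines
        if e.healthy and e.capacity > 0
    }
    total = sum(scores.values())
    if total == 0:
        return {}
    return {name: score / total for name, score in scores.items()}


def pick_engine(weights, rng=random):
    """Proxy side: weighted random choice over precomputed weights."""
    if not weights:
        # Nothing routable: the caller would shed load or retry elsewhere.
        raise RuntimeError("no routable engine")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


engines = [
    Engine("local", capacity=100, geo_cost=0.0),
    Engine("remote", capacity=100, geo_cost=1.0),
    Engine("draining", capacity=50, geo_cost=0.0, healthy=False),
]
weights = compute_weights(engines)
chosen = pick_engine(weights)
```

Keeping the weight computation off the request path is what lets the proxy stay low-latency in this sketch: it only reads a small table, and logging that table alongside each pick is one simple way to explain afterwards why a request went to one backend instead of another.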

Inference · Intermediate · Talk