Too many voice AI integrations still treat speech as a fancier form of chat: audio in, audio out. But speech can now serve as a control plane for software, and most developers haven't noticed that voice has become a capability overhang, where the models can already do far more than most products ask of them. Current realtime models can understand intent, call tools, keep talking while work is underway, recover from corrections, and decide what the user actually needs to hear. As a result, three practical patterns are emerging: voice-to-action, systems-to-voice, and voice-to-voice. We'll show how each pattern changes the architecture, where Realtime 2's reasoning and tool-calling matter, and why chained STT / LLM / TTS pipelines start to break down as interaction patterns grow richer.
Voice & Realtime AI sessions at AI Engineer World's Fair 2026 in San Francisco.
Tuesday, June 30, 2026
11:40 AM - 12:00 PM · 20 min
Track 6 · Room 2014
Capacity: 250 attendees

Charlie Guo
Founder & Author
OpenAI
@charlierguo
Charlie Guo works on developer experience at OpenAI, where he helps developers build with the latest AI models and capabilities. He is also the author of *Artificial Ignorance*, a leading AI newsletter read by employees across Silicon Valley.