Expo Stage 3

Voice is the universal interface

Talk · Intermediate

Language models give us the ability to create conversational, natural-language interfaces for computers. We are seeing a rapid shift among early adopters toward using general language instead of traditional user interfaces for tasks like writing code and editing spreadsheets. Join the cofounders of Pipecat, Gradium, and Daily as we discuss the future of realtime voice and AI interfaces. Voice is the most efficient input mode for natural-language systems, and often the most efficient output mode as well. But good voice interfaces require a very high degree of conversational facility, intelligence, task-specific reliability, and robustness to real-world conditions like multiple speakers and background noise. There's a long history of voice interfaces in science fiction: Star Trek, Iron Man, Her. We'll use these depictions of computing possibilities as a jumping-off point for talking about the ideal voice interface. How close are we to being able to build these interfaces with today's models, hardware, orchestration tooling, and UI libraries? What are the most promising research directions? What did the movies get wrong, now that we actually have experience building open-ended, natural-language voice systems?

Speakers: Kwindla Kramer (Daily); Neil Zeghidour (Gradium).

About the Expo Stage 3 Track

Expo Stage 3 sessions at AI Engineer World's Fair 2026 in San Francisco.

When

Thursday, July 2, 2026

11:40 AM - 12:00 PM · 20m

Where

Expo Stage 3

Capacity: 250 attendees

Speakers (2)

Kwindla Kramer

Works on Pipecat // ᓚᘏᗢ // CEO at Daily

Daily

@kwindla

Kwin works on large-scale WebRTC infrastructure at Daily. He is the originator of Pipecat, the widely used, open-source, vendor-neutral voice agent framework supported by NVIDIA, Google, and AWS and used by hundreds of startups. Before co-founding Daily, Kwin built the sci-fi user interfaces in Minority Report and Iron Man.

Neil Zeghidour

CEO

Gradium

@neilzegh

Neil Zeghidour is the co-founder and CEO of Gradium. Neil founded Gradium after a decade of building and leading frontier generative audio teams at Meta and Google DeepMind. Frustrated by slow and brittle voice assistants, he built the engineering teams that developed the first neural audio codecs and introduced the first audio LLMs, such as AudioLM, at Google. He later created Kyutai to launch Moshi, the world's first real-time, full-duplex conversational AI, and Hibiki, the first simultaneous speech-to-speech translation system. Today, Gradium is focused on helping developers build natural, real-time voice agents by providing ultra-low-latency streaming APIs that bring these breakthroughs from the research lab to production.
