Midam Kim

Senior Linguist and ML Engineer

ServiceNow

Midam Kim is a Senior Linguist and ML Engineer at ServiceNow, where she builds and evaluates a multilingual voice AI platform spanning a dozen languages. She holds a PhD in Linguistics from Northwestern University and teaches at Fisher College of Business, Ohio State University. Her work sits at the rare intersection of production ML engineering and speech science, translating decades of linguistic research into the engineering decisions voice AI teams are making right now.

Sessions (1)

"My name is... my name is...": A Linguistic Map for Building and Debugging Voice Agents

3:20 PM · Track 6 · Room 2014

Every voice AI engineer has heard it: a caller repeating their name three times, getting more frustrated with each attempt. The logs look clean. Confidence scores look fine. The system looks like it's working, but it isn't.

Building a voice agent today means chasing answers across a dozen scattered sources (ASR, TTS, turn-taking, prompts, LLMs) without a comprehensive map of the thing you're actually building: a conversation. That map exists, and it's called linguistics. With it, the scattered pieces fall into place and into order, and you stop patching components and start conducting the orchestra.

The map starts with a simple model. Every conversation runs on two channels: form (sounds, words, syntax, turns) and meaning (the task, the situation, the feeling). Users keep both channels aligned, with each other and with their partner's, continuously and without thinking. Your job is to build an agent that does the same: keep its form and meaning channels aligned with each other and with the user's, constantly and seamlessly. The map survives every architecture shift, cascaded or speech-to-speech, because it describes the conversation, not the implementation.

From it, you get both halves of the job: design questions for build time, and a matrix that turns "the agent just didn't get it" into concrete, debuggable failure modes. You'll leave with the map, the questions, and an open-source evaluation framework to run them with.

Who this is for: voice AI engineers, ML practitioners on voice pipelines, and anyone who's watched clean logs while their agent quietly fails real users.

Voice & Realtime AI · intermediate · talk