Vivek Raju Muppalla

VP of AI Engineering

Hippocratic AI

Vivek Raju Muppalla is VP of AI Engineering at Hippocratic AI, where he leads product engineering for healthcare agents, including the systems behind AI Front Door, Nurse Co-Pilot, and 200M+ patient-agent interactions. His current work focuses on turning frontier AI into clinically safe, production-grade voice agents across real-time orchestration, evaluation, reliability, and patient-facing workflows. Vivek has spent over a decade building applied AI, ML, and large-scale production systems at Cohere, Scale AI, Unity Technologies, Amazon, Groupon, and Expedia. At Cohere, he served as VP of AI Engineering and Custom Models, helping launch GenAI applications across Fortune 500 enterprises and leading the development of Takane, a high-performing Japanese large language model, in partnership with Fujitsu. At Scale AI, he helped start the synthetic data business, and at Unity he led AI and simulation product engineering, including Unity ML-Agents, Computer Vision, Robotics, and Simulation. His work has consistently focused on the hard part after the demo: making AI systems reliable, measurable, and useful in production.

Sessions (1)

200 Million Patient Interactions Later: What the Generic Voice Stack Misses

12:05 PM · Track 7 · Room 2024

A healthcare voice agent can be right on the benchmark and still fail in production. Real patients hesitate, interrupt, misremember medications, code-switch mid-sentence, and disclose risk indirectly. After **200M+ patient-agent interactions**, the lesson is clear: in clinical voice AI, interaction is a safety variable.

This talk breaks down what Hippocratic AI had to rebuild beyond the generic voice stack: not just ASR, VAD, an LLM, TTS, and turn-taking heuristics, but a real-time safety system that treats silence, clarification, escalation, multilingual continuity, and medication-specific recognition as first-class engineering problems.

We’ll walk through the production architecture behind Hippocratic AI’s voice agents: a **30+ model supervisor constellation**, including the **4.1T-parameter AI Front Door system**, designed to catch failures a single primary model misses. The talk covers how specialized models monitor medication identification, overdose risk, labs and vitals, escalation criteria, workflow confirmation, and other clinical safety surfaces while the patient conversation is still happening.

We’ll focus on four production lessons:

- **Benchmarks are not enough:** MedQA and USMLE-style accuracy do not capture the failure modes that appear in a 12-minute, multi-turn patient call.
- **Interaction signals become training data:** pauses, interruptions, hesitation, clarification requests, and escalation markers are mined from production calls and turned into structured eval and training signals.
- **One LLM is not a safety architecture:** supervisor models can overrule, block, or escalate when the primary model sounds plausible but misses a clinical risk.
- **Voice infrastructure has clinical failure modes:** domain ASR, medication vocabulary, code-switching, latency, and turn-taking all affect whether the system makes the right next move.

AI in Healthcare · intermediate