Many LLM deployment conversations focus on models, benchmarks, and prompting, but the hardest problems start after the model works. In this session, Gennady Pekhimenko, Senior Director of AI Software at NVIDIA and former CEO of CentML, and Zach Bratun-Glennon, General Partner at Gradient, will explore the details and cutting edge of inference performance. They'll unpack what actually happens when you run large models in production, including lessons and patterns observed in real deployments, and what the next generation of compilers, frameworks, and platform acceleration should look like to enable successful AI workloads.
AI Architects: AI Factories sessions at AI Engineer World's Fair 2026 in San Francisco.
Thursday, July 2, 2026
1:55 PM - 2:15 PM · 20m
Leadership 2 · Room 3020
Capacity: 550 attendees

Zach Bratun-Glennon
General Partner
Gradient
@thezbg
General Partner at Gradient Ventures; invests in AI/ML, data science, vertical software, B2B marketplaces, fintech, and more. Prior to Gradient, led acquisitions and strategic investments for Google Cloud.