AI Architects: AI Factories

Why your LLM is slow and expensive: lessons learned from running models in production

Talk · Intermediate

Many conversations about LLM deployment focus on models, benchmarks, and prompting, but the hardest problems start after the model works. In this session, Gennady Pekhimenko (Senior Director of AI Software at NVIDIA and former CEO of CentML) and Zach Bratun-Glennon (General Partner at Gradient) will explore inference performance in detail, up to the current state of the art. They'll unpack what actually happens when you run large models in production, including lessons and patterns observed in real deployments, and what the next generation of compilers, frameworks, and platform acceleration should look like to support AI workloads successfully.

About the AI Architects: AI Factories Track

AI Architects: AI Factories sessions at AI Engineer World's Fair 2026 in San Francisco.

When

Thursday, July 2, 2026

1:55 PM - 2:15 PM · 20 min

Where

Leadership 2 · Room 3020

Capacity: 550 attendees

Speaker

Zach Bratun-Glennon

General Partner

Gradient

@thezbg

General Partner at Gradient Ventures, investing in AI/ML, data science, vertical software, B2B marketplaces, fintech, and more. Before Gradient, he led acquisitions and strategic investments for Google Cloud.

AI Engineer World's Fair 2026