The future of AI inference is not one-size-fits-all. This talk explores a multi-tiered architecture that supports the full AI lifecycle: rapid, pay-per-token experimentation; dedicated, SLO-bound production; and extreme-scale, self-managed deployments. Hear lessons from CoreWeave’s inference stack as performance, cost, and control requirements evolve. Speakers: Rita Zhang (CoreWeave); Sitanshu Gupta (CoreWeave).
Inference sessions at AI Engineer World's Fair 2026 in San Francisco.
Thursday, July 2, 2026
12:05 PM - 12:25 PM · 20m
Track 9 · Room 2016
Capacity: 250 attendees

Rita Zhang
CoreWeave
Rita Zhang is speaking at AI Engineer World's Fair 2026.

Sitanshu Gupta
CoreWeave
Sitanshu Gupta is speaking at AI Engineer World's Fair 2026.