12:05 PM · Track 9 · Room 2016
The future of AI inference is not one-size-fits-all. This talk explores a multi-tiered architecture that supports the full AI lifecycle, from rapid, pay-per-token experimentation, through dedicated, SLO-bound production, to extreme-scale, self-managed deployments. Hear lessons from CoreWeave’s inference stack as performance, cost, and control requirements evolve.
Speakers: Rita Zhang (CoreWeave); Sitanshu Gupta (CoreWeave).
Inference · Intermediate · Talk