Agentic coding workloads demand long contexts, multi-turn conversations, and throughput at a scale that most inference engines weren't built for. TokenSpeed is a new open-source engine purpose-built for this regime, developed collaboratively by NVIDIA DevTech, AMD Triton, Qwen Inference, Together AI, and others. In this 2-hour hands-on workshop, Together Inference Research Engineers and a TokenSpeed co-creator will cover the TokenSpeed architecture, deploying your first model, optimizing for agentic workloads, kernel and hardware tuning, and throughput/latency trade-offs.
Speakers: Zain Hasan, Yubo Wang, Qingyang Wu, and Jue Wang (all Together AI).
Workshops Day 1 sessions at AI Engineer World's Fair 2026 in San Francisco.
Monday, June 29, 2026
9:00 AM - 11:00 AM · 2h
Track 8 · Room 2020
Capacity: 250 attendees

Zain Hasan
Staff AI/ML Engineer - DX
Together AI
AI/ML engineer and educator focused on large-scale models, tooling, and developer education.

Yubo Wang
LLM Inference
Together AI
Yubo Wang is speaking at AI Engineer World's Fair 2026.

Qingyang Wu
Together AI
@QingyangWu1
Qingyang Wu is speaking at AI Engineer World's Fair 2026.

Jue Wang
Senior Staff Researcher
Together AI
Jue Wang is speaking at AI Engineer World's Fair 2026.