AI Engineer WF 2026
Harshul Jain

Senior Software Engineer

Audible Inc

@hj1393

I'm Harshul, Senior Software Engineer at Audible. I build distributed systems and GenAI platforms at scale: feature stores that handle 100K transactions per second, real-time streaming pipelines, and AI Search serving 10 million users. Outside work, I've reviewed 5 books for Manning, mentored over 300 engineers on Codementor and Topmate, and built AgentShip, an open-source, framework-agnostic solution for writing and deploying AI agents to production.

Sessions (2)

2 hr deep dive on LLM Inference at Scale — Part 1 of 2
12:10 PM·Track 3 · Room 2003

Most engineers using LLMs can call an API. Far fewer can explain why their model is slow, why it's running out of memory, or how the inference engines powering every major LLM API actually work. This workshop walks through the full inference stack, from how a transformer generates a single token to serving billions of tokens a day with vLLM, SGLang, TensorRT-LLM, Ray, and KServe/llm-d. The format is 60% explanation with live demos and 40% hands-on exercises. Attendees leave with a running vLLM server they benchmarked themselves. The workshop is based on the open-source practitioner's handbook being built live at github.com/harshuljain13/llm-inference-at-scale. Note: this is a 2-hour workshop that runs over the lunch break, so try to have lunch before or after if attending. Compute kindly sponsored by CoreWeave/Marimo.

Workshops Day 1 · advanced · workshop
2 hr deep dive on LLM Inference at Scale — Part 2 of 2
1:15 PM·Track 3 · Room 2003

Most engineers using LLMs can call an API. Far fewer can explain why their model is slow, why it's running out of memory, or how the inference engines powering every major LLM API actually work. This workshop walks through the full inference stack, from how a transformer generates a single token to serving billions of tokens a day with vLLM, SGLang, TensorRT-LLM, Ray, and KServe/llm-d. The format is 60% explanation with live demos and 40% hands-on exercises. Attendees leave with a running vLLM server they benchmarked themselves. The workshop is based on the open-source practitioner's handbook being built live at github.com/harshuljain13/llm-inference-at-scale. Note: this is a 2-hour workshop that runs over the lunch break, so try to have lunch before or after if attending.

Workshops Day 1 · intermediate