Du'an Lightfoot

Senior AI Engineer

Akamai Technologies

Senior AI Engineer at Akamai Technologies specializing in artificial intelligence and network engineering. Previously a Senior Developer Advocate at AWS, and founder of LabEveryDay.

Sessions (1)

Agents That Own Their Inference: Building Production AI Agents on Dedicated GPUs

9:00 AM · Track 7 · Room 2024

Every production agent today is renting its intelligence. You're paying per token, sending your customers' data to someone else's servers, and hoping the provider doesn't rate-limit you during your launch. For most teams, that's fine. But for a growing number of teams in regulated industries, with high-volume products, latency-sensitive workloads, or rising token bills, it's starting to look like a liability. In this 120-minute hands-on workshop, you'll get a dedicated GPU and build an agent that runs on infrastructure you control. You'll stand up vLLM, point your agent at it, and drive concurrent load through the stack until you can see batching, KV cache pressure, and throughput limits in the metrics. Then you'll tune the deployment to raise throughput while keeping per-request latency in line. The focus isn't agent frameworks. It's the inference layer underneath them. You'll leave with working code and a practical understanding of continuous batching under real concurrency, KV cache tradeoffs, vLLM's metrics, and the bottlenecks that only show up when you operate the inference server yourself.

Track 7 · intermediate · sponsor