AI Engineer World's Fair 2026

HTML Is All Agents Need

Talk · Intermediate

LLMs are great at writing code, so the question we kept asking was: can they write code that produces a video? We thought it would be easy. It took a year of trying. We started with massive prompts that produced mediocre output. We made the system more agentic so it could iterate on and improve its own output. That worked okay but wasn't production-ready.

Eventually we tried Remotion. It gave us deterministic video, but the React framework kept boxing the agent in. The more guardrails we added, the safer and more boring the output got. When we switched to plain HTML, CSS, and JavaScript, the creativity came back.

So we set out to build a video rendering framework on top of HTML, with one constraint: it had to work with Gemini Flash. Why? Because one tell that a framework is fighting an agent is that it needs the biggest model just to get usable output. So we shaped the framework around what small models could reliably author.

That left one real engineering question: can we keep the freedom of HTML and still render a deterministic MP4? Browsers don't want to do that. Image decoders, font loaders, and animation clocks all run asynchronously, each on its own schedule. Great for performance. Terrible for "render the same pixels every time."

Throughout, we iterated constantly with agentic loops and self-improving evals to test the framework, find issues in our renderer, and shape a set of skills that gave the agents taste instead of guardrails. This talk is what it took to get there.
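To make the determinism problem concrete, here is a minimal sketch of one common way to get repeatable frames: derive every timestamp from the frame index instead of a wall clock, and wait for all asynchronous assets before sampling. This is an illustration under stated assumptions, not the framework described in the talk. The names `frameTimeMs`, `opacityAt`, and `renderFrames` are hypothetical, and the `assets` promises stand in for image decodes and font loads.

```typescript
// Hypothetical sketch: every name here is illustrative, not the speakers' API.

/** Timestamp (ms) of a frame, computed from the frame index only, never from a wall clock. */
function frameTimeMs(frame: number, fps: number): number {
  return (frame * 1000) / fps;
}

/** A linear fade-in expressed as a pure function of time, clamped to [0, 1]. */
function opacityAt(timeMs: number, durationMs: number): number {
  return Math.min(Math.max(timeMs / durationMs, 0), 1);
}

/**
 * Sample the animation once per frame, but only after every async asset has settled.
 * The `assets` promises are stand-ins for image decoding and font loading.
 */
async function renderFrames(
  assets: Promise<unknown>[],
  fps: number,
  frameCount: number,
  durationMs: number,
): Promise<number[]> {
  await Promise.all(assets);
  const frames: number[] = [];
  for (let frame = 0; frame < frameCount; frame++) {
    frames.push(opacityAt(frameTimeMs(frame, fps), durationMs));
  }
  return frames;
}
```

Because each sampled value depends only on the frame index, the frame rate, and the animation parameters, two runs with the same inputs produce the same sequence regardless of how long the assets took to load.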

About the Generative Media Track

Generative Media sessions at AI Engineer World's Fair 2026 in San Francisco.
