Louis-François Bouchard

Co-founder and CTO

Towards AI

Louis-François Bouchard is the co-founder of Towards AI, where he builds and teaches a practical toolkit for shipping reliable LLM products. He co-authored Building LLMs for Production, a hands-on guide to prompting, fine-tuning, retrieval-augmented generation, and evaluation. Through Towards AI Academy, he has launched multiple in-depth courses for AI engineers, designed to turn developers into AI professionals who can transform prototypes into scalable, customer-ready systems. He also runs the What’s AI YouTube channel and newsletter, translating new research and best practices into clear engineering playbooks for 70K+ subscribers and tens of thousands of readers. Today, he partners with founders and organizations on AI strategy, training design, and production workflows that raise accuracy, reduce risk, and make generative AI useful for paying customers, and he speaks at events such as AIE and Uphill Conf.

Sessions (1)

Context Engineering in 2026: Compaction, Memory & Cost

2:20 PM · Track 6 · Room 2014

Every long agent session eventually breaks: the assistant that swore it would "never push to main" does exactly that forty turns later. The model didn't get dumber; its context did. This workshop is about engineering the context window so that stops happening, demonstrated on Towards AI's open-source AI tutor, which answers questions for students of our AI-engineering courses. Context engineering is deciding what the model sees on every single call (instructions, history, retrieved course content, memory, and tool outputs), and it's the line between a tutor that holds a coherent session and one that forgets the student's setup halfway through.

We'll move in three stages, mirroring how the project actually went. First, the concepts: the two root problems (a finite window, a stateless model), the full compaction toolkit (truncation, trimming, tool-result clearing, summarization, and offloading to files, and when each actually helps), memory that survives across sessions, skills loaded on demand, and production-grade retrieval (chunking, metadata, course scoping, hybrid search, reranking, and evaluation). Second, the tutor's architecture and the evaluation harness we used to measure every run on Gemini: tokens, cost, latency, and memory probes instead of vibe-checks. Third, cost: at real volume, even Gemini Flash got expensive, so we tested whether open and local models could match its quality for a fraction of the cost.

Everything is open-source and will be shared during the workshop.

Speakers: Louis-François Bouchard (Towards AI), Samridhi Vaid (Towards AI), Omar Solano (Towards AI).

Track 6 · intermediate