AI Engineer WF 2026
Filip Makraduli

Machine Learning Engineer

Superlinked

@f_makraduli

Filip Makraduli is an applied AI researcher and founding ML Developer Relations engineer at Superlinked, where he designs and ships small‑LLM inference systems for search, retrieval, and agents in production. He holds a master’s degree in Biomedical Data Science from Imperial College London. Before Superlinked, Filip worked in machine learning, data science, and developer relations roles across early‑stage AI startups and larger enterprises, building language understanding, retrieval‑augmented generation (RAG), and LLM pipeline tooling while partnering closely with product and platform teams. He contributes frequently to open source, including kernel libraries, model‑inference providers, and hands‑on demos used by practitioners. Filip is a co‑author of several publications on efficient transformer architectures and inference, including work on faster normalization for LLMs. He is an experienced speaker at meetups and conferences such as AI Engineer Europe and Berlin Buzzwords, sharing practical lessons on efficient transformers, retrieval systems, and embedding inference for production AI teams.

Sessions (2)

Turning My Obsidian Vault Into a Local AI Engineer
1:15 PM·Track 8 · Room 2020

Personal knowledge bases are messy, but engineering agents need memory: decisions, docs, TODOs, old PRs, architecture notes, incident notes. This talk shows how I made an Obsidian vault usable by an agent using local-first retrieval and small-model inference. The point is not “chat with notes”; it is how to build durable, inspectable agent memory.

Workshops Day 1 · intermediate · sponsor

Weight Folding, CUDA Streams, and the Bug That Made My Model Speak Backwards
3:20 PM·Track 9 · Room 2016

A talk about contributing GPU benchmarks to an open-source research paper (FlashNorm). I'll walk through the engineering journey: folding norm weights into projections, writing Triton kernels, accidentally making attention bidirectional (oops), and ultimately proving a 33-35% speedup on the norm+project operation. Practical lessons for anyone trying to optimize transformer inference.

Inference · intermediate · talk