Test-time compute is widely believed to benefit only large reasoning models. We show that it also helps small embedding models. Because modern embedding models are distilled from LLM backbones, a frozen encoder should benefit from extra inference compute without retraining. Using an agentic program-search loop that runs for 144 generations, we explore 144 candidate programs built on a frozen encoder API. The search produces twelve Pareto-optimal programs with cost ratios from c = 1.2 to c = 14.7 relative to the single-pass baseline. The programs are structurally diverse: the search independently rediscovers Rocchio pseudo-relevance feedback, ColBERT-style MaxSim at sentence granularity, reciprocal rank fusion, and the Fisher linear discriminant, all without trainable parameters or external models. Every frontier program improves nDCG@10 over the frozen baseline on all 14 MMTEB retrieval tasks, which span legal, financial, long-document, and general domains.
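For readers unfamiliar with the techniques named in the abstract, here is a minimal sketch of two of them, Rocchio pseudo-relevance feedback and reciprocal rank fusion, applied to precomputed embeddings. It is an illustration only, not one of the programs found by the search. The toy vectors stand in for the output of a frozen encoder, and the function names and the parameters `alpha`, `beta`, and `k` are assumptions chosen for this example.

```python
import math
from collections import defaultdict


def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank(query_vec, doc_vecs):
    """Return document indices sorted by descending dot-product score."""
    scores = [dot(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: -scores[i])


def rocchio_prf(query_vec, doc_vecs, k=2, alpha=1.0, beta=0.5):
    """Rocchio pseudo-relevance feedback without a negative term:
    move the query toward the centroid of its top-k first-pass results."""
    top = rank(query_vec, doc_vecs)[:k]
    dim = len(query_vec)
    centroid = [sum(doc_vecs[i][j] for i in top) / len(top) for j in range(dim)]
    return normalize([alpha * q + beta * c for q, c in zip(query_vec, centroid)])


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranking contributes 1 / (k + position)."""
    fused = defaultdict(float)
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + pos)
    return sorted(fused, key=lambda d: -fused[d])


# Toy 2-D vectors standing in for frozen-encoder embeddings (hypothetical data).
docs = [normalize([1.0, 0.0]), normalize([0.8, 0.6]), normalize([0.0, 1.0])]
query = normalize([1.0, 0.2])

first_pass = rank(query, docs)       # single-pass baseline ranking
expanded = rocchio_prf(query, docs)  # query refined from its own top results
second_pass = rank(expanded, docs)   # extra search pass with the refined query
fused = rrf([first_pass, second_pass])
print(first_pass, second_pass, fused)
```

Neither step trains anything or calls another model; the extra work happens entirely at query time, on top of embeddings the encoder has already produced.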
Search & Retrieval sessions at AI Engineer World's Fair 2026 in San Francisco.
Tuesday, June 30, 2026
3:45 PM - 4:05 PM · 20m
Track 3 · Room 2003
Capacity: 250 attendees

Han Xiao
VP of AI
Elastic
Han Xiao is the VP of AI at Elastic. He founded Jina AI in 2020 and served as its CEO until its acquisition by Elastic (NYSE: ESTC) in October 2025. Before that, he led search R&D at Tencent and worked on search and recommendations at Zalando. He created Fashion-MNIST, a widely used computer vision benchmark with 12,000+ citations, and received his Ph.D. from TU Munich in 2014 for work on adversarial and robust non-parametric Bayesian learning. He has lived and worked in the San Francisco Bay Area, Berlin, Munich, Taipei, Beijing, and Shenzhen, and is currently based in Mountain View.