Search & Retrieval

Autoresearch for Dense Retrieval: Test-Time Compute with Frozen Embedding Models

Talk · Intermediate

Test-time compute is widely believed to benefit only large reasoning models. We show it also helps small embedding models. Because modern embedding models are distilled from LLM backbones, a frozen encoder should benefit from extra inference compute without retraining. Using an agentic program-search loop that runs for 144 generations, we explore 144 candidate programs over a frozen encoder API. The search produces twelve Pareto-optimal programs with cost ratios from c=1.2 to c=14.7 relative to the single-pass baseline. The programs are structurally diverse: the search independently rediscovers Rocchio pseudo-relevance feedback, ColBERT-style MaxSim at sentence granularity, reciprocal rank fusion, and the Fisher linear discriminant, all without trainable parameters or external models. Every frontier program improves nDCG@10 over the frozen baseline on all 14 MMTEB retrieval tasks, which cover legal, financial, long-document, and general domains.
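To make two of the rediscovered techniques concrete, here is a minimal sketch of Rocchio pseudo-relevance feedback and reciprocal rank fusion over precomputed embeddings. It is not one of the programs from the talk. The function names, the toy 2-D vectors standing in for frozen-encoder outputs, and the parameters (k, alpha, beta) are all assumptions chosen for illustration. The fusion constant of 60 is the value commonly used for reciprocal rank fusion.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def rank(query, docs):
    """Return document indices sorted by descending dot-product score."""
    scores = [dot(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])

def rocchio_rerank(query, docs, k=2, alpha=1.0, beta=0.5):
    """Pseudo-relevance feedback: shift the query toward the centroid
    of its own top-k hits, then rank again with the shifted query."""
    top = rank(query, docs)[:k]
    centroid = [sum(docs[i][j] for i in top) / len(top) for j in range(len(query))]
    expanded = normalize([alpha * q + beta * c for q, c in zip(query, centroid)])
    return rank(expanded, docs)

def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranking contributes 1 / (k + position)."""
    fused = {}
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + pos)
    # Ties are broken by document id so the output is deterministic.
    return sorted(fused, key=lambda d: (-fused[d], d))

# Toy data: hypothetical embeddings, assumed to come from a frozen encoder.
docs = [normalize(v) for v in [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]]]
query = normalize([1.0, 0.2])

baseline = rank(query, docs)              # single-pass retrieval
feedback = rocchio_rerank(query, docs)    # second pass over the same vectors
fused = rrf([baseline, feedback])         # combine both rankings
```

Both functions reuse the embeddings already produced by the encoder, so the extra cost is additional scoring passes rather than any training, which matches the abstract's constraint of no trainable parameters or external models.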

About the Search & Retrieval Track

Search & Retrieval sessions at AI Engineer World's Fair 2026 in San Francisco.

When

Tuesday, June 30, 2026

3:45 PM - 4:05 PM · 20m

Where

Track 3 · Room 2003

Capacity: 250 attendees

Speaker

Han Xiao

VP of AI

Elastic

Han Xiao is the VP of AI at Elastic. He founded Jina AI in 2020 and served as CEO until its acquisition by Elastic (NYSE: ESTC) in October 2025. Before that, he led search R&D at Tencent and worked on search and recommendations at Zalando. He created Fashion-MNIST, a widely used computer vision benchmark with 12,000+ citations, and received his Ph.D. from TU Munich in 2014 for work on adversarial and robust non-parametric Bayesian learning. He has lived and worked in the San Francisco Bay Area, Berlin, Munich, Taipei, Beijing, and Shenzhen, and is currently based in Mountain View.
