With the boom in vision language models, the barrier to entry for building vision apps is much lower, so developers tend to reach for these models right away. However, they are very large and inefficient in production. In this talk, I will show how to combine vision language models with Skills to build end-to-end vision apps, from training to deployment, using HF Skills. I will also cover the state of the art in small computer vision and multimodal models.
Vision & OCR sessions at AI Engineer World's Fair 2026 in San Francisco.
Tuesday, June 30, 2026
11:40 AM - 12:00 PM · 20m
Track 2 · Room 2006
Capacity: 250 attendees

Merve Noyan
Developer Advocate
Hugging Face
@mervenoyann
Works on the Hugging Face open-source team and is the author of the book Vision Language Models with Hugging Face, published by O'Reilly.