Vision & OCR

Skill issue: stop deploying vision language models, use them with Skills to build e2e vision apps on edge

Sponsor Session · Intermediate

With the boom in vision language models, the barrier to entry for building vision apps is much lower, so developers tend to reach for them right away. However, these models are very large and inefficient in production. In this talk, I will walk through combining vision language models with Skills to build end-to-end vision apps, from training to deployment, using HF Skills. I will also show the state of the art in small computer vision and multimodal models.

About the Vision & OCR Track

Vision & OCR sessions at AI Engineer World's Fair 2026 in San Francisco.

When

Tuesday, June 30, 2026

11:40 AM - 12:00 PM · 20m

Where

Track 2 · Room 2006

Capacity: 250 attendees

Speaker

Merve Noyan

Developer Advocate

Hugging Face

@mervenoyann

Works on the Hugging Face open-source team and is the author of the book Vision Language Models with Hugging Face, published by O'Reilly.
