Dhruv Nathawani

Senior Research Scientist

Nvidia

Dhruv Nathawani is a Research Scientist at NVIDIA, where he works at the intersection of synthetic data and foundation model alignment. His work focuses on building data-centric methods that improve the reliability, adaptability, and performance of Nemotron foundation models and modern AI systems. Prior to NVIDIA, Dhruv was at Gretel, a synthetic data platform for developers, where he worked on tools for generating high-quality data for AI applications. At Salesforce Research, Apple Maps, and Carnegie Mellon University, Dhruv built AI systems spanning medical multimodal learning, document AI/OCR, satellite computer vision, and fMRI-based cognitive decoding.

Sessions (1)

Teaching Agents to Search: Building Synthetic Training Pipelines with NVIDIA Data Designer

11:05 AM·Track 5 · Room 2005

Modern agentic systems often fail because the right training data simply does not exist. Search agents are a perfect example: if you want a model to browse the web effectively, you need high-quality multi-step trajectories that teach it how to search, refine queries, inspect sources, and recover from dead ends. Those datasets are rarely available off the shelf. In this hands-on workshop, we will show how NVIDIA used Data Designer to build synthetic supervised fine-tuning data for search-capable Nemotron models. Participants will learn how to translate a target capability into a scalable data generation pipeline: defining task structure, generating strong seed examples, producing realistic search trajectories, filtering low-quality generations, and converting traces into training-ready records. Using a real search-agent use case, we will walk through the design decisions behind teaching Nemotron Super to browse the web, including how to create BrowseComp-style tasks, generate tool-use rollouts, and manage the tradeoffs between diversity, correctness, and yield. We will also cover the practical realities of production synthetic data workflows, including validation, dataset curation, and where most pipelines break down. But the goal of this workshop goes beyond search. Participants will leave with a reusable framework for designing any dataset they wish they already had: starting from the behavior they want to teach, mapping that behavior into a data schema, generating examples at scale, and iterating until the dataset is useful for training. By the end of the session, attendees will not only know how to build synthetic data for search agents, but how to design custom datasets for specialized behaviors across reasoning, tool use, and domain-specific applications. Attendees will leave with a practical methodology for synthetic data design, plus hands-on familiarity with NVIDIA Data Designer as an open-source system for rapid experimentation.

Workshops Day 1advancedworkshop