NVIDIA, UC Berkeley, and UCSF released this dataset in 2025 for training Describe Anything Models (DAM). It contains between 100,000 and 1,000,000 records of localized image and video captions, stored in WebDataset tar files, to support vision-language tasks.
The data is provided in WebDataset format (tar files). Users should load it with the webdataset library, which supports efficient data streaming.
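To make the WebDataset layout concrete, here is a minimal sketch of how such a shard is organized: files in the tar that share a basename (the part before the first dot) belong to one sample, and each extension becomes a field. The sketch uses only the Python standard library and a toy in-memory shard. The file extensions (jpg, json), the caption field, and the helper names build_toy_shard and iter_samples are illustrative assumptions, not the actual schema of this dataset. In practice, the webdataset library performs this grouping for you while streaming.

```python
import io
import json
import tarfile


def build_toy_shard() -> bytes:
    """Create a tiny in-memory tar laid out like a WebDataset shard.

    The contents are hypothetical and do not reflect the real dataset schema.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key in ("000000", "000001"):
            files = {
                f"{key}.jpg": b"fake image bytes",
                f"{key}.json": json.dumps({"caption": f"region caption {key}"}).encode(),
            }
            for name, payload in files.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def iter_samples(shard_bytes: bytes):
    """Yield one dict per sample by grouping consecutive members with the same key."""
    current_key, sample = None, {}
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Key = directory prefix plus basename up to the first dot.
            directory, _, base = member.name.rpartition("/")
            stem, _, ext = base.partition(".")
            key = f"{directory}/{stem}" if directory else stem
            if key != current_key and sample:
                yield sample
                sample = {}
            current_key = key
            sample["__key__"] = key
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield sample


samples = list(iter_samples(build_toy_shard()))
for s in samples:
    print(s["__key__"], sorted(k for k in s if k != "__key__"))
```

Grouping by consecutive members relies on the convention that all files of a sample are stored next to each other in the tar, which is what lets a reader stream a shard sequentially without seeking.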