Cambrian-S-3M is a collection of approximately 3 million video instruction-tuning records developed by nyu-visionx for the third training stage of the Cambrian-S multimodal model. Released in early 2026, the dataset aggregates video-text pairs from Cambrian-S-3M, LLaVA-Video-178K, and LLaVA-Hound (ShareGPTVideo).
Downloading a local copy requires the Hugging Face CLI (shipped with the `huggingface_hub` package) at version 0.36.0 or later, and roughly 5 TB of free disk space.
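The two requirements above can be checked before starting a multi-terabyte download. The sketch below is one way to do that in Python, under stated assumptions: the repository id `nyu-visionx/Cambrian-S-3M` is inferred from the dataset and organization names rather than confirmed by this page, "5 TB" is read as 5 × 10^12 bytes, and the helper names (`parse_version`, `meets_min_version`, `has_enough_space`, `download`) are hypothetical. Only `huggingface_hub.snapshot_download` with `repo_type="dataset"` and `local_dir` is a real library call.

```python
import re
import shutil
from importlib import metadata
from pathlib import Path

# Assumption: repository id inferred from the org and dataset names on this page.
REPO_ID = "nyu-visionx/Cambrian-S-3M"
MIN_HUB_VERSION = "0.36.0"
# Assumption: "approximately 5 TB" interpreted as 5 * 10**12 bytes.
REQUIRED_BYTES = 5 * 10**12


def parse_version(version: str) -> tuple:
    """Turn '0.36.0' (or '0.36.0rc1') into (0, 36, 0), padding missing parts with 0."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each part
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_min_version(installed: str, required: str = MIN_HUB_VERSION) -> bool:
    """True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def has_enough_space(target: Path, required_bytes: int = REQUIRED_BYTES) -> bool:
    """True if the filesystem holding `target` has at least `required_bytes` free."""
    # shutil.disk_usage needs an existing path, so walk up to the nearest existing parent.
    probe = target.resolve()
    while not probe.exists():
        probe = probe.parent
    return shutil.disk_usage(probe).free >= required_bytes


def download(target: Path) -> str:
    """Hypothetical wrapper: check requirements, then fetch the dataset snapshot."""
    installed = metadata.version("huggingface_hub")
    if not meets_min_version(installed):
        raise RuntimeError(f"huggingface_hub {installed} is older than {MIN_HUB_VERSION}")
    if not has_enough_space(target):
        raise RuntimeError(f"less than {REQUIRED_BYTES} bytes free under {target}")
    # Imported lazily so the checks above work even without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=REPO_ID, repo_type="dataset", local_dir=str(target))


if __name__ == "__main__":
    print(download(Path("./cambrian-s-3m")))
```

The version check pads short strings such as `0.36` to three components so they compare equal to `0.36.0`, and any 1.x release passes the minimum. Checking free space against the nearest existing parent lets the target directory be created by the download itself.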