DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

ProductSearch Datasets Browse Topics Rankings Community API / MCP

ResourcesDocumentation Blog Changelog Status

LegalPrivacy Policy Terms of Service Cookie Policy

Cambrian-S-3M: 3 Million Video Instruction Tuning Records | DataSalon

Home Multimodal & LLMCambrian-S-3M: 3 Million Video Instruction Tuning Records

Multimodal & LLM

Cambrian-S-3M: 3 Million Video Instruction Tuning Records

Name: Cambrian-S-3M: 3 Million Video Instruction Tuning Records
Creator: nyu-visionx
Published: 2025-11-07T02:32:07
Keywords: Arxiv251104670, Regionus, Licenseapache 20

by nyu-visionx·Updated 6mo ago

Available on 1 platform

Description

Cambrian-S-3M is a collection of approximately 3 million video instruction tuning records developed by nyu-visionx for the third training stage of the Cambrian-S multimodal model. Released in early 2026, the dataset aggregates video-text pairs from Cambrian-S-3M, LLaVA-Video-178K, and LLaVA-Hound (ShareGPTVideo).

Use Cases

Instruction tuning of multimodal models using video-text pairs
Benchmarking video understanding capabilities against LLaVA-Video-178K subsets
Training conversational agents to describe temporal events in ShareGPTVideo sequences

Strengths

Aggregates 3,000,000 video instruction records
Combines established benchmarks including LLaVA-Video-178K and LLaVA-Hound
Released under Apache 2.0 license

Limitations

Requires approximately 5 TB of disk space for storage
Potential for label noise or format inconsistencies across the three source datasets

Provenance

Source: nyu-visionx (Arxiv 2511.04670)
Collection Method: Aggregation of existing open-source video instruction datasets
Freshness: Last updated January 2026.

Requires Hugging Face CLI version 0.36.0 or higher and approximately 5 TB of disk space for local storage.

Arxiv251104670 Regionus Licenseapache 20

Related Datasets

Quality Score

C42

Description

Source

Reputation

Quality Score

C42

Description

Source

Reputation

Access

Community

19.3K downloads

5 likes

0 views

Dataset Info

Author: nyu-visionx
Created: Nov 7, 2025
Updated: Jan 22, 2026
Last synced: Jun 16, 2026

Access

Community

19.3K downloads

5 likes

0 views

Dataset Info

Author: nyu-visionx
Created: Nov 7, 2025
Updated: Jan 22, 2026
Last synced: Jun 16, 2026

Cambrian-S-3M: 3 Million Video Instruction Tuning Records

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info