Description
Videoreason Training is a large-scale dataset comprising 471,575 samples across three subsets: perception (176,907 samples), simulation (105,818 samples), and embodied (188,850 samples). It was created by Zane-QIU and last updated on HuggingFace in April 2026. The dataset is designed for training models on video reasoning tasks spanning visual perception, 3D simulation, and robotics.
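The subset counts quoted above can be cross-checked against the stated total, and the same arithmetic gives each subset's share of the data. The sketch below uses only the figures from the description; the names `SUBSET_SIZES`, `STATED_TOTAL`, and `subset_shares` are illustrative and not part of the dataset.

```python
# Subset sizes as stated in the description above.
SUBSET_SIZES = {
    "perception": 176_907,
    "simulation": 105_818,
    "embodied": 188_850,
}
STATED_TOTAL = 471_575


def subset_shares(sizes):
    """Return each subset's fraction of the combined sample count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SUBSET_SIZES.values())
shares = subset_shares(SUBSET_SIZES)

if __name__ == "__main__":
    print(f"total = {total:,} (matches stated total: {total == STATED_TOTAL})")
    for name, share in shares.items():
        print(f"{name:<11} {share:6.1%}")
```

The three counts sum to the stated 471,575, with shares of roughly 37.5% (perception), 22.4% (simulation), and 40.0% (embodied), so the embodied subset is the largest and the simulation subset the smallest.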
Use Cases
Train segmentation models using the perception subset's segmentation labels.
Develop video super-resolution models using the low-resolution and high-resolution video pairs in the perception subset.
Simulate camera motion for 3D scene navigation using the camera pose sequences in the simulation subset.
Train embodied AI agents for robotic manipulation using the action and state sequences in the embodied subset.
Strengths
471,575 total video samples
Covers three distinct reasoning domains: perception, simulation, and embodied tasks
Limitations
Column names, data formats, and per-task sample sizes within each subset are not documented here
Video resolution, frame rate, and annotation quality are not documented
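Because the column names and formats are not listed here, one practical first step after obtaining the data is to inspect a handful of records. The helper below is a minimal sketch that assumes the samples can be iterated as Python dictionaries (as many dataset loaders provide); `summarize_schema` and the field names in `example_records` are hypothetical stand-ins, not the dataset's actual schema.

```python
from collections import defaultdict
from itertools import islice


def summarize_schema(records, limit=100):
    """Map each field name to the sorted Python type names seen
    in the first `limit` records."""
    seen = defaultdict(set)
    for record in islice(records, limit):
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


# Stand-in records with hypothetical field names, for illustration only.
example_records = [
    {"video": "clip_0001.mp4", "label": 3},
    {"video": "clip_0002.mp4", "label": None},
]
schema = summarize_schema(example_records)

if __name__ == "__main__":
    for field, types in schema.items():
        print(f"{field}: {', '.join(types)}")
```

Only the first `limit` records are read, so the helper also works on a streamed iterator without downloading a full subset. A field that reports more than one type (such as a value that is sometimes missing) is worth checking before training.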
Provenance
Source
Zane-QIU on HuggingFace
Freshness
Last updated April 2026.
The full dataset description and column details are only available on the external HuggingFace dataset page. License information is unknown.