Description
NVIDIA's Cosmos-Reason1 SFT dataset pairs videos with text annotations for embodied reasoning. The annotations support tasks drawn from multiple sources, including BridgeDatav2, RoboVQA, Agibot, HoloAssist, and AV. The dataset was released on Hugging Face in May 2025 and also includes RoboFail data for benchmarking.
Use Cases
Fine-tuning vision-language models for embodied reasoning based on video-text pairs.
Benchmarking agent performance on failure scenarios using the included RoboFail dataset.
Training models for robotic question answering based on RoboVQA annotations.
Developing assistive AI agents using annotated data from the HoloAssist and Agibot sources.
Strengths
Released by NVIDIA and described in the Cosmos-Reason1 research paper.
Draws on multiple established embodied AI datasets, such as BridgeDatav2 and RoboVQA.
Includes a dedicated benchmark dataset (RoboFail) for evaluating failure modes.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it hard to assess suitability before download.
Description metadata is limited; actual data quality requires manual inspection after download.
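Because field semantics have to be inferred after download, a quick schema summary is a reasonable first step. The following is a minimal sketch, assuming the annotations have already been loaded into a list of Python dictionaries; the field names (`video`, `question`, `answer`, `source`) and the sample rows are hypothetical placeholders, not the dataset's actual columns.

```python
from collections import Counter, defaultdict


def summarize_fields(records):
    """Summarize each field's value types, missing count, and one example.

    `records` is a list of dicts, e.g. rows parsed from a downloaded
    annotation file. A field counts as missing when the key is absent
    or its value is None.
    """
    summary = defaultdict(
        lambda: {"types": Counter(), "missing": 0, "example": None}
    )
    # Collect every key seen in any record so absent keys are counted too.
    all_keys = set()
    for rec in records:
        all_keys.update(rec)

    for rec in records:
        for key in all_keys:
            value = rec.get(key)
            entry = summary[key]
            if value is None:
                entry["missing"] += 1
                continue
            entry["types"][type(value).__name__] += 1
            if entry["example"] is None:
                entry["example"] = value
    return dict(summary)


# Hypothetical rows for illustration only; real field names must be
# read from the downloaded files.
sample = [
    {"video": "clip_0001.mp4", "question": "What is the robot doing?",
     "answer": "Picking up a cup."},
    {"video": "clip_0002.mp4", "question": "Did the grasp succeed?",
     "answer": None},
    {"video": "clip_0003.mp4", "question": "What happens next?",
     "answer": "The arm retracts.", "source": "robovqa"},
]

report = summarize_fields(sample)
for name in sorted(report):
    info = report[name]
    print(name, dict(info["types"]), "missing:", info["missing"])
```

Running the same function over a few hundred real rows would show which fields are always present, which are sparse, and what type each one holds, which helps with the manual inspection noted above.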
Provenance
Source
NVIDIA
Collection Method
Annotations summarized from the Cosmos-Reason1 research paper.
Freshness
Last updated 2025-05-20 06:52:30.
License is unknown; users must verify terms before use.