Description
Released by NVIDIA in May 2025, this multimodal dataset contains pairs of videos and text annotations for embodied reasoning tasks. It includes data from BridgeDatav2, RoboVQA, Agibot, HoloAssist, AV, and RoboFail datasets. The annotations are structured for Supervised Fine-Tuning (SFT), Reinforcement Learning (RL), and benchmarking purposes.
Use Cases
Training embodied reasoning models with reinforcement learning on the RL-structured video-text annotations.
Benchmarking embodied reasoning models on tasks from multiple source datasets.
Conducting supervised fine-tuning for vision-language models on embodied tasks.
Analyzing failure modes in robotic systems using the RoboFail benchmark data.
Strengths
Multimodal structure pairs video with text annotations, which is a key format for embodied AI.
Integrates data from six named source datasets (BridgeDatav2, RoboVQA, Agibot, HoloAssist, AV, and RoboFail) for varied task coverage.
Released by NVIDIA, a leading institution in AI research and hardware.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count, file formats, and exact data size are unknown, limiting suitability assessment.
The description references tables in an accompanying paper that is not included here, so full context requires external reading.
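Because file formats, counts, and sizes are undocumented, a first step after download is to inventory what actually arrived. The sketch below is one way to do that with the Python standard library only. It assumes the dataset has already been downloaded to a local directory; the function name `summarize_download` and the example path are hypothetical and not part of the dataset or any NVIDIA tooling.

```python
from collections import Counter
from pathlib import Path


def summarize_download(root):
    """Count files and total bytes per file extension under a local directory.

    `root` is an assumed local path to the downloaded dataset; nothing here
    depends on the dataset's real layout, which is not documented.
    """
    counts = Counter()
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Normalize case so ".JSON" and ".json" are grouped together.
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {
        ext: {"files": counts[ext], "bytes": sizes[ext]}
        for ext in sorted(counts)
    }


# Example call (hypothetical path):
# for ext, info in summarize_download("./downloaded_dataset").items():
#     print(ext, info["files"], info["bytes"])
```

The per-extension summary gives a rough answer to the open questions about formats and size; field semantics still have to be inferred by opening individual annotation files.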
Provenance
Source
NVIDIA
Collection Method
Aggregated and annotated from multiple existing robotics and embodied AI datasets.
Freshness
Last updated 2025-05-20 06:51:06.
License is unknown; users must verify terms of use before downloading.