Robo2VLM 1 is a visual question answering (VQA) dataset of between 100,000 and 1,000,000 records derived from real-world robot manipulation trajectories. Created by researcher keplerccc and updated in late 2025, it uses multi-modal robot data to improve scene understanding in vision-language models. It is intended to bridge the gap between internet-scale image-text corpora and specific robotic visuomotor policies.
The data is provided in Parquet format and is associated with arXiv paper 2505.15517. Consult the paper for details on the trajectory sources and the VQA generation methodology.
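Because the records ship as Parquet, a downloaded shard can be sanity-checked and loaded with a few lines of Python. The following is a minimal sketch, not the dataset's official loader: the helper names `looks_like_parquet` and `load_records` are hypothetical, the column layout is not assumed, and `load_records` assumes pandas plus a Parquet engine such as pyarrow is installed. The check relies only on the fact that an unencrypted Parquet file begins and ends with the 4-byte marker `PAR1`.

```python
from pathlib import Path

# Unencrypted Parquet files begin and end with this 4-byte marker.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    p = Path(path)
    # A file shorter than two markers cannot be valid Parquet.
    if p.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with p.open("rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), 2)  # 2 = seek relative to end of file
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def load_records(path):
    """Load one Parquet shard into a pandas DataFrame (hypothetical helper)."""
    # Imported lazily so the magic-byte check works without pandas installed.
    import pandas as pd

    if not looks_like_parquet(path):
        raise ValueError(f"{path} does not look like a Parquet file")
    return pd.read_parquet(path)
```

Calling `load_records` on a local shard returns a DataFrame whose `columns` attribute lists the available fields; the paper remains the reference for what each field means.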