Description
Bridge-CoT is a dataset of 35,357 samples for robot manipulation, derived from BridgeDataV2. Each sample pairs a scene image with a task description and includes structured VLM-generated annotations for object detection, spatial relations, and subgoal decomposition. The dataset was created by CliffKai and was last updated on Hugging Face in April 2026.
Use Cases
Training vision-language models for task understanding based on scene images and task descriptions.
Developing robotic planning algorithms based on the structured subgoal decomposition annotations.
Benchmarking object detection and spatial reasoning models using the VLM-generated annotations.
Studying chain-of-thought reasoning for physical tasks based on the provided annotations.
Strengths
Contains 35,357 samples, providing a substantial collection for model training.
Each sample includes a scene image paired with a task description and multiple structured annotation types.
Annotations are generated by a vision-language model (VLM) and are structured, covering object detection, spatial relations, and subgoal decomposition in a chain-of-thought style.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The stated count of 35,357 samples comes without a per-split breakdown, which may limit suitability assessment until the data is inspected.
Data may reflect bias inherent to its source, BridgeDataV2.
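Because field semantics have to be inferred after download, a first pass over the records can at least show which fields exist, what types they hold, and how often they are filled. The following is a minimal sketch in plain Python, not the dataset's own tooling. The field names `task`, `objects`, and `subgoals` are hypothetical placeholders, since the real column names are undocumented, and `infer_schema` is a helper written for this illustration.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def infer_schema(records: Iterable[Mapping[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize which fields appear, with what Python types, and how often."""
    type_names: dict[str, set[str]] = defaultdict(set)
    present: dict[str, int] = defaultdict(int)
    total = 0
    for record in records:
        total += 1
        for key, value in record.items():
            # Record the type name of every value seen for this field.
            type_names[key].add(type(value).__name__)
            # Count only non-null values toward the fill rate.
            if value is not None:
                present[key] += 1
    return {
        key: {
            "types": sorted(type_names[key]),
            "fill_rate": present[key] / total,
        }
        for key in type_names
    }


# Hypothetical records: these field names are placeholders, not the real columns.
samples = [
    {
        "task": "put the spoon in the pot",
        "objects": ["spoon", "pot"],
        "subgoals": ["grasp spoon", "move to pot"],
    },
    {"task": "open the drawer", "objects": ["drawer"], "subgoals": None},
]

schema = infer_schema(samples)
for field, summary in schema.items():
    print(field, summary)
```

Running the same function over the downloaded records, in place of the placeholder list, gives a quick inventory of fields to check by hand before relying on any of them.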
Provenance
Source
Derived from BridgeDataV2.
Collection Method
VLM-generated annotations added to existing robot manipulation data.
Freshness
Last updated 2026-04-02 05:24:41; freshness should be verified.
License
Unknown; users should verify terms before use.