OpenDV Poses provides structured annotations for the OpenDV-YouTube driving-video dataset released by OpenDriveLab. It includes car and lane skeleton keypoints extracted with OpenPifPaf, pedestrian whole-body keypoints extracted with DWPose, and image captions. The dataset was collected and generated as part of the MAD project.
Use Cases
- Train pose estimation models for vehicles and pedestrians, using the extracted skeleton keypoints as labels.
- Develop multi-modal models for driving scene understanding that combine video, keypoint, and caption data.
- Benchmark lane detection algorithms against the lane skeleton keypoints.
- Train or evaluate captioning models for driving scenes using the provided image captions.
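For the keypoint-based use cases, a common first step is to drop low-confidence detections before training or evaluation. The following is a minimal sketch, assuming each instance is stored as a flat COCO-style list `[x1, y1, c1, x2, y2, c2, ...]`. The dataset does not document its actual layout, so the flat layout, the function names, and the 0.3 threshold are all hypothetical and should be checked against the downloaded files.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)


def split_keypoints(flat: List[float]) -> List[Keypoint]:
    """Group a flat [x, y, c, x, y, c, ...] list into (x, y, c) triples.

    ASSUMPTION: COCO-style flat layout; not confirmed for this dataset.
    """
    if len(flat) % 3 != 0:
        raise ValueError("expected a multiple of 3 values (x, y, confidence)")
    return [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)]


def confident_keypoints(flat: List[float], min_conf: float = 0.3) -> List[Keypoint]:
    """Keep only keypoints whose confidence is at least min_conf.

    The default threshold is an illustrative value, not a recommendation
    from the dataset authors.
    """
    return [kp for kp in split_keypoints(flat) if kp[2] >= min_conf]
```

Because the annotations come from automatic extractors rather than manual labeling, a confidence filter of this kind is one way to limit noisy labels. The right threshold depends on the downstream task.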
Strengths
- Annotations are extracted using established computer vision tools: OpenPifPaf and DWPose.
- Dataset is associated with a known research project (MAD) and organization (OpenDriveLab).
- Provides multiple annotation types: keypoints for cars, lanes, pedestrians, and captions.
Limitations
- The dataset description is sparse; data quality can only be assessed by inspecting the files after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported, so the dataset's scale cannot be judged without downloading it.
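Since neither the schema nor the row count is documented, both have to be recovered from the data itself. The sketch below shows one way to do that once records have been loaded as Python dictionaries (for example from JSON Lines). The on-disk format is not stated in the source, and the sample field names (`frame_id`, `caption`, `keypoints`) are hypothetical placeholders, not the dataset's real columns.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set, Tuple


def summarize_fields(
    records: Iterable[Dict[str, Any]],
) -> Tuple[int, Dict[str, Set[str]]]:
    """Return the record count and, per top-level field, the type names seen."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    count = 0
    for record in records:
        count += 1
        for name, value in record.items():
            seen[name].add(type(value).__name__)
    return count, dict(seen)


# Hypothetical sample rows; the real field names are not documented.
sample = [
    {"frame_id": 0, "caption": "a car on a highway", "keypoints": [1.0, 2.0, 0.9]},
    {"frame_id": 1, "caption": None, "keypoints": []},
]

n_rows, fields = summarize_fields(sample)
print(n_rows)
for name, type_names in fields.items():
    print(name, sorted(type_names))
```

A field that reports more than one type (here `caption` is sometimes `None`) is a useful early signal of missing values that the sparse description does not mention.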
Provenance
- Source
  - OpenDV-YouTube dataset from OpenDriveLab.
- Collection Method
  - Annotations extracted with OpenPifPaf and DWPose.
- Freshness
  - Last updated 2026-06-05 13:43:53; freshness should be verified.
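One simple way to act on the freshness note is to compare the recorded timestamp with a reference time. This is a minimal sketch, assuming the timestamp uses the `YYYY-MM-DD HH:MM:SS` format shown above and carries no timezone. The function name `is_stale` and the 365-day default are illustrative choices.

```python
from datetime import datetime, timedelta


def is_stale(last_updated: str, now: datetime, max_age_days: int = 365) -> bool:
    """Return True if last_updated is more than max_age_days before now.

    ASSUMPTION: both values are naive datetimes in the same timezone.
    """
    updated = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S")
    return now - updated > timedelta(days=max_age_days)


def is_in_future(last_updated: str, now: datetime) -> bool:
    """Return True if the recorded timestamp is later than now."""
    updated = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S")
    return updated > now
```

A timestamp that lies in the future relative to the time of the check usually indicates a clock or metadata error, so it is worth checking for that case separately.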