Description

STRIDE-QA is a large-scale visual question answering (VQA) dataset for physically grounded spatiotemporal reasoning in autonomous driving. It was constructed from 100 hours of multi-sensor driving data collected in Tokyo and provides 16 million question-answer pairs over 270,000 frames. Dense annotations include 3D bounding boxes, segmentation masks, and multi-object tracks.

Use Cases

Benchmarking spatiotemporal reasoning models on the 16 million QA pairs
Training visual question answering systems for autonomous driving scenarios on the multi-sensor data
Developing object tracking algorithms using the multi-object track annotations
Evaluating 3D scene understanding models against the 3D bounding box and segmentation mask annotations

Strengths

Large scale with 16 million QA pairs
Dense annotations include 3D bounding boxes, segmentation masks, and multi-object tracks
Constructed from 100 hours of multi-sensor driving data
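
As a rough sense of annotation density, the headline figures above can be combined in a few lines. This is a back-of-envelope sketch only: the inputs are the rounded numbers quoted in this card, and the frames-per-second figure assumes the annotated frames are spread evenly over the 100 hours, which the card does not state.

```python
# Rounded figures quoted in this card.
QA_PAIRS = 16_000_000
FRAMES = 270_000
HOURS = 100

# Average number of question-answer pairs attached to each annotated frame.
qa_per_frame = QA_PAIRS / FRAMES

# Average annotated frames per second of driving, ASSUMING even spacing
# across the full 100 hours (an assumption, not a documented property).
frames_per_second = FRAMES / (HOURS * 3600)

print(f"QA pairs per frame: {qa_per_frame:.1f}")
print(f"Annotated frames per second: {frames_per_second:.2f}")
```

Under these assumptions each frame carries roughly 59 QA pairs on average, and annotated frames occur a little under once per second of driving.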

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Row count is unknown, which makes it harder to assess suitability before download
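
Because field semantics have to be inferred after download, a first step is usually to list which fields each sample carries. The sketch below relies only on the general WebDataset convention that a shard is a tar archive in which files sharing a base name form one sample and the extension names the field. The file names in the demo shard (`000001.jpg`, `000001.json`, and so on) and the helper names `summarize_shard` and `build_demo_shard` are hypothetical and are not taken from STRIDE-QA; the real shards may use different keys and extensions.

```python
import io
import tarfile
from collections import defaultdict


def summarize_shard(fileobj, max_samples=5):
    """Group tar members by sample key and list each sample's field extensions."""
    samples = defaultdict(list)
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # WebDataset convention: the base name up to its first dot is
            # the sample key; everything after that dot is the field name.
            directory, _, base = member.name.rpartition("/")
            stem, _, ext = base.partition(".")
            key = f"{directory}/{stem}" if directory else stem
            # Files of one sample are stored contiguously, so a new key
            # after max_samples means we have seen enough.
            if key not in samples and len(samples) >= max_samples:
                break
            samples[key].append(ext)
    return dict(samples)


def build_demo_shard():
    """Build a tiny in-memory shard with HYPOTHETICAL file names for illustration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in ("000001.jpg", "000001.json", "000002.jpg", "000002.json"):
            payload = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


print(summarize_shard(build_demo_shard()))
```

For a downloaded shard, the same function can be called with `open(path, "rb")` in place of the demo buffer. The resulting key-to-extensions map shows which fields exist, though not what they mean.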

Provenance

Source: turing-motors
Collection Method: Constructed from 100 hours of multi-sensor driving data
Freshness: Last updated 2026-01-23 08:31:01 (time zone not stated); verify against the source before relying on it
Geography: Tokyo

Language: en
Task Categories: Visual Question Answering
Modality: text, image
Size Categories: 100K<n<1M
Library: webdataset, mlcroissant, datasets
Region: us
Other tags: multimodal, webdataset, spatiotemporal reasoning, computer vision, multi-sensor data, large scale, autonomous driving, visual question answering, arXiv:2508.10427

STRIDE-QA: Visual Question Answering Dataset for Autonomous Driving
