Description
A large-scale multimodal benchmark for intelligent traffic surveillance created by LifeIsSoSolong and last updated on 2025-10-25. It contains 170,400 images paired with approximately 5 million instruction-following visual question answering samples. The dataset covers diverse traffic scenes including congestion, spills, unusual weather, construction, fireworks, smoke, and accidents.
Use Cases
Training visual question answering models based on the ~5 million instruction-following samples.
Developing traffic scene recognition systems based on the diverse traffic scenes described.
Building models for object counting and localization in traffic surveillance imagery.
Enhancing background awareness and reasoning capabilities for autonomous systems based on the multimodal benchmark.
Strengths
Contains 170,400 images, providing a substantial visual corpus.
Includes approximately 5 million visual question answering samples, offering extensive language annotations.
Covers a diverse range of traffic scenes, including congestion, spills, and accidents.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
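Row count is not reported in the metadata, which may limit suitability assessment; the figures in the description (170,400 images, approximately 5 million samples) are the only size indication.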
Description metadata is limited; actual data quality requires manual inspection after download.
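Because field semantics have to be inferred after download, a quick schema summary can help with that first inspection. The following is a minimal sketch in plain Python, assuming the annotations can be read as a sequence of dictionaries; the function name `summarize_fields` and the sample field names (`image`, `question`, `answer`, `bbox`) are hypothetical and are not taken from the dataset itself.

```python
from collections import defaultdict


def summarize_fields(records):
    """Summarize which fields appear in an iterable of dict records.

    Returns {field: {"count": n, "missing": m, "types": [type names]}}.
    """
    counts = defaultdict(int)
    types = defaultdict(set)
    total = 0
    for record in records:
        total += 1
        for key, value in record.items():
            counts[key] += 1
            types[key].add(type(value).__name__)
    return {
        key: {
            "count": counts[key],
            "missing": total - counts[key],
            "types": sorted(types[key]),
        }
        for key in counts
    }


# Hypothetical records; the real field names must be checked after download.
sample = [
    {"image": "img_0001.jpg", "question": "How many vehicles are visible?", "answer": "7"},
    {"image": "img_0002.jpg", "question": "Is there smoke?", "answer": "yes",
     "bbox": [10, 20, 50, 60]},
]

summary = summarize_fields(sample)
for field, info in summary.items():
    print(field, info)
```

Running the same function over a few thousand real records would show which fields are always present, which are optional, and whether any field mixes value types.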
Provenance
Source
huggingface
Collection Method
Likely collected and annotated for research purposes, but the specific gathering method is not detailed.
Time Range
Not specified.
Freshness
Last updated 2025-10-25 01:59:40.
Geography
Not specified.
License
Unknown; users must verify permissions before use.