Name: MITS: Multimodal Intelligent Traffic Surveillance with 170,400 Images and 5M VQA Samples
Creator: LifeIsSoSolong
Published: 2025-08-14T02:24:16
Keywords: Vision Language, Traffic Surveillance, Benchmark, Large Scale, Vqa, Intelligent Transportation, Multimodal Benchmark, Multimodal

Description

A large-scale multimodal benchmark for intelligent traffic surveillance created by LifeIsSoSolong and last updated on 2025-10-25. It contains 170,400 images paired with approximately 5 million instruction-following visual question answering samples. The dataset covers diverse traffic scenes including congestion, spills, unusual weather, construction, fireworks, smoke, and accidents.

Use Cases

Training visual question answering models on the ~5 million instruction-following samples.
Developing traffic scene recognition systems for the diverse intelligent transportation systems (ITS) scenes described above.
Building models for object counting and localization in traffic surveillance imagery.
Improving background awareness and reasoning in autonomous systems using the multimodal benchmark.

Strengths

Contains 170,400 images, providing a substantial visual corpus.
Includes approximately 5 million visual question answering samples, offering extensive language annotations.
Covers a diverse range of traffic scenes as mentioned in the description, such as congestion, spills, and accidents.
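As a rough gauge of annotation density, the two headline counts imply about 29 VQA samples per image on average. The sketch below only divides the figures stated on this card. It assumes each sample references exactly one image, which the card does not confirm, and the real per-image distribution may be uneven.

```python
# Figures taken from this card; the VQA count is approximate ("~5 million").
NUM_IMAGES = 170_400
APPROX_VQA_SAMPLES = 5_000_000


def mean_samples_per_image(samples: int, images: int) -> float:
    """Average VQA samples per image, assuming each sample references one image."""
    if images <= 0:
        raise ValueError("images must be positive")
    return samples / images


print(f"{mean_samples_per_image(APPROX_VQA_SAMPLES, NUM_IMAGES):.1f}")  # → 29.3
```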

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The row count is not reported, which makes it harder to assess suitability before download.
Description metadata is limited; actual data quality requires manual inspection after download.
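Because field semantics must be inferred after download, a quick profiling pass over a sample of rows can show which fields exist, how often each is populated, and what types they hold. The helper below is a hypothetical sketch, not part of any MITS tooling. It accepts any iterable of dict-like rows (iterating a Hugging Face `datasets.Dataset` yields rows as dicts). The field names in the usage example are invented for illustration and are not the dataset's actual schema.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Any, Iterable, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> dict[str, dict[str, Any]]:
    """Report per-field coverage (share of rows containing it) and observed value types."""
    presence: Counter[str] = Counter()
    types: dict[str, Counter[str]] = defaultdict(Counter)
    total = 0
    for record in records:
        total += 1
        for key, value in record.items():
            presence[key] += 1
            types[key][type(value).__name__] += 1
    # Sorted keys give a stable, readable report; an empty input yields {}.
    return {
        key: {"coverage": presence[key] / total, "types": dict(types[key])}
        for key in sorted(presence)
    }


# Hypothetical rows; real MITS field names must be checked after download.
sample = [
    {"image": "scene_001.jpg", "question": "How many vehicles are visible?", "answer": "3"},
    {"image": "scene_002.jpg", "question": "Is the road congested?"},
]
for name, stats in summarize_fields(sample).items():
    print(name, stats["coverage"], stats["types"])
# answer 0.5 {'str': 1}
# image 1.0 {'str': 2}
# question 1.0 {'str': 2}
```

Running it on a few thousand rows rather than the full corpus is usually enough to spot optional or inconsistently typed fields.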

Provenance

Source: huggingface
Collection Method: Likely collected and annotated for research purposes, but specific gathering method is not detailed.
Time Range: Not specified.
Freshness: Last updated 2025-10-25 01:59:40.
Geography: Not specified.

License is unknown; users must verify permissions before use.
