Released in 2024, TemporalBench is a video understanding benchmark designed to evaluate fine-grained temporal reasoning in multimodal video models. It consists of approximately 10,000 video question-answer pairs derived from around 2,000 high-quality human-annotated video captions. The dataset was created by Microsoft.
Use Cases
- Benchmarking model performance on fine-grained temporal reasoning using video question-answer pairs.
- Training models to understand detailed temporal dynamics and actions in videos.
- Evaluating the ability of multimodal systems to answer questions about temporal sequences.
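Benchmarking on question-answer pairs typically comes down to comparing a model's answer with the reference answer for each question. The sketch below computes plain exact-match accuracy. It is not TemporalBench's official scoring protocol. The question IDs, the answer strings, and the case-insensitive comparison are all assumptions made for illustration, since the card does not document the dataset's fields.

```python
from typing import Mapping


def accuracy(predictions: Mapping[str, str], answers: Mapping[str, str]) -> float:
    """Fraction of reference questions whose predicted answer matches exactly.

    Both mappings are keyed by a hypothetical question ID. Answers are compared
    case-insensitively after stripping whitespace; a missing prediction counts
    as wrong.
    """
    if not answers:
        raise ValueError("no reference answers provided")
    correct = sum(
        1
        for qid, ref in answers.items()
        if predictions.get(qid, "").strip().lower() == ref.strip().lower()
    )
    return correct / len(answers)
```

For example, with references `{"q1": "A", "q2": "B", "q3": "A"}` and predictions `{"q1": "a", "q2": "A"}`, only `q1` matches, so the score is 1/3. Scoring over the reference set rather than the prediction set prevents skipped questions from inflating the result.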
Strengths
- Contains approximately 10,000 video question-answer pairs.
- Built from around 2,000 high-quality human-annotated video captions.
- Specifically designed to capture detailed temporal dynamics and actions.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
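- The exact row count is not reported in the catalog metadata. Only the approximate figure of 10,000 question-answer pairs is available, which makes it harder to assess suitability in advance.
- Last updated 2024-11-07 08:32:37; check whether a newer version exists before use.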
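Because column documentation and an exact row count are missing, both have to be recovered from the files after download. The sketch below assumes, without confirmation from the card, that the data arrives as JSON Lines with one object per line. It counts records and tallies the value types observed for each field, which gives a first view of the schema before inferring what each field means.

```python
import json
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def summarize_records(lines: Iterable[str]) -> Tuple[int, Dict[str, Dict[str, int]]]:
    """Count JSON Lines records and tally the value types seen for each field.

    Assumes each non-blank line is a single JSON object (an assumption about
    the download format, not documented for this dataset).
    """
    row_count = 0
    field_types: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        row_count += 1
        for key, value in record.items():
            # Record the Python type name, e.g. "str", "int", "list".
            field_types[key][type(value).__name__] += 1
    return row_count, {key: dict(types) for key, types in field_types.items()}
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, e.g. `summarize_records(open("downloaded.jsonl", encoding="utf-8"))`, where the file name is hypothetical. A field that appears in fewer records than the row count is optional or inconsistently populated.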
Provenance
- Source
- Microsoft
- Collection Method
- Sourced from human-annotated video captions.
- Freshness
- Last updated 2024-11-07.