Name: FineBench: Large-Scale Video Question Answering for Fine-Grained Action Understanding
Creator: FINEBENCH
Published: 2025-04-07T11:08:03
Keywords: Human Action Understanding, Computer Vision, Multiple Choice, Large Scale, Time Series, Video, Video Question Answering, Multimodal

Description

FineBench is a large-scale, multiple-choice Video Question Answering dataset designed to evaluate fine-grained understanding of human actions in videos. It leverages dense spatial and temporal annotations from the AVA v2.2 dataset, providing approximately 200,000 questions focused on nuanced person movements, interactions, and object manipulations within long video contexts. The dataset was created by FINEBENCH and was last updated on May 23, 2026.

Use Cases

Benchmarking Video Question Answering models on fine-grained human action understanding using the multiple-choice questions.
Training models to recognize nuanced person movements and interactions from the dense spatial and temporal annotations.
Developing models that understand object manipulations within long video contexts.
Evaluating model performance on tasks requiring reasoning about temporal sequences and spatial relationships in video.
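
For the benchmarking use case, multiple-choice questions are commonly scored by plain accuracy. The following is a minimal sketch under stated assumptions, not the dataset's official evaluation protocol: it assumes each question has exactly one correct option and that both predictions and answers are given as option indices. The function name and the example values are hypothetical.

```python
from typing import Sequence


def multiple_choice_accuracy(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Return the fraction of questions whose predicted option index matches the answer."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("no questions to score")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical example: four questions, option indices in 0..3 (not real FineBench data).
preds = [0, 2, 1, 3]
gold = [0, 2, 3, 3]
accuracy = multiple_choice_accuracy(preds, gold)
print(f"accuracy = {accuracy:.2f}")
```

If the released files encode answers as letters or option strings instead of indices, the comparison stays the same as long as predictions use the same encoding.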

Strengths

Approximately 200,000 multiple-choice questions provide a substantial evaluation scale.
Leverages dense spatial (bounding boxes) and temporal (timestamps) annotations from the established AVA v2.2 dataset.
Focuses specifically on fine-grained understanding of human actions, interactions, and object manipulations.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Exact row count is not reported; the description cites approximately 200,000 questions, but this should be confirmed after download.
Description metadata is limited; actual data quality requires manual inspection after download.
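
Because column documentation and an exact row count are missing, a first inspection pass after download might look like the sketch below. It assumes the records have already been loaded into Python dictionaries (for example from JSON or CSV); the file format is not stated in this card, and the field names `question`, `options`, and `answer_index` are hypothetical placeholders, not the dataset's confirmed schema.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def summarize_records(records: Iterable[Mapping[str, Any]]) -> dict:
    """Count rows and tally the Python value types observed for each field."""
    row_count = 0
    field_types: dict = {}
    for record in records:
        row_count += 1
        for name, value in record.items():
            # One Counter per field, keyed by the value's type name.
            field_types.setdefault(name, Counter())[type(value).__name__] += 1
    return {
        "row_count": row_count,
        "fields": {name: dict(counts) for name, counts in field_types.items()},
    }


# Hypothetical records for illustration only; real field names must be read from the files.
sample = [
    {"question": "What is the person doing?", "options": ["sit", "stand"], "answer_index": 1},
    {"question": "Who holds the cup?", "options": ["left", "right"], "answer_index": None},
]
summary = summarize_records(sample)
print(summary["row_count"])
print(summary["fields"])
```

Fields that show more than one type, or a `NoneType` count, are good candidates for closer manual review.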

Provenance

Source: FINEBENCH
Collection Method: Derived from the AVA v2.2 dataset annotations.
Freshness: Last updated 2026-05-23 07:04:42; freshness should be verified.

License is unknown; users must verify terms of use before downloading.
