Description
FineBench is a large-scale, multiple-choice Video Question Answering dataset designed to evaluate fine-grained understanding of human actions in videos. It leverages dense spatial and temporal annotations from the AVA v2.2 dataset, providing approximately 200,000 questions focused on nuanced person movements, interactions, and object manipulations within long video contexts. The dataset was created by FINEBENCH and was last updated on May 23, 2026.
Use Cases
Benchmarking Video Question Answering models on fine-grained human action understanding using the multiple-choice questions.
Training models to recognize nuanced person movements and interactions using the dense spatial and temporal annotations.
Developing models that understand object manipulations within long video contexts.
Evaluating model performance on tasks requiring reasoning about temporal sequences and spatial relationships in video.
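For the benchmarking use case, multiple-choice results are commonly summarized as accuracy against the gold option. The sketch below is illustrative only: the option labels ("A", "B", ...) and the helper names `multiple_choice_accuracy` and `majority_baseline` are assumptions, since the dataset's column layout is not documented here.

```python
from collections import Counter


def multiple_choice_accuracy(gold, predicted):
    """Return the fraction of questions whose predicted option equals the gold option."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("no questions to score")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def majority_baseline(gold):
    """Accuracy obtained by always answering with the most frequent gold option."""
    if not gold:
        raise ValueError("no questions to score")
    most_common_count = Counter(gold).most_common(1)[0][1]
    return most_common_count / len(gold)


# Toy labels for illustration only; these are NOT taken from FineBench.
gold = ["A", "C", "B", "A"]
predicted = ["A", "B", "B", "A"]

accuracy = multiple_choice_accuracy(gold, predicted)
baseline = majority_baseline(gold)
print(f"accuracy={accuracy:.2f} majority_baseline={baseline:.2f}")
```

Reporting a majority-option baseline next to model accuracy helps show whether a score reflects video understanding or only a skewed answer distribution.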
Strengths
Approximately 200,000 multiple-choice questions provide a substantial evaluation scale.
Leverages dense spatial (bounding boxes) and temporal (timestamps) annotations from the established AVA v2.2 dataset.
Focuses specifically on fine-grained understanding of human actions, interactions, and object manipulations.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Exact row count is not reported; only the approximate figure of 200,000 questions is given, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
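Because field semantics have to be inferred after download, a first pass can simply tally which keys appear in the records and which value types they hold. The sketch below assumes the records have already been loaded as a list of dictionaries; the file format is not stated in this listing, and the sample keys (`question`, `options`, `answer`) are hypothetical placeholders, not the dataset's actual columns.

```python
from collections import defaultdict


def summarize_fields(records):
    """Count how often each key appears and which Python value types it holds."""
    summary = defaultdict(lambda: {"count": 0, "types": set()})
    for record in records:
        for key, value in record.items():
            summary[key]["count"] += 1
            summary[key]["types"].add(type(value).__name__)
    # Convert the type sets to sorted lists so the result is stable and printable.
    return {
        key: {"count": info["count"], "types": sorted(info["types"])}
        for key, info in summary.items()
    }


# Hypothetical records for illustration; the real field names must be checked after download.
sample_records = [
    {"question": "What is the person doing?", "options": ["sit", "stand"], "answer": "B"},
    {"question": "Who is being handed the object?", "options": ["left", "right"]},
]

field_summary = summarize_fields(sample_records)
for name, info in field_summary.items():
    print(name, info["count"], info["types"])
```

A key whose count is lower than the number of records, or that holds more than one type, is a candidate for closer manual inspection.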
Provenance
Source
FINEBENCH
Collection Method
Derived from the AVA v2.2 dataset annotations.
Freshness
Last updated 2026-05-23 07:04:42; freshness should be verified.
License
Unknown; users must verify terms of use before downloading.