Available on 1 platform
AV-SpeakerBench is an audiovisual question-answering benchmark containing between 1,000 and 10,000 records, released in December 2024 by researcher plnguyen2908. It pairs trimmed audio-only, visual-only, and audiovisual segments with speaker-aware annotations to test fine-grained reasoning in multimodal models.
The dataset is released under the MIT license. Users should refer to the associated GitHub repository for evaluation scripts and implementation details.