The MMLA benchmark, created by THUIAR, comprises 61,000+ multimodal samples spanning text, video, and audio, drawn from nine datasets. Its sources include films, TV series, YouTube, Vimeo, Bilibili, TED, and improvised scripts, and it is intended for evaluating foundation models.
The full dataset description is hosted externally; users must visit the provided Hugging Face page for complete details on tasks, structure, and access.