Omni-Cloze frames detailed captioning evaluation as a cloze-style multiple-choice proxy task. The unified benchmark spans 9 main domains and 47 sub-categories for evaluating models across audio-only, visual-only, and audio–visual settings. It was created by BoJack and last updated on March 18, 2026.
Use Cases
- Benchmarking multimodal model performance on the cloze-style multiple-choice proxy task
- Training models for detailed captioning across diverse domains such as education, entertainment, and science
- Evaluating model robustness in audio-only, visual-only, and audio–visual settings as defined by the benchmark
- Researching domain adaptation and generalization across the 9 main domains and 47 sub-categories
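As a rough illustration of the benchmarking use case, the sketch below scores multiple-choice predictions separately for each input setting. The record keys `setting`, `answer`, and `prediction` are hypothetical; the actual field names are not documented and would need to be checked against the downloaded data.

```python
from collections import defaultdict


def accuracy_by_setting(records):
    """Return per-setting accuracy for multiple-choice predictions.

    Each record is a dict with hypothetical keys (assumptions, not the
    dataset's documented schema):
      'setting'    - e.g. 'audio-only', 'visual-only', 'audio-visual'
      'answer'     - the correct option label, e.g. 'B'
      'prediction' - the option label chosen by the model
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        setting = record["setting"]
        total[setting] += 1
        if record["prediction"] == record["answer"]:
            correct[setting] += 1
    # Every setting in `total` has at least one record, so no division by zero.
    return {setting: correct[setting] / total[setting] for setting in total}
```

Reporting accuracy per setting rather than one pooled number keeps a strong result in one modality from hiding a weak result in another.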
Strengths
- Covers 9 main domains and 47 sub-categories, indicating broad topical diversity
- Provides a unified evaluation framework for audio-only, visual-only, and audio–visual settings
- Frames evaluation as a structured cloze-style multiple-choice task
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Row count is not documented, so the dataset's size and suitability cannot be judged before download
- Description metadata is limited; actual data quality requires manual inspection after download
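Because the schema and row count are undocumented, a first inspection pass after download might look like the sketch below. It assumes the rows have already been loaded as a list of dictionaries by whatever loader fits the files; `summarize_fields` is a hypothetical helper, not part of the dataset or any library.

```python
def summarize_fields(rows):
    """Count rows and list the Python value types observed per field.

    `rows` is any iterable of dicts (an assumption about how the data
    was loaded). Returns (row_count, {field_name: [type names]}).
    """
    observed = {}
    row_count = 0
    for row in rows:
        row_count += 1
        for field, value in row.items():
            # Record every distinct type seen for this field.
            observed.setdefault(field, set()).add(type(value).__name__)
    return row_count, {field: sorted(types) for field, types in observed.items()}
```

A field that reports more than one type, or that includes `NoneType`, is a good candidate for the manual inspection mentioned above.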
Provenance
- Source: Hugging Face dataset by author BoJack
- Freshness: Last updated 2026-03-18 07:14:50