Description
MQ-RAVSBench is a benchmark for evaluating mask-quality auditing in referring audio-visual segmentation. The dataset, created by Jinxing1, links video clips, audio, referring expressions, ground-truth object masks, and candidate masks with different error patterns. It was last updated on Hugging Face on May 22, 2026.
Use Cases
Benchmarking mask-quality auditing models on candidate masks that exhibit different error patterns.
Training models to accept, revise, or reject segmentation masks based on linked video, audio, and text expressions.
Researching multimodal integration for object segmentation tasks using synchronized audio-visual-text data.
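The accept, revise, or reject use case can be illustrated with a small sketch. The dataset's own labels and metrics are not documented here, so everything below is an assumption: masks are represented as nested lists of 0/1 values, agreement with the ground-truth mask is measured by intersection-over-union (IoU), and the thresholds `accept_at` and `revise_at` are hypothetical values chosen only for illustration. It shows one way reference decisions for an evaluation could be derived, not how the benchmark defines them.

```python
from typing import List

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values (assumed layout)


def mask_iou(candidate: Mask, ground_truth: Mask) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    same_height = len(candidate) == len(ground_truth)
    if not same_height or any(
        len(c_row) != len(g_row) for c_row, g_row in zip(candidate, ground_truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for c_row, g_row in zip(candidate, ground_truth):
        for c, g in zip(c_row, g_row):
            if c and g:
                intersection += 1
            if c or g:
                union += 1

    # Two empty masks agree perfectly, so treat that case as IoU 1.0.
    return 1.0 if union == 0 else intersection / union


def audit_decision(iou: float, accept_at: float = 0.9, revise_at: float = 0.5) -> str:
    """Map an IoU score to a decision using hypothetical thresholds."""
    if iou >= accept_at:
        return "accept"
    if iou >= revise_at:
        return "revise"
    return "reject"


# Toy example: the candidate misses one foreground pixel of the ground truth.
ground_truth = [[0, 1, 1], [0, 1, 1]]
candidate = [[0, 1, 1], [0, 1, 0]]
score = mask_iou(candidate, ground_truth)
print(score, audit_decision(score))
```

In this toy case the masks share 3 foreground pixels out of a union of 4, so the score is 0.75 and falls in the "revise" band under the assumed thresholds.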
Strengths
Provides a structured benchmark for a specific task: mask-quality auditing in referring audio-visual segmentation.
Each example contains multiple linked modalities: video, audio, a referring expression, a ground-truth mask, and candidate masks.
Limitations
The published description is limited; data quality must be checked by manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it hard to judge in advance whether the dataset is large enough for a given use.
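Because field semantics and row count have to be established after download, a first inspection pass might look like the sketch below. It assumes the records have already been loaded as an iterable of dictionaries; how they are loaded depends on the file layout, which is not documented here. The helper `summarize_fields` and the field names in the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def summarize_fields(rows: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Count rows and record which value types appear under each field name."""
    row_count = 0
    types_seen: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        row_count += 1
        for name, value in row.items():
            types_seen[name].add(type(value).__name__)
    return {
        "row_count": row_count,
        "fields": {name: sorted(kinds) for name, kinds in types_seen.items()},
    }


# Hypothetical rows standing in for downloaded records.
sample_rows = [
    {"expression": "the barking dog", "candidate_mask_path": "masks/0001.png"},
    {"expression": "the person playing guitar", "candidate_mask_path": None},
]
print(summarize_fields(sample_rows))
```

A field that reports more than one type, such as a path that is sometimes missing, is a natural place to start the manual inspection.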
Provenance
Source
Hugging Face
Freshness
Last updated 2026-05-22 07:29:45 (time zone not stated); verify freshness against the source before use.
License
License is unknown; terms of use must be verified before the dataset is used.