ViYT-Diar is a manually annotated audio dataset extracted from Vietnamese YouTube videos. It is designed as a test benchmark for evaluating speaker diarization models on in-the-wild data. The dataset was created by tuanduy1612 and was last updated on 2026-04-03.
Use Cases
- Benchmarking speaker diarization models against manually annotated timestamps.
- Training or fine-tuning Vietnamese speech-processing models on in-the-wild YouTube audio, although the dataset is designed primarily as a test benchmark.
- Evaluating model robustness on real-world, multi-speaker audio.
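The benchmarking use case can be illustrated with a minimal sketch of a frame-based diarization error rate (DER). It rests on several assumptions: annotations are available as `(start, end, speaker)` tuples in seconds (the dataset's actual column layout is undocumented), hypothesis speaker labels have already been mapped to reference labels, and no forgiveness collar is applied. The names `frame_labels` and `simple_der` are hypothetical and not part of any library.

```python
from collections import defaultdict

def frame_labels(segments, step=0.01):
    """Map frame index -> set of speakers active in that frame."""
    frames = defaultdict(set)
    for start, end, speaker in segments:
        # Assumed segment format: (start_seconds, end_seconds, speaker_label).
        for i in range(round(start / step), round(end / step)):
            frames[i].add(speaker)
    return frames

def simple_der(reference, hypothesis, step=0.01):
    """Simplified DER: (missed + false alarm + confusion) / reference speech.

    Assumes hypothesis labels are already mapped to reference labels;
    a full scorer would first find the optimal label mapping.
    """
    ref = frame_labels(reference, step)
    hyp = frame_labels(hypothesis, step)
    total_ref = sum(len(speakers) for speakers in ref.values())
    if total_ref == 0:
        raise ValueError("reference contains no speech")
    errors = 0
    for i in set(ref) | set(hyp):
        r = ref.get(i, set())
        h = hyp.get(i, set())
        # Per-frame error counts every speaker slot that is not matched.
        errors += max(len(r), len(h)) - len(r & h)
    return errors / total_ref

# Illustrative segments only, not taken from the dataset.
reference = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hypothesis = [(0.0, 6.0, "A"), (6.0, 10.0, "B")]
der = simple_der(reference, hypothesis)
```

In this toy case the hypothesis attributes one second of speaker B's speech to speaker A, so one tenth of the reference speech is scored as an error. For reported results, an established scoring tool with optimal speaker mapping is likely the better choice.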
Strengths
- Diarization labels are manually annotated rather than machine-generated.
- Designed specifically as a benchmark for in-the-wild data.
- Focuses on a single language, Vietnamese.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, so the dataset's size and suitability are hard to assess before download.
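Because neither the columns nor the row count are documented, a first step after download might be to count the rows and list the value types seen in each field. The sketch below assumes the rows have already been loaded as a list of dictionaries; the helper `summarize_fields` and the sample field names (`audio_path`, `start`, `end`, `speaker`) are hypothetical and are not the dataset's real schema.

```python
def summarize_fields(records):
    """Return {field: sorted Python type names seen} for a list of dict rows."""
    summary = {}
    for row in records:
        for key, value in row.items():
            summary.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(types) for key, types in summary.items()}

# Hypothetical rows; the real column names are not documented.
sample_rows = [
    {"audio_path": "clip_0001.wav", "start": 0.0, "end": 3.2, "speaker": "spk_0"},
    {"audio_path": "clip_0001.wav", "start": 3.2, "end": 7.5, "speaker": None},
]

row_count = len(sample_rows)
field_types = summarize_fields(sample_rows)
print(row_count, field_types)
```

A field that reports more than one type, such as a mix of `str` and `NoneType`, points to missing values that are worth checking before evaluation.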
Provenance
- Source: Hugging Face
- Collection Method: Extracted and manually annotated from Vietnamese YouTube videos.
- Time Range: Not specified
- Freshness: Last updated 2026-04-03 03:19:02; freshness should be verified.
- Geography: Not specified