MDCC is a large-scale Cantonese automatic speech recognition (ASR) dataset compiled from multiple domains. It provides .wav recordings of both spontaneous and read speech, paired with UTF‑8 plain‑text transcripts and speaker metadata. The dataset was created by the author 'ming030890' and was last updated on Hugging Face on 2025-07-26.
The .wav files are hosted on Google Drive and are designated for research use only.
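
As a rough illustration of working with a recording and its transcript, the sketch below reads one .wav file and one UTF‑8 text file using only the Python standard library. The helper name `load_pair` and the idea of one transcript file per clip are assumptions made for this example; the actual file layout of MDCC is not described here and may differ.

```python
import wave
from pathlib import Path


def load_pair(wav_path, transcript_path):
    """Return (duration_seconds, sample_rate, transcript) for one clip.

    Hypothetical helper: assumes one uncompressed PCM .wav file and one
    UTF-8 plain-text transcript file per clip.
    """
    # Read only the header fields needed to compute the clip duration.
    with wave.open(str(wav_path), "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        duration = wav_file.getnframes() / sample_rate

    # Decode the transcript as UTF-8 and drop surrounding whitespace.
    transcript = Path(transcript_path).read_text(encoding="utf-8").strip()
    return duration, sample_rate, transcript
```

Calling `load_pair("clip.wav", "clip.txt")` with placeholder paths would return the clip length in seconds, its sample rate, and the transcript string, which is usually enough to sanity-check a download before feeding it to an ASR pipeline.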