Amphion released the NVSpeech (Emilia-NV) dataset in 2025, providing between 100,000 and 1,000,000 Mandarin Chinese speech samples. The collection features word-level annotations for 18 categories of paralinguistic vocalizations, including non-verbal sounds and lexicalized interjections.
The dataset is distributed in the WebDataset format and is licensed under CC BY-NC-SA 4.0, which requires attribution, prohibits commercial use, and requires adapted works to be shared under the same license.
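WebDataset shards are ordinary tar archives. Files that share a key, meaning the path up to the first dot in the file name, together form one sample, and each file's extension names one field of that sample. The sketch below groups tar members into samples using only the Python standard library. It is an illustration of the convention, not the dataset's official loader. The actual member names and extensions inside the NVSpeech shards are not stated here, so the `wav`/`json` fields and the in-memory demo shard are hypothetical. For real training pipelines, the `webdataset` Python package handles streaming, shuffling, and decoding.

```python
import io
import json
import tarfile


def split_key(name: str) -> tuple[str, str]:
    """Split a tar member name into (sample key, extension) following the
    WebDataset convention: the key is the path up to the first dot in the
    final path component; the extension is everything after that dot."""
    directory, _, base = name.rpartition("/")
    stem, dot, ext = base.partition(".")
    if not dot:
        raise ValueError(f"member has no extension: {name}")
    key = f"{directory}/{stem}" if directory else stem
    return key, ext


def iter_samples(fileobj):
    """Yield (key, {extension: raw bytes}) pairs, one per sample.

    Assumes all files belonging to one sample are stored contiguously in the
    archive, which is how WebDataset shards are written."""
    current_key, sample = None, {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other non-regular entries
            key, ext = split_key(member.name)
            if key != current_key and sample:
                yield current_key, sample
                sample = {}
            current_key = key
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield current_key, sample


def build_demo_shard() -> io.BytesIO:
    """Build a tiny in-memory shard with HYPOTHETICAL 'wav' and 'json' fields.
    The audio payload is dummy bytes, not real audio."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i in range(2):
            fields = (("wav", b"fake-audio"), ("json", json.dumps({"id": i}).encode()))
            for ext, data in fields:
                info = tarfile.TarInfo(name=f"sample{i:03d}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for key, sample in iter_samples(build_demo_shard()):
        print(key, sorted(sample), json.loads(sample["json"]))
```

Because samples are delimited only by a change of key, a reader can process a shard in a single sequential pass without an index, which is the main reason the format suits large audio corpora.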