This dataset contains between 10,000 and 100,000 cleaned audio clips derived from the Mozilla Common Voice 24.0 Chinese (Taiwan) corpus. Released by OKHand in early 2026, it provides 'Voice Seeds': clips processed with Silero-VAD to remove silence and environmental noise, intended for generative speech tasks.
The data is licensed under CC0 1.0 and is provided in an optimized Parquet format, making it compatible with modern data processing libraries like Polars and Dask.