Description
This dataset contains 2,740 audio recordings sampled at 16 kHz for Traditional Chinese speech synthesis and recognition. Each entry includes the audio, its length, a Traditional Chinese transcript, and a corresponding normalized text in simplified Chinese. The dataset originates from ivanzhu109/zh-taiwan and has been mirrored and reformatted by lianghsun.
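Because the card does not document its columns, the sketch below shows one way a record could be parsed and checked after download. It assumes each row follows the Hugging Face `datasets` audio convention (an `audio` dict holding `array` and `sampling_rate`) and that the transcript fields are named `text` and `normalized_text`. Only `normalized_text` is named on this card; `audio`, `text`, and the `SpeechRecord`/`parse_record` helpers are hypothetical. The `length` field is ignored here because its unit is not documented, so duration is derived from the sample count instead.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence

EXPECTED_SAMPLING_RATE = 16_000  # sampling rate stated on this card


@dataclass
class SpeechRecord:
    samples: Sequence[float]  # raw waveform samples (assumed layout)
    sampling_rate: int
    text: str                 # Traditional Chinese transcript (field name assumed)
    normalized_text: str      # normalized simplified Chinese text

    @property
    def duration_seconds(self) -> float:
        # Duration derived from sample count, not from the undocumented `length` field.
        return len(self.samples) / self.sampling_rate


def parse_record(row: Mapping[str, Any]) -> SpeechRecord:
    """Convert a raw row (hypothetical schema) into a SpeechRecord and validate it."""
    audio = row["audio"]
    record = SpeechRecord(
        samples=audio["array"],
        sampling_rate=audio["sampling_rate"],
        text=row["text"],
        normalized_text=row["normalized_text"],
    )
    if record.sampling_rate != EXPECTED_SAMPLING_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLING_RATE} Hz audio, got {record.sampling_rate} Hz"
        )
    return record
```

For example, a row holding 32,000 samples at 16 kHz parses to a record whose `duration_seconds` is 2.0, while a row at 8 kHz raises `ValueError`.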
Use Cases
Train text-to-speech models on the paired 16 kHz audio and Traditional Chinese text.
Train automatic speech recognition models on the audio and its corresponding text transcriptions.
Benchmark model performance across Traditional and simplified Chinese scripts using the parallel normalized_text field.
Study the impact of text normalization on speech synthesis or recognition for Chinese languages.
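For the recognition and cross-script benchmarking use cases, a common metric for Chinese is character error rate (CER): the character-level edit distance between hypothesis and reference, divided by the number of reference characters. The pure-Python sketch below is illustrative; `edit_distance` and `cer` are hypothetical helper names, not part of the dataset. Scoring the same hypothesis against the Traditional transcript and against `normalized_text` shows how much of the error is due to script mismatch alone.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            ))
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total character edits over total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

For instance, `cer(["臺灣"], ["台灣"])` returns 0.5: the hypothesis uses the simplified form 台 where the Traditional reference has 臺, so one of two characters counts as an error even though the recognized word is the same.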
Strengths
Contains 2,740 audio-text pairs, split into train (2,698), validation (14), and test (28) subsets.
Provides parallel text in both Traditional and normalized simplified Chinese, enabling cross-script applications.
Audio is sampled at 16 kHz, a standard rate for speech processing tasks.
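The split sizes listed above add up to the stated total (2,698 + 14 + 28 = 2,740). A quick check of the proportions, using the counts from this card rather than counts computed from the downloaded files, shows how heavily the data is weighted toward training; `split_fractions` is a hypothetical helper.

```python
from typing import Dict

SPLIT_SIZES = {"train": 2698, "validation": 14, "test": 28}  # counts from this card
STATED_TOTAL = 2740


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


fractions = split_fractions(SPLIT_SIZES)
for name, share in fractions.items():
    print(f"{name}: {share:.2%}")
```

Under these counts the training split holds about 98.47% of the examples, with roughly 0.51% for validation and 1.02% for test.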
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row counts are known for each split, but per-column statistics are not reported.
The validation and test sets are very small (14 and 28 samples), which may limit robust evaluation.
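To illustrate why 14 or 28 utterances limit evaluation, consider a sentence-level metric such as exact-match accuracy on the 28-utterance test split. The sketch below computes a 95% Wilson score interval, a standard binomial confidence interval chosen here for illustration. The accuracy figures used are hypothetical, not reported results for this dataset.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical: 21 of 28 test utterances transcribed exactly (75% accuracy).
low, high = wilson_interval(21, 28)
print(f"28 utterances: [{low:.3f}, {high:.3f}]")

# Same accuracy on a hypothetical 280-utterance set, for comparison.
low10, high10 = wilson_interval(210, 280)
print(f"280 utterances: [{low10:.3f}, {high10:.3f}]")
```

At 75% observed accuracy on 28 utterances, the interval spans about 30 percentage points; the same accuracy on ten times as many utterances narrows it to about 10 points.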
Provenance
Source
The original source is ivanzhu109/zh-taiwan; this repository is a mirror reformatted by lianghsun.
Collection Method
The collection method is not documented; the data likely consists of speech recordings paired with text transcriptions.
Freshness
Last updated 2026-04-14 23:00:17; freshness should be verified.
Geography
The dataset name suggests a focus on Taiwanese Mandarin or Traditional Chinese, but specific geographic coverage is not detailed.
License
Copyright belongs to the original author (ivanzhu109). The license is unknown and should be verified before use.