Name: Cantonese YouTube TTS: Filtered Audio for Speech Synthesis
Creator: alvanlii
Published: 2026-03-31T05:39:10
Keywords: Cantonese, Text To Speech, Speech Synthesis, Text, Audio, Audio Processing

Description

Cantonese Audio TTS Dataset is a collection of Cantonese speech intended for text-to-speech applications. It combines alvanlii/cantonese-radio and alvanlii/cantonese-youtube with an additional dataset of equal size. The creator, alvanlii, applied filtering and audio enhancement, including the removal of overlapped voices and music. The dataset was last updated on 2026-04-05.

Use Cases

Train Cantonese text-to-speech models on the filtered audio.
Develop speaker verification or diarization systems using the filtered, non-overlapping speech segments (speaker labels are not provided and would need to be derived separately).
Fine-tune audio enhancement models using the pre-processed audio.
Build Cantonese speech corpora for linguistic research from the radio and YouTube sources.

Strengths

The dataset is described as having undergone more extensive filtering and audio enhancement than two previously published datasets.
Overlapped voices are filtered out using pyannote/speaker-diarization-3.1, and music is filtered out as well.
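
To illustrate what overlap filtering can look like in practice, here is a minimal sketch. It assumes that diarization output has already been converted to plain (start, end, speaker) tuples, and it flags a clip when two different speakers talk at the same time. The tuple layout, the function name has_overlapped_speech, and the min_overlap threshold are assumptions made for illustration. The dataset description does not document how the output of pyannote/speaker-diarization-3.1 was actually post-processed.

```python
from typing import List, Tuple

# Hypothetical representation of one diarization turn:
# (start_seconds, end_seconds, speaker_label)
Turn = Tuple[float, float, str]


def has_overlapped_speech(turns: List[Turn], min_overlap: float = 0.1) -> bool:
    """Return True if two different speakers overlap for at least min_overlap seconds."""
    ordered = sorted(turns)  # sort by start time
    for i, (start_a, end_a, spk_a) in enumerate(ordered):
        for start_b, end_b, spk_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # later turns start even later, so none can overlap turn a
            overlap = min(end_a, end_b) - start_b
            if spk_a != spk_b and overlap >= min_overlap:
                return True
    return False


# Usage: keep only clips without simultaneous speakers.
clips = {
    "clip_001": [(0.0, 2.0, "A"), (2.1, 4.0, "B")],  # speakers take turns
    "clip_002": [(0.0, 2.0, "A"), (1.5, 3.0, "B")],  # 0.5 s of overlap
}
kept = [name for name, turns in clips.items() if not has_overlapped_speech(turns)]
```

The threshold ignores very short overlaps, which often come from imprecise turn boundaries rather than real simultaneous speech. The right value depends on the data and is not specified by the source.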

Limitations

Row count, file formats, and column-level documentation are not available, which makes it harder to assess suitability.
License information is not provided, so permissible use is unclear.
Data may reflect bias inherent to the specific YouTube and radio sources used for collection.

Provenance

Source: alvanlii on Hugging Face.
Collection Method: Combines and filters existing datasets (cantonese-radio, cantonese-youtube) plus an additional dataset of equal size.
Freshness: Last updated 2026-04-05 13:52:47.
Geography: Likely Cantonese-speaking regions, inferred from the language focus.

Speaker labels are not provided directly; the description suggests deriving them with a speaker embedding model such as NVIDIA's TitaNet.
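
One way to turn speaker embeddings into pseudo speaker labels is to group clips by cosine similarity. The sketch below assumes that an embedding vector has already been computed for each clip (for example with a model like TitaNet) and is available as a plain list of floats. The function names, the greedy grouping strategy, and the 0.7 threshold are illustrative assumptions and are not taken from the dataset description.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_pseudo_speakers(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.7
) -> Dict[str, int]:
    """Greedily assign each clip to the first speaker group it resembles.

    The first clip of each group serves as that group's representative.
    """
    representatives: List[Sequence[float]] = []
    labels: Dict[str, int] = {}
    for clip_id, emb in embeddings.items():
        for speaker_id, rep in enumerate(representatives):
            if cosine_similarity(emb, rep) >= threshold:
                labels[clip_id] = speaker_id
                break
        else:
            # No existing group is similar enough: start a new one.
            representatives.append(emb)
            labels[clip_id] = len(representatives) - 1
    return labels


# Usage with toy 2-dimensional vectors (real embeddings have far more dimensions).
toy = {"clip_a": [1.0, 0.0], "clip_b": [0.9, 0.1], "clip_c": [0.0, 1.0]}
pseudo_labels = assign_pseudo_speakers(toy)
```

Greedy grouping is order-dependent and only a starting point. A proper clustering method and a threshold tuned on held-out, manually labeled clips would likely give more reliable speaker labels.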
