One million synthetic audio samples for text-to-speech applications, generated across 1,000 distinct speakers. The collection was created by Aynursusuz. Each speaker contributes 1,000 samples, derived from 100 texts and 10 voice clones (100 × 10 = 1,000 per speaker). The dataset was last updated on Hugging Face on March 11, 2026.
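The counts in the description can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes every text is rendered once per voice clone; the card does not spell out that pairing, and the variable names are illustrative only.

```python
# Counts quoted in the dataset description.
SPEAKERS = 1000
TEXTS_PER_SPEAKER = 100
CLONES_PER_SPEAKER = 10

# Assumption: each text is synthesized once with each voice clone,
# so a speaker's sample count is the product of the two figures.
samples_per_speaker = TEXTS_PER_SPEAKER * CLONES_PER_SPEAKER
total_samples = SPEAKERS * samples_per_speaker

print(f"samples per speaker: {samples_per_speaker}")
print(f"total samples: {total_samples}")
```

Under that assumption the per-speaker and total figures agree with the stated 1,000 samples per speaker and one million samples overall.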
Use Cases
- Pre-training text-to-speech models on the 1,000,000 audio samples.
- Training multi-speaker TTS systems on the 1,000 distinct speaker identities.
- Developing voice cloning techniques using the 10 clones per speaker mentioned in the description.
- Benchmarking audio generation quality against the 44.1 kHz WAV samples.
Strengths
- Contains 1,000,000 total audio samples.
- Includes audio from 1,000 distinct speakers.
- Audio is provided at a 44.1 kHz sample rate in WAV format.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- All audio is synthetic, so it may carry biases and artifacts of the generation method used to produce it.
- Description metadata is limited; audio quality can only be judged by inspecting samples after download.
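Part of the manual inspection can be automated. The sketch below reads WAV headers with Python's standard `wave` module and lists files whose sample rate differs from the stated 44.1 kHz. It assumes the audio has already been exported as loose `.wav` files under a local directory; the card does not describe the on-disk layout, and the helper names `describe_wav` and `find_rate_mismatches` are hypothetical. The `wave` module only reads PCM-encoded WAV, so other encodings would need a different reader.

```python
import wave
from pathlib import Path

EXPECTED_RATE_HZ = 44_100  # sample rate stated on the card


def describe_wav(path):
    """Return basic header fields for one PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


def find_rate_mismatches(root, expected_rate=EXPECTED_RATE_HZ):
    """List (path, rate) pairs for WAV files whose rate differs from expected_rate."""
    mismatches = []
    # Assumption: files sit somewhere under `root` with a .wav extension.
    for path in sorted(Path(root).rglob("*.wav")):
        info = describe_wav(path)
        if info["sample_rate_hz"] != expected_rate:
            mismatches.append((path, info["sample_rate_hz"]))
    return mismatches
```

Calling `find_rate_mismatches("path/to/exported/audio")` on a local copy returns an empty list when every header matches the documented rate. A header check says nothing about perceptual quality, so listening to a sample of files is still needed.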
Provenance
- Source
  - Hugging Face, uploaded by Aynursusuz.
- Collection Method
  - Synthetically generated for text-to-speech pretraining.
- Freshness
  - Last updated 2026-03-11 17:24:44 (time zone not stated); confirm the current revision on Hugging Face before use.