This dataset contains 23,419 audio-transcription pairs totaling 72 hours of Farsi (Persian) speech from 678 distinct speakers. It is part of the YodaLingua multilingual collection, which is designed for training text-to-speech and automatic speech recognition models. It was uploaded to Hugging Face by Thomcles and last updated on 2026-04-27.
Use Cases
- Train text-to-speech models on aligned audio-text pairs.
- Develop automatic speech recognition systems for Farsi from transcribed audio clips.
- Fine-tune voice synthesis models on data from 678 distinct speakers to improve speaker diversity.
- Build multilingual speech applications by integrating this Farsi subset with other language portions of the YodaLingua collection.
Strengths
- Contains 23,419 individual audio clips, providing a substantial number of training examples.
- Offers 72 hours of total audio duration, a significant volume for speech model training.
- Includes contributions from 678 distinct speakers, which likely provides diversity in vocal characteristics.
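The averages implied by these figures can be worked out with a few lines of arithmetic. The sketch below uses only the three numbers stated above; the function name `summarize` is illustrative, and the 72-hour total may be rounded, so the results are approximate.

```python
# Figures stated in the dataset description (the 72-hour total may be rounded).
NUM_CLIPS = 23_419
TOTAL_HOURS = 72
NUM_SPEAKERS = 678


def summarize(num_clips: int, total_hours: float, num_speakers: int) -> dict:
    """Return per-clip and per-speaker averages derived from the totals."""
    total_seconds = total_hours * 3600
    return {
        "avg_clip_seconds": total_seconds / num_clips,
        "avg_clips_per_speaker": num_clips / num_speakers,
        "avg_minutes_per_speaker": total_seconds / 60 / num_speakers,
    }


stats = summarize(NUM_CLIPS, TOTAL_HOURS, NUM_SPEAKERS)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

These are averages only. The description does not say how clips are distributed across speakers, so a few speakers may account for a large share of the audio.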
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Last updated 2026-04-27 16:10:03 (time zone not stated); check the repository for later revisions before relying on this date.
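Because column documentation is absent, one way to infer field semantics is to scan a handful of downloaded rows and record which Python types appear under each field. The sketch below is a minimal, library-free illustration. The helper `describe_fields` and the sample rows, including the column names `audio`, `text`, and `speaker_id`, are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from typing import Any, Iterable


def describe_fields(records: Iterable[dict[str, Any]]) -> dict[str, set[str]]:
    """Map each field name to the set of Python type names seen for it."""
    seen: dict[str, set[str]] = defaultdict(set)
    for record in records:
        for field, value in record.items():
            seen[field].add(type(value).__name__)
    return dict(seen)


# Hypothetical sample rows; the real column names and types are undocumented.
sample = [
    {"audio": b"\x00\x01", "text": "سلام", "speaker_id": 17},
    {"audio": b"\x02\x03", "text": "خداحافظ", "speaker_id": None},
]

for field, types in describe_fields(sample).items():
    print(field, sorted(types))
```

A field that shows more than one type, such as a value that is sometimes `None`, is a hint that it needs explicit handling before training.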
Provenance
- Source
  - Thomcles on Hugging Face
- Collection Method
  - Part of the multilingual YodaLingua speech collection; specific gathering method unknown.
- Freshness
  - 2026-04-27
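A freshness check can be as simple as comparing the stated update time with the current date. The sketch below assumes the timestamp is in UTC, which the description does not confirm. The helper names and the 180-day threshold are illustrative choices, not a recommendation from the dataset author.

```python
from datetime import datetime, timezone

# Last-updated timestamp from the description; UTC is an assumption.
LAST_UPDATED = datetime(2026, 4, 27, 16, 10, 3, tzinfo=timezone.utc)


def age_in_days(last_updated: datetime, now: datetime) -> int:
    """Whole days elapsed between the last update and `now`."""
    return (now - last_updated).days


def is_stale(last_updated: datetime, now: datetime, max_age_days: int = 180) -> bool:
    """True if the last update is older than `max_age_days` (arbitrary threshold)."""
    return age_in_days(last_updated, now) > max_age_days


now = datetime.now(timezone.utc)
print("days since last update:", age_in_days(LAST_UPDATED, now))
print("stale:", is_stale(LAST_UPDATED, now))
```

The date recorded here can itself go out of date, so the repository's own revision history remains the authoritative source.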