YodaLingua-Farsi: 72 Hours of Farsi Speech for TTS and ASR

Name: YodaLingua-Farsi: 72 Hours of Farsi Speech for TTS and ASR
Creator: Thomcles
Published: 2025-12-07T13:02:08
Keywords: Farsi Language, Speech Synthesis, Multilingual Speech, Multilingual, Audio, Speech Recognition, Multimodal

by Thomcles, updated 2mo ago

Available on 1 platform

Description

This dataset contains 23,419 audio-transcription pairs totaling 72 hours of Farsi speech from 678 distinct speakers. It is part of the YodaLingua multilingual collection, designed for training text-to-speech (TTS) and automatic speech recognition (ASR) models. Thomcles uploaded it to Hugging Face, and it was last updated on 2026-04-27.

Use Cases

Train text-to-speech models on the paired audio and transcriptions.
Develop automatic speech recognition systems for Farsi from the transcribed audio clips.
Fine-tune voice synthesis models on data from 678 distinct speakers for speaker diversity.
Build multilingual speech applications by combining this Farsi subset with other language subsets of the YodaLingua collection.
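
For the ASR and TTS use cases above, evaluation is usually more meaningful when no speaker appears in both training and test data. The following is a minimal sketch of a deterministic, speaker-disjoint split. It assumes each record carries a speaker identifier under a key named `speaker_id`; that field name is hypothetical, since the listing does not document the schema, and the function names are illustrative only.

```python
import hashlib


def split_for_speaker(speaker_id: str, test_fraction: float = 0.1) -> str:
    """Assign a speaker to 'train' or 'test' from a stable hash of its id."""
    digest = hashlib.sha256(speaker_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def speaker_disjoint_split(rows, speaker_key="speaker_id", test_fraction=0.1):
    """Group rows so that all clips from one speaker land in the same split.

    `speaker_key` is a hypothetical field name; adjust it to the real schema.
    """
    splits = {"train": [], "test": []}
    for row in rows:
        name = split_for_speaker(str(row[speaker_key]), test_fraction)
        splits[name].append(row)
    return splits


# Hypothetical records standing in for real dataset rows.
example_rows = [
    {"speaker_id": speaker, "text": f"clip {clip}"}
    for speaker in range(50)
    for clip in range(3)
]
example_splits = speaker_disjoint_split(example_rows, test_fraction=0.2)
```

Because the assignment depends only on the speaker identifier, the split stays the same across runs and machines. The realized test share only approximates `test_fraction`, since whole speakers are assigned at once.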

Strengths

Contains 23,419 individual audio clips, providing a substantial number of training examples.
Offers 72 hours of total audio duration, a significant volume for speech model training.
Includes contributions from 678 distinct speakers, which likely provides diversity in vocal characteristics.
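
To put the headline figures in perspective, the short sketch below derives per-clip and per-speaker averages from the three numbers quoted in this listing. It assumes the 72-hour total is exact, although it is probably rounded, and the function name is illustrative.

```python
# Figures quoted in the dataset description.
NUM_CLIPS = 23_419
TOTAL_HOURS = 72
NUM_SPEAKERS = 678


def corpus_averages(num_clips, total_hours, num_speakers):
    """Return simple mean statistics implied by the corpus totals."""
    total_seconds = total_hours * 3600
    return {
        "seconds_per_clip": total_seconds / num_clips,
        "clips_per_speaker": num_clips / num_speakers,
        "minutes_per_speaker": total_hours * 60 / num_speakers,
    }


averages = corpus_averages(NUM_CLIPS, TOTAL_HOURS, NUM_SPEAKERS)
for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

This works out to roughly 11 seconds per clip, and about 35 clips or 6.4 minutes of audio per speaker. These are means only; the listing gives no information about how clips or durations are distributed across speakers.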

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Last updated 2026-04-27 16:10:03; freshness should be verified.
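
Since column-level documentation is absent, a first inspection pass can at least report which fields exist, which Python types they hold, and how often they are empty. The following is a minimal sketch that works on any iterable of dict-like records. The sample field names (`audio`, `text`, `speaker_id`) are hypothetical placeholders, not the dataset's confirmed schema.

```python
from collections import defaultdict


def summarize_fields(rows):
    """Report, per field, the value types seen and the number of missing values."""
    rows = list(rows)
    all_fields = set()
    for row in rows:
        all_fields.update(row)

    types_seen = defaultdict(set)
    missing = defaultdict(int)
    for row in rows:
        for field in all_fields:
            value = row.get(field)
            if value is None:
                # Counts both absent keys and explicit None values.
                missing[field] += 1
            else:
                types_seen[field].add(type(value).__name__)

    return {
        field: {"types": sorted(types_seen[field]), "missing": missing[field]}
        for field in sorted(all_fields)
    }


# Hypothetical records; replace with a sample of the downloaded data.
sample_rows = [
    {"audio": b"\x00\x01", "text": "سلام", "speaker_id": 12},
    {"audio": b"\x00\x02", "text": None, "speaker_id": 7},
]
field_summary = summarize_fields(sample_rows)
```

Running this on a few hundred records is usually enough to spot empty transcriptions or mixed types before committing to a full training run.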

Provenance

Source: Thomcles on Hugging Face
Collection Method: Part of the multilingual YodaLingua speech collection; specific gathering method unknown.
Freshness: 2026-04-27

Quality Score

Overall: C42
Description: 51
Source: 36
Reputation: 47
Access: 26

Community

Downloads: 216
Likes: 6
Views: 0

Dataset Info

Author: Thomcles
Created: Dec 7, 2025
Updated: Apr 27, 2026
Last synced: May 23, 2026
