Commonvoice 17 Tr Fixed

Name: Commonvoice 17 Tr Fixed
Creator: ysdede
Published: 2024-12-26T15:53:44
Keywords: size_categories:10K<n<100K, library:polars, modality:audio, license:cc0-1.0, modality:text, library:mlcroissant, library:datasets, library:pandas, parquet, language:tr, region:us, task_categories:automatic-speech-recognition

Description

A corrected version of the Mozilla Common Voice 17 Turkish corpus for speech recognition. It uses filename stems as unique keys to reorganize the data and improve split consistency for model training.

Use Cases

Train Turkish automatic speech recognition models using the audio recordings and text transcriptions
Perform data deduplication and split validation using the filename stems as unique identifiers
Fine-tune acoustic models on the Turkish language using the fixed dataset structure
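
The split validation use case can be sketched as follows. This is a minimal illustration, not the dataset author's code: the function name find_split_overlaps and the example file names are hypothetical, and the only idea taken from the description is that the filename stem (the file name without directory or extension) acts as the unique key.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_split_overlaps(splits):
    """Map each filename stem found in more than one split to those splits."""
    seen = defaultdict(set)
    for split_name, paths in splits.items():
        for path in paths:
            # The stem drops the directory and the extension.
            seen[PurePosixPath(path).stem].add(split_name)
    return {stem: sorted(names) for stem, names in seen.items() if len(names) > 1}


# Hypothetical file names, for illustration only.
example_splits = {
    "train": ["clips/sample_0001.mp3", "clips/sample_0002.mp3"],
    "validation": ["clips/sample_0003.mp3"],
    "test": ["clips/sample_0002.mp3", "clips/sample_0004.mp3"],
}
overlaps = find_split_overlaps(example_splits)
print(overlaps)  # {'sample_0002': ['test', 'train']}
```

An empty result means no clip is shared between splits, which is the property a consistent split layout should have.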

Strengths

Based on the Mozilla Common Voice 17 Turkish dataset
Uses filename stems as unique keys for data integrity and deduplication
Reorganized data splits intended for speech recognition training
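
As a rough sketch of how a filename stem can serve as a deduplication key, the snippet below keeps the first record seen for each stem. The helper dedupe_by_stem, the field names "path" and "sentence", and the sample records are assumptions made for illustration; they are not confirmed by this listing.

```python
from pathlib import PurePosixPath


def dedupe_by_stem(records, path_field="path"):
    """Keep the first record for each filename stem, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        stem = PurePosixPath(record[path_field]).stem
        if stem not in seen:
            seen.add(stem)
            unique.append(record)
    return unique


# Hypothetical records, for illustration only.
records = [
    {"path": "clips/sample_0001.mp3", "sentence": "merhaba"},
    {"path": "other/sample_0001.mp3", "sentence": "merhaba"},
    {"path": "clips/sample_0002.mp3", "sentence": "günaydın"},
]
unique_records = dedupe_by_stem(records)
print(len(unique_records))  # 2
```

Keeping the first occurrence is one possible policy; a stricter check could instead raise an error when two records share a stem.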

Quality Score: D35

Description: 39
Source: 36
Reputation: 35
Access: 22

Community

199 downloads
10 likes
0 views

Dataset Info

Author: ysdede
Created: Dec 26, 2024
Updated: Dec 26, 2024
