M Ailabs Speech Dataset Fr

Name: M Ailabs Speech Dataset Fr
Creator: gigant
Published: 2022-03-02T23:29:22
Keywords: region:us, language:fr, task_categories:automatic-speech-recognition, license:cc

Description

French portion of the M-AILABS Speech Dataset, a collection of nearly 1,000 hours of audio recordings and transcriptions derived from LibriVox and Project Gutenberg for speech recognition and synthesis. The audio clips are between 1 and 20 seconds long and are paired with literary texts published from 1884 to 1964.

Use Cases

Train speech synthesis models using the audio clips and matching transcriptions.
Fine-tune speech-to-text engines using the prepared text files as ground-truth labels.
Analyze phonetic variations in French literary readings using the clip-level transcriptions.
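
As a rough illustration of the first two use cases, the sketch below pairs each audio clip with its transcription and keeps only clips inside the 1 to 20 second range described above. It is a minimal example under stated assumptions, not the dataset's official loader: the function names, the `transcripts` mapping (clip id to text), and the `<clip_id>.wav` file layout are hypothetical, and the clips are assumed to be PCM WAV files readable by Python's standard `wave` module.

```python
import wave
from pathlib import Path


def clip_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def pair_clips_with_text(transcripts, audio_dir, min_s=1.0, max_s=20.0):
    """Build (wav_path, text, duration) pairs for training.

    `transcripts` maps a clip id to its transcription (hypothetical format).
    Audio is assumed to live at `<audio_dir>/<clip_id>.wav` (an assumption,
    not a documented layout). Clips that are missing on disk or fall outside
    [min_s, max_s] seconds are skipped.
    """
    pairs = []
    for clip_id, text in sorted(transcripts.items()):
        wav_path = Path(audio_dir) / f"{clip_id}.wav"
        if not wav_path.exists():
            continue  # transcription without a matching audio file
        duration = clip_duration_seconds(wav_path)
        if min_s <= duration <= max_s:
            pairs.append((wav_path, text, duration))
    return pairs
```

The resulting pairs can be fed to a speech-to-text fine-tuning loop (audio as input, text as label) or to a speech synthesis trainer (text as input, audio as target).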

Strengths

Nearly 1,000 hours of total audio data across the M-AILABS collection.
Audio clips standardized to lengths between 1 and 20 seconds.
Includes detailed metadata for each subset within info.txt files.
Text transcriptions provided for every individual audio clip.

Quality Score

D29

Description: 34
Source: 36
Reputation: 9
Access: 22

Community

30 downloads

0 views

Dataset Info

Author: gigant
Created: Mar 2, 2022
Updated: Oct 24, 2022
Last synced: Apr 29, 2026
