Cuge: Common Voice Multilingual Speech Corpus

Name: Cuge: Common Voice Multilingual Speech Corpus
Creator: guoqiang
Published: 2022-03-02T23:29:22
Keywords: Regionus

Description

The corpus contains 9,283 recorded hours of MP3 audio paired with corresponding text files, spanning 60 languages. Of these, 7,335 hours are validated. A subset of the recordings also carries demographic metadata: age, sex, and accent.

Use Cases

Train speech-to-text models using the MP3 audio and corresponding text files
Analyze speech patterns across different demographics using the age, sex, and accent metadata
Develop language-specific acoustic models for any of the 60 supported languages
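
As a rough illustration of the first use case, the sketch below pairs each MP3 clip with its transcript before training. The page does not describe the directory layout, so the same-stem naming (for example, `clip.mp3` next to `clip.txt`) and the function name `pair_audio_with_text` are assumptions made for this example, not the dataset's documented structure.

```python
from pathlib import Path


def pair_audio_with_text(root):
    """Return (mp3_path, transcript) pairs for clips that have a same-stem .txt file.

    Assumption: each transcript sits beside its audio file and shares its stem.
    """
    pairs = []
    for mp3 in sorted(Path(root).rglob("*.mp3")):
        txt = mp3.with_suffix(".txt")  # hypothetical naming convention
        if txt.is_file():
            # Clips without a transcript are skipped rather than guessed at.
            pairs.append((mp3, txt.read_text(encoding="utf-8").strip()))
    return pairs
```

If the real corpus stores transcripts in a single index file instead, only the lookup inside the loop would need to change.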

Strengths

9,283 total recorded hours of audio data
7,335 validated hours across 60 distinct languages
Includes demographic metadata fields for age, sex, and accent
Data format consists of MP3 audio files paired with text transcriptions
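
Because the demographic fields cover only a subset of recordings, it can help to count how many clips actually carry each value before analyzing speech patterns. The following is a minimal sketch that assumes the metadata ships as a tab-separated file with columns named `age`, `sex`, and `accent`; the file format, the column names, and the helper `tally_demographics` are hypothetical and should be adjusted to the actual files.

```python
import csv
from collections import Counter


def tally_demographics(tsv_path, fields=("age", "sex", "accent")):
    """Count non-empty values per demographic field in a tab-separated metadata file.

    Assumption: the column names in `fields` match the metadata file's header.
    """
    counts = {field: Counter() for field in fields}
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for field in fields:
                value = (row.get(field) or "").strip()
                if value:  # blank cells mean the speaker gave no answer
                    counts[field][value] += 1
    return counts
```

Comparing the totals per field against the number of rows gives a quick view of how sparse each demographic field is.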

Quality Score

Overall: D32
Description: 45
Source: 36
Reputation: 8
Access: 22

Community

12 downloads

0 views

Dataset Info

Author: guoqiang
Created: Mar 2, 2022
Updated: Jan 25, 2022
Last synced: Apr 29, 2026
