Multi-Speaker Sinhala Audio with Manual Quality Checks

Name: Multi-Speaker Sinhala Audio with Manual Quality Checks
Creator: keshan
Published: 2022-03-02T23:29:22
Keywords: Regionus

Description

This dataset contains multi-speaker, high-quality transcribed audio for the Sinhala language. It consists of wave files and a TSV file. The data was collected by Google in Sri Lanka, contributed by the Path to Nirvana organization, and manually quality checked.

Use Cases

Train a text-to-speech model using the high-quality audio wave files and their corresponding transcriptions.
Develop a multi-speaker voice synthesis system leveraging the speaker information embedded in the FileID structure.
Build a Sinhala speech recognition model by training on the paired audio and transcription data.
Analyze speaker-specific phonetic characteristics using the UserID and audio file associations.
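
The multi-speaker use cases above depend on recovering the UserID from each FileID. The following is a minimal sketch of that step, not a documented procedure: it assumes a FileID of the form `<prefix>_<UserID>_<utterance>`, which this description does not confirm, and the function names are hypothetical. Check the layout against the actual files before relying on it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def user_id_from_file_id(file_id: str) -> str:
    """Extract the speaker ID from a FileID.

    Assumption (not stated in the dataset description): the FileID has
    the shape "<prefix>_<UserID>_<utterance>", so the UserID is the
    second underscore-separated field.
    """
    parts = file_id.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected FileID layout: {file_id!r}")
    return parts[1]


def group_by_speaker(file_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Map each UserID to the list of FileIDs recorded by that speaker."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for file_id in file_ids:
        groups[user_id_from_file_id(file_id)].append(file_id)
    return dict(groups)
```

Grouping by speaker first makes it straightforward to build speaker-disjoint training and evaluation sets, or to compute per-speaker statistics.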

Strengths

Data has undergone manual quality checking.
Contains multi-speaker audio, providing speaker diversity.
Includes high-quality transcribed audio specifically for the Sinhala language.

Limitations

The dataset description acknowledges there may still be errors despite manual checks.
Key metadata such as row count, column count, file size, and license are unknown.
The last update was in October 2021, which may limit relevance for current applications.

Provenance

Source: Collected by Google in Sri Lanka and contributed by the Path to Nirvana organization.
Collection Method: Collection and contribution as described; specific gathering method unknown.
Freshness: Last updated on 2021-10-04.
Geography: Sri Lanka (Sinhala language).

The dataset consists of wave files and a TSV file. The file 'si_lk.lines.txt' contains, for each recording, a FileID (which includes the UserID) and the Transcription. Exact format details and file sizes are unknown.
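
As an illustration of the structure described above, here is a minimal sketch of reading 'si_lk.lines.txt' into (FileID, Transcription) pairs. It assumes one record per line with the FileID and Transcription separated by a tab, encoded as UTF-8; the description only says the data includes a TSV file, so the delimiter, the encoding, and the names `Utterance` and `read_lines_file` are assumptions made for this example.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Utterance(NamedTuple):
    file_id: str
    transcription: str


def read_lines_file(handle: TextIO, delimiter: str = "\t") -> Iterator[Utterance]:
    """Yield one Utterance per non-empty line.

    Assumption: each line is "<FileID><delimiter><Transcription>".
    Only the first delimiter splits the line, so the transcription
    may itself contain the delimiter character.
    """
    for line_no, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        file_id, sep, text = line.partition(delimiter)
        if not sep:
            raise ValueError(f"line {line_no}: delimiter not found")
        yield Utterance(file_id.strip(), text.strip())


if __name__ == "__main__":
    # In-memory stand-in for the real file; the IDs are made up.
    sample = io.StringIO("sin_0001_0000000001\tආයුබෝවන්\n")
    for utt in read_lines_file(sample):
        print(utt.file_id, utt.transcription)
```

To read the real file, pass a handle opened with `open("si_lk.lines.txt", encoding="utf-8")`. If the file turns out to use a different separator, pass it via the `delimiter` argument.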

Quality Score

Overall: D27
Description: 24
Source: 41
Reputation: 10
Access: 22

Community

17 downloads
2 likes
0 views

Dataset Info

Author: keshan
Created: Mar 2, 2022
Updated: Oct 4, 2021
Last synced: Apr 29, 2026
