DataSalon

Kalenjin Speech Audio and Transcripts from Mozilla Common Voice | DataSalon

Speech & Audio

Name: Kalenjin Speech Audio and Transcripts from Mozilla Common Voice
Creator: kln001
Published: 2025-07-28T11:11:35
Keywords: Kalenjin Language, Audio, African Languages, Speech Recognition

Available on 1 platform

Description

Audio clips and transcriptions of Kalenjin speech sourced from the Mozilla Common Voice project. The dataset was created by author kln001 and last updated on July 28, 2025. It is intended for training and evaluating Automatic Speech Recognition models.

Use Cases

Train Automatic Speech Recognition (ASR) models on Kalenjin audio clips and their transcriptions.
Evaluate ASR model performance on the Kalenjin language using validated test sets.
Benchmark speech recognition accuracy for underrepresented languages using the provided transcriptions.
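The evaluation and benchmarking use cases above usually come down to word error rate (WER): the word-level edit distance between a reference transcription and a model's output, divided by the number of reference words. The following is a minimal sketch, not something shipped with this dataset. The function name `word_error_rate` is hypothetical, it assumes whitespace tokenization, and the sample strings are English placeholders rather than real Kalenjin data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref_words)


# Placeholder strings: one substitution plus one deletion over four words.
print(word_error_rate("a b c d", "a x c"))
```

For a whole test split, summing the edit counts and reference lengths across all clips before dividing generally gives a more stable figure than averaging per-clip rates. Text normalization (casing, punctuation) should be applied identically to both sides before scoring.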

Strengths

Data is sourced from the Mozilla Common Voice project, a well-known open, crowdsourced speech data initiative.
The dataset is structured with splits for training, testing, and validation, which suggests a standard machine-learning workflow.

Limitations

Total audio hours, row count, and file formats are not documented, which makes it hard to assess suitability before download.
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
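Because row counts and column semantics are undocumented, a quick first pass after download is to summarize whatever metadata table accompanies the audio. The sketch below assumes a delimited text file with a header row. Common Voice releases are normally distributed with tab-separated files, but this dataset's actual layout is unknown. The helper name `summarize_metadata` and the sample columns `path` and `sentence` are illustrative assumptions only.

```python
import csv
import io
from collections import Counter


def summarize_metadata(handle, delimiter="\t"):
    """Report column names, row count, and empty cells per column."""
    # QUOTE_NONE keeps quotation marks inside transcripts as literal text.
    reader = csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    columns = list(reader.fieldnames or [])
    row_count = 0
    empty_cells = Counter()
    for row in reader:
        row_count += 1
        for column in columns:
            if not (row.get(column) or "").strip():
                empty_cells[column] += 1
    return {"columns": columns, "rows": row_count, "empty_cells": dict(empty_cells)}


# In-memory stand-in for a downloaded file; the column names are hypothetical.
sample = io.StringIO("path\tsentence\nclip1.mp3\thello\nclip2.mp3\t\n")
summary = summarize_metadata(sample)
print(summary)
```

On a real download, pass an open file handle instead, for example `open(some_path, encoding="utf-8", newline="")`. Total audio hours cannot be derived from such a table unless it includes a duration column, so that figure would still require reading the audio files themselves.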

Provenance

Source: Mozilla Common Voice project
Collection Method: Likely volunteer-contributed. In Common Voice, contributors typically record themselves reading prompted sentences, and other volunteers validate the clips.
Freshness: Last updated 2025-07-28 11:26:44; verify against the source platform before use.

Quality Score

D37

Description

Source

Reputation

Access

Community

9 downloads

2 likes

0 views

Dataset Info

Author: kln001
Created: Jul 28, 2025
Updated: Jul 28, 2025
Last synced: May 9, 2026
