Name: Khmer Speech Dataset: 134.6 Hours of Culturally Thematic Audio
Creator: DDD-Cambodia
Published: 2026-04-20T09:45:10
Keywords: Audio Data, Khmer Language, Cultural Speech, Audio, Speech Recognition, Cambodia, Synthetic

Description

This dataset contains 134.6 hours of manually curated speech-text pairs in the Khmer language, covering Cambodian cultural topics. DDD-Cambodia created it with eight native speakers and last updated it in May 2026. Recordings average 8.54 seconds in length and include speaker metadata such as gender, age group, and origin city.

Use Cases

Train automatic speech recognition (ASR) models on Khmer audio recordings.
Fine-tune speech-to-text systems for the cultural topics the dataset covers.
Analyze speech patterns and acoustic features by speaker metadata such as gender and age group.
Develop Khmer language models from the transcribed cultural text.
Benchmark ASR model performance on a manually curated, culturally specific dataset.

Strengths

134.6 hours of manually curated speech-text pairs, providing a substantial audio corpus.
Speaker metadata includes gender, age group, and origin city for eight distinct native speakers.
Average recording length of 8.54 seconds with a standard deviation of 3.37 seconds (about 39% of the mean), indicating short utterances of broadly similar length.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Data may reflect geographic and demographic bias inherent to the eight speakers from Cambodia.
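
Although the row count is not published, a rough estimate follows from the figures stated above: total duration divided by average recording length. The sketch below is only a back-of-envelope calculation. It assumes the 134.6 hours and the 8.54-second average describe the same set of recordings, and the function and variable names are illustrative, not part of the dataset.

```python
# Figures taken from the dataset description (both are rounded values).
TOTAL_HOURS = 134.6
MEAN_CLIP_SECONDS = 8.54


def estimate_utterance_count(total_hours: float, mean_clip_seconds: float) -> int:
    """Estimate the number of recordings as total duration / mean duration."""
    total_seconds = total_hours * 3600
    return round(total_seconds / mean_clip_seconds)


estimated_rows = estimate_utterance_count(TOTAL_HOURS, MEAN_CLIP_SECONDS)
print(f"Estimated utterances: about {estimated_rows:,}")
```

This works out to roughly 56,700 speech-text pairs. Because both inputs are rounded, the result is an order-of-magnitude guide and should be checked against the actual files after download.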

Provenance

Source: DDD-Cambodia
Collection Method: Utterances were manually generated by eight native speakers based on predefined cultural topics and subtopics.
Freshness: Last updated 2026-05-15 10:22:01; freshness should be verified.
Geography: Cambodia
