Description
An audio dataset featuring general utterances spoken by Malay speakers from Malaysia. The dataset is hosted on Kaggle, but specific details on size, collection method, and licensing are not provided. The original author and organization are unknown.
Use Cases
Train automatic speech recognition (ASR) models on Malay audio utterances.
Develop text-to-speech (TTS) systems from recorded Malay speech samples.
Conduct linguistic analysis of Malay phonetics and prosody using the audio data.
Benchmark speech processing models on Malay-language utterances.
Strengths
Focuses on Malay, a comparatively low-resource language for speech data.
Specifies the geographic origin of speakers as Malaysia, providing a regional context.
Limitations
Row count and total dataset size are unknown, which makes suitability hard to assess before download.
Column-level documentation is absent; field semantics must be inferred after download.
The last update date is unknown, so freshness cannot be verified.
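Because size and field semantics are undocumented, a first pass over a downloaded copy can recover some basics. The following is a minimal sketch, not part of the dataset itself: it assumes the files have already been extracted to a local folder and that at least some audio is stored as PCM WAV, neither of which the listing confirms. The function name `summarize_audio_dir` is hypothetical.

```python
import wave
from collections import Counter
from pathlib import Path


def summarize_audio_dir(root):
    """Summarize a local folder: file counts by extension, bytes, WAV duration.

    Assumption: `root` is wherever the dataset was extracted; the actual
    layout and audio format of this dataset are not documented.
    """
    root = Path(root)
    ext_counts = Counter()
    total_bytes = 0
    wav_seconds = 0.0
    unreadable_wavs = 0

    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        ext = path.suffix.lower() or "(none)"
        ext_counts[ext] += 1
        total_bytes += path.stat().st_size

        # The stdlib wave module only reads uncompressed PCM WAV files;
        # anything else is counted as unreadable rather than guessed at.
        if ext == ".wav":
            try:
                with wave.open(str(path), "rb") as wf:
                    wav_seconds += wf.getnframes() / wf.getframerate()
            except (wave.Error, EOFError, ZeroDivisionError):
                unreadable_wavs += 1

    return {
        "files_by_extension": dict(ext_counts),
        "total_bytes": total_bytes,
        "wav_seconds": wav_seconds,
        "unreadable_wavs": unreadable_wavs,
    }
```

Calling `summarize_audio_dir("path/to/extracted/dataset")` with a placeholder path returns a dictionary that can stand in for the missing size figures. Any metadata files that show up in the extension counts (for example CSV or JSON, if present) still need to be opened by hand to infer what their fields mean.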
Provenance
Source
Kaggle
Collection Method
Not specified
Time Range
Not specified
Freshness
Not specified
Geography
Malaysia
License is unknown; users must verify permissions before use.