A collection of approximately 241 hours of high-quality Malay speech audio synthesized with the ms-MY-YasminNeural voice. The audio is split into two subsets: 99.4 hours generated from Malay Wikipedia and news texts, and 142 hours generated from Malaysian Parliament transcripts. All audio is sampled at 24,000 Hz, and every source sentence is between 2 and 20 words long.
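The stated properties (24,000 Hz sample rate, sentences of 2 to 20 words) can be spot-checked on individual samples. The following is a minimal sketch, not the dataset authors' tooling: it assumes clips are available as WAV bytes and that words are counted by whitespace splitting, neither of which is stated in the description. All function names are hypothetical, and the silent clip is a stand-in for a real sample.

```python
import io
import wave

EXPECTED_SAMPLE_RATE = 24_000  # Hz, as stated in the description
MIN_WORDS, MAX_WORDS = 2, 20   # sentence length bounds, as stated


def word_count_ok(sentence: str) -> bool:
    """True if the sentence has 2 to 20 whitespace-separated words.

    Whitespace tokenization is an assumption; the description does not
    say how words were counted.
    """
    return MIN_WORDS <= len(sentence.split()) <= MAX_WORDS


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Read the sample rate from the header of an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate()


def make_silent_wav(seconds: float, rate: int = EXPECTED_SAMPLE_RATE) -> bytes:
    """Build a mono 16-bit silent WAV clip as a stand-in for a real sample."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


if __name__ == "__main__":
    clip = make_silent_wav(0.1)
    print(wav_sample_rate(clip) == EXPECTED_SAMPLE_RATE)
    print(word_count_ok("saya suka makan nasi"))
```

If the clips are distributed in another container (for example MP3 or FLAC), the header check would need a different reader; the word-count check is independent of the audio format.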
Associated code and notebooks are hosted in a separate GitHub repository (malaya-speech). Consult the dataset page on Hugging Face for the full description and any license information.