This dataset contains approximately 227.7 hours of high-quality Malay speech audio synthesized with the ms-MY-OsmanNeural voice. The text is drawn from two corpora: Malay Wikipedia and news articles (94.5 hours) and Malaysian Parliament transcripts (133.2 hours). All audio is sampled at 24,000 Hz, and each utterance is a sentence of 2 to 20 words.
The dataset is hosted in WebDataset format; users should be familiar with this format, or with the associated malaya-speech GitHub repository, for loading and processing. No license is specified.
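For readers new to WebDataset, the following is a minimal sketch of how such a shard can be read using only the Python standard library. It relies on the general WebDataset convention that a shard is a tar archive in which consecutive files sharing a name up to the first dot form one sample. The function names `split_key` and `iter_samples` are illustrative, and the extensions actually present in this dataset (for example an audio file paired with a transcript) are an assumption that should be checked against a real shard.

```python
import io
import tarfile
from typing import Dict, Iterator, Tuple


def split_key(name: str) -> Tuple[str, str]:
    """Split a member name at the first dot of its final path component.

    'dir/0001.wav' becomes ('dir/0001', 'wav').
    """
    head, _, base = name.rpartition("/")
    stem, _, ext = base.partition(".")
    key = f"{head}/{stem}" if head else stem
    return key, ext


def iter_samples(fileobj) -> Iterator[Dict[str, object]]:
    """Yield one dict per sample from a WebDataset-style tar stream.

    Each dict maps a file extension to the raw bytes of that file, plus a
    '__key__' entry holding the shared sample key. Members are assumed to
    be grouped by key, which is how WebDataset shards are normally written.
    """
    current_key = None
    sample: Dict[str, object] = {}
    # "r|*" reads the archive sequentially, so it also works on streams.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, ext = split_key(member.name)
            if key != current_key:
                if sample:
                    yield sample
                current_key, sample = key, {"__key__": key}
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield sample
```

To use it, open a downloaded shard in binary mode and pass the file object to `iter_samples`; each yielded dict can then be inspected to see which extensions the dataset provides. Decoding the audio bytes is left to an audio library of the reader's choice, and the `webdataset` package or the malaya-speech repository may offer more convenient loaders.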