Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
2,018 datasets
This dataset features approximately 227.7 hours of high-quality Malay speech synthesized with the ms-MY-OsmanNeural voice. The text comes from two corpora: Malay Wikipedia and news articles (94.5 hours) and Malaysian Parliament transcripts (133.2 hours). All audio is sampled at 24,000 Hz, and each sentence contains 2 to 20 words.
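The entry gives a 2-to-20-word range per sentence but does not say how sentences were segmented or how words were counted. As a hypothetical sketch, assuming plain whitespace tokenization, a length filter of that kind could look like this (the function names are illustrative, not part of the dataset's tooling):

```python
MIN_WORDS = 2   # lower bound stated in the dataset description
MAX_WORDS = 20  # upper bound stated in the dataset description


def keep_sentence(sentence: str, min_words: int = MIN_WORDS, max_words: int = MAX_WORDS) -> bool:
    """Return True if the whitespace-delimited word count lies within [min_words, max_words].

    Assumption: words are counted by splitting on whitespace; the actual
    pipeline used to build the dataset is not documented.
    """
    n_words = len(sentence.split())
    return min_words <= n_words <= max_words


def filter_sentences(sentences: list[str]) -> list[str]:
    """Keep only sentences whose length is inside the configured bounds."""
    return [s for s in sentences if keep_sentence(s)]


if __name__ == "__main__":
    corpus = ["Selamat pagi.", "Hai", "Dewan Rakyat bersidang hari ini."]
    print(filter_sentences(corpus))  # ['Selamat pagi.', 'Dewan Rakyat bersidang hari ini.']
```

Note that the two source subsets sum to the stated total: 94.5 + 133.2 = 227.7 hours.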
This Amharic speech dataset for automatic speech recognition (ASR) was released in June 2022. Created by Ephrem and hosted on Hugging Face, it is categorized as containing approximately 1,000 samples and is intended for training and evaluating Amharic ASR models.
This dataset contains text data for training text-to-speech models in Kinyarwanda. It was created by DigitalUmuganda, last updated in May 2022, and is tagged as containing between 1K and 10K entries.
A speech recognition dataset developed by duclee9x, likely associated with Ton Duc Thang University (TDTU), uploaded to Hugging Face in May 2022. It contains audio recordings intended for training and evaluating automatic speech recognition systems. The specific volume, duration, and recording details are not provided.
Librispeech40 is a speech audio dataset derived from the LibriSpeech corpus, containing English audiobook recordings. The dataset was created by Voicemod and uploaded to Hugging Face in May 2022. Specific volume and row counts are not detailed in the provided metadata.
This dataset contains music sources, background noise, and impulse-response samples used to train the neural audio fingerprinting model described in a 2021 arXiv paper. The audio files are 16-bit PCM mono WAV files sampled at 8,000 Hz. The dataset is hosted on Hugging Face and was last updated in April 2022.
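The stated audio format can be checked with Python's standard-library `wave` module. The following is a minimal sketch, assuming the files are plain uncompressed PCM WAV; the helper names are hypothetical and are not part of the dataset or its training code.

```python
import os
import tempfile
import wave

EXPECTED_RATE = 8000    # Hz, per the dataset description
EXPECTED_CHANNELS = 1   # mono
EXPECTED_SAMPWIDTH = 2  # bytes per sample, i.e. 16-bit PCM


def matches_expected_format(path: str) -> bool:
    """Return True if the WAV file at `path` is 16-bit PCM mono at 8,000 Hz."""
    # The stdlib `wave` module only reads uncompressed PCM; other encodings raise wave.Error.
    with wave.open(path, "rb") as wf:
        return (
            wf.getframerate() == EXPECTED_RATE
            and wf.getnchannels() == EXPECTED_CHANNELS
            and wf.getsampwidth() == EXPECTED_SAMPWIDTH
        )


def write_silence(path: str, rate: int, channels: int = 1, sampwidth: int = 2, seconds: float = 1.0) -> None:
    """Write a silent PCM WAV file; used here only to exercise the checker."""
    n_frames = int(rate * seconds)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(rate)
        wf.writeframes(b"\x00" * sampwidth * channels * n_frames)


if __name__ == "__main__":
    tmp_dir = tempfile.mkdtemp()
    ok_path = os.path.join(tmp_dir, "ok.wav")
    bad_path = os.path.join(tmp_dir, "bad.wav")
    write_silence(ok_path, rate=8000)
    write_silence(bad_path, rate=16000)
    print(matches_expected_format(ok_path))   # True
    print(matches_expected_format(bad_path))  # False
```

Checking rate, channel count, and sample width separately makes it easy to report which property a mismatched file violates.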
Librispeech39 is a subset of the LibriSpeech corpus, a widely used collection of read English speech derived from audiobooks. The dataset was uploaded to the Hugging Face platform by user 'arampacha' in May 2022. It is designed for training and evaluating automatic speech recognition systems.
Librispeech 100H is a subset of the LibriSpeech corpus containing 100 hours of English speech audio. The dataset was created by namnv1906 and uploaded to Hugging Face in May 2022. It is derived from public domain audiobooks from the LibriVox project.
Librispeech10H is a 10-hour subset of the LibriSpeech corpus, containing English speech audio and corresponding transcriptions. It was created by user ahazeemi and published on Hugging Face in April 2022. The dataset is structured for machine learning tasks, as indicated by its platform tags.
A 2004 prefectural decree defines acoustic zones around land transport infrastructure in Finistère, France, including one section that straddles the border with Morbihan. The Bureau de Recherches Géologiques et Minières (BRGM) distributes these data, the most recent available, segmented by municipality. The zones do not prohibit construction, but new buildings within them require a calculation of facade sound insulation.
Asr Glue Train is a benchmark dataset for training automatic speech recognition models, created by user 'voidful' and hosted on Hugging Face. The dataset was last updated in April 2022 and is categorized as a text modality resource with a size between 1 and 10 million entries. Its specific structure and content are derived from the GLUE-style framework for evaluating model performance.
This dataset contains raw audio waveforms of solo piano music, specifically Beethoven sonatas. It was introduced in the SampleRNN paper by Mehri et al. (2017) and later used to train music generation models in the paper 'It's Raw! Audio Generation with State-Space Models'.
YouTubeMix is a raw audio waveform dataset for training music generation models. It contains solo piano music taken from the audio track of a single YouTube video. The dataset was created by krandiash and last updated in February 2022.
This dataset was created by Nordic Language Technology for developing automatic speech recognition and dictation systems for Norwegian. The files have been renamed to be unique and meaningful, and the metadata has been converted from SPL to anonymized JSON with UTF-8 encoding. The number of audio files, rows, and columns is not specified.
Urdu ASR Flags2 is an audio dataset for automatic speech recognition tasks. It was created by abidlabs and last updated on March 20, 2022. The dataset is categorized as having approximately 1,000 samples.
A collection of audio files for Urdu speech recognition, hosted on Hugging Face by user kingabzpro. The dataset is categorized as containing approximately 1,000 samples and was last updated in March 2022.
Urdu Asr Flags is an audio dataset for Urdu automatic speech recognition, hosted by abidlabs on Hugging Face and last updated on March 20, 2022. It is tagged with the Hugging Face size category n<1K (fewer than 1,000 samples).
Asr Book Lm V2.3 is a text corpus for language modeling in automatic speech recognition systems, created by Jiejie. The dataset was last updated in March 2022.
This collection contains 9,283 recorded hours of MP3 audio, paired with corresponding text files, across 60 languages. Of these, 7,335 hours are validated, and a subset of the recordings includes demographic metadata such as age, sex, and accent.
Asr Book Lm V2.1 is a dataset for training automatic speech recognition language models, created by Jiejie. It was last updated on March 14, 2022.