Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,925 datasets
Kaggle hosts a dataset titled 'Music Chorus Detection dataset v1.0'. It likely contains audio files or extracted features for identifying chorus sections within songs. Its columns and specific contents are not documented.
PhoAudiobook is a high-quality and large-scale Vietnamese speech dataset curated for zero-shot text-to-speech. The dataset construction and experimental results are detailed in the ACL 2025 paper 'Zero-Shot Text-to-Speech for Vietnamese' by Thi Vu, Linh The Nguyen, and Dat Quoc Nguyen. The dataset page was last updated on Hugging Face in January 2026.
Kaggle hosts a dataset titled '30 musical instruments'. Its content, size, author, origin, and temporal coverage are not described in the available metadata.
Musical Instrument is a dataset hosted on Kaggle. The dataset likely contains audio samples or metadata related to musical instruments. Specific details regarding its size, creator, and collection method are not provided in the available metadata.
A Sinhala speech dataset for automatic speech recognition (ASR) tasks, published on Hugging Face by SPEAK-ASR. The dataset was last updated on 2026-02-18. Its specific size, format, and content details are not provided in the metadata.
A Turkish text-to-speech (TTS) dataset created by omersaidd to improve the performance of open-source TTS models for the Turkish language. The dataset was last updated on Hugging Face on January 11, 2026. It was constructed by processing videos from various sources into a training-ready format.
An audio dataset for sound classification tasks, published on Kaggle. Its title suggests it contains recordings of boiling sounds. Specific details on size, collection method, and creator are not provided in the available metadata.
VoxCeleb1-benchmark is a dataset likely used for benchmarking speaker recognition and verification systems. It is hosted on Kaggle and likely contains audio samples. The dataset's specific size, source, and temporal coverage are unknown.
An audio dataset for Automatic Speech Recognition (ASR) in the Assamese language. It was published on Kaggle, but the specific collection method, size, and creator are not detailed in the provided metadata. The dataset's content and structure require verification after download.
An audio dataset focused on sounds related to cooking activities. The dataset is hosted on Kaggle and is tagged for audio-related tasks. Specific details on the number of recordings, file formats, and collection methodology are not provided in the available metadata.
Western Massachusetts freeway and arterial data from summer 2016, describing car-following behavior for work zone planning. The dataset contains metadata for 6 data collection runs, processed by the USDOT Volpe National Transportation Systems Center. It includes variables like run conditions, date, and traffic direction.
EmalonSpeech V0.1 is a high-fidelity, single-speaker speech dataset designed for low-resource languages. It was created by Dayananda Thokchom of YAAI DYNAMICS, features recordings by speaker Helly Maisnam, and was released on Hugging Face in January 2026. The dataset aims to address the gap in TTS resources for languages underrepresented in current research.
Survey data from the 2001-2006 Behavioral Risk Factor Surveillance System (BRFSS) provides a health profile of Massachusetts adults categorized by sexual orientation identity. The dataset was published on Papers with Code by Kerith J. Conron. It likely contains tabular data comparing health outcomes and risk factors across different demographic groups.
A dataset titled HASRSH, published on Kaggle. The dataset's specific content, size, and origin are not detailed in the provided metadata. Further verification is required to confirm its exact nature and scope.
Test_Music is a dataset hosted on Kaggle. Its specific content, size, and origin are not detailed in the provided metadata. The title suggests it likely contains audio files or related features for testing purposes in music-related domains.
A speech audio dataset derived from the LibriSpeech corpus, likely containing processed or synthesized samples to model children's speech characteristics. The dataset title suggests a scale of 10,000 to 15,000 audio samples. It is hosted on Kaggle, but the original author, collection method, and specific time range are unknown.
A dataset titled 'Musica' hosted on Kaggle. The dataset's specific content, size, and origin are not detailed in the available metadata. Further inspection after download is required to confirm its scope and structure.
Librispeech-Childrenized-5000 is a speech dataset derived from the LibriSpeech corpus, likely containing 5,000 audio samples. It appears to be a version modified to reflect children's speech characteristics and is published on Kaggle. The creator, modification method, and temporal details are not provided in the available metadata.
A Kaggle-hosted dataset titled 'Librispeech_train-clean-100', likely containing audio files for automatic speech recognition (ASR) model training. The title suggests it is a subset of the LibriSpeech corpus, comprising 100 hours of 'clean' speech. Specific details on size, format, and provenance require verification after download.
A 15 GB variant of the LibriVAD dataset, which is built on the LibriSpeech corpus. The dataset is noise-augmented, suggesting it is designed for training models in noisy acoustic environments. Its author, organization, and specific creation date are unknown.