Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,912 datasets
Cleaned Asr Transcripts is a text dataset published on Hugging Face by author bingbangboom. The dataset likely contains processed transcripts generated by an Automatic Speech Recognition (ASR) system. It was last updated on March 24, 2026.
VoxCeleb2 is an audio dataset published on Hugging Face by author humanify. The dataset was last updated on March 25, 2026. Its specific content, size, and license details are not provided in the available metadata.
Audio data related to music, likely intended for training machine learning models. The dataset is hosted on Kaggle, but its specific contents, size, and creation details are not provided in the metadata. Users must download the dataset to verify its exact composition and quality.
Large-scale CC0 Pashto speech dataset for Automatic Speech Recognition (ASR). The dataset is part of the Common Voice project, version 25.0, and is hosted on Kaggle. Its specific collection method, size, and contributor details are not provided in the available metadata.
An AI-generated voice dataset for the Nepali language, published on Kaggle. The dataset is likely designed for text-to-speech (TTS) synthesis, modeled after the LJ Speech dataset structure. Its specific size, creation date, and author details are not provided in the available metadata.
Synthetic audio data generated based on the VS13 framework, likely containing simulated vehicle sounds. The dataset is hosted on Kaggle, but details on its size, creation method, and specific contents are not provided. Metadata is minimal; actual content requires verification after download.
A multilingual automatic speech recognition dataset covering 30 Indic dialects and languages. It contains over 2.8 million audio samples with corresponding transcriptions. The dataset was created by author grushaaaaa and last updated on Hugging Face in February 2026.
UniDataPro's collection features 338 hours of Russian telephone dialogues recorded from 460 native speakers across diverse topics. Updated in January 2026, the data is specifically formatted for automatic speech recognition (ASR) research and model training. It maintains a verified 98% Word Accuracy Rate for its transcriptions.
Pulse 2026 is a high-fidelity synthetic music dataset with engineered streaming metrics. The dataset appears to focus on music evolution and viral analytics. Its specific source, size, and creation date are unknown.
Music-Gen-Task3-Split is a dataset hosted on Kaggle, likely related to a music generation challenge. The title suggests it contains audio data split for a specific machine learning task, though the exact content and structure are unspecified. No information is available regarding its author, size, or creation date.
Hate speech detection data spanning two major languages, English and Spanish. The dataset is hosted on Kaggle, but its specific collection method, size, and annotation details are not provided in the available metadata. Researchers must download the dataset to inspect its volume, annotation schema, and source characteristics.
This Russian speech corpus contains audio recordings across diverse genres including podcasts, public speeches, YouTube content, audiobooks, and phone calls. The dataset was processed using the BALALAIKA pipeline by the MTUCI lab260 team to provide high-quality annotations for generative speech tasks.
ASR new is a dataset published on Kaggle. The title suggests it contains audio data for training or evaluating automatic speech recognition systems. The dataset's specific content, size, and origin require verification after download.
ASR Arabic checkpoints likely contain pre-trained model weights for Arabic automatic speech recognition. The dataset is published on Kaggle, but its specific size, creation date, and author are unknown. Its content suggests it is intended for developers working on Arabic speech technology.
Metacreation released this collection of 2.1 million unique MIDI files in 2026 for symbolic music research. The dataset features detailed annotations for expressive loop detection, incorporating performance nuances such as microtiming and dynamics.
Real-world Philippine English pharmacy calls for medical speech AI training. The dataset appears to consist of audio recordings from pharmacy interactions. Specific details on size, collection date, and creator are not provided in the available metadata.
A dataset containing audio and text data, hosted on Hugging Face by author NjNBrl. It was last updated in March 2026. Specific content details such as genre, instruments, or recording sources are not provided.
High-quality Spanish speech data is available for training AI models in medical telemarketing contexts. The dataset is hosted on Kaggle, but its creator, size, and specific recording details are not provided. Its primary purpose is to support the development of speech recognition and synthesis systems for a specific commercial domain.
A dataset titled 'Musicsoundai' is hosted on Kaggle. Its content likely pertains to music or audio signals for artificial intelligence tasks. The dataset's specific contents, scale, and authorship are unknown due to minimal metadata.
An audio dataset of general utterances spoken by Italian speakers from Italy. The available metadata does not identify the author or organization, and gives no size, speaker count, audio length, recording details, or collection methodology.