Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,922 datasets
106 speakers (45 male, 61 female) contributed genuine speech recordings with minimal channel or background noise. The database includes spoofed speech generated from the genuine data using several different spoofing algorithms. It is partitioned into training, development, and evaluation subsets for use in the ASVspoof 2015 challenge.
A Bengali language dataset containing paired noisy and clean audio files. The data is described as ready for machine learning tasks. The dataset was sourced from Kaggle, but details on its creator, size, and update date are unavailable.
German-language audio recordings of patient-doctor interactions. The dataset is described as being for generating positivity tips. It was sourced from Kaggle, but details on its size, creation date, and authorship are unknown.
YodaLingua-Croatian is a speech dataset by Thomcles containing 5,655 audio-transcription pairs. The collection totals 11 hours of Croatian speech from 230 distinct speakers and was last updated in January 2026. It is designed for training text-to-speech and automatic speech recognition systems.
An audio dataset featuring general utterances spoken by Malay speakers from Malaysia. The dataset is hosted on Kaggle, but specific details on size, collection method, and licensing are not provided. The original author and organization are unknown.
MusicalInstruments-dataset is a collection of audio data related to musical instruments, published on Kaggle. The dataset's specific contents, such as the number of samples, recording conditions, and instrument types, are not detailed in the available metadata. Users must download the dataset to verify its scale, format, and suitability for their projects.
Algerian Arabic speech recordings, likely covering general conversation and customer service interactions. The dataset appears to be sourced from Kaggle, but its size, author, and update date are unknown. Its collection method and time range are not provided.
A subset of the LibriSpeech corpus, published on Hugging Face by sahara22 and last updated on 2026-03-22. The dataset likely contains audio files and corresponding transcriptions for training and evaluating automatic speech recognition models. Its specific size, format, and licensing details are not provided in the available metadata.
Fusha Arabic speech data for general conversation scenarios, likely related to customer service interactions. The dataset is hosted on Kaggle and includes platform tags indicating its use for speech recognition. Specific details on volume, collection method, and recency are not provided in the available metadata.
Belgian Dutch speakers contributed to this audio dataset of general utterances. The dataset is hosted on Kaggle, but details on the number of speakers, recording length, and collection methodology are not provided. The author, organization, and license information are also unknown.
Kaggle hosts an audio dataset focused on water-related sounds. The dataset likely contains recordings of water in various contexts, such as flowing, dripping, or splashing. Metadata is minimal; the exact content, scale, and collection details require verification after download.
700 hours of processed speech data for Hindi, English, and Hinglish (code-mixed) text-to-speech applications. The dataset, created by adjaysagar, includes train and validation manifests and a preprocessing script. It was last updated in February 2026.
Tts Dutch is a dataset hosted on Hugging Face by datadriven-company. The dataset was last updated on March 11, 2026. Its specific content and scale are not described in the available metadata.
Crowdsourced audio recordings of respiratory sounds filtered to include only Asthma and Healthy subjects. The dataset is hosted on Kaggle and is intended for binary classification tasks. Details on the number of samples, recording specifics, and collection methodology are not provided in the available metadata.
J-HARD-TTS-Eval is a benchmark dataset for evaluating autoregressive Japanese Text-To-Speech models. It focuses on specific failure modes including stability in short sequences, repetition handling, and context completion. The dataset was created by Parakeet-Inc and last updated in January 2026.
StrikerData is an audio dataset containing human speech, environmental noise, and other sound types. It was developed by Strikersoft for research and development in audio and speech technologies. The dataset was last updated on January 22; the year is not given in the available metadata.
7 stress-test categories of evaluation samples designed for calculating domain-wise Character Error Rate (CER) scores. The dataset contains unique sentence-language pairs to ensure clean metrics for Text-to-Speech (TTS) robustness testing.
Test_Music is a dataset hosted on Kaggle. The dataset's specific content, size, and origin are not detailed in the available metadata. Further details about the data's creation, scope, and structure require verification after download.
An audio dataset titled 'music-model-h5' is hosted on Kaggle. The dataset's specific content, size, and structure are not detailed in the provided metadata. Its platform tags suggest it is related to machine learning and audio processing.
DailyTalkEdit provides paired original and modified audio files from dialogues, with annotations for modified time ranges and semantic influence. The dataset, created by wsntxxn, was last updated on Hugging Face in February 2026. It includes separate audio segments for modified utterances and structured metadata files for training, validation, and testing splits.