Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,924 datasets
J-HARD-TTS-Eval is a benchmark dataset for evaluating autoregressive Japanese Text-To-Speech models. It focuses on specific failure modes including stability in short sequences, repetition handling, and context completion. The dataset was created by Parakeet-Inc and last updated in January 2026.
StrikerData is an audio dataset containing human speech, environmental noise, and other sound types. It was developed by Strikersoft for research and development in audio and speech technologies. The dataset was last updated on January 22; the year is missing from the available metadata.
Evaluation samples in seven stress-test categories, designed for calculating domain-wise Character Error Rate (CER) scores. The dataset contains unique sentence-language pairs to ensure clean metrics for Text-to-Speech (TTS) robustness testing.
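A domain-wise CER of the kind this entry mentions could be computed roughly as follows. This is a minimal sketch, not the dataset's own scoring script: it assumes each sample is a (domain, reference text, transcribed TTS output) triple, that errors are pooled per domain before dividing by the reference length, and that no text normalization is applied. The function names and domain labels are hypothetical.

```python
from collections import defaultdict


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def domain_cer(samples):
    """Return {domain: CER} for (domain, reference, hypothesis) triples.

    Assumption: errors and reference lengths are summed per domain
    (a pooled average), so long sentences weigh more than short ones.
    """
    errors = defaultdict(int)
    chars = defaultdict(int)
    for domain, ref, hyp in samples:
        errors[domain] += edit_distance(ref, hyp)
        chars[domain] += len(ref)
    # Skip domains whose references are all empty to avoid dividing by zero.
    return {d: errors[d] / chars[d] for d in errors if chars[d] > 0}


if __name__ == "__main__":
    # Hypothetical samples; the domain labels are illustrative only.
    demo = [
        ("numbers", "abc", "abd"),
        ("numbers", "abcd", "abcd"),
        ("names", "hello", "hallo!"),
    ]
    for domain, score in domain_cer(demo).items():
        print(f"{domain}: {score:.3f}")
```

Pooling errors per domain is one design choice among several; averaging per-sentence CER instead would give short sentences the same weight as long ones.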
Test_Music is a dataset hosted on Kaggle. The dataset's specific content, size, and origin are not detailed in the available metadata. Further details about the data's creation, scope, and structure require verification after download.
An audio dataset titled 'music-model-h5' is hosted on Kaggle. The dataset's specific content, size, and structure are not detailed in the provided metadata. Its platform tags suggest it is related to machine learning and audio processing.
DailyTalkEdit provides paired original and modified audio files from dialogues, with annotations for modified time ranges and semantic influence. The dataset, created by wsntxxn, was last updated on Hugging Face in February 2026. It includes separate audio segments for modified utterances and structured metadata files for training, validation, and testing splits.
A text dataset containing Bengali language content, likely annotated for hate speech detection. It is hosted on the Kaggle platform. The dataset's author, size, and specific annotation schema are not provided in the available metadata.
A GAMETES dataset for evaluating genetic association methods, focusing on heterogeneity. The dataset name indicates it contains 20 attributes and has a heritability parameter of 0.4.
RTTS_COCO is a dataset hosted on Kaggle. The title suggests it contains images related to road traffic and transportation scenes, likely formatted in the COCO annotation style. Its specific contents, scale, and origin require verification after download.
NASA ASRS raw batch export from the DFOnline system, intended as input for an ETL pipeline. The dataset covers aviation safety reports submitted over a 20-year period from 2005 to 2025. Its specific contents, such as report narratives or coded fields, must be inferred from the source system.
Indic TTS Checkpoint Session3 is a dataset published on Kaggle. The title suggests it contains model checkpoint files for a text-to-speech system focused on Indic languages. The dataset's specific content, size, and structure require verification after download due to minimal provided metadata.
Indic TTS Merged Arrow is a dataset published on Kaggle. The title suggests it contains data for text-to-speech synthesis, likely for languages from the Indian subcontinent. Metadata is minimal; the actual content, scale, and structure require verification after download.
100 female voice audio clips generated using Qwen3-TTS, based on the public-domain ITA-Corpus Emotion text dataset. The audio is provided in 24kHz mono WAV format, with each voice having a descriptive label.
Speech recognition data published on Kaggle. The dataset's specific content, scale, and origin are not detailed in the available metadata. Further inspection after download is required to confirm the actual audio files, transcripts, and recording conditions.
An audio dataset titled 'Real World Noise/Music' is hosted on Kaggle. The dataset likely contains recordings of environmental noise and music for analysis. Metadata such as column details, size, and license are currently unknown.
Sam Wake Word is a dataset uploaded to Hugging Face by author sh1vam10. The dataset's platform tags indicate it contains audio and text modalities, likely for wake word or keyword spotting tasks. It was last updated on March 20, 2026.
TAPS: Throat and Acoustic Paired Speech Dataset is a standardized corpus for deep learning-based speech enhancement, specifically targeting throat microphone recordings. The dataset provides paired recordings from 60 native Korean speakers, designed to address the high-frequency attenuation in throat mics caused by the low-pass filtering effect of skin and tissue.
Kaggle hosts a dataset titled 'music_file'. The dataset likely contains audio files related to music. Metadata is minimal; the specific content, scale, and origin require verification after download.
Audio recordings of general utterances by Hindi speakers from India. The dataset's size, collection date, and creator are not specified in the provided metadata. It is hosted on the Kaggle platform.
A collection of French-language audio recordings from a medical call center. The dataset is hosted on Kaggle and is intended for speech processing tasks in the healthcare sector. Specific details on size, creation date, and authorship are not provided.