Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,925 datasets
Azerbaijani Asr Zenfira is a speech dataset hosted on HuggingFace by tahmaz. The dataset card indicates it is intended for automatic speech recognition tasks. Its last update was recorded on February 20, 2026.
Urdu TTS Corpus Subset is a dataset hosted on Kaggle, likely containing audio recordings and corresponding text transcripts for speech synthesis. The dataset's author, size, and specific content details are not provided in the metadata. Users must download the dataset to verify its exact composition and suitability for their projects.
A speech audio dataset combining the LibriSpeech corpus with MUSAN augmentation data. The dataset is published on Kaggle, but specific details on size, creation date, and author are not provided in the metadata. It likely contains speech recordings augmented with noise and music samples for machine learning training.
Trained model weights and datasets for the BACHI chord recognition system. The data supports the paper 'BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on Pop and Classical Music' by Mingyang Yao and Ke Chen, accepted for ICASSP 2026. The dataset page was last updated on 2026-01-17.
Synthetic-dysarthric-speech is a dataset containing artificially generated and augmented speech samples simulating dysarthria, a motor speech disorder. It is intended for developing robust automatic speech recognition and semantic understanding systems. The dataset's creator, size, and update date are not specified.
A classification dataset for predicting the commercial success of music tracks on Spotify. The dataset likely contains audio features and metadata to categorize songs into High, Medium, or Low popularity tiers. It was sourced from Kaggle, but details on its creator, size, and specific features are not provided.
A dataset for building music recommendation systems, sourced from the Kaggle platform. The specific content, scale, and features are not detailed in the available metadata. Further details regarding the data's origin, collection method, and temporal coverage are unknown.
ACI-Bench-MedARC evaluates model performance in converting clinical dialogue into structured clinical notes. The dataset includes the benchmark and data from ablation studies testing different transcription methods. It was uploaded by mkieffer to HuggingFace and last updated on 2026-01-18.
AppleMusic/Spotify Hits 638k Tracks 2010-2023 is a dataset of music tracks from two major streaming platforms. It contains 638,000 tracks released between 2010 and 2023, sourced from Kaggle. The dataset likely includes audio features and popularity metrics for analysis.
An audio dataset for wake word detection, likely associated with the Livekit platform. The dataset was created by the author 'yepher' and was last updated in March 2026. Specific details on the number of samples, audio length, and recording conditions are not provided.
Filtered GOL Dataset is a Japanese text-to-speech resource containing approximately 1.2 million audio samples totaling 1,880 hours from 380 speakers. It was filtered by tts-dataset for TTS training, applying rules on text length, audio duration, and speaker minimums. The audio is in FLAC format at 44.1kHz and is packaged as a WebDataset.
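The kind of filtering described for Filtered GOL Dataset (rules on text length, audio duration, and a per-speaker minimum) can be sketched roughly as below. The listing gives none of the real thresholds or field names, so the `Sample` fields, the `filter_samples` helper, and every default value here are hypothetical placeholders. This illustrates the general technique and is not the publisher's actual pipeline.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    speaker: str       # speaker identifier (hypothetical field name)
    text: str          # transcript
    duration_s: float  # clip length in seconds


def filter_samples(samples, min_chars=2, max_chars=200,
                   min_dur=0.5, max_dur=30.0, min_per_speaker=2):
    """Keep clips that pass length/duration rules, then drop rare speakers.

    All thresholds are illustrative assumptions, not the dataset's values.
    """
    # Step 1: per-clip rules on transcript length and audio duration.
    kept = [
        s for s in samples
        if min_chars <= len(s.text) <= max_chars
        and min_dur <= s.duration_s <= max_dur
    ]
    # Step 2: count surviving clips per speaker and enforce the minimum.
    counts = Counter(s.speaker for s in kept)
    return [s for s in kept if counts[s.speaker] >= min_per_speaker]
```

The speaker minimum is applied after the per-clip rules, so a speaker whose clips are mostly rejected in step 1 can fall below the minimum and be dropped entirely.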
Kaggle hosts a dataset of 3.3 million songs with corresponding musical feature data. The description states all song feature fields are filled out. The author, organization, and last update date are not specified.
XTTSv2 Finetuning Data 20260417 is a dataset for training or adapting text-to-speech models, published on Kaggle. The dataset likely contains audio recordings and corresponding text transcripts suitable for fine-tuning the XTTSv2 speech synthesis system. Specific details regarding its size, origin, and collection methodology are not provided in the available metadata.
1 to 10 million audio-transcription pairs extracted from Japanese adult games by NandemoGHS in January 2026. The dataset consists of entirely new audio clips and transcriptions, with no overlap with the original version.
Thai_TTS_config is a dataset hosted on Kaggle. The title suggests it contains configuration files or parameters for Thai language text-to-speech (TTS) systems. The dataset's author, organization, size, and specific content are unknown.
50 distinct environmental sound classes are likely represented in this dataset. The dataset is hosted on Kaggle and is intended for machine learning tasks. Metadata is minimal; actual content requires verification after download.
USDOT Volpe National Transportation Systems Center collected this dataset of freeway car-following behavior using an Instrumented Research Vehicle in western Massachusetts during summer 2016. It contains instantaneous radar and GPS data points, processed from 6 data collection runs. The data describe velocity, acceleration, and relative position for classified car-following instances.
Six data collection runs captured freeway car-following behavior in western Massachusetts during the summer of 2016. The USDOT Volpe National Transportation Systems Center processed, refined, and cleaned the data, isolating individual car-following instances. This table contains those classified instances, with columns describing vehicle dynamics, road conditions, and work zone status.
An invoice dataset published on Kaggle. The dataset likely contains structured or semi-structured information related to business transactions. Specific details such as the number of records, columns, and collection methodology are not provided in the available metadata.
Tamil language audio data for automatic speech recognition (ASR). The dataset is published on Kaggle and likely contains speech recordings and corresponding transcriptions. The Indian Institute of Science (IISc) MILE lab is inferred as the source, but specific details on size, collection method, and time range are unavailable.