Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,926 datasets
NO music is a dataset published on Kaggle, likely containing audio samples for classification tasks. The dataset's specific content, size, and origin are not detailed in the provided metadata. Its platform tags suggest it is focused on audio data and classification.
An audio dataset published on Kaggle. The dataset is associated with platform tags for audio and 'Indotts'. Specific details regarding its size, content, and creation are not provided in the available metadata.
A list of summer recreation programs for 2025 provided by Montgomery County, Maryland. The dataset includes program details such as activity names, categories, dates, times, age requirements, locations, and costs. It was last updated on June 12, 2025, and is hosted on the Socrata platform by data.montgomerycountymd.gov.
Lexia-Labs published this French-language benchmark for automatic speech recognition on Hugging Face in February 2026. The dataset likely contains audio recordings of mathematical speech for evaluating ASR systems. Its specific content, size, and structure require verification after download.
L2 Librittsr is a speech dataset published on Hugging Face by Piping. The dataset's title and platform tags suggest it contains audio and text data, likely for speech recognition or text-to-speech tasks. Its last recorded update was on 2026-02-13.
A bilingual benchmark dataset for evaluating audio deepfake detectors against text-to-speech and voice conversion systems. The dataset's author, organization, and specific scale are not provided in the metadata. It was sourced from the Kaggle platform, but the last update date is unknown.
111,000 aviation incident reports collected by NASA's Aviation Safety Reporting System (ASRS) between 2005 and 2025. The dataset likely contains narrative descriptions of safety events and anomalies.
Geographical Origin of Music is a dataset from the UCI Machine Learning Repository for predicting the geographic origin of music recordings. It contains audio features extracted from songs, likely for classification tasks. The original creator and specific collection date are not provided.
UCI hosts the Turkish Music Emotion dataset, which contains audio recordings and extracted features for emotion analysis. The dataset's specific size, creator, and creation date are not provided in the available metadata. It is designed for computational analysis of emotional content in music.
FMA (Free Music Archive) is a dataset for music analysis containing audio tracks and associated metadata. It includes features for music genre classification, audio analysis, and music information retrieval. The dataset was created by researchers and is hosted on the UCI Machine Learning Repository.
Audio speech recordings in the Chichewa language, intended for training language AI models. The row count, column structure, and recording details are not provided in the available metadata.
Audio recordings of general-conversation customer speech in Egyptian Arabic. The number of recordings, their duration, and their features are not detailed in the available metadata.
A combined collection of Myanmar language speech data from three sources for ASR tasks. The dataset merges the Myanmar Speech Dataset from Google Fleurs, OpenSLR-80, and a third audio-transcription repository. It was created by chuuhtetnaing and last updated on Hugging Face in December 2025.
An unofficial Arabic-only extraction of Mozilla Common Voice Corpus 18.0, prepared for Automatic Speech Recognition research. The dataset was created by MohamedRashad and last updated on 2025-12-27. It is derived from the original Common Voice 18 release, filtered to include only Arabic speech data while preserving the original dataset structure, splits, and metadata fields.
A merged speech dataset containing 41,427 audio segments from 88 original source datasets. The collection includes 222 speakers and features transcriptions and emotion labels for neutral, angry, sad, and happy speech. It was created by umutkkgz and last updated on Hugging Face in December 2025.
MAC-SLU is a benchmark dataset designed to evaluate Spoken Language Understanding (SLU) systems on complex, multi-intent user commands within an automotive environment. It addresses limitations in diversity and complexity found in existing SLU datasets. The dataset was created by Gatsby1984 and last updated on Hugging Face in December 2025.
An unofficial, language-specific subset of the FLEURS dataset, last updated on 2025-12-27. The dataset is focused on Arabic (Egyptian) speech data and is designed for Automatic Speech Recognition (ASR) research and evaluation. It was created by MohamedRashad and follows the original FLEURS structure while being packaged as a standalone Arabic-focused dataset.
SOREVA is a multilingual speech dataset designed for evaluating text-to-speech and speech representation models. It contains approximately 150 audio and transcription samples for each of 49 African languages and dialects. The dataset was created by OlameMend and last updated in December 2025.
340,000+ weekly chart entries documenting every song on the Billboard Hot 100 from August 1958 through the current year. Each record gives the rank, artist, and song, spanning more than six decades of US music history.
Aria-MIDI contains 1,186,253 MIDI files representing approximately 100,629 hours of transcribed solo-piano music. The dataset was created by loubb and includes metadata categories such as genre, composer, performer, and compositional identifiers. It was last updated on December 14, 2025.