Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,925 datasets
An audio dataset of Iranian folk music, sourced from Kaggle. The dataset's specific content, size, and collection methodology are not detailed in the provided metadata. Further verification is required to determine the exact number of recordings, their formats, and the recording conditions.
ASR recordings likely contain speech audio intended for training or evaluating automatic speech recognition systems. The dataset is hosted on Kaggle, but details on its size, origin, and collection date are not provided. Its columns and specific content are unknown and need to be verified after download.
A dataset titled 'offline_wheels_asr' is hosted on Kaggle. The dataset's specific content, size, and origin are not detailed in the provided metadata. Its title suggests a focus on automatic speech recognition, potentially for offline or embedded applications.
GeographicalOriginalofMusic is a dataset for predicting the geographical origin of music based on audio features. It is hosted on the OpenML platform, though specific details on its size, creator, and creation date are not provided in the available metadata. The dataset's primary purpose is to link musical characteristics to specific geographic locations.
Audio samples from 30 different musical instruments, published on Kaggle. The dataset's specific size, recording conditions, and origin are not detailed in the available metadata. Further details about the collection methodology and audio characteristics require verification after download.
745 audio files totaling 1 hour and 40 minutes of Uzbek conversational speech, collected from open Telegram groups. The dataset was created by OvozifyLabs for evaluating speech-to-text models and was last updated on December 10, 2025. It features natural voice messages recorded in diverse acoustic conditions and speaking styles.
A demographic and economic profile of Brazilians in the United States and Massachusetts. The dataset likely contains aggregated statistics on population characteristics and economic indicators. It was authored by Alvaro Lima and sourced from the paperswithcode platform.
Compiam provides data and tools for the computational analysis of Indian Art Music (IAM), developed by the Music Technology Group (MTG). Updated in February 2026, the resource focuses on Music Information Retrieval (MIR) tasks specifically tailored for Hindustani and Carnatic musical traditions.
Thorsten-Voice provides German-language audio recordings and text transcripts for speech synthesis, created by Thorsten Müller and updated in February 2026. The dataset is designed to facilitate the creation of high-quality, offline German text-to-speech (TTS) models without licensing restrictions.
A collection of over 6.74 million deduplicated MIDI files curated for music information retrieval and AI training. The dataset was created by 'projectlosangeles' and was last updated in December 2025. It includes normalized MIDI data and comprehensive metadata for symbolic music analysis.
A dataset by Tawney Tsang, published on paperswithcode, investigating cross-modal sensory associations. The data likely contains experimental results linking color perception to musical stimuli, with emotion and tempo as mediating factors. The specific scale, row count, and collection date are not provided in the metadata.
A paper on the intersection of biology and music, authored by Steven R. Brown and listed on the paperswithcode platform. The content likely explores theoretical paradoxes in music from a biological perspective. The entry's specific format, size, and structure are not detailed in the provided metadata.
A dataset on paperswithcode related to music therapy interventions for bereaved youth. The data likely contains clinical research materials, potentially including audio recordings and text, authored by Katrina Skewes McFerran. Temporal coverage and specific data volume are not provided in the available metadata.
Substance Use in Popular Music Videos is a dataset published on paperswithcode. The dataset likely contains analysis of substance use depictions in music videos. The author is Donald F. Roberts.
Barcelona Music Reward Questionnaire data likely contains survey responses related to the psychological experience of music. The dataset is authored by Ernest Mas-Herrero and is hosted on the paperswithcode platform. The specific number of participants, survey questions, and collection period are not detailed in the available metadata.
Mirdata provides standardized Python loaders for Music Information Retrieval (MIR) datasets, maintained by the mir-dataset-loaders organization with updates through February 2026. It enables programmatic access to audio files and musical annotations such as beats, chords, and melodies across various research collections.
Dolly-Audio contains 1,000 hours of professionally cleaned Vietnamese speech audio featuring 152 speakers from various regions. Created by the Dolly AI Team and updated in December 2024, the corpus is designed to support speech synthesis and recognition research. It includes both audio recordings and corresponding text transcripts across multiple Vietnamese dialects.
Pittsburgh Bridges is a classic dataset from the UCI Machine Learning Repository containing structural and material details for bridges in Pittsburgh, Pennsylvania. It is widely used for classification and regression tasks in civil engineering and machine learning education. The original creator and exact time period are not specified.
3,500 balanced audio samples are provided for music information retrieval tasks. Each sample is represented by a 571-feature matrix.
LibriSpeech is a widely used public domain corpus derived from audiobooks. The dataset is published on Kaggle, making it accessible for download and experimentation. Its specific size, version, and update details are not provided in the available metadata.