Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,907 datasets
Between September 20 and December 27, 2001, a Rapid Single-Particle Mass Spectrometer (RSMS) captured real-time composition data for individual aerosol particles in Pittsburgh. Each record includes aerodynamic particle size, positive and negative mass spectra, and precise measurement time, enabling analysis of particle-to-particle variation. The data covers nine logarithmically spaced size classes from about 40 to 1300 nanometers.
An anonymised dataset from a study investigating music performance anxiety and flow under performance simulation conditions. The dataset is 49.9 KB in size and was last updated on 2026-05-21. It was published by a research team under a CC-BY-4.0 license on figshare.
Primary survey results from a post-event questionnaire conducted in the coastal region of Toyama Prefecture, Japan, following the 2024 Noto Peninsula Earthquake tsunami. The dataset was created by Shuichi Kure and is associated with a 2025 research paper in Coastal Engineering Journal. It consists of 1.4 MB of data available in PDF, TXT, and XLSX formats.
Establishments of the Conservatory of Music and Dramatic Art of Quebec lists the conservatory's establishments together with their geolocation. The dataset is published by the Government and Municipalities of Québec under a CC-BY-4.0 license and was last updated on 2026-04-22.
Pidgin ASR Combined is a unified Nigerian Pidgin English speech-to-text dataset created by michaelodafe. It contains approximately 8.6 hours of audio across 4,278 clips from 10 source speakers, formatted as 16 kHz mono WAV files. The dataset was last updated on 2026-05-13 and was used to train a Whisper model that achieved a 21.37% word error rate.
1,200 code-switching utterances form a curated benchmark for evaluating commercial Automatic Speech Recognition systems. The dataset, created by Perle-ai, includes 300 samples each for four language pairs, such as Egyptian Arabic–English. It was last updated on May 21, 2026.
sWuggy is a spoken lexical-discrimination benchmark for evaluating spoken language models. Each item is a pair of a real word and a phonotactically matched pseudo-word, synthesized as audio. The dataset is hosted by the author 'coml' and was last updated on 2026-05-29.
A human-curated, multi-genre audio dataset generated with Suno V5.5 (chirp-fenix), covering 100+ sub-sub-genres across electronic, hip-hop, Latin, jazz, world, rock, ambient, pop, reggae, and classical music. Each track includes full audio (MP3), cover art, the original generation prompt, and a 32-column metadata schema. The dataset was created by Kukedlc and last updated on 2026-05-25.
Gene expression data for the bacterial parasite Candidatus Aquirickettsia rohweri within the critically endangered coral Acropora cervicornis. The dataset compares parasite physiology under ambient versus nutrient-enriched conditions, as described in a research document authored by Lauren Speare and last updated in April 2026. The data is stored in a 576.8 KB DOCX file.
Profiles contain physiological orienting response data for 16 test sounds, measured by heart rate change in 22 participants. The dataset was created by Mako Katagiri and published on figshare in April 2026. It is a small dataset of 5.5 KB, stored in an XLS file.
Twenty-two healthy young male participants underwent auditory experiments during multi-day stays to measure physiological orienting responses to 16 test sounds. The dataset contains calculated heart rate interval differences and normalized orienting response strengths for musical and complex tones across four octaves. Mako Katagiri published this data on figshare in 2026 under a CC-BY-4.0 license.
Twenty-two healthy young male participants had their heart rate changes measured in response to 16 test sounds during a simulated daily-life experiment. The dataset contains results from a study by Mako Katagiri, published on figshare in April 2026, analyzing the reproducibility of orienting responses for alarm and pre-signal sound selection. Physiological data includes calculated RR interval differences and normalized orienting response strength for sounds spanning frequencies from 130.8 Hz to 1661.4 Hz.
Twenty-two healthy young male participants underwent auditory experiments during three-day stays, with physiological responses measured via heart rate changes. The dataset contains orienting response metrics for 16 distinct test sounds, including eight musical and eight complex tones, across a frequency range of 130.8 Hz to 1661.4 Hz. Researcher Mako Katagiri published this data on figshare in April 2026.
Physiological responses of twenty-two healthy young male participants to 16 test sounds, measured via heart rate changes in a simulated daily environment. Mako Katagiri created this dataset to analyze carryover effects in auditory signal perception. The dataset was last updated in April 2026.
Hamozwa created RepeatAudio for research in class-agnostic audio repetition counting. The dataset contains synthetic samples with varying noise and real-world samples from mechanical, ecological, and medical domains. It was last updated on May 29, 2026.
An evaluation dataset for automatic speech recognition systems designed to transcribe medical speech. It captures challenges specific to processing medical terminology, particularly branded drugs, within the Indian context. The dataset was created by priyamallojjala and was last updated on 2026-06-02.
A collection of GLaDOS voice lines scraped from the Portal Wiki. The dataset covers lines from Portal (2007), Portal 2 main campaign, Portal 2 cooperative mode, and other appearances. It was created by user ray0rf1re and last updated on 2026-05-31.
A 5.5 KB Excel file records time-dependent changes for a group undergoing music-based occupational therapy. The dataset was authored by Ibrahim Erarslan and last updated on May 18, 2026. Its specific variables and sample size are not detailed in the available metadata.
A speech recognition system designed for controlling industrial robots via phoneme-based commands. The system, developed by Adwait Naik of K J Somaiya Medical College, uses Linear Predictive Coding and comprises a microphone array, voice module, and a 3-DOF robotic arm. It was validated through experiments involving simple and complex sentences for tasks like cube manipulation and pick-and-place.
Uzbek YouTube content, including IT vlogs, news, and Tashkent-dialect podcasts, forms the basis of this speech dataset. It contains at least 37,807 audio clips across two splits, totaling over 135.9 hours of audio, curated by Saidakmal and last updated in May 2026. Each audio clip is paired with two automatic speech recognition transcriptions generated by Gemini and Whisper models.