Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,909 datasets
PainSpeech-4 is a speech dataset designed for automatic pain intensity assessment. The description indicates it contains multilevel labels for pain, suggesting a focus on clinical or affective computing applications. The dataset's author, organization, and specific collection details are not provided.
This is a free preview pack of high-fidelity clinical human voice recordings. The data is intended for training speech-to-text and text-to-speech systems. The dataset's author, organization, and specific size are unknown.
WildASR is a multilingual diagnostic benchmark built from real human speech to stress-test automatic speech recognition (ASR) robustness under real-world out-of-distribution conditions. The dataset decomposes robustness into axes including environmental degradation and demographic shift. It was created by bosonai and last updated on 2026-03-25.
The ESC-50 dataset comprises 2,000 labeled 5-second audio clips, organized into 50 classes with 40 clips each. It was created by Karol J. Piczak of Warsaw University of Technology from public field recordings on Freesound.org. The collection also includes a 10-class subset (ESC-10) and a larger unlabeled set (ESC-US) of 250,000 clips for unsupervised learning.
Sargasso Sea measurements of temperature, salinity, and dissolved oxygen were collected as part of the SYNoptic Ocean Prediction (SYNOP) experiment. The dataset contains profiles from multiple cruises conducted between Fall 1987 and Fall 1990, managed by investigators Dr. D. Randolph Watts and John M. Bane Jr. It is hosted by the National Oceanic and Atmospheric Administration and also appears on NASA EarthData.
UniDataPro provides 13,000+ hours of real-world call center audio recordings featuring over 90% unique speakers. The collection includes time-stamped transcripts designed for training speech recognition and speaker diarization models in the customer service domain.
This dataset contains hydrodynamic results for Plum Island Sound, Massachusetts, from an extratropical storm between January and July 2018. It contains modeled or measured water levels, inundation depths, and flow direction and speed, linked to observations of ice-rafted sediment deposits. The data supports analysis of coastal storm impacts and sediment transport processes on a marsh surface.
ViMedCSS provides 24.3 hours of Vietnamese medical speech across 11,832 training utterances, developed for the LREC 2026 conference. Each recording features at least one English medical term embedded within Vietnamese speech to support code-switching automatic speech recognition (ASR).
Sub Reverb Asr Dataset 0.4 contains 45 audio samples organized across three subsets. The subsets are 'original', 'pointsource_noises', and 'real_rirs_isotropic_noises', each with 15 samples in a 'train' split. The dataset was created by sujalappa and was last updated on Hugging Face in March 2026.
This bathymetric shapefile, dated June 26, 2006, contains 10-meter depth contours for the continental shelf and 100-meter contours beyond the 200-meter shelf edge. The data was derived from NOAA National Geophysical Data Center Coastal Relief Models and reprojected by the Massachusetts Office of Coastal Zone Management. The dataset covers the New York Bight and Gulf of Maine regions.
This dataset contains data from the Massachusetts Ecosystem Assessment Program, a state monitoring effort active until 2003. The program was a partnership with the EPA's National Coastal Assessment, focusing on water quality parameters in selected embayments. It was sponsored by the Environmental Protection Agency, Coastal 2000, and the Massachusetts Coastal Zone Management Program.
High-resolution seismic-reflection surveys map the stratigraphy of the nearshore areas from Chatham to Provincetown, Massachusetts. The U.S. Geological Survey Woods Hole Field Center conducted this investigation to correlate geologic units between the nearshore and onshore. The data defines the Quaternary geologic framework of outer Cape Cod.
A 2006 data set provides bathymetric contours for the Gulf of Maine and New England Shelf. The U.S. Geological Survey constructed it for geologic framework studies. It was reprojected into the NAD83 Massachusetts State Plane coordinate system by the Massachusetts Office of Coastal Zone Management.
Monitoring data tracks the environmental effects of secondary-treated sewage effluent discharged from a 9.5-mile outfall tunnel into Massachusetts Bay. The Environmental Quality Department (Enquad) collects this data to ensure compliance with an NPDES discharge permit for 43 communities. The dataset covers water quality in Massachusetts Bay, Boston Harbor, and Cape Cod Bay.
This dataset records the as-built location of the Hubline, a 29.5-mile natural gas pipeline constructed primarily offshore in Massachusetts Bay between Beverly and Weymouth. It was created by SCIOPS and represents the pipeline's surveyed bottom position. The route traverses 11 coastal communities including Salem, Boston, and Quincy.
NVIDIA's Granary dataset provides approximately 1 million hours of high-quality speech data across 25 European languages for speech recognition and translation. Released in 2026, it consolidates multiple sources into a unified framework to support low-resource language modeling. The collection is designed for both Automatic Speech Recognition (ASR) and Automatic Speech Translation (AST) tasks.
PICO-8 Games Dataset contains 10,967 game cartridges scraped from the Lexaloffle BBS. Each cartridge is decomposed into Lua source code, pixel-art spritesheets, tile maps, sound effects, music patterns, and metadata. The dataset was created by Fraser and includes label screenshots from the top 48 games by star count.
LibriVAD is a large-scale, noise-augmented dataset for Voice Activity Detection (VAD) generated from the LibriSpeech corpus. The dataset was created by LibriVAD and was last updated on March 17, 2026. It is designed for training and evaluating VAD models in noisy environments.
Vivoice Relabeled is a speech dataset derived from the original capleaf/viVoice collection. The dataset has been processed using the Qwen/Qwen3-ASR-1.7B model to update audio-text labels, retaining samples with a Word Error Rate below 15%. It was uploaded by author JayLL13 to Hugging Face in March 2026.
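The WER-based filtering described for Vivoice Relabeled can be illustrated with a minimal sketch. It is not the uploader's actual pipeline: the field names `reference` and `hypothesis`, the helper names, and the choice of which transcript counts as the reference are all assumptions made for illustration. Word Error Rate is computed here as word-level edit distance divided by the number of reference words.

```python
from typing import Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def keep_low_wer(samples: List[Dict[str, str]], threshold: float = 0.15) -> List[Dict[str, str]]:
    """Keep samples whose WER is strictly below the threshold.

    The 'reference' and 'hypothesis' keys are hypothetical field names.
    """
    return [
        s for s in samples
        if word_error_rate(s["reference"], s["hypothesis"]) < threshold
    ]


if __name__ == "__main__":
    demo = [
        {"reference": "a b c d e f g h i j", "hypothesis": "a b c d e f g h i x"},
        {"reference": "a b c d", "hypothesis": "a x c d"},
    ]
    print(len(keep_low_wer(demo)))
```

In this sketch the first demo sample has one substitution in ten words (WER 0.10) and is kept, while the second has one substitution in four words (WER 0.25) and is dropped.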
CMI-Pref provides between 1,000 and 10,000 human preference comparisons for multimodal music generation, published by HaiwenXia in 2026. Each record captures a human vote comparing two generated audio samples based on musicality, alignment, and confidence.