DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,596 datasets

PainSpeech-4: Arabic Speech for Multilevel Pain Intensity Assessment

PainSpeech-4 is a speech dataset designed for automatic pain intensity assessment. The description indicates it contains multilevel labels for pain, suggesting a focus on clinical or affective computing applications. The dataset's author, organization, and specific collection details are not provided.

Audio, Arabic Language, Medical Audio, Pain Assessment, +1

Clinical AI Voice Dataset: Medical Terminology Sample

A free preview pack of high-fidelity clinical human voice recordings. The data is intended for training speech-to-text and text-to-speech systems. The dataset's author, organization, and specific size are unknown.

Audio, Audio Dataset, Healthcare, Medical Terminology, Clinical Voice, Speech Recognition, +1

MusicNet_SLP301: Music Audio Dataset

MusicNet_SLP301 is a dataset hosted on Kaggle, likely containing audio data related to music. Its specific content, scale, and creation details are not provided in the available metadata. The dataset's origin and intended application must be verified by downloading and inspecting the actual files.

Audio, Machine Learning, +1

ASREX 91: Ocean Temperature Time Series from the North Pacific

The eastern North Pacific, about 500 km from Vancouver Island, was the site of the Acoustic Surface Reverberation Experiment in 1991-1992. The Upper Ocean Processes Group deployed moorings to measure oceanographic variables, with this dataset likely containing temperature readings. Data was collected over 9.5 weeks during the winter of 1991-1992 at a sampling interval of 7.5 minutes.

Audio, Time Series, North Pacific, Oceanography, Temperature, Acoustic Research, +1

WildASR: Multilingual Diagnostic Benchmark for ASR Robustness

WildASR is a multilingual diagnostic benchmark built from real human speech to stress-test automatic speech recognition (ASR) robustness under real-world out-of-distribution conditions. The dataset decomposes robustness into axes including environmental degradation and demographic shift. It was created by bosonai and last updated on 2026-03-25.

Audio, Multilingual, Parquet, Size Categories: 10K<n<100K, Out Of Distribution, Library: polars, Language: en, Speech Degradation, Modality: text, Robustness Benchmark, Library: mlcroissant, Library: datasets, Benchmark, Library: pandas, Hallucination, Region: us, Robustness, Task Categories: automatic-speech-recognition, Multilingual Audio, License: apache-2.0, Speech Recognition, Automatic Speech Recognition, +1

Bangla Audio Dataset with Original and DeepFake Voice Samples

A Kaggle-hosted audio dataset containing Bangla speech samples. The dataset likely contains recordings of original human voices and corresponding synthetic or manipulated deepfake versions. The specific volume, collection method, and creation date are not detailed in the provided metadata.

Audio, Bangla Speech, Deepfake Detection, Voice Synthesis, +1

Music Data for Analysis and Modeling

A dataset titled 'musicdata' published on Kaggle. The specific contents, size, and source are not detailed in the available metadata. Further inspection after download is required to determine its scope and structure.

Tabular, Audio, Audio Analysis, +1

ESC-50: 2,000 Environmental Audio Recordings for Sound Classification

ESC-50 is a labeled collection of 2,000 environmental audio recordings designed for benchmarking sound classification methods. The dataset consists of 5-second recordings organized into 50 semantic classes, with 40 examples per class. It was uploaded to Hugging Face by ashutoshm28 and was last updated on 2026-03-07.

Audio, Machine Learning Benchmark, Environmental Sound, Audio Classification, +1

Indic_TTS_SD: Indic Language Text-to-Speech Data

Indic_TTS_SD is a dataset for text-to-speech synthesis, likely containing audio samples and corresponding text transcripts. The dataset is hosted on Kaggle, but its specific contents, size, and creation details are not provided. Its title suggests a focus on Indic languages, which may include languages like Hindi, Bengali, or Tamil.

Audio, Text To Speech, Speech Synthesis, AI Training, Indic Languages, +1

ESC: Environmental Sound Classification Dataset with 50 Classes

The ESC-50 dataset comprises 2,000 labeled 5-second audio clips, organized into 50 classes with 40 clips each. It was created by Karol J. Piczak of Warsaw University of Technology from public field recordings on Freesound.org. The collection also includes a 10-class subset (ESC-10) and a larger unlabeled set (ESC-US) of 250,000 clips for unsupervised learning.

Audio, Machine Learning, Environmental science, Oceanography, Environmental Sound, Audio Classification, Computer Science, Acoustic Data, Geology, Sound Geography, +1

Sargasso Sea Oceanographic Profiles from SYNOP Experiment 1987-1990

Sargasso Sea measurements of temperature, salinity, and dissolved oxygen were collected as part of the SYNoptic Ocean Prediction (SYNOP) experiment. The dataset contains profiles from multiple cruises conducted between Fall 1987 and Fall 1990, managed by investigators Dr. D. Randolph Watts and John M. Bane Jr. It is hosted by the National Oceanic and Atmospheric Administration and also appears on NASA EarthData.

Tabular, Time Series, Sargasso Sea, Synop Experiment, Oceanography, Physical Oceanography, +1

Waxal Lug Clean: Luganda Text-to-Speech Audio with Artifact Removal

CraneAILabs provides a cleaned version of Luganda speech recordings from Google's WaxalNLP dataset, preprocessed for fine-tuning text-to-speech models. The dataset applies Silero VAD to remove click and pop artifacts from the start and end of audio clips, which are described as degrading model quality. This cleaned subset was last updated on March 15, 2026.

Audio, Text To Speech, Speech Synthesis, Luganda, Audio Cleaning, +1

Call Center Audio: 13,000+ Hours of US Customer Service Conversations

UniDataPro provides 13,000+ hours of real-world call center audio recordings featuring over 90% unique speakers. The collection includes time-stamped transcripts designed for training speech recognition and speaker diarization models in the customer service domain.

Audio, AUDIOFOLDER, Task Categories: text-to-speech, Customer Service, Modality: audio, License: cc-by-nc-nd-4.0, Size Categories: n<1K, Library: mlcroissant, Library: datasets, Analyzing Customer, Region: us, Task Categories: automatic-speech-recognition, Call Center Data, +1

Massachusetts Tidal Currents Greater Than 3 Knots from NOAA Tables, 2006

This dataset provides 2006 tidal information for Massachusetts waters, derived from National Oceanic and Atmospheric Administration (NOAA) tidal current tables. The GIS datalayer contains areas where tidal current speeds exceed 3 knots, a threshold for tidal in-stream energy conversion devices. The dataset was created by SCIOPS and last updated in 1997.

Geospatial, Renewable Energy, Tidal Currents, Coastal Data, Marine energy, +1

Ice-Rafted Sediment Hydrodynamics in Plum Island Sound

This dataset presents hydrodynamic results for an extratropical storm at Plum Island Sound, Massachusetts, between January and July 2018. It contains modeled or measured water levels, inundation depths, and flow direction and speed, linked to observations of ice-rafted sediment deposits. The data supports analysis of coastal storm impacts and sediment transport processes on a marsh surface.

Audio, Time Series, Geospatial, Sediment Transport, Plum Island Sound, Storm Inundation, Coastal Hydrodynamics, +1

ViMedCSS: 24 Hours of Vietnamese Medical Code-Switching Speech

ViMedCSS provides 24.3 hours of Vietnamese medical speech across 11,832 training utterances, developed for the LREC 2026 conference. Each recording features at least one English medical term embedded within Vietnamese speech to support code-switching automatic speech recognition (ASR).

OPTIMIZED-PARQUET, Parquet, Size Categories: 10K<n<100K, Library: polars, Library: dask, Modality: audio, arXiv: 2602.12911, Modality: text, Code Switching, Library: mlcroissant, Library: datasets, License: cc-by-4.0, Region: us, Task Categories: automatic-speech-recognition, Language: vi, Medical, +1

MLAAD English: 500 Audio Samples per TTS Model

MLAAD English is a dataset of audio samples for text-to-speech models. The title indicates it contains 500 samples per TTS model, but the specific number of models and total samples is unknown. It is hosted on Kaggle, but the author, organization, and creation details are not provided.

Audio, Text To Speech, Machine Learning, Speech Synthesis, Audio Samples, +1

Sub Reverb Asr Dataset 0.4: Audio Samples for Reverberation Simulation

Sub Reverb Asr Dataset 0.4 contains 45 audio samples organized across three subsets. The subsets are 'original', 'pointsource_noises', and 'real_rirs_isotropic_noises', each with 15 samples in a 'train' split. The dataset was created by sujalappa and was last updated on Hugging Face in March 2026.

Audio, Reverb Simulation, Audio Processing, Speech Recognition, +1

Bathymetric Contours for New York Bight and Gulf of Maine

This bathymetric shapefile, dated June 26, 2006, contains 10-meter depth contours for the continental shelf and 100-meter contours beyond the 200-meter shelf edge. The data was derived from NOAA National Geophysical Data Center Coastal Relief Models and reprojected by the Massachusetts Office of Coastal Zone Management. The dataset covers the New York Bight and Gulf of Maine regions.

Geospatial, Oceanography, Coastal management, Marine Geospatial, Marine Geology, Geospatial Data, Ocean Floor, Coastal Geography, Bathymetry, +1

Massachusetts Estuary Water Quality Monitoring Data

This dataset contains data from the Massachusetts Ecosystem Assessment Program, a state monitoring effort active until 2003. The program was a partnership with the EPA's National Coastal Assessment, focusing on water quality parameters in selected embayments. It was sponsored by the Environmental Protection Agency, Coastal 2000, and the Massachusetts Coastal Zone Management Program.

Tabular, Coastal Assessment, Water Quality, Environmental Data, Estuary Monitoring, +1
