DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,564 datasets

Nagatoro Hayase Voice Clips for AI Voice Training

140 high-quality, clean, pre-processed voice clips of the anime character Nagatoro Hayase, intended for AI voice training and audio deep learning experiments. The dataset is curated by ezfiez and was last updated on July 10, 2026.

Tags: Audio, Character Voice, Audio Dataset, Anime Voice, Japanese Language, Voice Clips (+1 more)

Embodied Musical Engagement Impact on Rehabilitation in Children with Special Needs

The study recruited 294 children, predominantly male (68.7%), with 7–9-year-olds forming the largest age subgroup (33.7%). The research, authored by Yanyan Lu and last updated in May 2026, uses Structural Equation Modeling to examine the direct effects of musical engagement on rehabilitation performance and the mediating role of non-cognitive skills. It provides empirical evidence for music-based interventions in contexts such as autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), and intellectual disabilities.

Tags: Text, Audio, Music Intervention, Behavioral Assessment, Rehabilitation Outcomes, Non-Cognitive Skills, Special Needs Children (+1 more)

Ready Exerciser One: Effects of Music and VR on Exercise Responses

A within-subjects study of 24 recreationally active adults completing a 12-minute exercise protocol. The dataset, from a paper by Costas I. Karageorghis, compares affective, perceptual, enjoyment, and cardiac responses across music, virtual reality (VR), VR-with-music, and control conditions.

Tags: Tabular, Audio, Human Subjects, Music Intervention, Virtual Reality, Affective Response, Healthcare, Exercise Psychology (+1 more)

dEchorate: Calibrated Multichannel Room Impulse Responses

dEchorate is a dataset of measured multichannel Room Impulse Responses (RIRs) created by researchers including Diego Di Carlo from the Centre National de la Recherche Scientifique. It includes annotations for early echo timings and 3D positions of microphones, real sources, and image sources under different wall configurations in a cuboid room. The dataset is published with software utilities for data access and baseline methods for echo-related tasks.
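
Measured RIRs like these are commonly used to simulate how a dry (anechoic) signal would sound in the measured room: the signal is convolved with the impulse response recorded at each microphone. The sketch below shows only that generic step. It assumes the RIRs have already been loaded into NumPy arrays; dEchorate's own file format and access utilities are not shown, and the toy two-microphone "RIRs" and 16 kHz rate are made-up illustration values, not dEchorate data.

```python
import numpy as np


def reverberate(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve a dry mono signal with one single-channel room impulse response."""
    # Full linear convolution: output length is len(dry) + len(rir) - 1,
    # so the reverberant tail after the dry signal ends is preserved.
    return np.convolve(dry, rir, mode="full")


def reverberate_multichannel(dry: np.ndarray, rirs: np.ndarray) -> np.ndarray:
    """Apply one RIR per microphone; `rirs` has shape (n_mics, rir_len)."""
    return np.stack([reverberate(dry, h) for h in rirs])


if __name__ == "__main__":
    fs = 16_000  # hypothetical sample rate for this toy example
    dry = np.random.default_rng(0).standard_normal(fs)  # 1 s of white noise
    # Hypothetical 2-mic "RIRs": a direct path plus one delayed, attenuated echo.
    rirs = np.zeros((2, 800))
    rirs[0, 0], rirs[0, 400] = 1.0, 0.5
    rirs[1, 10], rirs[1, 600] = 1.0, 0.3
    wet = reverberate_multichannel(dry, rirs)
    print(wet.shape)  # (2, 16799)
```

For long RIRs, an FFT-based convolution is usually faster than `np.convolve`, but the result is the same linear convolution.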

Tags: Audio, Multichannel Audio, Benchmark, Computer Vision, Impulse Responses, Room Acoustics, Audio Signal Processing (+1 more)

Terrestrial Ecosystem Model Calibration Data for Carbon and Nitrogen Fluxes

Terrestrial Ecosystem Model (TEM) calibration data provides carbon and nitrogen pool sizes and fluxes for 16 globally distributed field sites representing biomes from tundra to tropical forest. The dataset, originally published in 1999 and maintained by the ORNL DAAC, was compiled from literature to calibrate a process-based model for estimating continental-scale biogeochemical cycles. Data files remain unchanged since original publication, with only documentation updates noted in 2026.

Tags: Tabular, Time Series, ZIP, Ecosystem Modeling, Terrestrial Biomes, Net Primary Productivity, Nitrogen Flux, Carbon Flux (+1 more)

Wazobia TTS CC: Multilingual Nigerian Speech Dataset for Text-to-Speech

WazobiaVoice TTS (wazobia-tts-cc) is a transcribed speech dataset containing 5,866 audio clips totaling approximately 18.0 hours. It covers five Nigerian language varieties: Yoruba, Hausa, Igbo, Nigerian-accented English, and Nigerian Pidgin. The dataset was created by Axiveri and is licensed under CC-BY-4.0.

Tags: Text, Audio, Speech Synthesis, Nigerian Languages, Multilingual Audio, Speech Recognition (+1 more)

VOICe: Sound Event Detection Mixtures for Domain Adaptation

VOICe contains 1,449 mixtures of three distinct sound events: baby crying, glass breaking, and gunshot. The dataset was created by Shayan Gharib of Tampere University to support research in domain adaptation for sound event detection. It includes mixtures with background noise from three acoustic scenes and at two signal-to-noise ratios.
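
Mixing a sound event with background noise "at a given SNR" generally means scaling the noise so that the power ratio between event and noise hits a target in decibels, SNR_dB = 10·log10(P_event / P_noise). The sketch below is a generic power-based version of that step, not the VOICe authors' pipeline: the page does not state which two SNR values were used or whether power was measured over the whole mixture or only the event segment, so the function name, the 3 dB example value, and the whole-signal power definition are assumptions.

```python
import numpy as np


def mix_at_snr(event: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add background noise to an event so that event-to-noise power equals snr_db."""
    if len(noise) < len(event):
        raise ValueError("noise must be at least as long as the event")
    noise = noise[: len(event)]  # trim noise to the event length
    p_event = np.mean(event ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise is silent; SNR is undefined")
    # Solve 10*log10(p_event / (gain**2 * p_noise)) = snr_db for the noise gain.
    gain = np.sqrt(p_event / (p_noise * 10 ** (snr_db / 10)))
    return event + gain * noise


def measure_snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """Power ratio of two signals in decibels (helper for checking a mixture)."""
    return float(10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2)))
```

Because the scaled noise can be recovered as `mixed - event`, `measure_snr_db(event, mixed - event)` returns the requested SNR up to floating-point error, which is a quick way to sanity-check generated mixtures.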

Tags: Audio, Acoustic Scenes, Sound Event Detection, Benchmark, Machine Learning Audio, Audio Domain Adaptation, Audio Mixtures (+1 more)

Plum Island Estuary and Parker River Salt Marsh Conceptual Units

Conceptual marsh units delineate the Plum Island Estuary and Parker River salt marsh complex based on surface elevation geoprocessing. The U.S. Geological Survey created this data to assess coastal wetland vulnerability and ecosystem service potential following the Hurricane Sandy Science Plan. Flow accumulation and surface slope analysis were used to define unit boundaries and drainage points.

Tags: Geospatial, Environmental Science, Hydrology, Coastal Wetlands, Salt Marsh (+1 more)

Adlindel Assets: Voice Models for AI-Driven Skyrim NPC Dialogue

AD.Lindel assets are voice model files for the SkyrimNet mod, which adds AI-driven dialogue to Skyrim. The dataset contains a local XTTS voice server and a collection of Russian voice models for the Piper TTS system. The author, Archidexter, uploaded these files on July 17, 2026, as a download mirror for the launcher.

Tags: Audio, Speech Synthesis, Game Modding, Voice Assets, Skyrim (+1 more)

B-Lines East: Wildflower Habitat Corridors for UK Pollinator Conservation

Polygon data for Buglife's national B-Lines initiative in Eastern England, which aims to create and restore wildflower habitats to reverse pollinator declines. The East network includes 3 km corridors across 13 counties: Lincolnshire, Leicestershire, Rutland, Nottinghamshire, Norfolk, Suffolk, Cambridgeshire, Bedfordshire, Northamptonshire, Essex, Hertfordshire, Buckinghamshire, and Oxfordshire. The data is provided by Natural England and contains Ordnance Survey data.

Tags: Geospatial, ZIP, Environmental Planning, Wildlife Conservation, Pollinator Habitat, UK Counties (+1 more)

Auditory Menu Navigation Performance with Spearcon Enhancements

Dianne K. Palladino of the Georgia Institute of Technology conducted a study in which 28 undergraduates navigated two-dimensional auditory menus. The data compares navigation speed between text-to-speech (TTS)-only menus and TTS menus enhanced with spearcons, which are time-compressed spoken cues. Navigation was significantly faster with spearcons, and the time cost of each additional menu item was smaller as menus grew longer.

Tags: Tabular, Audio, Text-to-Speech, Spearcons, Menu Navigation, Auditory Interfaces, Human-Computer Interaction (+1 more)

MulTTiPop: Multitrack MIDI Transcriptions for Pop Music Segments

572 pop music segments (approximately 3.5 hours of audio) with aligned multitrack MIDI transcriptions. The dataset was created by researchers including Nathan Pruyne and Chris Donahue, sourced from TheoryTab and the Lakh MIDI Dataset. The dataset page was last updated on July 10, 2026.

Tags: Audio, Multimodal, Multitrack MIDI, Music Transcription, Pop Music, Audio Analysis (+1 more)

Libri-AudioEvent: 11,700 Synthesized Noisy Speech Clips

Libri-AudioEvent is a synthesized noisy-speech dataset containing matched noisy speech, clean speech, and noise signals. The dataset contains 11,700 total clips across training, validation, and test splits, with each clip being 10 seconds long and sampled at 16 kHz. It was created by Ediethia and last updated on Hugging Face in July 2026.
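
The stated clip count, clip length, and sample rate imply the following totals. These are derived figures, not numbers reported on the page, and they assume the 11,700 count refers to clips (each with its matched noisy, clean, and noise signals) rather than to individual audio files:

$$
\begin{aligned}
T_{\text{total}} &= 11{,}700 \times 10\ \text{s} = 117{,}000\ \text{s} = \frac{117{,}000}{3600}\ \text{h} = 32.5\ \text{h}, \\
N_{\text{clip}} &= 10\ \text{s} \times 16{,}000\ \text{samples/s} = 160{,}000\ \text{samples per signal}.
\end{aligned}
$$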

Tags: Audio, Machine Learning, Sound Classification, Noise Synthesis, Audio Events (+1 more)

Uzbek ASR Curated 701H: Uzbek Speech Dataset for ASR

Uzbek speech data comprising approximately 701 hours of audio across 337,920 utterances, curated by uzinfocom-edu-ai. The dataset is formatted as 16 kHz mono WAV files with NeMo JSONL manifests and is split into training, validation, and test subsets.
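
NeMo-style manifests are JSON Lines files in which each line describes one utterance, conventionally with `audio_filepath`, `duration` (in seconds), and `text` fields. The sketch below counts utterances and total hours in such a manifest, which is a quick way to check a split against the stated totals. It assumes this dataset follows that field convention; the page does not list its exact fields or manifest file names, and the function name is hypothetical.

```python
import json
from typing import Iterable, Tuple


def manifest_hours(lines: Iterable[str]) -> Tuple[int, float]:
    """Return (utterance count, total hours) for a NeMo-style JSONL manifest.

    Assumes each non-empty line is a JSON object with a "duration" field in seconds.
    """
    count, seconds = 0, 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        seconds += float(entry["duration"])
        count += 1
    return count, seconds / 3600.0
```

Usage, with a hypothetical file name: `with open("train_manifest.jsonl", encoding="utf-8") as f: n, hours = manifest_hours(f)`. Iterating the file object line by line keeps memory use flat even for manifests with hundreds of thousands of entries.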

Tags: Audio, Audio Dataset, Uzbek Language, Benchmark, ASR Training, Speech Recognition (+1 more)

Supplementary file 1_Performance measures of the Nuance Audio™ Glasses: behavioral outcom…

Data from an in-lab study of 21 adults with mild to moderate sensorineural hearing loss evaluating the Nuance Audio Glasses, an over-the-counter (OTC) hearing device. The study assessed speech recognition in noise, subjective listening effort, and real-ear measures. The data was authored by Paula Folkeard and last updated on May 25, 2026.

Tags: Tabular, Audio, Medical Devices, Hearing Aids, Healthcare, Behavioral Outcomes, Speech Recognition, Real-Ear Measures (+1 more)

CoreaSpeech+: Unified Korean Speech Dataset Repository for TTS Research

CoreaSpeech+ is a unified repository for Korean speech data, including the 700-hour CoreaSpeech corpus and additional training data for a 2,000-hour configuration. It is maintained by Gong1212 on Hugging Face and was last updated in July 2026. The repository also includes validation data and a Korean Universal Testset.

Tags: Audio, Korean Speech, Text-to-Speech, Speech Synthesis, Speech Corpus, Natural Language Processing (+1 more)

Meta-Analysis of Music Intervention Effects on Arousal in Disorders of Consciousness

A 2026 meta-analysis by Jiayi Gu synthesizes evidence from seven controlled studies involving 296 patients. The dataset contains statistical results comparing music interventions with control conditions on consciousness levels in patients with disorders of consciousness. It was published on figshare under a CC-BY-4.0 license.

Tags: Tabular, Audio, Excel, Music Intervention, Meta-Analysis, Disorders of Consciousness, Benchmark, Neurorehabilitation, Clinical Trials (+1 more)

Meta-Analysis of Music Intervention Effects on Consciousness Levels in 296 Patients

A meta-analysis document reviewing seven controlled studies on the effect of music intervention on arousal promotion in patients with disorders of consciousness. The analysis includes data from 296 patients across four randomized controlled trials and three non-randomized studies. The document was authored by Jiayi Gu and last updated in May 2026.

Tags: Text, Audio, Music Intervention, Meta-Analysis, Disorders of Consciousness, Benchmark, Neurorehabilitation, Clinical Trials (+1 more)

Meta-Analysis of Music Intervention Effects on Arousal in Disorders of Consciousness

Jiayi Gu's 2026 meta-analysis document on figshare reviews controlled studies of music intervention for patients with disorders of consciousness. The analysis covers seven studies comprising 296 patients and finds a significant improvement in consciousness levels, although heterogeneity across studies was high. The document is a 2.6 MB DOCX file licensed under CC-BY-4.0.

Tags: Text, Audio, Music Intervention, Meta-Analysis, Disorders of Consciousness, Benchmark, Neurorehabilitation, Clinical Trials (+1 more)

Meddies ASR Synthetic Dialog Speech: Doctor-Patient Conversations

Synthetic doctor-patient dialog speech for automatic speech recognition training, generated with Fish Audio TTS. The dataset targets 200 hours of speech per language, using 75 English and 165 Chinese curated conversational voices. It was created by Meddies and last updated in July 2026.

Tags: Text, Audio, Speech Synthesis, Healthcare, Medical Dialog, ASR Training, Dialog Systems, Medical AI, Multilingual Audio, Audio Generation, Synthetic (+1 more)
