DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,573 datasets

FittsBART: Balloon Analogue Risk Task Data with Motor Task Measures

Data from two follow-up studies to Chandler and Pronin (2012) that investigate how movement affects thought speed and subsequent risk-taking. The dataset includes raw data from a Fitts' tapping task (FittsBart), a lower-limb tapping task (BodyBart), and the Balloon Analogue Risk Task (BART), which measures risk-taking through balloon-pumping behavior, along with PANAS (Positive and Negative Affect Schedule) measures. The 5.6 MB dataset was authored by Clare MacMahon and last updated on 2026-05-25.

Tags: Tabular, Excel, Experimental, Motor Control, Psychology, Risk Taking, Psychology Experiment, Behavioral Data, Fitts Law (+1 more)

TTS Dataset Guj Eng: 60 Minutes of Indian English and Gujarati Speech

rtxtd's TTS Dataset Guj Eng is a high-quality text-to-speech training dataset. It contains 120 audio samples totaling 60 minutes of speech, split evenly between Indian English and Gujarati. The dataset was last updated on 2026-06-17.

Tags: Text, Audio, Text To Speech, Indian Languages, Audio Dataset, Speech Synthesis, Gujarati (+1 more)

Khmer Speech Dataset: 134.6 Hours of Culturally Thematic Audio

A Khmer-language speech corpus of 134.6 hours of manually curated speech-text pairs on Cambodian cultural themes. The dataset was created by DDD-Cambodia with eight native speakers and was last updated in May 2026. Recordings average 8.54 seconds in length and include speaker metadata such as gender, age group, and city of origin.

Tags: Audio, Audio Data, Khmer Language, Cultural Speech, Speech Recognition, Cambodia, Synthetic (+1 more)
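The stated total duration and average recording length imply an approximate number of recordings. This estimate is derived here, not given by the dataset; it assumes the 8.54-second average holds across the whole corpus, and rounding in both inputs makes the result approximate:

```latex
N \approx \frac{134.6\,\text{h} \times 3600\,\text{s/h}}{8.54\,\text{s}}
  = \frac{484{,}560\,\text{s}}{8.54\,\text{s}}
  \approx 56{,}740 \ \text{recordings}
```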

Yukon Coal Inventory Map at 1:2,000,000 Scale

An area of 37,000 km² in Yukon Territory is underlain by potentially coal-bearing rocks ranging from Mississippian to Tertiary in age. This inventory, produced by the Government of Yukon, documents coal occurrences in seven distinct geological areas and includes a 1:2,000,000-scale map. The extent of the deposits is largely unknown because detailed examination has been limited.

Tags: Geospatial, Mineral Inventory, Geology, Coal Resources, Yukon, Yukon Territory (+1 more)

Northern Toobally Lake Reconnaissance Geology and Geochemistry

A geological reconnaissance report on the Toobally fault and surrounding rock formations in the northern Toobally Lake area, Yukon. The report identifies a newly proposed Toobally Formation diamictite, estimated at 1,800 m thick, and an 850 m thick basalt succession. It was published by the Government of Yukon and last updated in April 2026.

Tags: Tabular, Geospatial, Geology, Geological mapping, Yukon Canada, Structural geology, Geochemistry (+1 more)

Dataset Strix: Audio Clips for Conflict Zone Sound Classification

Dataset Strix is a unified audio dataset for the STRIX project of the Proteus Group student association. It contains 2-second mono clips sampled at 16 kHz for classifying sounds in conflict zones. The dataset was created by Tairooonz and last updated on June 12, 2026.

Tags: Audio, Sound Events, Audio Classification, Military Sounds, Conflict Zones (+1 more)
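The description fixes the clip format (2-second mono clips at 16 kHz), so each clip holds 16,000 × 2 = 32,000 samples. The sketch below shows one way to cut a longer recording into clips of that shape. It assumes the audio is already decoded into Python float samples; `downmix` and `split_into_clips` are hypothetical helper names, not part of the dataset's own tooling.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000                       # 16 kHz, per the dataset description
CLIP_SECONDS = 2                           # 2-second clips, per the description
CLIP_SAMPLES = SAMPLE_RATE * CLIP_SECONDS  # 32,000 samples per clip


def downmix(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average each multi-channel frame, e.g. (left, right), into one mono sample."""
    return [sum(frame) / len(frame) for frame in frames]


def split_into_clips(samples: Sequence[float]) -> List[List[float]]:
    """Cut a mono 16 kHz signal into non-overlapping 2-second clips.

    Assumption: a trailing remainder shorter than CLIP_SAMPLES is dropped;
    zero-padding it instead would be an equally reasonable choice.
    """
    num_clips = len(samples) // CLIP_SAMPLES
    return [
        list(samples[i * CLIP_SAMPLES:(i + 1) * CLIP_SAMPLES])
        for i in range(num_clips)
    ]
```

For example, a 5-second recording (80,000 samples) yields two full clips, and the final second is discarded under the drop-remainder assumption.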

SoE2015: Main Material Types of Litter in Queensland

A 2015 dataset from the Queensland Government's State of the Environment reporting on the composition of litter. It shows that cigarette butts are the most common litter item, even though they account for only a small share of litter volume. The dataset was published by the Department of Environment, Tourism, Science and Innovation and is available under a CC-BY-4.0 license.

Tags: Tabular, CSV, Environmental science, Cigarette Butts, Litter, Waste Management (+1 more)

NV-Bench: Benchmark for Nonverbal Vocalizations in Text-to-Speech

NV-Bench is the first benchmark for evaluating nonverbal vocalizations in text-to-speech systems, grounded in a functional taxonomy. It was created by CharlesNi and last updated on June 12, 2026. The benchmark aims to provide standardized metrics and reliable ground truth references for this expressive component of speech synthesis.

Tags: Audio, Multimodal, Text To Speech, Speech Synthesis, Nonverbal Vocalizations, Evaluation, Benchmark (+1 more)

Music-Based fMRI Neurofeedback Data for Mood and Connectivity Modulation

Twenty-two healthy adults took part in a real-time fMRI neurofeedback experiment that used a novel musical interface. The dataset includes pre- and post-session questionnaire results assessing mood and subjective experience, alongside neuroimaging data from a 50-minute MRI session. The dataset was authored by Alexandre Sayal and shared on figshare under a CC-BY-4.0 license.

Tags: Tabular, Audio, ZIP, Music Intervention, Fmri, Brain Connectivity, Neurofeedback, Mood Modulation (+1 more)

Med-Dictate: Medical Speech Recognition Benchmark in Three Languages

Med-Dictate is an evaluation dataset released by Corti ApS alongside the Symphony for Speech Recognition white-paper. It contains medical notes dictated by Corti team members and one contractor, with their written consent, in English, French, and German. The dataset is built for benchmarking automatic speech recognition and related NLP systems on medical-domain audio.

Tags: Text, Audio, Multilingual, Asr Benchmark, Medical Speech, Benchmark, Healthcare, Clinical Notes, Natural Language Processing (+1 more)

PittAdsDB: University of Pittsburgh Advertising Images with Annotations

Pitt Ads Dataset (PittAdsDB) contains advertising images from the University of Pittsburgh. Judging by the available JSON files, the images are annotated with actions, reasons, and sentiments. The dataset was uploaded by Mindykkyan and last updated on June 20, 2026.

Tags: Image, Multimodal, Multimodal Analysis, University Pittsburgh, Advertising, Image Annotations, Computer Vision (+1 more)

Med-Term: Synthetic Multilingual Medical Notes for ASR Benchmarking

Med-Term is a fully synthetic evaluation dataset for automatic speech recognition released by Corti ApS. It contains medical notes in German, French, and English rendered as speech with text-to-speech, and it includes no real patient data or protected health information (PHI). The dataset was released alongside the Symphony for Speech Recognition white paper.

Tags: Text, Audio, Multilingual, Evaluation, Benchmark, Healthcare, Medical Notes, Natural Language Processing, Synthetic Data, Speech Recognition, Synthetic (+1 more)

Data Sheet 1_Social engagement, pleasure, and memory in musical reminiscence workshops for

Nineteen participants with moderate to severe Alzheimer's disease from four nursing homes took part in a single-group intervention study. The data, published in 2026, include assessments of social engagement, episodic memory, observed emotion, and verbal interactions collected at baseline, during nine workshops, post-intervention, and at a one-month follow-up. The dataset is a 102.8 KB PDF file containing the study's data sheet, authored by Mikael Genguelou.

Tags: Tabular, Audio, Alzheimers Disease, Music Therapy, Behavioral assessment, Benchmark, Healthcare, Clinical Intervention, Social Engagement (+1 more)

Social Engagement and Memory in Musical Reminiscence Workshops for Alzheimer's Disease

Nineteen volunteer residents with moderate to severe Alzheimer's disease from four nursing homes took part in a single-group intervention study. The data likely contain assessments of social engagement, episodic memory, and observed emotions collected at baseline, at three points during the intervention, post-intervention, and at a one-month follow-up. The dataset was authored by Mikael Genguelou and last updated on April 22, 2026.

Tags: Tabular, Audio, Alzheimers Disease, Music Therapy, Behavioral assessment, Benchmark, Healthcare, Social Engagement, Clinical Study (+1 more)

Magicdata Henan Dialect TTS Lite: Scripted Speech Audio

MagicHub provides a dataset of scripted speech recordings in the Henan dialect of Chinese. The audio was recorded in a quiet indoor environment at 48 kHz with 16-bit samples. The dataset is released under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Tags: Audio, Text To Speech, Audio Dataset, Speech Synthesis, Chinese Dialect (+1 more)

Mostafa Mahmoud Arabic Speech Corpus with 187 Hours of Lectures and Interviews

Approximately 187 hours of Arabic speech recordings and transcripts derived from publicly available lectures, interviews, television appearances, and talks by Dr. Mostafa Mahmoud. The dataset was created by oddadmix to support Arabic speech technology research and development. It was last updated on the platform in June 2026.

Tags: Audio, Arabic Speech, Lecture Transcripts, Large Scale, Natural Language Processing, Audio Corpus, Speech Recognition (+1 more)

Magicdata Dialect Cantonese TTS Lite: Scripted Speech Recordings

MagicHub's Magicdata-Dialect-Northeastern Chinese-TTS-Lite dataset provides scripted speech recordings in a Chinese dialect. The audio was recorded in a quiet indoor environment and is stored as 48 kHz, 16-bit WAV files. The dataset is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Tags: Audio, Text To Speech, Audio Dataset, Speech Synthesis, Chinese Dialect (+1 more)
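Because the files are described as 48 kHz, 16-bit WAV, their format can be sanity-checked with Python's standard `wave` module. This is a minimal sketch: `check_wav` and `make_test_wav` are hypothetical helper names, and the channel count is deliberately not checked because the description does not state it.

```python
import io
import wave

EXPECTED_RATE_HZ = 48_000   # 48 kHz, per the dataset description
EXPECTED_SAMPLE_WIDTH = 2   # 16 bits = 2 bytes per sample


def check_wav(source) -> bool:
    """Return True if a WAV file (path or file-like object) is 48 kHz, 16-bit."""
    with wave.open(source, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
        )


def make_test_wav(rate: int, sample_width: int, seconds: float = 0.1) -> io.BytesIO:
    """Build an in-memory mono WAV filled with zero bytes, for trying out check_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * int(rate * seconds) * sample_width)
    buf.seek(0)  # rewind so the buffer can be read back
    return buf
```

At 48,000 samples/s × 16 bits, uncompressed audio in this format runs at 768 kbit/s per channel.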

Medv3: Synthetic Turkish Medical Speech Corpus for ASR Research

A synthetic speech corpus for Turkish medical automatic speech recognition research. Clinical sentences were synthesized using Google Cloud Text-to-Speech Chirp 3 HD voices. The dataset was created by turkmedstt and was last updated on June 10, 2026.

Tags: Audio, Turkish Language, Medical Speech, Healthcare, Speech Recognition, Synthetic, Synthetic Audio (+1 more)

Systematic Review and Meta-Analysis Protocol on Music in Plastic Surgery

Herney Andrés García‐Perdomo authored a protocol for the first systematic review and meta-analysis of the role of music in plastic surgery. The protocol describes the planned methodology for aggregating and analyzing existing research on the topic. It is available as green Open Access.

Tags: Text, Audio, Systematic Review, Meta Analysis, Music Therapy, Medical Research, Plastic Surgery (+1 more)

Gemma 4 Public Bench Eval: ASR Performance on English Test Sets

Decoded transcripts and word-level error metrics from the Gemma 4 Unified models evaluated as automatic speech recognition systems. The dataset covers two model sizes, gemma4:e4b and gemma4:12b, on three standard English test sets. The evaluation was run locally with Ollama and includes a script for full reproducibility.

Tags: Tabular, Audio, Transcripts, Benchmark Evaluation, Model Comparison, Benchmark, Speech Recognition (+1 more)
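The description mentions word-level error metrics without naming them; word error rate (WER) is the usual headline metric for ASR. The sketch below computes standard WER as a word-level edit distance divided by the reference length. It is an illustration only, not the evaluation script shipped with the dataset, and it applies no text normalization (case folding, punctuation stripping), which real benchmarks usually do first. The function name `word_error_rate` is hypothetical.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (starts with an empty reference prefix).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions to reach an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, "the cat sat on the mat" transcribed as "the cat sit on mat" has one substitution and one deletion, giving a WER of 2/6.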

Page 13 of 129