DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,575 datasets

Data Sheet 1_Evaluating Candidatus Aquirickettsia rohweri gene expression upon nutrient en

Gene expression data for the bacterial parasite Candidatus Aquirickettsia rohweri within the critically endangered coral Acropora cervicornis. The dataset compares parasite physiology under ambient versus nutrient-enriched conditions, as described in a research document authored by Lauren Speare and last updated in April 2026. The data is stored in a 576.8 KB DOCX file.

Text, Gene Expression, Coral Reef, Microbiology, Healthcare, Nutrient Enrichment, Marine Biology, +1

Heart Rate Response Profiles for 16 Auditory Test Signals

Profiles contain physiological orienting response data for 16 test sounds, measured by heart rate change in 22 participants. The dataset was created by Mako Katagiri and published on figshare in April 2026. It is a small dataset of 5.5 KB, stored in an XLS file.

Tabular, Audio, Excel, Psychoacoustics, Auditory perception, Sound Testing, +1

Physiological Orienting Responses to Musical and Complex Sounds

Twenty-two healthy young male participants had their heart rate changes measured in response to 16 test sounds during a simulated daily-life experiment. The dataset contains results from a study by Mako Katagiri, published on figshare in April 2026, analyzing the reproducibility of orienting responses for alarm and pre-signal sound selection. Physiological data includes calculated RR interval differences and normalized orienting response strength for sounds spanning frequencies from 130.8 Hz to 1661.4 Hz.

Tabular, Audio, Excel, Auditory perception, Sound Design, Physiological Response, Heart Rate Variability, +1

Heart Rate Orienting Responses to Musical and Complex Tones

Twenty-two healthy young male participants underwent auditory experiments during multi-day stays to measure physiological orienting responses to 16 test sounds. The dataset contains calculated heart rate interval differences and normalized orienting response strengths for musical and complex tones across four octaves. Mako Katagiri published this data on figshare in 2026 under a CC-BY-4.0 license.

Tabular, Audio, Excel, Psychoacoustics, Auditory perception, Physiological Response, Heart Rate Variability, +1

Physiological Orienting Response to Musical and Complex Sounds

Twenty-two healthy young male participants underwent auditory experiments during three-day stays, with physiological responses measured via heart rate changes. The dataset contains orienting response metrics for 16 distinct test sounds, including eight musical and eight complex tones, across a frequency range of 130.8 Hz to 1661.4 Hz. Researcher Mako Katagiri published this data on figshare in April 2026.

Tabular, Audio, Excel, Auditory perception, Sound Design, Physiological Response, Heart Rate Variability, +1

Auditory Orienting Response Data for Signal Analysis

The dataset records the physiological responses of twenty-two healthy young male participants to 16 test sounds, measured via heart rate changes in a simulated daily environment. Mako Katagiri created this dataset to analyze carryover effects in auditory signal perception. The dataset was last updated in April 2026.

Tabular, Audio, Excel, Sound Perception, Auditory Physiology, Heart Rate Response, Orienting Response, +1

RepeatAudio: Synthetic and Real-World Audio Samples for Repetition Counting

Hamozwa created RepeatAudio for research in class-agnostic audio repetition counting. The dataset contains synthetic samples with varying noise and real-world samples from mechanical, ecological, and medical domains. It was last updated on May 29, 2026.

Audio, Machine Learning, Real World Audio, Healthcare, Repetition Counting, Audio Processing, Synthetic, Synthetic Audio, +1

Eka Medical ASR Evaluation: Medical Speech Transcription for Indian Context

An evaluation dataset for automatic speech recognition systems designed to transcribe medical speech. It captures challenges specific to processing medical terminology, particularly branded drugs, within the Indian context. The dataset was created by priyamallojjala and was last updated on 2026-06-02.

Text, Audio, Indian Context, Medical Speech, Benchmark, Healthcare, Medical Terminology, Asr Evaluation, +1

Ne Asr Njo: Ao Language Speech Recordings from Nagaland, India

Ao language audio recordings paired with text transcriptions for training automatic speech recognition systems. The dataset contains 259 training samples, 20 validation samples, and 10 test samples, derived from the ARTPARK-IISc Vaani project. It was uploaded by sulabhkatiyar and last updated on 2026-05-17.

Audio, Multimodal, 🇮🇳 India, Audio Transcription, Multilingual Asr, Low Resource Language, Speech Recognition, +1

GLaDOS Audio V2: Portal Series Voice Lines for TTS Research

A collection of GLaDOS voice lines scraped from the Portal Wiki. The dataset covers lines from Portal (2007), Portal 2 main campaign, Portal 2 cooperative mode, and other appearances. It was created by user ray0rf1re and last updated on 2026-05-31.

Audio, Machine Learning, Video Game Audio, Speech Synthesis, Voice Cloning, Portal Series, +1

Time-Dependent Changes in a Music-Based Occupational Therapy Group

A 5.5 KB Excel file records time-dependent changes for a group undergoing music-based occupational therapy. The dataset was authored by Ibrahim Erarslan and last updated on May 18, 2026. Its specific variables and sample size are not detailed in the available metadata.

Tabular, Audio, Time Series, Excel, Music Therapy, Healthcare Intervention, Occupational Therapy, +1

Prussian Original Survey Maps 1:25,000 Scale for Wittstock/Dosse

Prussian Original Survey Maps are hand-drawn, one-off topographic maps produced starting in 1822 for the entire territory of Prussia. The maps were created at a scale of 1:25,000 and were not published, serving as the basis for smaller-scale maps. The dataset includes a specific sheet for the Wittstock/Dosse area, produced by the Bundesamt für Kartographie und Geodäsie.

Geospatial, Cartography, Historical Maps, Topography, Prussia, +1

German Empire Maps at 1:100,000 Scale, Sheet 214 Wittstock

675 map sheets comprise the first large-scale, nationwide map series for the German Empire, completed in 1909. The series was designed in polyhedral projection with each sheet covering an area of approximately 35 km by 28 km. The Bundesamt für Kartographie und Geodäsie provides this historical map sheet, originally produced in monochrome.

Geospatial, Cartography, Historical Maps, German Empire, +1

HMM-Based Phoneme Speech Recognition for Industrial Robot Control

A speech recognition system designed for controlling industrial robots via phoneme-based commands. The system, developed by Adwait Naik of K J Somaiya Medical College, uses Linear Predictive Coding and comprises a microphone array, voice module, and a 3-DOF robotic arm. It was validated through experiments involving simple and complex sentences for tasks like cube manipulation and pick-and-place.

Audio, Phoneme Recognition, Robotics Control, Natural Language Processing, Speech Recognition, +1

Uzbek YouTube Speech Recognition Dataset with Gemini and Whisper Labels

Uzbek YouTube content, including IT vlogs, news, and Tashkent-dialect podcasts, forms the basis of this speech dataset. It contains at least 37,807 audio clips across two splits, totaling over 135.9 hours of audio, curated by Saidakmal and last updated in May 2026. Each audio clip is paired with two automatic speech recognition transcriptions generated by Gemini and Whisper models.

Audio, Multimodal, Uzbek Language, Speech Recognition, Synthetic, Multimodal Transcription, Youtube Content, +1

Coastal Ocean Measurements from Massachusetts and Maine

2007 measurements collected along the Massachusetts and Maine coastal regions. The dataset contains oceanographic data, including chemistry, optics, temperature, and salinity/density, produced by the National Aeronautics and Space Administration. It is available in BIN and ISO file formats.

Time Series, Geospatial, Ocean Temperature, Ocean Optics, Coastal Measurements, Salinity Density, Ocean Chemistry, +1

Hindi Tokens: Pre-Extracted Audio Codec Tokens for TTS Training

somu9's Hindi Tokens dataset contains 305,847 pre-extracted audio codec tokens for text-to-speech training. The data comprises 544.2 hours of Hindi audio, with an average sample duration of 6.4 seconds. It was last updated on June 2, 2026.

Audio, Hindi, Text To Speech, Speech Synthesis, Audio Tokens, Pre Extracted, +1

Permian Tuff Zircon Ages and Palynology Data from Western Australia

U–Pb dating of zircons from middle Permian tuffs in the Canning Basin reveals an apparent 1.7-million-year conflict with established spore-pollen zonation. The dataset, published in the Australian Journal of Earth Sciences in 2017, includes CA-IDTIMS ages and palynological zone information from core holes spanning 350–400 km. It highlights a potential local environmental influence on fossil assemblages and cautions against direct facies comparison.

Tabular, Canning Basin, Geochronology, Palynology, Stratigraphy, Large Scale, Permian Period, +1

Boston Methane and Ethane Atmospheric Measurements 2012-2020

Atmospheric measurements provide hourly averages of methane (CH4) from five sites and five-minute averages of methane and ethane (C2H6) from one urban site in Boston, Massachusetts. Data collection occurred from September 2012 to May 2020 using Picarro cavity ring down spectrometers and a laser spectrometer. The dataset was produced by ORNL_CLOUD, with background concentrations modeled using HYSPLIT trajectories and NAM meteorology.

Time Series, ZIP, ATMOSPHERIC CHEMISTRY, Ethane, Greenhouse Gas, Methane, Urban Air Quality, +1

Shemo Diarization: 50 Hours of Synthetic Multi-Speaker Persian Speech

Persian (Farsi) synthetic multi-speaker speech dataset for speaker diarization. It contains approximately 50 hours of audio across 5,000 tracks, built from utterances in the Shemo dataset and processed through a synthesis framework. The dataset was created by atiyehghm and last updated on Hugging Face in May 2026.

Audio, Multi Speaker, Persian Language, Synthetic Speech, Synthetic, Speech Diarization, +1
