DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,572 datasets

St. Kitts and Nevis Aid Effectiveness and Development Indicators

World Bank Group indicators for St. Kitts and Nevis, last updated on 2026-04-28. The dataset covers aid effectiveness, measuring the impact of aid on poverty, inequality, growth, and capacity building. It includes indicators on aid received and progress toward Millennium Development Goals in education, health, and human welfare.

Tabular, CSV, Development Indicators, Aid Effectiveness, Healthcare, Education, Health, Finance, Poverty, +1 more

Steam Games Dataset with 133,335 Published PC Games

133,335 published games from Steam, the largest PC gaming platform. The dataset was created using a public API and Steam Spy, and is maintained by Fronkon Games. It includes only published games, excluding DLCs, episodes, music, and videos.

Tabular, Audio, OPTIMIZED-PARQUET, Parquet, PC Gaming, Game Metadata, Steam, Video Games, +1 more
Task Categories: text-generation, summarization, text-ranking, text-retrieval, tabular-regression, feature-extraction, zero-shot-classification, tabular-classification
Modality: text, tabular, image
Library: polars, mlcroissant, datasets, pandas
Language: en
Size Categories: 100K<n<1M
License: CC BY 4.0

New England Colleges with Undergraduate Design Programs, 66 Institutions

Schlatter Team Data identifies 66 NECHE-accredited colleges and universities offering undergraduate design programs across the six New England states. The dataset was constructed through manual web searches and documents institutional type, degree type, program name, state, and program characteristics for each institution. It was authored by Tania Schlatter and is hosted on Harvard Dataverse.

Tabular, Undergraduate Programs, Design Programs, Higher Education, New England, +1 more

Infrasound Exposure Effects on Human Stress and Mood, 36 Participants

36 participants were exposed to infrasound in a controlled study to measure its non-auditory impact. The dataset likely contains results linking infrasound to elevated salivary cortisol and negative affective self-reporting. The study was authored by Kale R. Scatterty and last updated on 2026-04-27.

Tabular, Audio, Human Stress Response, Infrasound, Psychophysiology, +1 more

Magicdata Dialect Wu Chinese TTS Lite: Scripted Speech Audio

A collection of scripted speech audio recordings in the Wu Chinese dialect, released by MagicHub. The dataset consists of WAV format audio files recorded at 48 kHz and 16 bits in quiet indoor environments. The dataset page was last updated on 2026-06-15.

Audio, Audio Dataset, Speech Synthesis, Wu Chinese, Chinese Dialect, +1 more

Magicdata Dialect Northeastern Chinese TTS Lite: Scripted Speech Recordings

Northeastern Chinese dialect speech recordings for text-to-speech applications. The dataset contains scripted speech recorded in a quiet indoor environment with a microphone, formatted as 48 kHz, 16-bit WAV files. It was released by MagicHub under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

Audio, Audio Dataset, Speech Synthesis, Northeastern China, Chinese Dialect, +1 more

Audiotactile Encoding of Temporal Structure in the Human Brain

A dataset from a study of audiotactile encoding of temporal structure in the human brain by Giulio Degano at the University of Geneva. It likely contains time-series data on auditory and tactile stimuli and the corresponding brain activity. It is published on the paperswithcode platform under an Open Access (green) license.

Audio, Time Series, Multimodal, Auditory perception, Multisensory Integration, Temporal Processing, Neuroscience, +1 more

MagicData Dialect Sichuanese TTS Lite: Scripted Speech Audio

A collection of scripted speech audio for the Sichuanese dialect of Chinese. The dataset was created by MagicHub and last updated on June 15, 2026. Audio files are recorded in quiet indoor environments at 48 kHz and 16 bits.

Audio, Text To Speech, Audio Data, Speech Synthesis, Chinese Dialect, +1 more

Music or No Music? Research Data on Auditory Perception and Spatial Reasoning

Noah Huntington's research dataset explores the impact of musical and non-musical audio on spatial reasoning abilities, expanding upon the Mozart Effect. The 66.3 MB collection includes the dataset and six MP3 audio files used in the experimental study. It was last updated on May 4, 2026, and is shared under a CC-BY-4.0 license.

Tabular, Audio, Auditory perception, Spatial Reasoning, Audio Stimuli, Experimental Psychology, Mozart Effect, +1 more

MoDiCoL: Modular Diagnostic Continual Learning Dataset for ASR

MoDiCoL is a speech dataset designed to study the robustness of ASR models to different drift factors in a controlled, continual setting. It is constructed using a systematic factorial design that enables a rigorous evaluation of linguistic, speaker, and acoustic variation with clearly defined experimental runs. The dataset combines real-world and synthetic speech with a configuration-dependent augmentation pipeline, and was created by TPekarekRosin.

Audio, Benchmark, Continual Learning, Synthetic Speech, Diagnostic Benchmark, Automatic Speech Recognition, Synthetic, +1 more

Audio Emotion Detection Dataset with English and Hindi Speech Clips

Audio Emotion Detection Dataset contains speech clips in English and Hindi annotated with emotion labels and ASR transcripts. Audio is sourced from public YouTube videos and trimmed to approximately 60 seconds per clip, with noise reduction applied. The dataset was created by RapidOrc121 and last updated on 2026-06-07.

Audio, Multilingual, Emotion Detection, Speech Clips, Audio Emotion, +1 more

Permian Tuff Geochronology and Palynology Ages from the Canning Basin, Australia

Zircon U–Pb dates and spore-pollen zonation data from middle Permian tuffs in Western Australia's Canning Basin. The dataset, published by Mory et al. in 2017, reveals an apparent age conflict of approximately 1.7 million years between non-marine and marginal-marine facies. It includes CA-IDTIMS ages and palynological zone assignments from core holes spanning 350–400 km.

Tabular, Canning Basin, Geochronology, Palynology, Stratigraphy, Large Scale, Permian, +1 more

Rickettsial Disease Burden Studies Across Asia

Stuart D. Blacksell compiled a summary of key studies on rickettsial diseases in Asia. The dataset is a 17.5 KB Excel file summarizing the substantial, heterogeneous, and often under-recognised burden of these diseases, highlighting ecological, social, and health-system drivers of transmission and detection. It was last updated on May 26, 2026.

Tabular, 🌏 Asia, Excel, Epidemiology, Rickettsial Diseases, Healthcare, Burden Of Disease, Public Health, +1 more

Gujarati Podcast ASR Dataset: 2,471 Hours of Processed Audio

InfoBayAI's Gujarati Podcast ASR Dataset is a large-scale collection of 2,471 hours of processed Gujarati podcast audio recordings. The broader collection contains 57,568 hours of processed audio across 12 languages, capturing real-world interactions across diverse topics and formats. The dataset was last updated on June 2, 2026.

Audio, Multilingual Speech, Large Scale, Gujarati Language, Podcast Audio, Speech Recognition, +1 more

Deadly Active Sport and Recreation Program: Grant Funding for Physical Activity

From July 2022 to June 2026, the Deadly Active Sport and Recreation Program provides grant funding to coordinate physical activity opportunities for Aboriginal and Torres Strait Islander peoples. The data is published by the Queensland Government's Sport, Racing and Olympic and Paralympic Games department. It covers funding targeted at 17 identified discrete communities.

Tabular, Geospatial, Excel, Grant Funding, Sport Recreation, Physical Activity, Government Program, Aboriginal Torres Strait Islander, +1 more

Punjabi Podcast Audio Collection with Dual-Channel Format

A large-scale collection of 4,840 hours of processed Punjabi dual-channel podcast audio recordings. The broader collection contains 57,568 hours of processed podcast audio across 12 languages. It was created by InfoBayAI and last updated on June 5, 2026.

Audio, Multilingual, Punjabi Language, Conversational AI, Large Scale, Podcast Audio, Speech Recognition, +1 more

Manufacturing Data Panel: Monthly Oil Price Index, 2009-2016

Monthly data from January 2009 to December 2016 tracks the oil price index (ICP) as an operational variable. The index references Platts and Rim oil prices. The dataset was contributed by Ani Wahyu Rachmawati.

Tabular, Time Series, Economic Indicators, Oil Price Index, Manufacturing, +1 more

Zhang Xuefeng Conversational Speech Corpus for AI Style Transfer

A corpus of conversational speech transcripts from Zhang Xuefeng's live-streamed content on college admissions, processed for AI training. The dataset was created by user 'england-lobster' and last updated on June 13, 2026. It is derived from automated speech recognition of public broadcasts, followed by multi-stage cleaning and episode construction.

Text, Conversational AI, Voice Cloning, Style Transfer, Speech Corpus, Chinese Language, +1 more

DialectalSpeech-ICL: African American English Speech Recognition Dataset

A stratified sample of utterances from the DialectalSpeech-ICL dataset, which contains African American English speech across multiple regional varieties. The dataset provides audio clips, verbatim transcripts, and speaker/region metadata. It was created by CentificAIResearch and last updated on June 17, 2026.

Tabular, Audio, Audio Dataset, Dialectal Speech, African American English, Speech Recognition, +1 more

VoTexUg-TTS: Multilingual Speech Synthesis for Ugandan Languages

VoTexUg-TTS is a large-scale multilingual text-to-speech dataset designed to support high-quality speech synthesis for widely spoken Ugandan languages. The dataset emphasizes linguistic diversity, gender balance, and natural African accents. It was created by kukuzaai and was last updated on 2026-06-17.

Text, Audio, Multilingual, Text To Speech, Speech Synthesis, Large Scale, African Languages, +1 more
