DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,572 datasets

White-Headed Langur Vocal Responses to Playback Experiments

Acoustic and video data from playback experiments investigating male-male competition in white-headed langurs. The dataset includes sound files, video clips, and analysis results focusing on the temporal structure of loud calls. It was authored by Yinshu Liu and last updated on 2026-04-27.

Audio, Time Series, Video, Excel, Playback Experiment, Bioacoustics, Vocal Communication, Primate Behavior, Male Competition, +1

PLT TTS Dataset: Plateau Malagasy Synthetic Speech for TTS Models

Mimba PLT TTS Dataset is a clean, multi-speaker synthetic speech corpus for Plateau Malagasy (PLT), the central dialect of the national language of Madagascar. Each item pairs a cleaned PLT sentence with machine-generated speech audio, intended for training and fine-tuning small, on-device text-to-speech models. The dataset was created by 'mimba' and was last updated on 2026-06-24.

Text, Audio, Text To Speech, Malagasy Language, Speech Corpus, Natural Language Processing, Synthetic Speech, Low Resource Language, Synthetic, +1

Music Education Hubs Awards Data From Arts Council England

Arts Council England publishes data on awards granted to Music Education Hubs. The dataset likely contains details on funding allocations and recipients. Its last update was recorded as 2026-07-08.

Tabular, Audio, Cultural Policy, Arts Funding, Music Education, UK Government Data, +1

TTSDistil-Phonology: Speech Corpus for Neural Text-to-Speech Research

A speech dataset developed for Text-to-Speech research and model training. The dataset is part of an ongoing research project by ShiniChien focused on building high-quality speech corpora for modern neural TTS systems. It was last updated on June 30, 2026, and is actively maintained with continuous improvements in data quality and transcription consistency.

Audio, Text To Speech, Speech Corpus, Neural TTS, Phonology, +1

St. Kitts and Nevis Trade Indicators from the World Bank

St. Kitts and Nevis trade data compiled by the World Bank Group, part of the Transparency in Trade Initiative. The dataset likely contains country-specific trade policy indicators to support market access and a rules-based trading system. It was last updated on 2026-04-28 and is provided in CSV format under a CC-BY-4.0 license.

Tabular, CSV, Trade, Economic Indicators, World Bank, Finance, St Kitts And Nevis, +1

St. Kitts and Nevis External Debt Statistics from the World Bank

Debt statistics from the World Bank provide a detailed picture of debt stocks and flows for St. Kitts and Nevis. The data likely contains quarterly external debt figures and public sector debt details, including valuation methods and instrument types. The dataset was last updated on 2026-04-28 and is licensed under CC-BY-4.0.

Tabular, Time Series, CSV, Finance, Quarterly Data, Public Sector, Financial Statistics, External Debt, +1

St. Kitts and Nevis Science and Technology Indicators from World Bank

World Bank Group data aggregates science and technology indicators for St. Kitts and Nevis. The dataset likely contains metrics on research and development, scientific publications, high-technology exports, royalty fees, and intellectual property. It was last updated on 2026-04-28 and combines data from sources including UNESCO, the U.S. National Science Board, and the World Intellectual Property Organization.

Tabular, CSV, Economic Development, Innovation Indicators, Finance, Science And Technology, St Kitts And Nevis, +1

St. Kitts and Nevis Public Sector Performance and Finance Indicators

St. Kitts and Nevis public sector data from the World Bank Group, last updated on 2026-04-28. It includes World Bank staff assessments of country performance in economic management, structural policies, social inclusion, and public sector institutions. The dataset also incorporates government finance statistics from the International Monetary Fund and tax policy indicators from various sources.

Tabular, CSV, Government Finance, World Bank, Healthcare, Finance, Public Sector, Economic Management, Country Performance, +1

St. Kitts and Nevis Private Sector and Trade Indicators from World Bank Sources

World Bank Group data on private sector activity and trade for St. Kitts and Nevis. The dataset aggregates indicators from sources including the Private Participation in Infrastructure Project Database, Enterprise Surveys, Doing Business Indicators, IMF Balance of Payments, and UNCTAD. It was last updated on 2026-04-28.

Tabular, CSV, Private Sector, Trade, World Bank, Finance, Infrastructure, Economic Growth, +1

Amazon Rainforest Tower Temperature and CO2 Flux Profiles

Hourly temperature profiles from eight heights on a tower in the Tapajos National Forest, Brazil, spanning January 2002 to January 2006. Co-located measurements include CO2 and H2O concentration profiles, canopy storage, and eddy flux data from two levels, alongside other meteorological parameters. The dataset is associated with a Ph.D. thesis from Harvard University and is hosted by multiple platforms including NASA EarthData.

Time Series, ZIP, Amazon Rainforest, Meteorological Tower, Eddy Covariance, Atmospheric Temperature, Carbon Flux, +1

BaltiVoice ASR: Speech Recognition Data for the Balti Language

BaltiVoice is the first publicly available Automatic Speech Recognition dataset for the critically low-resource Balti language (ISO 639-3: bft). The dataset was collected, validated, and processed by mohdali1 to build the first open-source Balti ASR system using OpenAI Whisper fine-tuning. It was last updated on the Hugging Face platform on June 11, 2026.

Audio, Audio Data, Balti Language, Low Resource Language, Speech Recognition, +1

Vocal Performance Anxiety Intervention Study with 60 Students and 3-Month Follow-Up

Qin Cong's dataset contains results from a mixed-methods intervention study on vocal performance anxiety management. The study involved 60 undergraduate vocal performance majors, with data collected at three time points over a 12-week intervention and a 3-month follow-up. The dataset includes quantitative measures of anxiety, performance quality, resilience, self-efficacy, and heart rate variability, plus qualitative reflective journals.

Text, Tabular, Audio, Vocal Pedagogy, Music Psychology, Mixed Methods, Performance Anxiety, Intervention Study, +1

Vocal Performance Anxiety Intervention Study with 60 Students and 3-Month Follow-Up

Data Sheet 2 contains results from a mixed-methods intervention study on vocal performance anxiety management. The study by Qin Cong recruited 60 undergraduate vocal performance majors and collected quantitative and qualitative data across three time points: pre-intervention, post-intervention, and 3 months post-intervention. The dataset was last updated on 2026-04-27.

Text, Tabular, Audio, Vocal Pedagogy, Music Psychology, Mixed Methods, Performance Anxiety, Intervention Study, +1

Neural Math Rock: 4,000 Full-Length Tracks for Multimodal Emotion Analysis

4,000 distinct full-length tracks form this large-scale multimodal emotion classification dataset for Music Information Retrieval. It focuses on complex musical genres like Math Rock and Midwest Emo and was created by author 'anggars'. The dataset page was last updated on 2026-06-19.

Audio, Multimodal, Music Information Retrieval, Emotion Classification, Large Scale, Natural Language Processing, Math Rock, Midwest Emo, +1

WordVoice-5A: Bilingual Speech for Word-Level Controllable TTS

WordVoice-5A is a large-scale bilingual dataset containing approximately 4.7k hours of Mandarin and English speech with fine-grained word-level acoustic annotations. It is designed for high-precision controllable Text-to-Speech research and was created by author XXH333. The dataset page was last updated on 2026-06-27.

Text, Audio, Text To Speech, Speech Synthesis, Bilingual Speech, Large Scale, Acoustic Annotations, Word Alignment, +1

Mohamed Khairy Arabic Speech Corpus with 430 Hours of Audio and Non-Verbal Transcriptions

A large-scale Arabic speech corpus containing approximately 430 hours of speech recordings and corresponding transcripts. The dataset is a first-of-its-kind resource that includes rich non-verbal transcriptions alongside the spoken text. It was created by author oddadmix and was last updated on the platform in June 2026.

Text, Audio, Arabic Speech, Non Verbal Transcription, Language Technology, Speech Corpus, Large Scale, Natural Language Processing, Audio Transcription, +1

Meddies ASR Benchmark: Vietnamese and English Medical Speech Recognition

A benchmark for automatic speech recognition evaluation, created by Meddies and last updated in July 2026. It includes four subsets of Vietnamese and English audio, with medical and general-domain slices. The data is packaged as a Parquet file with the audio embedded in a datasets.Audio column.

Text, Audio, Multimodal, English, Medical Speech, Vietnamese, Benchmark, Healthcare, ASR Evaluation, Speech Recognition, +1

SoE2017: Litter Composition by Main Material Types in Australia

Cigarette butts are the most common type of litter despite contributing little to total volume. Plastic waste items are high in both number and volume, while glass is the least prevalent litter type. This dataset from the Queensland Department of Environment, Tourism, Science and Innovation was last updated in May 2026.

Tabular, 🇦🇺 Australia, CSV, Litter, Waste Management, Environmental Data, +1

Arabic Call Center Audio with Dual-Channel Recordings, 50,068 Hours

50,068 hours of processed Arabic dual-channel call center audio recordings form the core of this dataset, which also includes 2,065,026 hours of audio across 32 languages. It consists of real-world customer and agent speech collected from call center environments. The dataset was created by InfoBayAI and was last updated on June 5, 2026.

Audio, Dual Channel Audio, Conversational AI, Call Center, Arabic Language, Large Scale, Speech Recognition, +1

YODAS2 Sidon: 141,927 Quality-Verified Thai TTS Audio Samples

Chalermdej's Yodas2 Sidon Th Tts is a filtered, quality-verified Thai text-to-speech dataset derived from sarulab-speech/yodas2_sidon. It contains 141,927 audio samples totaling 156.0 hours from 4,199 speakers, with transcriptions verified by multiple ASR models and Gemini. The dataset was last updated on 2026-06-06.

Text, Audio, Text To Speech, Audio Dataset, Speech Synthesis, Thai Language, +1
