DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Speech & Audio Datasets | DataSalon


🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,575 datasets


Speech Recognition Performance for Cochlear Implant Audio Processors in Mandarin Speakers

A clinical study of 51 native Mandarin-speaking cochlear implant users, testing speech perception across five audio processor configurations. The dataset includes monosyllabic word, disyllabic word, and sentence recognition scores in quiet and noise conditions. The research was authored by Kailong Yin and published on figshare in April 2026.

Tabular, Audio, Cochlear Implants, Benchmark, Healthcare, Audiology, Clinical Study, Speech Recognition, +1


Rickettsial Research Contributions and Public Health Impact in Asia

Stuart D. Blacksell's dataset on figshare summarizes key scientific contributions and public health impact from long-term rickettsial research in Asia. The data is stored in an XLS file of 9.5 KB and was last updated on 2026-05-26. The dataset is licensed under CC-BY-4.0.

Tabular, 🌏 Asia, Excel, Healthcare, Rickettsial Research, Scientific Contributions, Public Health, +1


Krio Data TTS V2: Text-to-Speech Audio for the Krio Language

Krio Data TTS V2 is a text-to-speech dataset hosted on Hugging Face. The dataset was created by MosesJoshuaCoker and was last updated on July 20, 2026. Its size, format, and content are not described in the available metadata.

Audio, Text To Speech, Speech Synthesis, Krio Language, Audio Generation, +1


Burmese Synthetic Speech Corpus for TTS and Speech Recognition

DatarrX created a Burmese Synthetic Speech Corpus designed to advance Text-to-Speech systems and speech recognition for the Burmese language. The dataset is described as high-fidelity and manually curated to provide natural, native-sounding audio. It was last updated on 2026-05-31.

Audio, Text To Speech, Burmese Language, Speech Synthesis, Natural Language Processing, Audio Corpus, Synthetic, +1


ConsolidadoSentenciasRutaEtnicaURT

A dataset from the Colombian government's Land Restitution Unit (URT) shows the number of judicial sentences issued per municipality under the Ethnic Route. It includes data on resolved requests and covers municipalities designated as PDET (Territorial Development Programs). The data was last updated on May 18, 2026, and is provided by www.datos.gov.co.

Tabular, CSV, XML, JSON, Land Restitution, Colombia, Ethnic Communities, Judicial Decisions, Municipal Data, +1


Maleo Short 1.5H: A Speaker Diarization Benchmark for Complex Media

Maleo Short 1.5H is a manually curated and rigorously annotated dataset designed to benchmark state-of-the-art speaker diarization models. It focuses on complex, 'in-the-wild' media domains where models typically struggle, such as content with overlapping speech and sound effects. The dataset was created by maleo-ai and was last updated on Hugging Face in May 2026.

Audio, Benchmark, Speech Analysis, Benchmark Dataset, Audio Processing, Speaker Diarization, +1


Roadian–Wordian Permian Zircon U-Pb and Palynology Ages from the Canning Basin

U–Pb zircon dating results from middle Permian tuffs in the Canning Basin of Western Australia, revealing an apparent conflict with established spore-pollen zonation. The dataset includes ages such as 267.04 ± 0.14 Ma from the Pittston SD-1 drillhole and comparative data from other core holes. It was published by Mory et al. in 2017 and is hosted by Geoscience Australia.

Tabular, Canning Basin, Geochronology, Palynology, Stratigraphy, Large Scale, Permian, +1


Experimental Results for Music Genre Classification on GTZAN, FMA-Small, and FMA-Medium

CT-GateNet, a hybrid neural network architecture, achieved classification accuracies of 98.72%, 89.42%, and 69.07% on the GTZAN, FMA-Small, and FMA-Medium music genre datasets, respectively. The 5.5 KB Excel file contains the experimental results from this research, authored by Yunyan Ma and last updated in April 2026. The data is shared under a CC-BY-4.0 license on figshare.

Tabular, Audio, Excel, Machine Learning, Audio Data, Music Genre Classification, Large Scale, Experimental Results, +1


Shrutilipi-ML: Malayalam Language Speech Recognition Data

A Malayalam-language subset of the Shrutilipi ASR corpus, originally curated by AI4Bharat. The dataset is a lightweight, language-specific version for researchers and developers focusing on Malayalam speech technology. It was uploaded by the author 'trysem' to Hugging Face.

Audio, Multilingual, Malayalam, Multilingual Speech, Large Scale, Natural Language Processing, Speech Recognition, +1


SpecDox: 172-Hour Urdu-to-English Speech Translation Dataset

SpecDox contains 172 hours of real-world audio featuring Pakistani-accented, code-mixed Urdu-English speech. It pairs the audio with structured English transcriptions and is designed for Urdu-to-English speech translation and ASR. The dataset was created by Shzaib and last updated on Hugging Face in June 2026.

Audio, Pakistani Accent, Speech Translation, Urdu English, Code Mixed Speech, Automatic Speech Recognition, +1


Saint Kitts and Nevis Road Surface and Passability with 2020-2024 AI Analysis

HeiGIT generated this geospatial dataset for Saint Kitts and Nevis by applying deep learning to PlanetScope satellite imagery from 2020 and 2024. It maps surface types, width classes, and passability for approximately 100 km of arterial roads, including motorway, trunk, primary, and secondary classifications. The data supplements OpenStreetMap (OSM) attributes with AI-derived predictions to fill gaps in surface and width tagging.

Roads, Services, Openstreetmap, Rural, Indicators, Transportation, Humanitarian Access, Development, Sustainable Development Goals Sdg, Logistics, Urban, Socioeconomics, Sustainable Development, Poverty, +1


Gene Expression of Coral Parasite Under Nutrient Enrichment in Acropora cervicornis

A 59.3 KB CSV file from figshare, last updated April 2026. It contains gene expression data for the bacterial parasite Candidatus Aquirickettsia rohweri within the critically endangered coral Acropora cervicornis under ambient and nutrient-enriched conditions. The dataset was authored by Lauren Speare to investigate how nutrient enrichment influences parasite physiology and disease susceptibility.

Tabular, CSV, Gene Expression, Coral Disease, Healthcare, Microbial Parasite, Nutrient Enrichment, Marine Biology, +1


Psychology Thesis Interview Transcripts on Music Performance

Katie Schofield's MSc Psychology thesis data comprises interview transcripts from a research project on music performance. The dataset is 1.1 MB in size and was last updated on 2026-05-28. It is shared under a CC-BY-4.0 license on figshare.

Text, Audio, Psychology, Interview Transcripts, Music Performance, Qualitative Research, +1


Patriae Cuban Literature Dataset: 40,000 Records for LLM and TTS Training

Patriae Cuban Literature Dataset contains 40,000 records of Cuban literature in CSV and Parquet formats. It is designed for training and evaluating Large Language Models and Text-to-Speech systems oriented towards the Cuban dialect and culture. The dataset was curated by Carlos Luis Barnés Infante and Yisel Clavel Quintero.

Text, Cuban Literature, Spanish-language, Cultural Dialect, Llm Training, Tts Training, +1


Kidney Stone Risk Factors: Genetic Polymorphisms, Clinical and Demographic Data

Path analysis data on genetic polymorphisms (CaSR, CLDN14, VDR, ALPL) associated with kidney stone occurrence. The 59.0 KB XLSX file, authored by Widi Atmoko and last updated in May 2026, likely contains clinical variables and demographic characteristics for analysis. It is shared under a CC0-1.0 public domain license.

Tabular, Excel, Kidney Stones, Healthcare, Path Analysis, Clinical Factors, Genetic Polymorphism, +1


FitSkills: Physical Activity and Participation Data for Young People with Disability

Data from a stepped wedge cluster randomised trial evaluating the FitSkills community-based physical activity intervention. The dataset supports the findings of a 2025 publication in the British Journal of Sports Medicine. It was authored by Nora Shields and colleagues and is available under a CC-BY-4.0 license.

Tabular, Excel, Disability Intervention, Clinical Trial, Physical Activity, Community Health, +1


Pl Asr Benchmark: Polish Medical Speech Recognition Evaluation Corpus

Polish ASR Benchmark combines real-world Creative Commons video recordings with synthetic speech from healthcare articles. The dataset is intended for evaluating transcription quality on domain-specific vocabulary like medical terminology and pharmaceuticals. Author akrasi uploaded it to Hugging Face, with a last recorded update in June 2026.

Text, Audio, Benchmark Data, Medical Speech, Benchmark, Healthcare, Polish Language, Speech Recognition, Healthcare Ai, Synthetic, +1


Spiro.pacbio.sorted.cram: PacBio Reads Aligned to Spironucleus Salmonicida Assembly

PacBio sequencing reads mapped to a Spironucleus salmonicida genome assembly consisting of 42 scaffolds. The alignment was performed using blasr v5.3.5 and the resulting BAM file was converted to the CRAM format to reduce file size. The dataset was contributed by Feifei Xu and is listed on the Papers with Code platform.

Tabular, Sequence alignment, Pacbio Reads, Cram Format, Genomics, Spironucleus Salmonicida, +1


An Be Kalan Bench: Bambara Educational Speech for Child ASR Models

RobotsMali's dataset is a collection of read Bambara text from educational children's books. It is designed to support the training and benchmarking of Automatic Speech Recognition models, with a focus on child speech and regional acoustics. The dataset is structured into two separate subsets for specialized training and evaluation.

Text, Audio, Child Speech, West Africa, Benchmark, Bambara Language, Speech Recognition, +1


Sympatheia-18k: 18,000 Emotion-Aware Spoken Dialogue Pairs for Speech Synthesis

18,000 query–response dialogue pairs across 12 emotion categories, intended for empathetic speech synthesis research. The dataset includes synthesized audio and text transcripts, with subsets for emotional and neutral queries. It was created by susameddin and last updated on Hugging Face in May 2026.

Audio, Multimodal, Speech Synthesis, Dialogue Systems, Emotion Recognition, Audio Text Pairs, +1
