DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,573 datasets

SoE2020: Queensland Litter Composition by Material Type

Plastic items replaced cigarette butts as the most commonly littered items in Queensland during 2018–19. The data, provided by the Queensland Department of Environment, Tourism, Science and Innovation, also highlights the significant environmental load of cigarette butts despite their small volume. It was last updated on May 12, 2026.

Tabular, CSV, Environmental monitoring, Queensland, Cigarette Butts, Litter, Plastic Pollution

SO-Dataset: Large-Scale Spatial Audio in First-Order Ambisonics

SO-Dataset is a large-scale spatial audio dataset in first-order ambisonics (FOA) format. It combines simulated spatial scenes with real FOA recordings, and its sound event annotations are mapped into a unified 63-class taxonomy based on the FSD50K dataset. The dataset was created by dieKarotte and last updated in June 2026.

Audio, Audio Dataset, Sound Event Detection, Large Scale, Spatial Audio, First Order Ambisonics, Synthetic

Naija TTS Processed: Nigerian Text-to-Speech Audio Data

Naija TTS Processed is a text-to-speech dataset hosted on Hugging Face. It was created by Axiveri and last updated on July 16, 2026. The available metadata does not describe the dataset's specific content, size, or structure.

Audio, Text To Speech, Speech Synthesis, Nigerian Languages, Audio Processing

S8: High-Quality Single-Speaker Persian Narration for TTS

A Persian (Farsi) text-to-speech dataset of professional single-speaker narration recordings, designed for training TTS models. The dataset was created by amir0907 and last updated on Hugging Face in May 2026.

Text, Audio, Text To Speech, Persian, Single Speaker, Farsi, Audio Synthesis

Meedies ASR Human Labels: Per-Channel Audio Transcripts from Eight YouTube Channels

Per-channel human-transcript exports for audio from eight YouTube channels, including FAPTV and AnhThamTuTV. The dataset, created by Meddies, contains sequential WAV audio files, VTT transcripts, and manifests within a 'processed/' folder, along with audit and summary files in an 'analysis/' folder. It was last updated on July 1, 2026.

Text, Audio, Time Series, Audio Data, Multichannel Audio, Speech Recognition, Human Transcription

Dyadic Heart-Rate Synchrony During Music Therapy in Neurorehabilitation

A 2026 study by Sun Sun Yap on figshare investigates heart-rate synchrony between a music therapist and 11 in-patients during neurorehabilitation sessions. The dataset includes dyadic heart-rate data, session videos, and notes, focusing on moments of interest within therapy interventions averaging 25.62 minutes. It explores relationships between physiological synchrony, nonverbal synchrony, and patient therapy readiness.

Audio, Time Series, Multimodal, Heart Rate Synchrony, Music Therapy, Dyadic Physiology, Healthcare, Neurorehabilitation, Psychotherapy Research

University of Pittsburgh Virtual Tumor Board Experience During COVID-19

Harish Dharmarajan of the University of Pittsburgh Medical Center describes the transition to a virtual multidisciplinary tumor board for head and neck oncologic care. The description suggests that a virtual MDC can feasibly be designed and implemented in a large academic medical center with satellite hospitals. The dataset likely contains analysis or documentation of this transition.

Text, Geospatial, Telemedicine, Healthcare, Covid 19, Head Neck Cancer, Multidisciplinary Care, Medical Oncology

Nyan Jenny Format: Japanese Speech Audio and Text for TTS

RikkaBotan reformatted the 'nyan' dataset to match the 'jenny_tts_dataset' structure. The dataset contains Japanese audio clips paired with their transcriptions. It was last updated on June 7, 2026.

Text, Audio, Text To Speech, Audio Dataset, Speech Synthesis, Japanese Audio

Saint Kitts and Nevis Road Surface Classification from OSM and AI

Approximately 1,200 km of roads in Saint Kitts and Nevis are mapped in OpenStreetMap. This dataset, created by HeiGIT and last updated in March 2026, classifies road surfaces as paved or unpaved using a hybrid deep learning approach that augments OSM data with Mapillary imagery and urban layers.

Geospatial, Roads, Services, Openstreetmap, Surface Classification, Indicators, Transportation, Humanitarian Access, Development, Sustainable Development Goals Sdg, Logistics, Large Scale, Infrastructure, Urban, Socioeconomics, Sustainable Development, Poverty

Vietnamese and English Speech Audio with Text Transcriptions from a Single Speaker

This bilingual speech dataset contains 4,480 Vietnamese and 3,956 English audio samples, each paired with a text transcription. It was created by beyoru, is hosted on Hugging Face, and was last updated on June 8, 2026.

Text, Audio, English Language, Single Speaker, Audio Transcription, Vietnamese Language, Speech Recognition

SMAPVEX19-22: L-Band Vegetation Water Dynamics at Harvard Forest

SMAPVEX19-22 campaign data captures L-band radiometer measurements over a red oak forest at Harvard Forest, Massachusetts, from late April to mid-October 2019. The dataset includes concurrent in-situ measurements of canopy leaf water potential, dielectric constant, soil moisture, temperature, and tree xylem properties. Its primary goal is to study the sensitivity of L-band vegetation optical depth (VOD) to changes in vegetation water potential over a growing season.

Time Series, Geospatial, Forest ecology, L Band, Soil Moisture, Vegetation Optical Depth

FactShield: Experimental Artifacts for a Modular Claim Verification Pipeline

FactShield is a dataset containing experimental artifacts for a preliminary evaluation of a modular pipeline for automatic claim verification in audiovisual content. The dataset was authored by Fabiann Barbosa and last updated on May 23, 2026. It is a small dataset of 35.6 KB, available in ZIP and XLSX formats under a CC-BY-4.0 license.

Text, Tabular, ZIP, Excel, Deepfake Verification, Audio Transcript, Fact Checking, Benchmark, Claim Verification, Misinformation Detection

Speech Recognition Performance for Cochlear Implant Audio Processors in Mandarin Speakers

Data from a clinical study of 51 native Mandarin-speaking cochlear implant users, whose speech perception was tested across five audio processor configurations. The dataset includes monosyllabic word, disyllabic word, and sentence recognition scores in quiet and in noise. The research was authored by Kailong Yin and published on figshare in April 2026.

Tabular, Audio, Cochlear Implants, Benchmark, Healthcare, Audiology, Clinical Study, Speech Recognition

Rickettsial Research Contributions and Public Health Impact in Asia

Stuart D. Blacksell's dataset on figshare summarizes key scientific contributions and public health impact from long-term rickettsial research in Asia. The data is stored in a 9.5 KB XLS file and was last updated on May 26, 2026. The dataset is licensed under CC-BY-4.0.

Tabular, 🌏 Asia, Excel, Healthcare, Rickettsial Research, Scientific Contributions, Public Health

Burmese Synthetic Speech Corpus for TTS and Speech Recognition

DatarrX created a Burmese Synthetic Speech Corpus designed to advance text-to-speech systems and speech recognition for the Burmese language. The dataset is described as high-fidelity and manually curated to provide natural, native-sounding audio. It was last updated on May 31, 2026.

Audio, Text To Speech, Burmese Language, Speech Synthesis, Natural Language Processing, Audio Corpus, Synthetic

ConsolidadoSentenciasRutaEtnicaURT

This dataset from the Colombian government's Land Restitution Unit (URT) reports the number of judicial rulings issued per municipality under the Ethnic Route. It includes data on resolved requests and covers municipalities designated as PDET (Development Programs with a Territorial Focus). The data was last updated on May 18, 2026, and is provided by www.datos.gov.co.

Tabular, CSV, XML, JSON, Land Restitution, Colombia, Ethnic Communities, Judicial Decisions, Municipal Data

Maleo Short 1.5H: A Speaker Diarization Benchmark for Complex Media

Maleo Short 1.5H is a manually curated and rigorously annotated dataset designed to benchmark state-of-the-art speaker diarization models. It focuses on complex, "in-the-wild" media domains where models typically struggle, such as content with overlapping speech and sound effects. The dataset was created by maleo-ai and last updated on Hugging Face in May 2026.

Audio, Benchmark, Speech Analysis, Benchmark Dataset, Audio Processing, Speaker Diarization

Roadian–Wordian Permian Zircon U-Pb and Palynology Ages from the Canning Basin

This dataset reports U-Pb zircon dating results from middle Permian tuffs in the Canning Basin of Western Australia, which reveal an apparent conflict with the established spore-pollen zonation. It includes ages such as 267.04 ± 0.14 Ma from the Pittston SD-1 drillhole, along with comparative data from other core holes. It was published by Mory et al. in 2017 and is hosted by Geoscience Australia.

Tabular, Canning Basin, Geochronology, Palynology, Stratigraphy, Large Scale, Permian

Experimental Results for Music Genre Classification on GTZAN, FMA-Small, and FMA-Medium

CT-GateNet, a hybrid neural network architecture, achieved classification accuracies of 98.72%, 89.42%, and 69.07% on the GTZAN, FMA-Small, and FMA-Medium music genre datasets, respectively. The 5.5 KB Excel file contains the experimental data from this research, authored by Yunyan Ma and last updated in April 2026. The data is shared under a CC-BY-4.0 license on figshare.

Tabular, Audio, Excel, Machine Learning, Audio Data, Music Genre Classification, Large Scale, Experimental Results

Shrutilipi-ML: Malayalam Language Speech Recognition Data

A Malayalam-language subset of the Shrutilipi ASR corpus, originally curated by AI4Bharat. The dataset is a lightweight, language-specific version for researchers and developers focusing on Malayalam speech technology. It was uploaded by the author 'trysem' to Hugging Face.

Audio, Multilingual, Malayalam, Multilingual Speech, Large Scale, Natural Language Processing, Speech Recognition
