DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,571 datasets

Replication Data for Semi-Automated vs Manual Citation Screening

A test dataset and analysis for replicating findings from the manuscript 'Man vs Machine: comparing semi-automated (ASReview) and manual citation screening for systematic review'. The data was authored by Andrew Ross and is hosted by DataverseNL. The record was last updated on July 20, 2026.

Tabular, Systematic Review, Machine Learning, ASReview, Research Methodology, Citation Screening, +1 more

Memo2496: 2,496 Instrumental Songs with Expert Valence-Arousal Annotations

Memo2496 is a music emotion recognition dataset containing 2,496 instrumental songs annotated by 30 music experts. The dataset provides valence-arousal labels and extracted acoustic features to support affective computing research. It was updated by Qilin Li in April 2026 and is hosted on figshare under a CC-BY-4.0 license.

Tabular, Audio, ZIP, CSV, Text, JSON, Affective Computing, Valence Arousal, Audio Features, Natural Language Processing, Music Emotion Recognition, Instrumental Music, +1 more

Discover Piano: Pre-Tokenized Solo Piano MIDI Dataset for Symbolic Music AI

A pre-tokenized solo piano MIDI dataset for symbolic music AI and music information retrieval (MIR). The dataset was created by asigalov61 and was last updated on June 23, 2026. It is hosted on the Hugging Face platform.

Audio, Music Information Retrieval, MIDI, Symbolic Music, Piano, +1 more

Whiskered Auklet Ornament Symmetry Data for 721 Birds, 1992-2009

Ian L. Jones provides data and R code for a study on symmetry in facial feather ornaments of the Whiskered Auklet. The dataset covers 721 wild-caught marked individuals, with 162 of known sex and 94 of known age (1-16 years old), studied between 1992 and 2009. It was used to analyze effects of age, sex, body size, and condition on ornament asymmetries, and correlations with ocean climate and population measures.

Tabular, Ecological Data, Bird Symmetry, Fluctuating Asymmetry, Auklet Biology, Ornamental Plumes, +1 more

Permian Tuff and Palynology Ages from the Canning Basin, Western Australia

U–Pb dating of zircons from middle Permian tuffs in the Canning Basin reveals a conflict with established spore-pollen zonation. The dataset includes an age of 267.04 ± 0.14 Ma from the M. villosa Zone, which is 1.7 million years younger than tuffs from the D. granulata Zone. This research was published by Mory et al. in the Australian Journal of Earth Sciences in 2017.

Tabular, Canning Basin, Geochronology, Palynology, Stratigraphy, Large Scale, Permian, +1 more

Bullying Among Massachusetts Students Survey Data from 2009

2009 Massachusetts Youth Health Survey data analyzed by the Massachusetts Department of Public Health and CDC shows significant associations between bullying involvement and family violence. The report details adjusted odds ratios for middle and high school students categorized as bullies, victims, or bully-victims. Analysis controlled for age group, sex, and race/ethnicity.

Tabular, Bullying, Survey Data, Healthcare, Risk Factors, Youth Health, Public Health, +1 more

Khasi-English Multi-Task Speech and Text Dataset for Low-Resource NLP

A multi-task speech and text dataset for Khasi, an Austroasiatic language spoken by roughly 1.5 million people in Meghalaya, Northeast India. It was built to help bring Khasi into modern speech recognition, text-to-speech, and machine translation systems. The dataset was authored by RonitMehta260704 and last updated on Hugging Face in July 2026.

Text, Audio, Text To Speech, Machine Translation, Large Scale, Natural Language Processing, Khasi Language, Low Resource Language, Speech Recognition, +1 more

MMGenre: Chinese Singing Voice and Score Pairs Across 10 Genres

MMGenre is a multi-genre benchmark for singing voice synthesis containing 3,152 aligned Chinese singing voice and symbolic music score segments. The data, derived from 148 songs, totals approximately 261.8 minutes (4.36 hours) of audio. It was created by author Leaky-ReLU and last updated on the Hugging Face platform in June 2026.

Audio, Multimodal, Benchmark, Singing Voice Synthesis, Music Genre, Chinese Music, Audio Score Alignment, +1 more

PLAID 2014: High-Frequency Electrical Measurements from Household Appliances

The 2014 version of the Plug-Load Appliance Identification Dataset (PLAID) contains voltage and current measurements from different electrical household appliances. Data was sampled at 30 kHz and collected at 65 different locations in Pittsburgh, Pennsylvania (US). The dataset was authored by Jingkun Gao and is available under an Open Access license.

Time Series, Appliance Identification, Energy Consumption, Smart Meter, IoT, +1 more

PLAID 2017: Electrical Appliance Voltage and Current Measurements

The 2017 version of the Plug-Load Appliance Identification Dataset (PLAID) contains voltage and current measurements from different electrical household appliances sampled at 30 kHz. Data was collected at 65 different locations in Pittsburgh, Pennsylvania (US). The dataset was authored by Leen De Baets.

Time Series, IoT Sensing, Household Appliances, Energy Consumption, Signal Processing, +1 more

Mghana-st: Short Local-Language Audio Clips for Speech Tasks

mghana-st is a curated audio dataset of short local-language audio clips. The dataset is intended for speech tasks and contains annotations with English translations and tags for non-verbal events. It was created by adwumatech-ai and last updated on June 13, 2026.

Audio, Speaker Identification, Audio Clips, Speech Translation, Speech Recognition, Local Language, +1 more

WRF-STILT Atmospheric Footprint Grids for Boston 2013-2014

This dataset provides gridded footprint fields from the WRF-STILT Lagrangian particle dispersion model for two receptor sites in Boston, MA, covering July 2013 to December 2014. The 1-km resolution footprints quantify the influence of upwind surface fluxes on measured CO2 and CH4 concentrations. The dataset was produced by the National Aeronautics and Space Administration using meteorological fields from WRF version 3.5.1.

Time Series, Geospatial, ZIP, Atmospheric Chemistry, Carbon Footprint, Urban Air Quality, Meteorological Modeling, +1 more

Common Sense Facts Audio: Spoken Prompts with Correct and Counterfactual Completions

A collection of spoken common-sense factual prompts, each provided in three paired versions: an incomplete prompt, a correct factual completion, and an incorrect counterfactual completion. The dataset was created by author slprl and was last updated on 2026-06-18. It does not define standard train/validation/test splits, with all examples provided in a single neutral split.

Text, Audio, Common Sense, Factual Prompts, +1 more

Beautiful Motifs: 319k+ High-Rated Music Motifs Extracted from MIDI Files

319,000+ high-rated beautiful music motifs extracted from 248,000+ high-quality MIDI files from the Discover MIDI dataset. The dataset was created by asigalov61 using a beauty metric based on cosine similarity of embeddings against a reference set, with scores ranging from 0.739 to 0.999. It was last updated on June 19, 2026.

Audio, Music Generation, MIDI, Audio Embeddings, Music Motifs, +1 more

Mimba NNH TTS Dataset: Ngiemboon Synthetic Speech for Low-Resource TTS

A multi-speaker synthetic speech corpus for Ngiemboon (NNH), a Grassfields Bantu language of western Cameroon. Each item pairs a cleaned NNH sentence with machine-generated audio, intended for training on-device text-to-speech models. The dataset was created by 'mimba' and was last updated on 2026-06-24.

Text, Audio, Text To Speech, Speech Corpus, Natural Language Processing, Ngiemboon, Synthetic Speech, Low Resource Language, Synthetic, +1 more

Crossmodal Correspondence Data for Musical Engagement from a Thesis

Data accompanying a thesis by Joni Mok at the University of Oslo, which investigates crossmodal correspondences in musical engagement using an ecological cognitive approach. The dataset is associated with an Open Access (green) license. The specific data format, size, and last update date are not provided.

Multimodal, Ecological Psychology, Crossmodal Correspondences, Music Cognition, Thesis Data, +1 more

Turkish TTS Combined Raw: 81,500 Speech Samples from Seven Sources

Turkish TTS Combined Raw is a speech dataset combining seven open-source Turkish text-to-speech collections. It contains approximately 81,500 audio samples recorded at 24 kHz and is described as SNAC-compatible. The dataset was created by Hm12cbbcbx and was last updated on Hugging Face in June 2026.

Audio, Audio Dataset, Speech Synthesis, Turkish Language, TTS Training, +1 more

DESED_public_eval: Domestic Environment Sound Event Detection Clips

DESED_public_eval is a collection of real 10-second audio clips for evaluating sound event detection systems. The clips are sourced from YouTube under Creative Commons licenses and curated by Nicolas Turpault of the Centre National de la Recherche Scientifique. This dataset serves as the public evaluation set, referred to as the 'youtube' evaluation in associated research papers.

Audio, Machine Learning Evaluation, Audio Classification, Sound Event Detection, Benchmark, Domestic Sounds, +1 more

GOES-R Satellite Validation Surface Radiance Data

NASA's GOES-R PLT dataset provides surface reflectance and total optical depth measurements from Ivanpah Playa, Nevada. The data was collected during a 2017 field campaign using an Automated Solar Radiometer and a portable spectroradiometer to validate post-launch calibration of the ABI and GLM instruments. However, the full dataset appears limited, with data files only confirmed for two specific days in March 2017.

Image, Tabular, Text, Geostationary Observations, Benchmark, Computer Vision, Surface Reflectance, Atmospheric Optics, Satellite Validation, +1 more

Urdu TTS Mini: Speech Audio and Text for Urdu Language Models

A curated Urdu speech dataset for Text-to-Speech and Automatic Speech Recognition research. Audio segments are extracted from publicly available YouTube speech content, processed through a multi-stage quality pipeline, and annotated with Urdu transcriptions. This mini release by salisai, last updated in June 2026, is intended to validate the preprocessing pipeline and establish a quality baseline for future large-scale versions.

Text, Audio, Audio Data, Speech Synthesis, Benchmark, Urdu Language, Large Scale, Speech Recognition, +1 more
