DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,571 datasets

Pitch Imagery Arrow Task: Experimental Data on Music Training and Mental Control

Experimental data from the Pitch Imagery Arrow Task investigating the effects of musical training, vividness, and mental control. The dataset includes R code for reproducing figures from the associated research article and three data files. The data and code were contributed by author Rebecca Gelding.

Tabular, Audio, Music Training, Mental Imagery, Cognitive Science, Experimental Data, +1 more

HEAR-HSet: Hierarchical Evaluation Set for Zero-Shot TTS, 2,702 Audio Pairs

2,702 prompt-target audio pairs for evaluating zero-shot text-to-speech systems across three hierarchical dimensions: basic generalization, paralinguistic control, and hard robustness scenarios. The dataset, created by dinosaaaur, contains approximately 1GB of audio in Chinese and English and was last updated on June 23, 2026. It is structured into three subsets targeting different evaluation goals.

Audio, Speech Synthesis, Evaluation Benchmark, Multilingual Speech, Benchmark, Zero Shot TTS, +1 more

German Far-Right Activism Events at County Level, 2013–2024

A county-level panel dataset integrates three streams of German far-right activism—music events, protests, and violence—from 2013 to 2024. It harmonizes data from federal parliamentary inquiries and civil-society counseling centers to fixed 2024 administrative boundaries. The dataset provides a balanced county-year panel for 400 counties over 12 years, plus event-level and monthly files.

Tabular, Audio, Geospatial, 🇩🇪 Germany, Political Violence, Panel Data, Far Right Activism, County-Level, +1 more

Neyshekar V4: 40,008 Persian Speech Clips for ASR and TTS

Neyshekar is an open Persian speech dataset containing 40,008 transcribed audio clips totaling approximately 63 hours. The data was collected from native Persian speakers through a community-driven crowdsourcing platform. This release, V4.1, was created by shekar-ai and last updated on June 15, 2026.

Audio, Audio Data, Persian Language, Speech Corpus, Crowdsourced, Speech Recognition, +1 more

Human-Performed Piano Improvisation in F Sharp Dorian, 1.1 GB

Alexander Paul Burton's dataset documents a live, unedited rubato piano improvisation in F Sharp Dorian. The 1.1 GB release includes multiple file formats such as MP3, CSV, and XLSX, capturing a high-density performance across the C3 to C6 range. It is released under a CC-BY-4.0 license as open-access anti-algorithm training data.

Tabular, Audio, CSV, Excel, Audio Data, Neoclassical, Piano, Music Performance, Improvisation, +1 more

Memo2496: 2,496 Instrumental Songs with Expert Valence-Arousal Annotations

Memo2496 is a music emotion recognition dataset containing 2,496 instrumental songs annotated by 30 music experts. The dataset provides valence-arousal labels and extracted acoustic features to support affective computing research. It was updated by Qilin Li in April 2026 and is hosted on figshare under a CC-BY-4.0 license.

Tabular, Audio, ZIP, CSV, Text, JSON, Affective Computing, Valence Arousal, Audio Features, Natural Language Processing, Music Emotion Recognition, Instrumental Music, +1 more

Discover Piano: Pre-Tokenized Solo Piano MIDI Dataset for Symbolic Music AI

Discover Piano is a pre-tokenized solo piano MIDI dataset for symbolic music AI and music information retrieval (MIR). It was created by asigalov61 and last updated on June 23, 2026. It is hosted on Hugging Face.

Audio, Music Information Retrieval, MIDI, Symbolic Music, Piano, +1 more

Whiskered Auklet Ornament Symmetry Data for 721 Birds, 1992-2009

Ian L. Jones provides data and R code for a study on symmetry in facial feather ornaments of the Whiskered Auklet. The dataset covers 721 wild-caught marked individuals, with 162 of known sex and 94 of known age (1-16 years old), studied between 1992 and 2009. It was used to analyze effects of age, sex, body size, and condition on ornament asymmetries, and correlations with ocean climate and population measures.

Tabular, Ecological Data, Bird Symmetry, Fluctuating Asymmetry, Auklet Biology, Ornamental Plumes, +1 more

Permian Tuff and Palynology Ages from the Canning Basin, Western Australia

U–Pb dating of zircons from middle Permian tuffs in the Canning Basin reveals a conflict with established spore-pollen zonation. The dataset includes an age of 267.04 ± 0.14 Ma from the M. villosa Zone, which is 1.7 million years younger than tuffs from the D. granulata Zone. This research was published by Mory et al. in the Australian Journal of Earth Sciences in 2017.

Tabular, Canning Basin, Geochronology, Palynology, Stratigraphy, Large Scale, Permian, +1 more

Bullying Among Massachusetts Students Survey Data from 2009

Data from the 2009 Massachusetts Youth Health Survey, analyzed by the Massachusetts Department of Public Health and the CDC, show significant associations between bullying involvement and family violence. The report gives adjusted odds ratios for middle and high school students categorized as bullies, victims, or bully-victims. The analysis controlled for age group, sex, and race/ethnicity.

Tabular, Bullying, Survey Data, Healthcare, Risk Factors, Youth Health, Public Health, +1 more

Khasi-English Multi-Task Speech and Text Dataset for Low-Resource NLP

A multi-task speech and text dataset for Khasi, an Austroasiatic language spoken by roughly 1.5 million people in Meghalaya, Northeast India. It was built to help bring Khasi into modern speech recognition, text-to-speech, and machine translation systems. The dataset was authored by RonitMehta260704 and last updated on Hugging Face in July 2026.

Text, Audio, Text To Speech, Machine Translation, Large Scale, Natural Language Processing, Khasi Language, Low Resource Language, Speech Recognition, +1 more

MMGenre: Chinese Singing Voice and Score Pairs Across 10 Genres

MMGenre is a multi-genre benchmark for singing voice synthesis containing 3,152 aligned Chinese singing voice and symbolic music score segments. The data, derived from 148 songs, totals approximately 261.8 minutes (4.36 hours) of audio. It was created by author Leaky-ReLU and last updated on the Hugging Face platform in June 2026.

Audio, Multimodal, Benchmark, Singing Voice Synthesis, Music Genre, Chinese Music, Audio Score Alignment, +1 more

PLAID 2014: High-Frequency Electrical Measurements from Household Appliances

The 2014 version of the Plug-Load Appliance Identification Dataset (PLAID) contains voltage and current measurements from different electrical household appliances. Data was sampled at 30 kHz and collected at 65 different locations in Pittsburgh, Pennsylvania (US). The dataset was authored by Jingkun Gao and is available under an Open Access license.

Time Series, Appliance Identification, Energy Consumption, Smart Meter, IoT, +1 more

PLAID 2017: Electrical Appliance Voltage and Current Measurements

The 2017 version of the Plug-Load Appliance Identification Dataset (PLAID) contains voltage and current measurements from different electrical household appliances sampled at 30 kHz. Data was collected at 65 different locations in Pittsburgh, Pennsylvania (US). The dataset was authored by Leen De Baets.

Time Series, IoT Sensing, Household Appliances, Energy Consumption, Signal Processing, +1 more

Mghana-st: Short Local-Language Audio Clips for Speech Tasks

mghana-st is a curated audio dataset of short local-language audio clips. The dataset is intended for speech tasks and contains annotations with English translations and tags for non-verbal events. It was created by adwumatech-ai and last updated on June 13, 2026.

Audio, Speaker Identification, Audio Clips, Speech Translation, Speech Recognition, Local Language, +1 more

WRF-STILT Atmospheric Footprint Grids for Boston 2013-2014

This dataset provides gridded footprint fields from the WRF-STILT Lagrangian particle dispersion model for two receptor sites in Boston, MA, covering July 2013 to December 2014. The 1-km resolution footprints quantify the influence of upwind surface fluxes on measured CO2 and CH4 concentrations. The dataset was produced by the National Aeronautics and Space Administration using meteorological fields from WRF version 3.5.1.

Time Series, Geospatial, ZIP, Atmospheric Chemistry, Carbon Footprint, Urban Air Quality, Meteorological Modeling, +1 more

Common Sense Facts Audio: Spoken Prompts with Correct and Counterfactual Completions

A collection of spoken common-sense factual prompts, each provided in three paired versions: an incomplete prompt, a correct factual completion, and an incorrect counterfactual completion. The dataset was created by author slprl and was last updated on 2026-06-18. It does not define standard train/validation/test splits, with all examples provided in a single neutral split.

Text, Audio, Common Sense, Factual Prompts, +1 more

Beautiful Motifs: 319k+ High-Rated Music Motifs Extracted from MIDI Files

More than 319,000 highly rated music motifs extracted from more than 248,000 high-quality MIDI files in the Discover MIDI dataset. The dataset was created by asigalov61 using a beauty metric based on cosine similarity of embeddings against a reference set, with scores ranging from 0.739 to 0.999. It was last updated on June 19, 2026.

Audio, Music Generation, MIDI, Audio Embeddings, Music Motifs, +1 more

Mimba NNH TTS Dataset: Ngiemboon Synthetic Speech for Low-Resource TTS

A multi-speaker synthetic speech corpus for Ngiemboon (NNH), a Grassfields Bantu language of western Cameroon. Each item pairs a cleaned NNH sentence with machine-generated audio, intended for training on-device text-to-speech models. The dataset was created by 'mimba' and was last updated on 2026-06-24.

Text, Audio, Text To Speech, Speech Corpus, Natural Language Processing, Ngiemboon, Synthetic Speech, Low Resource Language, Synthetic, +1 more

Turkish TTS Combined Raw: 81,500 Speech Samples from Seven Sources

Turkish TTS Combined Raw is a speech dataset combining seven open-source Turkish text-to-speech collections. It contains approximately 81,500 audio samples recorded at 24kHz and is described as SNAC-compatible. The dataset was created by Hm12cbbcbx and was last updated on Hugging Face in June 2026.

Audio, Audio Dataset, Speech Synthesis, Turkish Language, TTS Training, +1 more

Page 7 of 127