DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,575 datasets

Arg Spanish TTS: Argentine Spanish Speech Corpus for Multi-Speaker TTS

arg-spanish-tts is a unified, deduplicated speech corpus for Argentine Spanish (es-AR) containing 10,747 audio rows. The dataset was created by Kukedlc, who merged three public datasets and stripped cross-source duplicates. All audio is resampled to 24 kHz mono, totaling 12.18 hours from 65 unique speakers.

Audio, Text To Speech, Speech Corpus, Natural Language Processing, Audio Processing, Argentine Spanish, +1

Evidencefirst Audio: Captions and QA Pairs for AudioSet Clips

EvA Open Data provides audio clips paired with descriptive captions and instruction-based question-answer data. The audio is sourced from the AudioSet Strong Labels dataset and stored in parquet shards. The dataset was authored by SatsukiVie and last updated on Hugging Face in May 2026.

Audio, Multimodal, Audioset, Audio Captions, Instruction Qa, +1

Keyword Spotting Others: Audio Samples for Speech Recognition

A dataset titled 'Keyword Spotting Others' was published on the Hugging Face platform by Sarvina. The dataset's specific content and scale are not detailed in the provided metadata. Its last recorded update was on July 6, 2026.

Audio, Audio Classification, Keyword Spotting, Speech Recognition, +1

DNS5 Challenge: Multilingual Speech and Noise Audio for Denoising

DNS5 Challenge data is a mirrored collection of audio files for speech enhancement tasks. It contains 245 hours of English, 95 hours of French, and 137 hours of German speech sourced from LibriVox, AudioSet, Freesound, OpenSLR26, and OpenSLR28. The dataset was converted to Opus format by user philgzl and last updated in May 2026.

Audio, Multilingual, Machine Learning, Noise Suppression, Speech Audio, +1

FIFE Site Averaged Flux Data From 1987-1989

ORNL_CLOUD provides the Site Averaged Flux Data: 1987 (Betts) Data Set from the 1987-1989 FIFE experiment. This dataset contains site-averaged product data collected by multiple principal investigators, structured in 30-minute time intervals for 1987 and covering the entire 1987-1989 period. The data is available in multiple file formats including HTML, PDF, PNG, BIN, ISO, ZIP, and TEXT.

Tabular, Time Series, ZIP, Text, Fife Experiment, ATMOSPHERIC RADIATION, Land Surface Flux, Soil Heat Budget, +1

YodaLingua-Arabic: 730 Hours of Speech for Text-to-Speech and ASR

260,162 audio-transcription pairs totaling 730 hours of speech data from 13,290 distinct speakers. This Arabic portion of the YodaLingua collection is designed for training text-to-speech and automatic speech recognition models. The dataset was created by Thomcles and was last updated on May 12, 2026.

Audio, Multimodal, Multilingual, Speech Synthesis, Arabic Language, Multilingual Speech, Audio Text Alignment, +1

Massachusetts Wind and Solar Project Permitting Timelines and Outcomes

This original dataset documents the permitting processes for locally permitted wind and solar energy projects in Massachusetts. Created by Natalie Baillargeon, it contains data on permitting durations, project outcomes, and capacity. The dataset was last updated on April 8, 2026, and is shared under a CC-BY-4.0 license.

Tabular, Energy Permitting, Local Governance, Policy Analysis, Wind Energy, Solar Energy, +1

ChildTalk: Multi-Dialect Chinese Child Speech Corpus with Full-Length Conversations

ChildTalk is a large-scale, publicly available multi-dialect Chinese child speech dialogue dataset. It addresses limitations in existing corpora, such as small size and lack of natural conversations, by providing full-length dialogue recordings. The dataset was created by yujie-ovo and was last updated on May 29, 2026.

Text, Audio, Child Speech, Large Scale, Natural Language Processing, Chinese Dialects, Speech Recognition, +1

NADI2026 Subtask2 MixedASR: Mixed Arabic Dialect Speech Recognition Data

A dataset for the Mixed Automatic Speech Recognition subtask of the NADI2026 shared task, created by UBC-NLP. The dataset was last updated on June 24, 2026. The specific content and size are not detailed in the provided metadata.

Audio, Arabic Dialects, Mixed Speech, Speech Recognition, Automatic Speech Recognition, +1

SonoroNova-ES: Large-Scale Synthetic English-to-Spanish Speech Translation

SonoroNova-ES is a large-scale synthetic English-to-Spanish speech-to-speech translation dataset containing 329,764 utterances. It was constructed via cascade pipelines combining text-to-text translation models with neural text-to-speech engines, using source audio derived from the HiFiTTS-2 English audiobook corpus. The dataset features 1,315 unique speakers and provides a total of 961 hours of audio.

Audio, English Spanish, Large Scale, Natural Language Processing, Speech Translation, Synthetic Speech, Synthetic, Audio Synthesis, +1

Ukrainian Speech and Subtitle Clips from Toronto TV YouTube Channel

A Ukrainian-language speech dataset parsed from the Телебачення Торонто YouTube channel. Each sample consists of a short audio clip paired with its corresponding Ukrainian subtitle text, intended for automatic speech recognition research and education. The dataset was created by yuriilaba and was last updated on Hugging Face in May 2026.

Text, Audio, Ukrainian Language, Audio Text, Speech Recognition, Youtube Content, +1

Somali Voice Dataset

Somali-language audio data published on Hugging Face by jamailyaz. The dataset was last updated on July 8, 2026. Its specific content and scale require verification after download.

Audio, Audio Data, Somali Language, Speech Recognition, +1

Permian Tuff Zircon Ages and Palynology Data from the Canning Basin, Australia

This dataset addresses apparent age conflicts in the middle Permian stratigraphy of Western Australia's Canning Basin. It likely contains U-Pb zircon dates from tuffs and associated palynological zone information, published in a 2017 study by Mory et al. in the Australian Journal of Earth Sciences. It documents a 1.7-million-year discrepancy between CA-IDTIMS dates and established spore-pollen zonation.

Tabular, Canning Basin, Geochronology, Palynology, Stratigraphy, Large Scale, Permian, +1

Site Averaged Meteorological Data from 1987-1989 FIFE Experiment

This site-averaged dataset comes from Portable Automatic Meteorological Stations deployed in Kansas, USA, during the 1987-1989 FIFE experiment. It contains 30-minute interval measurements of atmospheric and surface conditions. The dataset is provided by the National Aeronautics and Space Administration.

Tabular, Time Series, ZIP, Text, Atmospheric Measurements, Solar Radiation, Soil Temperature, Surface Winds, Meteorological Station, +1

VoxCeleb2 3S Chunk: Audio Segments for Speaker Recognition

Audio segments derived from the VoxCeleb2 dataset, which is a collection of speech from celebrity interviews. The dataset is hosted on Hugging Face by the author AudioJoe and was last updated in July 2026. The specific content and scale of this '3S Chunk' version require verification after download.

Audio, Speaker Verification, Audio Chunks, Speech Recognition, +1

WASIL: 9,304 In-the-Wild Spoken Arabic Prompts with User Feedback

9,304 spoken Arabic prompts from 93 users interacting with an ASR and LLM-based assistant. The WASIL dataset, created by QCRI, captures in-the-wild interactions across multiple dialects and countries, including explicit user feedback signals like likes, dislikes, and scalar scores. The dataset was last updated on Hugging Face in May 2026.

Text, Audio, Spoken Arabic, Dialect Speech, Llm Interaction, User Feedback, +1

Vi Asr Cascaded: Vietnamese Automatic Speech Recognition Data

Vi Asr Cascaded is a dataset for Vietnamese automatic speech recognition, published on the Hugging Face platform by author pnnbao-ump. The dataset was last updated on June 27, 2026, though specific details on its size, format, and content are not provided in the metadata. Its title suggests it is designed for cascaded ASR model training or evaluation.

Audio, Vietnamese Language, Audio Processing, Speech Recognition, Automatic Speech Recognition, +1

Voices In The Wild 2M

Voices in the Wild 2M is an automatic speech recognition dataset designed for robustness training and evaluation. The dataset contains audio files grouped by normalized acoustic subset, with fields for file paths and reference transcriptions. It was created by author zhifeixie and last updated on Hugging Face in May 2026.

Audio, Time Series, Benchmark, Robustness, Audio Transcription, Speech Recognition, Acoustic Conditions, +1

Site Averaged Neutron Soil Moisture Data from 1987 FIFE Experiment

This dataset provides site-averaged daily neutron probe soil moisture measurements taken in 1987 during the FIFE experiment. It contains product data in which samples were averaged first for each site and then for each day. It is managed by ORNL_CLOUD and originates from a field campaign conducted from 1987 to 1989.

Tabular, ZIP, Text, LAND SURFACE, Soil Moisture, Earth Science, Field Experiment, +1

Site Averaged Gravimetric Soil Moisture Data from 1988 FIFE Experiment

Site-averaged gravimetric soil moisture data was collected in Kansas during the 1987-1989 FIFE field campaign. This dataset contains only the 1988 product, in which samples were averaged first by site and then by day. The data is managed by the ORNL_CLOUD organization.

Tabular, ZIP, Text, LAND SURFACE, Soil Moisture, Earth Science, Field Experiment, +1
