DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,579 datasets

MECNature Audio Dataset: Natural Sounds with Music and Mechanical Noise Removed

An audio dataset focused on extracting natural sounds. The description indicates the data was processed to remove music and mechanical noise. The dataset's author, organization, and specific scale are unknown.

Audio, Audio Preprocessing, Environmental Sounds, Sound Classification, +1 more

YodaLingua-Danish: 21 Hours of Danish Speech from 925 Speakers

YodaLingua-Danish is a speech dataset containing 7,871 audio-transcription pairs totaling 21 hours of Danish speech. It was created by Thomcles and is part of the multilingual YodaLingua collection. The dataset was last updated on Hugging Face in April 2026.

Text, Audio, Multilingual, Text To Speech, Danish, Automatic Speech Recognition, +1 more

Common Voice 11.0: Multilingual Speech Corpus with Demographic Metadata

Common Voice Corpus 11.0 is a multilingual speech dataset consisting of MP3 audio files paired with corresponding text transcriptions. The dataset contains 24,210 recorded hours, with 16,413 validated hours across 100 languages. Many recordings include demographic metadata such as age, sex, and accent.

Tabular, Audio, Multilingual, Natural Language Processing, Demographics, Audio Corpus, Speech Recognition, +1 more

Music Cover CLIP Embeddings for ~3.5 Million Album Covers

CLIP ViT-L/14 embeddings for approximately 3.5 million music album covers combine image and text features. Each row contains a 1536-dimensional, L2-normalized vector concatenating the cover image embedding and a text embedding of the format '{title} by: {artist}'. The dataset was created by the author 'dyslexi' and was last updated on 2026-04-30.

Audio, Multimodal, Music Cover, CLIP Embeddings, Album Art, Computer Vision, Multimodal Embeddings, +1 more
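
The row layout described in this entry can be sketched as follows. This is a minimal illustration, not the author's pipeline. The 768-dimensional halves follow from the projection size of CLIP ViT-L/14 (2 × 768 = 1536). The helper names, the random stand-in vectors, and the choice to normalize the concatenated vector as a whole are assumptions, since the listing does not say whether normalization was applied per half or to the full row.

```python
import numpy as np

EMBED_DIM = 768  # CLIP ViT-L/14 projection size; two halves give 1536


def text_prompt(title: str, artist: str) -> str:
    """Build the text string in the '{title} by: {artist}' format from the listing."""
    return f"{title} by: {artist}"


def l2_normalize(vec: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def build_row_vector(image_emb, text_emb) -> np.ndarray:
    """Concatenate image and text embeddings, then L2-normalize the result.

    Assumption: normalization is applied to the full 1536-d vector.
    """
    image_emb = np.asarray(image_emb, dtype=np.float32)
    text_emb = np.asarray(text_emb, dtype=np.float32)
    if image_emb.shape != (EMBED_DIM,) or text_emb.shape != (EMBED_DIM,):
        raise ValueError(f"expected two vectors of shape ({EMBED_DIM},)")
    return l2_normalize(np.concatenate([image_emb, text_emb]))


# Random vectors stand in for real CLIP outputs (hypothetical data).
rng = np.random.default_rng(0)
row = build_row_vector(rng.normal(size=EMBED_DIM), rng.normal(size=EMBED_DIM))
prompt = text_prompt("Kind of Blue", "Miles Davis")
print(row.shape, prompt)
```

Because the stored rows are unit length, a dot product between two rows equals their cosine similarity, which is the usual reason for storing normalized embeddings.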

Nitrogen Flux and Travel Time Data from a Restored Massachusetts Wetland, 2016-2024

Tidmarsh, a former cranberry farm restored to a wetland in Plymouth, Massachusetts, is the source of this data. The dataset contains surface water discharge, nitrogen and nitrate concentrations, and specific conductivity measurements collected between 2016 and 2024. It was created by the Department of Agriculture to support watershed-scale modeling and analysis of nutrient retention.

Tabular, Excel, Nitrogen, Wetland, Surface Water, Water Quality, Wetland Restoration, Cranberry, Nitrogen Flux, +1 more

Golha Asr Gold 69: Persian Radio Program Collection

Golha Asr Gold 69 is a dataset published on Hugging Face by Reza2kn. The title suggests it contains audio recordings, likely from the Persian Golha radio program series. The dataset was last updated on 2026-06-08.

Audio, Persian, Cultural Heritage, +1 more

Guide to the Collection for From Reed to Ney: Musical Craftsmanship in Turkey

A 214.8 KB PDF guide for the project "From Reed to Ney: Documenting Musical Craftsmanship and Pedagogy in Turkey." The guide was authored by Banu Senay and last updated on April 22, 2026. It is hosted on figshare under a CC-BY-NC-SA 4.0 license.

Text, Pedagogy, Craftsmanship, Turkey, Music Ethnography, Documentation, +1 more

EGYSpeak: 147,979 Egyptian Arabic Speech Clips with Transcriptions

EGYSpeak is a curated dataset of 147,979 single-speaker Egyptian Arabic audio clips paired with transcriptions. It was created by MohamedGomaa30, sourced from the fadisarwat/egyptian-arabic-lines Kaggle dataset and processed through an ASR pipeline. The dataset was last updated on Hugging Face in April 2026.

Audio, Egyptian-Arabic, Dialect, Speech Recognition, +1 more

Ttsdistil: A Distilled Text-to-Speech Dataset

Ttsdistil is a dataset hosted on Hugging Face by author ShiniChien. The dataset's title suggests a focus on text-to-speech, potentially containing audio data for speech synthesis models. Its content and scale are unspecified and should be verified after download.

Audio, Text To Speech, Distilled Model, Speech Synthesis, +1 more

Tizuzaf Audio Dataset

An audio dataset published on Hugging Face by user abdelhaqueidali. The dataset's specific content and size are not detailed in the provided metadata. It was last updated on June 7, 2026.

Audio, Machine Learning, +1 more

Massachusetts Land Surface Temperature Index from Satellite Imagery, 2018-2020

A spatial land surface temperature (LST) index dataset for Massachusetts produced by MAPC Data Services. The data is derived from satellite imagery captured between April and October from 2018 to 2020, providing a relative heat tendency measure for each 30-meter pixel. The download includes three complementary datasets: the LST index, a variability raster, and a shapefile of the hottest 5% of areas.

Geospatial, Massachusetts, Urban Heat, Satellite Imagery, Land Surface Temperature, Geospatial Analysis, +1 more

Ukrainian Audiobook Speech Dataset for TTS and ASR

Ukrainian speech dataset for TTS and ASR tasks, processed from the Yehor/audiobooks-xxl source. The audio has been filtered for music and noise, resampled to 24 kHz, and transcribed using the nvidia/canary-1b-v2 model. The dataset was created by Mikhailo and last updated on April 29, 2026.

Audio, Speech Synthesis, Ukrainian Language, Audiobooks, Audio Processing, Synthetic, +1 more

TikTok Daily Trending Hashtags and Music Rankings from 2024 to 2025

TikTok Trending Hashtags and Music (2024 - 2025) contains the top 100 daily trending hashtags and music records from the TikTok Creative Center. The dataset covers the period from 2024-05-23 to 2025-07-09 and includes 13,399 unique hashtags and 11,157 unique songs. It was uploaded by author lingbow to Hugging Face.

Tabular, Audio, Music Popularity, Hashtag Analysis, Social Media Trends, TikTok, +1 more

SlovakSpeechMale: Slovak Male Voice Audio for Text-to-Speech

SlovakSpeechMale is a speech synthesis dataset containing approximately one hour of Slovak language audio recorded by a male speaker. The dataset is hosted on Hugging Face by the author 'neurlang' and was last updated in May 2026. It is specifically designed for text-to-speech (TTS) applications and includes Slovak text transcripts.

Audio, Text To Speech, Speech Synthesis, License: CC BY-SA 4.0, Region: US, Slovak Language, Male Voice, +1 more

Raw Emocean: 15-Hour English Speech Dataset for TTS Training

Raw Emocean is a large-scale English speech dataset designed for training autoregressive text-to-speech models. It contains 8,649 audio segments totaling 15.39 hours, sourced from 22 videos, with an average segment duration of 6.4 seconds. The dataset was created by author somu9 and last updated on Hugging Face in April 2026.

Audio, Text To Speech, Machine Learning, Large Scale, +1 more
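
As a quick consistency check on the Raw Emocean figures in this entry, the stated average segment duration can be recomputed from the segment count and total hours. The variable names are illustrative. Only the three input numbers come from the listing.

```python
# Figures quoted in the listing
segments = 8_649
total_hours = 15.39
source_videos = 22

# Average segment length in seconds: total seconds divided by segment count
avg_seconds = total_hours * 3600 / segments
print(round(avg_seconds, 1))  # 6.4, matching the stated average

# Derived figure (not stated in the listing): mean segments per source video
segments_per_video = segments / source_videos
print(round(segments_per_video))  # 393
```

The recomputed average agrees with the 6.4 seconds quoted in the description, so the three published numbers are mutually consistent.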

Replication Data for: Effects of Music and Drama-Based Interventions on Psychological Well

Twenty-two institutionalized older adults participated in a quasi-experimental study examining arts-based interventions. Daniela Lourenço collected data on life satisfaction and depressive symptoms at baseline and post-intervention using the Satisfaction With Life Scale and Geriatric Depression Scale. The dataset was last updated on 2026-04-22 via Harvard Dataverse.

Tabular, Audio, Long Term Care, Benchmark, Quasi Experimental, Arts Intervention, Psychological Well Being, Geriatric Health, +1 more

CYGNSS Level 1: Satellite Radar Cross Section Maps for Ocean Wind

CYGNSS Level 1 Science Data Record Version 2.1 provides calibrated Delay Doppler Maps from a constellation of eight satellites. The dataset includes bistatic radar cross section measurements, quality flags, and geolocation parameters, with up to eight files generated daily. NASA produced this second science-quality release, which includes improvements like additional data during orbital maneuvers and reduced measurement biases.

Time Series, Geospatial, Radar Measurements, Ocean Wind, Earth Science Platform Characteristics Spectral En, Satellite Remote Sensing, Healthcare, Earth Science Radar Spectral Engineering Radar Cro, Earth Science Radar Spectral Engineering Radar Ref, Earth Science, Earth Science Radar Spectral Engineering Sigma Nau, +1 more

Musicscape Groningen: Live Music Landscape in 2010

A booklet describing the musical landscape and live performances in the city of Groningen during 2010. The document was published by the Dutch Ministry of the Interior and Kingdom Relations and is available under a CC-BY-4.0 license. The exact data format and volume within the PDF are unspecified.

Text, Live Performance, Groningen, Cultural Data, Music Landscape, +1 more

God-Level Music Producer Workflows for LLM Training, 9,941 Examples

9,941 high-quality examples of advanced music production workflows, created by author 11-47. The dataset is intended for training large language models to act as elite music producers across genres like Rap, Crunk, and Dubstep. It was last updated on April 23, 2026.

Text, Audio, Workflow Examples, Electronic Music, LLM Training, Large Scale, Music Production, +1 more

Music Data for Vietnam, 2015-2026

Vietnam is the geographic focus of this dataset, which appears to contain information related to music from 2015 to 2026. The data is hosted on Kaggle, a platform for data science and machine learning projects. The specific content, collection method, and original author are not detailed in the available metadata.

Tabular, Audio, Time Series, Vietnam, +1 more
