DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Speech & Audio Datasets | DataSalon


🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,586 datasets

Seed TTS Eval: Text-to-Speech Evaluation Dataset

Seed TTS Eval Arrow is a dataset for evaluating text-to-speech systems, published on Hugging Face by zhaochenyang20. It was last updated on 2026-05-22. Its content and scale should be verified after download.

Audio, Text To Speech, Machine Learning, Speech Evaluation, +1 more

Dari Wavs: Audio Data for Speech Recognition

Dari Wavs is an audio dataset created by Sanji27. The description suggests the dataset could be expanded in size and include transcripts ready for automatic speech recognition (ASR). The dataset was last updated on May 17, 2026.

Audio, Afghanistan, Dari Language, Speech Recognition, +1 more

UMRMS Music Royalty Leakage Benchmark: Anonymized Streaming Logs

Anonymized streaming logs and composition statements for royalty auditing. The dataset appears to be a benchmark for detecting royalty leakage in the music industry. Its specific temporal coverage, size, and creator are not detailed in the provided metadata.

Tabular, Audio, Music Industry, Auditing, Streaming Logs, Benchmark, Music Royalty, +1 more

wild_asr_haid: Audio Data for Speech Recognition

wild_asr_haid is a dataset hosted on Kaggle. The title suggests it contains audio data, likely for automatic speech recognition (ASR) tasks. Its specific content, size, and origin require verification after download.

Audio, Wild Data, Audio Processing, Speech Recognition, +1 more

Japanese Speech Synthesis Benchmark for Language Model Evaluation

Voicebench Ja contains four subsets created by applying speech synthesis to samples from three Japanese text benchmarks: ELYZA-tasks-100, M-IFEval, and JamC-QA. SB Intuitions constructed the dataset using their internal TTS model and JVS corpus audio prompts, in order to quantify the performance gap between audio and text inputs for language models. It was last updated on March 30, 2026.

Text, Audio, Parquet, Size Categories: 1K<n<10K, Library: polars, Library: dask, arXiv: 1908.06248, Speech Synthesis, Modality: text, Library: mlcroissant, License: CC BY-SA 4.0, Benchmarking, Library: datasets, arXiv: 2603.12565, arXiv: 2502.04688, Region: US, Japanese Language, +1 more

ADLIB DevTerm: Japanese Software Development Terminology Audio Test Set

A test audio dataset for the ADLIB language-aware ASR benchmark framework for Japanese. It contains 247 test cases with audio from 3 speakers, focusing on the DevTerm (software development terminology) domain. Reference transcripts and term annotations are provided in a separate JSONL file within the project's GitHub repository.

Audio, ASR Benchmark, Audio Testing, Benchmark, Software Development, Japanese Language, Speech Recognition, +1 more

Turkana Speech Dataset: 529 Audio Clips from Bible Narratives

529 audio segments totaling 46 minutes provide speech data for Turkana, an Eastern Nilotic language with roughly 1 million speakers in Kenya. The dataset was created by Speedykom using Bible narratives from the Global Recordings Network, segmented via silence detection. Transcripts were auto-generated using the facebook/mms-1b-all model with a Teso adapter.

Audio, Bible Narratives, Kenya, Audio Transcription, Low Resource Language, Speech Recognition, Synthetic, +1 more

TTS Crawl: Text-to-Speech Audio Samples

A text-to-speech dataset published on Hugging Face by thach124. It was last updated on 2026-05-28 10:23:30. Its content and scale cannot be determined from the provided metadata.

Audio, Text To Speech, Speech Synthesis, Audio Generation, +1 more

CYGNSS Satellite Ocean Surface Radar Cross Section Measurements

NASA's CYGNSS constellation provides calibrated Delay Doppler Maps (DDMs) measuring ocean surface scattering. The dataset contains daily files from up to 8 spacecraft, with a typical latency of 6 days from measurement. Version 3.1, produced by POCLOUD, supersedes Version 3.0 with improved antenna gain calibration and quantization correction.

Time Series, Geospatial, Radar, Oceanography, Weather, Satellite Remote Sensing, Healthcare, +1 more

Greek Laiko Music Metadata for Research and Analysis

Structured metadata for Greek laïko music tracks, intended for research and machine learning. The dataset includes fields for emotion, era, and genre but does not contain audio files. It was created by author christosfouk and was last updated on 2026-04-16.

Tabular, Audio, Laiko Music, Greek Music, Music Analysis, Laiko, Music Metadata, +1 more

Librispeech-PC: High-Quality Audio Replacement for Speech Recognition

Librispeech-PC 44kHz Opus replaces the original Librispeech PC audio with higher-quality source material encoded as Opus at 64 kbps. Sampling rates are increased from 16kHz up to 48kHz, depending on the source. The dataset was created by mythicinfinity and last updated on March 28, 2026.

Audio, Opus Encoding, Audio Processing, Librispeech, Speech Recognition, +1 more

pasr-xauusd-m3-wfo: Gold vs USD Exchange Rate at 3-Minute Intervals

A time-series dataset of the Gold (XAU) to US Dollar (USD) exchange rate, likely recorded at 3-minute intervals. The dataset is published on Kaggle, but its author, size, and specific time range are unknown. The raw description suggests it contains price data for the forex pair XAU/USD.

Time Series, Gold Price, Forex, Financial Markets, +1 more

TTS Model Evaluation Votes and Feedback

A SQLite database of user votes and feedback from TTS Arena, a platform for comparing text-to-speech models. The dataset was created by Pendrokar and last updated in April 2026. It is designed to help developers identify model faults through community evaluation.

Tabular, Audio, Text To Speech, Speech Synthesis, Model Evaluation, User Votes, +1 more

Massachusetts and Rhode Island Benthic Points for Oil Spill Sensitivity

A 2016 geospatial dataset from NOAA characterizing macroalgae beds for oil spill sensitivity planning in Massachusetts and Rhode Island. Vector points represent vegetation beds, with associated tables containing species-specific abundance, seasonality, and life history information. The data is part of a larger Environmental Sensitivity Index (ESI) effort to map coastal resources.

Geospatial, 🌎 North America, Geospatial Points, Rhode Island, Massachusetts, Macroalgae, Oil Spill Planning, Environmental monitoring, NOAA, Environmental Impacts, Coastal Zone Management, Oil Spill Sensitivity, Coastal management, Oil Spills, Sensitivity Maps, Office Of Response And Restoration, DOC/NOAA/NOS/ORR, National Ocean Service, Earth Science, ESI, HUMAN DIMENSIONS, Benthic Ecology, Coastal Resources, Benthicpt, Continent, Massachusetts And Rhode Island, +1 more

Massachusetts and Rhode Island Benthic Species Data for Oil Spill Sensitivity

This National Oceanic and Atmospheric Administration (NOAA) dataset for Massachusetts and Rhode Island contains sensitive biological resource data for benthic species. Vector polygons represent submerged aquatic vegetation and macroalgae, with associated tables for species abundance, seasonality, and life history. The data is part of the Environmental Sensitivity Index (ESI), which characterizes coastal environments by their sensitivity to oil spills.

Geospatial, 🌎 North America, Rhode Island, Massachusetts, Oil Spill Planning, Environmental monitoring, NOAA, Environmental Impacts, Coastal Zone Management, Oil Spills, Sensitivity Maps, Office Of Response And Restoration, Benthic, DOC/NOAA/NOS/ORR, National Ocean Service, Earth Science, ESI, HUMAN DIMENSIONS, Coastal Resources, Continent, Benthic Species, Massachusetts And Rhode Island, +1 more

YouTube FA ASR: Speech Recognition Dataset

A speech recognition dataset sourced from YouTube, likely containing audio and corresponding transcriptions. It was published by user 'veziriii' on Hugging Face and was last updated on May 23, 2026. The specific content and scale require verification after download.

Text, Audio, Multilingual, YouTube, Audio Transcription, Speech Recognition, +1 more

SMAPVEX19-22: UAVSAR Radar Mosaics for Forest Soil Moisture

The SMAPVEX19-22 field campaign collected daily mosaicked UAVSAR images at three polarization configurations from April to July 2022 near Petersham, Massachusetts. The terrain-flattened, gamma-corrected radar data targets forested land cover to validate satellite-derived soil moisture estimates. This dataset supports the Soil Moisture Active Passive Validation Experiment's goal of improving remote sensing accuracy in vegetated areas.

Image, Geospatial, Radar, Synthetic Aperture Radar, Computer Vision, Spectral Engineering, Soil Moisture, Earth Science, UAVSAR, Synthetic, +1 more

EN Emilia Yodas ScribeEvents: Vocal Bursts and Background Sounds for TTS

This dataset comprises 16,017 audio samples filtered from a larger 616-hour speech dataset to contain only ElevenLabs Scribe v1 audio events. Created by TTS-AGI, it focuses on vocal bursts and background sounds with unified annotation formatting. It was last updated on March 28, 2026.

Audio, Vocal Bursts, Speech Synthesis, TTS Training, Audio Events, +1 more

EmiratiTTS: 10 Smoke Test Audio Clips for Emirati Arabic Speech Synthesis

Ten audio clips serve as a stage 0.5 acceptance check for the EmiratiTTS project, which targets speech synthesis for Emirati Arabic. The clips were created by Alqayed2024 to verify the data, tokenizer, and pipeline wiring before a full fine-tuning run. The dataset page was last updated on April 12, 2026.

Audio, Multilingual, Text To Speech, Speech Synthesis, Arabic Language, AI Training, +1 more

Librispeech Manifests: Speech Recognition Audio Data

Librispeech_manifests likely contains metadata files for the LibriSpeech corpus, a widely used benchmark in automatic speech recognition. The dataset is published on Kaggle, but its specific contents and scale are not detailed in the available metadata. Columns and sample data are unknown, requiring verification after download to confirm the exact structure and utility.

Audio, Machine Learning, Audio Processing, Speech Recognition, +1 more
