DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,575 datasets

Massachusetts Bay Oceanographic Measurements 2002-2005

Oceanographic measurements collected in Massachusetts Bay and the surrounding area. The dataset covers 2002 to 2005 and is provided by the National Aeronautics and Space Administration. Measured parameters include ocean chemistry, optics, temperature, and salinity.

Tags: Time Series, Geospatial, Ocean Temperature, Massachusetts Bay, Ocean Optics, Salinity Density, Ocean Chemistry (+1 more)

Coastal Ocean Measurements for New England in 2009

NASA collected in-situ oceanographic data along the coastal regions of New Hampshire and Massachusetts during 2009. The dataset includes measurements related to ocean chemistry, optics, temperature, and salinity. It is available in BIN and ISO file formats.

Tags: Time Series, Geospatial, Ocean Temperature, Ocean Optics, Coastal Measurements, Salinity Density, Ocean Chemistry (+1 more)

SpeakerCard-1M: Evidence-Grounded Speaker Traits from VoxCeleb

SpeakerCard-1M is a speaker-centric corpus built on the VoxCeleb1 and VoxCeleb2 datasets. It was created by JYP2024 using a tool-first, LLM-last pipeline where ten acoustic probes extract evidence for a structured schema. The dataset was last updated on June 3, 2026.

Tags: Audio, VoxCeleb, Speaker Verification, Acoustic Analysis, Speech Processing, Natural Language Processing (+1 more)

Tajik ASR Corpus: Multisource Speech Data for Automatic Speech Recognition

Tajik ASR Corpus v0 is a deduplicated automatic speech recognition collection assembled from multiple sources. Created by Peacockery, it combines FLEURS-derived speech, Mozilla Common Voice 25 Tajik, and augmented data from Muhtasham Tajik ASR. Each split is provided as a TSV file with an accompanying audio directory, and a SQLite version adds normalized fields.

Tags: Tabular, Audio, Multisource Corpus, Natural Language Processing, Tajik Language, Speech Recognition (+1 more)

OpenSLR LibriSpeech ASR 12000-15999: Speech Recognition Audio and Transcripts

OpenSLR LibriSpeech ASR 12000-15999 is a speech recognition dataset published on the Hugging Face platform by user Kimang18. The dataset, last updated on 2026-07-16, likely contains audio recordings and corresponding text transcripts, as suggested by its title and platform tags. The specific number of samples, file formats, and license details are not provided in the available metadata.

Tags: Text, Audio, Audio Data, Audio Corpus, LibriSpeech, Speech Recognition (+1 more)

RGAD Cross-Lingual TTS: 10-Hour Prompt-Conditioned Chinese Speech Dataset

RGAD Cross-Lingual TTS 10h is a 10-hour speech dataset for prompt-conditioned text-to-speech fine-tuning. It was created by author 'isabeth' and last updated on Hugging Face in May 2026. The dataset contains paired audio prompts and targets across languages, specifically for generating Chinese speech from prompts in other languages.

Tags: Tabular, Audio, Text To Speech, Speech Synthesis, Prompt Conditioned, Chinese Language, Cross Lingual (+1 more)

KasaSpeech: English-Twi Code-Switching Speech Recordings from Ghana

KasaSpeech is a large-scale speech dataset from Ghana featuring natural code-switching between English and Twi. It contains 49,878 transcribed audio samples split into training, validation, and test sets. The dataset was created by Kennethdot and last updated on Hugging Face in May 2026.

Tags: Audio, Code Switching, Large Scale, African Languages, Speech Recognition, Twi, Low Resource Languages (+1 more)

Pashto Speech Recognition Dataset with Domain-Specific Utterances

A domain-specific Pashto automatic speech recognition dataset covering agriculture, general topics, food services, health, and services. The dataset is structured by domain with audio files and corresponding transcript CSV files, created by Sabtain-Dev and last updated on June 5, 2026.

Tags: Audio, Domain Specific, Healthcare, Speech Recognition, Pashto Language, Audio Transcripts (+1 more)

Roadian–Wordian CA-IDTIMS and Palynology Ages from the Canning Basin

Mory et al. (2017) published zircon U–Pb ages from middle Permian tuffs in Western Australia's Canning Basin. The data reveal an apparent conflict between CA-IDTIMS ages and the established spore-pollen zonation, including an age of 267.04 ± 0.14 Ma from the Pittston SD-1 drillhole. The dataset is hosted by the Australian Ocean Data Network and was last updated in April 2026.

Tags: Tabular, Permian Age, Canning Basin, Geochronology, Palynology, Stratigraphy, Large Scale (+1 more)

MERIT: Factor-Controlled Music Triplet Dataset

MERIT is a dataset of audio triplets designed for training a framework that learns three independent music similarity spaces: melody, rhythm, and timbre. It was created by the AMAAI-Lab and is hosted on Hugging Face. The dataset page was last updated on 2026-05-26.

Tags: Audio, Machine Learning, Audio Triplets, Disentangled Representations, Music Similarity (+1 more)

NARSTO EPA SS: Single-Particle Aerosol Mass Spectra from Pittsburgh

Between September 20 and December 27, 2001, a Rapid Single-Particle Mass Spectrometer (RSMS) captured real-time composition data for individual aerosol particles in Pittsburgh. Each record includes aerodynamic particle size, positive and negative mass spectra, and precise measurement time, enabling analysis of particle-to-particle variation. The data covers nine logarithmically spaced size classes from about 40 to 1300 nanometers.

Tags: Tabular, Time Series, Geospatial, Aerosol Particles, Environmental Science, Air Quality, Healthcare, Single Particle Analysis (+1 more)

Music Performance Anxiety and Flow Under Simulation Conditions

An anonymised dataset from a study investigating music performance anxiety and flow under performance simulation conditions. The dataset is 49.9 KB in size and was last updated on 2026-05-21. It was published by a research team under a CC-BY-4.0 license on figshare.

Tags: Tabular, Audio, Excel, Psychology, Behavioral Science, Music Performance, Flow, Anxiety (+1 more)

Survey Results on Tsunami Behavior and Evacuation in Toyama Bay, Japan

Primary survey results from a post-event questionnaire conducted in the coastal region of Toyama Prefecture, Japan, following the 2024 Noto Peninsula Earthquake tsunami. The dataset was created by Shuichi Kure and is associated with a 2025 research paper in Coastal Engineering Journal. It consists of 1.4 MB of data available in PDF, TXT, and XLSX formats.

Tags: Tabular, 🇯🇵 Japan, Text, Excel, Tsunami Evacuation, Survey Results, Post Event Survey, Coastal Engineering (+1 more)

Locations of Quebec's Music and Dramatic Art Conservatories

This dataset lists the establishments of the Conservatory of Music and Dramatic Art of Quebec along with their geolocation. It is published by the Government and Municipalities of Québec under a CC-BY-4.0 license and was last updated on 2026-04-22.

Tags: Tabular, Audio, Geospatial, CSV, Quebec, Cultural Institutions, Geolocation, Dramatic Art, Music Education (+1 more)

TikTok Music Storytelling Codebook

A codebook for analyzing storytelling in music content on TikTok. The dataset is published on the Papers with Code platform under an Open Access license. The author is listed as 'a v', but other details like size and update date are unknown.

Tags: Text, Audio, Content Analysis, Codebook, Storytelling, TikTok (+1 more)

Hindi Teleconsultation Benchmark for Semantic Intent Alignment and Latency

This dataset provides 100 high-fidelity, simulated clinical teleconsultation interactions in Hindi for benchmarking hybrid ASR-LLM systems. It is designed to test how well such systems resolve translation gaps, semantic drift, and phonetic errors when mapping colloquial symptom descriptions to the SNOMED-CT and ICD-11 ontologies. It was authored by Aryan Raj Thakur and last updated on June 9, 2026.

Tags: Text, Tabular, Multilingual, Telemedicine, Hindi Language, Clinical Intent, Benchmark, Healthcare, Speech Recognition, Synthetic (+1 more)

Pidgin ASR Combined: Nigerian Pidgin Speech-to-Text Dataset with 8.6 Hours of Audio

Pidgin ASR Combined is a unified Nigerian Pidgin English speech-to-text dataset created by michaelodafe. It contains approximately 8.6 hours of audio across 4,278 clips from 10 source speakers, formatted as 16 kHz mono WAV files. The dataset was last updated on 2026-05-13 and was used to train a Whisper model that achieved a 21.37% word error rate.

Tags: Audio, Whisper Model, Audio Dataset, Pidgin English, Benchmark, Nigerian Language, Speech Recognition (+1 more)

ASR Code-Switch: 1,200 Code-Switching Utterances for Speech Recognition

1,200 code-switching utterances form a curated benchmark for evaluating commercial Automatic Speech Recognition systems. The dataset, created by Perle-ai, includes 300 samples each for four language pairs, such as Egyptian Arabic–English. It was last updated on May 21, 2026.

Tags: Audio, Multilingual, Code Switching, Benchmark, Speech Recognition (+1 more)

sWuggy: A Spoken Lexical-Discrimination Audio Benchmark

sWuggy is a spoken lexical-discrimination benchmark for evaluating spoken language models. Each item is a pair of a real word and a phonotactically matched pseudo-word, synthesized as audio. The dataset is hosted by the author 'coml' and was last updated on 2026-05-29.

Tags: Audio, Lexical Discrimination, Phonotactics, Benchmark, Speech Recognition, Audio Benchmark (+1 more)

Suno AI Music Dataset: Multi-Genre Audio with Metadata for ML

A human-curated, multi-genre audio dataset generated with Suno V5.5 (chirp-fenix), covering 100+ sub-sub-genres across electronic, hip-hop, Latin, jazz, world, rock, ambient, pop, reggae, and classical music. Each track includes full audio (MP3), cover art, the original generation prompt, and a 32-column metadata schema. The dataset was created by author Kukedlc and last updated on 2026-05-25.

Tags: Audio, Multimodal, AI Generated Music, Music Generation, Audio ML, Multi Genre Audio, Synthetic (+1 more)
