DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,579 datasets

Munch 1 Latent New Parquet: Precomputed Urdu TTS Latent Representations

51,021 pre-computed latent representations for Urdu utterances, designed to bypass audio decoding during TTS model training. The latents are derived from the Humair332/Urdu-munch-1 audio source using the Aratako/Semantic-DACVAE-Japanese-32dim codec at a 25 Hz frame rate. Author zuhri025 uploaded this dataset to Hugging Face in April 2026.

Tabular, Audio, Text To Speech, Urdu Speech, Latent Representations, Audio Processing, +1
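
As a rough sizing aid for the entry above, the 25 Hz frame rate implies about 25 latent frames per second of audio. The sketch below is illustrative only: the 32-dimensional latent width is an assumption read off the codec name ("32dim"), the helper `latent_shape` is hypothetical, and the dataset's actual array layout should be checked after download.

```python
import math

FRAME_RATE_HZ = 25  # stated in the listing
LATENT_DIM = 32     # assumption: inferred from the codec name "32dim"


def latent_shape(duration_s: float) -> tuple[int, int]:
    """Approximate (frames, dims) for an utterance of the given duration.

    Rounds up, assuming a partial final frame is still emitted.
    """
    frames = math.ceil(duration_s * FRAME_RATE_HZ)
    return frames, LATENT_DIM


if __name__ == "__main__":
    # A 4-second utterance maps to roughly 100 frames of 32 values each.
    print(latent_shape(4.0))
```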

Saint Kitts and Nevis Greenhouse Gas Emissions by Sector (2015-2026)

Climate TRACE provides annual and monthly greenhouse gas and air pollutant emission estimates for Saint Kitts and Nevis starting from 2015. The inventory covers country-level aggregates by sub-sector and gas, alongside source-level monthly data and confidence scores beginning in 2021.

Climate Weather, Points Of Interest Poi, Environment, +1

TTS Corpus: A Text-to-Speech Dataset

A text-to-speech corpus authored by Nishchal-29 and hosted on Hugging Face. The dataset was last updated on 2026-06-18. Its specific content, size, and structure are not detailed in the provided metadata.

Audio, Text To Speech, Speech Synthesis, Natural Language Processing, Audio Corpus, +1

CYGNSS Level 1: Satellite Radar Maps for Ocean Surface Monitoring

Version 3.1 geo-located Delay Doppler Maps from the CYGNSS satellite constellation provide calibrated ocean surface scattering measurements. At most, 8 netCDF files are generated daily, typically from 6-8 spacecraft, with a latency of approximately 6 days from measurement. This dataset, produced by NASA, supersedes Version 3.0 with improved antenna gain patterns and corrections for radio frequency interference.

Time Series, Geospatial, Radar, Oceanography, Weather, Satellite Remote Sensing, Healthcare, Earth Science Radar Spectral Engineering Radar Cro, Earth Science Radar Spectral Engineering Radar Ref, Earth Science, +1

CYGNSS Level 1: Satellite Radar Cross Section Maps for Ocean Surface Monitoring

Version 3.0 geo-located Delay Doppler Maps (DDMs) calibrated into Power Received and Bistatic Radar Cross Section from the CYGNSS satellite constellation. The dataset includes other scientific parameters like Normalized BRCS, Delay Doppler Map Average, and Leading Edge Slope, plus quality flags, error estimates, and geolocation parameters. NASA provides up to 8 netCDF files daily, with a latency of approximately 6 days from the last measurement.

Time Series, Geospatial, Radar, Oceanography, Satellite Remote Sensing, Healthcare, Earth Science Radar Spectral Engineering Radar Cro, Earth Science Radar Spectral Engineering Radar Ref, Earth Science, +1

Black Myth Wukong Player Survey: Music Cognition and Purchase Intent

561 players of the 2024 video game Black Myth Wukong provided feedback for a study on music's influence on consumer behavior. The dataset supports analysis of how music cognition, immersion, and emotional arousal mediate purchase intentions, based on the Stimuli-Organism-Response (S-O-R) theory. It was analyzed using Partial Least Squares Structural Equation Modeling (PLS-SEM) to examine the role of gamification design and cultural confidence.

Tabular, Audio, Game Music Music Cognition Purchase Intention Imme, S O R Theory, Game Music, Player Behavior, Music Cognition, Purchase Intention, +1

Sonata in G Major, Rondo Movement, from Berkeley Manuscript 793

A PDF of the third movement, a Rondo andante, from Sonata 1 in G major for keyboard, violin, and cello, as found in Berkeley manuscript 793. The movement is described as a set of variations on a theme, likely with repeated thematic episodes. The dataset was authored by Matthew James Zenas Dicken and last updated on 2026-04-13.

Text, Audio, Classical Music, Music Analysis, Chamber Music, Musical Scores, +1

Berkeley Ms 794 Symphonia in G Major, Allegro Movement

A PDF musical score for a symphonia in G major, sketched out in four parts. The score is part of Berkeley Ms 794 and represents the Allegro movement. It was authored by Matthew James Zenas Dicken and published on figshare in April 2026.

Text, Audio, Classical Music, Symphonia, Chamber Music, Musical Score, +1

Adaption Music Style Prompts for Audio Synthesis Across Genres

A remastered version of Reubencf/fma-labeled, prepared using Adaption's Adaptive Data platform. It contains descriptive text prompts designed to generate diverse musical tracks. The prompts detail instrumentation, rhythmic patterns, atmospheric qualities, and emotional tones across genres like pop, techno, ambient, and rock. Author Reubencf last updated the dataset on 2026-04-24.

Text, Audio, Text Prompts, Music Generation, Genre Descriptions, Audio Synthesis, +1

Perforated Tritia Shells from El Mnasra Cave Archaeological Site

272 perforated shells of Tritia cf. gibbosula from US 8 of El Mnasra cave, compared with specimens from Djerba and Taforalt. The dataset categorizes shell perforation types and conditions for archaeological analysis. It was authored by Emilie Campmas and shared under a CC BY 4.0 license.

Tabular, Excel, Mollusks, Archaeology, Shell Perforation, Paleontology, +1

Steam Games Dataset with Over 120,000 Published Titles

Information on more than 120,000 games published on Steam, the largest PC gaming platform. The dataset was created by Fronkon Games using code and APIs from Steam and Steam Spy, and is maintained by user zjgeritz. It was last updated on April 13, 2026.

Tabular, Audio, Game Metadata, Steam, Gaming Platform, Video Games, +1

AbyatSpeech: Arabic Poetry Audio Recordings for Meter Classification

3,805 annotated audio recordings of classical Arabic poetry verses, totaling approximately 9 hours of data. The dataset was created by Dr. Abdul Kareem Saleh Al-Zahrani and published via Harvard Dataverse, with a last update in April 2026. Each sample is a single verse labeled according to one of 16 canonical Arabic poetic meters.

Audio, Audio Dataset, Arabic Speech, Computational Linguistics, Poetry Recognition, Meter Classification, +1

Saint Kitts and Nevis: IBTrACS Tropical Cyclone Best-Track Data

Unified tropical cyclone best-track data for Saint Kitts and Nevis, merging historical and recent records from multiple meteorological agencies via the IBTrACS project. It contains storm identifiers, temporal data, and physical parameters such as wind speed and central pressure. The data is maintained by HDX and was last updated in March 2026.

Cyclones Hurricanes Typhoons, +1

ASR Codeswitched: Speech Data for Multilingual Recognition

ServiceNow-AI published this dataset on Hugging Face on 2026-06-09. The title suggests it contains audio and text data for training automatic speech recognition systems that handle code-switching. The specific content, size, and structure require verification after download.

Text, Audio, Code Switching, Speech Corpus, Automatic Speech Recognition, +1

Roadian–Wordian Permian Zircon and Palynology Ages from Canning Basin, Australia

A 2017 study presents U–Pb zircon dating and palynological data from the middle Permian Canning Basin in Western Australia. The data reveals an apparent age conflict of 1.7 million years between tuffs in non-marine and marginal-marine facies, challenging established spore-pollen zonation. The dataset is associated with Geoscience Australia and the cited research article.

Tabular, 🇦🇺 Australia, Geochronology, Palynology, Stratigraphy, Large Scale, Permian, Zircon Dating, +1

Kimi-K2.6: Per-Layer Expert Routing and Activation Statistics from REAP Calibration

REAP observer output captures per-token routing decisions and expert activation norms for every MoE layer in the moonshotai/Kimi-K2.6 model. The dataset, authored by 0xSero, contains the results of a full calibration pass, providing saliency ingredients for analysis. It was last updated on April 23, 2026.

Tabular, Mixture Of Experts, Model Routing, Neural Network Analysis, Activation Statistics, +1

MISHRON: Bangla-English Code-Mixed Emotional Short Speech

A speech dataset of Bangla-English code-mixed utterances with emotional labels. The dataset was uploaded to Kaggle, but the author, organization, and specific collection details are not provided. The total number of audio clips, recording dates, and other metadata are unknown.

Audio, Speech Emotion, Audio Classification, Code Mixing, Bengali English, +1

WDPCA: Protected and Conserved Areas of Saint Kitts and Nevis

Geospatial boundaries and metadata for marine and terrestrial protected areas and Other Effective Area-based Conservation Measures (OECMs) in Saint Kitts and Nevis. Maintained by the UNEP-WCMC as part of the Protected Planet Initiative, this data is updated on a monthly basis to support international biodiversity reporting. It serves as a primary source for tracking progress toward the Kunming-Montreal Global Biodiversity Framework.

Geodata, Environment, +1

Sindhi Alphabet Audio Recordings, Large-Scale and High-Quality

A large-scale, high-quality audio dataset of Sindhi alphabet recordings. The dataset is hosted on Kaggle, but specific details about its creator, size, and structure are not provided. Its primary purpose appears to be for speech and audio processing tasks related to the Sindhi language.

Audio, Audio Dataset, Sindhi Language, Large Scale, Alphabet, Speech Recognition, +1

YodaLingua-Russian: 192 Hours of Russian Speech for TTS and ASR

67,482 audio-transcription pairs totaling 192 hours of Russian speech, contributed by 2,611 distinct speakers. The dataset is designed for training text-to-speech and automatic speech recognition systems. It was created by Thomcles and last updated on 2026-04-20.

Text, Audio, Multilingual, Russian Language, Speech Synthesis, Multilingual Speech, Speech Recognition, Audio Text Pairs, +1
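
The headline figures in the entry above can be turned into per-clip and per-speaker averages with simple arithmetic. This is a back-of-the-envelope sketch using only the numbers stated in the listing; the helper name `mean_clip_seconds` is hypothetical, and the averages say nothing about the real distribution of clip lengths.

```python
def mean_clip_seconds(total_hours: float, n_clips: int) -> float:
    """Average clip length in seconds, given total hours and clip count."""
    return total_hours * 3600 / n_clips


# Figures quoted in the listing for YodaLingua-Russian.
TOTAL_HOURS = 192
N_CLIPS = 67_482
N_SPEAKERS = 2_611

if __name__ == "__main__":
    print(f"{mean_clip_seconds(TOTAL_HOURS, N_CLIPS):.1f} s per clip on average")
    print(f"{N_CLIPS / N_SPEAKERS:.1f} clips per speaker on average")
```

Averages of roughly ten seconds per clip are typical of utterance-level corpora, but the actual spread should be verified from the data itself.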
