DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,587 datasets

Multilingual Synthetic TTS Dataset with 68,677 Clips Across 9 Languages

68,677 synthetic speech clips across 9 languages, generated using the Qwen3-TTS-12Hz-1.7B-Base model with zero-shot voice cloning from 5 reference speakers. The dataset was submitted to the Uncharted Data Challenge hosted by Adaption Labs and is authored by Reubencf. It was last updated on 2026-04-15.

Audio, Multilingual, Text To Speech, Speech Synthesis, Voice Cloning, Large Scale, Audio Generation, Synthetic, +1 more

GLODAPv2: North Atlantic Ocean CTD and Bottle Data, 1981

North Atlantic Ocean data from the ATLANTIS II research vessel cruise 31AN19810612, collected between June 12 and July 8, 1981. The dataset contains discrete sample and profile measurements of dissolved oxygen, nitrate, nitrite, phosphate, silicate, salinity, and water temperature, gathered using CTD and bottle instruments. It is part of the GLODAPv2 compilation, contributed by Carl Wunsch of the Massachusetts Institute of Technology.

Tabular, Time Series, Ctdtmp, North Atlantic Ocean, Oxygen, Oceanography, Zonal Transect At 36n Cruise, Profile, Atlantic Ocean, Nitrat, Theta, Silcat, Ctd Profiles, Salnty, 31an109 1, Discrete Measurement, Water Chemistry, Phspht, Ocean Carbon And Acidification Data System Ocads P, 31an19810612, Nitrit, +1 more

Fun-ASR-Nano-2512: A Speech Recognition Dataset

Fun-ASR-Nano-2512 is a dataset hosted on Kaggle. Its title suggests it is likely related to automatic speech recognition (ASR). The dataset's specific content, size, and origin are not detailed in the available metadata.

Audio, Audio Processing, Speech Recognition, +1 more

Majestrino Unified Detailed Captions: 4.6 Million Audio-Text Pairs

Majestrino Unified Detailed Captions is a filtered subset of the laion/majestrino-data collection, containing all samples with a unified_detailed_caption field. The dataset comprises 4,658,407 samples, packaged in approximately 932 tar files totaling around 1,017 GB. It was created by TTS-AGI and last updated on March 29, 2026.

Audio, Multimodal, WEBDATASET, Speech Transcription, Size Categories1 Mn10 M, Librarywebdataset, Audio Captions, Audio Classification, Modalitytext, Text Audio Pairs, Librarymlcroissant, Task Categoriesaudio Classification, Librarydatasets, Majestrino, Licensecc By 40, Regionus, Task Categoriesautomatic Speech Recognition, Captions, +1 more

Odia Indextts2 Processed: Text-to-Speech Data for an Indian Language

Odia Indextts2 Processed is a dataset uploaded to HuggingFace by author Akira2049. The title suggests it contains processed data for text-to-speech (TTS) tasks in the Odia language, an Indian language spoken primarily in Odisha. The dataset was last updated on 2026-05-27, but specific details on size, format, and content are not provided in the metadata.

Text, Audio, Text To Speech, Odia Language, Speech Synthesis, +1 more

Flying Music: Personal Cloud Storage for Audio Files and Metadata

A personal cloud storage repository for synchronizing a local music player. The dataset, created by ZHIWEI666, likely contains music files, cover art, lyrics, and user metadata. It was last updated on May 1, III.

Audio, Multimodal, Personal Storage, Audio Files, +1 more

Saint Kitts and Nevis: FAPAR Vegetation Health Anomalies via VIIRS

Copernicus provides 10-day composite GEOTIFF files measuring Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) anomalies for Saint Kitts and Nevis. These biophysical measurements, derived from the Visible Infrared Imaging Radiometer Suite (VIIRS), track vegetation health and agricultural drought impacts. The records are updated through March 2026.

Environment, Drought, +1 more

Music Teachers on Television: Demographics and Metadata

Teacher demographics and text metadata for Music Teachers on Television. The dataset was authored by Hugh Gundlach and last updated on April 27, 2026. It is hosted on figshare under a CC-BY-4.0 license.

Tabular, Audio, Media Analysis, Music Teachers, Education, Demographics, +1 more

LibriSpeech8K_100_360: Speech Audio Corpus for ASR

LibriSpeech8K_100_360 is a speech audio dataset published on Kaggle. The title suggests it is derived from the LibriSpeech corpus, likely containing 8,000 audio samples. The specific content, such as speaker count, recording length, and transcription details, requires verification after download.

Audio, Machine Learning, Audio Corpus, Speech Recognition, +1 more

Saraiki Speech Emotion Recognition Audio Dataset

Original Saraiki Speech Emotion Recognition (SER) audio dataset. The dataset is described as an original collection for the Saraiki language, a language spoken in parts of Pakistan. Specific details on size, collection method, and creator are not provided in the available metadata.

Audio, Audio Dataset, Affective Computing, Saraiki Language, Speech Emotion Recognition, +1 more

Synthetic Medical Speech Dataset for Clinical ASR Fine-Tuning

This synthetic medical speech dataset contains 101,475 audio-text pairs totaling 184.1 hours of 16 kHz mono speech. It was generated by IntelMedica using the Kokoro-82M TTS system with 19 voices across three English accent groups, focusing on clinical and nursing terminology. The dataset version was noted in April 2026.

Text, Audio, Clinical Terminology, Speech Synthesis, Medical Speech, Healthcare, Asr Training, Synthetic, Synthetic Audio, +1 more

IITM Mono Hindi Female: Single-Speaker Hindi Speech Synthesis Dataset

A monolingual Hindi text-to-speech dataset containing 6,926 utterances from a single female speaker. The audio data is embedded in parquet files at a 48 kHz sampling rate and was extracted from the IndicTTS project by SPRINGLab at IIT Madras. The dataset was uploaded to Hugging Face by the user 'somu9'.

Audio, Hindi, Text To Speech, Speech Synthesis, Monolingual, Single Speaker, +1 more

LibriSpeechMix 3-Speakers: Development and Test Clean Audio

LibriSpeechMix 3-Speakers is an audio dataset for speech processing tasks. It is hosted on Kaggle and likely contains clean speech recordings from three speakers, split into development and test subsets. The dataset's specific size, license, and creation details are not provided in the available metadata.

Audio, Machine Learning, Audio Processing, Speech Recognition, +1 more

Cebuano Speech Dataset: 108 Hours of Balanced Audio

The Cebuano Speech Dataset provides 108 hours of audio data across 807 files in MP3 and WAV formats. It was created by Speech-data and includes balanced voice data with 49% female and 51% male speakers aged 18 to 50+ years.

Audio, Voice Applications, +1 more

Saint Kitts and Nevis Food Prices and Economic Indicators from FAO

Food Prices for Saint Kitts and Nevis from the FAOSTAT bulk data service. The dataset covers categories including Consumer Price Indices, Deflators, Exchange rates, and Producer Prices. It is published by the Food and Agriculture Organization (FAO) of the United Nations and was last updated on 2026-03-16.

Tabular, CSV, Economic Indicators, Indicators, Food Security, Price Indices, Food Prices, +1 more

Korean TTS Training Dataset with 120 Sentences Across Pronunciation and Prosody Categories

120 Korean speech sentences were generated using the Google Gemini gemini-2.5-pro-preview-tts model with the Zephyr voice. The dataset includes categories for pronunciation, prosody, emotion, and intonation. Audio files are in 24 kHz, 16-bit, mono WAV format.

Audio, Speech Synthesis, Korean Language, Tts Training, Audio Generation, +1 more

Conflicting Permian Age Data from Western Australia's Canning Basin

U–Pb zircon dating from tuffs in the Canning Basin reveals a 1.7-million-year age conflict between palynological zones. The dataset documents an age of 267.04 ± 0.14 Ma for the Microbaculispora villosa Zone, challenging established biostratigraphic correlations. This research by Mory et al. (2017) presents isotopic and palynological evidence from the middle Permian.

Earth sciences, Canning Basin, Chemical Abrasion Isotope Dilution Thermal Ionisat, Palynostratigraphy, Biostratigraphy, U Pb Zircon, Permian, Laser Ablation Inductively Coupled Plasma Mass Spe, Argon Argon Dating Ar Ar, +1 more

Papa AI Lab Music: AI-Generated Audio Samples

Papa AI Lab Music is a dataset hosted on Kaggle. The dataset's title suggests it contains music or audio data, potentially related to artificial intelligence. Specific details regarding its size, contents, and creation are unavailable from the provided metadata.

Audio, Machine Learning, Ai Generated, Audio Processing, +1 more

Indian Music Dataset with Spotify Metadata on Artists, Albums, and Genres

Indian Music Dataset contains real Spotify data for Indian songs. The description indicates it includes fields for artist, album, genre, price, and duration. The dataset's author, organization, license, and exact size are unknown.

Tabular, Audio, Spotify, Indian Music, Audio Analysis, Music Metadata, +1 more

MIX-HI-EN-TTS: Bilingual Multi-Speaker Speech Synthesis Data

SKT AI LABS created this multi-speaker, bilingual speech synthesis dataset. The dataset is intended for text-to-speech applications. It was last updated on 2026-05-19.

Audio, Text To Speech, Ai Labs, Speech Synthesis, Multi Speaker, Bilingual, +1 more
