DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,587 datasets

TuniSpeech-21h: 21-Hour Tunisian Arabic Speech Corpus

TuniSpeech-21h is a 21-hour speech corpus designed for Tunisian Arabic (Derja). It was developed by TuniSpeech-AI to address the underrepresentation of this dialect in Automatic Speech Recognition (ASR). The dataset is compiled from social media and broadcast materials, capturing spontaneous speech and diverse linguistic characteristics.

Tags: Audio, Speech Corpus, Dialect Speech, Natural Language Processing, Tunisian Arabic, Automatic Speech Recognition (+1 more)

Google WAXAL SNA ASR: Speech Recognition Audio Dataset

Google WAXAL SNA ASR is a dataset hosted on Kaggle. The title suggests it contains audio data for automatic speech recognition tasks. Its specific content, size, and collection details require verification after download.

Tags: Audio, Machine Learning, Audio Processing, Speech Recognition (+1 more)

Google WAXAL LIN ASR: Speech Recognition Data

Google WAXAL LIN ASR is a dataset published on Kaggle. Its title suggests it contains audio data for automatic speech recognition tasks. The dataset's specific content, size, and origin require verification after download.

Tags: Audio, Audio Processing, Speech Recognition (+1 more)

Large-Scale English Speech Corpus for Text-to-Speech Training

This corpus contains 615,000 hours of English speech audio in 239.7 million segments, aggregated from 11 source datasets. KRAFTON built it from 8 public speech corpora plus web-sourced recordings to train the RAON-OpenTTS model. The dataset page was last updated in April 2026.

Tags: Audio, Multimodal, English, Parquet, Text To Speech, Task Categories: text-to-speech, License: other, Machine Learning, Library: polars, Library: dask, Training Data, Language: en, Size Categories: 100M<n<1B, Modality: text, Library: mlcroissant, Speech Corpus, Library: datasets, Region: us, Large Scale, Natural Language Processing, Open Data, Audio Synthesis (+1 more)

LibriSpeech Manifest: Audio Speech Data for ASR

LibriSpeech_Manifest is a dataset hosted on Kaggle. The title suggests it contains audio data, likely related to speech recognition. The dataset's specific size, structure, and origin are not detailed in the provided metadata.

Tags: Audio, Machine Learning, Audio Processing, Speech Recognition (+1 more)

Clean Music Audio Samples

Clean musics is a dataset published on Kaggle. Its specific content, size, and origin are not detailed in the available metadata. The dataset likely contains audio files or related metadata focused on music.

Tags: Audio, Clean Audio (+1 more)

Elaina Anime Character Japanese Voice Clips

Elaina Wanderingwitch Audio Ja is a collection of Japanese voice audio clips and corresponding text for the anime character Elaina from 'Majo no Tabitabi'. The dataset was created by user 'yeeko' and was last updated in April 2026.

Tags: Audio, Character Voice, Speech Synthesis, Anime Voice, Japanese Audio (+1 more)

CYGNSS Satellite Bistatic Radar Measurements for Ocean Surface Monitoring

This dataset contains Delay Doppler Maps (DDMs), calibrated into received power and bistatic radar cross section, collected by the eight-satellite CYGNSS constellation. It includes daily files from up to 8 spacecraft, with a typical latency of approximately 6 days after measurement. This Version 2.1 science-quality release from NASA's POCLOUD supersedes Version 2.0 with improved calibration and coverage.

Tags: Time Series, Geospatial, Geophysical Measurements, Satellite Remote Sensing, Healthcare, Earth Science, Ocean Wind Speed (+1 more)

Uzbek Female Text-to-Speech Voice Dataset

A Kaggle-hosted collection of audio recordings for Uzbek text-to-speech synthesis. The dataset likely contains speech samples from a female speaker, intended for training voice models. Specific details on size, recording conditions, and creation date are not provided in the available metadata.

Tags: Audio, Text To Speech, Speech Synthesis, Uzbek Language, Female Voice (+1 more)

Hmong Text-to-Speech Audio Dataset

Hmong Tts Dataset is a speech synthesis resource hosted on Hugging Face. It was uploaded by BachDo and last updated on May 20, 2026. Its specific content, size, and structure are not detailed in the available metadata.

Tags: Audio, Text To Speech, Speech Synthesis, Hmong Language, Low Resource Language (+1 more)

English Speech Audio Dataset with 1 Million Hours

AllenAI collected 1 million hours of English audio-text data from the public internet. The dataset covers a variety of speaking styles, accents, and audio setups, and supports training of the OLMoASR speech recognition models.

Tags: JSON, Library: polars, Size Categories: 1M<n<10M, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Region: us, arXiv: 2508.20869, License: odc-by (+1 more)

Music Streaming Habits of 4,000 Listeners Across Platforms and Genres

This dataset records the music streaming habits of 4,000 listeners, including platform usage, genre preferences, and listening moods. It likely contains behavioral metrics such as skip rates and total listening minutes. It is hosted on Kaggle; its author, collection date, and covered time range are unknown.

Tags: Tabular, Audio, Listening Minutes, Listener Behavior, Skip Rates, Music Streaming (+1 more)

Tadabur: 100,000+ Annotated Qur'anic Recitation Audio Records

Tadabur provides between 100,000 and 1,000,000 Qur'anic recitation audio records for Arabic speech research, released by Faisal Alherran in 2026. The collection supports specialized tasks such as tajwīd-aware speech processing and reciter modeling across diverse vocal styles.

Tags: Arabic, Parquet, Library: polars, Language: ar, Library: dask, Quran, Arabic Speech, Modality: text, Size Categories: 100K<n<1M, Modality: tabular, Library: mlcroissant, Task Categories: audio-classification, Library: datasets, License: cc-by-nc-4.0, Region: us, Task Categories: automatic-speech-recognition, Speech Recognition (+1 more)

Google WAXAL ASR Challenge: Speech Recognition Competition Data

Kaggle hosts the Google WAXAL ASR Challenge dataset. The title suggests it contains audio data for an automatic speech recognition competition. The dataset's specific content, size, and origin require verification after download.

Tags: Audio, Machine Learning Challenge, Audio Processing, Speech Recognition (+1 more)

Persian ASR Audio Text Corpus: 2.69 Million Utterances

This dataset pairs Persian-language audio with corresponding text, likely for automatic speech recognition. It contains approximately 2.69 million entries and was published by Reza2kn on Hugging Face. It was last updated on May 12, 2026.

Tags: Text, Audio, Multilingual, Persian Language, Audio Text, Speech Recognition (+1 more)

Voice Acting Pipeline Output: Synthetic Emotional Speech with Perceptual Scores

Voice Acting Pipeline Output is a synthetic emotional speech dataset generated by an automated multi-GPU system. Each sample consists of 6 audio generations from the same speaker, scored across 59 perceptual dimensions by Empathic Insight Voice+. The dataset was created by TTS-AGI and last updated on March 31, 2026.

Tags: Audio, Prosody, Speaker Identity, Emotional TTS, Synthetic Speech, Audio Generation, Synthetic (+1 more)

Google WAXAL LUG ASR: Speech Recognition Audio Data

Google WAXAL LUG ASR is a dataset hosted on Kaggle. The title suggests it contains audio data for automatic speech recognition (ASR). The specific content, scale, and collection details are not provided in the available metadata.

Tags: Audio, Google, Waxal, Audio Processing, Speech Recognition (+1 more)

Bolbosh: Kashmiri Text-to-Speech Corpus

This is a text-to-speech corpus for the Kashmiri language, derived from the IndicVoices-R and RASA speech datasets. GAASH-Lab created it and used it to develop the Bolbosh neural TTS system, as documented in a 2026 paper.

Tags: Audio, Multilingual, Text To Speech, Indic Voices, Speech Corpus, Kashmiri Language, Natural Language Processing (+1 more)

MOSS-TTS v1.5: Fused Text-to-Speech Model for SGLang

OpenMOSS fused the MOSS-TTS Delay and MOSS-Audio-Tokenizer models for use with the SGLang framework. This entry likely contains model components for generating speech from text. Specific details on size, format, and licensing are not provided in the available metadata.

Tags: Audio, Multimodal, Text To Speech, Speech Synthesis, Language Model, Audio Generation (+1 more)

MOSS-TTSD v1.0: Fused Text-to-Speech and Audio Tokenizer Model

OpenMOSS fused the MOSS-TTSD and MOSS-Audio-Tokenizer models for use with the SGLang framework. The dataset likely contains audio data and corresponding text or tokenized representations for speech synthesis tasks. Specific details on size, format, and creation date are not provided in the available metadata.

Tags: Audio, Multimodal, Text To Speech, Speech Synthesis, Language Model, Audio Generation (+1 more)
