DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,585 datasets

YBoBp: Multimodal EEG, Movement, and Audio Dataset

This 2.4 GB multimodal dataset integrates EEG brain-activity data with expressive movement and music audio. It was created with support from the National Science Foundation, NIH, and the University of Texas MD Anderson Cancer Center. The dataset is structured for research on the coupling of neurological signals, physical motion, and auditory stimuli.

Tags: Audio, Multimodal, Multimodal Dataset, Brain Activity, MoBI, EEG, Expressive Movement, +1 more

MASC Arabic: 1,000 Hours of Multi-Dialect Speech from YouTube

1,000 hours of Arabic speech audio sampled at 16 kHz, crawled from over 700 YouTube channels. The MASC dataset is multi-regional, multi-genre, and multi-dialect, created by MohamedRashad and last updated in April 2026. It is intended to advance research and development in Arabic speech technology.

Tags: Audio, Audio Dataset, Arabic Speech, Multi Dialect, Multi Regional, Speech Recognition, +1 more

Stoddart Mattson Kibbey Isopach and Lithofacies GIS Polygons

Geospatial polygon features represent isopach and lithofacies data for Carboniferous strata in the Western Canada Sedimentary Basin. The shapefiles were created by the Alberta Geological Survey from mid-1990s digital files and edited in 2005-2006. This dataset is part of the Geological Atlas of the Western Canada Sedimentary Basin.

Tags: Geospatial, Isopach, Carboniferous, Geology, Sedimentary Basin, Carboniferous Strata, Lithofacies, +1 more

Stoddart-Mattson-Kibbey Carboniferous Isopach and Lithofacies Map

The Stoddart/Mattson/Kibbey Isopach and Lithofacies GIS dataset contains line features representing subsurface geological formations from the Carboniferous period in the Western Canada Sedimentary Basin. It was created by the Alberta Geological Survey from archived digital files produced in the mid-1990s and edited in 2005-2006. It is part of the Geological Atlas of the Western Canada Sedimentary Basin, specifically Chapter 14, Figure 35.

Tags: Geospatial, Isopach, Carboniferous, Geology, Sedimentary Basin, Lithofacies, +1 more

Ng'akarimojong Speech Dataset with Auto-Generated Transcripts

Ng'akarimojong (kdj), an Eastern Nilotic language with approximately 370,000 speakers in Karamoja, Uganda, is the focus of this speech dataset. It was created by Speedykom using GRN recordings segmented via silence detection. Audio files are in WAV format at 16 kHz mono, paired with UTF-8 transcripts auto-generated by the facebook/mms-1b-all model with a Teso adapter.

Tags: Text, Audio, African Languages, Low Resource Language, Speech Recognition, Synthetic, +1 more

BASS: Benchmark for Audio Language Models on Music Structure and Semantics

BASS is a benchmark dataset for evaluating music understanding and reasoning in audio language models. It comprises 2,658 questions across 12 tasks and 4 categories, covering 1,993 unique songs and over 138 hours of music. The dataset was created by author 'oreva' and last updated on 2026-04-08.

Tags: Audio, Multimodal, Music Analysis, Music Understanding, Benchmark, Audio Language Models, +1 more

Blackcofferttsa: Text Extraction and NLP Analysis Case Study

A case study dataset for text extraction and NLP analysis. The dataset likely contains textual data for analysis tasks. Its origin, size, and specific contents are not detailed in the provided metadata.

Tags: Text, Text Extraction, Natural Language Processing, Case Study, NLP Analysis, +1 more

Nepali TTS Tagged Indic: Speech Synthesis Dataset

A dataset for text-to-speech (TTS) in the Nepali language, likely containing audio samples and corresponding text transcriptions. It was published on the HuggingFace platform by an author named Titung. The dataset listing was last updated on June 4, 2026.

Tags: Text, Audio, Text To Speech, Speech Synthesis, Nepali, +1 more

Urdu Text-to-Speech Processed Data

A dataset titled 'urdu-tts-processed' is hosted on Kaggle. The dataset likely contains processed audio and text data for Urdu language speech synthesis. Metadata is minimal; the specific content, scale, and creation details require verification after download.

Tags: Text, Audio, Text To Speech, Speech Synthesis, Urdu Language, Audio Processing, +1 more

ASR Custom: Speech Data for Automatic Speech Recognition

A dataset for Automatic Speech Recognition (ASR) tasks, published on Kaggle. The dataset's specific content, size, and origin are not detailed in the available metadata. Its intended use is likely for training or evaluating custom speech recognition models.

Tags: Audio, Speech Data, Audio Processing, Automatic Speech Recognition, +1 more

Music Extract Wav2Vec2 Features

Music Extract Wav2Vec2 Features is a dataset published on Kaggle. The title suggests it contains extracted features from audio files, likely using the Wav2Vec2 model. Metadata is minimal; actual content requires verification after download.

Tags: Audio, Machine Learning, Wav2Vec2, Audio Features, +1 more

Voice Annotation Data V2: 18,632 Audio Samples Across 58 Voice Dimensions

Voice Annotation Data v2 is a curated dataset of 18,632 audio samples, comprising 9,391 positive and 9,241 negative examples across 58 voice dimensions. The dataset was created by TTS-AGI and was last updated on April 15, 2026. Each dimension includes up to 25 positive examples of audio that fits the category and 25 negative examples confirmed, using the Gemini 2.0 Flash model, not to fit.

Tags: Audio, Text To Speech, Audio Classification, Speech Processing, Voice Annotation, +1 more

Music Arena: Text-to-Music Model Evaluation Dataset

Music Arena provides an open platform dataset for evaluating text-to-music models. The dataset is hosted on Hugging Face by the music-arena organization and was last updated in April 2026. It likely contains audio files and associated metadata for benchmarking AI music generation systems.

Tags: Audio, Multimodal, Audio Dataset, AI Evaluation, Music Generation, Text To Music, +1 more

TOSD: Tamazight Open Speech Dataset for ASR and TTS

A parsed and formatted voice dataset containing recordings and text transcripts in Standard Moroccan Amazigh. The dataset is intended for training Automatic Speech Recognition and Text-to-Speech models and was published by the Tamazight-NLP organization. The dataset page was last updated on April 21, 2026.

Tags: Text, Audio, Text To Speech, Audio Dataset, Natural Language Processing, Amazigh Language, Speech Recognition, +1 more

Google WAXAL ASR Challenge: Luganda Speech Recognition Dataset

A speech recognition dataset associated with the Google WAXAL ASR Challenge. The title suggests it contains audio data for the Luganda language. It is hosted on the Kaggle platform, but detailed metadata about its size, format, and creation is unavailable.

Tags: Audio, Machine Learning Challenge, Luganda Language, Audio Processing, Speech Recognition, +1 more

waxalasrchallengeWHL: Audio Speech Recognition Challenge Dataset

A Kaggle challenge dataset likely containing audio data for speech recognition tasks. The dataset's specific content, size, and origin are not detailed in the available metadata. Further details such as the number of samples, recording conditions, and annotation specifics require verification after download.

Tags: Audio, Challenge, WHL, Audio Challenge, Speech Recognition, +1 more

LibriSpeech-Mixtures: Speech Audio Mixtures for Source Separation

LibriSpeech-Mixtures is an audio dataset hosted on Kaggle, likely derived from the LibriSpeech corpus. The dataset appears to contain mixtures of speech signals, which are commonly used for tasks like source separation. Specific details on the number of files, duration, and creation methodology are not provided in the available metadata.

Tags: Audio, Speech Mixtures, Audio Processing, Speech Recognition, +1 more

X Voice Dataset Train: A Multilingual Speech Corpus for Model Training

XRXRX aggregated this multilingual speech dataset from seven distinct sources, including Multilingual LibriSpeech, VoxPopuli, and GigaSpeech 2. The collection was last updated on April 12, 2026. Each constituent dataset retains its own license, with most permitting commercial use.

Tags: Audio, Multilingual, Training, +1 more

SilentWear: Multi-Session EMG Data for Silent Speech Recognition

SilentWear provides surface electromyography (EMG) data recorded from a wearable neckband for both vocalized and silent speech. The dataset, created by PulpBio and last updated in April 2026, is designed to support research in ultra-low-power wearable AI systems. It likely contains multi-session recordings intended for decoding speech from muscle signals.

Tags: Audio, Multimodal, Wearable AI, Electromyography, Human Machine Interaction, Silent Speech, Assistive Technology, +1 more

Filipino Tagalog Speech Recordings With Age And Gender Distribution

75 hours of Filipino Tagalog speech audio across 639 files, provided in MP3 and WAV formats. The dataset was created by Speech-data and was last updated in March 2026. It contains recordings from a speaker pool that is 55% male and 45% female, with ages ranging from 18 to over 50 years.

Tags: Audio, Filipino Language, Audio Data, Speech Recognition, Voice Technology, +1 more
