DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Speech Recognition Audio Data for Model Training

A speech recognition dataset published on Kaggle. Its specific content, scale, and origin are not detailed in the available metadata, so the actual audio files, transcripts, and recording conditions must be confirmed after download.

Tags: Audio, Machine Learning, Audio Processing, Speech Recognition

Sam Wake Word: Audio Dataset for Keyword Spotting

Sam Wake Word is a dataset uploaded to Hugging Face by author sh1vam10. The dataset's platform tags indicate it contains audio and text modalities, likely for wake word or keyword spotting tasks. It was last updated on March 20, 2026.

Tags: Text, Audio, Audio Classification, Keyword Spotting, Speech Recognition, Wake Word
Format: Parquet (optimized Parquet)
Modalities: audio, text
Size category: n<1K
Libraries: polars, mlcroissant, datasets, pandas
License: mit
Region: us

Throat And Acoustic Paired Speech Recordings From 60 Korean Speakers

TAPS (Throat and Acoustic Paired Speech) is a standardized corpus for deep learning-based speech enhancement of throat microphone recordings. It provides paired recordings from 60 native Korean speakers and is designed to address the high-frequency attenuation in throat microphones caused by the low-pass filtering effect of skin and tissue.

Format: Parquet
Size category: 1K<n<10K
Modalities: text
Libraries: polars, dask, mlcroissant, datasets
License: cc-by-4.0
Region: us
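
The low-pass effect described for TAPS can be illustrated with a toy simulation. This is a minimal sketch, not the TAPS authors' method or data: it assumes skin and tissue behave like a single one-pole low-pass filter with a hypothetical 1 kHz cutoff, sampled at an assumed 16 kHz (neither value comes from the dataset metadata). It measures how strongly pure tones at several frequencies are attenuated, which is the kind of high-frequency loss a throat-to-acoustic enhancement model would need to recover.

```python
import math

SAMPLE_RATE = 16_000  # assumed sampling rate, not taken from the TAPS metadata
CUTOFF_HZ = 1_000     # hypothetical cutoff standing in for skin/tissue filtering


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def sine(freq_hz, sample_rate, seconds=0.5):
    """Unit-amplitude sine tone."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def gain_db(freq_hz):
    """Steady-state gain of the simulated 'tissue' filter at one frequency, in dB."""
    x = sine(freq_hz, SAMPLE_RATE)
    y = one_pole_lowpass(x, CUTOFF_HZ, SAMPLE_RATE)
    skip = SAMPLE_RATE // 10  # ignore the filter's start-up transient
    return 20 * math.log10(rms(y[skip:]) / rms(x[skip:]))


for f in (200, 1_000, 4_000):
    print(f"{f:>5} Hz: {gain_db(f):6.1f} dB")
```

Low frequencies pass almost unchanged while higher ones lose progressively more energy; a real throat microphone response is more complex than one pole, which is why a learned enhancement model is used instead of a fixed inverse filter.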

Music Files Collection

Kaggle hosts a dataset titled 'music_file' that likely contains music audio files. Metadata is minimal; the specific content, scale, and origin require verification after download.

Tags: Audio, Audio Files

Hindi Speech Utterances from India

Audio recordings of general utterances by Hindi speakers from India. The dataset's size, collection date, and creator are not specified in the available metadata. It is hosted on Kaggle.

Tags: Audio, India, Hindi, Utterances

French Medical Call Center Speech Recordings

A collection of French-language audio recordings from a medical call center. The dataset is hosted on Kaggle and is intended for speech processing tasks in the healthcare sector. Specific details on size, creation date, and authorship are not provided.

Tags: Audio, Call Center, Medical Speech, Healthcare, French Language, Audio Processing

Contextual ASR Benchmark: Synthetic Voice Bot Data for 10 Indic Languages

Sarvam AI developed this synthetic benchmark in 2026 to evaluate context-aware Automatic Speech Recognition (ASR) within voice bot environments. The collection includes between 1,000 and 10,000 records covering the top 10 Indian languages, focusing on how conversation history and agent prompts influence transcription accuracy.

Format: Parquet (optimized Parquet)
Size category: 1K<n<10K
Modalities: audio, text
Libraries: polars, mlcroissant, datasets, pandas
Region: us
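
The description does not say which metric the benchmark reports. Word error rate (WER) is a common way to quantify transcription accuracy, so the sketch below assumes WER and computes it with a word-level edit distance. The two transcripts are hypothetical English stand-ins (the dataset itself covers Indic languages), one imagined without conversational context and one with it, to show how a context-aware comparison could be scored.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical voice-bot turn: the agent has just asked for an account number.
reference = "my account number is four five six"
without_ctx = "my count number is for five six"
with_ctx = "my account number is four five six"

print(word_error_rate(reference, without_ctx))  # → 0.2857142857142857 (2 substitutions / 7 words)
print(word_error_rate(reference, with_ctx))     # → 0.0
```

Scoring the same utterances with and without the agent prompt in the decoder's context, then comparing WER, is one straightforward way to measure the effect the benchmark targets.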

Spanish Speech Utterances from Spain

An audio dataset that likely contains general utterances by Spanish speakers from Spain. Its size, specific content, and creation details are unknown. It is hosted on Kaggle.

Tags: Audio, Utterances, Spanish-language

Persian Farsi Narration Dataset for Text-to-Speech Model Training

A high-quality, single-speaker Persian (Farsi) narration dataset intended for training text-to-speech models. The dataset was created by author pymmdrza and was last updated on January 21, 2026. The description emphasizes professional narration quality for TTS applications.

Tags: Audio, Text To Speech, Speech Synthesis, Persian Language, Single Speaker

Updated Hate-Speech Dataset for Text Classification

Updated Hate-Speech Dataset is a text corpus likely containing social media posts or comments annotated for offensive language. The dataset is hosted on Kaggle, but its specific size, origin, and update details are not provided in the metadata. Columns and sample data are unknown, requiring verification after download to confirm content and structure.

Tags: Text, Audio, Social Media, Text Classification, Hate Speech, Natural Language Processing

TrainXttsV2_Audiobook: Audio Data for Text-to-Speech Training

TrainXttsV2_Audiobook is an audio dataset that likely contains speech recordings for text-to-speech model development. It is hosted on Kaggle, but its size, creator, and update date are unknown. Columns and sample data are unavailable, so the exact content requires verification after download.

Tags: Text, Audio, Text To Speech, Speech Synthesis, Audiobook

Tts Polish Nemo: Polish Speech Synthesis Audio Data

Tts Polish Nemo is a text-to-speech synthesis dataset published on Hugging Face by datadriven-company. It was last updated on March 13, 2026. Its specific content and scale require verification after download.

Tags: Audio, Text To Speech, Speech Synthesis, Polish Language, Audio Generation

GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_75_EDM-2_001

A dataset from the OpenML platform whose identifier suggests it relates to genetic heterogeneity modeling. No concrete details on size, content, or structure are available in the metadata.

GAMETES_Epistasis_2-Way_1000atts_0.4H_EDM-1_EDM-1_1

A dataset generated with GAMETES, a tool for simulating epistasis models. Its specific attributes, sample size, and data structure are unknown.

BIGOS V2: Benchmark for Polish Automatic Speech Recognition Systems

BIGOS (Benchmark Intended Grouping of Open Speech) is a collection of openly available Polish speech corpora. Its goal is to simplify access to these resources and enable systematic benchmarking of open and commercial Polish automatic speech recognition (ASR) systems. The dataset was created by amu-cai and was last updated on February 18, 2026.

Tags: Audio, Speech Corpus, Benchmark, Polish Language, Speech Recognition

Fasrtyuj: Dataset from Kaggle

A dataset titled 'fasrtyuj' is available on the Kaggle platform. The dataset's content, structure, and origin are not described in the provided metadata. Further details about its creation, size, and specific contents require verification after download.

Tags: Tabular, Unknown Domain, Title Derived

MusicOne: Audio Data for Music Analysis

MusicOne is a dataset hosted on Kaggle. Its title suggests a focus on music-related information. The dataset's specific content, scale, and origin require verification after download due to minimal provided metadata.

Tags: Audio, Music Information Retrieval

MusicSecond: Audio Data for Music Analysis

MusicSecond is a dataset hosted on Kaggle. Its title suggests it contains audio data related to music. The dataset's specific content, size, and origin are not detailed in the available metadata.

Tags: Audio

MusicThree: Audio Data for Music Analysis

MusicThree is a dataset published on Kaggle. Its content likely relates to music, but details such as size, format, and creation date are unavailable. Metadata is minimal; the actual content requires verification after download.

Tags: Audio

Kazakh Speech Dataset: ~726 Hours of Audio for ASR and TTS

A Kazakh-language speech corpus of approximately 726 hours of audio in FLAC format at a 16 kHz sampling rate. It is designed to support automatic speech recognition (ASR) and text-to-speech (TTS) development. The open-source corpus was created by Flamme-VRM and was last updated on Hugging Face in January 2026.

Tags: Audio, Text To Speech, Speech Data, Large Scale, Natural Language Processing, Audio Corpus, Kazakh Language, Speech Recognition, Automatic Speech Recognition
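
The stated duration and sampling rate are enough for a back-of-the-envelope size estimate for the Kazakh corpus. This sketch assumes mono audio decoded to 16-bit PCM, neither of which the description states. FLAC is lossless compression, so the files on disk should be smaller than the decoded figure, which is roughly what a pipeline would need if it materialized all audio as raw samples.

```python
HOURS = 726             # approximate duration from the dataset description
SAMPLE_RATE_HZ = 16_000 # sampling rate from the dataset description
CHANNELS = 1            # assumption: mono recordings
BYTES_PER_SAMPLE = 2    # assumption: 16-bit PCM after FLAC decoding

seconds = HOURS * 3600
samples = seconds * SAMPLE_RATE_HZ * CHANNELS
decoded_bytes = samples * BYTES_PER_SAMPLE

print(f"{seconds:,} s, {samples:,} samples, ~{decoded_bytes / 1e9:.1f} GB decoded")
# → 2,613,600 s, 41,817,600,000 samples, ~83.6 GB decoded
```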
