DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Dutch General Utterances Speech Recordings from Belgium

Belgian Dutch speakers contributed to this audio dataset of general utterances. The dataset is hosted on Kaggle, but details on the number of speakers, recording length, and collection methodology are not provided. The author, organization, and license information are also unknown.

Audio, Audio Dataset, Dutch Language, Speech Recognition, Belgium, +1

Water Audio Dataset

Kaggle hosts an audio dataset focused on water-related sounds. The dataset likely contains recordings of water in various contexts, such as flowing, dripping, or splashing. Metadata is minimal; the exact content, scale, and collection details require verification after download.

Audio, Environmental Sound, Water, +1

TVSpeech: Thai Video Speech Recognition Benchmark with 570 Utterances

TVSpeech is a Thai speech recognition benchmark dataset designed as a Robustness Track for evaluating ASR models. It contains 570 utterances totaling 3.75 hours of audio curated from diverse public YouTube channels under the Creative Commons Attribution license. The dataset was created by typhoon-ai and last updated on 2026-01-21.

Audio, Benchmark, Speech Recognition, Thai Language, Audio Benchmark, Robustness Evaluation, +1

700 Hours of Hindi English Hinglish TTS Audio

700 hours of processed speech data for Hindi, English, and Hinglish (code-mixed) text-to-speech applications. The dataset, created by adjaysagar, includes train and validation manifests and a preprocessing script. It was last updated in February 2026.

Region: us, License: apache-2.0, +1

TTS Dutch: Text-to-Speech Audio Samples for Dutch Language

TTS Dutch is a dataset hosted on Hugging Face by datadriven-company. The dataset was last updated on March 11, 2026. Its specific content and scale are not described in the provided metadata.

Audio, Text To Speech, Speech Synthesis, Dutch Language, Audio Generation, +1

Crowdsourced Respiratory Audio for Asthma and Healthy Subjects

Crowdsourced audio recordings of respiratory sounds filtered to include only Asthma and Healthy subjects. The dataset is hosted on Kaggle and is intended for binary classification tasks. Details on the number of samples, recording specifics, and collection methodology are not provided in the available metadata.

Tabular, Audio, Binary Classification, Respiratory, Respiratory Audio, Asthma, Health, Medical, +1

Japanese TTS Model Robustness Benchmark

J-HARD-TTS-Eval is a benchmark dataset for evaluating autoregressive Japanese Text-To-Speech models. It focuses on specific failure modes including stability in short sequences, repetition handling, and context completion. The dataset was created by Parakeet-Inc and last updated in January 2026.

OPTIMIZED-PARQUET, Parquet, Task Categories: text-to-speech, Library: polars, Modality: audio, Size Categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Region: us, Language: ja, License: apache-2.0, +1

StrikerData: Audio Dataset for Speech and Noise Research

StrikerData is an audio dataset containing human speech, environmental noise, and other sound types. It was developed by Strikersoft for research and development in audio and speech technologies. The dataset was last updated on January 22; the year is missing from the available metadata.

Audio, Noise, Environmental Sounds, +1

TTS Robustness Benchmark

Evaluation samples in seven stress-test categories, designed for calculating domain-wise Character Error Rate (CER) scores. The dataset contains unique sentence-language pairs to ensure clean metrics for text-to-speech (TTS) robustness testing.

OPTIMIZED-PARQUET, Parquet, Task Categories: text-to-speech, License: other, Library: polars, Language: en, Language: bn, Language: ml, Language: hi, Size Categories: n<1K, Modality: text, Library: datasets, Library: pandas, Language: te, Language: od, Language: kn, Language: pa, Language: mr, Language: ta, Language: gu, +1
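
For readers unfamiliar with the metric, domain-wise CER can be sketched roughly as follows. This is a minimal illustration, not the benchmark's own scoring code: it assumes each sample is a hypothetical `(domain, reference, hypothesis)` triple, where the hypothesis is a transcript of the synthesized audio (for example from an ASR system), and it pools character edits per domain before dividing by the total reference length.

```python
from collections import defaultdict


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def domain_wise_cer(samples):
    """Return {domain: CER} for (domain, reference, hypothesis) triples.

    The triple layout is an assumption made for this sketch; the
    dataset's real column names are not given in the listing.
    """
    errors = defaultdict(int)
    chars = defaultdict(int)
    for domain, ref, hyp in samples:
        errors[domain] += edit_distance(ref, hyp)
        chars[domain] += len(ref)
    # Skip domains whose references are all empty to avoid dividing by zero.
    return {d: errors[d] / chars[d] for d in errors if chars[d] > 0}
```

Pooling edits per domain (rather than averaging per-sentence CER) keeps short sentences from dominating the score; either convention is plausible, and the listing does not say which one the benchmark uses.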

Test Music Data

Test_Music is a dataset hosted on Kaggle. The dataset's specific content, size, and origin are not detailed in the available metadata. Further details about the data's creation, scope, and structure require verification after download.

Audio, Test Data, +1

Music Model H5: Audio Data for Machine Learning

An audio dataset titled 'music-model-h5' is hosted on Kaggle. The dataset's specific content, size, and structure are not detailed in the provided metadata. Its platform tags suggest it is related to machine learning and audio processing.

Audio, Machine Learning, Audio Model, Audio Processing, +1

DailyTalkEdit: Speech Audio with Semantic Influence Annotations

DailyTalkEdit provides paired original and modified audio files from dialogues, with annotations for modified time ranges and semantic influence. The dataset, created by wsntxxn, was last updated on Hugging Face in February 2026. It includes separate audio segments for modified utterances and structured metadata files for training, validation, and testing splits.

Audio, Semantic Annotation, Speech Editing, Dialogue Audio, Audio Modification, +1

Bangla Hate Speech Dataset for Text Classification

A text dataset containing Bengali language content, likely annotated for hate speech detection. It is hosted on the Kaggle platform. The dataset's author, size, and specific annotation schema are not provided in the available metadata.

Text, Audio, Bengali Language, Text Classification, Hate Speech, Natural Language Processing, +1

GAMETES Heterogeneity Dataset with 20 Attributes and 0.4 Heritability

A GAMETES dataset for evaluating genetic association methods, focusing on heterogeneity. The dataset name indicates it contains 20 attributes and has a heritability parameter of 0.4.

RTTS_COCO: Road Traffic and Transportation Scene Images

RTTS_COCO is a dataset hosted on Kaggle. The title suggests it contains images related to road traffic and transportation scenes, likely formatted in the COCO annotation style. Its specific contents, scale, and origin require verification after download.

Image, Computer Vision, Object Detection, Image Annotation, +1

NASA ASRS: Aviation Safety Incident Reports (2005–2025)

NASA ASRS raw batch export from the DFOnline system, intended as input for an ETL pipeline. The dataset covers aviation safety reports submitted over a 20-year period from 2005 to 2025. Its specific contents, such as report narratives or coded fields, must be inferred from the source system.

Tabular, Incident Reporting, NASA, Tabular Data, Aviation Safety, +1

Indic TTS Checkpoint Session3: Text-to-Speech Model Weights

Indic TTS Checkpoint Session3 is a dataset published on Kaggle. The title suggests it contains model checkpoint files for a text-to-speech system focused on Indic languages. The dataset's specific content, size, and structure require verification after download due to minimal provided metadata.

Audio, Text To Speech, Speech Synthesis, Checkpoint, +1

Indic TTS Merged Arrow: Text-to-Speech Data for Indic Languages

Indic TTS Merged Arrow is a dataset published on Kaggle. The title suggests it contains data for text-to-speech synthesis, likely for languages from the Indian subcontinent. Metadata is minimal; the actual content, scale, and structure require verification after download.

Tabular, Audio, Text To Speech, Audio Data, Speech Synthesis, +1

100 Japanese Female Voice Clones from ITA-Corpus Emotion

100 female voice audio clips generated using Qwen3-TTS, based on the public-domain ITA-Corpus Emotion text dataset. The audio is provided in 24 kHz mono WAV format, with each voice having a descriptive label.

Region: us, License: mit, +1
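
Since the listing states a specific audio format, a quick check after download may be useful. The sketch below uses Python's standard `wave` module to read a file's header and compare it against the stated 24 kHz mono format. The function names are hypothetical, and the approach assumes plain PCM WAV files, which is what `wave` reliably reads; the listing does not state the encoding.

```python
import wave


def wav_format(path: str) -> dict:
    """Read basic format fields from a PCM WAV file header."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "sample_rate": rate,
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": wf.getnframes() / rate,
        }


def is_24khz_mono(path: str) -> bool:
    """True if the file matches the format stated in the listing."""
    info = wav_format(path)
    return info["sample_rate"] == 24000 and info["channels"] == 1
```

Running `is_24khz_mono` over every clip before training would flag files that were resampled or exported in stereo by mistake.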

Real World Noise and Music Audio Samples

An audio dataset titled 'Real World Noise/Music' is hosted on Kaggle. The dataset likely contains recordings of environmental noise and music for analysis. Metadata such as column details, size, and license are currently unknown.

Audio, Noise Analysis, Audio Classification, Music Analysis, Signal Processing, +1
