DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

2502 Dataset TTS: Text-to-Speech Audio Samples

2502_dataset_TTS is a Kaggle-hosted collection likely containing audio data for text-to-speech applications. The dataset's specific content, size, and origin are unconfirmed due to minimal metadata. Its title suggests it may include speech samples or synthesis parameters for machine learning model training.

Audio, Text To Speech, Speech Synthesis, Audio Generation, +1

Musical Note Recognition Data with Cross-Validation

A dataset likely focused on the recognition of musical notes from audio signals. The title suggests it includes a cross-validation scheme, which may indicate a structured setup for model evaluation. It is published on Kaggle, but details on its size, origin, and specific content are unavailable.

Audio, Machine Learning, Cross Validation, Music Note Recognition, Audio Processing, +1

Hinglish-TTS-Src: Text and Audio for Hinglish Speech Synthesis

A dataset titled 'hinglish-tts-src' is hosted on Kaggle. The title suggests it contains source materials for text-to-speech synthesis in Hinglish, a code-mixed language of Hindi and English. The dataset's specific content, size, and creation details are unknown from the provided metadata.

Text, Audio, Multilingual, Text To Speech, Speech Synthesis, Hinglish, +1

RTTS_C0C0: Kaggle Dataset

Kaggle hosts the RTTS_C0C0 dataset. The title suggests it may relate to a specific project or codename. Its content and structure require verification after download.

Tabular, C0c0, Kaggle Dataset, Rtts, +1

New England National Scenic Trail Centerline for Connecticut and Massachusetts

A 235-mile polyline feature depicting the New England National Scenic Trail from the Long Island Sound in Guilford, Connecticut, to the Massachusetts/New Hampshire border. The dataset was created by combining work from the Connecticut Forest & Park Association and the Appalachian Mountain Club. It was last updated on March 4, 2026.

Audio, Geospatial, Connecticut, Triple M Trail, Net, Trail Geospatial, National Park Service, Ecological Framework Landscapes Landscape Dynamics, Massachusetts, Nps, Named Entity Recognition, New England Trail, National Scenic Trail, Centerline, Hiking Infrastructure, Mattabesett Trail And Metacomet Trail Metacomet Mo, Neen, Menunkatuck Trail, Metacomet Monadnock Trail, Northeast Region, Interior Region 1, Nst, Trail, New England, Ir1, New England National Scenic Trail, +1

Kikuyu ASR Preprocessed Audio Data

Kikuyu language audio data, preprocessed for automatic speech recognition tasks. The dataset was published on Hugging Face by the author InterstellarCG and was last updated on March 16, 2026. The specific content, scale, and preprocessing methods require verification after download.

Audio, Audio Preprocessing, Kikuyu Language, Speech Recognition, Automatic Speech Recognition, +1

Dysarthric Speech Audio Recordings

Dysarthric speech data published on Kaggle. The dataset likely contains audio recordings of speech affected by motor speech disorders. Specific details on size, collection method, and origin are not provided in the available metadata.

Audio, Medical Speech, Audio Processing, Speech Recognition, Dysarthria, +1

Cleaned ASR Transcripts for Speech Recognition Tasks

Cleaned ASR Transcripts is a text dataset published on Hugging Face by the author bingbangboom. The dataset likely contains processed transcripts generated by an Automatic Speech Recognition (ASR) system. It was last updated on March 24, 2026.

Text, JSON, Size Categories: 10K<n<100K, Task Categories: text-generation, Transcripts, Library: polars, Language: en, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Speech Processing, Region: us, Task Categories: automatic-speech-recognition, License: mit, Automatic Speech Recognition, +1

VoxCeleb2: Speaker Recognition Audio Dataset

VoxCeleb2 is an audio dataset published on Hugging Face by the author 'humanify'. The dataset was last updated on March 25, 2026. Its specific content, size, and license details are not provided in the available metadata.

Audio, WEBDATASET, Size Categories: 10K<n<100K, Voxceleb, Audio Dataset, Library: webdataset, Speaker Verification, Modality: text, Library: mlcroissant, Library: datasets, Region: us, Speech Recognition, +1

Train Music: Audio Data for Machine Learning

Audio data related to music, likely intended for training machine learning models. The dataset is hosted on Kaggle, but its specific contents, size, and creation details are not provided in the metadata. Users must download the dataset to verify its exact composition and quality.

Audio, Training Data, +1

Common Voice 25.0: Large-Scale Pashto Speech Data for ASR

Large-scale CC0 Pashto speech dataset for Automatic Speech Recognition (ASR). The dataset is part of the Common Voice project, version 25.0, and is hosted on Kaggle. Its specific collection method, size, and contributor details are not provided in the available metadata.

Text, Audio, Machine Learning, Audio Data, Cc0 License, Large Scale, Speech Recognition, Pashto Language, Automatic Speech Recognition, +1

Musicprefs: Pairwise Human Preferences for Text-to-Music Systems

A collection of pairwise human preferences for music generated by text-to-music systems. The dataset is hosted on Hugging Face by the author 'i-need-sleep' and was last updated on January 27, 2026. It is intended for research into evaluating and improving generative music models.

Audio, Pairwise Comparison, Music Generation, Human Preferences, Text To Music, Synthetic, Audio Evaluation, +1

Synthetic Vehicle Audio Dataset (VS13-Based)

Synthetic audio data generated based on the VS13 framework, likely containing simulated vehicle sounds. The dataset is hosted on Kaggle, but details on its size, creation method, and specific contents are not provided. Metadata is minimal; actual content requires verification after download.

Audio, Vehicle Sounds, Machine Learning, Audio Processing, Synthetic, Synthetic Audio, +1

Nepali XTTS LJ Speech Dataset: AI-Generated Voice Data

An AI-generated voice dataset for the Nepali language, published on Kaggle. The dataset is likely designed for text-to-speech (TTS) synthesis, modeled after the LJ Speech dataset structure. Its specific size, creation date, and author details are not provided in the available metadata.

Audio, Text To Speech, Speech Synthesis, Voice Cloning, Ai Generated, Nepali Language, Synthetic, +1

Indic Dialect ASR Dataset with 2.8M+ Samples Across 30 Languages

A multilingual automatic speech recognition dataset covering 30 Indic dialects and languages. It contains over 2.8 million audio samples with corresponding transcriptions. The dataset was created by author grushaaaaa and last updated on Hugging Face in February 2026.

Audio, Multilingual, Parquet, Language: grt, Language: doi, Language: awa, Language: mai, Size Categories: 1M<n<10M, Language: ne, Language: brx, Language: bho, License: cc-by-4.0, Language: sd, Language: as, Audio Transcription, Language: kru, Task Categories: automatic-speech-recognition, Language: mwr, Language: or, Language: kok, Multilingual Audio, Language: mni, Automatic Speech Recognition, Language: ks, Language: sat, +1

Russian Telephone Speech: 338 Hours from 460 Native Speakers

UniDataPro's collection features 338 hours of Russian telephone dialogues recorded from 460 native speakers across diverse topics. Updated in January 2026, the data is specifically formatted for automatic speech recognition (ASR) research and model training. It maintains a verified 98% Word Accuracy Rate for its transcriptions.

Audio, CSV, Machine Learning, Library: polars, Modality: audio, License: cc-by-nc-nd-4.0, Size Categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Region: us, Natural Language Processing, Speech Recognition, +1

Pulse 2026: Synthetic Music Data with Engineered Streaming Metrics

Pulse 2026 is a high-fidelity synthetic music dataset with engineered streaming metrics. The dataset appears to focus on music evolution and viral analytics. Its specific source, size, and creation date are unknown.

Tabular, Audio, Streaming Metrics, Synthetic Data, Music Analytics, Music Evolution, Synthetic, +1

Music-Gen-Task3-Split: Audio Data for a Music Generation Task

Music-Gen-Task3-Split is a dataset hosted on Kaggle, likely related to a music generation challenge. The title suggests it contains audio data split for a specific machine learning task, though the exact content and structure are unspecified. No information is available regarding its author, size, or creation date.

Audio, Machine Learning, Music Generation, Audio Processing, +1

English and Spanish Hate Speech Text Data

Hate speech detection data spanning two major languages, English and Spanish. The dataset is hosted on Kaggle, but its specific collection method, size, and annotation details are not provided in the available metadata. Researchers must download the dataset to inspect its volume, annotation schema, and source characteristics.

Text, Audio, Multilingual, Social Media, Text Classification, Hate Speech, +1

Deepspeech Balalaika

This Russian speech corpus contains audio recordings across diverse genres including podcasts, public speeches, YouTube content, audiobooks, and phone calls. The dataset was processed using the BALALAIKA pipeline by the MTUCI lab260 team to provide high-quality annotations for generative speech tasks.

Parquet, Task Categories: text-to-speech, Library: polars, Modality: text, Size Categories: 100K<n<1M, arXiv: 2507.13563, Modality: tabular, Library: mlcroissant, Library: datasets, Library: pandas, License: mpl-2.0, Region: us, Task Categories: automatic-speech-recognition, Language: ru, +1
