DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Speech & Audio Datasets | DataSalon


Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,587 datasets


TTS-Synthesize: Text-to-Speech Audio Samples

A dataset likely containing synthesized audio files for text-to-speech applications. It is hosted on the Kaggle platform. The specific size, source, and creation details are not provided in the available metadata.

Audio, Text To Speech, Speech Synthesis, Audio Generation, +1 more


Egyptian Text-Audio Dataset for TTS Model Training

Egyptian Arabic text-audio pairs collected from YouTube videos by an automated pipeline. The dataset was created by OmarAhmedSobhy and was last updated on 2026-04-25. The pipeline uses forced alignment and automatic speech recognition models to process the audio and text.

Text, Audio, Text To Speech, Egyptian-Arabic, Forced Alignment, Audio Processing, +1 more


TTS Dataset: Batched Annotations

A batched dataset for text-to-speech (TTS) applications, likely containing audio files paired with annotations. It was published by the user humair025 on the Hugging Face platform and was last updated on May 13, 2026. The specific content, scale, and annotation format require verification after download.

Audio, Text To Speech, Speech Synthesis, Annotated Data, +1 more


Spotify and YouTube Music Analytics Data

Kaggle hosts a dataset titled 'Spotify & YouTube Music Analytics'. The dataset likely contains metrics related to music streaming and user engagement from two major platforms. Its specific contents, such as column definitions and data volume, are not described in the provided metadata.

Tabular, Audio, Audio Analytics, Platform Comparison, Music Streaming, +1 more


Cantonese YouTube TTS: Filtered Audio for Speech Synthesis

Cantonese Audio TTS Dataset is a collection for text-to-speech applications that combines alvanlii/cantonese-radio and alvanlii/cantonese-youtube with an additional dataset of equal size. The dataset creator, alvanlii, applied filtering and audio enhancement techniques, including the removal of overlapping voices and music. It was last updated on 2026-04-05.

Text, Audio, Cantonese, Text To Speech, Speech Synthesis, Audio Processing, +1 more


NST: Swedish Automatic Speech Recognition Database (16 kHz)

A speech database created by Nordic Language Technology for developing automatic speech recognition and dictation systems in Swedish. The dataset has been reorganized from its original version to improve its usefulness, with changes to the file and folder structure. It is hosted by KTH and was last updated on March 26, 2026.

Audio, Language: sv, License: CC0 1.0, Speech Database, Region: us, Task Categories: automatic speech recognition, Audio Corpus, Automatic Speech Recognition, Swedish Language, +1 more


MusicGenGT: Music Generation Dataset

MusicGenGT is a dataset hosted on Kaggle, likely related to music generation tasks. Its specific contents, scale, and creation details are not provided in the available metadata. Users must download the dataset to verify its structure, size, and intended applications.

Audio, Machine Learning, Music Generation, Audio Generation, +1 more


OpenScore String Quartets: Scanned and Rendered Score Images with MusicXML Ground Truth

Guangyangmusic derived this dataset from the OpenScore String Quartets corpus for evaluating Optical Music Recognition systems. It contains a subset of string quartets from the 'long 19th century' with both scanned images of real scores and corresponding MusicXML ground truth, plus clean images rendered from the MusicXML. The dataset was last updated on Hugging Face in April 2026.

Audio, Multimodal, Music Scores, Benchmark, Optical Music Recognition, Natural Language Processing, String Quartets, Score Images, MusicXML, +1 more


Calliope: Music Recommendation System Data

Calliope is a dataset for music recommendation systems, sourced from Kaggle. The dataset's specific contents, such as user interactions, song features, or ratings, are not detailed in the available metadata. Its scale, authorship, and creation date are unknown, requiring verification after download.

Tabular, Audio, Music Recommendation, Recommender Systems, Collaborative Filtering, +1 more


Google Waxal ASR Challenge Dataset

Google Waxal ASR Challenge Dataset is likely a collection of audio data for an Automatic Speech Recognition challenge. The dataset is hosted on Kaggle, but its specific content, size, and origin are not detailed in the provided metadata. Further details such as the number of samples, recording conditions, and precise collection dates are unknown.

Audio, Speech Data, Audio Processing, Automatic Speech Recognition, +1 more


TTS-Payload-Final-Auto-Test-V2: Text-to-Speech Synthesis Data

TTS-Payload-Final-Auto-Test-V2 is a dataset hosted on Kaggle. The title suggests it contains data related to text-to-speech synthesis, likely for testing or training machine learning models. No further metadata on size, origin, or content is available.

Audio, Text To Speech, Machine Learning, Audio Synthesis, +1 more


TTS Output: Synthesized Speech Audio Samples

A dataset of audio outputs from a text-to-speech system. The dataset is hosted on Kaggle, but its specific size, creation date, and author are unknown. The content likely contains synthesized speech files generated from text inputs.

Audio, Text To Speech, Speech Synthesis, Audio Generation, +1 more


Vocal Bursts: 28,564 Non-Speech Audio Samples Across 18 Categories

A curated collection of 28,564 non-speech vocal burst audio samples. The dataset spans 18 categories, including laughter, crying, cough, and sigh. It was created by TTS-AGI and last updated on Hugging Face in March 2026.

Audio, Vocal Bursts, Sound Classification, Non Speech, Human Sounds, +1 more


Massachusetts Shoreline Positions From 1844 To 1994

Five historic shoreline positions for Massachusetts from 1844 to 1994 document coastal erosion and accretion. The dataset was produced by the Massachusetts Coastal Zone Management office in collaboration with the USGS and Woods Hole Oceanographic Institution. It updates a previous analysis from the mid-1800s to 1978 with new 1994 shoreline data.

Geospatial, Coastal erosion, Historical Data, Shoreline Change, Geospatial Analysis, +1 more


English Conversational Speech with Multiple Speakers

A collection of English conversational speech audio featuring multiple speakers. The dataset is hosted on Kaggle and is tagged for machine learning, speech synthesis, and synthetic data applications. Details on its size, origin, and specific collection methodology are not provided in the available metadata.

Audio, Machine Learning, Speech Synthesis, Multi Speaker, English Speech, Conversational Audio, Speech Recognition, Synthetic, +1 more


OpenScore Lieder: 19th-Century Song Scores for Optical Music Recognition Evaluation

OpenScore Lieder is a dataset derived from a corpus of 19th-century songs in MuseScore/MusicXML format, created by Gotham & Jonas in 2022. It is built for evaluating Optical Music Recognition systems under piano-only, full-page conditions. The dataset provides both camera page images from source PDFs and clean page images rendered from ground-truth MusicXML.

Image, Audio, Multimodal, 19th century, Music Scores, Benchmark, Optical Music Recognition, Lieder, Natural Language Processing, Piano, +1 more


Cv Mn 24.0: Mongolian Speech Audio Samples from Common Voice

A Mongolian-language subset of the Mozilla Common Voice speech recognition dataset, containing 6,018 audio samples totaling 9.12 hours. The data is split into training, validation, and test sets, with average clip durations between 5.14 and 5.73 seconds. It was uploaded by user 'bilguun' to Hugging Face and last updated on April 13, 2026.

Audio, Common Voice, Audio Dataset, Speech Recognition, Mongolian Language, +1 more


Massachusetts Coastal Land Cover Classifications from 1996

1994-1996 land cover classifications for the Massachusetts coastal zone, derived from 10 full or partial Landsat Thematic Mapper scenes. The data was produced by the Multi-Resolution Land Characteristics program for the Coastal Change Analysis Project to establish environmental baselines. It was later reprojected into the Massachusetts State Plane coordinate system by the Massachusetts Office of Coastal Zone Management in October 2006.

Geospatial, Land Use Change, Satellite Imagery, Benchmark, Geospatial Analysis, Coastal Land Cover, +1 more


Saint Kitts and Nevis: Daily Port Calls and Shipment Volume Estimates

PortWatch tracks daily port call counts and shipment volume estimates in metric tons for maritime hubs in Saint Kitts and Nevis. This time-series dataset provides high-frequency monitoring of trade activity updated through March 2026.

Trade, Ports, +1 more


SiriusXM Channel Lineup and Real-Time Airplay Data

A 2026 dataset from Rebrowser provides a satellite radio channel lineup and real-time track history across SiriusXM music channels. The full dataset contains 59.9 million records and is updated daily, though this Hugging Face version is a limited sample. It includes two primary entities: channel metadata and play-by-play track logs.

Tabular, Audio, Time Series, Geospatial, Satellite Radio, Airplay History, Media Catalog, Music Genres, +1 more
