DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Massachusetts Tree Ring Chronology from Alander Mountain (322 to -37 BP)

NOAA/WDS Paleoclimatology archives a tree ring dataset from Alander Mountain, Massachusetts. The chronology covers 359 years, from 322 to -37 calendar years before present (1628 to 1987 CE, counting from the conventional 1950 reference year). NOAA National Centers for Environmental Information (NCEI) published this data in 1987.

Tags: Time Series, Geospatial, Tree Ring, Paleoclimatology, North America, Climate Reconstruction

PASSCAL NOMAD Seismic Time Series from Southern New York

Seismic time series data were collected from four sites in southern New York between June 1997 and February 1999. The dataset contains continuous recordings at 20 samples per second and 1 sample per second from instruments deployed by the PASSCAL instrument center. Preliminary analysis focused on shear-wave splitting measurements to study upper mantle anisotropy.

Tags: Time Series, Geospatial, Shear Wave Splitting, Passcal Experiment, Geophysics, Seismic Anisotropy, Upper Mantle

CommonVoice: A Crowdsourced Speech Recognition Dataset

CommonVoice is a dataset hosted on Kaggle. The title suggests it is a speech and audio dataset, likely containing voice recordings. The specific content, size, and collection details are not provided in the available metadata.

Tags: Tabular, Audio, Audio Data, Voice Corpus, Speech Recognition

Chinese Musical Instruments Timbre Evaluation Database with Subjective Scores

The Chinese Musical Instruments Timbre Evaluation Database contains subjective timbre evaluation scores for 37 Chinese and 24 Western instruments. The data was collected from Chinese participants with musical backgrounds in a subjective evaluation experiment using 16 descriptive terms. The dataset also includes 10 spectrogram analysis reports.

Tags: Tabular, Multimodal, Music Instruments, Subjective Scores, Benchmark, Chinese Music, Timbre Evaluation, Spectrogram Analysis

TTS-Hungarian: 702 Hours of Hungarian Speech from Audiobooks

TTS-Hungarian is a large-scale speech dataset containing 253,116 audio samples totaling 702 hours, derived from the Magyar Elektronikus Könyvtár (MEK) collection of Hungarian audiobooks. It features recordings from 100 unique speakers, with an average sample duration of 10.0 seconds and an average DNSMOS quality score of 3.68. The dataset was created by the author 'datadriven-company' and was last updated on the Hugging Face platform in February 2026.

Tags: Audio, Text To Speech, Audiobooks, Hungarian Language, Large Scale, Speech Recognition
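The figures quoted for TTS-Hungarian can be cross-checked with simple arithmetic. The sketch below uses only the numbers from the description above: it multiplies the sample count by the average duration and compares the result with the stated total. The helper name `implied_hours` is illustrative, and because the listing rounds the average to 10.0 seconds, a small gap between the implied and stated totals is expected.

```python
def implied_hours(num_samples: int, avg_seconds: float) -> float:
    """Total duration in hours implied by a sample count and a mean length."""
    return num_samples * avg_seconds / 3600


# Figures quoted in the listing for TTS-Hungarian.
NUM_SAMPLES = 253_116
AVG_SECONDS = 10.0   # rounded value from the listing
STATED_HOURS = 702

estimate = implied_hours(NUM_SAMPLES, AVG_SECONDS)
relative_gap = abs(estimate - STATED_HOURS) / STATED_HOURS

print(f"implied total: {estimate:.1f} h")
print(f"gap vs. stated total: {relative_gap:.2%}")
```

The implied total comes out at roughly 703 hours, well under one percent away from the stated 702, so the three figures are mutually consistent.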

Codemix_TTS: Text-to-Speech Data for Code-Mixed Language

A Kaggle dataset titled 'codemix_tts' likely contains audio data for text-to-speech synthesis. The dataset's specific content, such as the number of audio samples or languages covered, is not detailed in the provided metadata. It is hosted on the Kaggle platform, but the author, organization, and last update date are unknown.

Tags: Audio, Text To Speech, Speech Synthesis, Code Mixing

Synthetic Turkish TTS Data: 10K-100K Audio Pairs Across 13 Domains

Between 10,000 and 100,000 synthetic Turkish audio-text pairs across 13 specialized domains were generated by Anilosan15 and updated in March 2026. The data includes synthesized speech for sectors such as finance, healthcare, and technical support, created using a high-quality TTS model.

Tags: Parquet, Size Categories: 10K<n<100K, Task Categories: Text To Speech, Library: polars, Library: dask, Modality: audio, Modality: text, Library: mlcroissant, Library: datasets, Language: tr, Region: us, Task Categories: Automatic Speech Recognition, License: cc

Mongolian Speech Recognition Data from Common Voice with Translations

This dataset by Ganaa0614 was last updated on 2026-04-14. The title suggests it contains Mongolian speech audio and corresponding text translations, likely derived from the Common Voice project. The specific volume of audio clips and translated sentences is unknown.

Tags: Text, Audio, Audio Data, Translation, Speech Recognition, Mongolian Language

U.S. State Government Party Control, 1834-1985

United States historical data on the partisan composition of state legislatures and the party affiliation of governors from 1834 to 1985. The collection provides annual and biennial records for each legislature. Data from 1834-1868 were collected by W. Dean Burnham of MIT, with subsequent years added by ICPSR staff.

Tags: Tabular, Party Affiliation, US State Governments, Arithmetic, Computer Science, Mathematics, State Computer Science, Historical Data, Division Mathematics, Political Science

ChatTS Evaluation Datasets A and B: Time-Series Question Answering

Time-series question answering evaluation data for ChatTS, sourced from Kaggle. The dataset's author, organization, and specific size are unknown. Its last update date is also unspecified.

Tags: Text, Time Series, Chat Ts, Evaluation, Benchmark, Question Answering

Automatic Speech Recognition Error Robustness Dataset for Sentence Classification

Sentence classification datasets containing Automatic Speech Recognition (ASR) errors, hosted on AWS Open Data. The data is provided by Amazon and is associated with a research project on ASR error robustness. The license details are available via a linked GitHub repository.

Tags: Text, Audio, Amazonscience, Machine Learning, ASR Error, Sentence Classification, Natural Language Processing, Speech Recognition, Deep Learning

Standard Moroccan Amazigh Audio and Text with <1,000 Records

Standard Moroccan Amazigh audio recordings and text transcripts totaling fewer than 1,000 records, created by abdelhaqueidali and updated in March 2026. The dataset provides raw, unprocessed speech data for the development of Automatic Speech Recognition and Text-to-Speech models.

Tags: Audio, AUDIOFOLDER, Text To Speech, Modality: audio, Size Categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, License: CC BY 4.0, Language: ber, Amazigh, Region: us, Language: zgh, Automatic Speech Recognition

Customer Service Persian Diarization Dataset with 80 Hours of Synthetic Speech

The customer_service_persian_diarization_dataset is a synthetic multi-speaker speech dataset designed for training and evaluating speaker diarization models in Persian (Farsi). It contains approximately 80 hours of audio, built using utterances from a customer service dataset and processed through a synthesis framework to simulate realistic conversational dynamics. The dataset was created by atiyehghm and was last updated on the platform in February 2026.

Tags: Audio, Customer Service, Multi Speaker, Persian Language, Synthetic Speech, Synthetic, Speech Diarization

Multilingual Speech Sample from Global Contributor Network

A sample from the Silencio corpus, which contains over 100,000 hours of speech data. The full dataset is collected from a community of over 2 million contributors across more than 180 countries and 100 languages.

Tags: AUDIOFOLDER, Task Categories: Text To Speech, Language: en, Size Categories: n<1K, Modality: text, Library: mlcroissant, Task Categories: Audio Classification, Library: datasets, Language: yo, Multilinguality: multilingual, License: CC BY-NC 4.0, Region: us, Task Categories: Automatic Speech Recognition, Language: zu, Language: es, Language: de, Voice AI

LoquaciousSet: 25,000 Hours of Heterogeneous English Speech Recognition Data

25,000 hours of transcribed English speech form the core of this dataset for automatic speech recognition research. The collection includes read and spontaneous speech in both clean and noisy acoustic conditions, organized into subsets of varying size. SpeechBrain authored the dataset, which was last updated on February 11, 2026.

Tags: Audio, English, Speech Recognition, Transcribed Speech

Car-Voice-Qwen3: Text-to-Speech Models for In-Vehicle Systems

Car-Voice-Qwen3-TTS-Models is a collection of text-to-speech models likely designed for automotive voice interfaces. The dataset is hosted on Kaggle, but its specific contents, scale, and creation details are not provided in the available metadata. Further verification is required to determine the exact model architectures, audio samples, and performance characteristics included.

Tags: Audio, Text To Speech, Qwen, Speech Synthesis, Car Voice

Car-Voice-Qwen3-TTS-Wheels: Text-to-Speech Audio Samples

A collection of audio samples likely generated by a text-to-speech model named Qwen3, potentially for automotive voice interface applications. The dataset is published on Kaggle, but its specific size, creation date, and author are unknown. The content appears to focus on synthesized speech, possibly for testing or training voice systems.

Tags: Audio, Text To Speech, Speech Synthesis, Automotive Voice, AI Voice

TurkmenSpeech: 251 Hours of Turkmen Audio with Transcriptions for ASR

TurkmenSpeech is a dataset containing 251.86 hours of Turkmen speech audio with transcriptions, created by rozumov and last updated in February 2026. It comprises 119,847 audio clips sampled at 16,000 Hz, intended for training and evaluating Automatic Speech Recognition models. This dataset is described as one of the largest publicly available Turkmen speech collections.

Tags: Audio, ASR Training, Audio Transcription, Turkmen Language, Speech Recognition
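For a sense of scale, the TurkmenSpeech figures above can be turned into an average clip length and a raw sample count. The sketch below uses the hours, clip count, and 16,000 Hz rate from the description; the storage estimate additionally assumes mono 16-bit PCM, which the listing does not state, so treat that last number as a rough upper-level guess rather than a property of the dataset.

```python
# Figures quoted in the listing for TurkmenSpeech.
TOTAL_HOURS = 251.86
NUM_CLIPS = 119_847
SAMPLE_RATE_HZ = 16_000

total_seconds = TOTAL_HOURS * 3600
avg_clip_seconds = total_seconds / NUM_CLIPS
total_samples = round(total_seconds * SAMPLE_RATE_HZ)

# Assumption (not in the listing): mono audio stored as uncompressed 16-bit PCM.
BYTES_PER_SAMPLE = 2
approx_gib = total_samples * BYTES_PER_SAMPLE / 2**30

print(f"average clip length: {avg_clip_seconds:.2f} s")
print(f"total audio samples: {total_samples:,}")
print(f"uncompressed size under the PCM assumption: {approx_gib:.1f} GiB")
```

Clips average a little over 7.5 seconds each, a typical utterance length for ASR training data.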

Peoples Speech ASR Clean: Filtered Audio Samples for Speech Recognition

OpenSpeechHub provides a filtered dataset for automatic speech recognition. The dataset has been processed to remove samples with fewer than three words, repetitive tokens, or chat token leaks. It was last updated on March 31, 2026.

Tags: Audio, Audio Processing, Speech Recognition, Filtered Dataset

Kazattsd B1 B2 B3: Kazakh Speech Audio Data

Kazattsd B1 B2 B3 is a speech audio dataset authored by 'issai' and published on the Hugging Face platform. The dataset's title suggests it contains Kazakh language audio recordings, potentially categorized by proficiency levels B1, B2, and B3. It was last updated on April 15, 2026, but specific details on size, format, and content are not provided in the metadata.

Tags: Audio, Speech Audio, Linguistics, Kazakh Language
