DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Economic Impact Indices of U.S. Music Zones on Local Businesses

This dataset aggregates six indices measuring the economic impact of 109 Music Zones in the United States. The indices assess venue concentration, tourism proximity, business counts, non-chain business presence, total annual economic output, and supported employment.

Arts And Humanities, Business and Management, Agglomeration Theory, Independent Music Venues, Musi, +1

Tts Emotional: Text-to-Speech Data with Emotional Labels

A dataset titled 'Tts Emotional' published on the Hugging Face platform by SeifElden2342532. The dataset was last updated on March 3, 2026. Its title suggests it likely contains audio data for text-to-speech synthesis with emotional attributes.

Audio, Text To Speech, Speech Synthesis, Emotional Speech, +1

Tech-Enabled Restaurant Market Data for Massachusetts

A sample of restaurant market data from BeamStation, focusing on technology-ready establishments within Massachusetts, United States. The dataset is a free sample, but the total number of rows, columns, and specific collection date are not provided. The original author and organization are unknown.

Tabular, Restaurant Market, Tech Enabled Business, Massachusetts, Business Data, +1

CodecFake+: A Large-Scale Neural Audio Codec-Based Deepfake Speech Dataset

A large-scale dataset for deepfake speech detection, created by the CodecFake organization and released in 2025. It includes the CoRS and CoSG subsets, providing audio samples and corresponding protocol and label files for research in synthetic audio generation and detection.

Audio, Audio Forensics, Neural Codec, Speech Synthesis, Audio-Deepfake, Large Scale, +1

CoSHE-Eval: Hindi-English Code-Switching Speech Recognition Benchmark

India-focused conversational speech data for evaluating Automatic Speech Recognition systems on Hindi-English code-mixed utterances. The dataset was curated by soketlabs and last updated on the Hugging Face platform in January 2026. It focuses on natural bilingual contexts where Hindi in Devanagari script and English in Latin script co-occur within the same utterance.

Audio, ASR Benchmark, Code Switching, Benchmark, Hindi English, Speech Recognition, +1

Kanitts Speaker Similarity Dataset for Voice Comparison

A speaker similarity dataset created by thezholdoshbekov, hosted on Hugging Face and last updated in March 2026. The dataset is structured in a tabular format and includes text modality, as indicated by platform tags. It is designed for tasks involving the comparison and identification of speaker voices.

Tabular, Audio, Parquet, Size Categories: 1K<n<10K, Library: polars, Modality: text, Modality: tabular, Library: mlcroissant, Library: datasets, Library: pandas, Tabular Data, Voice Identification, Speech Processing, Region: us, Speaker Similarity, +1

English High-Fidelity Text-to-Speech Audio Samples

Over 100,000 audio samples for text-to-speech applications, hosted on Hugging Face by datadriven-company. The dataset includes text and corresponding high-fidelity speech audio. It was last updated in March 2026.

Text, Audio, Parquet, Text To Speech, Library: polars, Library: dask, Modality: audio, Speech Synthesis, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Library: datasets, English Language, Region: us, Audio Generation, +1

Bahraini Arabic Speech Corpus with 90,421 Utterance Clips

Bahraini Speech Dataset is a Bahraini Arabic speech corpus built from publicly available podcast and video content. It contains 90,421 single-speaker utterance clips with aligned transcriptions, created by Hishambarakat and last updated on January 23, 2026.

Audio, Dialect Modeling, Speech Corpus, Natural Language Processing, Bahraini Arabic, Automatic Speech Recognition, +1

English Speech Recognition Dataset With Expert Annotations

SPGISpeech is a monolingual English dataset for automatic speech recognition tasks. The dataset is categorized as containing between 1 million and 10 million data instances. It was created by the author 'kensho' and was last updated in January 2026.

Parquet, Source Datasets: original, License: other, Library: polars, Library: dask, Modality: audio, Size Categories: 1M<n<10M, Language: en, Language Creators: found, arXiv: 2104.02014, Modality: text, Library: mlcroissant, Library: datasets, Region: us, Task Categories: automatic speech recognition, Multilinguality: monolingual, Annotations Creators: expert generated, +1

Universe-Merged-withzero-noASR: Audio Dataset

An audio dataset titled 'universe-merged-withzero-noASR' is hosted on Kaggle. The dataset's specific content, scale, and creation details are unknown from the provided metadata. Its title suggests it may involve merged audio data, possibly excluding automatic speech recognition (ASR) components.

Audio, Merged Data, Audio Processing, +1

ASR Full Bundle: Speech Recognition Dataset

ASR Full Bundle likely contains audio data for training automatic speech recognition systems. The dataset is hosted on Kaggle, but its specific contents, size, and origin are unknown. Users must download the dataset to verify its actual scope and quality.

Audio, Speech Data, Audio Processing, Automatic Speech Recognition, +1

ASR-50hour_chunk: 50-Hour Speech Corpus for Automatic Speech Recognition

ASR-50hour_chunk, attributed to lipighor, is a dataset for automatic speech recognition (ASR) tasks published on Kaggle. The title suggests it contains approximately 50 hours of audio data, likely segmented into chunks. The dataset's specific source, collection method, and detailed contents require verification after download.

Audio, Speech Recognition, Automatic Speech Recognition, +1

Sung Poetry Recordings from Tajik Badakhshan in 1998

Audio recordings of sung poetry from the Pamir Mountains in Tajikistan's Gorno-Badakhshan Autonomous Region, collected during fieldwork in 1998. The recordings were made by Jan van Belle and are part of a larger collection spanning multiple years.

Arts And Humanities, Pamir, +1

Speech Utterances: A Collection of Human Non-Speech Vocal Sounds

A collection of human non-speech vocal sound datasets in WebDataset format, useful for audio classification tasks. The collection includes the 'NonSpeech7k' subset with 7,014 samples across 7 classes like breathing and laughing, sourced from Zenodo. The dataset was authored by 'gijs' and last updated on Hugging Face in January 2026.

Audio, Human Vocalizations, Audio Classification, Non Speech Vocal Sounds, Speech Utterances, +1

Pittsburgh Sleep Quality Index Survey Responses

Survey responses measuring sleep quality using the Pittsburgh Sleep Quality Index (PSQI). The data is sourced from Kaggle and likely contains self-reported assessments from healthy individuals. Specific details on the number of records, collection period, and original authors are not provided in the metadata.

Tabular, Health Metrics, Sleep Quality, Survey Data, +1

Muse: 116,000 Synthetic Songs with Lyrics and Style Descriptions

Muse contains 116,000 synthetic music tracks in Chinese and English, synthesized using SunoV5 and paired with automatically generated lyrics and style descriptions. Created by bolshyC and introduced in early 2026, the collection supports research into reproducible long-form song generation. The data is divided into Chinese (CN) and English (EN) subsets to facilitate multilingual audio modeling.

Audio, arXiv: 2601.03973, Language: zh, Language: en, Size Categories: 100K<n<1M, Task Categories: text to audio, Song Generation, Region: us, License: mit, +1

GAMETES_Epistasis_3-Way_20atts_0.2H_EDM-1_1

A dataset from the GAMETES repository, which generates simulated genetic data for studying epistasis. The specific file name suggests it models a 3-way epistatic interaction with 20 attributes and a heritability of 0.2. No row count, column details, or sample data are available.

GAMETES Epistasis 2-Way 20 Attributes 0.4H EDM-1 1

A release from the GAMETES repository for generating epistasis models. The specific configuration is a 2-way epistasis model with 20 attributes and a heritability of 0.4. Details on row count, columns, and sample data are unavailable.

GAMETES Epistasis 2-Way 20 Attributes with 0.1 Heritability

A GAMETES dataset for epistasis detection, focusing on 2-way interactions with 20 attributes and a heritability of 0.1. The dataset is generated using the EDM-1 model. Specific details on row count, columns, and sample data are unavailable.

Kannada Speech Recognition Benchmark Dataset

A benchmark dataset for Kannada speech recognition tasks, created by thezholdoshbekov. The dataset was last updated in March 2026 and is hosted on the Hugging Face platform with a size category of 1K to 10K entries. It is associated with libraries for tabular and text data processing.

Audio, Parquet, Size Categories: 1K<n<10K, Library: polars, Audio Classification, Modality: text, Modality: tabular, Library: mlcroissant, Library: datasets, Library: pandas, Region: us, Benchmark Dataset, Speech Recognition, +1
