DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,585 datasets

Music Text Queries Dataset

Music Text Queries is a dataset hosted on Hugging Face by author sb-lucas. The title suggests it contains textual queries related to music, likely for information retrieval or search tasks. The dataset was last updated on 2026-05-29.

Text, Audio, Text Queries, Information Retrieval (+1 more)

F5TTS-Base-Model: A Text-to-Speech Foundation Model

A base model for text-to-speech synthesis, published on Kaggle. The model's architecture, training data, and performance characteristics are not detailed in the provided metadata. Its origin, size, and intended use require verification after accessing the dataset.

Audio, Text To Speech, Speech Synthesis, Base Model (+1 more)

FunASR: Multilingual Speech Recognition Model

FunASR-Nano is a flagship model for automatic speech recognition supporting 31 languages. It is described as an LLM-ASR model and is the default recommendation within its platform. The dataset likely contains audio data and associated metadata for training or evaluating this model.

Audio, Multilingual, Machine Learning, Audio Processing, Speech Recognition (+1 more)

Saint Kitts and Nevis: Active IATI Humanitarian and Development Aid Activities

Active humanitarian and development aid activities in Saint Kitts and Nevis are documented in this CSV provided by the International Aid Transparency Initiative (IATI). The records track ongoing projects and funding initiatives as of March 2026.

Who Is Doing What And Where 3W 4W 5W, Funding (+1 more)

Acoustic Characteristics of Musical and Complex Sounds

Sixteen distinct sounds, comprising eight musical and eight complex sounds, are characterized in this dataset. Mako Katagiri published this small dataset on figshare in April 2026. It is stored in a single 5.5 KB Excel (XLS) file.

Tabular, Excel, Psychoacoustics, Acoustics, Music Science, Sound Characteristics (+1 more)

WAXAL ASR Challenge: Audio Data for Speech Recognition

Audio data likely associated with an Automatic Speech Recognition (ASR) challenge, published on Kaggle. The dataset's specific content, size, and origin are not detailed in the available metadata.

Audio, Machine Learning, Challenge, Automatic Speech Recognition (+1 more)

Voiceconvdesign: Conversational AI Audio Dataset with TTS Prompts

Voiceconvdesign is a dataset of conversational turns generated for text-to-speech (TTS) voice design. It contains structured records linking system prompts, agent transcripts, and synthesized audio clips. The dataset was authored by ShiniChien and last updated on Hugging Face on 2026-05-12.

Tabular, Audio, Text To Speech, Conversational AI, Voice Design, Audio Synthesis (+1 more)

CommonVoice En Min Vad: English Speech Audio Samples

An audio dataset likely containing English speech samples, sourced from the Common Voice project. The dataset was published by the author WTFO on the Hugging Face platform. It was last updated on June 2, 2026.

Audio, Common Voice, English Language, Speech Recognition (+1 more)

AUDETER: A Large-scale Dataset for Deepfake Audio Detection

AUDETER (AUdio DEepfake TEst Range) is a large-scale dataset for deepfake audio detection. It consists of over 4,500 hours of synthetic audio generated by 11 recent text-to-speech models and 10 vocoders. The dataset was created by author wqz995 and was last updated on April 4, 2026.

Audio, Audio Forensics, Speech Synthesis, Benchmark, Deepfake Detection, Large Scale, Synthetic, Synthetic Audio (+1 more)

25,000 Modern Standard Arabic Speech Clips with Full Diacritization

25,000 fully diacritized Modern Standard Arabic text and audio pairs synthesized by a single Saudi male neural voice. The dataset was created by HeshamHaroon and was last updated on April 20, 2026. Audio clips are rendered at 48 kHz / 16-bit PCM and are organized across 10 thematic categories.

Text, Audio, Text To Speech, Modern Standard Arabic, Arabic Speech, Speech Synthesis, Natural Language Processing, Diacritization (+1 more)

Soroll-IA: Weakly Labeled Audio for Industrial Port Monitoring

Soroll-IA is a weakly labeled audio dataset designed for real-world industrial port monitoring. The dataset likely contains audio recordings from port environments, which can be used for sound event detection and classification tasks. Its specific collection methodology, size, and provenance details are not provided in the available metadata.

Audio, Industrial Monitoring, Port Environment, Weakly Labeled (+1 more)

Omnidistil: Multimodal Conversational Speech Dataset

A dataset of conversational speech audio paired with transcripts and prompts. It contains turn-based dialogue data with columns for conversation identifiers, speaker agents, text prompts, transcripts, and audio files. The dataset was uploaded by ShiniChien to Hugging Face and last updated on 2026-05-15.

Tabular, Audio, Audio Dataset, Speech Synthesis, Conversational AI, Multimodal Dialogue (+1 more)

Background Music Audio Collection

A dataset titled 'bgmusic' is hosted on Kaggle. The dataset's specific content, size, and structure are not detailed in the available metadata. Further details such as the author, license, and update history are unknown and require verification from the source page.

Audio, Background Music (+1 more)

Google WAXAL ASR Challenge: Original Audio Data

Original audio data for the Google WAXAL Automatic Speech Recognition (ASR) Challenge. The dataset is hosted on Kaggle, but its specific size, content details, and creation date are not provided in the available metadata. Further verification is required to confirm the exact nature and scope of the audio recordings.

Audio, Audio Data, ASR Challenge, Google WAXAL, Speech Recognition (+1 more)

Google WAXAL ASR Challenge: Original Automatic Speech Recognition Data

An audio dataset associated with the Google WAXAL Automatic Speech Recognition (ASR) challenge. The dataset's specific content, size, and collection details are not provided in the available metadata. It is hosted on the Kaggle platform.

Audio, Machine Learning, Challenge Dataset, Audio Processing, Speech Recognition (+1 more)

Synthetic ASR ZH: Chinese Speech Recognition Data

Synthetic data for Chinese Automatic Speech Recognition (ASR) tasks, published on Kaggle. The dataset's author, organization, size, and last update date are unknown.

Text, Audio, Chinese Language, Synthetic Data, Speech Recognition, Synthetic (+1 more)

dttts_paper2: Speech Synthesis Data for Acoustic Modeling

A dataset titled 'dttts_paper2' is hosted on Kaggle. The title suggests a connection to speech synthesis, likely for acoustic modeling tasks. No further metadata, such as author, size, or specific content, is provided.

Tabular, Audio, Text To Speech, Speech Synthesis, Acoustic Modeling (+1 more)

NOAA Coastal Orthoimagery of Maine with Infrared and Tidal Data

NOAA's Integrated Ocean and Coastal Mapping initiative produced orthorectified mosaic image tiles for coastal Maine. The dataset includes true color (RGB) and infrared (IR) imagery for Cutts Island, Penobscot, and Reversing Falls, captured from June 5 to 21, 2011, with a ground sample distance of 0.50 meters per pixel. Imagery is provided in TIFF format with associated metadata and browse graphics.

Image, Geospatial, Computer Vision, Coastal Mapping, Orthoimagery (+1 more)

Music Listening Time Prediction Dataset

A dataset for predicting music listening time, published on Kaggle. The dataset's specific size, features, and collection methodology are not detailed in the available metadata. Its content and structure require verification after download.

Tabular, Time Series, Behavioral Prediction, Music Listening (+1 more)

Music Extract Labeled Features

Music Extract Labeled Features is a dataset published on Kaggle. The dataset likely contains audio samples or tracks with associated feature labels. Metadata is minimal; actual content requires verification after download.

Audio, Labeled Data, Audio Features (+1 more)
