DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,587 datasets

CompSpoof V2: 250,000 Audio Samples for Component-Level Anti-Spoofing

CompSpoof V2 contains over 250,000 audio samples totaling approximately 283 hours, developed by XuepingZhang for component-level anti-spoofing research. The data simulates real-world acoustic scenarios where speech, environmental sounds, or both components are spoofed, with each sample provided at multiple sampling rates.

WEBDATASET, Size Categories: 10K<n<100K, Modality: audio, Library: webdataset, Modality: text, Library: mlcroissant, Library: datasets, License: CC BY-NC 4.0, Region: US, +1 more

Saint Kitts and Nevis: VIEWS Monthly Conflict and Fatality Forecasts

Monthly conflict forecasts for Saint Kitts and Nevis produced by the Violence & Impacts Early-Warning System (VIEWS) consortium. The system generates predictive data for violent conflict and fatalities up to 36 months in advance using iterative research models. This CSV-formatted data is updated monthly and includes HXL tags for humanitarian interoperability.

HXL, Fatalities, Conflict Violence, Forecasting, +1 more

Music Audio Data from AudioFLAN

Music is an audio dataset published on the Hugging Face platform by AudioFLAN. The dataset was last updated on 2026-05-14 07:03:08. Its content, size, and structure are not specified in the available metadata.

Audio, Audioflan, +1 more

North American Butterfly and Moth Verified Occurrence Records

Verified occurrence and life history data for butterflies and moths across North America. The project aggregates quality-controlled observations from citizen scientists, museum collections, literature, and professional lepidopterists. BAMONA is directed by Kelly Lotts and Thomas Naberhaus at Montana State University.

Tabular, Geospatial, Citizen Science, Species Occurrence, Lepidoptera, Geospatial Data, Biodiversity, +1 more

English Audio Transcripts with 3.4 Million Hours of Speech

OLMoASR-Pool contains approximately 3.4 million hours of audio and 18.8 million unique transcripts collected from the public internet. It was created by AllenAI to train English speech recognition models and includes a variety of speaking styles, accents, and audio setups.

JSON, Library: polars, Size Categories: 10M<n<100M, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Region: US, arXiv: 2508.20869, License: ODC-BY, +1 more

ASOS Ceilometer Cloud Height Measurements from 25 U.S. Stations

ASOS 30-Second Ceilometer Data is a high-resolution time-series of cloud layer observations from Automated Surface Observing System (ASOS) stations. The dataset contains 30-second samples of cloud base height, layer thickness, and sensor status from 25 reference sites across the contiguous United States. It is archived by the National Climatic Data Center (NCDC) under NOAA, with the earliest records from June 1998.

Time Series, Atmospheric Science, Cloud Observation, Meteorology, Weather Station Data, +1 more

OVSpeech: Open-Vocabulary Instruct Text-to-Speech Dataset

OVSpeech is a dataset built for the ICASSP 2026 paper titled 'OV-InstructTTS: Towards Open-Vocabulary Instruct Text-to-Speech'. It is constructed upon the ContextSpeech framework and is authored by y-ren16. The dataset was last updated on the Hugging Face platform in April 2026.

Audio, Text To Speech, Speech Synthesis, Open Vocabulary, Audio Generation, +1 more

RWKV-ASR: Speech Recognition Data for Frozen RWKV Language Models

RWKV-ASR is an exploratory experiment to enable frozen pretrained RWKV language models to accept speech input. The dataset, created by author 'echodict', is hosted on Hugging Face and was last updated on 2026-04-01. It follows the SLAM_ASR approach to bridge the gap between text-trained LLMs and speech recognition tasks.

Audio, Multimodal, Experimental, Multimodal AI, Language Model, Region: US, arXiv: 2402.08846, Audio Processing, Speech Recognition, +1 more

Linguistically Challenging English Text-to-Speech Benchmark

Tricky TTS is a benchmark dataset designed to stress-test text-to-speech models on challenging English text. Each row targets a specific failure mode to separate capable systems from weaker ones. The dataset was created by Trelis and last updated in March 2026.

Text, Audio, English, OPTIMIZED-PARQUET, Parquet, Text To Speech, Library: polars, Language: en, Size Categories: n<1K, Modality: text, Library: mlcroissant, Evaluation, Library: datasets, Benchmark, Library: pandas, Linguistic Challenge, Region: US, Speech Evaluation, License: MIT, +1 more

Codemixed New: A Unified Collection of Code-Mixed Automatic Speech Recognition Datasets

A unified collection of code-mixed automatic speech recognition datasets. The dataset was uploaded by author RidheshBhati to the Hugging Face platform and was last updated on May 1, 2026.

Audio, Multilingual, Code Mixing, Speech Recognition, +1 more

AoVp: Brain and Audio Data from a Musical Performance

This multimodal dataset (7,998,514,482 bytes, about 8.0 GB) comes from the 'Art of Virtuosity' performance within the Music-in-Medicine program. It likely contains EEG recordings and audio files, such as piano performances, to study brain synchrony. The dataset is available under a CC-BY-4.0 license.

Audio, Multimodal, Brain Synchrony, Brain, Mo Bi Neuroscience, EEG, Piano, Neuroscience, +1 more

AoVr: Multimodal EEG and Audio from a Music-in-Medicine Rehearsal

This dataset contains 9,556,620,369 bytes (about 9.6 GB) of multimodal data collected during the 'Art of Virtuosity' rehearsal, part of the Music-in-Medicine program. It includes EEG and audio recordings, suggesting a focus on the neurological and acoustic aspects of musical performance. Its cross-platform presence and open license indicate it is intended for research in music cognition and therapy.

Audio, Multimodal, Rehearsal, Mo Bi, EEG, Music In Medicine, +1 more

Enhanced Audiosnippets Long 2.8M: Speech Audio with Emotion and Speaker Features

2.6 million audio snippets totaling 4,932 hours of speech, enhanced with emotion annotations and speaker embeddings. The dataset, created by ai-music4you3, contains WAV files at 48kHz mono with durations ranging from 3.0 seconds to over 18 minutes. It was last updated on March 17, 2026.

Audio, Emotion Analysis, Speech Processing, Speaker Embeddings, +1 more

F5-TTS Offline Wheels: Text-to-Speech Model Components

F5-TTS Offline Wheels is a dataset published on Kaggle. The title suggests it contains components for an offline text-to-speech system. The dataset's specific contents, scale, and authorship are not detailed in the provided metadata.

Audio, Text To Speech, Speech Synthesis, Offline TTS, +1 more

FunASR: Speech Recognition Dataset

FunASR is a dataset hosted on Kaggle. The dataset's title suggests a focus on automatic speech recognition. Specific details regarding its size, origin, and content are not provided in the available metadata.

Audio, Machine Learning, Audio Processing, Speech Recognition, +1 more

Top 10 Spotify Music Trends for 2023

Top 10 Spotify Music Trends 2023 is a dataset from Kaggle. It likely contains ranked lists or metrics related to popular songs, artists, or genres on the Spotify platform during the 2023 calendar year. The dataset's specific columns, size, and author are unknown.

Tabular, Audio, Spotify, Music Trends, Top Charts, +1 more

mushinhttpapiprodrtx20260629asrc: HTTP API Production Logs

mushinhttpapiprodrtx20260629asrc is a dataset of HTTP API logs from a production system, published on Kaggle. The title suggests it contains records from a system named 'mushin' for a date in June 2026. Its specific content and structure are not detailed in the available metadata.

Tabular, System Monitoring, Production, HTTP, API, +1 more

Call Center Audio With 13,000 Hours Of Customer Service Calls

Call Center Audio is a large audio dataset containing over 13,000 hours of real-world customer service calls. It features time-stamped transcripts and over 90% unique speakers, supporting tasks like speech recognition and speaker diarization. The dataset was created by ud-nlp and was last updated in March 2026.

Audio, AUDIOFOLDER, License: CC BY-NC-ND 4.0, Call Center, Size Categories: n<1K, Library: mlcroissant, Voice Recognition, Task Categories: text-to-audio, Library: datasets, Region: US, Task Categories: automatic-speech-recognition, Audio Processing, Speech Recognition, +1 more

QuFp: Multimodal EEG and Audio Data for Music-in-Medicine

This dataset provides 8,165,719,852 bytes (about 8.2 GB) of multimodal data, including EEG and audio recordings, for the "Quasi una Fantasia" piece from the Music-in-Medicine program. It is published under a CC-BY-4.0 license and contains files in formats such as XLSX, MP3, WAV, CSV, MAT, and MP4. It is intended for research exploring the intersection of music, neuroscience, and therapeutic applications.

Audio, Multimodal, Mo Bi, EEG, Music In Medicine, +1 more

RinBp: Multimodal EEG and Audio Recording of Rhapsody in Blue

A multimodal recording of a Rhapsody in Blue performance from the Music-in-Medicine program. The dataset includes brain activity and audio recordings, likely containing EEG signals synchronized with musical performance audio. Its 4.3 GB size suggests a detailed capture of the event.

Audio, Time Series, Multimodal, Brain, Mo Bi, EEG, Music In Medicine, Piano, Neuroscience, +1 more
