DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,582 datasets

Sonata 3 in G Major Musical Score, Berkeley Ms 793, Molto Adagio Movement

A 26.0 KB PDF file containing the second movement (Molto adagio) of Sonata 3 in G major from a set of three sonatas for keyboard, cello, and violin. The score is from Berkeley Ms 793, pages 12r-13r, and was uploaded by Matthew James Zenas Dicken to figshare in April 2026.

Tags: Text, Audio, Classical Music, Musicology, Chamber Music, Musical Scores, +1 more

Sonata in G Major for Keyboard, Cello, and Violin, Berkeley Ms 793

Berkeley Ms 793, ff. 12v-14r, contains the second movement (Allegrino) of Sonata 3 in G major from a set of three sonatas for keyboard, cello, and violin. The dataset is a 24.9 KB PDF file authored by Matthew James Zenas Dicken and last updated on 2026-04-13. It is shared under a CC-BY-4.0 license on the figshare platform.

Tags: Text, Audio, Classical Music, Music Score, Chamber Music, Sonata, +1 more

Berkeley Ms 795: 25 Variations on an Ascending Scale for Treble and Bass

Berkeley Ms 795 is a manuscript containing 25 variations on a theme, structured for two parts (treble and bass). The 18.0 MB PDF file, authored by Matthew James Zenas Dicken, was last updated on figshare in April 2026. This sketch can be performed by two separate instruments or on a single keyboard.

Tags: Text, Audio, Manuscript, Chamber Music, Musical Scores, Music Composition, +1 more

God-Level Producer Mindframe Dataset: 15,000 Examples for Music AI Training

15,000 examples intended to train large language models to emulate the creative decision-making of prominent hip-hop producers. The dataset, created by user gss1147, was last updated on Hugging Face in April 2026. It aims to teach AI the combined mindset of producers like Lil Jon, Dr. Dre, and Pharrell.

Tags: Text, Audio, LLM Training, Music Production, Creative Process, +1 more

Agri STT Benchmarking Dataset: Multilingual Agricultural Speech for ASR

A domain-specific, multilingual agricultural speech dataset created by DigiGreen, with a primary focus on Hindi, Telugu, and Odia. It features human-annotated transcriptions and is intended for benchmarking ASR model performance in real-world agricultural scenarios. The dataset page was last updated on 2026-04-15.

Tags: Audio, Multilingual, Indian Languages, Benchmarking, Benchmark, Agriculture, Speech Recognition, +1 more

Key Components of Melodic RAS: Parkinson's Gait and Neural Response Data

An exploratory pilot study protocol investigating the behavioral and neurophysiological response to two types of Rhythmic Auditory Stimulation (RAS) in individuals with Parkinson's Disease. The study, authored by Kyurim Kang and last updated in March 2026, likely contains data on gait parameters and local field potentials recorded from deep brain stimulation devices. The dataset is small, with a file size of 5.5 KB.

Tags: Tabular, Audio, Deep Brain Stimulation, Recording Neural Signals, Gait Analysis, Globus Pallidus Internus, Routine Care, Rhythmic Auditory Stimulation, Two RAS, Dedicated Gait Therapies, Healthcare, Rhythmic Beats, Pure Rhythmic RAS, Parkinsons Disease, Without Melody, Provide Insights, Uses Metronome, Including Gait Abnormalities, Study Protocol, Subthalamic Nucleus, Gait Remains Limited, Gait Parameters, Stride Length, Synthetic, Neural Signals, +1 more

Waxal-ASR: Speech Recognition Audio Data

Waxal-ASR is a dataset hosted on Kaggle. The title suggests it contains audio data for automatic speech recognition tasks. No further details on size, origin, or specific content are available from the provided metadata.

Tags: Audio, Audio Processing, Speech Recognition, +1 more

Vocal Burst Annotation ASR Tuning Dataset: 500,000 Synthetic Multilingual Audio Samples

500,000 synthetic audio samples for training automatic speech recognition models. The dataset, created by TTS-AGI, contains approximately one minute of audio per sample with interleaved speech and vocal bursts. It was last updated on Hugging Face in April 2026.

Tags: Audio, Multilingual, Vocal Bursts, Audio Annotation, Synthetic Data, Speech Recognition, Synthetic, +1 more

Saint Kitts and Nevis Human Development and Poverty Indicators from UNDP

UNDP Human Development Reports Office (HDRO) data on human development and multidimensional poverty for Saint Kitts and Nevis. The dataset includes the Human Development Index (HDI), which measures average achievement in health, knowledge, and living standards, and the 2019 Global Multidimensional Poverty Index (MPI). The data was last updated on 2026-03-04.

Tags: Tabular, Indicators, Education, Development, Human Development, Health, Finance, Demographics, Poverty Indicators, Gender, Socioeconomics, Sustainable Development, +1 more

SPGISpeech 2.0: Financial Domain Speech Transcription Dataset

SPGISpeech 2.0 is a dataset for speaker-tagged transcription in the financial domain, created by Kensho. It contains audio snippets and their corresponding fully formatted text transcriptions, suitable for end-to-end automatic speech recognition (ASR). The dataset improves the diversity of applicable modeling tasks while maintaining the core characteristics of the original SPGISpeech dataset.

Tags: Audio, Multimodal, Financial Domain, Speaker Tagged, Finance, Speech Recognition, +1 more

VieNeu-TTS-140h: 74,858 High-Quality Vietnamese Audio Samples for Speech Synthesis

74,858 high-quality Vietnamese audio samples with phonemized transcripts, designed for fine-tuning modern Text-to-Speech models. The dataset was created by LanguaMan, who collected audio from YouTube, cleaned background noise, and used the Whisper-large-v3 model for transcription, followed by agent-assisted spelling correction and human feedback. The dataset page was last updated on April 21, 2026.

Tags: Text, Audio, Text To Speech, Phonemized Transcripts, Speech Synthesis, Vietnamese, +1 more

Arabic Tashkeel Speech: 1,093 Diacritized Recordings from 10 Speakers

An open-source collection of 1,093 fully diacritized Arabic speech recordings, crowd-sourced from native speakers via Nahw.ai. The dataset contains audio recordings resampled to 16 kHz paired with their fully diacritized transcriptions. It was created by NahwAI and last updated on 2026-04-21.

Tags: Audio, Arabic Speech, Audio Corpus, Diacritization, Speech Recognition, +1 more

Omnidistilthinking: Conversational AI Speech and Transcripts

A collection of conversational turns with audio recordings and transcripts. The dataset includes columns for conversation identifiers, speaker agents, prompts sent to a Gemini Live model, spoken transcripts, and audio durations. It was created by ShiniChien and last updated on May 18, 2026.

Tags: Tabular, Audio, Audio Transcript, Speech Synthesis, Conversational AI, Multimodal Dialogue, +1 more

Parkinson's Disease Motor Training Data from VR and Music Therapy Clinical Trial

A secondary analysis dataset from a quasi-randomized clinical trial involving 20 participants with idiopathic Parkinson's disease. The data compares immersive virtual reality gait training alone with the same training combined with Neurologic Music Therapy, with outcomes measured across 12 rehabilitation sessions. Paolo De Pasquale authored the dataset, which was last updated on March 18, 2026.

Tags: Tabular, Audio, Time Series, Parkinson Disease, Gait Analysis, Virtual Reality, Neurologic Music Therapy, Healthcare, Neurorehabilitation, Gait And Balance, Caren, Biomechanics Of Gait, +1 more

Parkinson's Disease Motor Training: VR and Music Therapy Clinical Trial Results

A secondary analysis of a quasi-randomized clinical trial involving 20 participants with idiopathic Parkinson's disease. The study compares immersive virtual reality gait training alone with the same training combined with Neurologic Music Therapy, assessing clinical and biomechanical outcomes. The dataset includes pre- and post-treatment results from 12 rehabilitation sessions conducted over four weeks.

Tags: Tabular, Audio, Time Series, Parkinson Disease, Gait Analysis, Virtual Reality, Neurologic Music Therapy, Healthcare, Parkinsons Disease, Neurorehabilitation, Gait And Balance, Caren, Biomechanics Of Gait, +1 more

Fleurs-mn: Mongolian Speech Recognition Dataset with 15.5 Hours of Audio

4,428 audio samples totaling 15 hours and 33 minutes of Mongolian speech, recombined and split into train and test sets. The dataset, created by Batuka0901, contains 3,985 training samples and 443 test samples. It was last updated on the platform in April 2026.

Tags: Audio, Audio Dataset, Speech Recognition, Mongolian Language, +1 more

Dzongkha ASR: Speech Recognition Data for Bhutan's National Language

Dzongkha ASR data published on Hugging Face by Jyoti-77. The dataset was last updated on 2026-06-01. Its specific size, format, and content details are not provided in the metadata.

Tags: Audio, Dzongkha, Speech Recognition, +1 more

Python Packages for MusicGen AI Music Generation

A dataset related to Python packages for the MusicGen AI music generation system. The dataset is hosted on the Kaggle platform. Specific details regarding the number of packages, their versions, or the included data are not provided in the available metadata.

Tags: Audio, Python Packages, Music Generation, Artificial Intelligence, +1 more

FLEURS-Kobani: Northern Kurdish Speech Benchmark with 18 Hours of Audio

FLEURS-Kobani is a speech dataset for Northern Kurdish (Kurmanji, ISO 639-3: KMR), designed as an extension of the FLEURS benchmark. It contains 5,162 utterances totaling 18 hours and 24 minutes of audio from 31 native speakers. The dataset was created by author 'aranemini' and was last updated on Hugging Face in April 2026.

Tags: Audio, Kurmanji, Audio Dataset, Benchmark, Speech Translation, Multilingual Benchmark, Speech Recognition, +1 more

Whisper-Large-V3: Lingala Speech Recognition Dataset

whisper-large-v3-lingala-asr is a dataset for Automatic Speech Recognition (ASR) in the Lingala language. It is hosted on Kaggle, but its specific size, creation method, and author are not detailed in the provided metadata. The dataset likely contains audio recordings and corresponding transcriptions for training or evaluating ASR models.

Tags: Text, Audio, Speech To Text, Lingala, Audio Processing, Automatic Speech Recognition, +1 more
