DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Urdu Text-to-Speech Corpus for Speech Synthesis Models

A speech corpus for the Urdu language, published on Kaggle. The dataset likely contains audio recordings paired with corresponding text transcripts for training text-to-speech systems. Specific details on size, collection method, and contributors are not provided in the available metadata.

Text, Audio, Text To Speech, Speech Synthesis, Urdu Language, Natural Language Processing, Audio Corpus, +1 more


Hindi Speech Audio Dataset for Speech Recognition Models

An audio dataset of Hindi speech, published on the Kaggle platform. The dataset likely contains audio files of spoken Hindi, which can be used for training and evaluating speech processing models. Specific details on the number of recordings, speakers, recording conditions, and collection methodology are not provided in the available metadata.

Audio, Machine Learning, Hindi Language, Audio Processing, Speech Recognition, +1 more


Georgian Text-to-Speech Audio Dataset

Georgian language audio recordings for text-to-speech synthesis, published on Kaggle. The dataset's size, collection method, and specific content are not detailed in the available metadata. Further details regarding the number of samples, recording quality, and speaker demographics require verification after download.

Audio, Text To Speech, Audio Dataset, Speech Synthesis, Georgian Language, +1 more


Emotion-Aware Music Dataset with Multimodal Audio Features

Emotion-Aware Music Sentiment Dataset provides multimodal audio features and contextual metadata for emotion-based music AI. The dataset originates from Kaggle, though specific details on volume, authorship, and recency are unavailable.

Audio, Multimodal, Multimodal Data, Music Analysis, Audio Features, Emotion Recognition, Natural Language Processing, Data Analytics, +1 more


Satellite Imagery of Saint Kitts and Nevis

Satellite imagery data covering the Caribbean nation of Saint Kitts and Nevis. The dataset is provided by Techsalerator and hosted on Kaggle. Specific details on data volume, collection date, and resolution are not provided in the available metadata.

Image, Geospatial, Satellite Imagery, Geospatial Analysis, Caribbean Region, +1 more


Archive of Our Own Music and Bands Fan Fiction Metadata

Archive of Our Own (AO3) data related to music and bands, collected via web scraping. The dataset's size, row count, and specific attributes are unknown. The author, organization, and last update date are also unspecified.

Text, Audio, Fan Fiction, Bands, +1 more


Neuro Parakeet Food: German Medical Speech Recognition Dataset

Synthetic German medical speech data targets neuro-oncology and neurology terminology for fine-tuning ASR models. The dataset was generated by NeurologyAI using Resemble AI Chatterbox TTS for voice and Qwen/Qwen3-30B-A3B for text. It was last updated on January 15, 2026.

Text, Audio, German Language, Medical Speech, Neurology, Healthcare, Synthetic Data, Automatic Speech Recognition, Synthetic, +1 more


Nexora-Music-PD v1-mini: Historical Public-Domain Music Recordings

A curated collection of historical audio recordings sourced from the Library of Congress Citizen DJ collections. The dataset is designed for open research, audio analysis, music information retrieval, remixing, and AI/ML experimentation. This mini release is intended as a lightweight subset for testing pipelines, educational use, and small-scale experiments.

Audio, Music Information Retrieval, Audio Recordings, Public Domain Music, Historical Audio, +1 more


AsramaGH: Housing or Community Data

AsramaGH is a dataset published on Kaggle. Its title suggests it may contain information related to housing or community structures. The specific content, size, and origin are unknown.

Tabular, Gh, Community, Housing, Asrama, +1 more


AsramaGH: Housing or Community Data

AsramaGH likely contains data related to housing or community structures. The dataset is published on Kaggle, but its creator, size, and specific content are unknown. Its last update date is also unknown.

Tabular, Gh, Community, Housing, Asrama, +1 more


Participation in School Music or Other Performing Arts by Child Trends

Child Trends provides data on student engagement in arts education. The dataset's specific variables, size, and temporal coverage are not detailed in the available metadata. The original source is listed as paperswithcode, a platform for machine learning resources.

Tabular, Audio, Child Development, Psychology, Education, Performing Arts, Mathematics Education, Art, Visual Arts, The Arts, +1 more


Urdu Text-to-Speech Corpus, Processed

A processed corpus for Urdu text-to-speech (TTS) applications, published on Kaggle. The dataset likely contains audio recordings and corresponding text transcriptions. Specific details on size, source, and processing methods are not provided in the available metadata.

Text, Audio, Text To Speech, Speech Synthesis, Urdu Language, Natural Language Processing, Audio Corpus, +1 more


DataIndexTTS3: Text-to-Speech Audio Samples

A dataset titled 'dataindextts3' is hosted on Kaggle. The title suggests it contains data related to text-to-speech (TTS) synthesis. No further metadata, such as author, size, or sample details, is provided.

Audio, Text To Speech, Speech Synthesis, Audio Generation, +1 more


DataIndexTTS4: Text-to-Speech Audio Samples

DataIndexTTS4 is a dataset published on Kaggle. Its title suggests it is related to text-to-speech (TTS) technology. The dataset's specific content, size, and origin require verification after download.

Audio, Text To Speech, Speech Synthesis, Audio Generation, +1 more


30 Musical Instruments Audio Samples

30 Musical Instruments is a dataset hosted on Kaggle. The title suggests it contains audio samples or information related to a collection of thirty different musical instruments. The dataset's specific content, size, and origin are not detailed in the provided metadata.

Audio, Musical Instruments, +1 more


Urdu TTS Corpus Processed - 5: Text and Audio for Speech Synthesis

Urdu TTS Corpus Processed - 5 is a dataset for text-to-speech applications, published on Kaggle. The title suggests it contains processed audio and corresponding text data for the Urdu language. The specific content, scale, and creation details require verification after download.

Text, Audio, Text To Speech, Speech Synthesis, Urdu, Natural Language Processing, Audio Corpus, +1 more


VoxCeleb: Speaker Recognition Audio Dataset

VoxCeleb is an audio dataset hosted on Hugging Face. The dataset was uploaded by author N02N9 and was last updated on 2026-02-24. Its specific content, scale, and collection method are not detailed in the provided metadata.

Audio, Machine Learning, Speaker Verification, Audio Processing, Speech Recognition, +1 more


Urdu Speech Dataset for ASR Fine-Tuning with 17,476 Processed Samples

The dataset contains 17,476 preprocessed Urdu speech samples from Mozilla Common Voice, split into training, validation, and test sets. The audio is prepared for Whisper models and resampled to 16 kHz. It was uploaded by khawajaaliarshad and last updated on 2025-12-27.

Audio, Benchmark, Urdu Language, Asr Fine Tuning, Audio Processing, Speech Recognition, +1 more


Journal of the Musical Arts in Africa: Academic Publications

The Journal of the Musical Arts in Africa is a source of academic publications. It likely contains scholarly articles and research papers on music and related arts from an African context. The dataset is aggregated from the paperswithcode platform.

Text, Audio, 🌍 Africa, Arts, Geography, Art, Academic Publications, Musical, CULTURAL STUDIES, Visual Arts, The Arts, +1 more


Modern TTS Dataset for Speech Synthesis Models

modern-tts-dataset is a dataset for text-to-speech (TTS) research, published on Kaggle. The dataset likely contains audio recordings paired with corresponding text transcripts. Specific details on size, source, and creation date are not provided in the available metadata.

Text, Audio, Text To Speech, Speech Synthesis, Audio Generation, +1 more

Page 76 of 130