DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Nhom_05_SGU26_Challenge: Music Genre Classification

A dataset for music genre classification, likely containing audio files or features for a machine learning challenge. It was published on the Kaggle platform. The specific collection method, size, and temporal coverage are not detailed in the available metadata.

Audio, Machine Learning, Audio Classification, Music Genre, +1 more

Indian Language Speech Recordings with Transcripts for Telugu, Tamil, and Gujarati

Conversational and phrasal speech training and test data for the Telugu, Tamil, and Gujarati languages. Each entry pairs an audio recording with its transcript. The data was provided by Microsoft and SpeechOcean.com for research purposes.

Parquet, Library: polars, Library: dask, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Library: datasets, Language: te, Region: us, Task Categories: automatic speech recognition, +1 more

Speakers_xtts: Text-to-Speech Audio Samples

Speakers_xtts is a dataset hosted on Kaggle. Its title suggests it contains audio data related to speech synthesis, likely for text-to-speech applications. The dataset's specific content, scale, and origin are not detailed in the available metadata.

Audio, Text To Speech, Speech Synthesis, Audio Generation, +1 more

speakers_xttsv3: Text-to-Speech Voice Samples

speakers_xttsv3 is a dataset hosted on Kaggle. The title suggests it contains audio samples for text-to-speech applications. The dataset's author, organization, and specific content details are unknown.

Audio, Text To Speech, Speech Synthesis, Voice Cloning, +1 more

Hatrang Voice 4H: Vietnamese Speech Dataset for TTS Model Fine-Tuning

A Vietnamese text-to-speech dataset containing 1,805 paired audio recordings and text transcriptions for fine-tuning VieNeu-TTS models. It was created by the user 'quocs' and last updated on February 10, 2026. Audio files are WAV at 24 kHz, mono, with 16-bit PCM encoding.

Text, Audio, AUDIOFOLDER, Size Categories: 1K<n<10K, Text To Speech, Task Categories: text to speech, Speech Data, Speech Synthesis, Modality: text, Library: mlcroissant, Vietnamese, Library: datasets, Region: us, Audio Transcription, Neucodec, Task Categories: automatic speech recognition, Language: vi, +1 more
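
The description states that clips are 24 kHz, mono, 16-bit PCM WAV. As a minimal sketch (not part of the dataset's own tooling), Python's standard-library `wave` module can confirm that a downloaded clip matches this format before fine-tuning. The function name `check_wav_format` and the constant names are hypothetical, chosen for illustration.

```python
import wave

# Expected format for Hatrang Voice 4H clips, per the dataset description:
# 24 kHz sample rate, mono, 16-bit PCM WAV.
EXPECTED_RATE = 24_000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def check_wav_format(source):
    """Return (ok, details) for a WAV path or binary file-like object.

    `wave` only reads uncompressed PCM, so a file in another encoding
    raises wave.Error instead of returning False.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        details = {
            "rate": rate,
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "duration_s": wav.getnframes() / rate,
        }
    ok = (
        details["rate"] == EXPECTED_RATE
        and details["channels"] == EXPECTED_CHANNELS
        and details["sample_width"] == EXPECTED_SAMPLE_WIDTH
    )
    return ok, details
```

For example, `check_wav_format("clip_0001.wav")` (a hypothetical filename) returns `(True, {...})` for a conforming clip. Returning the measured details alongside the boolean makes it easy to log which property is off when a file fails the check.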

Local Business Impact Index for Independent Music Venue Zones in the U.S.

An index analyzing the impact of 109 independent music venue zones on 4,190 local businesses across the United States. It was created by Stanislas Renard to measure how these zones reinforce local economic resilience; approximately 95% of the surrounding businesses are locally owned. Establishments are categorized by type, and the data distinguishes total business impact from local-business impact specifically.

Arts and Humanities, Business and Management, Independent Music Venues Music Zones Local Busines, +1 more

Music Foundry Phase 0: Vocal and Instrumental Stems

Phase 0 vocal and instrumental stems from the Music Foundry project, separated with UVR5/Demucs. The dataset likely contains separated audio tracks for music source separation tasks. It is hosted on Kaggle; its size, format, and creation date are unspecified.

Audio, Instrumental, Audio Stems, Vocal Isolation, +1 more

Audio Classification Dataset for Machine Learning

An audio classification dataset published on Kaggle. The dataset likely contains audio samples with associated labels for classification tasks. Specific details on size, source, and creation date are not provided in the available metadata.

Audio, Sound Analysis, Machine Learning, Audio Classification, +1 more

Hausa TTS Dataset

A speech synthesis dataset for the Hausa language. It was published on Kaggle, but the author, organization, and creation date are unknown. The dataset's size, specific content, and structure are not detailed in the available metadata.

Audio, Text To Speech, Speech Synthesis, +1 more

bn-bd-tts: Bengali Text-to-Speech Audio Data

bn-bd-tts is a dataset hosted on Kaggle. The title suggests it contains data for Bengali text-to-speech synthesis, likely including audio recordings and corresponding text transcripts. Specific details on volume, creator, and update history are not provided in the available metadata.

Audio, Text To Speech, Bengali Language, Speech Synthesis, Audio Generation, +1 more

EMOPIA: Multi-Modal Pop Piano Clips with Emotion Labels

EMOPIA is a dataset of 1,087 pop piano music clips from 387 songs, annotated with clip-level emotion labels by four dedicated annotators. It was created by researchers including Hsiao-Tzu Hung from Academia Sinica and presented at ISMIR in 2021. The dataset includes multi-modal data in audio and MIDI formats.

Audio, Multimodal, Modal, Computer Science, Psychology, Music Emotion, Midi, Art History, Piano Music, Emotion Recognition, Cognitive psychology, Communication, Chemistry, Piano, Art, Speech Recognition, +1 more

Khmer Speech News Audio and Transcripts from WMC

A collection of Khmer speech audio files and corresponding transcripts sourced from the Women's Media Centre of Cambodia (WMC) website. The dataset is prepared for machine learning tasks, with scripts provided to process audio and metadata into Parquet files. It was created by user 'vichetkao' and last updated on February 21, 2026.

Audio, Khmer Language, News Audio, Speech Recognition, +1 more

ISSAI KazakhTTS2: Kazakh Speech Audio Dataset

A Kazakh speech audio dataset published on the Hugging Face platform by the organization ai4kazakh. The dataset was last updated on March 30, 2026. The specific content, size, and collection methodology are not detailed in the available metadata.

Audio, Audio Corpus, Kazakh Language, +1 more

XTTS Checkpoint 3000: A Text-to-Speech Model Checkpoint

XTTS Checkpoint 3000 is a dataset published on Kaggle. The title suggests it contains a checkpoint for an XTTS (text-to-speech) model, likely used for speech synthesis tasks. The specific content, size, and origin of the checkpoint require verification after download.

Audio, Text To Speech, Speech Synthesis, AI Checkpoint, +1 more

Coastal Hiking Trail Maps for Massachusetts Preserved Lands

Trail map locations for select preserved lands along the Massachusetts coast. The dataset is provided by the organization SCIOPS via the NASA Earthdata platform.

Geospatial, Geospatial Locations, Preserved Lands, Hiking, Coastal Trails, +1 more

Sea Floor Geologic Map of Western Massachusetts Bay from Sonar and Samples

A geologic map characterizing the sea floor of Western Massachusetts Bay, constructed by the CEOS_EXTRA organization from sidescan-sonar imagery, photography, and sediment samples. The temporal coverage and data volume are not provided.

Geospatial, Multimodal, Sea Floor Geology, Massachusetts Bay, Marine Geology, Sidescan Sonar, Sediment Analysis, +1 more

UK Live Music Blog And Guides Corpus

This text corpus on UK live music comprises 71 articles and 474 FAQs; the raw description gives a total of 174,000 words across the collection. Published on Kaggle, the dataset likely contains blog posts and guides related to music events.

Text, Audio, 🇬🇧 United Kingdom, Blog, Live Music, Natural Language Processing, FAQ, Text Corpus, +1 more

Sea Ice Algae Samples from Antarctic Vestfold Hills Lakes

Samples were collected in 1994 from Ace, Watts, Dingle, and Williams Lakes in the Vestfold Hills, Antarctica. Three to five ice cores, each 13 cm in diameter, were taken from each lake, sectioned at 20 cm intervals, and examined under a microscope at Davis station. No living cells were found, so no results were published.

Tabular, Algae, Sea ice, Microbiology, Polar Research, Antarctic Lakes, +1 more

Tree Ring Chronology from Whangarei, New Zealand (384-239 BP)

Tree-ring width measurements covering calendar years 384 to 239 before present (BP), from the Coutts Chemist Building site in Whangarei, New Zealand. This paleoclimatology dataset is part of the World Data Service for Paleoclimatology archive, contributed by NOAA's National Centers for Environmental Information (NCEI).

Time Series, Geospatial, Environmental science, Tree Ring, Paleoclimatology, New Zealand, +1 more

Shrimp_counter_TTS_2.0: Annotated Shrimp Images for Object Detection

Five college students spent two months annotating shrimp images for use with the YOLO26 object detection model. The dataset is designed for computer vision tasks such as shrimp counting and detection. Its scale and annotation methodology are detailed in the dataset's own description.

Image, YOLO, Shrimp Counting, Computer Vision, Object Detection, +1 more

Page 63 of 130