DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,587 datasets

NOAA Ichthyoplankton Survey Data for Gulf of Maine (1977-1988)

NOAA's Northeast Fisheries Science Center collected standardized ichthyoplankton survey data from 1977 to 1988 on the continental shelf between Cape Hatteras, NC, and Cape Sable, NS. A subset of 6,406 bongo samples, drawn from this broader collection of 25,000 samples, was used to model ichthyoplankton abundance and distribution within the Gulf of Maine. The dataset supports studies of changes in fish community structure and of recruitment mechanisms.

Tags: Tabular, Time Series, Ichthyoplankton, Gulf of Maine, Species Abundance, Finance, Marine Fisheries, +1 more

VoxCeleb2: Audio-Visual Speaker Recognition Dataset

VoxCeleb2 is a dataset for speaker recognition and audio-visual research; this copy is published on Kaggle. It likely contains speech samples from a large number of speakers, possibly drawn from media interviews. The number of speakers and utterances and the collection methodology should be verified after download.

Tags: Audio, Video, Speaker Verification, Audio Processing, Celebrities, Speech Recognition, +1 more

Yaninfstep15000jaxdpasrc: JAX-Related Data

Kaggle hosts a dataset titled 'yaninfstep15000jaxdpasrc'. Its content, structure, author, size, update history, origin, and collection method are not described in the available metadata.

Tags: Tabular, Unknown Domain, JAX, STEP, +1 more

TTS-Payload-Latest: Text-to-Speech Audio Data

A dataset related to text-to-speech (TTS) technology, published on Kaggle. The specific content, size, and creation details are not provided in the available metadata. Its structure and intended use are inferred from the title.

Tags: Audio, Text-to-Speech, Speech Synthesis, Audio Generation, +1 more

Kazakh Speech Corpus with Punctuation-Restored Transcripts

KSC2 Structured is an enhanced version of the Kazakh Speech Corpus 2 (KSC2), pairing audio recordings with transcripts in which punctuation and capitalization have been restored. Developed by Inflexion Lab, it addresses a limitation of the original KSC2, whose transcripts are plain lowercase text. The dataset page was last updated in March 2026.

Tags: Text, Audio, Audio Transcript, Kazakh Speech, Natural Language Processing, Punctuation Restoration, Speech Recognition, +1 more

Brazilian Portuguese Medical Audio Sample with Five Content Types

A public sample of a Brazilian Portuguese medical audio dataset built for ASR, TTS, and conversational AI evaluation. This repository contains 1 record, 20 aligned audio segments, 1 speaker, and about 5.26 minutes of audio, derived from deidentified clinical source material. The full dataset and commercial licensing are available from juliasdata.com.

Tags: Audio, Multimodal, Format: audiofolder, Task categories: text-to-speech, License: other, pt-BR, Speech Synthesis, Size categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, Benchmark, Medical Audio, Healthcare, Modality: video, Region: us, Task categories: automatic-speech-recognition, Brazilian Portuguese, Multilinguality: monolingual, Automatic Speech Recognition, Medical, +1 more

TTS Payload Terminal Test

A dataset titled 'tts-payload-terminal-test', published on Kaggle. The title suggests it is used for testing text-to-speech systems or payloads. Its content, size, and origin are not given in the provided metadata.

Tags: Audio, Text-to-Speech, TTS Testing, Audio Synthesis, +1 more

Geographical Origin of Music with 1059 Traditional Tracks and Audio Features

The dataset contains 1,059 traditional music tracks from 33 countries or areas, with each track's geographical origin taken as the main residence of its artist. Audio features were extracted from the WAV files using the MARSYAS program, yielding 116 feature columns plus latitude and longitude as targets. The dataset is licensed under CC-BY-4.0.

Tags: Tabular, Audio, Audio Features, Music Origin, Cultural Musicology, Geographic Classification, World Music, +1 more
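
A minimal sketch of separating the 116 audio-feature columns from the two coordinate targets described above. It assumes the data arrives as header-less, comma-separated text with the features first and latitude and longitude as the last two columns; the delimiter, column order, and the function name `split_features_targets` are assumptions to check against the downloaded files.

```python
import csv
import io
from typing import List, Tuple

N_FEATURES = 116  # audio-feature columns, per the dataset description
N_TARGETS = 2     # latitude and longitude (assumed to be the last two columns)


def split_features_targets(text: str) -> Tuple[List[List[float]], List[Tuple[float, float]]]:
    """Split header-less CSV text into feature vectors and (lat, lon) targets.

    Assumes every non-blank row holds the 116 feature values first,
    followed by latitude and then longitude.
    """
    features: List[List[float]] = []
    targets: List[Tuple[float, float]] = []
    for row_num, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        values = [float(v) for v in row]
        expected = N_FEATURES + N_TARGETS
        if len(values) != expected:
            raise ValueError(f"row {row_num}: expected {expected} columns, got {len(values)}")
        features.append(values[:N_FEATURES])
        lat, lon = values[N_FEATURES:]
        targets.append((lat, lon))
    return features, targets
```

For a local copy, `split_features_targets(open(path).read())` returns the feature matrix and the coordinate targets, where `path` is wherever the file was saved. The column-count check fails loudly if the assumed layout does not match the actual file.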

ASRU Speech Data

Kaggle hosts a dataset titled 'asruaspeech'. The dataset likely contains speech audio samples, potentially related to the Automatic Speech Recognition and Understanding (ASRU) field. Metadata is minimal; actual content requires verification after download.

Tags: Audio, ASRU, +1 more

IndicTTS Bengali: High-Quality Speech Recordings for Text-to-Speech Research

The Bengali Indic TTS Dataset contains high-quality speech recordings with corresponding text transcriptions. It is derived from the Indic TTS Database project and uses the Bengali monolingual recordings of four native speakers. The dataset was authored by Abdullah500 and last updated on 2026-03-28.

Tags: Text, Audio, Text-to-Speech, Audio Dataset, Bengali Language, Speech Synthesis, +1 more

Dahih Tts2 Demucs Cleaned: Processed Audio for Speech Synthesis

A dataset hosted by YomnaGharib on Hugging Face, last updated on 2026-05-11. The title suggests it contains audio data processed using the Demucs source separation tool, likely for text-to-speech (TTS) applications. The specific content, scale, and original source require verification after download.

Tags: Audio, Text-to-Speech, Speech Synthesis, Demucs, Audio Processing, +1 more

YouMe ASR Vosk: Speech Recognition Data

YouMe ASR Vosk is a dataset for automatic speech recognition (ASR) tasks, likely containing audio samples and transcriptions. It is hosted on Kaggle, but the specific content, size, and creation details are not provided in the metadata. The dataset's purpose is inferred from its title and platform.

Tags: Audio, Speech Data, Vosk Model, Automatic Speech Recognition, +1 more

Music Playlist Data from Kaggle

Music Playlist is a dataset hosted on the Kaggle platform. The dataset's specific content, size, and structure are not detailed in the available metadata. Its origin and creation details are also unspecified.

Tags: Tabular, Audio, Audio Content, Playlist, +1 more

Glasno Vosk TTS Runtime: Text-to-Speech Synthesis Data

Glasno Vosk TTS Runtime is a dataset hosted on Kaggle. The title suggests it contains data related to text-to-speech synthesis, likely for runtime or inference purposes. Specific details regarding its contents, size, and authorship are not provided in the available metadata.

Tags: Audio, Text-to-Speech, Machine Learning, Audio Synthesis, +1 more

Lwazi Afrikaans TTS Corpus: Phonetically Balanced Speech

Phonetically balanced sentences from reference texts were recorded in a studio. The dataset contains orthographic transcriptions and phonemically aligned transcriptions in TextGrid format, paired with 16 kHz, 16-bit WAV audio files. It is designed for speech synthesis and natural language processing research.

Tags: Text, Audio, Speech Synthesis, Phonetic Transcription, Natural Language Processing, Afrikaans, Studio Recording, +1 more

Lwazi Afrikaans ASR Corpus: Telephone Speech and Transcriptions

The Lwazi Afrikaans ASR corpus provides matched audio recordings and orthographic transcriptions designed for speech recognition systems. The audio is telephone quality (8 kHz, 16-bit, single channel), and the transcription of each utterance is stored in a separate text file. The corpus was created to support the development of automatic speech recognition (ASR) for Afrikaans.

Tags: Text, Audio, Transcription, South Africa, Natural Language Processing, Afrikaans, Speech Recognition, Automatic Speech Recognition, +1 more
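
Because the two Lwazi corpora use different sample rates (8 kHz telephone audio for ASR, 16 kHz studio audio for TTS), a quick header check after download can catch mixed-up files. The sketch below uses only Python's standard-library `wave` module; the function name `check_wav_format` is hypothetical and not part of either corpus. The TTS description does not state a channel count, so the channel check is optional.

```python
import io
import wave
from typing import List, Optional, Union


def check_wav_format(
    source: Union[str, bytes],
    expected_rate: int,
    expected_width_bytes: int = 2,            # 16-bit samples = 2 bytes
    expected_channels: Optional[int] = None,  # None skips the channel check
) -> List[str]:
    """Compare a WAV header against an expected format; return any mismatches."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)  # accept in-memory WAV data as well as file paths
    problems: List[str] = []
    with wave.open(source, "rb") as wf:
        if wf.getframerate() != expected_rate:
            problems.append(f"sample rate {wf.getframerate()} Hz, expected {expected_rate} Hz")
        if wf.getsampwidth() != expected_width_bytes:
            problems.append(f"sample width {wf.getsampwidth()} bytes, expected {expected_width_bytes}")
        if expected_channels is not None and wf.getnchannels() != expected_channels:
            problems.append(f"{wf.getnchannels()} channels, expected {expected_channels}")
    return problems
```

For an ASR file, `check_wav_format(path, expected_rate=8000, expected_channels=1)` should return an empty list; for a TTS file, `expected_rate=16000` applies. Returning a list of mismatches rather than raising makes it easy to scan a whole directory and report every problem at once.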

British Musical Theatre Productions from 2010 to 2019

British musical theatre productions from the 2010s are documented in this dataset collated by Sarah K. Whitfield and Clare Chandler. It covers a ten-year period from 2010 to 2019. The dataset is hosted on figshare and was last updated in April 2026.

Tags: Tabular, British Culture, Theatre Production, Performing Arts, Musical Theatre, +1 more

Khmer ASR Cultural Dataset

A Khmer-language dataset, likely containing speech or audio data for cultural applications. It was published on Hugging Face by rinabuoy and last updated on 2026-05-01 09:31:45.

Tags: Audio, Khmer Language, Cultural Data, Speech Recognition, +1 more

NZQA Music Producer Golden Dataset

Data from the New Zealand Qualifications Authority (NZQA), likely concerning music producer qualifications. The dataset is published on Kaggle, but its contents, size, and creation details are not provided in the available metadata.

Tags: Tabular, Audio, Education Assessment, New Zealand, Music Production, +1 more

Persian TTS Benchmark Dataset

A benchmark dataset for Persian text-to-speech (TTS) systems, published on Kaggle. The dataset likely contains audio samples and corresponding text transcripts for evaluating and comparing TTS models. Specific details on size, collection method, and temporal coverage are unavailable from the provided metadata.

Tags: Audio, Speech Synthesis, Persian Language, Benchmark, Audio Processing, +1 more

Page 40 of 130