DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Betterset: Russian-Language Audio Samples for Speech Tasks

A high-quality Russian-language audio dataset for Text-to-Speech and Automatic Speech Recognition tasks. It contains 17,670 audio samples totaling 39 hours, 35 minutes, and 29 seconds of speech, processed using a modern pipeline by TeraTTS. The dataset was last updated on February 27, 2026.

Tags: Audio, Audio Dataset, Russian Language, Speech Synthesis, Speech Recognition, +1 more

HattSet-12: Arabic Calligraphy Style Dataset

A dataset of Arabic calligraphy styles, likely collected for machine learning applications. It is published on Kaggle; its size, creation date, and author are unknown.

Tags: Image, Font Style, Arabic Calligraphy, Cultural Heritage, +1 more

Ewe Bible TTS 50: Text-to-Speech Audio for the Bible in Ewe

A text-to-speech dataset for the Bible in the Ewe language, likely containing 50 audio files or chapters. It was published by the Ghana NLP Community on the Hugging Face platform and was last updated on April 10, 2026. The dataset's primary purpose appears to be generating spoken audio from biblical text.

Tags: Text, Audio, Text To Speech, Ewe Language, Bible, Audio Synthesis, +1 more

F5-TTS: Gujarati and Malayalam Text-to-Speech Data

F5-TTS_Guj_Malyalam is a dataset published on Kaggle. The title suggests it contains audio data for text-to-speech synthesis in the Gujarati and Malayalam languages. The dataset's specific content, size, and collection details are unknown from the provided metadata.

Tags: Audio, Text To Speech, Audio Data, Malayalam, Speech Synthesis, Gujarati, +1 more

Nemo ASR Wheels: Speech Recognition Model Artifacts

nemo-asr-wheels is a dataset published on Kaggle. The title suggests it contains artifacts related to the NVIDIA NeMo automatic speech recognition toolkit, likely pre-built Python wheels or model files. The dataset's specific content, size, and origin are not detailed in the provided metadata.

Tags: Audio, Machine Learning, Speech Processing, Audio Processing, Automatic Speech Recognition, +1 more

jp-asr-eval-data: Japanese Automatic Speech Recognition Evaluation Data

jp-asr-eval-data is a dataset for evaluating Automatic Speech Recognition (ASR) systems on Japanese language audio. Published on Kaggle, its specific size, creation date, and author are unknown. The dataset likely contains audio files paired with transcriptions for performance benchmarking.

Tags: Tabular, Audio, Evaluation Data, Japanese Language, Speech Recognition, +1 more

F5-TTS Tele Kannada SD: Kannada Speech Synthesis Dataset

F5-TTS_Tele_Kannada_SD is a dataset hosted on Kaggle. The title suggests it contains data for text-to-speech synthesis in the Kannada language, likely including audio recordings and corresponding text transcripts. No further metadata about its size, origin, or structure is provided.

Tags: Audio, Text To Speech, Audio Dataset, Speech Synthesis, Kannada, +1 more

Librispeech Synth 300h: Synthetic Speech Audio

Librispeech Synth 300h max 5spks is a speech audio dataset published on Kaggle. The title suggests it contains synthetic speech audio derived from the LibriSpeech corpus, likely comprising up to 300 hours of audio from a maximum of five speakers. The specific source, creation method, and exact content require verification after download.

Tags: Audio, Machine Learning, Speech Synthesis, Audio Processing, +1 more

F5-TTS_Bengali_SD: Bengali Speech Synthesis Data

A dataset for text-to-speech synthesis in the Bengali language, published on Kaggle. The specific data volume, collection method, and temporal coverage are unknown. The dataset likely contains audio samples and corresponding text transcripts.

Tags: Audio, Text To Speech, Bengali Language, Speech Synthesis, +1 more

F5-TTS_Hindi_SD: Hindi Speech Synthesis Audio Samples

A Kaggle-hosted dataset with the title 'F5-TTS_Hindi_SD'. The title suggests it contains audio data for Hindi text-to-speech synthesis, potentially including a standard definition (SD) version. Platform tags indicate it may also relate to Punjabi language and audio generation. The dataset's author, size, and specific contents are unknown.

Tags: Audio, Text To Speech, Hindi Language, Punjabi Language, Speech Synthesis, Audio Generation, +1 more

F5-TTS_Tamil_SD: Tamil Speech Synthesis Data

F5-TTS_Tamil_SD is a dataset published on Kaggle. The title suggests it contains data for Tamil text-to-speech synthesis. The dataset's specific size, origin, and update date are unknown.

Tags: Audio, Text To Speech, Speech Synthesis, Audio Generation, Tamil, +1 more

Massachusetts Corporate Accounting Practices 1870-1895

This dataset supports a study on the adoption of double-entry bookkeeping and depreciation accounting by Massachusetts corporations from 1875 to 1895. It contains data used to estimate that 60% of firms balanced returns in 1875, rising to over 96% by 1895. The proportion considering depreciation increased from 18% to 24% over the same period.

F5-TTS_Hindi_SD: Hindi Text-to-Speech Audio Samples

F5-TTS_Hindi_SD is a dataset published on Kaggle. The title suggests it contains audio data for Hindi text-to-speech synthesis. Metadata is minimal; the specific content, size, and creation details require verification after download.

Tags: Audio, Hindi, Text To Speech, Speech Synthesis, Audio Generation, +1 more

Wake Word Akylai: Audio Samples for Keyword Spotting

Wake Word Akylai is a dataset published on Hugging Face by the-cramer-project. It likely contains audio samples for training and evaluating wake-word or keyword-spotting models. The dataset was last updated on April 8, 2026.

Tags: Audio, Audio Classification, Speech Recognition, Wake Word, +1 more

Darija ASR: Checkpoints for Moroccan Arabic Speech Recognition

Darija ASR checkpoints likely contain model weights for a speech recognition system trained on Moroccan Arabic dialect. The dataset is hosted on Kaggle, a platform for sharing data and machine learning models. Specific details on the data size, collection method, and creators are not provided in the available metadata.

Tags: Audio, Darija, Arabic Dialects, Speech Processing, Automatic Speech Recognition, +1 more

StoryNiche Voice Library: Reference Voices for Text-to-Speech

A library of reference audio voices for text-to-speech applications, published on Kaggle. The dataset is associated with the StoryNiche platform and is intended for use in Kaggle TTS (Text-to-Speech) tasks. Specific details on the number of voices, audio characteristics, and collection methodology are not provided in the available metadata.

Tags: Audio, Text To Speech, Voice Synthesis, Audio Library, +1 more

TIMIT: Acoustic-Phonetic Continuous Speech Corpus with 630 Speakers

The TIMIT corpus provides broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. It was created through a joint effort by MIT, SRI International, and Texas Instruments (TI): the speech was recorded at TI, transcribed at MIT, and verified at NIST. The corpus includes time-aligned orthographic, phonetic, and word transcriptions alongside a 16-bit, 16 kHz speech waveform file for each utterance.

Tags: Text, Tabular, Audio, Speech Technology, TIMIT, Speech Synthesis, Computer Science, Speech Corpus, Benchmark, NIST, Artificial Intelligence, Speech Processing, Dialect Study, Utterance, Natural Language Processing, Hidden Markov Model, Acoustic Phonetic, Speech Recognition, +1 more

Tuananh Music: Audio Data for Music Analysis

A music dataset published on Kaggle by a user named Tuananh. The dataset's specific content, size, and collection method are not detailed in the provided metadata. Its title suggests it contains audio data or related features for music analysis tasks.

Tags: Audio, Music Information Retrieval, +1 more

Music Clips from Sub-Genres for Audio Model Fine-Tuning

Music sub-genres is a collection of audio clips from various music sub-genres. The dataset is intended for fine-tuning audio models, as described on Kaggle. Details regarding its size, creator, and update history are not provided.

Tags: Audio, Machine Learning, Audio Data, Audio Classification, Music Genres, +1 more

African Speech Dataset Som: Speech Audio Collection

A speech audio dataset with content likely related to African languages or contexts. It was published on the Hugging Face platform by the author 'amanuelbyte'. The dataset's record was last updated on March 31, 2026.

Tags: Audio, African Languages, Speech Recognition, +1 more
