DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Speech & Audio Datasets | DataSalon


🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,577 datasets


AppTek Call-Center Dialogues: 128.6 Hours of Multi-Accent English Speech

128.6 hours of long-form conversational speech data designed as a benchmark for automatic speech recognition. The dataset features diverse English accents across 16 service domains, with conversations lasting 5–15 minutes. It was created by apptek-com and last updated on the Hugging Face platform in April 2026.

Audio, Call Center, Multi Accent, Benchmark, Conversational Speech, Long Form, Speech Recognition, +1 more


Music School No. 3 KCC: Financial Acts and Invoices from 2019 to Present

Financial records from the KZSMO "Music School No. 3" in the KMR region of Ukraine. The dataset contains acts and invoices from 2019 to the present, sourced from the States site of Ukraine. It was last updated on May 6, 2026.

Tabular, Audio, CSV, Financial Records, Ukraine, Government Invoices, Public Spending, Music Education, +1 more


Brisbane Music Events Calendar with Daily Updates

Daily-updated information on music events in Brisbane, sourced from the Trumba Calendar API. The dataset includes details on event dates, costs, booking requirements, venues, and locations, with the external feed limited to the next 1,000 upcoming events.

Audio, Band, Jazz, Concert, Event, +1 more


ESC-50: 2,000 Labeled Environmental Audio Clips Across 50 Classes

Kaggle hosts the ESC-50 dataset, a collection of 2,000 labeled environmental sound recordings. The dataset is balanced across 50 distinct sound classes, making it suitable for classification tasks. The original author and specific collection methodology are not detailed in the provided metadata.

Audio, Machine Learning, Environmental Sound, Audio Classification, +1 more


Rickettsia Typhi Laboratory and Clinical Strains for Macrolide Gene Analysis

A 9.5 KB Excel spreadsheet summarizing Rickettsia typhi strains, including their origin and references for macrolide target gene analysis. The dataset was authored by Weerawat Phuklia and last updated on April 27, 2026. Its small size suggests a focused collection of bacterial strain metadata.

Tabular, Excel, Bacterial Strains, Rickettsia Typhi, Macrolide Resistance, Microbiology, Healthcare, Clinical Isolates, +1 more


Emilia S2S Mimi Q8 Named Speakers: A TinyAya Codec-Tokenized TTS Dataset

A text-to-speech dataset built from TinyAya codec-tokenized examples. The dataset contains named speaker examples for Ira, Aisha, Siya, and Zoya, with training text that includes speaker prefixes. It was created by rumik-ai and last updated on 2026-05-26.

Audio, Text To Speech, Speech Synthesis, Codec Tokens, Named Speakers, +1 more


WildElder: Chinese Elderly Speech Dataset with Fine-Grained Annotations

WildElder is a speech dataset focused on elderly scenarios, containing raw audio and corresponding text annotations. The data was collected and cleaned from real-world environments to preserve diversity. The dataset is authored by Hui519 and was last updated on 2026-05-02.

Audio, Multimodal, Chinese, Elderly, Speech Recognition, +1 more


Saint Kitts and Nevis Internal Displacement Data from IDMC

The Global Internal Displacement Database (GIDD) from the Internal Displacement Monitoring Centre (IDMC) provides validated annual estimates of internal displacement. This dataset for Saint Kitts and Nevis includes figures for people living in displacement at year-end and counts of new displacement incidents. The data is licensed under CC-BY-3.0-IGO and was last updated on 2026-03-18.

Tabular, Time Series, Internally Displaced Persons (IDP), Population Flow, Caribbean, Humanitarian Crisis, Finance, Displacement, Internal Displacement, Conflict Disaster, Natural Disasters, +1 more


Egyptian Arabic Speech Dataset with Emotion Labels and Speaker Diarization

An Egyptian Arabic dataset combining text and audio, annotated with emotions and speaker diarization. Created by OmarAhmedSobhy, it is designed for training Text-to-Speech and Automatic Speech Recognition models. The dataset was last updated on May 7, 2026.

Audio, Multimodal, Text To Speech, Egyptian-Arabic, Speech Emotion, Audio Diarization, Speech Recognition, +1 more


Adaption Low Resource Audio: 3,704 Audio-Text Pairs for Underrepresented Languages

Adaption Low Resource Audio is a subset of the PolyglotAudio dataset, remastered with Adaption's Adaptive Data platform. It contains 3,704 rows of paired audio clips and text, spanning 10 languages typically underrepresented in open corpora. The dataset was created by Reubencf and last updated on April 24, 2026.

Tabular, Audio, Polyglot Audio, Benchmark, Speech Model Training, Low Resource Language, Audio Text Pairs, +1 more


HanMATE: Plant Protein Sequence Similarity Data

HanMATE proteins are associated with various ASR MATE proteins in other plants based on more than 75% sequence similarity. The dataset was authored by Mohammad Nazmol Hasan and last updated on April 13, 2026. It is a 13.5 KB Excel file available under a CC-BY-4.0 license.

Tabular, Excel, Protein Sequence, Plant Biology, Comparative Genomics, Bioinformatics, +1 more


Persian ASR Argilla Review Audio

Reza2kn published the 'Persian ASR Argilla Review Audio' dataset on the Hugging Face platform. Judging by the title, it likely contains audio for Persian automatic speech recognition tasks. The dataset was last updated on June 12, 2026.

Audio, Audio Dataset, Persian Language, Argilla Platform, Speech Recognition, +1 more


Hindi ASR Benchmark: Speech Recognition Performance Across Six Test Subsets

A benchmark dataset created by SkunkWorkLabs, last updated in May 2026, for evaluating Hindi automatic speech recognition (ASR) systems. It compares the performance of the SkunkWorks model against commercial providers like ElevenLabs, Deepgram, and Sarvam. The evaluation is conducted across six distinct subsets sourced from projects like AI4Bharat Kathbath, Mozilla Common Voice, and Google FLEURS.

Tabular, Audio, Hindi, Benchmark, ASR Evaluation, Speech Recognition, +1 more


CEAEval-D: Mandarin Speech Expressive Appropriateness in Rich Contexts

CEAEval-D is a Mandarin speech dataset annotated for context-rich expressive appropriateness. It was released by TianRW in association with an ACL paper and is hosted on Hugging Face. The dataset was last updated on May 10, 2026.

Text, Audio, Expressive Appropriateness, Contextual Speech, Speech Evaluation, Mandarin Speech, +1 more


ViVoice-34: Vietnamese Speech Recordings from 34 Provinces

ViVoice-34 is a Vietnamese speech dataset featuring audio recordings from speakers across 34 provinces of Vietnam. Each audio sample includes full transcripts and metadata about the speaker and content. The dataset was created by anonymous-vivoice34 and was last updated on Hugging Face in May 2026.

Audio, Speaker Diversity, Audio Dataset, Vietnamese Language, Speech Recognition, +1 more


Saint Kitts and Nevis: IFRC Emergency Appeals and DREF Funding

This dataset tracks humanitarian funding and disaster response actions in Saint Kitts and Nevis, managed by the International Federation of Red Cross and Red Crescent Societies (IFRC). It documents Emergency Appeals for large-scale disasters and Disaster Response Emergency Fund (DREF) allocations for smaller crises. The data is provided in CSV format and was last updated in March 2026.

Funding, +1 more


Meddies ASR Raw Audios: Speech Data for Automatic Speech Recognition

Meddies ASR Raw Audios is a collection of audio files hosted on Hugging Face. The dataset was created by the author 'Meddies' and was last updated on June 24, 2026. Its specific content and scale are not detailed in the available metadata.

Audio, Audio Data, Medical Speech, Speech Recognition, +1 more


Ttsdistil2: A Distilled Text-to-Speech Model

Ttsdistil2 is a dataset or model artifact for text-to-speech, published on Hugging Face by ShiniChien. The record was last updated on 2026-06-20 14:53:15. Its specific content, size, and structure are not detailed in the available metadata.

Audio, Text To Speech, Distilled Model, Speech Synthesis, +1 more


PRMS-Audio: A Dataset for Multimodal Sentiment Analysis

PRMS-Audio is a dataset for multimodal sentiment analysis, hosted on Kaggle. Details on its creator, size, and collection period are not provided. Its primary purpose is to support research on sentiment analysis using audio and potentially other modalities.

Audio, Multimodal, Audio Data, Sentiment Analysis, Multimodal Sentiment Analysis, Natural Language Processing, +1 more


Khmer YouTube Voice Dataset with 1,000+ Hours of Diarized Speech

Khmer Yt Voice Dataset contains 3,945 YouTube videos totaling 1,033.4 hours of Khmer speech audio. It includes diarization metadata and transcripts for 16,021 speaker turns. It was created by manhp and last updated on 2026-04-19.

Audio, YouTube, Khmer Language, Audio Diarization, Speech Recognition, +1 more
