DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

ASR New: Audio Data for Automatic Speech Recognition

ASR new is a dataset published on Kaggle. The title suggests it contains audio data for training or evaluating automatic speech recognition systems. The dataset's specific content, size, and origin require verification after download.

Tags: Audio, Speech Data, Audio Processing, Automatic Speech Recognition, +1 more

ASR Arabic Checkpoints: Automatic Speech Recognition Models

ASR Arabic checkpoints likely contain pre-trained model weights for Arabic automatic speech recognition. The dataset is published on Kaggle, but its specific size, creation date, and author are unknown. Its content suggests it is intended for developers working on Arabic speech technology.

Tags: Audio, Machine Learning, Checkpoints, Speech Recognition, +1 more

GigaMIDI: 2.1 Million MIDI Files with Expressive Loop Annotations

Metacreation released this collection of 2.1 million unique MIDI files in 2026 for symbolic music research. The dataset features detailed annotations for expressive loop detection, incorporating performance nuances such as microtiming and dynamics.

Tags: Parquet, Source Datasets: original, Library: polars, Library: dask, Size Categories: 1M<n<10M, Modality: timeseries, Modality: text, Modality: tabular, Library: mlcroissant, Library: datasets, Region: us, +1 more

Philippine English Pharmacy Call Recordings for Medical Speech AI

Real-world Philippine English pharmacy calls for medical speech AI training. The dataset appears to consist of audio recordings from pharmacy interactions. Specific details on size, collection date, and creator are not provided in the available metadata.

Tags: Audio, Philippine English, Medical Speech, Healthcare, Pharmacy Calls, Speech Recognition, Healthcare AI, +1 more

Music Audio and Text Dataset

A dataset containing audio and text data, hosted on Hugging Face by author NjNBrl. It was last updated in March 2026. Specific content details such as genre, instruments, or recording sources are not provided.

Tags: Text, Audio, Size Categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, Region: us, +1 more

YodaLingua-Swedish: 112 Hours of Swedish Speech for TTS and ASR

Swedish speech data containing 43,048 audio-transcription pairs totaling 112 hours of audio from 1,946 distinct speakers. The YodaLingua-Swedish dataset, created by Thomcles, is designed for training text-to-speech and automatic speech recognition systems. It was last updated on the Hugging Face platform in January 2026.

Tags: Audio, Multimodal, Multilingual, Text To Speech, Multilingual Speech, Audio Transcription, Swedish Language, +1 more

Spanish Speech Data for Medical Telemarketing AI Training

High-quality Spanish speech data is available for training AI models in medical telemarketing contexts. The dataset is hosted on Kaggle, but its creator, size, and specific recording details are not provided. Its primary purpose is to support the development of speech recognition and synthesis systems for a specific commercial domain.

Tags: Audio, Spanish-language, Healthcare, Telemarketing, Medical, +1 more

Wenetspeech Wu: A Chinese Speech Recognition Benchmark

Wenetspeech Wu ASR Bench is a dataset for automatic speech recognition, likely containing audio and corresponding transcriptions. It is hosted on the Hugging Face platform by the author 'yuekai' and was last updated on March 17, 2026. The dataset's specific size, format, and detailed content are not provided in the available metadata.

Tags: Text, Audio, Benchmark, Chinese Language, Audio Processing, Speech Recognition, Automatic Speech Recognition, +1 more

Music and Sound Data for AI Applications

A dataset titled 'Musicsoundai' is hosted on Kaggle. Its content likely pertains to music or audio signals for artificial intelligence tasks. The dataset's specific contents, scale, and authorship are unknown due to minimal metadata.

Tags: Audio, Machine Learning, Artificial Intelligence, +1 more

Kumush TTS Dataset Kitobiy: Uzbek Speech Synthesis

Kumush TTS Dataset Kitobiy is a speech synthesis dataset published on Hugging Face by YMA-MamunAI. The dataset title suggests it contains audio data for text-to-speech, likely in the Uzbek language. It was last updated on March 19, 2026.

Tags: Audio, Text To Speech, Speech Synthesis, Uzbek Language, +1 more

Italian Speech Audio Dataset of General Utterances

An audio dataset of general utterances spoken by Italian speakers from Italy. The dataset's author, organization, size, and specific recording details are not provided in the available metadata. Further information regarding the number of speakers, audio length, and collection methodology is unknown.

Tags: Audio, Utterances, Italian Language, +1 more

ASVspoof 2015: Genuine and Spoofed Speech for Speaker Verification Security

106 speakers (45 male, 61 female) contributed genuine speech recordings with minimal channel or background noise. The database includes spoofed speech generated from the genuine data using several different spoofing algorithms. It is partitioned into training, development, and evaluation subsets for use in the ASVspoof 2015 challenge.

Tags: Audio, Speaker Verification, Computer Science, Audio Database, Benchmark, Database, Spoofing Attack, Speech Recognition, Speaker Recognition, Synthetic, Computer Security, +1 more

Bengali Noisy and Clean Paired Audio for Speech Enhancement

A Bengali language dataset containing paired noisy and clean audio files. The data is described as ready for machine learning tasks. The dataset was sourced from Kaggle, but details on its creator, size, and update date are unavailable.

Tags: Audio, Bengali Language, Speech Enhancement, Noisy Audio, Audio Processing, +1 more

German Medical Speech Dataset for Positivity Tips

German-language audio recordings of patient-doctor interactions. The dataset is described as being for generating positivity tips. It was sourced from Kaggle, but details on its size, creation date, and authorship are unknown.

Tags: Audio, German Language, Medical Speech, Healthcare, Doctor Patient, Positivity, +1 more

YodaLingua Croatian: 11 Hours of Speech for Text-to-Speech and ASR Models

YodaLingua-Croatian is a speech dataset by Thomcles containing 5,655 audio-transcription pairs. The collection totals 11 hours of Croatian speech from 230 distinct speakers, last updated in January 2026. It is designed for training text-to-speech and automatic speech recognition systems.

Tags: Text, Audio, Multilingual, Parquet, Size Categories: 1K<n<10K, Text To Speech, Task Categories: text-to-speech, Library: polars, Task Categories: voice-activity-detection, Language: hr, Task Categories: audio-to-audio, Modality: text, Croatian, Library: mlcroissant, Task Categories: audio-classification, Task Categories: text-to-audio, Multilingual Speech, Library: datasets, Library: pandas, License: CC BY 4.0, Croatian Language, Region: us, Task Categories: automatic-speech-recognition, Speech Recognition, Audio Text Pairs, +1 more

Malay Speech Audio Dataset of General Utterances

An audio dataset featuring general utterances spoken by Malay speakers from Malaysia. The dataset is hosted on Kaggle, but specific details on size, collection method, and licensing are not provided. The original author and organization are unknown.

Tags: Audio, Utterances, Malay Language, +1 more

Musical Instruments Audio Dataset

MusicalInstruments-dataset is a collection of audio data related to musical instruments, published on Kaggle. The dataset's specific contents, such as the number of samples, recording conditions, and instrument types, are not detailed in the available metadata. Users must download the dataset to verify its scale, format, and suitability for their projects.

Tags: Audio, Music Information Retrieval, Audio Classification, Musical Instruments, +1 more

Algerian Arabic Customer Speech Dataset

Algerian Arabic speech recordings likely contain general conversation and customer service interactions. The dataset appears to be sourced from Kaggle, but its size, author, and update date are unknown. Its specific collection method and time range are not provided.

Tags: Audio, Customer Service, Algerian Arabic, Conversation, +1 more

ASR Librispeech Subset: Speech Audio for Automatic Speech Recognition

A subset of the LibriSpeech corpus, published on Hugging Face by sahara22 and last updated on 2026-03-22. The dataset likely contains audio files and corresponding transcriptions for training and evaluating automatic speech recognition models. Its specific size, format, and licensing details are not provided in the available metadata.

Tags: Audio, Audio Data, Region: us, Librispeech, Speech Recognition, Automatic Speech Recognition, +1 more

Fusha Arabic General Conversation Speech Data for Customer Service

Fusha Arabic speech data for general conversation scenarios, likely related to customer service interactions. The dataset is hosted on Kaggle and includes platform tags indicating its use for speech recognition. Specific details on volume, collection method, and recency are not provided in the available metadata.

Tags: Audio, Customer Service, Arabic Language, Speech Recognition, +1 more
