DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Indotts: Audio Dataset

An audio dataset published on Kaggle. The dataset is associated with platform tags for audio and 'Indotts'. Specific details regarding its size, content, and creation are not provided in the available metadata.

Tags: Audio, Indotts, +1 more

French Math ASR Benchmark by Lexia-Labs

Lexia-Labs published this French-language benchmark for automatic speech recognition on Hugging Face in February 2026. The dataset likely contains audio recordings of mathematical speech for evaluating ASR systems. Its specific content, size, and structure require verification after download.

Tags: Audio, Mathematics, Benchmark, French Language, Audio Processing, Speech Recognition, +1 more

L2 Librittsr: Speech Recognition and Synthesis Dataset

L2 Librittsr is a speech dataset published on Hugging Face by Piping. Its title and platform tags suggest it contains audio and text data, likely for speech recognition or text-to-speech tasks. It was last updated on 2026-02-13.

Tags: Text, Audio, Parquet, Text To Speech, Library: polars, Library: dask, Speech Synthesis, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Library: datasets, Region: us, Audio Processing, Speech Recognition, +1 more

Voxenes-2026: Bilingual Benchmark for Audio Deepfake Detection

A bilingual benchmark dataset for evaluating audio deepfake detectors against text-to-speech and voice conversion systems. It is hosted on Kaggle; its author, organization, scale, and last update date are not provided in the metadata.

Tags: Audio, Text To Speech, Audio Deepfake Detection, Speech Synthesis, Audio Classification, Benchmark, Linguistics, Bilingual Benchmark, +1 more

NASA Aviation Safety Reports (ASRS) from 2005 to 2025

This dataset contains 111,000 aviation incident reports collected by NASA's Aviation Safety Reporting System (ASRS) between 2005 and 2025. It likely includes narrative descriptions of safety events and anomalies.

Tags: Text, Time Series, Incident Reports, Time Series Analysis, Natural Language Processing, Public Safety, Aviation, Aviation Safety, +1 more

Turkish Music Emotion Dataset from UCI

UCI hosts the Turkish Music Emotion dataset, which contains audio recordings and extracted features for emotion analysis. The dataset's specific size, creator, and creation date are not provided in the available metadata. It is designed for computational analysis of emotional content in music.

Tags: Tabular, Audio, Affective Computing, Audio Features, Music Emotion, Turkish Music, +1 more

Free Music Archive Audio Features and Metadata

FMA (Free Music Archive) is a dataset for music analysis containing audio tracks and associated metadata. It includes audio features suited to music genre classification, audio analysis, and music information retrieval. The dataset was created by researchers and is hosted on the UCI Machine Learning Repository.

Tags: Tabular, Audio, Music Analysis, Audio Features, Music Genres, Music Metadata, +1 more

Geographic Origin Prediction from Musical Audio Features

Geographical Origin of Music is a dataset from the UCI Machine Learning Repository for predicting the geographic origin of music recordings. It contains audio features extracted from songs, likely for classification tasks. The original creator and specific collection date are not provided.

Tags: Tabular, Audio, Geographic Data, Audio Features, Music Origin, Cultural Analysis, +1 more

Chichewa Customer Speech Audio for Language AI

This dataset contains speech recordings in Chichewa, intended for training language AI models. The row count, column structure, and recording details are not provided in the source metadata.

Tags: Audio, Computer Science, Programming, +1 more

Egyptian Arabic Customer Speech Audio Collection

This collection contains recordings of general-conversation customer speech in Egyptian Arabic. The number of recordings, total duration, and features are not detailed in the source metadata.

Tags: Audio, Arabic, Computer Science, Programming, +1 more

Myanmar Speech Dataset for Automatic Speech Recognition

A combined collection of Myanmar-language speech data from three sources for ASR tasks. The dataset merges the Myanmar portion of Google FLEURS, OpenSLR-80, and a third audio-transcription repository. It was created by chuuhtetnaing and last updated on Hugging Face in December 2025.

Tags: Audio, Audio Data, Myanmar Language, Speech Recognition, +1 more

Common Voice 18 Arabic: Speech Corpus for Automatic Speech Recognition

An unofficial Arabic-only extraction of Mozilla Common Voice Corpus 18.0, prepared for Automatic Speech Recognition research. The dataset was created by MohamedRashad and last updated on 2025-12-27. It is derived from the original Common Voice 18 release, filtered to include only Arabic speech data while preserving the original dataset structure, splits, and metadata fields.

Tags: Audio, Arabic, Machine Learning, Natural Language Processing, Speech Recognition, +1 more

Turkish Speech Dataset with 41,427 Audio Segments and Emotion Labels

A merged speech dataset containing 41,427 audio segments from 88 original source datasets. The collection includes 222 speakers and features transcriptions and emotion labels for neutral, angry, sad, and happy speech. It was created by umutkkgz and last updated on Hugging Face in December 2025.

Tags: Text, Audio, Multilingual, Speaker Identification, Emotion Recognition, +1 more

MAC-SLU: Multi-Intent Spoken Language Understanding Commands for Automotive Cabins

MAC-SLU is a benchmark dataset designed to evaluate Spoken Language Understanding systems on complex, multi-intent user commands within an automotive environment. It addresses limitations in diversity and complexity found in existing SLU datasets. The dataset, created by author Gatsby1984, was last updated on the Hugging Face platform in December 2025.

Tags: Text, Audio, JSON, Size Categories: 10K<n<100K, Library: polars, Spoken Language Understanding, Language: en, Modality: text, Library: mlcroissant, Task Categories: audio-text-to-text, Library: datasets, Benchmark, Library: pandas, Region: us, Automotive, Multi Intent, arXiv: 2512.01603, License: apache-2.0, +1 more

Piper Zholi Zh Cn Checkpoints: Chinese TTS Model for Character Voice

A Chinese text-to-speech model trained to replicate the voice of the game character 'Zhongli'. The model was trained for approximately 48 hours over 150 epochs using about 45 hours of generated speech data cloned from the AISHELL-3 dataset. Created by LF83 and last updated on December 15, 2025, it is built on the Piper zh/zh_CN/huayan/medium foundation model.

Tags: Audio, Text To Speech, Character Voice, AI Training, Chinese Language, Voice Synthesis, +1 more

FLEURS Arabic–Egyptian Edition: Egyptian Arabic Speech Data for ASR

An unofficial, language-specific subset of the FLEURS dataset, last updated on 2025-12-27. The dataset is focused on Arabic (Egyptian) speech data and is designed for Automatic Speech Recognition (ASR) research and evaluation. It was created by MohamedRashad and follows the original FLEURS structure while being packaged as a standalone Arabic-focused dataset.

Tags: Audio, Audio Data, Arabic Language, Benchmark, Egyptian Dialect, Speech Recognition, +1 more

SOREVA Multilingual Speech Dataset for African Languages

SOREVA is a multilingual speech dataset designed for evaluating text-to-speech and speech representation models. It contains approximately 150 audio and transcription samples for each of 49 African languages and dialects. The dataset was created by OlameMend and last updated in December 2025.

Tags: Audio, Multimodal, Multilingual, Multilingual Evaluation, Speech Synthesis, Low-Resource NLP, Benchmark, African Languages, +1 more

Billboard Hot 100 & more

More than 340,000 weekly chart entries documenting every song on the Billboard Hot 100 from August 1958 through the current year. Each record includes rank, artist, and song columns, covering more than six decades of US music history.

Tags: Tabular, Audio, 🌍 Global, Categorical, Arts And Entertainment, +1 more

Omnilingual ASR Corpus: Spontaneous Speech for 348 Under-Served Languages

This collection of spontaneous speech recordings and transcriptions covers 348 under-served languages. The corpus was collected by Meta FAIR's Omnilingual ASR project for training automatic speech recognition and spoken language identification models. It was last updated on Hugging Face in December 2025.

Tags: Audio, Multilingual, Natural Language Processing, Audio Transcription, Speech Recognition, Low Resource Languages, +1 more

Aria MIDI Collection of Solo Piano Recordings

Aria-MIDI contains 1,186,253 MIDI files representing approximately 100,629 hours of transcribed solo-piano music. The dataset was created by author loubb and includes metadata categories such as genre, composer, performer, and compositional identifiers. It was last updated on December 14, 2025.

Tags: Audio, Task Categories: audio-to-audio, Size Categories: 1M<n<10M, Music Information Retrieval, Music Generation, MIDI, Piano Music, Classification, Region: us, Piano, +1 more