DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,577 datasets

Meddies ASR Test: Audio Data for Speech Recognition

Meddies ASR Test is a dataset hosted on Hugging Face and authored by Meddies. Its title suggests it contains audio data, likely intended for testing automatic speech recognition systems. It was last updated at 2026-06-26 09:05:30.

Tags: Audio, Audio Test, Medical Audio, Speech Recognition

Jenny 30H Tokens: Pre-tokenized Audio Codec Tokens for TTS Training

somu9 provides 20,141 samples of pre-extracted audio codec tokens for text-to-speech training, derived from the reach-vb/jenny_tts_dataset. The collection covers 26.4 hours of audio, tokenized with the MOSS-Audio-Tokenizer-Nano codec at 48 kHz stereo and a frame rate of 12.5 Hz.

Tags: Audio, Text To Speech, Speech Synthesis, Pre Tokenized, Audio Tokens
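
For a rough sense of scale, the duration and frame-rate figures can be related with simple arithmetic. This is a sketch that assumes the 26.4 hours is the total duration of all 20,141 samples and counts one codec frame per 1/12.5 s. The listing does not say how many codebook tokens MOSS-Audio-Tokenizer-Nano emits per frame, so the result counts frames, not tokens; the function names are illustrative only.

```python
SECONDS_PER_HOUR = 3600


def codec_frame_count(hours: float, frame_rate_hz: float) -> int:
    """Codec frames needed to cover `hours` of audio at `frame_rate_hz`.

    Counts frames, not tokens: a codec with several codebooks emits
    several tokens per frame (the codebook count is not given here).
    """
    return round(hours * SECONDS_PER_HOUR * frame_rate_hz)


def mean_clip_seconds(total_hours: float, num_clips: int) -> float:
    """Average sample length, assuming `total_hours` covers every sample."""
    return total_hours * SECONDS_PER_HOUR / num_clips


if __name__ == "__main__":
    frames = codec_frame_count(26.4, 12.5)
    samples = 20_141
    print(f"{frames:,} frames in total")
    print(f"{mean_clip_seconds(26.4, samples):.2f} s and "
          f"{frames / samples:.0f} frames per sample on average")
```

Under these assumptions the collection spans about 1.19 million frames, or roughly 4.7 seconds and 59 frames per sample.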

Meedies Asr Human Labels: Speech Data with Annotations

Meedies Asr Human Labels is a dataset published on Hugging Face by an author named Meddies. It appears to contain human-labeled data, likely for training or evaluating automatic speech recognition systems. It was last updated on June 30, 2026.

Tags: Audio, Human Labels, Speech Recognition

YodaLingua-Farsi: 72 Hours of Farsi Speech for TTS and ASR

23,419 audio-transcription pairs totaling 72 hours of Farsi speech data, contributed by 678 distinct speakers. This dataset is part of the YodaLingua multilingual collection, designed for training text-to-speech and automatic speech recognition models. It was uploaded by Thomcles to Hugging Face and last updated on 2026-04-27.

Tags: Audio, Multimodal, Multilingual, Farsi Language, Speech Synthesis, Multilingual Speech, Speech Recognition

Text2Score: Large-Scale Symbolic Music Dataset in ABC Notation

A large-scale symbolic music dataset in ABC notation, curated to support text-driven sheet music generation. It was released as part of the Text2Score project by emotionwave-company and last updated on May 12, 2026.

Tags: Text, Audio, Music Generation, Text To Music, Large Scale, ABC Notation

Synthetic Egyptian Arabic Speech Dataset with Quality Metadata

A curated collection of 1,000 Egyptian Arabic speech samples, each the best audio selected from the outputs of four different text-to-speech models. Created by MohamedGomaa30, the dataset includes transcription text and quality metadata for each entry and was last updated on May 14, 2026.

Tags: Tabular, Audio, Text To Speech, Egyptian-Arabic, Speech Synthesis, Audio Quality, Synthetic

OpenSLR-132: Quran Speech to Text Dataset

OpenSLR-132 is a Quran speech-to-text dataset sourced from OpenSLR.org. It was uploaded to Hugging Face by the user deepdml and last updated on June 8, 2026. The content, size, and structure of the audio files are not detailed in the available metadata.

Tags: Audio, Arabic, Quran, Religious Text, Speech Recognition

LMD AI Detection: A Benchmark for AI-Generated Symbolic Music

A benchmark dataset for detecting AI-generated symbolic music in MIDI format, created by dhlee3000 and last updated on May 15, 2026. It addresses concerns about authenticity in digital music by providing a resource for a domain that has received less attention than audio deepfake detection.

Tags: Audio, Music Detection, Audio Forensics, AI Generated Music, MIDI, Benchmark, Synthetic

Major Head and Neck Reconstruction During COVID-19: The University of Pittsburgh Approach

A manuscript from the University of Pittsburgh Medical Center describes a surgical approach for head and neck cancer patients during the COVID-19 pandemic. Authored by Mark Kubik, it details methods for providing timely reconstructive care while minimizing infectious risk to providers, patients, and families.

Tags: Text, COVID-19, Medical Surgery, Clinical Procedures

Meddies Asr Test: Audio Data for Speech Recognition

Meddies Asr Test is a dataset uploaded to Hugging Face by an author named Meddies and last updated on June 26, 2026. Its content and scale are not detailed in the available metadata.

Tags: Audio, Audio Test, Speech Recognition

Hinglish: A Large-Scale Speech Dataset for Hindi, Hinglish, and Indian English

815,171 audio clips totaling over 2,264 hours of speech, compiled by agarwalayushi and last updated in April 2026. This dataset covers Hindi, Hinglish (Hindi-English code-switching), and Indian English, sourced from 14 public corpora and custom recordings, unified into a single Parquet file.

Tags: Audio, Multilingual, Code Switching, Large Scale, Audio Processing, Speech Recognition

Ru Asr Audio: Russian Speech Recognition Data

Russian speech audio data published on the Hugging Face platform by PotatoHD. The dataset's last recorded update was on 2026-06-22. Columns, sample data, and exact size are currently unknown.

Tags: Audio, Speech Data, Russian Language, Speech Recognition

Meddies ASR External Data: Engineering Resources for Speech Recognition

Meddies provides engineering resources for automatic speech recognition systems. The dataset's specific content and scale are not detailed in the available metadata. It was last updated on July 1, 2026.

Tags: Audio, Engineering, Audio Processing, External Data, Speech Recognition

Amazon Musical Instrument Reviews with 5-Core Filtering, 2018-2023

A sampled subset of Amazon Reviews from the Musical Instruments category, filtered for a recommendation system project. The data covers reviews from January 2018 to September 2023 and was processed with iterative 5-core filtering so that every user and item has at least five interactions. It was created by oyku-tugana and includes a held-out test set of 5,000 users for cold-start evaluation.

Tags: Tabular, E-Commerce, Recommender Systems, Benchmark, User Reviews, Amazon Reviews, Musical Instruments
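
Iterative 5-core filtering can be sketched as a fixed-point loop: drop every interaction whose user or item has fewer than k interactions, recount, and repeat until nothing changes, because removing one user's reviews can push an item below the threshold and vice versa. This is a minimal illustration assuming interactions are (user_id, item_id) pairs; it is not oyku-tugana's preprocessing code, and `k_core_filter` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, Hashable]


def k_core_filter(interactions: Iterable[Pair], k: int = 5) -> List[Pair]:
    """Repeatedly prune (user, item) pairs until every remaining user and
    every remaining item appears in at least k pairs."""
    current = list(interactions)
    while True:
        user_counts = Counter(user for user, _ in current)
        item_counts = Counter(item for _, item in current)
        kept = [
            (user, item)
            for user, item in current
            if user_counts[user] >= k and item_counts[item] >= k
        ]
        if len(kept) == len(current):  # nothing pruned: fixed point reached
            return kept
        current = kept


if __name__ == "__main__":
    reviews = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "y"), ("c", "z")]
    # With k=2, item "z" is pruned first, which then leaves user "c" with one review.
    print(k_core_filter(reviews, k=2))
```

A single pass is not enough in general, which is why the loop recounts after every round; it always terminates because each pass either removes at least one pair or returns.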

IOAI 2026 Home Task 1 Audio Dataset: AST Checkpoint and Sound Classes

IOAI 2026 Home Task 1 Audio Dataset provides audio data and an Audio Spectrogram Transformer checkpoint for 29 sound classes. The dataset includes 16 retained and 13 new sound classes. It was created for a Kaggle competition task, though the specific author and organization are unknown.

Tags: Audio, Machine Learning, Audio Classification, AST Checkpoint, Sound Classes

Emo-TTS: High-Arousal Emotional Speech Evaluation Datasets

Emo-TTS Evaluation Datasets are used for evaluating a training-free inference framework for high-arousal emotional speech synthesis. The collection includes the HIED dataset with 400 samples across four emotions and the ESD dataset with 20 speakers across two languages. The dataset was created by author 'erminga' and was last updated on 2026-04-29.

Tags: Audio, Text To Speech, Speech Synthesis, Benchmark, Evaluation Dataset, Emotional Speech

Polish Historical-Scan OMR Benchmark: 112 Scored Pages

Polish Historical-Scan OMR Benchmark is a page-level Optical Music Recognition evaluation dataset containing 112 real historical score scans. It provides paired **kern (Humdrum) and MusicXML ground-truth transcriptions. The dataset was derived from the PRAIG/polish-scores dataset, with kern normalization and manual fixes applied.

Tags: Image, Audio, Multimodal, Historical Documents, Music Scores, Benchmark, Optical Music Recognition

LibriHeavy TTS 3: A 50,000-Hour Speech Corpus for Text-to-Speech Training

LibriHeavy TTS 3 is an improved version of the LibriHeavy dataset, aimed at higher-quality text-to-speech training. It builds on a 50,000-hour labeled ASR corpus derived from LibriLight, with audio encoded using the Opus codec at 68 kbps. Authored by brthor and last updated in April 2026, it focuses on improving the quality of both audio and text supervision.

Tags: Text, Audio, Speech Synthesis, Speech Corpus, Audio Text, Natural Language Processing, LibriHeavy
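
To put the corpus size in perspective, bitrate arithmetic gives a rough storage estimate. This sketch assumes 68 kbps is an average bitrate applied uniformly to all 50,000 hours and ignores container and metadata overhead; Opus is often run in variable-bitrate mode, so actual file sizes will differ. The function name is illustrative only.

```python
SECONDS_PER_HOUR = 3600
BITS_PER_BYTE = 8


def encoded_size_bytes(hours: float, bitrate_kbps: float) -> float:
    """Approximate payload size of audio encoded at an average bitrate.

    Ignores container, header, and metadata overhead.
    """
    seconds = hours * SECONDS_PER_HOUR
    bits = seconds * bitrate_kbps * 1000  # kbps -> bits per second
    return bits / BITS_PER_BYTE


if __name__ == "__main__":
    size = encoded_size_bytes(50_000, 68)
    print(f"about {size / 1e12:.2f} TB (decimal terabytes)")
```

Under those assumptions the encoded audio alone comes to roughly 1.5 TB.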

Armenian YouTube Speech Audio and Captions from Azatutyun

Armenian speech audio and caption files collected from the Azatutyun YouTube channel. The dataset includes a 'train' split with clean audio and captions and a 'bad_subtitles' split with known noisy captions. It was created by Arthuryann and last updated on May 5, 2026.

Tags: Audio, Multimodal, Audio Transcription, Armenian Language, Speech Recognition, YouTube Content

Musicians Seamounts Expedition Oceanographic and Meteorological Data

Oceanographic data from a NOAA expedition mapping the Musicians Seamounts chain up to 650 nautical miles north of Hawaii. The dataset includes shipboard sensor measurements for navigation, meteorology, and oceanography collected from August 8 to August 31, 2017. It is produced by the National Oceanic and Atmospheric Administration and is available on multiple government data platforms.

Tags: Tabular, Audio, Time Series, Geospatial, Oceanography, Seamounts, Pacific Ocean, Meteorology, Navigation
