DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,577 datasets

PianoCoRe: Combined and Refined Piano MIDI Dataset with 250,000 Performances

PianoCoRe is a large-scale piano MIDI dataset that unifies and refines major open-source piano corpora. It contains 250,046 performances of 5,625 pieces written by 483 composers, totaling 21,763 hours of performed music. The dataset was created by SyMuPe and was last updated on 2026-04-27.

Audio, Digital Music, Music Ai, Large Scale, Music Performance, Piano Midi, +1

Filimo: Persian Speech Audio Dataset, 134,994 Samples

A Persian speech dataset containing audio files resampled to 16,000 Hz. The collection includes 134,994 samples totaling 97 hours and 20 minutes of audio, split into training and test sets. It was uploaded to Hugging Face by user 'veziriii' and last updated on 2026-05-25.

Audio, Persian, Farsi, Audio Processing, +1

Nonverbal TTS Filtered Tokens: Pre-Extracted Audio Codec Tokens for TTS Training

Pre-extracted audio codec tokens for TTS training, containing 6,082 samples totaling 15.6 hours of audio. The dataset was created by author somu9 and was last updated on 2026-05-18. It uses the MOSS-Audio-Tokenizer-Nano codec at a sample rate of 48,000 Hz and a frame rate of 12.5 Hz.

Audio, Text To Speech, Speech Synthesis, Audio Codec, Pre Tokenized, Audio Tokens, +1

Music Surprisal Benchmarks: MIR Dataset Derivatives

MEX assets include metadata and precomputed baseline MID artifacts derived from standard Music Information Retrieval datasets. The dataset is a derivative of public sources like SALAMI and session, with licenses including CC0-1.0 and MIT. It was last updated on 2026-05-21 by author muthissar.

Tabular, Audio, Music Information Retrieval, Midi, Benchmark, Surprisal Benchmark, Surprisal, +1

Large He Synthetic TTS Dataset: Hebrew Text-to-Speech Audio

A synthetic text-to-speech dataset for the Hebrew language, published on Hugging Face by author notmax123. The dataset was last updated on June 25, 2026. The available metadata does not specify its size, format, or content.

Audio, Text To Speech, Speech Synthesis, Hebrew Language, Synthetic Speech, Synthetic, +1

Meddies ASR External Data ZH: Chinese Speech Recognition Dataset

Meddies ASR External Data ZH is a dataset for Chinese automatic speech recognition, published by the author 'Meddies' on the Hugging Face platform. It was last updated on July 1, 2026. Because the provided metadata is minimal, its content, size, and structure need to be verified after download.

Audio, Audio Data, Chinese Language, Speech Recognition, Automatic Speech Recognition, +1

Slayprincess: Voice-Acted Audio Lines from a Video Game

TeraTTS provides a dataset of 9,394 high-quality audio clips paired with transcript text, extracted from the video game Slay the Princess. The collection totals approximately 13 hours of audio across three primary speakers. The dataset was last updated on Hugging Face in May 2026.

Text, Audio, Speech Synthesis, Game Audio, Audio Transcripts, Voice Acting, +1

Meddies ASR: External Audio Data for Speech Recognition

Meddies ASR External Data is a speech dataset published by Meddies on Hugging Face. The available metadata does not detail its content or size. It was last updated on July 1, 2026.

Audio, Audio Processing, Speech Recognition, Medical, +1

Indic ASR Eval: Multilingual Speech Recognition Test Set

A curated evaluation set for Indic-language automatic speech recognition. It contains 6,169 audio samples across 7 dataset configurations, totaling approximately 13.3 hours of audio at 16 kHz. The dataset was created by ayush-shunyalabs and last updated on 2026-04-23.

Audio, Multilingual, Evaluation, Benchmark, Natural Language Processing, Speech Recognition, +1

Commonwealth Cares for Children (C3) Funds: Massachusetts Child Care Program Disbursements

C3 provides monthly operational funding to child care programs across Massachusetts. Each row represents the amount of C3 funds disbursed to a program by fiscal year, which runs from July 1 to June 30. The data is published by educationtocareer.data.mass.gov and was last updated on April 13, 2026.

Tabular, CSV, XML, JSON, Government Funding, Massachusetts, Fiscal Data, Child Care, +1

CORD-19 COVID-19 Open Research Papers

The dataset contains 19,000 open-access research papers related to COVID-19, collected from various sources between 2020 and 2021. It includes metadata such as titles, authors, abstracts, publication dates, and source repositories.

Tabular, Geospatial, Excel, Satellite Imagery, Land Use, +1

Exploratory Factor Analysis Practices in Music Research, 80.8 KB

Data and code for a systematic review of exploratory factor analysis practices in music psychology and music education. The dataset includes an Excel file with a codebook for each variable and an R Markdown file. It was authored by Daniel Yeom and last updated on April 17, 2026.

Tabular, Audio, Excel, Systematic Review, Music Psychology, Factor Analysis, Music Education, Research Data, +1

Meddies ASR Raw Audios 1: A Collection of Speech Recordings

Meddies ASR Raw Audios 1 is a dataset of audio files published by the author 'Meddies' on the Hugging Face platform. The dataset was last updated on June 25, 2026. The title suggests it contains raw audio recordings, likely intended for use in automatic speech recognition (ASR) tasks.

Audio, Audio Data, Medical Speech, Speech Recognition, +1

TTS Pretrain Clones 3M: 3 Million Synthesized English Voice-Clone Utterances

2,967,779 clone utterances across 2,971 English speakers, generated by the echo-tts synthesizer. The dataset was created by SynDataLab and last updated on 2026-04-25. It contains WAV audio at 44.1 kHz, stored in Parquet files, with each speaker represented by 10 voice-clone latents and 100 synthesized texts.

Audio, Text To Speech, Speech Synthesis, Voice Cloning, Audio Generation, Synthetic, +1

COM3D2: Japanese-to-Chinese Video Game Script Translations

A Japanese-to-Simplified Chinese pre-translation dataset extracted from the COM3D2 and CM3D2 video game series. The dataset includes text from the base games, their expansions, and nearly all DLCs up to April 4, 2026. It was created by author mollyadams, with translations primarily generated by GPT-5.2 xhigh and refined by GPT-5.4 xhigh. It was last updated on April 24, 2026.

Text, Japanese Chinese, Machine Translation, Video Game Text, Natural Language Processing, +1

Massachusetts AP Exam Scores by School and Student Group Since 2007

Advanced Placement exam score data for Massachusetts public and charter schools from 2007 onward. The dataset includes counts of students receiving each score (1-5) and percentages scoring in low (1-2) and high (3-5) ranges, disaggregated by student demographic groups. Data is published by the Massachusetts Department of Elementary and Secondary Education (DESE).

Tabular, Audio, Time Series, CSV, XML, JSON, Demographic, El, English Learner, Economically Disadvantaged, Advanced Placement, Standardized Test, Ap, Education Assessment, Test, University, Student Performance, Race, Standardized Testing, Students With Disabilities, High Needs, Public Schools, Disability, Science, Gender, Ecodis, English Learners, Race And Ethnicity, +1

Hindi STT Benchmarking Dataset with 10,000 Utterances from Six Sources

10,000 Hindi utterances across six Vistaar-derived parts provide a benchmark for speech-to-text systems. The dataset contains about 15.5 hours of 16 kHz mono WAV audio, each with a reference transcript and outputs from four ASR services. It was published by RinggAI and last updated in April 2026.

Tabular, Audio, Hindi, Benchmarking, Speech Recognition, +1

Saint Kitts and Nevis Population Grid Projections 2015-2030

100-meter resolution gridded population estimates for Saint Kitts and Nevis, created using a Random Forest-based dasymetric redistribution method. The dataset provides annual estimates of the total number of people per pixel from 2015 to 2030 in GeoTIFF format. WorldPop produced this 2025 Alpha release version in September 2025.

Geodata, Baseline Population, +1

Quebec ISBN-Registered Documents Since 2010

Bibliothèque et Archives nationales du Québec (BAnQ) provides a dataset of all ISBN-registered documents published in Quebec since 2010, acquired through legal deposit, purchase, or donation. The collection includes musical scores, official publications, books, and show programs.

Massachusetts AP Exam Participation by Subject and Student Group

Advanced Placement exam participation counts for all Massachusetts public school students from 2007 onward. Data is disaggregated by test subject, student demographic group, and school district. The Massachusetts Department of Elementary and Secondary Education publishes this dataset.

Tabular, Audio, Time Series, CSV, XML, JSON, Education Policy, Demographic, El, Low Income, Massachusetts, English Learner, Economically Disadvantaged, Advanced Placement, Ap, Test, University, Student Demographics, Race, Standardized Testing, Students With Disabilities, High Needs, Disability, Science, Gender, Ecodis, English Learners, Race And Ethnicity, +1
