DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,577 datasets

Site-Averaged Neutron Soil Moisture Data from 1987-1989 FIFE Experiment

Site-averaged daily neutron probe soil moisture measurements collected from 1987 to 1989 during the FIFE experiment. The dataset is a processed product in which samples were averaged first by site and then by day. It is hosted by the ORNL_CLOUD organization.

Tabular, Time Series, ZIP, Text, LAND SURFACE, Soil Moisture, Neutron-probe, Earth Science, +1

Site Averaged Soil Moisture Data from 1987 FIFE Experiment

This site-averaged gravimetric soil moisture dataset (Betts) provides daily averages of soil water content collected during the 1987-1989 FIFE field campaign. It contains site-averaged samples from 1987 only and is managed by the ORNL_CLOUD organization.

Tabular, ZIP, Text, Fife Experiment, LAND SURFACE, Soil Moisture, Earth Science, +1

Site Averaged Gravimetric Soil Moisture Data From 1989

Site-averaged gravimetric soil moisture measurements from the FIFE experiment, covering 1989. Samples were averaged for each site and then for each day. The dataset is managed by ORNL_CLOUD.

Tabular, Time Series, ZIP, Text, Hydrology, LAND SURFACE, Agriculture, Soil Moisture, +1

Creativeguy Persian ASR: Speech Recognition Dataset with 18808 Samples

18,808 audio samples of Persian speech, totaling 143 hours and 45 minutes of audio. The dataset is split into 13,165 training samples and 5,643 test samples. It was uploaded by user 'veziriii' to Hugging Face in May 2026.

Audio, Audio Data, Persian Language, Speech Recognition, +1

Gigaspeech2 Typhoon: 1,000 Thai Speech Samples for ASR Benchmarking

A metadata-only reference dataset containing 1,000 test samples for Thai speech recognition benchmarking. The dataset, created by typhoon-ai, provides audio IDs and human transcriptions derived from the Gigaspeech2 corpus, with the last update recorded on 2026-05-18. Each audio_id links to the original Gigaspeech2 dataset for audio file retrieval.

Audio, Audio Metadata, Benchmarking, Natural Language Processing, Speech Recognition, Thai Language, +1

MLS Eng Tokens: Pre-tokenized Audio Codec Tokens for TTS Training

somu9's mls_eng_tokens dataset provides pre-extracted audio codec tokens from the Multilingual LibriSpeech English corpus, tokenized using MOSS-Audio-Tokenizer. The dataset includes train, dev, and test splits and was last updated on 2026-05-17. Audio is processed at a 48,000 Hz sample rate and a 12.5 Hz frame rate.

Text, Audio, Multilingual, Text To Speech, Machine Learning, Tokens, Speech Synthesis, Mls Corpus, Natural Language Processing, Audio Tokens, +1
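
The sample rate and frame rate quoted for MLS Eng Tokens imply a fixed number of audio samples per token frame. The following is a minimal sketch of that arithmetic only; the function names are hypothetical, rounding the frame count up is an assumption, and the listing does not say how many codec tokens (codebooks) each frame carries.

```python
import math

# Figures taken from the listing above.
SAMPLE_RATE_HZ = 48_000
FRAME_RATE_HZ = 12.5


def samples_per_frame(sample_rate_hz: float, frame_rate_hz: float) -> float:
    """Number of raw audio samples covered by one token frame."""
    return sample_rate_hz / frame_rate_hz


def frames_for_duration(duration_s: float, frame_rate_hz: float) -> int:
    """Token frames needed for a clip, assuming the last partial frame is kept."""
    return math.ceil(duration_s * frame_rate_hz)


if __name__ == "__main__":
    print(samples_per_frame(SAMPLE_RATE_HZ, FRAME_RATE_HZ))  # 3840.0
    print(frames_for_duration(10.0, FRAME_RATE_HZ))          # 125
```

Under these assumptions, a 10-second clip maps to 125 frames, each spanning 3,840 audio samples.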

Noise Reduction Method Performance and Time Complexity Comparisons

Results comparing the noise reduction performance and time complexity of methods across different environments. The dataset is a 5.5 KB Excel file authored by Hao Pei and last updated on 2026-05-08. It is licensed under CC-BY-4.0 and hosted on figshare.

Tabular, Excel, Signal Processing, Performance Comparison, Time Complexity, Noise Reduction, +1

Nav Train Gnjr S T T: Persian Speech Audio, 221,455 Samples

221,455 audio samples of Persian speech, totaling 753 hours and 52 minutes of audio. The dataset was uploaded by user 'argoveziriii' to Hugging Face and was last updated on May 25, 2026. Audio files have been resampled to 16000 Hz.

Audio, Persian, Speech Recognition, +1

Neyshekar V3 Asr Aligned: A Curated Persian Speech Recognition Dataset

A repaired subset of the Neyshekar v3 dataset for Persian automatic speech recognition. The dataset contains real audio clips and transcripts, curated by matching multiple ASR hypotheses back to the original transcript pool to ensure alignment. It was created by Peacockery and last updated on 2026-05-13.

Audio, Audio Transcript, Persian Language, Benchmark, Asr Training, Speech Recognition, +1

Egyptian Arabic Speech Recognition Data for Food Ordering

Egyptian Arabic STT Dataset is a synthetic speech dataset containing 50 samples totaling 85.2 seconds of audio. The samples were generated by the Synthetic Egyptian Speech Data Pipeline and have been human-reviewed and quality-validated using Whisper ASR, achieving an average WER of 0.4136 and CER of 0.1642. The dataset focuses on the topic of food ordering.

Audio, Egyptian-Arabic, Food Ordering, Synthetic Data, Speech Recognition, Synthetic, +1
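
The WER and CER figures quoted for the Egyptian Arabic dataset are edit-distance ratios. The sketch below shows the standard textbook definition: Levenshtein distance between hypothesis and reference, divided by the reference length, over words for WER and characters for CER. It is an illustration only. The listing does not describe how the dataset's authors computed their scores (for example, what text normalization they applied), and the function names and sample sentences are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Made-up example pair: one substituted word out of four.
    print(wer("one large pizza please", "one large pasta please"))  # 0.25
```

Read this way, an average WER of 0.4136 corresponds to roughly 41 word-level edits per 100 reference words, although the exact scoring setup behind that figure is not given.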

FormulaSpeech Datasets for Scientific Formula Verbalization

FormulaSpeech Datasets are designed to improve the verbalization of scientific formulas by large speech language models. The datasets support accessible learning scenarios, particularly for blind or low-vision learners relying on speech-enabled AI tutors. The repository is maintained by Stephen-Lee and was last updated on May 21, 2026.

Text, Audio, Speech Synthesis, Ai Tutors, Scientific Formulas, Computer Vision, Accessible Learning, +1

Live Music Venues In Melbourne Municipality

A list of dedicated live music venues and other spaces presenting live music in the Melbourne municipality. Venues are defined using the Melbourne Live Music Census Report 2017 criterion of presenting live music at least two nights per week. The data includes venue types, locations, and operating frequencies.

Audio, Nightclub, Bar, Hotel, Melbourne, Live Music, Gigs, Venue, Cityreactivation, +1

IWSLT2026 IF Augmented: Chat-Style Audio-Instruction Examples for Speech Translation

RUN12 audio-instruction examples prepared from local files for chat-style training and evaluation. The dataset was uploaded by YapayNet and last updated on April 30, 2026. Each row contains one example, likely including an audio array and sampling rate for speech processing tasks.

Audio, Multimodal, Audio Instruction, Chat Training, Benchmark, Speech Translation, Iwslt, +1

PROCESS-2: Remote Speech Recordings for Cognitive Health Assessment

PROCESS-2 contains speech recordings from older adults performing three standard cognitive tests. The dataset was collected remotely via the CognoMemory automated digital assessment platform for research on speech-based biomarkers. It includes participants spanning healthy cognition, mild cognitive impairment (MCI), and dementia.

Audio, Remote Health, Biomarkers, Cognitive Assessment, Medical Research, Speech Analysis, +1

Persian TTS: Speech Synthesis Dataset with 51 Hours of Audio

A Persian text-to-speech dataset containing 19,458 audio samples totaling 51 hours and 43 minutes of speech. The audio has been resampled to 16000 Hz and is paired with Persian language transcripts. The dataset was uploaded by user 'veziriii' to Hugging Face and was last updated in May 2026.

Text, Audio, Text To Speech, Audio Data, Speech Synthesis, Persian Language, +1

Thai Synthesized Audio Generated with OmniVoice TTS

Thai Synthesized Audio is a dataset created by ReopenAI and last updated on June 1, 2026. It contains audio generated by the OmniVoice text-to-speech model from example sentences simulating real-life scenarios. The example sentences were generated using common Thai vocabulary and the Gemma-4-31B-it model.

Audio, Text To Speech, Speech Synthesis, Audio Generation, Thai Language, +1

Sp Hw5: Persian Text-to-Speech Audio and Transcripts

Persian TTS Dataset contains 52,112 audio samples and corresponding transcripts, totaling 53 hours and 15 minutes of speech. Audio files have been resampled to 16000 Hz. The dataset was uploaded by author 'veziriii' and was last updated on 2026-05-25.

Text, Audio, Text To Speech, Audio Dataset, Persian, Speech Synthesis, +1

Horn ASR Benchmark: Multilingual Speech Recognition for Horn of Africa Languages

A multilingual evaluation benchmark for automatic speech recognition covering four under-served languages of the Horn of Africa: Amharic, Oromo, Somali, and Tigrinya. It contains 4,000 utterances totaling 15.44 hours of audio, drawn from spontaneous interview-style speech with transcripts validated by native speakers. The dataset was created by LesanAI and last updated on May 7, 2026.

Audio, Multilingual, Benchmark, Horn Of Africa, Automatic Speech Recognition, +1

Canada-Saint Kitts Air Transport Agreement

A bilateral air transport agreement establishing the framework for commercial air services between Canada and Saint Kitts and Nevis. The document is an archived publication from Global Affairs Canada, referenced for research or recordkeeping. It was last updated on the platform in April 2026.

Text, 🇨🇦 Canada, Saint Kitts Nevis, International Treaty, Bilateral Agreement, Air Transport, +1

Supplementary Material for Validation of Stop Criteria in ASReview

Supplementary Material 2 from the research article 'When to stop reviewing: validation of stop criteria in ASReview'. The text file is 11.1 KB in size and was published on figshare by C. Kempny under a CC-BY-4.0 license. It was last updated on May 10, 2026.

Text, Systematic Review, Stop Criteria, Validation, Active Learning, Research Methodology, Text Data, +1
