Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,907 datasets
Site Averaged Gravimetric Soil Moisture Data from the 1987 (Betts) dataset provides daily averages of soil water content collected during the 1987-1989 FIFE field campaign. The data represents site-averaged product samples from 1987 only. It is managed by the ORNL_CLOUD organization.
A metadata-only reference dataset containing 1,000 test samples for Thai speech recognition benchmarking. The dataset, created by typhoon-ai, provides audio IDs and human transcriptions derived from the Gigaspeech2 corpus, with the last update recorded on 2026-05-18. Each audio_id links to the original Gigaspeech2 dataset for audio file retrieval.
somu9's mls_eng_tokens dataset provides pre-extracted audio codec tokens from the Multilingual LibriSpeech English corpus, tokenized using MOSS-Audio-Tokenizer. The dataset includes train, dev, and test splits and was last updated on 2026-05-17. Audio is processed at a 48,000 Hz sample rate and a 12.5 Hz frame rate.
Comparison results of noise reduction performance and time complexity of methods in different environments. The dataset is a 5.5 KB Excel file authored by Hao Pei and last updated on 2026-05-08. It is licensed under CC-BY-4.0 and hosted on figshare.
221,455 audio samples of Persian speech, totaling 753 hours and 52 minutes of audio. The dataset was uploaded by user 'argoveziriii' to Hugging Face and was last updated on May 25, 2026. Audio files have been resampled to 16000 Hz.
A repaired subset of the Neyshekar v3 dataset for Persian automatic speech recognition. The dataset contains real audio clips and transcripts, curated by matching multiple ASR hypotheses back to the original transcript pool to ensure alignment. It was created by Peacockery and last updated on 2026-05-13.
Egyptian Arabic STT Dataset is a synthetic speech dataset containing 50 samples totaling 85.2 seconds of audio. The samples were generated by the Synthetic Egyptian Speech Data Pipeline and have been human-reviewed and quality-validated using Whisper ASR, achieving an average WER of 0.4136 and CER of 0.1642. The dataset focuses on the topic of food ordering.
FormulaSpeech Datasets are designed to improve the verbalization of scientific formulas by large speech language models. The datasets support accessible learning scenarios, particularly for blind or low-vision learners relying on speech-enabled AI tutors. The repository is maintained by Stephen-Lee and was last updated on May 21, 2026.
A list of dedicated live music venues and other spaces presenting live music in the Melbourne municipality. The data defines venues based on the Melbourne Live Music Census Report 2017 criterion of presenting live music at least two nights per week. It includes information on venue types, locations, and operating frequencies.
RUN12 audio-instruction examples prepared from local files for chat-style training and evaluation. The dataset was uploaded by YapayNet and last updated on April 30, 2026. Each row contains one example, likely including an audio array and sampling rate for speech processing tasks.
PROCESS-2 contains speech recordings from older adults performing three standard cognitive tests. The dataset was collected remotely via the CognoMemory automated digital assessment platform for research on speech-based biomarkers. It includes participants spanning healthy cognition, mild cognitive impairment (MCI), and dementia.
A Persian text-to-speech dataset containing 19,458 audio samples totaling 51 hours and 43 minutes of speech. The audio has been resampled to 16000 Hz and is paired with Persian language transcripts. The dataset was uploaded by user 'veziriii' to Hugging Face and was last updated in May 2026.
Thai Synthesized Audio is a dataset created by ReopenAI and last updated on June 1, 2026. It contains audio generated by the OmniVoice text-to-speech model from example sentences simulating real-life scenarios. The example sentences were generated using common Thai vocabulary and the Gemma-4-31B-it model.
Persian TTS Dataset contains 52,112 audio samples and corresponding transcripts, totaling 53 hours and 15 minutes of speech. Audio files have been resampled to 16000 Hz. The dataset was uploaded by author 'veziriii' and was last updated on 2026-05-25.
A multilingual evaluation benchmark for automatic speech recognition covering four under-served languages of the Horn of Africa: Amharic, Oromo, Somali, and Tigrinya. It contains 4,000 utterances totaling 15.44 hours of audio, drawn from spontaneous interview-style speech with transcripts validated by native speakers. The dataset was created by LesanAI and last updated on May 7, 2026.
A bilateral air transport agreement establishing the framework for commercial air services between Canada and Saint Kitts and Nevis. The document is an archived publication from Global Affairs Canada, referenced for research or recordkeeping. It was last updated on the platform in April 2026.
Supplementary Material 2 from the research article 'When to stop reviewing: validation of stop criteria in ASReview'. The text file is 11.1 KB in size and was published on figshare by C. Kempny under a CC-BY-4.0 license. It was last updated on May 10, 2026.
somu9 provides 20,141 pre-extracted audio codec tokens for text-to-speech training, derived from the reach-vb/jenny_tts_dataset. The collection contains 26.4 hours of audio, tokenized using the MOSS-Audio-Tokenizer-Nano codec at 48 kHz stereo and a frame rate of 12.5 Hz.
23,419 audio-transcription pairs totaling 72 hours of Farsi speech data, contributed by 678 distinct speakers. This dataset is part of the YodaLingua multilingual collection, designed for training text-to-speech and automatic speech recognition models. It was uploaded by Thomcles to Hugging Face and last updated on 2026-04-27.
A large-scale symbolic music dataset in ABC notation, curated to support text-driven sheet music generation. It was released as part of the Text2Score project by emotionwave-company and last updated on May 12, 2026.