Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,908 datasets
25,000 fully diacritized Modern Standard Arabic text and audio pairs synthesized by a single Saudi male neural voice. The dataset was created by HeshamHaroon and was last updated on April 20, 2026. Audio clips are rendered at 48 kHz / 16-bit PCM and are organized across 10 thematic categories.
A dataset of conversational speech audio paired with transcripts and prompts. It contains turn-based dialogue data with columns for conversation identifiers, speaker agents, text prompts, transcripts, and audio files. The dataset was uploaded by ShiniChien to Hugging Face and last updated on 2026-05-15.
NOAA's Integrated Ocean and Coastal Mapping initiative produced orthorectified mosaic image tiles for coastal Maine. The dataset includes true color (RGB) and infrared (IR) imagery for Cutts Island, Penobscot, and Reversing Falls, captured from June 5 to 21, 2011, with a ground sample distance of 0.50 meters per pixel. Imagery is provided in TIFF format with associated metadata and browse graphics.
June and July 2024 field data collected by the Alaska Division of Geological & Geophysical Surveys across an approximately 24,479 km² area near Anaktuvuk Pass, Alaska. The dataset presents field station locations, observations, sediment sample descriptions, grain-size analyses, and links to photographs. This work was completed in support of a sand and gravel resource assessment for the Arctic Strategic Transportation and Resources (ASTAR) project.
Pittsburgh region meteorological data collected between July 2001 and November 2002 as part of the EPA's Particulate Matter Supersite Program. The dataset includes measurements of temperature, relative humidity, precipitation, wind speed and direction, UV intensity, and solar intensity from a central site and five satellite locations. It was produced by the NARSTO partnership and archived by NASA.
Cc100 Nepali TTS Shristi Encoded is a dataset for text-to-speech (TTS) applications in the Nepali language. The dataset was uploaded by user lilgoose777 to the Hugging Face platform and was last updated on May 31, 2026. Its specific content and structure are not detailed in the available metadata.
MinSpeech is a cleaned, multi-dialect Min-nan (Southern Min) speech dataset maintained as a private research fork by user 'scbz'. The dataset is intended for fine-tuning Automatic Speech Recognition and Speech-to-Text Translation models. It was last updated on the Hugging Face platform on April 20, 2026.
Palynology and paleoecology data from the Mattson Formation in northwest Canada, published by the Government of Yukon. The dataset was last updated in March 2026 and is available under a yk-oglyk license.
LibriSpeech Segment is an English read-speech corpus with phone-level time alignments generated by the Montreal Forced Aligner. The dataset is derived from the LibriSpeech corpus and is suitable for training and evaluating phone recognition and phonetic segmentation models. It was created by changelinglab and last updated on the platform in April 2026.
DELEGATE52 is a benchmark dataset for evaluating large language models on long-horizon delegated document editing across 52 professional document domains. The dataset was developed by Microsoft to study the readiness of AI systems for delegated workflows, where knowledge workers instruct LLMs to edit documents on their behalf over long sessions. It was last updated on 2026-04-20.
The Armed Conflict Location & Event Data Project (ACLED) provides weekly aggregated counts of political violence, civilian-targeting, and demonstration events in Saint Kitts and Nevis. Organized by country-year and country-month intervals, the data enables longitudinal monitoring of conflict trends through early 2026.
2,740 audio recordings at 16 kHz form a dataset for traditional Chinese speech synthesis and recognition. Each entry includes the audio, its length, traditional Chinese text, and a corresponding normalized simplified Chinese text. The dataset, originally from ivanzhu109/zh-taiwan, is mirrored and formatted by lianghsun.
A professional-grade German text-to-speech training corpus created by author semidark and last updated on 2026-04-13. It combines high-quality human narration from the HUI Audio Corpus and LibriVox with synthetic augmentation to provide a legally safe alternative for training models like kokoro. The dataset is described as a work in progress.
Seed Tts Eval Arrow is a dataset for evaluating text-to-speech systems, published on Hugging Face by zhaochenyang20. The dataset was last updated on 2026-05-22. Its specific content and scale require verification after download.
Dari Wavs is an audio dataset created by Sanji27. The description suggests the dataset could be expanded in size and include transcripts ready for automatic speech recognition (ASR). The dataset was last updated on May 17, 2026.
Voicebench Ja contains 4 subsets created by applying speech synthesis to samples from three Japanese text benchmarks: Elyza-tasks-100, M-IFEval, and JamC-QA. The dataset was constructed by SB Intuitions using their internal TTS model and JVS corpus audio prompts to quantitatively evaluate performance gaps between audio and text inputs for language models. It was last updated on March 30, 2026.
A test audio dataset for the ADLIB language-aware ASR benchmark framework for Japanese. It contains 247 test cases with audio from 3 speakers, focusing on the DevTerm (software development terminology) domain. Reference transcripts and term annotations are provided in a separate JSONL file within the project's GitHub repository.
529 audio segments totaling 46 minutes provide speech data for Turkana, an Eastern Nilotic language with roughly 1 million speakers in Kenya. The dataset was created by Speedykom using Bible narratives from the Global Recordings Network, segmented via silence detection. Transcripts were auto-generated using the facebook/mms-1b-all model with a Teso adapter.
A text-to-speech dataset published on Hugging Face by author thach124. The dataset was last updated on 2026-05-28 10:23:30. Its specific content and scale are unknown from the provided metadata.
NASA's CYGNSS constellation provides calibrated Delay Doppler Maps (DDMs) measuring ocean surface scattering. The dataset contains daily files from up to 8 spacecraft, with a typical latency of 6 days from measurement. Version 3.1, produced by POCLOUD, supersedes Version 3.0 with improved antenna gain calibration and quantization correction.