Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,907 datasets
Version 3.0 geo-located Delay Doppler Maps (DDMs) calibrated into Power Received and Bistatic Radar Cross Section from the CYGNSS satellite constellation. The dataset includes other scientific parameters like Normalized BRCS, Delay Doppler Map Average, and Leading Edge Slope, plus quality flags, error estimates, and geolocation parameters. NASA provides up to 8 netCDF files daily, with a latency of approximately 6 days from the last measurement.
561 players of the 2024 video game Black Myth: Wukong provided feedback for a study on music's influence on consumer behavior. The dataset supports analysis of how music cognition, immersion, and emotional arousal mediate purchase intentions, based on the Stimuli-Organism-Response (S-O-R) theory. It was analyzed using Partial Least Squares Structural Equation Modeling (PLS-SEM) to examine the role of gamification design and cultural confidence.
A PDF of the third movement, a Rondo andante, from Sonata 1 in G major for keyboard, violin, and cello, as found in Berkeley manuscript 793. The movement is described as a set of variations on a theme, likely with repeated thematic episodes. The dataset was authored by Matthew James Zenas Dicken and last updated on 2026-04-13.
A PDF musical score for a symphonia in G major, sketched out in four parts. The score is part of Berkeley Ms 794 and represents the Allegro movement. It was authored by Matthew James Zenas Dicken and published on figshare in April 2026.
A remastered version of Reubencf/fma-labeled, prepared using Adaption's Adaptive Data platform. It contains descriptive text prompts designed to generate diverse musical tracks. The prompts detail instrumentation, rhythmic patterns, atmospheric qualities, and emotional tones across genres like pop, techno, ambient, and rock. Author Reubencf last updated the dataset on 2026-04-24.
272 perforated shells of Tritia cf. gibbosula from US 8 of El Mnasra cave, compared with specimens from Djerba and Taforalt. The dataset categorizes shell perforation types and conditions for archaeological analysis. It was authored by Emilie Campmas and shared under a CC BY 4.0 license.
Information on more than 120,000 games published on Steam, the largest PC gaming platform. The dataset was created by Fronkon Games using code and APIs from Steam and Steam Spy, and is maintained by user zjgeritz. It was last updated on April 13, 2026.
3,805 annotated audio recordings of classical Arabic poetry verses, totaling approximately 9 hours of data. The dataset was created by Dr. Abdul Kareem Saleh Al-Zahrani and published via Harvard Dataverse, with a last update in April 2026. Each sample is a single verse labeled according to one of 16 canonical Arabic poetic meters.
Unified tropical cyclone best-track data for Saint Kitts and Nevis, merging historical and recent records from multiple meteorological agencies via the IBTrACS project. It contains storm identifiers, temporal data, and physical parameters such as wind speed and central pressure. The data is maintained by HDX and was last updated in March 2026.
A 2017 study presents U–Pb zircon dating and palynological data from the middle Permian Canning Basin in Western Australia. The data reveals an apparent age conflict of 1.7 million years between tuffs in non-marine and marginal-marine facies, challenging established spore-pollen zonation. The dataset is associated with Geoscience Australia and the cited research article.
REAP observer output captures per-token routing decisions and expert activation norms for every MoE layer in the moonshotai/Kimi-K2.6 model. The dataset, authored by 0xSero, contains the results of a full calibration pass, providing saliency ingredients for analysis. It was last updated on April 23, 2026.
Geospatial boundaries and metadata for marine and terrestrial protected areas and Other Effective Area-based Conservation Measures (OECMs) in Saint Kitts and Nevis. Maintained by the UNEP-WCMC as part of the Protected Planet Initiative, this data is updated on a monthly basis to support international biodiversity reporting. It serves as a primary source for tracking progress toward the Kunming-Montreal Global Biodiversity Framework.
Teochew-Wild is a dataset of 12,500 audio clips from 20 native Teochew speakers. It was created by 'panlr' from online sources like news, storytelling, and TV programs, with annotations for standard characters and pinyin. The dataset was last updated in April 2026.
Common Voice Corpus 11.0 is a multilingual speech dataset consisting of MP3 audio files paired with corresponding text transcriptions. The dataset contains 24,210 recorded hours, with 16,413 validated hours across 100 languages. Many recordings include demographic metadata such as age, sex, and accent.
Tidmarsh, a former cranberry farm restored to a wetland in Plymouth, Massachusetts, is the source of this data. The dataset contains surface water discharge, nitrogen and nitrate concentrations, and specific conductivity measurements collected between 2016 and 2024. It was created by the Department of Agriculture to support watershed-scale modeling and analysis of nutrient retention.
A 214.8 KB PDF guide for the project "From Reed to Ney: Documenting Musical Craftsmanship and Pedagogy in Turkey." The guide was authored by Banu Senay and last updated on April 22, 2026. It is hosted on figshare under a CC-BY-NC-SA 4.0 license.
EGYSpeak is a curated dataset of 147,979 single-speaker Egyptian Arabic audio clips paired with transcriptions. It was created by MohamedGomaa30, sourced from the fadisarwat/egyptian-arabic-lines Kaggle dataset and processed through an ASR pipeline. The dataset was last updated on Hugging Face in April 2026.
A spatial land surface temperature (LST) index dataset for Massachusetts produced by MAPC Data Services. The data is derived from satellite imagery captured between April and October from 2018 to 2020, providing a relative heat tendency measure for each 30-meter pixel. The download includes three complementary datasets: the LST index, a variability raster, and a shapefile of the hottest 5% of areas.
Ukrainian speech dataset for TTS and ASR tasks, processed from the Yehor/audiobooks-xxl source. The audio has been filtered for music and noise, resampled to 24 kHz, and transcribed using the nvidia/canary-1b-v2 model. The dataset was created by Mikhailo and last updated on April 29, 2026.
TikTok Trending Hashtags and Music (2024 - 2025) contains the top 100 daily trending hashtags and music records from the TikTok Creative Center. The dataset covers the period from 2024-05-23 to 2025-07-09 and includes 13,399 unique hashtags and 11,157 unique songs. It was uploaded by author lingbow to Hugging Face.