Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
43,049 datasets
126 football pitches and turf samples were tested for vertical compliance and rotational stiffness using FIFA-approved devices. The supplementary PDF files contain data that informed revisions to the FIFA Quality Programme's performance thresholds for playing surfaces. Author David James published the files under a CC BY 4.0 license in May 2026.
Plasma Science and Fusion Center Dataverse hosts a dataset by Jintao Hu, Patricia Sadde, Liangjun Shao, Philip C. Michael, and Dongkeun Park describing a novel insulated magnet design. The dataset likely contains experimental and design parameters for a REBCO magnet using Pyralux insulation and a four-tape co-winding technique. The record was last updated on June 18, 2026.
545 records form this release, combining a seed corpus with captured LoopGym trajectories for LoopNet. The dataset was created by KanakMalpani and was last updated on June 14, 2026. Records conform to the 'ln/record-v1' schema.
The Gippsland Lakes Local Coastal Hazard Assessment provides the extent of a 10% Annual Exceedance Probability water level event, incorporating 0.2 meters of sea level rise based on hydrodynamic modelling. It was produced by the Department of Energy, Environment and Climate Action, with the dataset last updated on April 8, 2026. The hazard extent results from a combination of catchment inflows, coastal ocean levels, and wind setup.
Reporte de novedades realizadas por los afiliados al Sistema General de Seguridad Social en Salud tracks administrative changes for health system affiliates in Colombia. The dataset includes columns for origin and destination regimes, municipalities, EPS providers, and demographic details like age and sex. It is hosted on the Colombian open data portal datos.gov.co and was last updated on 2026-05-18.
NASA's SASSIE field campaign deployed two types of profiling floats with different ice-avoidance behaviors to capture the transition from summer melt to autumn ice advance in the Beaufort Sea. ALTO floats halted transmissions at near-freezing surface temperatures to survive winter, while ALAMO floats continued reporting during freeze-up at the likely cost of their survival. This dataset provides in situ temperature and salinity measurements from August-October 2022 within approximately 200 kilometers of the sea ice edge.
The Gippsland Lakes Local Coastal Hazard Assessment (LCHA) provides data on the extent of coastal hazards for the Gippsland Lakes coastal environment. This specific dataset represents the inundation extent for a 10% Annual Exceedance Probability (AEP) water level, incorporating 0.4 meters of sea level rise based on hydrodynamic modelling. It was published by the Department of Energy, Environment and Climate Action and was last updated on 2026-04-08.
Bo Sun's dataset from 2026 contains the experimental setup for a ground motion inversion model combining Generalized Chaotic Particle Swarm Optimization and Generalized Inversion Technique (GCPSO-GIT). The 9.5 KB XLS file likely details parameters and results from tests demonstrating the model's performance in decoupling source, path, and site effects. Results cited include a site effect variation coefficient peak of 12% and a median source stress drop of approximately 42 bar.
A 24-marker initial panel of biomarkers linked to cardiovascular phenotype in people with HIV, expanded with 31 exploratory biomarkers. The dataset was created by Rachel Mac Cann and published on figshare under a CC-BY-4.0 license, last updated on 2026-04-27. It is a 9.5 KB Excel file used to evaluate recursive feature addition models for biomarker-driven clustering.
Weiguang Dong's experimental results for a hybrid emotion recognition model, published on figshare in April 2026. The 5.5 KB XLS file contains performance metrics from evaluations on three benchmark datasets: SemEval-2018, RAVDESS, and CMU-MOSEI.
Weiguang Dong's ablation experiment results, uploaded to figshare on 2026-04-27. The 5.5 KB XLS file contains performance metrics for a hybrid emotion recognition model evaluated on three benchmark datasets: SemEval-2018, RAVDESS, and CMU-MOSEI. The model achieved state-of-the-art results, including 58.5% accuracy on SemEval-2018 and 82.3% accuracy on CMU-MOSEI.
Australia's National Environmental Science Program (NESP) compiled a spatial index of environmental literature for 100 threatened and migratory marine species in relation to Offshore Renewable Energy (ORE) areas. The inventory records study locations, methodologies, and potential impacts of ORE infrastructure on species like birds, cetaceans, and turtles. Data was sourced from a systematic literature review and observation repositories including BirdLife Australia and the Atlas of Living Australia.
Nemotron Personas Belgium uses a compound AI approach to generate multilingual Belgian personas. The personas are grounded in real-world distributions, suggesting they model demographic or behavioral patterns. Created by NVIDIA and last updated on June 17, 2026, this dataset is intended for AI training tasks.
ASR-KCSC is an open-source Korean conversational speech corpus containing 5.22 hours of transcribed audio. The data consists of 22 conversations between seven pairs of speakers recorded on mobile devices in indoor environments. Author MagicHub released the dataset on Hugging Face, with a last recorded update in June 2026.
Chimera-XTRM is a synthetic dataset engineered for fine-tuning Large Language Models in advanced Red-Team operations. The dataset was created by Umranz and was last updated on June 21, 2026. It is intended strictly for authorized security research and defensive training.
A hierarchical framework of geomorphological spatial entities at three tiers, with Tier 1 containing 8 Divisions, Tier 2 containing 34 categories, and Tier 3 containing 95 categories. The dataset, created by the Department of Energy, Environment and Climate Action, provides a spatial system to assist planning, monitoring and reporting for natural resource management in Victoria and Australia. It was last updated on 2026-04-08.
85 variables provide fire detection and retrievals of Fire Radiative Power (FRP), fire Visible Energy Fraction (VEF), and Modified Combustion Efficiency (MCE). The NASA/NOAA Suomi NPP VIIRS FILDA-2 product is generated in 6-minute orbit segments at a 750-meter spatial resolution, designed to detect smaller and cooler fires using visible band observations at night. This dataset supports analysis of fire characteristics and combustion efficiency globally.
Global satellite-derived data provides cumulative 8-day composites of Gross Primary Productivity (GPP) and Net Photosynthesis (PSN) at a 500-meter spatial resolution. The dataset, based on the radiation use efficiency concept, is designed as an input for models calculating terrestrial energy, carbon, and water cycle processes. It contains three primary variables for GPP and PSN alongside a quality control layer.
This dataset contains wetland inventory data for four pilot study areas in Alberta, Canada, totaling approximately 39,045 km². The Government of Alberta, Ducks Unlimited Canada, and Alberta Biodiversity Monitoring Institute collaborated to develop this inventory using Earth Observation imagery and machine learning techniques. The dataset identifies wetland class and form according to the Alberta Wetland Classification System.
Seasat-A Scatterometer (SASS) data provides monthly averaged ocean surface wind stress from July to October 1978. The data is gridded on a 2.5-degree global grid, with vector wind stress stored in dynes per square centimeter. It is derived from 96 days of SASS vector winds processed to remove directional ambiguities using a GSFC atmospheric model.