Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
41,939 datasets
20.6 KB of research data investigating the relationship between math anxiety and attention. The dataset, authored by Sarit Ashkenazi and last updated in May 2026, contains results from a spatial flanker task using negative, neutral, and mathematical picture distractors. It includes both verbal and nonverbal measures of math anxiety.
EMS Computer Aided Dispatch System records track the incident lifecycle from creation to closure, including resource assignment and Fire Department response. To protect personal identifying information under HIPAA, specific incident locations are aggregated to a coarser level of detail. The data is hosted by data.cityofnewyork.us and was last updated on 2026-05-15.
Acoustic detection data from 20 lemon sharks (Negaprion brevirostris) tagged with acceleration-sensing transmitters. Shark movements and body acceleration were monitored from April to December 2019 via a network of 60 receivers around Bimini, Bahamas. The dataset was authored by Evan Byrnes and shared under a CC-BY-4.0 license.
Four 70- to 78-year-old lodgepole pine stands in British Columbia were measured during the 1984 growing season to study the influence of soil water content on resource allocation. The dataset includes stand characteristics, soil data, biomass distribution, production allocation, and climate data from a nearby weather station. It was produced by ORNL_CLOUD and has been revised to correct a sampling date error.
A prospective observational cohort of 642 adults undergoing elective gastrectomy or partial colectomy for gastrointestinal malignancy. The dataset, authored by Zi Liu and shared under CC-BY-4.0, quantifies energy and protein intake during postoperative days 1-2 and tracks complications within 30 days after surgery to model dose-response relationships.
Statistics Canada provides data on characteristics of occupied private dwellings in rural Canada. The table includes counts, number of rooms, number of bedrooms, year of construction, and owner-estimated value. It was last updated on 2026-06-08.
Exaud Tweve's dataset on figshare contains technical specifications for an IoT-based demand-side energy management framework for industrial solar microgrids. The data likely includes results from a high-fidelity simulation and physical prototype, showing reductions in apparent and real power demand. The dataset was last updated on 2026-05-04.
9,240 model specifications from a multiverse analysis investigating school-related stress contagion among Swedish secondary school students. The dataset, 9.5 KB in size, was created by Björn Högberg and last updated on May 4, 2026. It contains results from systematically varying statistical models, stress measurements, and sample restrictions to test the robustness of stress contagion effects.
The Secretariat of Health of the Municipality of Fusagasugá provides data on commercial establishments inspected for sanitary compliance as of July 31, 2025. The dataset includes the establishment type, inspection date, risk level, and the current sanitary concept (the official inspection outcome) issued. Data is available in CSV, JSON, XML, and RDF formats and was last updated on May 18, 2026.
NASA/NOAA's Suomi NPP VIIRS Burned Area product provides monthly, global 500-meter gridded data on fire-affected land. The hybrid algorithm uses 750 m VIIRS imagery and active fire observations with a burn-sensitive Vegetation Index to detect changes and assign a burn date to the nearest day. This dataset is designed to continue the MODIS burned area record, providing layers for Burn Date, Burn Date Uncertainty, Quality Assurance, and detection period.
Global land surface phenology metrics are provided at yearly intervals and a 0.05-degree (~5,600 meter) spatial resolution. The dataset contains 19 science layers derived from VIIRS satellite data, including six phenological transition dates, growing season length, and confidence metrics for up to two growing cycles per year. It is designed for comparison with the MCD12C2 product, with known issues documented on a dedicated NASA assessment website.
The Government of Canada AI Register collects information about AI systems used within the federal public service. This minimum viable product version was assembled from existing sources including Algorithmic Impact Assessments and Access to Information requests. It was published by the Treasury Board of Canada Secretariat and last updated on 2026-04-28.
Newcastle City Council publishes monthly lists of all payments exceeding £250, detailing recipients, amounts, and purposes to promote spending transparency. The dataset excludes staff salaries, housing benefits, and sensitive personal information, with some supplier names redacted for data protection. Records from before October 2012 only include payments over £500, indicating a historical policy change.
CORINE Land Cover 5 ha (CLC5) 2012 is a vector dataset describing the landscape of Germany. It combines land cover and land use information from detailed models, generalized to a minimum mapping unit of 5 hectares. The data is provided by the Bundesamt für Kartographie und Geodäsie under the Data licence Germany – attribution – Version 2.0.
Sean O’Hagan's research outputs from a 2026 paper proposing a self-aware framework to accelerate Approximate Bayesian Computation (ABC) rejection sampling. The dataset includes code and supplementary materials, and likely contains simulation results from applications in epidemiological modeling and alloy material discovery. The 4.6 MB collection is licensed under CC-BY-4.0.
CORINE Land Cover 5 ha (CLC5) 2012 provides a generalized grid-based description of the German landscape. The Bundesamt für Kartographie und Geodäsie produced it by deriving unique CLC classes from detailed 1-ha land cover and land use models, then generalizing to a 5-hectare minimum mapping unit. It integrates aspects of land cover, land use, soil sealing, and vegetation content under a standardized European nomenclature.
CORINE Land Cover 5 ha (CLC5) 2015 presents a landscape description in grid format under the CLC nomenclature, reflecting land cover and use. The dataset was produced by the Bundesamt für Kartographie und Geodäsie, generalizing detailed land cover and land use models to a 5-hectare minimum mapping unit. It is derived from intersecting land cover models for 2015 and 2018 to identify changes and recalculate the 2015 status.
A 2026 replication package by Zijie Huang for research on automated code smell detection. It includes the MLCQ benchmark with 14,739 annotations from 522 repositories, results from 40 developer interviews, and evaluation data from 1,840 developer assessments. The package contains datasets, source code for a Java subsystem, and Python scripts for reproducing machine learning experiments.
Experimental data for four iron-based ionic liquids under Martian surface conditions. Thermal characterization includes differential scanning calorimetry and thermogravimetric analysis under neat, CO₂-saturated, and nanoconfined conditions. Spectroscopic data include ATR-FTIR and confocal Raman spectra, with companion code for analyzing Mars atmospheric warming events and screening SuperCam Raman spectra from the Perseverance rover.
Characterization data for the population receiving funeral aid after being declared victims of armed conflict in Colombia. The dataset includes family nuclei, age, gender, life course, ethnicity, and types of victimization from January 2017 to March 2026. It is hosted by the Colombian open data portal www.datos.gov.co and was last updated on May 18, 2026.