General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
141,959 datasets
Ruolan Xiong developed a machine learning model for storm surge forecasting using data from 42 tropical cyclones between 2000 and 2023. The dataset likely contains model outputs and inputs, including tropical cyclone parameters and observed surge data from nine stations in the Pearl River Estuary. It was last updated on 2026-05-19 and is shared under a CC-BY-4.0 license.
A global experiment dataset supporting research on spatial self-organization of termites and fungi as primary deadwood decomposers. The data, authored by Donghao Wu and last updated in May 2026, includes measurements of decomposition rates, termite and fungal spatial occupancy, and associated environmental variables. It is used to analyze how temperature and anthropogenic pressure influence decomposer clustering and overdispersion, shaping global carbon cycling.
United States wastewater surveillance data provides a complete weekly history of viral activity levels for SARS-CoV-2, Influenza A, and RSV from sampling locations across the country. The dataset is updated weekly on Fridays and includes site-level metrics, population served, and geographic identifiers. Columns suggest it tracks pathogen concentration trends over time for public health monitoring.
Eighteen Science Dataset layers provide Nadir BRDF-Adjusted Reflectance (NBAR) estimates for nine VIIRS moderate bands at 1-kilometer resolution. The product is generated daily using a 16-day rolling window of VIIRS/NPP data and employs the RossThick/Li-Sparse-Reciprocal BRDF model to correct for view angle effects. Researchers can use the provided BRDF parameters and albedo layers to model surface anisotropy and calculate black-sky, white-sky, and instantaneous blue-sky albedo.
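As a rough illustration of the albedo calculation mentioned above, blue-sky albedo is commonly approximated as a linear blend of black-sky and white-sky albedo weighted by the fraction of diffuse skylight. The sketch below assumes that approximation; the function name and the `diffuse_fraction` input are hypothetical and are not part of the product description, and the diffuse fraction would have to come from an external source such as an atmospheric model or a lookup table.

```python
def blue_sky_albedo(black_sky: float, white_sky: float, diffuse_fraction: float) -> float:
    """Approximate blue-sky (actual) albedo as a linear blend.

    black_sky: directional-hemispherical albedo (direct beam only)
    white_sky: bihemispherical albedo (isotropic diffuse light only)
    diffuse_fraction: share of incoming light that is diffuse, in [0, 1]
                      (assumed input, not supplied by this listing)
    """
    if not 0.0 <= diffuse_fraction <= 1.0:
        raise ValueError("diffuse_fraction must be between 0 and 1")
    return (1.0 - diffuse_fraction) * black_sky + diffuse_fraction * white_sky


# Example values are illustrative only, not taken from the dataset.
example = blue_sky_albedo(black_sky=0.20, white_sky=0.30, diffuse_fraction=0.25)
```

With a diffuse fraction of 0 the result reduces to black-sky albedo, and with a fraction of 1 it reduces to white-sky albedo.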
A semiannually updated registry of information assets from the Colombian Unit for the Attention and Comprehensive Reparation of Victims (UARIV). The dataset catalogs records from 2019 to 2022, detailing the format, description, and management of public information assets. It is published via the Socrata platform on the national open data portal.
AMGST, an Adaptive Multi-Graph Convolution and Spatiotemporal Multi-Head Self-Attention Network, demonstrates performance on four public traffic datasets. The 5.5 KB Excel file, authored by Pei Shi and last updated in June 2026, compares results under different K values. It includes speed and flow measurements from traffic datasets.
Four public traffic datasets used to benchmark AMGST, a novel traffic forecasting model. The datasets contain speed and flow measurements, as described in the research paper by Pei Shi. The data was last updated on 2026-06-04.
Experimental results from the AMGST model for traffic forecasting, shared by author Pei Shi on figshare. The dataset, last updated on 2026-06-04, contains results from tests on four public traffic datasets measuring speed and flow. The file is 9.5 KB in size and is available in XLS format.
5.5 KB of traffic data in XLS format, published by Pei Shi on figshare in June 2026. The dataset contains speed and flow measurements from four public traffic datasets and was used to evaluate the AMGST forecasting model.
5.5 KB of data in XLS format, uploaded by Pei Shi on figshare in June 2026. The dataset contains computation time metrics for the PEMS08 traffic dataset, used in experiments evaluating the AMGST traffic forecasting model. The model integrates adaptive multi-graph convolution and spatiotemporal attention to predict traffic speed and flow.
Spatially-contiguous global mean daily solar-induced chlorophyll fluorescence (SIF) estimates are provided at 0.05-degree (approximately 5 km) spatial and 16-day temporal resolution. The dataset was produced by NASA using an artificial neural network trained on OCO-2 SIF observations and MODIS reflectance data, covering the period from September 2014 through July 2020. This high-resolution product enhances the synergy between satellite SIF measurements and ground-based photosynthesis studies.
Fifty spectral bands of calibrated radiance data were captured by the MODIS/ASTER Airborne Simulator during 12 NASA ER-2 flights over California and Nevada in spring 2023. This dataset provides Level 1B georeferenced imagery and derived Level 2 products including land surface temperature and emissivity. It serves as a benchmark for observing ecosystem states and natural disaster impacts.
A list of TLC-authorized for-hire vehicles in New York City that are currently inactive. The dataset is updated daily between 4 and 7 PM and provides a snapshot of vehicles with suspended, expired, or otherwise inactive medallion licenses. It includes details on suspension reasons, expiration dates, and vehicle identification for regulatory tracking.
Creatine kinase (CK) concentrations from 52 soccer players were collected 24 to 48 hours after matches across six full seasons of the Brazilian championship. Alessandro Haupenthal published this dataset on figshare to investigate individualized threshold values for CK monitoring. The data includes CK concentrations from 20 matches and elements from the 5th matches' box plots used to determine individual reference thresholds.
11.8 MB of simulation results and analysis supporting a method for fitting Sparse Markov Models to categorical time series. The dataset, authored by Tuhin Majumder and last updated in May 2026, includes PDF, ZIP, and TXT files detailing an approach using convex clustering and regularization. It contains extensive simulation studies under different set-ups and a real data analysis on disease sub-type modeling and classification.
UK land areas at risk of flooding from rivers and the sea under a 3.3% annual exceedance probability scenario, assuming flood defences function as designed. The dataset is produced by the Environment Agency and was last updated in May 2026. It is intended for area-level risk indication, not for assessing risk to individual properties.
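To make the 3.3% annual exceedance probability (AEP) figure above more concrete, the arithmetic can be sketched as follows. This is a minimal illustration, not the Environment Agency's method: the function names are hypothetical, and the multi-year calculation assumes each year is independent with the same probability, which is a simplification.

```python
def return_period_years(aep: float) -> float:
    """Average recurrence interval implied by an annual exceedance probability."""
    return 1.0 / aep


def prob_at_least_one_exceedance(aep: float, years: int) -> float:
    """Chance of at least one exceedance over a span of years.

    Assumes independent years with a constant AEP (illustrative simplification).
    """
    return 1.0 - (1.0 - aep) ** years


aep = 0.033  # the 3.3% scenario named in the description
period = return_period_years(aep)                      # roughly 30 years
chance_30y = prob_at_least_one_exceedance(aep, 30)     # a little under two in three
```

Under these assumptions a 3.3% AEP corresponds to roughly a 1-in-30-year event, yet the chance of seeing at least one such event in any 30-year span is well below 100%.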
7.3 MB of supplemental materials for the Nether platform, a hybrid educational tool integrating a physical interactive book, a gamified mobile app, and Augmented Reality resources for Environmental Education. The repository includes a digital prototype of the educational material 'Entre Faíscas e Cinzas', an application script, an anonymized consent form, and app interface screens. These artifacts were authored by Giovanna Calado da Cruz Bonilha and last updated on 2026-05-31.
A one-year study of 24 emerging adults (ages 18–21) collected daily ecological momentary assessments and continuous smartphone sensor data. The dataset, authored by Coralie S. Phanord and shared under CC-BY-4.0, compares idiographic and nomothetic XGBoost models for predicting daily affect and stress from behavioral features like sleep, activity, mobility, and phone use.
A document outlining the six foundational principles guiding Geoscience Australia's scientific activities. The principles are Relevance to Government, Collaborative science, Quality science, Transparent science, Communicated science, and Sustained science capability. The document was published by the Australian Ocean Data Network and last updated in June 2026.
The 2025 edition of the Guía Peñín provides the source for this analysis of visual communication in Spanish wine. Fernando Suárez-Carballo conducted a content analysis on 63 labels from the 100 highest-rated wines, examining plastic, iconic, and linguistic signs. Results show a high degree of visual similarity and associations between Denomination of Origin and label components.