General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
141,764 datasets
A publicly available tabular dataset for predicting the presence of gallstones, a condition affecting the gallbladder. The dataset likely contains 38 clinical features, including C-Reactive Protein (CRP) and Vitamin D, which were identified as the most influential for prediction. It was authored by Prosenjit Das and last updated on June 1, 2026.
Geoscience Australia and CSIRO Marine & Atmospheric Research collected three years of continuous methane and carbon dioxide measurements at the 'Arcturus' monitoring station in the Bowen Basin, Australia. The dataset underpins a simulation study on the sensitivity of atmospheric techniques for detecting fugitive emissions from a simulated new coal seam gas field. Results were presented at the American Geophysical Union meeting in December 2013.
Experimental results for State of Charge (SOC) estimation errors in lithium-ion batteries under different temperature conditions. The 5.5 KB XLS file was authored by Huipin Lin and last updated on June 4, 2026. It contains results from experiments using the University of Maryland’s Dynamic Stress Test (DST), US06 test, and Federal Urban Driving Scheme (FUDS) datasets.
Huipin Lin published experimental results on June 4, 2026, comparing a hybrid time-series convolutional network with Unscented Kalman Filter against other deep learning models for battery state-of-charge estimation. The dataset likely contains error metrics from tests using the University of Maryland's Dynamic Stress Test (DST), US06, and Federal Urban Driving Scheme (FUDS) datasets. The proposed method achieved mean absolute error values between 1.015% and 1.470% under different driving conditions.
Yu He's dataset supports a comparative study of baseline 18F-FDG PET/CT habitat radiomics versus dual-channel deep learning for predicting early metabolic response in diffuse large B-cell lymphoma. It contains data from 148 patients (101 EMR, 47 non-EMR) retrospectively enrolled between December 2018 and August 2024. The dataset was last updated on June 4, 2026.
268 survey responses evaluate Continuing Medical Education programs from 2021–2022. The data includes participant demographics, overall satisfaction scores, and criterion-based evaluations across course phases. Hela Ghali published this dataset on figshare under a CC-BY-4.0 license.
MOP03N_109 provides daily mean-gridded carbon monoxide (CO) profile and total column retrievals from near-infrared radiances measured by the MOPITT instrument aboard NASA's Terra satellite. This is a non-validated beta product subject to recalibration, containing gridded averaging kernels alongside the retrievals. Data collection is ongoing, originating from an instrument launched in 1999 and funded by the Canadian Space Agency.
80 machine learning models were benchmarked for cardiovascular risk prediction using the Kaggle Cardiovascular Disease dataset. The results, ordered by AUC-ROC, compare four base architectures across 20 distinct methodological experiments. The dataset was published by 'Predicción Cardiovascular' on figshare in June 2026.
51 chrysoprase gemstone samples and 676 synthetic green reference points were measured for CIE L*a*b* color values using an X-Rite SP62 spectrophotometer. Author Yuansheng Jiang published this dataset on figshare under a CC-BY-4.0 license, last updated on 2026-05-15. The data was used to train and validate machine learning models, including logistic regression and neural networks, for objective gemstone color grading.
An AI-driven fire risk forecasting framework based on real incident data from 55 urban villages in Beijing. The proposed IGWO-LSTM-IL model achieved a 92.57% reduction in mean squared error compared to a baseline LSTM. The dataset, shared by Jiangxue Tian, was last updated in June 2026.
Quarterly updated postal code geolocation data from the Quebec Address Repository allows automated matching of postal codes to various administrative divisions. The Institut de la Statistique du Québec developed this table, which is based on data from the Ministry of Natural Resources and Forests. It facilitates geocoding thousands of observations by linking postal codes to municipalities, regional county municipalities, and electoral districts.
Approximately 300 water samples from July 2016-2019 in Alaska's Yukon-Kuskokwim Delta were used to train gradient boosting models predicting dissolved methane and carbon dioxide concentrations for ~17,000 waterbodies. The models integrate Sentinel-2, Sentinel-1, DEM, and landcover data within a Google Earth Engine and SAGA workflow to estimate diffusive fluxes. Outputs include watershed landcover data and predicted fluxes in GeoTIFF and shapefile formats.
This dataset from the SWOT mission, launched on December 16, 2022, provides interim geophysical measurements of sea surface height, significant wave height, and wind speed from the Poseidon-3C nadir altimeter. It consists of discrete measurements along the satellite's nadir track with sampling resolutions of approximately 6 km at 1 Hz and 300 m at 20 Hz, processed using preliminary orbit and auxiliary data. It is distributed as one netCDF-4 file per half-orbit with a nominal latency of less than 1.5 days.
41.5 MB of COMSOL Multiphysics model files supporting a 2026 paper on fracture-matrix modeling. The dataset includes models for sample S2S4 under confining stress in two directions and three validation files. It was created by Anne Sofie Darket and published on figshare under a CC-BY-4.0 license.
A Chinese study from the Second Affiliated Hospital of Zhejiang University retrospectively reviewed 189 patients who underwent elective lung transplantation between May 2023 and November 2024. The research developed a machine learning-based prediction model for venous thromboembolism risk in lung transplant recipients supported by ECMO. The Random Forest model demonstrated an AUC of 0.895 on the validation set.
Yan Zhu's study developed a machine learning model to predict venous thromboembolism (VTE) in lung transplant recipients supported by ECMO. The work is based on a retrospective review of 189 patients who underwent elective lung transplantation at the Second Affiliated Hospital of Zhejiang University from May 2023 to November 2024. The Random Forest model demonstrated an AUC of 0.895 on the validation set.
752 patients undergoing oral cancer free flap reconstruction between 2017 and 2025 were analyzed to predict enteral feeding intolerance (FI). The dataset includes 35 perioperative variables; a random forest model achieved an AUC of 0.889 in predicting FI, which occurred in 36.04% of patients. This interpretable model was developed by Baolin Jia and shared under a CC-BY-4.0 license.
A single-center retrospective study of 752 patients undergoing radical resection with free flap reconstruction for oral cancer between 2017 and 2025. The dataset, created by Baolin Jia, includes 35 perioperative variables used to predict feeding intolerance, which occurred in 36.04% of patients.
Jamie Davis published three pre-rendered time-series reference datasets for embedded firmware engineers on figshare in June 2026. The data is stored in a uniform signed Q16.16 fixed-point format for direct microcontroller integration. It includes benchmarks for BLDC motor step response, sensor fault transients, and thermal winding stress.
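For readers unfamiliar with signed Q16.16, the following is a minimal sketch of how such values are typically decoded on a host machine. It assumes the conventional interpretation (a 32-bit two's-complement integer with 16 fractional bits); the listing does not describe the dataset's file layout, so the function names here are hypothetical and not taken from the dataset.

```python
# Conventional signed Q16.16: 32-bit two's-complement integer, 16 fractional bits.
# Assumption: the dataset follows this convention; its file layout is not described.
Q_FRACTIONAL_BITS = 16
Q_SCALE = 1 << Q_FRACTIONAL_BITS  # 65536


def q16_16_to_float(raw: int) -> float:
    """Interpret the low 32 bits of `raw` as signed Q16.16 and return a float."""
    raw &= 0xFFFFFFFF              # keep only 32 bits
    if raw & 0x80000000:           # sign bit set: value is negative
        raw -= 1 << 32
    return raw / Q_SCALE


def float_to_q16_16(value: float) -> int:
    """Convert a float to a signed Q16.16 integer, saturating at the 32-bit range."""
    scaled = round(value * Q_SCALE)
    return max(-(1 << 31), min((1 << 31) - 1, scaled))
```

The representable range under this convention is roughly -32768 to +32767.99998, with a step of 1/65536; values outside that range are clamped by `float_to_q16_16` in this sketch.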
A lookup table linking Lower Layer Super Output Areas (LSOA) to multiple English health and administrative geographies as of 1 April 2026. The dataset contains 12 text fields mapping LSOA codes and names to Sub Integrated Care Board Locations (SICBL), Integrated Care Boards (ICB), Cancer Alliances (CAL), and local authority districts (LAD). It was published by the Office for National Statistics and last updated on 13 May 2026.