General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
141,962 datasets
A dataset supporting a machine learning model for predicting chronic kidney disease progression in elderly adults with hyperglycemia. The study followed TRIPOD+AI guidelines, using data from four community sites for training and validation. The XGBoost model achieved AUCs of 0.905, 0.809, and 0.837 on training, internal test, and external validation sets.
632,252 expert annotations are associated with seafloor images from 21 Antarctic research campaigns between 1985 and 2019. This dataset is a static snapshot of the Antarctic Seafloor Annotated Imagery Database (AS-AID) images, collated by the Australian Ocean Data Network. All images are also accessible from https://data.imas.utas.edu.au/imagery/IMAS_Antarctic/.
Retrospective data from 559 chronic-phase chronic myeloid leukemia patients treated with flumatinib between 2018 and 2024. The dataset was used to evaluate three comorbidity scoring systems (CCI, ACE-27, CIRS-G) for predicting 12-month major molecular response using machine learning models such as XGBoost. It was authored by Yuanlan Yang and last updated in May 2026.
Li Luo's dataset contains computational results identifying PGD, MAPK14, and KRAS as diagnostic biomarkers for neonatal sepsis. The data was generated from transcriptomic analysis of patient samples (GSE69686, GSE25504) using machine learning and bioinformatics methods. It was last updated on 2026-05-29.
360 fuel moisture samples were collected from lodgepole pine posts between November 2004 and June 2005. The data includes sample wet weight, oven-dry weight, and calculated moisture content to evaluate the effects of chainsaw versus hand saw cutting methods. The dataset was authored by Sally M. Haase and published on figshare.
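The description lists a calculated moisture content but does not state the formula. Wildland fuel moisture is conventionally expressed on an oven-dry weight basis, so the sketch below assumes that convention; the function name `moisture_content_percent` is hypothetical and not part of the dataset.

```python
def moisture_content_percent(wet_weight: float, oven_dry_weight: float) -> float:
    """Moisture content (%) on an oven-dry basis: 100 * (wet - dry) / dry.

    Assumes the dry-weight basis conventionally used for wildland fuels;
    the dataset description does not say which basis was used.
    """
    if oven_dry_weight <= 0:
        raise ValueError("oven-dry weight must be positive")
    if wet_weight < oven_dry_weight:
        raise ValueError("wet weight should not be below oven-dry weight")
    # Water mass lost on drying, relative to the dry mass of the sample.
    return 100.0 * (wet_weight - oven_dry_weight) / oven_dry_weight


print(moisture_content_percent(12.0, 10.0))  # prints 20.0
```

Under this convention, values above 100% are possible for very wet fuels, because the denominator is the dry mass rather than the wet mass.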
Six early risk factors for postpartum depression were identified from a survey of pregnant women in Shandong Province, China, using interpretable machine learning. The dataset likely contains survey responses from participants hospitalized between July 2023 and January 2025, collected by Shusen Lin. The final LightGBM model was validated internally and externally and analyzed using SHAP for global interpretability.
A formative evaluation, spanning 2021 to 2024, of the TB Think Tank's influence on tuberculosis policy in South Africa. The dataset likely contains qualitative data from 14 in-depth interviews with key stakeholders, collected by author Bey-Marrie Schmidt and last updated in May 2026. It supports analysis of the platform's role in knowledge translation and policy advising.
Mzee Kulenga demonstrates the technique of folding spokes to form a conical basket shape and begins a second row of twining. The 64.3 MB WAV file is an ethnographic recording of a craft demonstration in which the palm rope breaks, prompting an explanation about material preparation. Authored by Marie-Annick Moreau, this audio file was last updated on June 3, 2026, and is shared under a CC-BY-NC-SA-4.0 license.
Sub Integrated Care Board Locations (SICBLs) in England as of 1 April 2026, with their names and codes. The dataset includes three text fields: SICBL26CD, SICBL26CDH, and SICBL26NM, with defined field lengths of 9, 3, and 65 characters respectively. It is provided by the Office for National Statistics and was last updated on 15 April 2026.
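A minimal sketch of checking a record against the declared field lengths. It assumes the lengths are maximums (fixed-width codes may in fact be exact lengths), and the function name and sample values are hypothetical placeholders, not real SICBL codes.

```python
from __future__ import annotations

# Field lengths as stated in the dataset description (treated here as maximums).
FIELD_MAX_LENGTHS = {"SICBL26CD": 9, "SICBL26CDH": 3, "SICBL26NM": 65}


def validate_sicbl_record(record: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, max_len in FIELD_MAX_LENGTHS.items():
        value = record.get(field)
        if value is None:
            problems.append(f"missing field {field}")
        elif len(value) > max_len:
            problems.append(f"{field} exceeds {max_len} characters ({len(value)})")
    return problems


# Hypothetical placeholder record, not taken from the dataset.
sample = {"SICBL26CD": "X00000001", "SICBL26CDH": "X01", "SICBL26NM": "Example location"}
print(validate_sicbl_record(sample))  # prints []
```

If the codes are meant to be exactly 9 and 3 characters, the `>` comparison would change to `!=` for those two fields.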
An index of information classified as confidential or reserved by the Colombian Unit for the Comprehensive Care and Reparation of Victims (UARIV). The dataset includes records from 2019 to 2022 and is updated semiannually. It is published on the Socrata platform via the Colombian open data portal.
VIIRS/NPP Albedo Daily L3 Global 500 m SIN Grid NRT (VNP43IA3N) provides daily surface albedo values at a 500-meter resolution. It uses a 16-day rolling window of VIIRS data and the RossThick/Li-Sparse-Reciprocal BRDF model to generate black-sky and white-sky albedo layers. The product includes 9 Science Dataset layers for albedo and quality assessment across VIIRS imagery bands I1, I2, and I3.
Global land surface albedo data is provided at a 1 km spatial resolution on a daily basis. The dataset contains 36 science data layers, including black-sky and white-sky albedo for nine VIIRS moderate bands and three broadbands. It is generated using a 16-day rolling window of VIIRS observations and the RossThick/Li-Sparse-Reciprocal BRDF model to correct for surface anisotropic effects.
Monitoring data on the extent and rate of change of estuarine wetlands in Queensland, Australia, from 2001 onward. More than 96% of the pre-European settlement extent of estuarine wetlands in Queensland remained in 2017. The dataset is provided by the Queensland Department of Environment, Science and Innovation under a CC-BY-4.0 license.
A retrospective cohort of 588 adolescent patients who received psychological and psychiatric assessments. The data includes candidate predictors such as demographic characteristics, psychological and emotional status, behavioral characteristics, and peer support, used to develop a machine learning risk prediction model. The dataset was created by Yujun Zhao and last updated on 2026-05-13.
Luke Vassallo published a research paper on figshare in May 2026 describing methods for spiking neural networks. The 128.6 KB PDF reports experiments on the SHD speech recognition dataset, showing performance improvements from learnable synaptic and axonal delays. The findings are intended to inform the design of power- and area-constrained neuromorphic processors.
Frame-level agreement scores between the BudFinder model and manual annotations for budding events across three yeast genetic backgrounds. The dataset, shared by Phuc Nguyen on figshare in May 2026, contains percentages of predicted events matching ground truth exactly or within tolerance windows of ±1 or ±2 frames. The model captured >78% of events within ±1 frame and >89% within ±2 frames across strains, including an oscillator and a sir2Δ mutant.
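The description does not specify how predicted and annotated events were paired. The sketch below assumes a one-to-one greedy rule (each ground-truth frame takes the closest unused prediction within ±tolerance) and reports the fraction of ground-truth events matched; `fraction_matched` and the example frames are hypothetical, not BudFinder code or data.

```python
from __future__ import annotations


def fraction_matched(predicted: list[int], truth: list[int], tolerance: int) -> float:
    """Fraction of ground-truth event frames matched by a prediction within ±tolerance.

    Each prediction can match at most one ground-truth event (greedy, in
    order of ground-truth frame); this matching rule is an assumption.
    """
    if not truth:
        raise ValueError("need at least one ground-truth event")
    unused = sorted(predicted)
    matched = 0
    for t in sorted(truth):
        # Find the closest still-unused prediction inside the tolerance window.
        best = None
        for i, p in enumerate(unused):
            d = abs(p - t)
            if d <= tolerance and (best is None or d < abs(unused[best] - t)):
                best = i
        if best is not None:
            unused.pop(best)  # consume it so it cannot match another event
            matched += 1
    return matched / len(truth)


truth = [10, 20, 30, 40]       # hypothetical annotated budding frames
predicted = [10, 21, 32, 45]   # hypothetical model predictions
print([fraction_matched(predicted, truth, k) for k in (0, 1, 2)])  # prints [0.25, 0.5, 0.75]
```

Enforcing one-to-one matching keeps a single prediction from being credited for two nearby annotated events, which a simple "any prediction within the window" check would allow.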
Covering May 1 to October 1, 2017, this dataset provides near-daily lake area timeseries for 85,358 lakes across four Arctic-Boreal study areas in Northern Canada and Alaska. The area estimates were derived from high-resolution Planet Labs CubeSat imagery, capturing each lake's mean, minimum, and maximum area and seasonal dynamism. The dataset is produced by the National Aeronautics and Space Administration.
Zircon U-Pb isotopic age data for the Miocene Onnagawa Formation in the Yashima Area, Akita Prefecture, Northeast Japan. The dataset includes concordia diagrams, photographs of zircon grains, and operational analysis details. It was authored by Takeshi Nakajima and last updated on 2026-05-19.
pone.0348670.t025 is a collection of three benchmark clinical datasets used to evaluate a unified deep learning framework for disease prediction. The data includes the UCI Heart Disease dataset (303 samples, 13 features), the PIMA Indians Diabetes dataset (768 samples, 8 features), and a Parkinson's disease voice dataset (195 recordings, 22 features). It was authored by Vijay U. Rathod and last updated on 2026-05-08.
A 5.5 KB file of Wilcoxon signed-rank test results comparing deep learning models on three clinical benchmark datasets. The data, authored by Vijay U. Rathod and last updated on 2026-05-08, supports the evaluation of a unified deep learning framework for disease prediction. Results include AUC scores for models such as FT-Transformer and CNN ensembles on heart disease, diabetes, and Parkinson's disease detection tasks.
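For readers unfamiliar with the test, a minimal sketch of the Wilcoxon signed-rank statistic for paired scores (for example, two models' AUCs on matched folds). It drops zero differences and gives tied absolute differences their average rank; it computes only W+ and W−, not a p-value, for which a library routine such as `scipy.stats.wilcoxon` is the usual choice. The function name and example numbers are hypothetical and not taken from the dataset.

```python
from __future__ import annotations


def wilcoxon_signed_rank_statistic(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (W_plus, W_minus) for paired samples x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    # Paired differences, discarding exact zeros.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


print(wilcoxon_signed_rank_statistic([5, 7, 9, 4], [3, 8, 6, 4]))  # prints (5.0, 1.0)
```

Ties are compared with exact equality here, so with real-valued scores it may be safer to round differences to a fixed precision before ranking.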