General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
141,962 datasets
The 2025 edition of the Guía Peñín provides the source for this analysis of visual communication in Spanish wine. Fernando Suárez-Carballo conducted a content analysis on 63 labels from the 100 highest-rated wines, examining plastic, iconic, and linguistic signs. Results show a high degree of visual similarity and associations between Denomination of Origin and label components.
Weiqi Zeng's study presents a machine learning model for classifying drug craving levels among 629 abstainers recruited from Compulsory Isolation Drug Rehabilitation Centers. The dataset likely contains 18 demographic and behavioral features used to train and evaluate seven algorithms, with Logistic Regression selected as the optimal model. The work was published on figshare in May 2026 under a CC-BY-4.0 license.
A cohort of 101 people living with HIV from Guangzhou Eighth People's Hospital, enrolled in late 2023, was studied to identify baseline metabolic predictors of immunological non-response after antiretroviral therapy. The dataset includes quantified plasma levels of 189 metabolites and 92 cytokines, with six metabolites showing significant differential abundance between immunological responders and non-responders. The research was authored by Heping Zhao and shared under a CC-BY-4.0 license.
196 patient records with clinical and optical coherence tomography features for predicting poor functional visual outcomes after anti-VEGF treatment for macular edema secondary to retinal vein occlusion. The dataset was created by Haiyue Yu and published on figshare in May 2026. It includes a temporal validation split with training data from 2021-2024 and an independent test set from 2025.
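A temporal validation split of the kind described above can be sketched as follows. This is a minimal illustration, not the author's code: the `year` field, the `patient_id` field, and the toy records are all hypothetical, since the description does not give the dataset's column names.

```python
def temporal_split(records, year_key="year",
                   train_years=range(2021, 2025), test_years=(2025,)):
    """Split records by calendar year so the test set is later than all training data.

    The year ranges mirror the description (training 2021-2024, test 2025);
    the field name `year_key` is an assumption made for illustration.
    """
    train = [r for r in records if r[year_key] in train_years]
    test = [r for r in records if r[year_key] in test_years]
    return train, test


# Hypothetical toy records, not taken from the dataset.
records = [
    {"patient_id": 1, "year": 2021},
    {"patient_id": 2, "year": 2023},
    {"patient_id": 3, "year": 2024},
    {"patient_id": 4, "year": 2025},
]

train, test = temporal_split(records)
```

Splitting on time rather than at random keeps later patients out of training, so the 2025 set behaves like genuinely unseen data.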
Jinghan Liu's dataset contains records from 276 Crohn's disease patients aged 18–45, prospectively recruited from a tertiary center in China. The data was used to construct and validate interpretable machine learning models for predicting fertility intentions. The dataset was last updated on May 22, 2026.
276 prospectively recruited patients with Crohn's disease aged 18–45 years from a tertiary center in China. Jinghan Liu published this dataset on figshare in 2026, containing variables used to train and validate interpretable machine learning models predicting fertility intentions. SHapley Additive exPlanations analysis identified marital status, desired number of children, and perceived family support as the most influential predictors.
276 patient records from a prospective study of reproductive-age Crohn's disease patients at a tertiary center in China. The dataset includes demographic, clinical, and psychosocial variables used to train and validate machine learning models predicting fertility intentions. It was created by Jinghan Liu and last updated in May 2026.
Table 4_Development of machine learning-based predictive models for fertility intentions in patients with Crohn's disease.xlsx contains data from a study of 276 reproductive-age Crohn's disease patients recruited from a tertiary center in China. The dataset, authored by Jinghan Liu and last updated in May 2026, was used to train and validate interpretable machine learning models predicting fertility intentions based on demographic, clinical, and psychosocial variables.
101 plasma metabolite and cytokine profiles from people living with HIV, collected at Guangzhou Eighth People’s Hospital in late 2023. The data includes 189 quantified metabolites and 92 cytokines, used to identify six metabolites associated with immunological non-response. Authored by Heping Zhao and published under CC-BY-4.0 in May 2026.
A spatial dataset from the Government of British Columbia, last updated on 2026-06-03, representing designated Timber Supply Areas (TSAs) and TSA Supply Blocks. TSAs are the primary unit for determining allowable annual cuts (AAC) for forestry, based on patterns of wood flow to primary industries. The data is intended to support integrated resource management and improve AAC calculations.
100,988 experimental data points underpin viscosity models for 952 pure substances, correlating shear viscosity with residual entropy. Timo Klenk published this dataset on figshare in May 2026, proposing a revised ansatz function and a conservative outlier removal approach. The models achieve an average mean absolute relative deviation of 3.1%.
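For readers unfamiliar with the metric, a mean absolute relative deviation is commonly computed as sketched below. This assumes the usual definition (mean of |predicted - measured| / |measured|, expressed in percent); the exact formula and averaging scheme used in the dataset are not stated in the description, and the numbers are made up.

```python
def mean_absolute_relative_deviation(predicted, measured):
    """Return the mean absolute relative deviation in percent.

    Assumes the common definition; the dataset's own definition may differ.
    """
    if len(predicted) != len(measured) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


# Made-up viscosity values for illustration only.
measured = [1.0, 2.0, 4.0]
predicted = [1.1, 1.9, 4.0]
mard = mean_absolute_relative_deviation(predicted, measured)
```

An "average" of this quantity, as reported above, would then typically be the mean of the per-substance values across all substances, which is again an assumption here.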
Differential code biases (DCBs) are systematic errors between GNSS code observations, and correcting for them is required for precise positioning and ionospheric analysis. NASA's Crustal Dynamics Data Information System (CDDIS) archives this multi-constellation product, which includes data from GPS, GLONASS, Galileo, BeiDou, QZSS, IRNSS, and SBAS since 2011. The dataset supports navigation, time transfer, and scientific studies of the Earth's ionosphere and crust.
Esquema de publicación UARIV is a structured information registry from Colombia's Unit for the Comprehensive Care and Reparation of Victims (UARIV). It details the agency's public information assets, including their update frequency, responsible parties, and formats. The schema is updated every six months, with historical versions available from 2019 to 2022.
A 2026 dataset by Bing Ma integrates 1,822 in-house failed synthesis records and 2,603 successful literature cases for covalent organic frameworks. It supports a random forest model achieving 91% solvent prediction accuracy and high R² scores for temperature and time prediction. The dataset underpins the ML-COF toolkit for predicting synthesis conditions.
A dataset of 4,425 synthesis records for covalent organic frameworks (COFs), integrating 1,822 in-house failed attempts and 2,603 successful cases from the literature. The data was used to train a random forest model achieving 91% accuracy in solvent prediction and high R² scores for reaction temperature and time. The dataset was created by Bing Ma and last updated on 2026-05-21.
A 2026 study by Lovro Sinkovič analyzed 243 onion genotypes from the Slovenian Plant Gene Bank. The dataset combines cytoplasmic markers, nuclear SSR and ILP markers, and phenotypic descriptors to evaluate the relationship between genetic structure and traits like bulb size. Findings indicate significant population structure but limited predictive power of genetics for multi-trait performance.
October-November 2018 data from a voyage investigating biogeochemical variability in the Polar Front meander south of Tasmania. Chief Scientists Helen Phillips and Nathan Bindoff led the expedition, with data collected by Nic Pittman and Clara Vives. Data include CTD nutrients, chlorophyll, oxygen, underway phytoplankton physiology, and pCO2.
Six image datasets derived from laboratory flotation videos, designed to study the impact of data leakage on deep learning model reliability. The datasets were created by Chunlong Zhang and published on figshare in May 2026. They were used to train and test five deep learning models, including Inception-V3 and Vision Transformer, for flotation concentrate ash recognition.
163 newly diagnosed metastatic breast cancer patients form the basis of this retrospective study. Wen-xiong Nong authored this research, which evaluates 17 nutrition- and inflammation-related indices derived from routine blood tests for their prognostic value. The dataset was last updated on 2026-05-21.
A dataset supporting a machine learning model for predicting chronic kidney disease progression in elderly adults with hyperglycemia. The study followed TRIPOD+AI guidelines, using data from four community sites for training and validation. The XGBoost model achieved AUCs of 0.905, 0.809, and 0.837 on the training, internal test, and external validation sets, respectively.