General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
141,764 datasets
A June 2024 to June 2025 wetland vegetation map for Narran Lakes produced by the NSW Department of Climate Change, Energy, the Environment and Water. It was created using a machine learning classification workflow incorporating Sentinel-1 radar, Sentinel-2 optical, LiDAR, and terrain data. The product serves as a landscape-scale baseline for environmental water planning and conservation management in the Murray-Darling Basin.
May 2026 digital vector boundaries for Senedd Cymru constituencies in Wales. The dataset is provided by the Office for National Statistics and contains full-resolution boundaries clipped to the coastline.
A best fit lookup table mapping 2021 Middle Layer Super Output Areas (MSOAs) to Counties and Unitary Authorities in England and Wales as of April 2025. The dataset is provided by the Office for National Statistics and was last updated on the platform in May 2026. It contains six text fields for area codes and names in English and Welsh.
England and Wales administrative geography lookup linking civil parishes, electoral wards, and local authorities as of 7th May 2026. The dataset is provided by the Office for National Statistics and was last updated on 13th May 2026. It includes nine text fields containing codes and names for each geographic level.
A high-resolution terrestrial CO₂ record spanning 7.5 to 4.0 million years ago, generated by Hanzhao Zhai in 2026. The dataset was created by applying a machine learning-based paleosol-CO₂ inversion method to deposits from the Jiaxian and Shilou sections on the Chinese Loess Plateau. It integrates multiple proxy measurements including carbonate and organic carbon isotopes, elemental concentrations, grain size, magnetic susceptibility, and formation temperature.
A dataset of 78 obese children aged 6–18 years from a single-center outpatient clinic, used to identify body composition-based obesity phenotypes. The data includes 13 BIA-derived indices analyzed via unsupervised clustering and interpretable machine learning. It was authored by Yuhang Wang and last updated in June 2026.
Names and codes for Senedd Cymru (Welsh Parliament) constituencies in Wales as of 7th May 2026, provided by the Office for National Statistics. The dataset lists three text fields (SENC26CD, SENC26NM, and SENC26NMW) and was last updated on 13th May 2026.
A database of Hidden Markov Models (HMMs) for protein domains, mapped from the TED database to CATH structural classifications and filtered for a maximum pairwise sequence identity of 30%. Claudia Alvarez-Carreño authored this dataset, which was last updated on June 4, 2026. It contains 934,186 HMMs spanning 4,688 CATH superfamilies.
An exact fit lookup file, provided by the Office for National Statistics, between Middle Layer Super Output Areas (MSOAs) from December 2011 and December 2021 and Local Authority Districts from December 2022 in England and Wales. The dataset includes a 'change indicator' field with four categories that describe how MSOA boundaries changed between 2011 and 2021. Version 2 updates the change indicator for splits that went to complexes in under 10 MSOAs.
Summary statistics for three real-world energy datasets used to evaluate a novel forecasting model called DG-LSTM-SA. Guoqiang Sun published these data statistics on figshare in June 2026. The underlying datasets likely contain time-series records of power generation and load demand.
Hyperparameters for ten baseline models evaluated in a study proposing a DG-LSTM-SA network for power generation and load demand forecasting. The dataset, authored by Guoqiang Sun and uploaded to figshare, is a 5.5 KB Excel file last updated on June 3, 2026. The models were tested on three real-world energy datasets: NEPOOL, Yichang, and Solar-Energy.
Hyperparameters for the DG-LSTM-SA model, a Deep Gated Long Short-Term Memory network with Self-Attention designed for power generation and load demand forecasting. The model was evaluated on three real-world energy datasets (NEPOOL, Yichang, and Solar-Energy) and outperformed ten baseline models. The dataset was authored by Guoqiang Sun and last updated on June 3, 2026.
A systematic review and meta-analysis of 13 studies evaluating machine learning models for predicting severe Mycoplasma pneumoniae pneumonia in Chinese children. The dataset, authored by Juan Cao and published on figshare in June 2026, synthesizes model performance metrics and methodological characteristics. Reported model discrimination, measured by the area under the receiver operating characteristic curve, ranged from 0.81 to 0.90.
A systematic review of 13 studies evaluating machine learning models for predicting severe Mycoplasma pneumoniae pneumonia in Chinese children. The review, authored by Juan Cao and published on figshare in June 2026, synthesized performance metrics and methodological characteristics from literature up to November 2025. It reports model AUC values ranging from 0.81 to 0.90.
Economic statistics from the Yukon Bureau of Statistics, part of the Community Statistics collection. The data is provided by the Government of Yukon under the OGL-CA-2.0 license and was last updated on 2026-06-03. It includes key economic indicators for the Yukon territory.
Data from a retrospective multicenter study of 475 patients from two medical centers, allocated to training (332) and external validation (143) cohorts. The dataset was created by Shuai Qie and last updated in June 2026. The study aims to predict response to immunochemotherapy in EGFR-mutant lung adenocarcinoma patients after third-generation TKI resistance.
Model performance results from a study using a CatBoost classifier and Sea Lion Optimization Algorithm to predict gallstone disease from tabular data. The dataset includes accuracy, F1-score, precision, and recall metrics for models using 38 features and a subset of 19 selected features. It was authored by Prosenjit Das and last updated on June 1, 2026.
A publicly available tabular dataset for predicting the presence of gallstones, a condition affecting the gallbladder. The dataset likely contains 38 clinical features, including C-Reactive Protein (CRP) and Vitamin D, which were identified as the most influential for prediction. It was authored by Prosenjit Das and last updated on June 1, 2026.
"Model results using CB classifier" presents performance metrics for machine learning models predicting gallstone disease from tabular data. The dataset includes results from a CatBoost classifier and a Sea Lion Optimization Algorithm (SLOA)-optimized model, reporting mean accuracy, F1-score, precision, and recall from 5-fold cross-validation. It was authored by Prosenjit Das and last updated on 2026-06-01.
A tabular dataset used to study predictive models for gallstone disease presence. The dataset likely contains 38 clinical features, with model evaluation results showing a mean accuracy of 79.58% for a CatBoost classifier. The data was published by Prosenjit Das on figshare in June 2026.