General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
141,870 datasets
Experimental results from 2026 evaluating an integrated Intent-Based Networking security system. The dataset, authored by Kumar Sekhar Roy and shared under CC-BY-4.0, contains metrics for cryptographic correctness, access control performance, anomaly detection capability, and scalability under increasing workloads. It is 5.5 KB in size.
5.5 KB of experimental results evaluating an integrated Intent-Based Networking security system. The data, authored by Kumar Sekhar Roy and last updated in May 2026, likely contains performance metrics from a controlled prototype testbed. It assesses cryptographic correctness, access control performance, anomaly detection capability, and scalability under increasing workloads.
2,604 elderly patient records from Sheba Medical Center in Israel, collected between 2017 and 2023 and used to develop and compare machine learning models for predicting one-year mortality. The dataset includes clinical, demographic, perioperative, and laboratory variables. Adi Shuchami published it on figshare in 2026.
A retrospective cohort study of 2,604 elderly patients undergoing urgent hip fracture surgery at Sheba Medical Center, Israel, between January 2017 and November 2023. Adi Shuchami published the dataset on figshare in 2026. It compares manual and automated machine learning models for predicting one-year all-cause mortality.
2,102 patient records from a retrospective study of Chronic Obstructive Pulmonary Disease (COPD) patients admitted between 1 January 2019 and 31 December 2024. The dataset was used to develop and compare machine learning models for predicting the risk of acute exacerbations, with the XGBoost model achieving an AUC of 0.960 in training and 0.824 in testing. It was authored by Dapeng Kuang and shared under a CC-BY-4.0 license.
A retrospective study of 2,102 COPD patients admitted between 1 January 2019 and 31 December 2024. The dataset was used to develop and compare machine learning models for predicting acute exacerbation risk, with the XGBoost model achieving an AUC of 0.824 on a test set. The data was contributed by Dapeng Kuang and is shared under a CC-BY-4.0 license.
A virtual library of 14,300 derivatives based on the 8-nitro-2,6-dihydrotetrazolo[1,5-c]pyrimidin-5(3H)-one scaffold for energetic materials discovery. Predictive models for ten physicochemical properties were developed using Extreme Gradient Boosting (XGBoost) on density functional theory (DFT) data from 554 representative samples. The dataset was created by Jing Yang and last updated on 2026-05-26.
A dataset of 1,968 singleton pregnancies without diabetes, used to develop machine learning models for predicting large for gestational age (LGA). The data includes lipid profiles measured at early (11–14 weeks) and mid-pregnancy (20–24 weeks). The dataset was created by Wanqing Liu and is available under a CC-BY-4.0 license.
A regional mangrove extent map covering Myanmar, Thailand, and Cambodia, derived from Landsat 1-2 MSS imagery. It provides a 15,420.51 km² baseline at 30-meter nominal scale, generated with a Random Forest algorithm, for analyzing long-term changes in mangrove distribution. The dataset addresses a historical data gap for the 1970s, prior to widespread Earth Observation data availability.
Land use data for New South Wales, Australia, collected between June 2000 and June 2007. The dataset classifies land according to three separate schemes: the NSW Land Use Mapping Program (LUMAP), the NSW SCALD classification, and the Australian Land Use and Management (ALUM) classification. It was produced by the NSW Department of Climate Change, Energy, the Environment and Water and updated in May 2011.
NASA's Crustal Dynamics Data Information System provides daily 24-hour files of ground-based GNSS observation data sampled at 30-second intervals. The archive contains data from a global permanent network of receivers, primarily for GPS and GLONASS, with multi-GNSS data from Galileo, Beidou, and others added since 2011. Files are stored in compact RINEX format, one per site per day.
NASA's Crustal Dynamics Data Information System archives daily 24-hour files of ground-based Global Navigation Satellite System observation data with a 30-second sampling rate. The dataset contains data primarily from GPS and GLONASS, and since 2011 includes data from Galileo, Beidou, QZSS, IRNSS, and SBASs. Files are stored in RINEX format, one per site, from a global permanent network of receivers.
Riccardo Boscariol's dataset bundle accompanies the 2026 paper 'Artificial Intuition for Forecasting the Future'. It includes a master per-session spreadsheet with 50 rows and an aggregated trial-level dataset of 19,997 trials. The bundle provides column definitions mapped to the paper and was last updated on 2026-05-24.
A 55.9 KB ZIP archive of cluster models extracted from DFT-optimized periodic structures. The data supports two studies published in 2025 in the Journal of Physical Chemistry C and on ChemRxiv. It contains models of FAU-type zeolites with exchanged metal cations and guest molecules like 5-fluorouracil, carbamazepine, caffeine, mercaptopurine, and nicotine.
East Antarctica subglacial basin data includes filtered bed elevation points and detected fault lines. The dataset contains MATLAB functions for estimating Euler poles and small circle centers, supporting the associated 2026 Nature Geoscience paper. It was authored by Egidio Armadillo and last updated in June 2026.
Data Sheet 1 from a retrospective multi-cohort analysis of 614 perioperative episodes in elderly burn patients. The study used data from the MIMIC-IV and eICU databases to develop machine learning models for predicting inflammatory marker trajectories and optimizing anesthesia dosing, with external validation on 206 independent episodes. The dataset was authored by Xiaohui Yuan and last updated in May 2026.
500 infertility patient records from a single-center retrospective study conducted between 2020 and 2023. The dataset was used to develop and internally validate machine learning models for predicting ovarian hyperstimulation syndrome risk after oocyte retrieval. The author is Xiangqian Meng, and the data was last updated on 2026-05-25.
Xiangqian Meng published a dataset of clinical and laboratory data from 500 infertility patients undergoing controlled ovarian hyperstimulation at Jinxin Xinan Women & Children Hospital between 2020 and 2023. The data was used to develop and internally validate machine learning models for post-retrieval ovarian hyperstimulation syndrome risk stratification. The dataset includes 40 diagnosed OHSS cases and was used to train models achieving an AUC of 0.81.
A 42-year global time series (1979-2020) of terrestrial water storage changes, reconstructed by Siyou Xu to address gaps in GRACE/GRACE-FO satellite data. The dataset was created using a method combining principal component analysis, seasonal-trend decomposition, and a genetic algorithm-optimized neural network with spatial mode correction. It is validated against observed satellite data and independent hydrological models.
A dataset of 1,103 adult patients with end-stage kidney disease who underwent continuous renal replacement therapy (CRRT). It was used to develop and compare multiple machine learning models for predicting intradialytic hypotension (IDH) within 6 hours of CRRT initiation. The dataset was created by Shuang Qiu and last updated in May 2026.