General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
150,784 datasets
Contracts Finder Notices 01 2022 contains public procurement notices from the UK government's official Contracts Finder portal for January 2022. The data is structured according to the Open Contracting Data Standard (OCDS) and is provided as daily flattened CSV files. This standardized format facilitates analysis of government spending and supplier activity for that month.
Replication data and code for a study analyzing the impact of natural disasters on corporate performance in China. The dataset, approximately 864 MB in size, likely contains firm-level financial and operational metrics linked to disaster events. It supports research into how environmental shocks affect business outcomes such as profitability and innovation.
As of 30 June 2019, almost 10 times as many light commercial vehicles as heavy freight vehicles were on the road in Queensland. The number of registered light commercial vehicles had more than doubled since 30 June 2001, while the number of heavy freight vehicles increased by 49% over the same period. This dataset is provided by the Queensland Department of Environment, Tourism, Science and Innovation.
Supplementary material 4 from a study on decadal seafloor geodesy along the Nankai Trough. The dataset contains the average of standard deviations for coefficients used in estimating slip deficit rates for two directions, labeled "02" and "03". It was authored by Yusuke Yokota and is shared under a CC-BY-4.0 license.
57% of 112 surveyed German healthcare professionals treating cardiology patients reported using telemedicine. This dataset, created by Pascal Petit and last updated in April 2026, contains predictors of telemedicine use identified via Bayesian Model Averaging and an XGBoost model that achieved an AUROC of 0.88. It likely includes variables related to professional role, knowledge, attitudes, and demographics.
112 healthcare professionals from a German cross-sectional survey provide data on telemedicine use determinants. The dataset contains the performance metrics and predictor importance results from a final XGBoost model developed by Pascal Petit, last updated in April 2026. The model achieved an AUROC of 0.88 and 79% accuracy in predicting telemedicine adoption.
57% of 112 surveyed German healthcare professionals reported using telemedicine. This 5.5 KB Excel file contains the performance metrics and predictor importance analysis from an XGBoost model predicting telemedicine adoption, authored by Pascal Petit and last updated in April 2026. The model achieved an AUROC of 0.88 and 79% accuracy using nested cross-validation.
Property tax records for Guadalajara de Buga, Colombia, covering more than six decades starting in 1960. The dataset includes detailed information on land and property taxes paid by owners. It is hosted by datos.gov.co and was last updated in May 2026.
A 5.5 KB Excel dataset presents results from a unified framework for evaluating machine learning-based Intrusion Detection Systems (IDS). The framework harmonizes features from the NSL-KDD and CICIDS2017 datasets and benchmarks models including Random Forest, which achieved 98.0% accuracy and 97.0% F1-score. Authored by Shailendra Mishra and last updated on April 20, 2026, this work focuses on reproducibility and statistical validation in cybersecurity research.
Shailendra Mishra's evaluation metrics reporting summary, published on figshare in April 2026. The 5.5 KB XLS file contains results from a unified framework for evaluating Intrusion Detection Systems (IDS). The framework harmonized features from the NSL-KDD and CICIDS2017 datasets and benchmarked supervised, unsupervised, deep learning, and ensemble models.
5.5 KB of statistical test results from a framework evaluating machine learning models for network intrusion detection. The dataset, authored by Shailendra Mishra and last updated in April 2026, contains results from Wilcoxon signed-rank, McNemar’s, and DeLong tests applied to models like Random Forest on harmonized NSL-KDD and CICIDS2017 datasets.
Shailendra Mishra's framework harmonizes features from the NSL-KDD and CICIDS2017 network intrusion datasets for evaluating machine learning models. The dataset, last updated in April 2026, is a 5.5 KB Excel file containing the harmonized data used in the study. Experimental results from the framework demonstrated a Random Forest model achieving 98.0% accuracy and 97.0% F1-score on this data.
A 5.5 KB dataset from figshare, last updated on 2026-04-20, containing results from an ablation study on machine learning models for intrusion detection. The work by Shailendra Mishra proposes a unified framework, harmonizing the NSL-KDD and CICIDS2017 datasets and benchmarking models including Random Forest, which achieved 98.0% accuracy and 97.0% F1-score.
A 5.5 KB Excel file containing harmonized features from two network intrusion datasets, NSL-KDD and CICIDS2017, for evaluating machine learning models. The dataset was created by Shailendra Mishra and last updated on April 20, 2026. It supports a framework for reproducible and statistically validated benchmarking of Intrusion Detection Systems.
Cross-validation results from a framework evaluating machine learning models for network intrusion detection. The dataset contains performance metrics from models like Random Forest, which achieved 98.0% accuracy and 97.0% F1-score on harmonized data. The work by Shailendra Mishra was last updated in April 2026.
A 5.5 KB Excel dataset created by Shailendra Mishra and last updated on April 20, 2026. It contains harmonized features from the NSL-KDD and CICIDS2017 network intrusion datasets, processed through a unified framework for evaluating machine learning-based Intrusion Detection Systems (IDS). The work includes results from benchmarking supervised, unsupervised, deep learning, and ensemble models.
Shailendra Mishra's framework evaluates Intrusion Detection Systems (IDS) using harmonized features from the NSL-KDD and CICIDS2017 datasets. The work benchmarks supervised, unsupervised, deep learning, and ensemble models, reporting a Random Forest model achieving 98.0% accuracy and 97.0% F1-score on the harmonized data. The dataset, last updated in April 2026, is a 5.5 KB Excel file detailing experimental results and trade-offs.
The Municipal Performance Measurement dataset (MDM, from the Spanish Medición de Desempeño Municipal) measures and compares municipal management and development outcomes across the Department of Magdalena, Colombia. It includes scores for education, health, security, services, and governance, adjusted for initial municipal capacities. The data is hosted by datos.gov.co and was last updated on 2026-05-18.
Geospatial data identifies Food Production Protection Zones in the Sabana Centro province of Cundinamarca, Colombia. The dataset covers 11 municipalities prioritized by the Ministry of Agriculture and Rural Development. It includes columns for municipality, geometry, department, area in hectares, and administrative codes.
Global data tracks the percentage of children born in the last 24 months who were put to the breast within one hour of birth. The dataset is sourced from UNICEF Data and Analytics and is available in CSV and XML formats. It provides a standardized metric for monitoring progress on a key infant and young child feeding practice.