DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Machine Learning Datasets | DataSalon

Machine Learning

General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites

165,283 datasets

Pre-Competition Strength and Performance Data for Chinese Sprint Canoe/Kayak Athletes

S1 File contains data for the study 'Association between pre-competition strength and sprint canoe/kayak performance: A mixed-effects analysis of professional Chinese athletes'. The dataset is 33.3 KB in size, stored as an XLSX file, and was authored by Zongwei Chen. It was last updated on 2026-05-28.

Tabular, Excel, Canoe Kayak, Strength Testing, Chinese Athletes, Mixed Effects Analysis, Sports Performance, +1

DOF: Property Charges Balance for New York City

Property-related charge information by period, sourced from data.cityofnewyork.us. The dataset includes columns such as TAXYEAR, SUM_BAL, VALCLASS, and PARID, but a technical issue means 2023 data is missing and cannot be recreated from this snapshot. It was last updated on 2026-05-21.

Tabular, CSV, XML, JSON, Accounting, Charge Summary, Municipal Finance, Open Balance, Real Estate, Property Tax, Charge Balance, +1

WildGUI Screenshots for Parts 16-19

Parts 16 to 19 of the WildGUI dataset contain screenshot images and extend the main release at xwm/WildGUI. The dataset was introduced by Video2GUI and is hosted by author joker-112. The repository was last updated on 2026-06-14.

Image, Computer Vision, Ui Annotation, Gui Screenshots, Wildgui, +1

Chinese Corporate Performance and Natural Disaster Impact

Replication data and code for a study analyzing the impact of natural disasters on corporate performance in China. The dataset, approximately 864 MB in size, likely contains firm-level financial and operational metrics linked to disaster events. It supports research into how environmental shocks affect business outcomes such as profitability and innovation.

Tabular, 🇨🇳 China, ZIP, Economics, Corporate Performance, Corporate Innovation, Natural Disasters, Replication Data, +1

SoE2020: Registered Light and Heavy Vehicle Counts in Queensland, 2001-2019

As of 30 June 2019, almost 10 times as many light commercial vehicles as heavy freight vehicles were on the road in Queensland. The number of registered light commercial vehicles more than doubled between 30 June 2001 and 30 June 2019, while the number of heavy freight vehicles increased by 49% over the same period. This dataset is provided by the Queensland Department of Environment, Tourism, Science and Innovation.

Tabular, CSV, Transportation Trends, Queensland Australia, Vehicle Registrations, +1

Australia's Southeast Marine Region 3D Images and Descriptions

Australia's Southeast Marine Region dataset from the Australian Ocean Data Network provides 3D images and descriptive text about the marine environment. The dataset was last updated on 2026-06-17. It is available in HTML and PDF formats.

Image, Text, Geospatial, 🇦🇺 Australia, Oceanography, Multimedia, Marine Biology, +1

Additional file 4 of Decadal seafloor geodesy reveals constantly locked areas and temporal

Supplementary material 4 from a study on decadal seafloor geodesy along the Nankai Trough. The dataset contains the average of standard deviations for coefficients used in estimating slip deficit rates for two directions, labeled "02" and "03". It was authored by Yusuke Yokota and is shared under a CC-BY-4.0 license.

Tabular, Time Series, Geophysics, Tectonic Slip, Nankai Trough, Geodetic Data, Seafloor Geodesy, +1

Gravity and Magnetic Data for the Capel and Faust Basins, Australia

Australian Ocean Data Network provides a record of gravity and magnetic data sources covering the remote offshore Capel and Faust basins on the Lord Howe Rise. The documentation describes the processes applied to level the collected geophysical data. This dataset was last updated on 2026-06-17.

Geospatial, 🇦🇺 Australia, ZIP, Geophysics, Magnetic data, Ocean Basins, Gravity data, +1

Telemedicine Use Predictors Among Cardiology Healthcare Professionals, 112 Respondents

57% of 112 surveyed German healthcare professionals treating cardiology patients reported using telemedicine. This dataset contains predictors of telemedicine use identified via Bayesian Model Averaging and an XGBoost model achieving 0.88 AUROC, created by Pascal Petit and last updated in April 2026. It likely includes variables related to professional role, knowledge, attitudes, and demographics.

Tabular, Excel, Survey Data, Healthcare Professionals, Cardiology, Healthcare, Telemedicine Adoption, Xgboost Predictors, +1

Telemedicine Use Predictors Among Cardiology Professionals via XGBoost Model

112 healthcare professionals from a German cross-sectional survey provide data on telemedicine use determinants. The dataset contains the performance metrics and predictor importance results from a final XGBoost model developed by Pascal Petit, last updated in April 2026. The model achieved an AUROC of 0.88 and 79% accuracy in predicting telemedicine adoption.

Tabular, Excel, Xgboost Model, Survey Data, Healthcare Professionals, Cardiology, Healthcare, Telemedicine Adoption, +1

Telemedicine Use Among Cardiology Professionals: XGBoost Model Performance

57% of 112 surveyed German healthcare professionals reported using telemedicine. This 5.5 KB Excel file contains the performance metrics and predictor importance analysis from an XGBoost model predicting telemedicine adoption, authored by Pascal Petit and last updated in April 2026. The model achieved an AUROC of 0.88 and 79% accuracy using nested cross-validation.

Tabular, Excel, Xgboost Model, Survey Data, Healthcare Professionals, Cardiology, Healthcare, Telemedicine Adoption, +1

Guadalajara de Buga Property Tax Records from 1960 to 2000

Property tax records for Guadalajara de Buga, Colombia, covering 1960 to 2000. The dataset includes detailed information on land and property taxes paid by owners. It is hosted by datos.gov.co and was last updated in May 2026.

Tabular, CSV, XML, JSON, Land Records, Colombia, Urban Planning, Property Tax, +1

Performance Metrics and Efficiency Trade-offs for Intrusion Detection Systems

A 5.5 KB Excel dataset presents results from a unified framework for evaluating machine learning-based Intrusion Detection Systems (IDS). The framework harmonizes features from the NSL-KDD and CICIDS2017 datasets and benchmarks models including Random Forest, which achieved 98.0% accuracy and 97.0% F1-score. Authored by Shailendra Mishra and last updated on April 20, 2026, this work focuses on reproducibility and statistical validation in cybersecurity research.

Tabular, Excel, Machine Learning Benchmark, Cybersecurity, Intrusion Detection, Network security, +1

Evaluation Metrics Summary for Intrusion Detection System Benchmarking

Shailendra Mishra's evaluation metrics reporting summary, published on figshare in April 2026. The 5.5 KB XLS file contains results from a unified framework for evaluating Intrusion Detection Systems (IDS). The framework harmonized features from the NSL-KDD and CICIDS2017 datasets and benchmarked supervised, unsupervised, deep learning, and ensemble models.

Tabular, Excel, Machine Learning Benchmark, Benchmark, Cybersecurity Research, Intrusion Detection, Network security, +1

Statistical Significance Test Results for Intrusion Detection System Benchmarking

5.5 KB of statistical test results from a framework evaluating machine learning models for network intrusion detection. The dataset, authored by Shailendra Mishra and last updated in April 2026, contains results from Wilcoxon signed-rank, McNemar’s, and DeLong tests applied to models like Random Forest on harmonized NSL-KDD and CICIDS2017 datasets.

Tabular, Excel, Machine Learning Benchmark, Statistical Tests, Intrusion Detection, Network security, +1

Harmonized Network Intrusion Detection Data for ML Benchmarking

Shailendra Mishra's framework harmonizes features from the NSL-KDD and CICIDS2017 network intrusion datasets for evaluating machine learning models. The dataset, last updated in April 2026, is a 5.5 KB Excel file containing the harmonized data used in the study. Experimental results from the framework demonstrated a Random Forest model achieving 98.0% accuracy and 97.0% F1-score on this data.

Tabular, Excel, Machine Learning Benchmark, Intrusion Detection, Network security, Feature Harmonization, +1

Ablation Study Results for Network Intrusion Detection Models on Harmonized Data

A 5.5 KB dataset from figshare, last updated on 2026-04-20, containing results from an ablation study on machine learning models for intrusion detection. The work by Shailendra Mishra proposes a unified framework, harmonizing the NSL-KDD and CICIDS2017 datasets and benchmarking models including Random Forest, which achieved 98.0% accuracy and 97.0% F1-score.

Tabular, Excel, Machine Learning Benchmark, Intrusion Detection, Network security, Feature Harmonization, +1

Cross-Dataset Performance Drop: Harmonized NSL-KDD and CICIDS2017 Network Intrusion Data

A 5.5 KB Excel file containing harmonized features from two network intrusion datasets, NSL-KDD and CICIDS2017, for evaluating machine learning models. The dataset was created by Shailendra Mishra and last updated on April 20, 2026. It supports a framework for reproducible and statistically validated benchmarking of Intrusion Detection Systems.

Tabular, Excel, Machine Learning Benchmark, Intrusion Detection, Network security, Feature Harmonization, +1

Cross-Validation Results for Intrusion Detection Models on NSL-KDD and CICIDS2017

Cross-validation results from a framework evaluating machine learning models for network intrusion detection. The dataset contains performance metrics from models like Random Forest, which achieved 98.0% accuracy and 97.0% F1-score on harmonized data. The work by Shailendra Mishra was last updated in April 2026.

Tabular, Excel, Machine Learning Benchmark, Cross Validation, Intrusion Detection, Network security, +1

Cross-dataset Generalization: Harmonized Network Intrusion Detection Data

A 5.5 KB Excel dataset created by Shailendra Mishra and last updated on April 20, 2026. It contains harmonized features from the NSL-KDD and CICIDS2017 network intrusion datasets, processed through a unified framework for evaluating machine learning-based Intrusion Detection Systems (IDS). The work includes results from benchmarking supervised, unsupervised, deep learning, and ensemble models.

Tabular, Excel, Machine Learning Benchmark, Intrusion Detection, Network security, Feature Harmonization, +1
