General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
141,875 datasets
UK geospatial data from the Environment Agency detailing land at risk of surface water flooding. It models flood extents and depths for three annual exceedance probabilities (0.1%, 1%, 3.3%) and seven depth bands, the deepest covering depths over 2300 mm. The dataset was last updated on 2026-05-29.
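For readers unfamiliar with annual exceedance probabilities (AEPs), the sketch below converts the three AEPs listed in the entry above into approximate return periods, and into the chance of at least one exceedance over a span of years. It uses only the standard definitions (return period = 1 / AEP, and independence between years). The function names are illustrative and not part of the dataset.

```python
def return_period_years(aep_percent):
    """Average recurrence interval (years) implied by an AEP given in percent."""
    return 100.0 / aep_percent


def chance_of_at_least_one(aep_percent, years):
    """Probability of one or more exceedances in `years`, assuming independent years."""
    p = aep_percent / 100.0
    return 1.0 - (1.0 - p) ** years


# The three AEPs named in the dataset description.
for aep in (0.1, 1.0, 3.3):
    rp = return_period_years(aep)
    risk_30 = chance_of_at_least_one(aep, 30)
    print(f"AEP {aep}%: about 1 in {rp:.0f} years, "
          f"{risk_30:.1%} chance of at least one event in 30 years")
```

The 30-year horizon is an arbitrary choice for illustration; any planning period can be substituted.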
A retrospective cohort of 297 patients with cirrhosis who underwent a transjugular intrahepatic portosystemic shunt (TIPS) procedure at the First Hospital of Shanxi Medical University from 2019 to 2024. The dataset, authored by Lixin Song, was used to develop and validate machine learning models for predicting postoperative overt hepatic encephalopathy (OHE).
Over 41 million vehicle passages collected from 14 sensor portals in the Aosta Valley region of Italy. The dataset, authored by Marco Alderighi and last updated in May 2026, supports an AI framework for monitoring and forecasting tourist flows.
Marco Alderighi's dataset, last updated May 6, 2026, compares ensemble machine learning models for forecasting tourist flows. It is based on over 41 million vehicle passages from 14 sensor portals in Italy's Aosta Valley, integrated with meteorological and calendar data. The file is a 5.5 KB XLS spreadsheet.
An adaptive ensemble AI framework for tourist flow forecasting, validated on over 41 million vehicle passages from 14 sensor portals in Italy's Aosta Valley. The study by Marco Alderighi, last updated in May 2026, demonstrates a 23.7% MAE improvement and 31.2% MSE reduction compared to individual models, with R² scores exceeding 0.94 for short-term predictions.
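The entry above reports improvements in MAE and MSE and an R² score, but the listing does not say how the percentages were computed. A common reading is relative reduction against a baseline model's error, and the sketch below shows that calculation under this assumption. The numbers are made-up toy values, not taken from the dataset, and all function names are illustrative.

```python
from statistics import fmean


def mae(y_true, y_pred):
    """Mean absolute error."""
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def mse(y_true, y_pred):
    """Mean squared error."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_reduction_pct(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy values for illustration only (hypothetical vehicle counts).
observed = [100.0, 120.0, 90.0, 110.0]
single_model = [110.0, 110.0, 100.0, 100.0]
ensemble = [105.0, 115.0, 95.0, 105.0]

mae_gain = relative_reduction_pct(mae(observed, single_model), mae(observed, ensemble))
mse_gain = relative_reduction_pct(mse(observed, single_model), mse(observed, ensemble))
print(f"MAE reduction: {mae_gain:.1f}%")
print(f"MSE reduction: {mse_gain:.1f}%")
print(f"Ensemble R^2: {r2(observed, ensemble):.2f}")
```

Because MSE squares the errors, the same forecasts typically show a larger percentage reduction in MSE than in MAE, which is consistent with the pattern in the reported figures.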
A best-fit lookup table maps 2021 Middle Layer Super Output Areas (MSOAs) to electoral wards/divisions and Local Authority Districts (LADs) in England and Wales. The dataset includes two versions with different reference dates: one as of 31 December 2022 and another as of 1 May 2025, with corresponding column codes. It provides a key for linking and aggregating data across these nested administrative and statistical geographies.
16 deep learning architectures, including CNNs, Vision Transformers, and Hybrid models, are benchmarked for energy consumption. The dataset contains empirical measurements of fine-tuning training energy and inference energy on NVIDIA A100 GPU and AMD EPYC CPU platforms. It was created by Abderrahmen Jedidi and last updated in May 2026.
349 patient records from a single-center retrospective cohort study at Quzhou People's Hospital in China, collected between June 2022 and June 2025. The dataset likely contains 36 baseline clinical characteristics used to identify four core predictors for diabetic kidney disease progression. Author Binfeng Xiong published the data sheet on figshare in May 2026.
192 accelerometer recordings from 92 healthy controls and 100 participants with Duchenne muscular dystrophy (DMD). The dataset, created by Nicholas Joy and last updated in June 2026, includes measures of movement quantity and quality such as counts per minute, entropy, jerk, and movement frequency. It compares healthy controls, ambulatory DMD, and non-ambulatory DMD participants, with statistical analysis showing significant differences between groups.
192 participant records from a study comparing movement quality between healthy controls and individuals with Duchenne muscular dystrophy (DMD). The dataset includes accelerometer-derived measures such as counts per minute, entropy, jerk, and movement frequency, with median and IQR values reported for healthy, ambulatory DMD, and non-ambulatory DMD groups. It was authored by Nicholas Joy, shared under a CC-BY-4.0 license, and last updated on June 2, 2026.
8,280 global cabin abnormal event reports from 2004 to 2024 were used to train a hybrid CNN-LSTM-Attention model for risk classification. The model achieved 95.01% accuracy and an F1 score of 94.17% on its test set. The dataset was authored by Lianbin Zhou and shared on figshare under a CC-BY-4.0 license.
Global cabin abnormal event reports from 2004 to 2024, used to train a risk classification model. The dataset contains 8,280 reports and was created by Lianbin Zhou. It was last updated on 2026-05-27.
8,280 global cabin abnormal event reports from 2004–2024 were used to construct a hybrid CNN-LSTM-Attention model for intelligent risk assessment. The model achieved 95.01% accuracy and an F1 score of 94.17% on the test set. The dataset, authored by Lianbin Zhou and shared on figshare, provides a foundation for data-driven decision-making in civil aviation safety management.
8,280 global cabin abnormal event reports from 2004 to 2024 were used to train a hybrid CNN-LSTM-Attention model for incident classification. The model achieved 95.01% accuracy and a 94.17% F1 score on the test set. The dataset was authored by Lianbin Zhou and last updated in May 2026.
8,280 global cabin abnormal event reports from 2004 to 2024 were used to train a hybrid CNN-LSTM-Attention model for intelligent risk classification. The model, proposed by Lianbin Zhou, achieved 95.01% accuracy and a 94.17% F1 score on its test set. This framework establishes a mapping from text features to risk mechanisms and hazard levels for interpretable risk quantification in civil aviation.
8,280 global cabin abnormal event reports from 2004–2024 were used to train a hybrid CNN-LSTM-Attention model for intelligent risk assessment. The model achieved 95.01% accuracy and a 94.17% F1 score on a test set, outperforming benchmark approaches. The dataset, shared by Lianbin Zhou on figshare, provides a foundation for data-driven decision-making in civil aviation safety management.
16.5 KB of data from mating trials across four experimental treatments and two single-male controls. The dataset includes morphological measurements for male and female spiders, such as carapace length, width, and area, as well as mating duration and offspring number. It was authored by Shichang Zhang and last updated on 2026-05-31.
185 sporadic pulmonary stenosis patients and 100 healthy controls were analyzed using whole-exome sequencing and multiple machine learning algorithms. The dataset contains results prioritizing 17 candidate genes associated with the condition, published by Yuting Liu in June 2026. It is shared under a CC-BY-4.0 license on figshare as a 663.5 KB Excel file.
A genomic analysis dataset from a study of 185 sporadic pulmonary stenosis patients and 100 healthy controls. The data was generated by Yuting Liu and last updated in June 2026. It contains prioritized candidate genes identified through whole-exome sequencing, gene-level burden tests, and three machine learning algorithms.