General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
141,962 datasets
A model developed in 2026 by Xia Li for predicting immune subtypes in lung adenocarcinoma. It is based on spatial distribution patterns of tumor-infiltrating lymphocytes quantified from H&E-stained whole-slide images of 503 patients from the TCGA cohort. The model achieved an AUC of 0.839 in internal validation and 0.927 in an external cohort.
A study by Xia Li, last updated in May 2026, developed an interpretable prediction model for lung adenocarcinoma immune subtyping. The model quantifies spatial distribution patterns of tumor-infiltrating lymphocytes in H&E-stained whole-slide images, linking morphological phenotypes to molecular subtypes. It was validated on internal and external cohorts, achieving AUCs of 0.839 and 0.927, respectively.
A 2026 study by Xia Li developed an interpretable prediction model for lung adenocarcinoma immune subtyping. The work used transcriptomic data from 503 TCGA LUAD patients and automated annotation of H&E whole-slide images to quantify spatial distribution patterns of tumor-infiltrating lymphocytes. The model was validated in internal and external cohorts, achieving AUCs of 0.839 and 0.927, respectively.
UK land extents at risk from surface water flooding, provided by the Environment Agency. The data includes three flood probability scenarios: 0.1%, 1%, and 3.3% annual exceedance probability. It was last updated on 2026-05-29 and is designed for spatial planning, not individual property assessment.
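As a reading aid for the three scenarios above, an annual exceedance probability (AEP) is often restated as a mean return period, the reciprocal of the probability. The sketch below shows only that arithmetic. The function name `return_period_years` is hypothetical and is not part of the Environment Agency product.

```python
def return_period_years(aep_percent: float) -> float:
    """Convert an annual exceedance probability (percent) to a mean return period (years)."""
    if not 0 < aep_percent <= 100:
        raise ValueError("AEP must be in the interval (0, 100] percent")
    return 100.0 / aep_percent


# The three scenarios named in the dataset description.
for aep in (0.1, 1.0, 3.3):
    years = return_period_years(aep)
    print(f"{aep}% AEP is roughly a 1-in-{years:.0f}-year event")
```

A return period is a long-run average, not a schedule: a 1% AEP flood has the same 1% chance of occurring in any given year.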
Nine Excel tables totaling 1.1 MB, published by Bandar Alghamdi on figshare in June 2026 under a CC-BY-4.0 license. The tables contain lists of hypoxia-related genes, differentially expressed genes (DEGs) from TCGA LUAD and GSE18842 datasets, enriched pathways, protein-protein interaction (PPI) network rankings, and genes identified via Kaplan-Meier and univariate regression survival analyses.
Gridded monthly sea ice concentration and extent data begins in 1850, providing a long-term climate record. The dataset combines historical observations from ships, naval compilations, and ice services with satellite passive microwave data from 1979 onward. It is produced by the National Aeronautics and Space Administration and distributed via multiple platforms, though service levels have been reduced.
MOP03NM_9 provides monthly gridded means of carbon monoxide (CO) profiles and total column retrievals from the MOPITT instrument aboard NASA's Terra satellite. The instrument, funded by the Canadian Space Agency and launched in 1999, supplies the near-infrared radiances used for the retrievals, and the product includes averaging kernels for retrieval analysis. Data collection is ongoing, offering a long-term record for atmospheric studies.
74 unique feline subjects captured in controlled indoor clinical settings to eliminate environmental thermal noise. The dataset, created by Mohammad Abdulghafar and updated in May 2026, is a standardized derivative of the Thermal Imaging Cats’ Dataset. It provides 7,588 images paired with synchronized 11-feature tabular CSVs, split across 50 training, 12 validation, and 12 testing subjects.
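The 50/12/12 split above is by subject, so all images from one cat stay in a single partition. A minimal sketch of such a subject-level split follows. It assumes integer subject IDs 1 to 74 and a seeded shuffle, which are illustrative choices. The dataset's actual assignment of subjects to partitions is not described here, and `split_subjects` is a hypothetical helper.

```python
import random


def split_subjects(subject_ids, n_train=50, n_val=12, n_test=12, seed=0):
    """Partition subject IDs into disjoint train/val/test groups (sizes are assumptions)."""
    ids = sorted(subject_ids)
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("split sizes must add up to the number of subjects")
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(ids)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


# Hypothetical subject IDs 1..74; real IDs would come from the dataset's CSV files.
splits = split_subjects(range(1, 75))
print({name: len(members) for name, members in splits.items()})
```

Splitting by subject rather than by image keeps frames of the same animal out of both training and test data, which would otherwise inflate test scores.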
Results from a federated learning framework for speech emotion recognition, published by Mohammed Tawfik in May 2026. The dataset contains the outcomes of Particle Swarm Optimization (PSO) feature selection applied to multi-scale audio features from German (EmoDB) and English (RAVDESS) speech corpora.
Monalisa M. J. Faulkner estimated the age and sex distribution of the burden of sickle cell disorders in Sierra Leone for 2023. The data uses non-overlapping age groups, with the 5-to-19-year and 20-and-older groups derived by subtraction. Male-to-female ratios are presented with approximate 95% uncertainty intervals estimated by the delta method.
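For readers unfamiliar with the delta method mentioned above, one common form approximates the variance of the log ratio as the sum of the squared relative standard errors, then exponentiates the interval. The sketch below assumes independent male and female estimates and a log-scale interval. The description does not say which variant the author used, and the function name and inputs are hypothetical.

```python
import math


def ratio_ci_delta(male, se_male, female, se_female, z=1.96):
    """Approximate CI for male/female via the delta method on the log scale.

    Assumes the two estimates are independent and strictly positive.
    """
    if male <= 0 or female <= 0:
        raise ValueError("estimates must be positive")
    ratio = male / female
    # First-order (delta method) standard error of log(male/female).
    se_log = math.sqrt((se_male / male) ** 2 + (se_female / female) ** 2)
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper


# Made-up inputs for illustration only; these are not values from the dataset.
ratio, lower, upper = ratio_ci_delta(male=120.0, se_male=12.0, female=100.0, se_female=10.0)
print(f"ratio={ratio:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

Working on the log scale keeps the lower bound positive and makes the interval symmetric in multiplicative terms around the ratio.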
A physics-informed hybrid artificial neural network integrates the van’t Hoff equation to predict hydrogen sulfide solubility in ionic liquids. The model achieved a coefficient of determination (R²) exceeding 0.9987 on a conventional test set and maintained an R² of 0.9925 in a generalization test with unseen cation-anion combinations. The dataset, shared by Jiaping Zhou on figshare in June 2026, supports the rational design of ionic liquids for gas capture.
16.6 MB of raw mobility and traffic data from the city of Ghent for the reference years 2018 and 2021, provided by Remi Demeulemeester. The collection includes statistical outputs for factor and cluster analysis, as well as configuration files for the MATSim agent-based transport simulation model. The dataset was last updated on 2026-05-23 and is licensed under CC-BY-4.0.
A retrospective analysis of 543 primary osteoporosis patients admitted between January 2021 and December 2024. The dataset was used to develop and validate machine learning models for predicting poor response to 12 months of standard anti-osteoporosis therapy. It was authored by Yannan Bi and published on figshare.
Data from a 2026 cross-sectional study of 57 veterans aged 18 and above at the New York State Veterans Home at Oxford. The dataset, authored by Mohammad Najeh Samara, was collected via a self-administered survey to predict recreational therapy participation using machine learning models.
57 survey records from veterans aged 18 and above at the New York State Veterans Home at Oxford. The data was collected for a cross-sectional observational study to develop machine learning models predicting recreational therapy participation. The dataset was created by Mohammad Najeh Samara and last updated on 2026-04-30.
Créditos Renovados is a processed and anonymized database from the Colombian Institute of Educational Credit and Technical Studies Abroad (ICETEX) statistical operation on educational credit renewal. It contains semiannual data from the first semester of 2015 through the second semester of 2025, with the 2025-2 statistics being preliminary. The dataset is published via datos.gov.co and was last updated on 2026-05-26.
A retrospective cohort study of 300 patients from Beijing Shijitan Hospital between June 2018 and June 2025. The dataset likely contains clinical variables used to develop and validate a machine learning model predicting 6-month outcomes after combined suction-assisted lipectomy and lymphovenous anastomosis surgery. The predictive model, authored by Yonghao Cui, was published on figshare in May 2026.
A retrospective analysis of 1,324 critically ill patients from a prospectively maintained database, used to develop a machine learning model for predicting ICU-acquired weakness. The dataset includes predictors such as age, APACHE II score, sarcopenia, sepsis, mechanical ventilation, and lactic acid. The data was published by Peng Zheng on figshare in May 2026.
A 9.5 KB Excel file presents results from a random search for the most efficient subsets of morphological landmarks. Author Jean-Pierre Dujardin used K-means to reclassify groups of males and females across multiple species, including Anopheles mosquitoes and Phlebotomus sand flies. The data compares classification accuracy using all landmarks versus the smallest optimal subset.
Ablation study results for the Heterogeneous Biological Graph Convolutional Network (HBGCN) model, authored by Haoran Zhu and last updated on 2026-05-19. The 5.5 KB XLS file contains experimental data supporting a method for predicting drug-target interactions by integrating multimodal biological information.