General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
146,448 datasets
Nine convolutional neural network models achieved AUCs between 0.921 and 0.967 for distinguishing malignant melanoma from other malignant skin lesions. An XGBoost ensemble model built on their outputs achieved an AUC of 0.988 on a test dataset. The results are documented in an Excel file by Jinyan Jiang, last updated in May 2026. The model was trained and validated using the ISIC-2024 and HAM10000 dermatoscopic datasets.
Norfolk Master Address List is a dataset from data.norfolk.gov containing the master list of addresses for the city. Addresses are classified by status (active, pending, historical) and type (Base or City), with Base addresses included due to emergency response agreements. The dataset was last updated on 2026-05-29 11:39:19.
Supplementary data for a research article on the independent loss of the ancestral endosymbiont Blattabacterium in cockroaches. The 17.9 MB collection includes CSV and XLSX files, authored by Zhuli Cheng and last updated in May 2026. It likely contains genomic or phylogenetic data supporting the finding of ten independent loss events in the Blattellidae, Pseudophyllodromiidae, and Anaplectidae families.
Three daily browse files in GIF format contain plotted digital count measurements from the Airborne Multichannel Microwave Radiometer (AMMR). This dataset was collected during the CAMEX-3 field campaign in August and September 1998, based out of Florida, to study tropical cyclones. The instrument was mounted on a NASA DC-8 aircraft, and the data is provided by the National Aeronautics and Space Administration.
New York's Tuition Assistance Program (TAP) grant recipients and award amounts, broken down by income range, age group, and program details. The data is published by data.ny.gov and covers academic years beginning in 2000. It includes columns for recipient headcount, full-time equivalent students, and dollars awarded.
A global raster map of Stand Basal Area across forest cover for 2020 at a 0.027° × 0.027° spatial resolution. The map was derived using Random Forest machine learning models, with detailed methods and validation described in an associated manuscript. The dataset is 25.8 MB in size and was last updated on June 3, 2026.
Monthly statistical data on air transport operations in Colombia, disaggregated by aircraft type, airline, and origin-destination route. The dataset includes variables such as year, month, airline, origin and destination airports, aircraft type, flight type, number of flights, block hours, seats offered, passengers transported, and cargo and mail moved. The information is produced and managed by the Unidad Administrativa Especial de Aeronáutica Civil for monitoring, control, and analysis of the aeronautical sector.
A study from 2026 by Lan Yan developed a multimodal model for predicting pathological complete response to neoadjuvant therapy in breast cancer. The model integrates deep learning features from longitudinal DCE-MRI, peripheral blood inflammatory indices, and baseline tumor-infiltrating lymphocytes. It was trained and validated on data from 262 retrospectively enrolled patients.
A dataset of 384 patients with pulmonary space-occupying lesions used to develop a multimodal machine learning model. The data includes computed tomography radiomics, positron emission tomography metabolic parameters, and clinical variables, with malignant lesions confirmed by pathology. The dataset was created by Xue Liu and last updated on 2026-04-21.
Data from 160 patients with hepatocellular carcinoma were used to develop and validate AI models predicting treatment response and prognosis after transarterial chemoembolization. The dataset, created by Jiangqin Ma and last updated in April 2026, includes results for clinical, radiomics, and deep learning models, with external validation on an independent cohort of 38 patients. Deep learning models achieved an area under the curve of up to 0.96 in the training set.
Jiangqin Ma's research dataset contains models and results for predicting treatment response and prognosis after transarterial chemoembolization (TACE) in hepatocellular carcinoma (HCC) patients. The data includes performance metrics for clinical, radiomics, and deep learning models developed from gadoxetic acid-enhanced MRI scans of 160 patients from April 2018 to September 2024. An independent external validation cohort of 38 patients was also used.
A retrospective study from Zhongshan Hospital, Xiamen University (January 2022–January 2025) developed interpretable machine learning models for stroke prediction. The dataset includes 82 stroke cases and 164 matched controls among non-valvular atrial fibrillation patients with low CHA₂DS₂-VA scores. The data cover demographics, comorbidities, laboratory markers, and echocardiographic parameters.
A text document explaining the concepts of atomic neutrality and ionic charge. The 2.7 MB PDF was authored by Ben Friday and last updated on June 4, 2026. It is licensed under CC-BY-4.0 and hosted on figshare.
Tuition Assistance Program (TAP) grant recipients and dollar amounts for New York residents attending in-state colleges, starting from academic year 2000. The data is provided by data.ny.gov and includes metrics by college and sector group. It was last updated on 2026-05-22.
The Tuition Assistance Program (TAP) is New York's largest student financial aid grant program. This data includes TAP award recipients and dollar amounts by college, sector groups, and Level of Study for academic years 2000-2011, sourced from data.ny.gov.
Remote Sensing Systems provides a 1-degree gridded global dataset of ocean wind speeds, derived from a merged product of seven satellite microwave radiometers. The dataset includes a 12-month climatology, monthly anomaly maps, trend analysis, and time series covering a period from January 1988 to March 2016. This V7R01 product is constructed using inter-calibrated brightness temperature data and a consistent processing methodology across all sensors.
Beginning in academic year 2000, this dataset tracks the number of students receiving New York's Tuition Assistance Program (TAP) grants each fall semester. It is published by data.ny.gov and includes headcounts broken down by college, sector group, and level of study. The data likely contains over two decades of annual fall semester records.
Annual counts of visa applications processed by the Colombian Visa and Immigration Authority, as defined by Resolution 5477 of 2022. The data includes breakdowns by applicant nationality, sex, age, and intended length of stay. The dataset is hosted on the Colombian open data portal www.datos.gov.co and was last updated in May 2026.
This dataset provides estimated land subsidence rates for 2021 in the Atchafalaya and Terrebonne basins of coastal Louisiana. Total subsidence is calculated as the sum of deep and shallow vertical elevation change rates, derived from official state reports and interpolated data. The National Aeronautics and Space Administration (NASA) produced these estimates as part of the Delta-X project and offers them as 30-meter resolution cloud-optimized GeoTIFFs.
Adirondack Park lake water chemistry and nutrient records updated through October 2024. The dataset extends a prior long-term monitoring set, adding 25 new lakes for a total of 53, and includes new analytes UV254 and surface water temperature. It was compiled by Jeremy Farrell from the USGS AQ Samples database.