General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
142,269 datasets
Microbial cultures from 475 couples undergoing IVF at the University of Szeged between January 2022 and December 2023 were analyzed. Vaginal cultures were positive in 121 women (25.5%), with Candida albicans, Streptococcus agalactiae, and Escherichia coli being common, while 134 men (29%) had positive semen cultures dominated by Enterococcus faecalis and Escherichia coli. Machine learning models including SVM, RF, and XGBoost were applied to explore the predictive value of combined clinical and microbial features for IVF outcome.
Model output for 1988 from the National Meteorological Center's global upper air models, produced with a six-hour intermittent assimilation method. This derived dataset contains spatially interpolated atmospheric variables calculated for four grid points over the FIFE study area on a 381 km polar-stereographic grid. It represents model output, not direct measurements, from the NOAA operational analysis system.
Montserrat Mora provides a structured dataset on remittance flows in Mexico, covering both inflows and outflows from 1995 to the present. The data is sourced from the Bank of Mexico (Banxico) and is organized across multiple geographic levels including national, state, municipal, and international scales. Supporting Python scripts and visualizations are included to facilitate data use and reproducibility.
Australian Ocean Data Network presents a flythrough of seabed bathymetry compilations for the Australian Antarctic margin. The data is derived from multibeam, singlebeam, and satellite sources, including ETOPO2. Images of seabed communities are included for the George V margin and Davis coastline.
Breast Cancer Data Integration Omics Atlas version 1 is a 412.2 MB dataset published by Jeremy Prokop on figshare. It contains integrated omics data for breast cancer research, with a focus on biomolecule linkage. The dataset was last updated on June 1, 2026.
Five tables present results from a geochemical study of subduction zone processes. The dataset includes reconstructed bulk chemical compositions of slab-derived fluids, major-element contents in host minerals, and thermodynamic modelling parameters. It was authored by Dongbo Tan and last updated on June 4, 2026.
Files associated with a manuscript investigating the influence of human disturbance on wolves and their prey. The dataset includes materials for analyzing diel activity patterns, temporal overlap, and species occupancy in relation to environmental and anthropogenic variables. It is provided by Francesca Brivio and colleagues under a CC-BY-4.0 license to ensure reproducibility of the published analyses.
A 2.2 GB high-resolution digital master copy of manuscript HC.MS.2017.0061 from the Qatar National Library Heritage Collection. The manuscript is titled 'The Astronomical Poem' and is authored by Abu Ali ibn Abi al-Husayn Abd al-Rahman al-Sufi al-Razi. The dataset was last updated on June 1, 2026.
Ka Un Lao published a benchmark dataset of 27 large noncovalent molecular complexes on 2026-05-23. The dataset provides reference binding energies computed with high-accuracy local coupled cluster [CCSD(T)] methods, extrapolated to the complete basis set limit. It is designed for evaluating electronic structure methods, semiempirical approaches, and machine learning potentials for nanoscale systems.
INFIVALLE's detailed budget revenue execution data, showing planned and collected values and compliance levels. The dataset includes columns for PRESUPUESTO VIGENTE (PV), RECAUDOS (RS), RUBRO, and MES. It is published in compliance with open data and transparency regulations and was last updated on 2026-05-20.
The Australian Ocean Data Network provides rock sample data from the Clerke and Mermaid Canyons off the northwest Australian shelf. The dataset contains macrofauna, mainly bivalves, from four sets of samples collected at depths between 3625 and 4480 meters. The fossils, including species such as Pseudopecten dugong and Palaeocardita aff. globiformis, indicate an Early Jurassic to Norian-Rhaetian age and show Tethyan and Southeast Asian affinities.
Infrastructure Australia's 2019 Audit includes geospatial data on average weekday bus crowding during the PM peak period from 4pm to 6pm in 2016. The data represent transport performance for strategic modelling, with network links below daily volume thresholds excluded. It was last updated on 2026-05-14.
Registered retail dealers of cigarettes, tobacco, and vapor products in New York State are listed with business details and license status. The dataset includes license types, business addresses, and records of any registration suspensions or revocations. Its columns suggest that it can be used to verify a retailer's standing with the New York State Tax Department.
202 acute ischemic stroke patient records from a retrospective study conducted between August 1, 2019 and August 31, 2023. The dataset was used to develop and validate machine learning models predicting 90-day functional outcomes for patients receiving low-dose alteplase in an extended time window. The author is Huiru Chen, and the data was last updated on May 8, 2026.
A retrospective study of 202 acute ischemic stroke patients receiving thrombolysis between August 2019 and August 2023, conducted by Huiru Chen. The dataset was used to develop machine learning models predicting 90-day functional outcomes based on clinical features like age, blood pressure, and NIHSS score.
230 fasted morning sample sets collected from 118 athletes. Descriptive characteristics include mean values with standard deviations and dehydration classifications, with subgroup proportions shown in parentheses. The dataset was authored by Stefan Pettersson and last updated on June 3, 2026.
A 2026 analysis of 30 post-approval non-interventional safety studies in pregnancy, extracted from the HMA-EMA Catalogue of Real-world Data Studies. The dataset, authored by Stacy Chen, summarizes the wide variation in outcomes evaluated, such as stillbirth and congenital malformations, across studies. It highlights the need for harmonized regulatory guidance to improve study comparability.
NASA's Crustal Dynamics Data Information System provides a time series of Earth orientation parameters derived from the DORIS satellite system. This dataset is generated by the International DORIS Service through the combination of solutions from multiple Analysis Centers. The product is available in text format and is hosted on multiple authoritative government data platforms.
A dataset supporting a machine learning correction algorithm for isotope ratio measurements in metabolic labeling studies using Orbitrap mass spectrometry. The dataset, authored by Zhenwen Yu and last updated in May 2026, is a 141.3 MB ZIP file. It likely contains scan-by-scan mass spectrometry data used to train a random forest model for bias correction.
A machine learning correction algorithm for isotope ratio measurements in metabolic labeling studies using Orbitrap mass spectrometry. The dataset, authored by Zhenwen Yu and last updated in May 2026, contains results from a random forest model that reduces measurement error for mass isotopomers M1 and M2. It is stored in an 81.0 KB XLSX file.