General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
168,810 datasets
List of keg tags assigned to distributors in the state of Missouri. The data includes columns for State, City, Street, KegNum, Building Number, DBA Name, Requestor Name, Zipcode, Phone, and Licensee Name, suggesting a record of alcohol distribution licensing. It is published by data.mo.gov on the Socrata platform and was last updated on 2026-05-29.
Estimated building footprint layers generated by applying an XGBoost classification model to satellite imagery for the Municipality of Cereté, Colombia. The model was developed during the DataSandbox Colombia 2020 by the Data Science Unit of the National Planning Department (Departamento Nacional de Planeación). Data is projected in MagnaSirgas 3115 and was last updated on 2026-05-18.
CARIACO Ocean Time Series provides a long-term record of biogeochemical processes in the Cariaco Basin, a tectonic depression off Venezuela's coast influenced by seasonal upwelling. The program, run by NASA, began in November 1995 to study how meteorological and hydrographic conditions affect carbon fluxes and primary production. This dataset captures marked seasonal and interannual variations driven by the migration of the Intertropical Convergence Zone.
On-Time Performance (OTP) data for the Long Island Rail Road (LIRR) beginning in 2015. The dataset, hosted by data.ny.gov, measures how frequently trains arrive at their final destination within 5 minutes and 59 seconds of schedule. It includes monthly performance metrics segmented by branch and peak travel periods.
Water quality data for 98 lakes on the Tibetan Plateau, covering the 2000s to the 2020s. It was authored by Xin Rong Si and is available under a CC-BY-4.0 license.
Average optimal values and standard deviations from 30 independent runs of 7 evolutionary algorithms on 12 benchmark functions. The dataset, authored by Yang Cao, contains results for 20-dimensional problems. It was last updated on May 13, 2026.
Numerical results from 30 runs of 7 evolutionary algorithms on 12 test functions in 10 dimensions. The data includes average optimal values and standard deviations, compiled by Yang Cao. The dataset was last updated on 2026-05-13.
Yang Cao published a dataset on May 13, 2026, containing numerical results from benchmarking eight evolutionary optimization algorithms. The data includes average optimal values and standard deviations from 30 runs of algorithms like DE, SaDE, SHADE, ILSHADE, jSO, MPEDE, LSHADE, and RL-DE. The evaluation covers 29 test functions from the CEC2017 benchmark suite in 100 dimensions.
Results from 30 runs of 8 evolutionary algorithms (DE, SaDE, SHADE, ILSHADE, jSO, MPEDE, LSHADE, RL-DE) on 29 test functions in 50 dimensions. The dataset contains the average optimal values and standard deviations from these runs, compiled by Yang Cao and shared under a CC-BY-4.0 license.
Numerical results from 30 runs of 8 optimization algorithms on the 29 test functions of the CEC2017 benchmark suite in 30 dimensions. The dataset contains average optimal values and standard deviations, compiled by Yang Cao and last updated in May 2026. The data is provided in a single 16.7 KB XLSX file.
Benchmark results from 30 runs of 8 evolutionary algorithms on 29 test functions. The dataset contains average optimal values and standard deviations for 10-dimensional problems. Authored by Yang Cao and last updated in May 2026, it is shared under a CC-BY-4.0 license.
A 9.5 KB dataset provides statistical summaries for various traits. It was authored by Pablo Ubilla Pavez and last updated on May 13, 2026. The data includes units of measurement, means, standard deviations, minima, maxima, sample counts, and error metrics for both natural-log transformed and original-scale values.
24.4 MB of benchmark data for evaluating deep intronic variant prediction models, hosted on GitHub. The dataset was last updated on May 13, 2026, and is shared under a CC-BY-4.0 license by author Nathan Fortier.
A benchmark dataset derived from ClinVar, a public archive of human genetic variants and their clinical significance. The dataset is hosted on figshare and was authored by Nathan Fortier, last updated on May 13, 2026. It is available as a 337.2 KB ZIP file under a CC-BY-4.0 license.
Riepe benchmark data is a 5.5 MB collection for evaluating computational tools that predict RNA splicing events. The data is hosted on GitHub and was last updated on May 13, 2026. Author Nathan Fortier released it under a CC-BY-4.0 license.
Benchmark data for evaluating the CI-SpliceAI model, a machine learning tool for predicting splice site alterations. The data is hosted on GitHub and was last updated in May 2026. It was authored by Nathan Fortier and is shared under a CC-BY-4.0 license.
Approved recipients and funded projects under the Torres Strait Community Sport and Recreation Program. The dataset is provided by the Sport, Racing and Olympic and Paralympic Games organization via data.gov.au. It was last updated on 2026-05-29.
A metadata catalog outlines the information publication framework for Palmira's public utility company during the 2022 fiscal year. The schema includes columns for update frequency, data format, responsible parties, and access links. It is hosted on the Colombian open data portal, www.datos.gov.co, and was last updated in May 2026.
Latencies for a pole descent cliff task, shared by Jason Samonds. The data is stored in a 453-byte MAT file and was last updated on June 3, 2026. The dataset is licensed under CC-BY-4.0.
Colombia's reported mobilization volumes of timber and non-timber forest products. The data is reported by the Colombian Agricultural Institute (ICA) and the Ministry of Environment and Sustainable Development (MADS). It includes temporal and spatial breakdowns by year, semester, trimester, month, municipality, and department.