General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
166,182 datasets
228 Korean-language questions designed to benchmark web agents on exhaustive enumeration tasks. Each task asks an agent to fill every attribute cell of a table by exhaustively enumerating a closed set. Gold answers, source URLs, and scoring details are withheld for a leakage-aware evaluation run privately against held-out data.
LockerNYC is a pilot program operated by GoLocker for the City of New York, providing secure public lockers for package delivery and pickup. The dataset records actions like receiving, reserving, and withdrawing packages from sidewalk lockers across the city. Columns suggest detailed tracking of delivery and pickup durations, locker locations, and associated geographic and administrative boundaries.
50 individual mussels of the species *Mytilus edulis* and *M. galloprovincialis* are represented in this dataset. It provides the number of byssal threads and the corresponding tenacity for individuals that are either non-infested or infested by endolithic symbionts. The dataset was authored by Laurent Seuront and last updated on 2026-05-29.
Consolidated list of entities subject to control by the Departmental Comptroller's Office of Huila, Colombia. The data includes contact details and location for public entities and private legal or natural persons managing municipal resources. The dataset was last updated on 2026-05-18 16:55:52 and is hosted on the Colombian open data portal.
Every Eval Ever Datastore provides a shared schema and crowdsourced database for machine learning benchmarks. The project is maintained by the EvalEval Coalition, a researcher community focused on evaluation infrastructure. The dataset was last updated on June 19, 2026.
Bee occurrence and trait data collected in 2023 from communities along a 2,957-meter elevation gradient in the Colombian Andes. The data was used in the 2026 publication 'Tropical bee assemblage diversity decreases with elevation while body size increases' by Turley et al. in Biotropica. It was authored by Nash Turley and shared under a CC-BY-4.0 license.
Hydrocoherent numerical terrain models provide a regional representation of Quebec's relief based on altimetric and planimetric data. The models are a collaborative product from the Ministry of Natural Resources and Forests and Natural Resources Canada, offering a quality portrait of relief at a 1:50,000 scale. They feature a grid resolution of 0.324 arcseconds, corresponding to approximately 10 meters on the ground.
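The correspondence between 0.324 arcseconds and roughly 10 meters can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6,371 km (a standard approximation, not a value from the dataset description). The function names and the example latitude of 47°N are illustrative choices only.

```python
import math

# Mean Earth radius in meters: a common spherical approximation,
# assumed here for illustration (not stated in the dataset description).
EARTH_RADIUS_M = 6_371_000.0


def north_south_spacing_m(arcsec: float) -> float:
    """Ground distance covered by `arcsec` arcseconds of latitude."""
    angle_rad = math.radians(arcsec / 3600.0)
    return EARTH_RADIUS_M * angle_rad


def east_west_spacing_m(arcsec: float, latitude_deg: float) -> float:
    """Ground distance covered by `arcsec` arcseconds of longitude
    at a given latitude; meridians converge toward the poles."""
    return north_south_spacing_m(arcsec) * math.cos(math.radians(latitude_deg))


if __name__ == "__main__":
    step = 0.324  # grid resolution from the description, in arcseconds
    print(f"north-south: {north_south_spacing_m(step):.2f} m")
    # 47 degrees north is a hypothetical latitude chosen for illustration.
    print(f"east-west at 47N: {east_west_spacing_m(step, 47.0):.2f} m")
```

Under these assumptions the north-south spacing comes out very close to 10 meters, consistent with the figure quoted above. The east-west spacing is smaller at Quebec's latitudes because lines of longitude converge toward the pole.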
A dataset of 93 patients with sellar region brain tumors, including 40 Langerhans cell histiocytosis (LCH) and 53 germ cell tumor (GCT) cases, collected between April 2012 and April 2024. Radiomics features were extracted from multiparametric MRI scans (T1WI and T2WI) with manually segmented regions of interest. The data was used to develop and validate machine learning models for tumor classification.
A gene expression signature model for lung adenocarcinoma prognosis, constructed using LASSO, XGBoost, and Random Forest algorithms. The dataset was created by Guannan Wang and last updated on May 1, 2026. It integrates single-cell RNA-seq data and includes experimental validation of the core gene PABPC1.
A prognostic model for lung adenocarcinoma (LUAD) was constructed using hypoxia- and lactylation-related genes via LASSO, XGBoost, and Random Forest algorithms. The model's core gene, PABPC1, was validated experimentally in two LUAD cell lines using qRT-PCR, CCK-8, colony formation, wound healing, and Transwell assays. The dataset, authored by Guannan Wang and last updated in May 2026, is shared under a CC-BY-4.0 license.
José Alonso Solís-Lemus published a dataset containing Spearman correlation coefficients between geometric variables and simulation outputs, with associated p-values and significance levels. The dataset is 2.3 KB in size and was last updated on June 2, 2026. It is available under a CC-BY-4.0 license.
Data extracted from 24 research articles for in-distribution evaluation and from 4 articles for out-of-distribution evaluation. The dataset was created by Shashank Mishra using manual extraction tools such as WebPlotDigitizer and was last updated on 2026-05-02. It is publicly available to aid in modelling and designing next-generation triboelectric nanogenerators (TENGs).
A curated collection of text data for training large language models, created by the organization OpenLLM-France. The dataset was last updated on June 3, 2026. Its specific composition, size, and license are not detailed in the provided metadata.
Expenses for the Colorado Department of Transportation for the current and previous state fiscal year. The dataset includes columns such as Name, Clearing Date, Expense Description, Funding Source, CDOT Segment, and Amount. It is published by data.colorado.gov and was last updated on 2026-05-29.
Dataset from a stepped wedge cluster randomised trial conducted in South-East Nigeria. The study aimed to improve leprosy ulcer management through a community self-care intervention. The data was authored by Anthony Meka and last updated on the platform in June 2026.
Two field campaigns in Europe and North America collected data on snow depth, density, and water equivalent using 9 common snow core samplers. The study, led by Ignacio Lopez Moreno of the National Institute of Ecology, quantifies instrumental bias and observer-induced error in manual snow measurements. Results show uncertainty in bulk snow density estimation is about 5% for an individual instrument and close to 10% among different instruments.
L&I Intent Project Details records intents filed by employers or contractors for work on public works projects in Washington State. The dataset likely contains detailed information on project contracts, involved companies, and key dates. It is hosted on multiple platforms, including data.wa.gov and Data.gov, indicating its status as an official government data release.
Washington State's Labor & Industries department provides daily-updated records of Affidavits of Wages Paid filed by contractors for public works projects. The dataset includes project details, contractor and agency information, contract amounts, and apprentice utilization rates. It supports compliance monitoring and analysis of labor standards on state-funded construction.
Temperature and salinity samples measured via CTD profile at Station 1 in the Derwent Estuary, Australia. Data was collected by the Australian Ocean Data Network between August 2012 and January 2013. The dataset is available in formats including NetCDF, HTML, and PNG.
Surficial cover facies maps for the Hope and Boulder reefs within Australia's Great Barrier Reef. The dataset is published by the Australian Ocean Data Network on the data.gov.au platform. Its last recorded update is dated June 27, 2026.