General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
169,080 datasets
RelBench hosts the dbinfer family of relational datasets in its 3.0 manifest format. The datasets originate from the 4DBInfer benchmark and are exposed via the dbinfer-relbench-adapter package. Labels for tasks are built externally and served as-is.
Municipal data on the population enrolled in the elderly care program of San Pedro de los Milagros, Colombia. The dataset includes variables such as zone, disabilities, pathologies, sex, age, health insurer, and victim status. It covers the 2025 period and was last updated on the datos.gov.co platform in May 2026.
A dataset from the Corporación Autónoma Regional de Boyacá (CORPOBOYACA) listing records of information classified or reserved from public access. The dataset includes 22 columns detailing the legal basis, responsible parties, and duration of classification for each record. It was last updated on 2026-05-26 16:22:42 via the Socrata platform on datos.gov.co.
Motionatlas Data provides metadata for video files, including source identifiers and media paths. The dataset was created by maxLWSv2 and was last updated on June 10, 2026. It does not contain the actual video files, which must be sourced separately by the user.
Primary and processed experimental data supporting a manuscript on complex fault reactivation behavior. The dataset is 619.5 MB and was authored by Chonglang Wang, last updated on May 16, 2026. It is shared under a CC-BY-4.0 license via the figshare platform.
Lithogeochemistry contains whole rock chemical analyses from samples collected in Yukon, Canada. The dataset is provided by the Government of Yukon and was last updated on 2026-05-20. The specific number of samples and analytical features are not detailed in the available metadata.
CTD profile samples of temperature and salinity measured at Station 5 in the Derwent Estuary. The data collection period spans from August 2012 to January 2013. The dataset is provided by the Australian Ocean Data Network via the data_gov_au platform.
LAION-BVD - 300M Video Frame URLs is a collection of approximately 300 million keyframe references extracted from publicly available web videos. It was created by LAION and last updated on June 14, 2026. The dataset contains source video URLs and frame timestamps, but no actual image data.
Comerciantes Activos is a registry of active and valid merchants for the year 2023, including natural persons and companies. The dataset is hosted on the Socrata platform by www.datos.gov.co and was last updated on 2026-05-18. It contains columns for business identification, classification, location, and status.
A 60.0 KB Excel file containing a taxonomic presence-absence matrix of species recorded in evaluated biogeographic regions. The dataset was authored by Sebastian de la Hoz Pedraza and last updated on June 2, 2026. It is shared under a CC-BY-4.0 license on figshare.
Performance data for the UK government's digital electoral services. The data is refreshed daily and originates from the Ministry of Housing, Communities and Local Government.
Weekly data on COVID-19 case follow-up in Colombia, likely from 2020 onward. The dataset tracks the number and percentage of cases with and without follow-up by municipality and department. It is published by the Colombian government via the datos.gov.co platform and was last updated on 2026-05-18.
Yukon Exploration and Geology Overview 2020 is a report from the Government of Yukon, published under an open license. It is listed on open.canada.ca, which suggests it is an official public record. It likely contains textual descriptions of geological surveys and mineral exploration activities for the year 2020.
Colombian government data listing active proponents (contractors and bidders) as of October 15, 2020, including renewals and new registrations. The dataset is published by datos.gov.co and was last updated on the platform in May 2026. It contains business contact and registration details for entities eligible to participate in public procurement processes.
Mean test set accuracy scores for multiple datasets, comparing a prior model based on FoMo v1.0 to other baselines. The dataset is a 5.5 KB Excel file published by Alasdair D. F. Clarke on figshare under a CC-BY-4.0 license. It was last updated on May 12, 2026.
A collection of 550 test cases for evaluating large language models in financial contexts. It was produced by BC Card and Yonsei University DSL as part of the S2026 LLMOps project. The dataset includes 300 regression test cases, 200 financial edge cases for hallucination detection, and 50 hard negative QA samples.
A table consolidating smaller decisions into key decisions, likely drawn from interview sources. It lists the actors involved in each decision and assigns each decision to one of four phases: design, implementation, adaptation, and operation. The dataset was authored by Clara Léonie Diebold and last updated on 2026-05-19.
241 eye surgery cases form the basis for a multiple linear regression model predicting absolute IOP reduction at 3 months. The analysis, authored by Jean-Marc Perone, stratifies eyes by preoperative IOP into subgroups of 20–29, 15–19, or 10–14 mmHg. This 9.5 KB Excel file was last updated on May 19, 2026.
Regression model results testing the effects of transcranial ultrasound sonication on Go/NoGo task choices. The dataset contains standardized regression coefficients and p-values from mixed-effects logistic models. It was authored by Nomiki Koutsoumpari and published on figshare in May 2026.
A catalog of 86,933 stars with spectral classifications, accurate positions, and proper motions, recovered from historical chart data. The catalog is an extension of the Henry Draper Catalog, created by astronomers at the Harvard College Observatory and later digitized by Nesterov et al. in the 1990s. It was made available by NASA HEASARC in 1998.