General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
157,441 datasets
NSW Cadastre web service provides a dynamic map of cadastral features extracted from the NSW Digital Cadastral Database (DCDB). The service offers access to a state-wide integrated database of current land titles and property boundaries, maintained by Spatial Services (DCS). It was last updated on 2026-05-12.
geoBoundaries maintains standardized, open-license political boundaries for every country globally. This dataset provides ADM0 (country), ADM1, and ADM2 level administrative divisions for Venezuela. The boundaries have been produced and maintained since 2017.
Pairwise comparisons of regional capacity scores from 2023 include mean differences, 95% confidence intervals, and adjusted p-values. The data is derived from a one-way ANOVA with Tukey’s HSD post-hoc test and was authored by Pratik Sharma. The 10.3 KB XLSX file was last updated on May 12, 2026.
A 5.5 KB Excel file containing results from an ablation study on the AVP task. The dataset was authored by Enyan Liu and last updated on June 2, 2026. It is shared under a CC-BY-4.0 license on the figshare platform.
A dataset detailing a spatiotemporal encoder architecture, with parameter counts verified against the implementation. The dataset is a 5.5 KB XLS file authored by Hongmin Wang and last updated on June 2, 2026. It is shared under a CC-BY-4.0 license on the figshare platform.
A validation dataset from the METABRIC breast cancer cohort examining associations among a 16-gene hypoxia score, copy number alteration (CNA) burden, and overall survival. The 5.5 KB Excel file was authored by Wenhan Yang and last updated on June 2, 2026. It is shared under a CC-BY-4.0 license on figshare.
Primer sequences used for RT-qPCR validation of potential key genes. The dataset is a 22.5 KB Excel file authored by Haiming Liang and shared under a CC-BY-4.0 license. It was last updated on June 2, 2026.
Nagara Wakgari Futasa authored this dataset on TH softening technology. The dataset is a 13.5 KB Excel file available under a CC-BY-4.0 license. It was last updated on June 2, 2026.
Fit statistics for latent class models ranging from 2 to 6 classes, published by Shannon A. H. Compton. The dataset is a 5.5 KB Excel file hosted on figshare and last updated in May 2026.
AA-Briefcase-Lite is a public example scenario for Artificial Analysis' frontier agentic evaluation of realistic, long-horizon knowledge work. The dataset extends frontier model benchmarking beyond coding and short-form reasoning to the professional deliverables knowledge workers produce day to day. It consists of four private scenarios in which agents complete realistic professional workflows across data science.
Spatial layers from Ku-Ring-Gai Council detail flood characteristics for design floods ranging from a 20% Annual Exceedance Probability to the Probable Maximum Flood. The dataset is hosted on data.gov.au and was last updated in May 2026. It provides flood map outputs for the Middle Harbour Northern Catchments area.
SS2013v02/GA4402 is a collection of marine underwater video and still images from Balls Pyramid. The dataset is provided by the Australian Ocean Data Network and was last updated on 2026-06-17.
Australian Ocean Data Network provides a tool for calculating distances to marine features. The AMSIS Distance To tool outputs the distance from a given location to the nearest selected marine feature. The dataset was last updated on 2026-06-17.
A collection of one-to-one semantic matches between harmful and harmless prompts, created by aligning prompts from the mlabonne/harmful_behaviors and mlabonne/harmless_alpaca source datasets. The dataset was created by the organization heretic-org and was last updated on 2026-06-16.
User profiles for projects registered with the Rural Development Agency for the 2020 fiscal period. The data is used for review when structuring comprehensive agricultural and rural development projects. It originates from the Colombian open data portal, datos.gov.co, and was last updated on 2026-05-18.
Yotoco's 2020 Annual Acquisition Plan details planned government purchases, including estimated values and contract details. The dataset includes columns for estimated contract duration, selection modality, UNSPSC codes, and funding sources. Data is provided by the Colombian open data portal, www.datos.gov.co, and was last updated in May 2026.
Wikipedia PT Categories is a Portuguese clustering evaluation dataset containing 2,873 articles from pt.wikipedia.org, each labeled with one of 15 broad topic categories. The dataset was created by tardellirs and serves as the source for the WikipediaPTCategoriesClusteringP2P task in the MTEB(por) benchmark. It was last updated on 2026-06-08.
Colombia's ICFES institute maintains this registry of its information assets available to the public. The dataset lists categories of information, their formats, availability, and physical or digital locations. It was last updated on 2026-05-18.
Data from 2022 for the Lac-Saint-Jean and Saint-Maurice River sectors delineate flooded areas that extended beyond the established cartographic flood zones. The farthest water limits reached during flooding events were mapped by photogrammetric capture from aerial photographs. The dataset supports the Plan for the protection of the territory against floods (PPTFI).
Over 1,300 convents and monasteries in the geographical area affected by the German Peasants' War (1524-1526) are listed with coordinates and information on the war's effects. The dataset was provided by the 'Visualising the Destruction of Convents and Monasteries in the German Peasants' War' project team at Oxford and Royal Holloway. It is available for download in XLSX format.