General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
166,400 datasets
Spatial layers from Ku-Ring-Gai Council detail flood characteristics for design floods ranging from a 20% Annual Exceedance Probability to the Probable Maximum Flood. The dataset is hosted on data.gov.au and was last updated in May 2026. It provides flood map outputs for the Middle Harbour Northern Catchments area.
SS2013v02/GA4402 is a collection of marine underwater video and still images from Balls Pyramid. The dataset is provided by the Australian Ocean Data Network and was last updated on 2026-06-17.
The Australian Ocean Data Network provides a tool for calculating distances to marine features. The AMSIS Distance To tool outputs the distance from a given location to the nearest selected marine feature. The dataset was last updated on 2026-06-17.
A collection of one-to-one semantic matches between harmful and harmless prompts, created by aligning prompts from the mlabonne/harmful_behaviors and mlabonne/harmless_alpaca source datasets. The dataset was created by the organization heretic-org and was last updated on 2026-06-16.
User profiles for projects registered with the Rural Development Agency for the 2020 fiscal period. The data is used for review in constructing comprehensive agricultural and rural development projects. It originates from the Colombian open data portal, datos.gov.co, and was last updated on 2026-05-18.
Reiwa 7 Fiscal Year Annual Report published by the Securities and Exchange Surveillance Commission. The document is a PDF file released on the japan_data platform and last updated on June 29, 2026. The author is the Securities and Exchange Surveillance Commission, an organization under the Financial Services Agency.
Yotoco's 2020 Annual Acquisition Plan details planned government purchases, including estimated values and contract details. The dataset includes columns for estimated contract duration, selection modality, UNSPSC codes, and funding sources. Data is provided by the Colombian open data portal, www.datos.gov.co, and was last updated in May 2026.
Wikipedia PT Categories is a Portuguese clustering evaluation dataset containing 2,873 articles from pt.wikipedia.org, each labeled with one of 15 broad topic categories. The dataset was created by tardellirs and serves as the source for the WikipediaPTCategoriesClusteringP2P task in the MTEB(por) benchmark. It was last updated on 2026-06-08.
Colombia's ICFES institute maintains this registry of its information assets available to the public. The dataset lists categories of information, their formats, availability, and physical or digital locations. It was last updated on 2026-05-18.
Polygon features depict the surficial geology of the Willow River area within NTS map area 83O Northeast. The Government of Alberta created the data in ArcInfo format, distributing it as Arc export and shapefiles. The metadata was last updated in March 2026.
Data from 2022 for the Lac-Saint-Jean and Saint-Maurice River sectors delineate flooded areas that exceeded the established cartographic flood zones. Photogrammetric capture from aerial photographs was used to map the farthest water limits reached during flooding events. The dataset supports the Plan for the protection of the territory against floods (PPTFI).
Over 1,300 convents and monasteries in the geographical area affected by the German Peasants' War (1524-1526) are listed with coordinates and information on the war's effects. The dataset was provided by the 'Visualising the Destruction of Convents and Monasteries in the German Peasants' War' project team at Oxford and Royal Holloway. It is available for download in XLSX format.
The STRABLE benchmark is a suite for evaluating machine learning models on tabular data containing strings, addressing a previously understudied setting. It was created by inria-soda and is hosted on Hugging Face. The dataset was last updated on June 11, 2026.
This database contains all competency conflicts presented to the Constitutional Court of Colombia from 2006 to April 2026. The data was last updated on May 4, 2026, and is provided by the platform www.datos.gov.co. It includes columns for case file number, subject matter, date, and case type.
United Nations Security Council decisions from 1999 onward containing keywords related to the Protection of Civilians. The Security Council Affairs Division created this dashboard as an information resource for the Repertoire of the Practice of the Security Council. The data was last updated on 2026-05-20.
A live snapshot of the London Datastore catalogue, showing every dataset and resource entry. Files include all metadata for each entry and are continually updated. The catalogue is maintained by the Greater London Authority.
The dataset lists employers participating in the London Living Wage Programme. It is provided by the Greater London Authority and was last updated on 2026-06-24. The London Living Wage is an independently calculated hourly rate, currently £10.55, designed to reflect the high cost of living in the capital.
Origin and destination data for London public transport journeys, segmented by time of day and day of the week. The dataset is part of the Greater London Authority's Night Time Observatory. The record was last updated on 2026-06-24 21:06:38.498055.
Numbers and proportions of London's night time workers broken down by ethnic group, country of birth, age, and sex. This dataset is part of the Greater London Authority's Night Time Observatory. The metadata indicates a last update timestamp of 2026-06-24 21:06:36.756508.
This dataset tracks the number of assault-related incidents attended by the London Ambulance Service, broken down by night (6pm-6am) and day (6am-6pm) periods. It includes counts and proportions from the 2007/08 to 2017/18 financial years and provides borough-level data for selected years. The dataset is part of the Greater London Authority's Night Time Observatory.