General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
159,516 datasets
35 million scanned archival documents from the Dutch National Archives, available as open data. The collection spans from medieval monastery records to archives of the Dutch East and West India Companies and documents on the decolonization of Indonesia (the former Dutch East Indies). The material is provided by the Ministerie van Binnenlandse Zaken en Koninkrijksrelaties and is accessible via an OAI-PMH API.
Geospatial data from the Digital Atlas of Colombian Coral Reefs details the location and classification of coral areas identified up to 2020. It includes columns for biotic, geomorphic, and ecological units, as well as sector and zone information. The data is provided by www.datos.gov.co and was last updated on 2026-05-18.
Geospatial points for mountains and elevations within the Valle del Cauca region of Colombia. The dataset includes columns for Cordillera, Montaña, Altitud_(msnm), and precise location via Longitud and Latitud. It was published by www.datos.gov.co and last updated on 2026-05-18.
The Australian Ocean Data Network provides a geospatial map resource detailing the locations of Australia's major ports. The data is available through WFS and WMS services and as a PNG image. It was last updated on June 4, 2026.
Santa Marta District Institute for Recreation and Sports (INRED) maintains this registry of its information assets. The dataset includes columns for asset name, type, description, confidentiality classification, legal basis, and responsible owner. It was last updated on 2026-05-18 18:59:42 and is published by www.datos.gov.co.
NYPD Officer Profile - Department Recognition tracks awards bestowed upon uniformed members of the New York City Police Department. The dataset includes the award type and the date it was given, linked to individual officer profiles. It is listed on multiple government data platforms.
Piedecuesta, Colombia's municipal public services company maintains this inventory of trees under its care. The dataset includes columns for tree condition, scientific and common names, family, neighborhood location, and categorization. It was last updated on 2026-05-18 and is available via the Colombian open data portal.
Anonymized environmental complaints data from the Regional Autonomous Corporation of Cundinamarca (CAR), covering potential impacts on natural resources since January 1, 2009. The dataset includes columns for complaint type, municipality, environmental media affected, response status, and dates. It is published by www.datos.gov.co and was last updated in May 2026.
Datos.gov.co provides epidemiological and demographic data on vital events and morbidity causes in Colombia's Valle del Cauca department. Records are structured by life stage, sex, municipality, year, diagnostic group, and ICD-10 code. The dataset was last updated on 2026-05-18.
47,140 Sinhala text pairs for training spelling correction models, split into 37,712 training and 9,428 test samples. The dataset, created by SPEAK-PP, contains dyslexic/noisy sentences paired with their clean, corrected versions. It was last updated on June 8, 2026.
Active Bingo Lessors with identifying information is a government dataset from data.texas.gov. It tracks business licenses for bingo lessors, including their status, administrative holds, and contact details. The dataset was last updated on 2026-05-25.
A government white paper published by Japan's Ministry of the Environment. The document likely contains policy analysis, statistics, and progress reports on environmental protection, the circular economy, and biodiversity conservation. It is authored by the Ministry's General Policy Division, Environmental Planning Office.
Anticipatory notice for fibre-to-the-premises (FTTP) network work in South Melbourne, VIC 3205. The notice was given by DGTek Pty Ltd on 19 May 2026, with a contract date of 23 April 2026 and an expected completion date of 30 September 2026. It originates from the SIP Register on the data.gov.au platform and is available in PDF, ZIP MAPINFO, and Excel formats.
Turkey's first categorized LLM prompt injection dataset with a Turkish-first focus. It contains 300 Turkish prompt injection payloads, prepared manually and with generation assistance, mapped to 12 attack categories and the OWASP LLM Top 10 (2025). The dataset was created by AltaySec and last updated on June 13, 2026.
1,000 multiple-choice benchmark items with first-order ambisonics audio, released in 2026. The items form the evaluation set for the paper 'The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models'. The dataset was authored by KonoyoBC and is hosted on Hugging Face.
Investigaciones Administrativas Iniciadas por Dirección Territorial is a dataset from www.datos.gov.co tracking the initiation of labor law administrative investigations. It likely contains counts of investigations started by different territorial directorates. The dataset was last updated on 2026-05-18.
Georeferenced records of rural community nuclei strengthened through environmental culture processes in 2019. The data lists the municipality and village for each location, along with X and Y coordinates. It was published by www.datos.gov.co and last updated on May 18, 2026.
Consumption data for the sewerage service provided by the Municipal Company of Aqueduct, Sewerage and Cleaning of Funza EMAAF ESP. The dataset is classified by month, use, and socioeconomic stratum for the municipality of Funza in Cundinamarca. It was last updated on 2026-05-18 17:09:11 and is available via the www.datos.gov.co platform.
Registro de Activos de Información is a public information inventory from the Institute for the Development of Antioquia (IDEA), created to comply with Colombia's Transparency Law 1712. The dataset catalogs information assets, detailing their format, language, and publication status. It was last updated on 2026-05-18 18:50:21 and is available via the Colombian open data portal www.datos.gov.co.
Colombia's www.datos.gov.co platform hosts a catalog of published government information, structured under Law 1712 of 2014. The dataset likely contains metadata records describing information assets, including their titles, responsible agencies, formats, and update schedules. The catalog was last updated on 2026-05-18.