General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
162,337 datasets
High capacity wells data from Prince Edward Island, Canada. The dataset is published by the Government of Prince Edward Island on the open_canada platform. It was last updated on 2026-06-10.
BMRS is a dataset of Bongard–Maximov problems for remote sensing, published on Preprints.org in June 2026. The dataset is authored by Nikita Firsov, Olga Terekhova, and colleagues. It was last updated on the Hugging Face platform on 2026-06-22.
Daily-updated dataset of arXiv papers from AI/ML and adjacent categories, enriched with LLM-derived signals. It includes a 0–100 importance score, topical/lab tags, a one-line takeaway, and dense full-page summaries for a selected subset. The dataset is published by author taesiri and was last updated on 2026-06-17.
curt is a machine-first programming language designed for AI agents with a focus on output-token cost. This dataset contains the complete evaluation record for language version 0.2, including benchmark suites, model-generated programs, and reference materials. The dataset was created by therikkening and was last updated on June 12, 2026.
Historical information on Colombian beneficiaries of international scholarship calls from 2018 to 2024, grouped by demographic and program variables. The data is provided by www.datos.gov.co and was last updated on 2026-05-26. Columns include MODALIDAD, GÉNERO, PAÍS DE DESTINO, and ESTRATO SOCIOECONOMICO DE RESIDENCIA.
Submissions and evaluation results for the CADGenBench leaderboard. The dataset contains one row per submitted and evaluated entry, as read by the leaderboard table. It was created by HuggingAI4Engineering and last updated on June 10, 2026.
GNS3 file exports were created as part of a master's thesis at NTNU. The files can be downloaded and imported into GNS3 to extract and run the network topology used in the thesis titled 'IPsec tunnels between end user devices behind NAT'. The dataset was authored by Sindre Revheim Svellingen and last updated on 2026-06-21.
35 million scanned archival documents from the Dutch National Archives, available as open data. The collection spans from medieval monastery records to archives of the Dutch East and West India Companies and documents on the decolonization of the Dutch East Indies (present-day Indonesia). The material is provided by the Ministerie van Binnenlandse Zaken en Koninkrijksrelaties and is accessible via an OAI-PMH API.
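For readers unfamiliar with OAI-PMH, the following is a minimal sketch of how a harvesting client typically pages through such an API. The base URL `https://example.org/oai` and the sample response are placeholders for illustration, not the archive's actual endpoint or data; only the `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH 2.0 protocol itself. Whether this particular archive supports the `oai_dc` prefix is an assumption.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix must not be sent alongside it.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [el.text for el in root.iter(f"{OAI_NS}identifier")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    # An empty or missing token means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response page, hand-written for this example.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, next_token = parse_page(SAMPLE_PAGE)
first_url = build_list_records_url("https://example.org/oai")
next_url = build_list_records_url("https://example.org/oai", resumption_token=next_token)
```

A real harvester would fetch `first_url`, parse the response, and keep requesting with each returned token until none remains.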
Geospatial data from the Digital Atlas of Colombian Coral Reefs details the location and classification of coral areas identified up to 2020. It includes columns for biotic, geomorphic, and ecological units, as well as sector and zone information. The data is provided by www.datos.gov.co and was last updated on 2026-05-18.
Geospatial points for mountains and elevations within the Valle del Cauca region of Colombia. The dataset includes columns for Cordillera, Montaña, Altitud_(msnm), and precise location via Longitud and Latitud. It was published by www.datos.gov.co and last updated on 2026-05-18.
Australian Ocean Data Network provides a geospatial map resource detailing the locations of Australia's major ports. The data is served via WFS, WMS, and PNG formats, offering flexibility for different mapping and analysis needs. It was last updated on June 4, 2026.
Santa Marta District Institute for Recreation and Sports (INRED) maintains this registry of its information assets. The dataset includes columns for asset name, type, description, confidentiality classification, legal basis, and responsible owner. It was last updated on 2026-05-18 18:59:42 and is published by www.datos.gov.co.
NYPD Officer Profile - Department Recognition tracks awards bestowed upon uniformed members of the New York City Police Department. The dataset includes the award type and the date it was given, linked to individual officer profiles. Its presence on multiple government data platforms indicates its use for official transparency and public accountability.
The municipal public services company of Piedecuesta, Colombia, maintains this inventory of trees under its care. The dataset includes columns for tree condition, scientific and common names, family, neighborhood location, and categorization. It was last updated on 2026-05-18 and is available via the Colombian open data portal.
Anonymized environmental complaints data from the Regional Autonomous Corporation of Cundinamarca (CAR), covering potential impacts on natural resources since January 1, 2009. The dataset includes columns for complaint type, municipality, environmental media affected, response status, and dates. It is published by www.datos.gov.co and was last updated in May 2026.
Datos.gov.co provides epidemiological and demographic data on vital events and morbidity causes in Colombia's Valle del Cauca department. Records are structured by life stage, sex, municipality, year, diagnostic group, and ICD-10 code. The dataset was last updated on 2026-05-18.
47,140 Sinhala text pairs for training spelling correction models, split into 37,712 training and 9,428 test samples. The dataset, created by SPEAK-PP, contains dyslexic/noisy sentences paired with their clean, corrected versions. It was last updated on June 8, 2026.
Active Bingo Lessors with identifying information is a government dataset from data.texas.gov. It tracks business licenses for bingo lessors, including their status, administrative holds, and contact details. The dataset was last updated on 2026-05-25.
A government white paper published by Japan's Ministry of the Environment. The document likely contains policy analysis, statistics, and progress reports on environmental protection, the circular economy, and biodiversity conservation. It is authored by the Ministry's General Policy Division, Environmental Planning Office.
Anticipatory notice of FTTP network work in South Melbourne, VIC 3205. DGTek Pty Ltd gave the notice on 19 May 2026, with a contract date of 23 April 2026 and an expected completion date of 30 September 2026. It originates from the SIP Register on the data.gov.au platform and is available in PDF, ZIP MAPINFO, and Excel formats.