General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
157,962 datasets
Postulados según situación penitenciaria contains data on demobilized individuals processed under Colombia's Law 975 of 2005 and its modification, Law 1592 of 2010. The dataset is managed by the Transitional Justice Directorate of the Ministry of Justice and Law and is organized by place of demobilization. It was last updated on 2026-05-18.
Event records of incidents that can modify the health situation of a community, including disease, risk factors, and other determinants. The dataset is hosted by the Colombian open data portal www.datos.gov.co and was last updated on May 18, 2026. Columns suggest it includes demographic details like nationality, sex, and age for each recorded event.
Results from experiments (S17–S24) evaluating machine learning models for intrusion detection under domain shift. The 5.5 KB dataset, created by Dung Ha Thanh, was last updated in April 2026. It contains performance metrics from the TAN-IDS evaluation framework.
Cross-dataset evaluation results (S9–S16) from the TAN-IDS framework, a method for assessing NetFlow-based intrusion detection models. The dataset, created by Dung Ha Thanh and shared on figshare in April 2026, contains performance metrics from experiments testing model robustness across different network environments. It is a small dataset at 5.5 KB, stored in an XLS file.
Evaluation results for machine learning models across eight distinct scenarios (S1–S8) assessing robustness to domain shift in network intrusion detection. The data originates from the TAN-IDS evaluation framework research by Dung Ha Thanh, published on figshare in April 2026. This 5.5 KB XLS file contains comparative performance metrics.
A public transport route dataset for Dosquebradas, Colombia, last updated on 2026-05-18 16:38:24. It describes the 'Ruta 17 - Molivento' bus line, listing key stops for both inbound and outbound journeys. The data is hosted by the Colombian open data portal www.datos.gov.co on the Socrata platform.
NASA's Parker Solar Probe SWEAP instrument provides Level 3 measurements of electron pitch angle distributions in the solar wind. The data is governed by specific 'Rules of the Road' requiring user collaboration with the principal investigator for scientific publication. This dataset is part of the mission's effort to study the Sun's corona and solar wind acceleration.
NASA's Parker Solar Probe SWEAP instrument provides Level 3 measurements of Electron Pitch Angle Distributions from the SPAN-B sensor. The data is structured in 13.981-second intervals and is governed by specific collaboration rules for scientific use. The dataset was last updated on March 13, 2026.
NASA's Parker Solar Probe mission provides electron pitch angle distribution data from the SPAN-Electron instrument. The dataset is part of the Solar Wind Electrons Alphas and Protons (SWEAP) instrument suite and is governed by specific collaboration rules. Data files are named with a versioned format and were last updated on 2026-03-13.
Measurements of gases from mobile sources circulating in the jurisdiction of Corantioquia, Colombia. Data was obtained from roadside operations and companies across different municipalities. The dataset includes columns for subtotals by fuel type, municipality, year, and total rejected results.
Barrios y Veredas del Municipio de Villavicencio contains information on the neighborhoods and rural districts of Villavicencio Municipality. The data was updated as of April 30 and provided by the Dirección de Ordenamiento Territorial - DOT. It is available via the www.datos.gov.co platform in CSV, JSON, XML, and RDF formats.
Anonymized records of students benefiting from academic support programs aimed at bridging the gap between secondary and higher education in Colombia. The dataset includes geographic, demographic, and program participation details for each student. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
Fusagasugá municipality in Colombia monitors the behavior of its water sources, including surface and underground streams. The dataset includes columns for location coordinates (Este, Norte), source names, activity types, and measurement dates. It is hosted by the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
Data on indigenous communities participating in environmental culture processes within the jurisdiction of Corantioquia, Colombia. The dataset includes information on location, indigenous community, reservation, ethnicity, legal acts or resolutions, and titled area in hectares. It was published on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
A multi-domain reasoning dataset built to improve frontier models by revealing their failures and turning expert grading into training signal. The dataset pairs self-contained tasks with weighted rubrics across three domains — Computer Science, Data Science, and Chemistry. It was created by TuringEnterprises and last updated on 2026-06-16.
Historical data on teachers by academic level in the public sector for the urban and rural zones of the municipality of Sabaneta. The dataset includes columns for Sector, Year, Zone, Quantity, and Academic Level. It is hosted by www.datos.gov.co and was last updated on 2026-05-18.
Tiny-Ko-Stories is a dataset of 2,003,542 original Korean short stories, created by author psymon and last updated on June 13, 2026. Inspired by the English TinyStories dataset, it was generated from scratch in Korean to test if small models can demonstrate reasoning and creativity with limited, high-quality data. The dataset includes Korean-specific elements like native names, sentence rhythm, onomatopoeia, and small event structures.
Weekly updated registry of corporations and limited liability companies in Oregon designated as benefit companies. The dataset includes business names, official registry numbers, entity types, and the dates of their benefit designation. Columns suggest it provides details on companies that have committed to creating a public benefit alongside profit.
Colombian national and regional data on the educational level of individuals who entered the reintegration process, as of a specific cut-off date. The dataset is published by datos.gov.co and was last updated on 2026-05-18. It includes columns for municipality, department, process status, and educational level.
Xin-Rui released the ImagineTime benchmark in 2026 to evaluate image generation models. It contains 750 benchmark cases designed to test a model's ability to produce ordered 2x2 motion sheets with coherent entities and state transitions. The dataset was published with the paper 'Can Image Models Imagine Time?' and is hosted on Hugging Face.