General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
157,579 datasets
This database contains all conflicts of competence submitted to the Constitutional Court of Colombia from 2006 to April 2026. It is provided by the platform www.datos.gov.co and was last updated on May 4, 2026. It includes columns for case file number, subject matter, date, and case type.
United Nations Security Council decisions from 1999 onward containing keywords related to the Protection of Civilians. The Security Council Affairs Division created this dashboard as an information resource for the Repertoire of the Practice of the Security Council. The data was last updated on 2026-05-20.
Information on the continuing-education academic programs offered by Aerocivil's Center for Aeronautical Studies. The dataset includes columns for activity name, cost, modality, duration, target audience, objectives, and study plan. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated on May 18, 2026.
A collection of video files with action annotations documenting the initial stage of basket trap construction. The dataset, created by Marie-Annick Moreau, includes footage covering the steps from carving the sticks to tying them onto the top ring. It was last updated on June 3, 2026, and is shared under a CC-BY-NC-SA 4.0 license.
Infrastructure Australia created this geospatial dataset for the 2019 Australian Infrastructure Audit. It represents average weekday transport crowding performance during the PM peak period from 4pm to 6pm in 2016. The data models strategic transport conditions, excluding network links below daily volume thresholds.
Semantic Harmless contains one-to-one semantic matches between prompts from two source datasets. The dataset aligns prompts that are semantically closest, where one prompt is harmful and the other is harmless, creating a more controlled comparison. It was created by heretic-org and was last updated on Hugging Face in June 2026.
A list of the top 20 pain products sold by a retailer, which collectively accounted for 53% of total menstrual product sales. The data covers sales between 30th April 2006 and 16th April 2015. It was authored by Victoria Sivill and published on figshare under a CC-BY-4.0 license.
Standard error of estimate (σ_est) for the DAMM model's predictions of fecal short-chain fatty acid chemical oxygen demand. The 5.5 KB XLS file, authored by Taylor L. Davis and last updated in May 2026, quantifies prediction error relative to the identity line, on which predictions would exactly equal measurements.
Matthew N. Ponticiello's dataset records changes in interest, perceived difficulty in accessing, and perceived importance of initiating medications for opioid use disorder before and after a brief intervention. The data covers 117 participants on probation with opioid use disorder. It was last updated on 2026-05-27 and is shared under a CC-BY-4.0 license.
A dataset supporting a machine learning model for engineering porous biochar for CO2 adsorption. The gradient boosting regression model uses biomass composition, pyrolysis, activation, and adsorption conditions as inputs, achieving an R² of 0.99 and RMSE of 0.15. The dataset, created by Chengkai Cao and last updated in May 2026, is provided in an XLSX file.
SkillTrustBench Results stores public leaderboard records for an AI safety benchmark. The dataset tracks two comparison groups: one fixing a model and comparing tools, and another fixing an analysis tool and comparing models. Raw system outputs are normalized into safety categories of normal (safe), suspicious, or malicious.
In a single-center trial, 472 pregnant women undergoing labor induction were randomized to receive 2 g of oral azithromycin (n=236) or no treatment (n=236). The dataset contains primary and secondary outcomes measuring perinatal infection rates, maternal and neonatal complications, delivery mode, and safety parameters. It was authored by huimin Cao and last updated on 2026-05-24.
An index of information from the Colombian National Unit for Disaster Risk Management (UNGRD) that is restricted by law or regulated for specific classes of persons, identified as classified or reserved. The dataset includes 13 columns detailing the legal basis, responsible parties, and duration of classification. It was published on datos.gov.co and last updated on 2026-05-18.
The Canadian Environmental Sustainability Indicators program provides data tracking terrestrial snow cover over Canada. Indicators include spring snow cover extent, annual snow cover duration, and March snow water equivalent, with data presented in maps, charts, and CSV tables. The dataset is produced by Environment and Climate Change Canada and was last updated on 2026-04-23.
Prox-E ShapeTalk Benchmark is a subset of 600 random samples from the ShapeTalk dataset, curated for evaluating 3D shape editing models. It is the official benchmark for the SIGGRAPH'26 paper 'Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions'. The dataset was created by author 'haopt' and last updated on May 30, 2026.
Produced during the GRIP Field Experiment, this dataset contains satellite-derived overshooting top magnitudes for tropical storms and hurricanes. It was created by NASA for use with the Real Time Mission Monitor tool to study storm formation and intensification. The data is visualized as color-coded overlays in Google Earth.
San José de Cúcuta municipality provides data on free public Wi-Fi zone usage from 2021 to 2022. The dataset likely contains session counts and user demographics, including gender, age, device type, and operating system. It originates from the Colombian open data portal www.datos.gov.co and was last updated in May 2026.
This data release covers Northern California and Nevada. It contains raw timeseries and metadata for 827 minidisk infiltrometer measurements conducted across nine burned areas and nearby unburned areas. Scott McCoy authored the dataset, which covers measurements from 2018 to 2023.
Monthly gridded climatologies of total lightning flash rates for 1995 to 2014, derived from two satellite-based sensors: the Optical Transient Detector (OTD) and the Lightning Imaging Sensor (LIS). The dataset provides a merged, long-term record, with robust tropical and subtropical coverage from LIS and high-latitude data from OTD. It is produced by the National Aeronautics and Space Administration and is available in formats including BIN, ISO, HTML, and PDF.
ARTPARK-IISc's Vaani Benchmark V1.0 is a curated Hindi automatic speech recognition (ASR) evaluation set. It contains 5,343 audio segments from 1,103 speakers across 104 Indian districts, totaling approximately 11.7 hours. Each audio segment includes three independent human transcriptions.