General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
161,680 datasets
Datos.gov.co provides epidemiological and demographic data on vital events and morbidity causes in Colombia's Valle del Cauca department. Records are structured by life stage, sex, municipality, year, diagnostic group, and ICD-10 code. The dataset was last updated on 2026-05-18.
47,140 Sinhala text pairs for training spelling correction models, split into 37,712 training and 9,428 test samples. The dataset, created by SPEAK-PP, contains dyslexic/noisy sentences paired with their clean, corrected versions. It was last updated on June 8, 2026.
Active Bingo Lessors with identifying information is a government dataset from data.texas.gov. It tracks business licenses for bingo lessors, including their status, administrative holds, and contact details. The dataset was last updated on 2026-05-25.
A government white paper published by Japan's Ministry of the Environment. The document likely contains policy analysis, statistics, and progress reports on environmental protection, the circular economy, and biodiversity conservation. It is authored by the Ministry's General Policy Division, Environmental Planning Office.
An anticipatory notice of FTTP network work in South Melbourne, VIC 3205. DGTek Pty Ltd gave the notice on 19 May 2026, with a contract date of 23 April 2026 and an expected completion date of 30 September 2026. It originates from the SIP Register on the data.gov.au platform and is available in PDF, ZIP MAPINFO, and Excel formats.
Turkey's first Turkish-priority, categorized LLM prompt injection dataset. It contains 300 Turkish prompt injection payloads, prepared manually and with generation assistance, and mapped to 12 attack categories and the OWASP LLM Top 10 (2025). The dataset was created by AltaySec and last updated on June 13, 2026.
1,000 multiple-choice benchmark items with first-order ambisonics audio, released in 2026. It corresponds to the evaluation set for the paper 'The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models'. The dataset was authored by KonoyoBC and is hosted on Hugging Face.
Investigaciones Administrativas Iniciadas por Dirección Territorial is a dataset from www.datos.gov.co tracking the initiation of labor law administrative investigations. It likely contains counts of investigations started by different territorial directorates. The dataset was last updated on 2026-05-18.
Georeferenced records of rural community nuclei strengthened through environmental culture processes in 2019. The data lists the municipality and village for each location, along with X and Y coordinates. It was published by www.datos.gov.co and last updated on May 18, 2026.
Consumption data for the sewerage service provided by the Municipal Water, Sewerage, and Sanitation Company of Funza (EMAAF ESP). The dataset is classified by month, usage type, and socioeconomic stratum for the municipality of Funza in Cundinamarca. It was last updated on 2026-05-18 17:09:11 and is available via the www.datos.gov.co platform.
Registro de Activos de Información is a public information inventory from the Institute for the Development of Antioquia (IDEA), created to comply with Colombia's Transparency Law 1712. The dataset catalogs information assets, detailing their format, language, and publication status. It was last updated on 2026-05-18 18:50:21 and is available via the Colombian open data portal www.datos.gov.co.
Colombia's www.datos.gov.co platform hosts a catalog of published government information, structured under Law 1712 of 2014. The dataset likely contains metadata records describing information assets, including their titles, responsible agencies, formats, and update schedules. The catalog was last updated on 2026-05-18.
Chong-Wei Li deposited tabular datasets and R scripts for reproducing analyses and figures from the study 'Divergence among species with “good competitor” and “good cultivator” strategies promotes asymmetric facilitation among co-invaders'. The materials are organized by figure and include four main-text and three supplementary figure folders. The archive is 7.6 MB and was last updated on 2026-05-11.
Georeferenced locations of solid waste disposal instruments within the municipality of Bucaramanga. The dataset is categorized by material, collection ease, model, and condition for the year 2023. It originates from the open data portal datos.gov.co and was last updated on 2026-05-18.
Monthly totals of Penalty Charge Notices issued for a bus stop clearway on London Road, including income value. The data is provided by Leicester City Council and covers a period starting from September 2017. The dataset was last updated on 2026-06-17.
An index of information assets classified as reserved or confidential by the University of Cundinamarca, published in compliance with Colombian transparency law 1712 of 2014. The dataset includes 19 columns detailing asset names, responsible units, and security classifications. It was last updated on May 18, 2026, and is available via the Colombian open data portal.
Water consumption records from the Municipal Water, Sewerage, and Sanitation Company of Funza (EMAAF ESP), classified by month, usage type, and socioeconomic stratum. The dataset is available in multiple formats including CSV, JSON, XML, and RDF. It was last updated on May 18, 2026, and is hosted on the Colombian open data portal www.datos.gov.co.
A flood study report for the Middle Harbour Northern Catchments, authored by Ku-Ring-Gai Council and last updated on 2026-05-28. The study area includes Middle Harbour Creek and its tributaries in St Ives, and Rocky Creek, Stoney Creek, and High Ridge Creek in Gordon/East Killara, which flow into Middle Harbour.
800 episodes of robot manipulation data created using the LeRobot framework. The dataset contains 20,000 frames recorded at 15 frames per second. It is structured for training imitation learning models on a single push task.
Xarm Lift Medium Replay Image is a dataset of 800 episodes and 20,000 frames created using the LeRobot framework. The dataset likely contains image observations and teleoperation data for a robotic lifting task. It was last updated on June 8, 2026.