General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
160,952 datasets
Anonymized records of students benefiting from academic support programs aimed at bridging the gap between secondary and higher education in Colombia. The dataset includes geographic, demographic, and program participation details for each student. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
Fusagasugá municipality in Colombia monitors the behavior of its water sources, including surface and underground streams. The dataset includes columns for location coordinates (Este, Norte), source names, activity types, and measurement dates. It is hosted by the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
Data on indigenous communities in the Corantioquia jurisdiction in Colombia that participate in environmental culture processes. The dataset includes information on location, indigenous community, reservation, ethnicity, legal acts or resolutions, and titled area in hectares. It was published on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
A multi-domain reasoning dataset built to improve frontier models by revealing their failures and turning expert grading into training signal. The dataset pairs self-contained tasks with weighted rubrics across three domains — Computer Science, Data Science, and Chemistry. It was created by TuringEnterprises and last updated on 2026-06-16.
Historical data on teachers by academic level in the public sector for the urban and rural zones of the municipality of Sabaneta. The dataset includes columns for Sector, Year, Zone, Quantity, and Academic Level. It is hosted by www.datos.gov.co and was last updated on 2026-05-18.
Tiny-Ko-Stories is a dataset of 2,003,542 original Korean short stories, created by psymon and last updated on June 13, 2026. Inspired by the English TinyStories dataset, it was generated from scratch in Korean to test whether small models can demonstrate reasoning and creativity with limited, high-quality data. The dataset includes Korean-specific elements such as native names, sentence rhythm, onomatopoeia, and small event structures.
Weekly updated registry of corporations and limited liability companies in Oregon designated as benefit companies. The dataset includes business names, official registry numbers, entity types, and the dates of their benefit designation. Columns suggest it provides details on companies that have committed to creating a public benefit alongside profit.
Colombian national and regional data on the educational level of individuals who entered the reintegration process, as of a specific cut-off date. The dataset is published by datos.gov.co and was last updated on 2026-05-18. It includes columns for municipality, department, process status, and educational level.
Xin-Rui released the ImagineTime benchmark in 2026 to evaluate image generation models. It contains 750 benchmark cases designed to test a model's ability to produce ordered 2x2 motion sheets with coherent entities and state transitions. The dataset was published with the paper 'Can Image Models Imagine Time?' and is hosted on Hugging Face.
Student enrollment data from the Digital University Institution of Antioquia for the 2024-02 academic period. The dataset includes 11 columns covering biological sex, place of birth, academic level, program, semester, and course load. It was published on the Socrata platform via datos.gov.co and last updated on May 18, 2026.
Records of career administrative and provisional officials who currently work in the secretariats and offices of the municipal administration. The dataset includes columns for employee name, department, hire date, salary assignment, and job title. It was published on the Colombian open data portal, datos.gov.co, and was last updated on 2026-05-18.
The University of Cauca maintains a directory of its current units and departments. The dataset includes names, descriptions, contact information, and geographic coordinates for each entry. It was last updated on May 18, 2026, and is hosted by the Colombian open data portal www.datos.gov.co.
A machine learning framework achieved test R² values from 0.942 to 0.963 for predicting missing Sonic logs and 0.927 to 0.930 for Gamma Ray logs. This dataset from the University of Kansas supports a study on using K-Nearest Neighbors regression to address gaps in geophysical well data. The workflow involves correlation-guided feature selection and min–max normalization on data from five wells.
Source data files for the manuscript "Gating Crosstalk in Potassium Channels". The 3.7 GB ZIP archive contains structures and parameters for molecular dynamics simulations, including input TPR files, initial and final structures, and MDP parameter files for each parallel simulation. The dataset was authored by GU and last updated on May 26, 2026.
Student enrollment records for the University of Valle disaggregated by campus, faculty, and academic program per semester. The dataset covers undergraduate and graduate students from 2000 to 2022. It is hosted by the Colombian open data portal www.datos.gov.co.
Pacific Ocean and Monterey Bay data from three drifting buoys deployed in fall 1992. Two buoys were air-launched near 140°W longitude, and one was attached to a fixed mooring in Monterey Bay, which was later recovered for post-calibration. The dataset likely contains time-series measurements of oceanographic properties.
A coded data collection sheet for archiving admissions and statistics in Ear, Nose, and Throat (ENT) medicine. The dataset includes a DOCX file and an SPSS program for data extraction from multiple files, authored by Hany Amin Riad and last updated on May 23, 2026. It is shared under a CC-BY-4.0 license on figshare.
2009-2016 data from the GG-HDSS site in Southwest Ethiopia shows the distribution of causes of death across different age groups. The dataset was authored by Desalegn Shiferaw and is available under a CC-BY-4.0 license. It is a small dataset, stored in a 5.5 KB XLS file.
A 21.5 MB audio file in WAV format, uploaded to figshare by Marie-Annick Moreau and last updated on June 3, 2026. The recording captures a discussion in which KST checks with women his understanding of the moral of the 'Nani wewe' song. The dataset is licensed under CC-BY-NC-SA-4.0.
A 63.2 KB PDF file containing documentation or a transcription of the 'Majungu' song. The description indicates the song was performed by women while working, specifically while sweeping, hauling a net, and circling a pond. The dataset was authored by Marie-Annick Moreau and last updated on June 3, 2026.