General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
167,986 datasets
Snow depth, snow water equivalent (SWE), snow wetness, and snow pit data collected from two pine sites and a small clearing at the Local Scale Observation Site (LSOS) in northern Colorado. The dataset was created by the National Aeronautics and Space Administration as part of the Cold Land Processes Field Experiment (CLPX). Data collection concluded in March 2003, though metadata records show a later administrative update.
Cover and Management (C-factor) data for New South Wales, provided monthly from 2021 to 2030. The dataset is published by the NSW Department of Climate Change, Energy, the Environment and Water and was last updated in May 2026. It is available under a Creative Commons Attribution 4.0 International license.
A registry of non-official educational establishments in the municipality of Soledad, Colombia, for the year 2020. The dataset includes 30 columns detailing administrative, operational, and demographic characteristics of each school. It is hosted on the Colombian open data platform www.datos.gov.co and was last updated in May 2026.
Supplementary data for a study on photochemical processes in shallow Antarctic blue ice and their effect on trapped greenhouse gases. The 26.4 KB Excel file, authored by Giyoon Lee and last updated in June 2026, is shared under a CC-BY-4.0 license. Its specific row and column structure is not detailed in the available metadata.
A dataset created on 2026-06-20 using the LeRobot platform. It contains robot action data, likely for training or testing machine learning models. The author is Tenry55, and the dataset was last updated on 2026-06-20 09:43:59.
The KORUS-OC expedition was a collaboration among scientists from the Korea Institute of Ocean Science and Technology (KIOST), NASA, and other institutions to study daily changes in the seas surrounding South Korea. It is a cross-platform dataset available in BIN and ISO file formats.
Synthetic and sanitized training and evaluation data for Jawbreaker, a local-first scam defense application. The dataset, created by build-small-hackathon, includes evaluation sets ranging from smoke checks to hard calibration suites. It was last updated on June 10, 2026.
Final reports published by Cessnock City Council on the data.gov.au platform. The dataset consists of HTML documents related to a project or initiative named 'Black Creek Stage 2'. It was last updated on 2026-06-14.
A 2023-2024 registry from Colombia's datos.gov.co platform, last updated in May 2026. It catalogs government information assets, detailing their classification status, legal basis, and custodianship. The dataset includes columns for data type, purpose, confidentiality level, responsible offices, and legal justification for classification.
One-hour data from the trapped radiation detector aboard Pioneer 10. The dataset was created by the National Aeronautics and Space Administration (NASA) and was last updated on the platform in April 2026. It contains measurements from the spacecraft's mission to Jupiter.
57 benchmark combinations measure the performance of five Bonsai large language models on an NVIDIA Jetson Orin Nano. The dataset includes results for 12 prompt and generation length configurations, with each combination tested over 20 requests. It was created by YuvrajSingh9886 and last updated on May 27, 2026.
OmniVideo-100K is an instruction-tuning dataset introduced in the paper 'OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains'. It contains 100,000 training samples split into 70,000 open-ended and 30,000 multiple-choice questions. The dataset was created by MiG-NJU and was last updated on the Hugging Face platform in June 2026.
Familias en Acción Phase IV beneficiary registry for the municipality of Tubará in the Atlántico department of Colombia for the year 2021. The dataset contains individual and family-level records, including columns for Documento, Primer Nombre, Segundo Nombre, Primer Apellido, and Segundo Apellido. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated in May 2026.
Citizen participation event records developed by the Financial Superintendency of Colombia. The dataset is updated periodically following approval and publication on the Superfinanciera website. Columns suggest details on event outcomes, activities, attendance, and feedback.
Land Use Progress Boundary Polygon (LUP_PRGS_POLY) describes the planning or project area for Land Use Plans (LUPs) that are in progress, as managed by the Bureau of Land Management. The dataset is provided by the Department of the Interior and was last updated on April 9, 2026. It is designed to ensure no more than one LUP is in progress in a given area, with polygons removed upon plan activation.
Land Use Planning Current Polygon describes the planning or project area for the Land Use Planning Current dataset. The dataset contains active land use plans, typically Resource Management Plans and amendments with a signed Record of Decision, and extends to adjacent Plan Area Boundaries with no gaps or overlaps. It is provided by the Department of the Interior's Bureau of Land Management and was last updated in April 2026.
20 GiB JSONL checkpoints for training code completion and fill-in-the-middle models. The dataset is structured in upload-ready units, with a long-term target of 400 GiB total. It is authored by aisamdasu and was last updated on June 11, 2026.
Monthly estimates of hillslope cover erosion rates across New South Wales from 2011 to 2020, measured in tonnes per hectare per month. The dataset is published by the NSW Department of Climate Change, Energy, the Environment and Water. It provides a decade-long time series for analyzing soil loss patterns.
An inventory of vehicles registered in the municipality of Envigado, Colombia, with records through December 2018. The dataset is hosted on the Colombian open data portal www.datos.gov.co and was last updated on the platform in May 2026. It contains details on vehicles that circulate daily on municipal roads.
47,772 tables derived from 1,379 parent tables from TabFact and WikiTableQuestions, fragmented at four cumulative noise tiers. The dataset is part of the TRL-Bench suite for evaluating tabular encoders, created by logo-lab and last updated on June 11, 2026.