General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
166,197 datasets
Weekly updated records track the status and details of more than 23,000 active and expired alcohol permits across Missouri counties. The dataset covers licensees, managers, and associated fees; it is compiled by county clerks and kept in a three-week rolling window. Its columns suggest uses in compliance monitoring, market analysis, and business verification for the state's regulated alcohol industry.
This dataset focuses on Davies and Myrmidon Reefs in the Great Barrier Reef. It contains surficial cover facies maps published by the Australian Ocean Data Network and was last updated on 2026-06-27.
Mahamudul Hasan's dataset supports a 2026 study on a unified AI-driven predictive maintenance framework. It contains two processed datasets: 19,535 real-world OBD-II automotive engine sensor observations and the NASA C-MAPSS FD001 turbofan engine dataset labeled for remaining useful life. The repository includes model outputs, feature importance values, and threshold optimization results to enable replication of the research.
Aegis AI released a benchmark in 2026 containing 2,288 multi-step agent trajectories for evaluating AI-agent governance verifiers. It includes 513 hand-authored gold-standard trajectories and 1,775 provenance-flagged augmented examples. The dataset is designed to score whether a verifier catches drift inside an agent's trajectory, not whether a prompt is harmful.
The Topographic Map 1:25 000 (DTK25) is an official topographic map series for Germany, with completeness and accuracy appropriate to its scale. It is provided by the Bundesamt für Kartographie und Geodäsie through an INSPIRE download service that offers tile-based downloads split into 1,806 individual files. The dataset is available under a CC-BY-4.0 license.
Active licensed retailers selling New York State lottery products are listed with business names and addresses. The dataset includes geospatial coordinates and indicates which locations offer the Quick Draw game. Its columns suggest it can be joined with New York State geographic boundaries and census data for spatial analysis.
Council Member Expenses from data.winnipeg.ca details spending by city council members. Records begin on January 1, 2014. The dataset is available in CSV, JSON, XML, and RDF formats.
Plot-level average leaf chemical concentrations for nitrogen, lignin, and cellulose were calculated from green leaf and litterfall samples. This dataset was created by NASA to investigate relationships between leaf chemistry and AVIRIS airborne reflectance measurements. The core data originates from the 1992 Accelerated Canopy Chemistry Program (ACCP).
Liquor products registered for distribution in the Department of Risaralda, Colombia, listed with each product's INVIMA registration (REGISTRO INVIMA), the validity period of that registration, its origin (ORIGEN: N = Nacional/domestic, I = Importado/imported), and its alcohol content. The dataset is provided by www.datos.gov.co and was last updated on 2026-05-18.
January to March 2026 data from the Somalia Humanitarian Needs and Response Plan (HNRP). The dataset contains district-level reported and cumulative reach, inter-cluster coverage, and operational priority classifications, compiled by OCHA Somalia and last updated in May 2026.
VNP43IA1N provides Bidirectional Reflectance Distribution Function (BRDF) and albedo model parameters derived from VIIRS/NPP satellite imagery. The product is generated daily by NASA's LANCEMODIS team using a 16-day rolling window of data and the RossThick/Li-Sparse-Reciprocal kernel-driven model. Each file contains six Science Dataset layers, including quality bands and the fiso, fvol, and fgeo parameters for the I1, I2, and I3 spectral bands.
BRDF/Albedo model parameters provide a 1 km resolution global daily snapshot of land surface radiative properties. The dataset contains 24 Science Dataset layers, including mandatory quality bands and three model parameters (fiso, fvol, fgeo) for multiple VIIRS spectral bands. It is produced by the LANCEMODIS organization using a 16-day rolling window of VIIRS/NPP satellite data and the RossThick/Li-Sparse-Reciprocal (RTLSR) kernel-driven model.
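Both VIIRS BRDF/Albedo entries above store, per spectral band, the three weights of the RTLSR kernel-driven model. As a reference sketch (the listings themselves do not reproduce the kernel formulas), the modeled reflectance for band $\lambda$ is a linear combination of an isotropic term and two geometry-dependent kernels, where $\theta_s$ is the solar zenith angle, $\theta_v$ the view zenith angle, $\phi$ the relative azimuth, $K_{\mathrm{vol}}$ the RossThick volumetric-scattering kernel, and $K_{\mathrm{geo}}$ the Li-Sparse-Reciprocal geometric-optical kernel:

```latex
R(\theta_s, \theta_v, \phi, \lambda)
  = f_{\mathrm{iso}}(\lambda)
  + f_{\mathrm{vol}}(\lambda)\, K_{\mathrm{vol}}(\theta_s, \theta_v, \phi)
  + f_{\mathrm{geo}}(\lambda)\, K_{\mathrm{geo}}(\theta_s, \theta_v, \phi)
```

Because the model is linear in $f_{\mathrm{iso}}$, $f_{\mathrm{vol}}$, and $f_{\mathrm{geo}}$, reflectance for any sun-view geometry can be evaluated from the stored parameters once the two kernel values for that geometry are computed.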
Revenues for the Colorado Department of Transportation for the current and previous state fiscal years. The dataset includes columns for Account Date, Customer Description, Funding Source, Account Description, CDOT Segment, and AMOUNT. It is provided by data.colorado.gov and was last updated on 2026-05-29 11:07:13.
436.0 KB of anonymized data supporting research on how the presentation of short video content relates to user cognition and digital interaction, using Peking Opera videos on Douyin (the Chinese counterpart of TikTok) as a case study. The dataset, authored by Jia Wang and last updated on 2026-05-23, includes two CSV files: one for the main analysis and one for robustness checks. It contains variables for content presentation, creator identity, engagement metrics, visible cognitive response proportions, and controls.
The MTA Open Data Catalog is a metadata inventory of the datasets that the Metropolitan Transportation Authority (MTA) shares, or plans to share, on New York's open data portal. It includes columns for tracking dataset status, posting frequency, and agency information. Because it is published on multiple platforms, it appears to serve as a central reference for the MTA's public data offerings.
Mosquito species data collected in Forshaga municipality, Sweden, between 2019 and 2021. The dataset includes collection date, week, sampling site, estimated number of individuals, and species functional groups. It was authored by Jenny Hesson and is available under a CC-BY-4.0 license.
NASA's dataset provides canopy height, land cover change, and stand age estimates for mangrove forests in Tanzania's Rufiji River Delta. The data was derived from TanDEM-X and Landsat imagery using Pol-InSAR techniques and canopy height modeling. It covers a time range from 1990 to 2014.
This dataset, hosted on the Socrata platform, tracks citizen participation activities conducted by Colombia's Ministry of Justice and Law from 2020 through 2025. Its 30 columns characterize activities, measure perceptions of dialogue, track financial resources, and monitor progress on commitments. It was last updated on May 18, 2026, via the www.datos.gov.co portal.
Benchmark data compares file sizes and access speeds for compressed genomic sequence formats. The dataset includes size ratios relative to FASTQ.GZ and speedup ratios comparing FASTQ.GZ to BINSEQ access times. It was authored by Noam Teyssier and last updated on May 28, 2026.
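The two metrics in the genomic-format benchmark above can be read as simple ratios. The sketch below assumes that the size ratio is a format's file size divided by the FASTQ.GZ size, and that the speedup is FASTQ.GZ access time divided by BINSEQ access time; the listing does not publish its exact formulas, and the helper names and example numbers are hypothetical.

```python
# Hypothetical helpers illustrating the benchmark's two ratio metrics.
# Assumed definitions; the dataset listing does not state its formulas.


def size_ratio(format_bytes: float, fastq_gz_bytes: float) -> float:
    """File size of a format relative to FASTQ.GZ (below 1.0 means smaller)."""
    if fastq_gz_bytes <= 0:
        raise ValueError("FASTQ.GZ size must be positive")
    return format_bytes / fastq_gz_bytes


def speedup(fastq_gz_seconds: float, binseq_seconds: float) -> float:
    """How many times faster BINSEQ access is than FASTQ.GZ access."""
    if binseq_seconds <= 0:
        raise ValueError("BINSEQ access time must be positive")
    return fastq_gz_seconds / binseq_seconds


if __name__ == "__main__":
    # Made-up example values, not results from the dataset.
    print(size_ratio(600, 1000))  # 0.6
    print(speedup(12.0, 3.0))     # 4.0
```

Under these assumed definitions, a size ratio below 1.0 and a speedup above 1.0 both favor the alternative format over FASTQ.GZ.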
A 5.0 MB Excel workbook containing the underlying numerical data used to generate figures for a research project. The data is organized across multiple sheets corresponding to figure numbers, with a README sheet explaining connections and key functions. The dataset is licensed under CC-BY-4.0 and was last updated on 2026-05-07.