General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
142,312 datasets
Global 0.25-degree gridded monthly mean leaf area index (LAI) climatology averaged from August 1981 to August 2015. The dataset was derived from the AVHRR GIMMS LAI3g version 2 bi-weekly product, processed to remove missing values and calculate long-term monthly means. It is provided by the National Aeronautics and Space Administration.
Land cover and land use classification for the state of New Hampshire at 30-meter resolution, featuring 23 distinct classes. The dataset was created by analyzing 12 Landsat Thematic Mapper scenes and incorporating over 2,600 training data points from new and archived field sites. It represents a snapshot of land conditions spanning from 1996 to 2001.
Tomoyasu Noji published a figshare dataset in 2026 from a study investigating the mechanisms of far-red and near-infrared light absorption in biliverdin-binding proteins. The dataset likely contains the results of quantum mechanical/molecular mechanical (QM/MM) analyses used to reproduce absorption wavelengths across a broad set of proteins. The 132.5 KB text file compares the color-tuning effects of chromophore conformation, electrostatic interactions, desolvation, and π-stacking between biliverdin- and phycocyanobilin-binding proteins.
Road and pavement incidents in York recorded in the City of York Council's CRM tool from November 2019 onwards. The dataset contains the most recent incidents covering a 30-day period, but excludes records created in the last 14 days. It is published as a live API link to the council's GIS server, with data provided by the Government Digital Service under an OGL-UK-3.0 license.
A 30-day rolling window of the most recent water and drainage incidents reported in York, sourced from the City of York Council's customer relationship management tool. The dataset is a live API link to the council's GIS server, with data recorded from December 2019 onwards, though incidents from the last 14 days are excluded. It is published by the Government Digital Service under the OGL-UK-3.0 license.
Yanfei Zhu's 2026 dataset contains performance metrics for an interpretable 20-microRNA diagnostic model for pancreatic cancer. It includes results from training on 216 samples and from external validation on 585 RNA-seq samples and 30 serum-based samples. The data is stored in a 5.5 KB XLS file.
ACIntel's Engineers Public Register lists all active engineer registrations in the Australian Capital Territory under the Professional Engineers Act 2023. The dataset includes registration details but excludes engineers participating in the Automatic Mutual Recognition scheme. It is published by Access Canberra and available on multiple government data platforms.
Analysis code for a study applying a time-embedded geovisual explainable AI framework to metro ridership in Tianjin. The code reproduces a modeling pipeline using gradient boosting and SHAP values, with models trained separately for the years 2023, 2024, and 2025. The repository includes modules for data loading, model comparison, training, evaluation, and clustering, released under a CC-BY-4.0 license by Chuan Chen.
Kavan Nailesh Shah's dataset contains source code, training data, and ML models for predicting multi-functional properties of CNT-polymer nanocomposites. It supplements a manuscript submitted to Computational Materials Science. The 126.4 MB repository includes files in PY, TXT, IPYNB, PTH, CSV, and RTF formats.
The Southern Ocean Monthly Climatology by Yamazaki et al. provides interpolated oceanographic data for the region south of 40°S. The dataset, built from Argo, MEOP, and World Ocean Database profiles, includes temperature, salinity, and mixed layer depth on a 1/4- to 1/2-degree grid. It was created using Data Interpolating Variational Analysis (DIVA) to incorporate under-ice observations that other datasets often miss.
Wangzhouyang Lou published a dataset on 2026-05-13 containing results from a machine learning and WGCNA analysis of Parkinson's disease. The study identified five store-operated calcium entry (SOCE)-associated feature genes, including LPCAT3 and CLCNKB, using data from the GSE6613, GSE20163, and GSE22491 datasets. The dataset also includes validation results from in vitro experiments using 6-OHDA and MPP+ models of dopaminergic neurons.
A list of active taxpayers registered for franchise tax under Texas Tax Code Chapter 171. The dataset includes taxpayer name, address, organizational type, and various status codes. It is hosted on the data.texas.gov platform via Socrata and was last updated on 2026-05-30 10:44:41.
The Environment Agency's Waste Electrical and Electronic Equipment (WEEE) Self-Cleared UK Summary contains data reported by Designated Collection Facilities (DCFs) on the amount of WEEE they clear and report themselves. The report gives figures for WEEE delivered to Approved Exporters and Authorised Treatment Facilities, broken down into 13 categories. Data is reported quarterly, with reports available from Q3 2007 onwards.
Civil penalties issued by the New York City Department of Buildings (DOB) through its newer DOB NOW system. The dataset covers violations for devices such as boilers and elevators, with geographic and administrative details for each case. It is distinct from the older DOB violations and summonses adjudicated by OATH/ECB.
A benchmark dataset compiled to evaluate DMAPLM, a multimodal pretrained framework for drug repositioning. The dataset was created by Hailin Chen and last updated on April 22, 2026. It is small, consisting of a single 5.5 KB XLS file.
Official Assessor Parcel Numbers (APNs) issued by the County of Los Angeles are maintained by the City of Los Angeles Bureau of Engineering to link property identifiers to spatial parcels. This dataset provides a crosswalk between administrative APNs and their corresponding Parcel Identification Numbers (PINs) for GIS integration. The associated parcel boundary layer is available separately on the City of Los Angeles's open data portal.
NASA CDDIS provides the DORIS Geocenter Time Series product, derived from Doppler Orbitography and Radiopositioning Integrated by Satellite (DORIS) data. The International DORIS Service (IDS) Analysis Centers compute these solutions, which track the coordinates of the geocenter relative to the terrestrial reference frame origin. The dataset was last updated in March 2026.
Three years of continuous methane and carbon dioxide measurements collected at the 'Arcturus' monitoring station in the Bowen Basin, Australia. The data was used in simulations and statistical analyses of how sensitive atmospheric techniques are for detecting fugitive emissions from a simulated coal seam gas field. The work was presented by Geoscience Australia and CSIRO at the American Geophysical Union (AGU) Fall Meeting in December 2013.
CNHARQ is a novel framework that integrates network-based information to forecast multivariate realized volatilities. The dataset includes files for the Correlation-Based Stochastic Block Model, which reduces the number of parameters from O(N²) to O(NK) by uncovering latent community structures among N assets. Authored by Sixuan He and last updated in 2026, the 436.6 MB collection contains PDF, NPY, CSV, and TXT files.
Data from the FIFE (First ISLSCP Field Experiment) field campaign, consisting of ground-based measurements of surface reflectance factors, radiances, and temperatures. These measurements were collected with a portable, mast-mounted Modular Multiband Radiometer (MMR) and coordinated with aircraft and satellite overpasses. The data supports the validation and calibration of remote sensing observations.