General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
145,222 datasets
Biomechanical data from thirteen healthy volunteers walking on an instrumented ramp at inclinations of 0°, 7.5°, and 10°. Data includes raw and processed files from a 3D marker-based gait system and three force plates, covering twelve trials per condition for most participants. The dataset was published by Johanna Vielemeyer in 2026 and is available under a CC-BY-4.0 license.
Weekly influenza surveillance data from Anhui Province, China, spanning 2015 to 2025. The dataset includes incidence rates, viral subtypes, and meteorological indicators. It was used to develop a stacked ensemble forecasting model by Qingqing Zhu.
City of Denver Pending Short Term Rentals is a dataset of short-term rental business licenses with associated pending renewal records. The dataset is provided by data.colorado.gov and was last updated on 2026-05-29. It has been deprecated, with users directed to a combined dataset for current information.
Baseline data from 6,858 adults aged 35–75 years in the Luohe, Henan, China screening cohort from March 2021 to February 2022. The dataset was used to examine the association between the body roundness index (BRI) and World Health Organization-defined high cardiovascular disease risk. The study applied multivariable logistic regression and an explainable machine-learning workflow including LASSO, random forest, and SHAP.
Voyager 2 Saturn encounter magnetometer data resampled to a 1.92-second rate from the original 60-millisecond instrument sampling. The dataset includes calibrated magnetic field vector components in Saturn-centered Kronographic (L1) coordinates, measured in nanotesla, and covers the period from the solar wind through at least the first magnetopause crossing. NASA produced this Level 4 processed dataset, with processing starting on 1988-09-21.
Voyager 2 Saturn encounter magnetometer data from the Low Field Magnetometer (LFM) resampled at a 9.6-second sample rate. The dataset includes calibrated magnetic field vector components in Saturn-centered Kronographic (L1) coordinates, measured in nanotesla, with coverage beginning in the solar wind and continuing through at least the first magnetopause crossing. Data was processed by the National Aeronautics and Space Administration, with processing starting on 1988-09-21.
Marie-Annick Moreau authored a 20.3 KB dataset published on figshare in June 2026. The data consists of a translated conversation in EAF format, where Mauridi Omari Mpendu requests help for people in Rufiji and Marie-Annick Moreau explains her limitations and goals for the EMKP project.
A metadata catalog from Metrolinea, a public transport operator in Colombia, describing its published information assets. The schema lists details such as the information title, format, language, and responsible parties, as published on the Colombian open data portal. The record was last updated on 2026-05-18.
Panel data from 2011 to 2024 covering 30 provinces in mainland China, excluding Hong Kong, Macao, Taiwan, and Tibet. The dataset includes key indicators for urban–rural integrated development and the level of digital rural construction. It was authored by ZHAOYANG LU and last updated on 2026-05-31.
A machine learning model trained on 7,974 ALS patient records from the PRO-ACT database and a supplemental cohort of 678 advanced-stage patients. The gradient-boosting model achieved a C-index of 0.709 for survival prediction and successfully stratified patients into low-, average-, and high-risk tertiles. The model and supporting data were published by Danielle Beaulieu on figshare in April 2026.
The George V Land shelf in East Antarctica is the focus of this dataset. Over 2000 kilometres of high-frequency echo-sounder data were collected between February and March 2000 to study seafloor morphology. The acoustic facies are interpreted in terms of glacial and oceanographic influences since the Last Glacial Maximum.
A dataset of 4,038 records collected over 87 days from IoT sensors monitoring industrial wastewater in Tequila, Jalisco, Mexico. The data includes measurements of suspended solids, dissolved oxygen, turbidity, and electrical conductivity, used to predict chemical oxygen demand (COD) with machine learning models. It was created by Alfredo Figarola-Figarola and last updated on 2026-04-17.
June to September 1994 data from the BOREAS TF-11 team measuring methane and carbon dioxide fluxes at the SSA-Fen site. Measurements were part of a 2x2 factorial experiment with carbon and nitrogen additions across four replicate locations. The dataset includes environmental variables like air temperature and water table height.
NASA's 29-year time series provides annual snow cover data for Northern Hemisphere land areas above 45 degrees North. It contains gridded variables for the week of snow disappearance, week of snow cover onset, and duration of the snow-free period, along with summary statistics of their mean and standard deviation. The dataset is available in formats including BIN, ISO, and HTML.
VNP03MOD_NRT provides terrain-corrected geolocation data for the VIIRS/NPP satellite's moderate resolution bands at a 750-meter nominal resolution. The dataset includes geodetic latitude, longitude, surface height, solar and satellite viewing angles, and a land/water mask for each sample. This product serves as a foundational input for generating subsequent VIIRS land surface data products.
Active Shared Housing Registrations is a dataset from the City of Chicago containing registration data for shared housing units regulated under the Shared Housing Ordinance. The dataset tracks the status, location, and administrative details of registered units, with columns suggesting information on host names, addresses, expiration dates, and political precincts. It is maintained by data.cityofchicago.org and is available on multiple data platforms.
MVR-Bench is a benchmark dataset for evaluating AI agents on market-permission reasoning in African, emerging, and high-context markets. It was published by African Market OS and originated by Farouk Mark Mukiibi, with a public development split released on May 25, 2026. The benchmark measures the 'Reckless-GO Rate,' which is the share of cases where an AI agent overclaims market-entry readiness.
Forest cover polygons for the province of New Brunswick, interpreted from aerial imagery on a 10-year cycle. The attributes contain information describing the stand characteristics for each polygon area. The data is hosted by the Government of New Brunswick on the Socrata platform and was last updated in May 2026.
Location data for all recycling facilities within the Causeway Coast and Glens Borough Council area. The dataset is provided by the Government Digital Service under the UK Open Government Licence and is available in multiple geospatial formats including KML, GeoJSON, and ESRI Shapefile. Further operational details like opening hours are referenced via a link to the council's official website.
Recycling facilities within the Causeway Coast and Glens Borough Council area in Northern Ireland. The dataset is published by the Government Digital Service under the OGL-UK-3.0 license and is available in HTML and JSON formats. The last update date is unknown.