General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
169,395 datasets
Colombian data on costs for initiatives registered in the Project Profile system during the 2020 term at the Rural Development Agency. The dataset includes columns for labor, requested funds, investments, and beneficiary contributions. It is hosted by the Colombian open data portal, datos.gov.co, and was last updated on 2026-05-18.
EucFACE P addition paper dataset published by Min ZHAO on figshare. The dataset is 166.0 KB in size and was last updated on 2026-05-20. It is available in RDS and CSV formats under a CC-BY-4.0 license.
A compilation of terrestrial sediment samples and observational data from the Vestfold Hills region in East Antarctica. The dataset likely contains point locations with sample type, analyses, and references to original sources. Data was collected from the 1970s to present from published and unpublished sources by the Australian Ocean Data Network.
NASA's GRIP HIWRAP dataset contains dual-frequency Ka- and Ku-band radar measurements collected from a Global Hawk Unmanned Airborne System during the Genesis and Rapid Intensification Processes experiment. The data, gathered from September 16 to 24, 2010, primarily over the Gulf of America, provides calibrated reflectivity and unfolded Doppler velocity to study tropical storm formation and hurricane development. It was produced by the GHRC DAAC as part of a multi-aircraft NASA campaign.
Data collected in 2022-2024 at 6 EFGH country sites on children aged 6-35 months who presented with watery diarrhea and had a whole stool sample tested for inflammatory biomarkers. The dataset was authored by Billy Ogwel and is shared under a CC-BY-4.0 license. It is a small dataset, 13.5 KB in size, stored in XLS format.
Dallas Police Public Data from the Records Management System (RMS) detailing modus operandi (MO) for incidents. The dataset includes columns such as Entry_Area, PropertyTarget1, MO, and Method_Of_Entry. It is published by www.dallasopendata.com on the Socrata platform and was last updated on 2026-05-28.
A study from 2026 by Min Xiao analyzes clinical and laboratory data from 1,498 patients to identify risk factors for diabetic microvascular complications. The research compares nine machine learning models, finding a Gradient Boosting Decision Tree (GBDT) model performed best. The dataset, shared on figshare, includes identified independent risk factors such as urea, fibrinogen, and D-dimer.
61 teleoperated episodes recorded for robot learning, totaling 100,143 frames at 30 frames per second. The dataset includes 26-dimensional joint position states and images from three camera views: head, left wrist, and right wrist. It was created by author 120ft and published on the Hugging Face platform.
Replication data from a study identifying distinct mechanisms for competing charge density waves using time-domain techniques. The dataset includes data for Figures 1, 2, and 3, but Figure 4 does not contain original experimental data. It was authored by Yifan Su and hosted on Harvard Dataverse, with a last recorded update in July 2026.
Australian Ocean Data Network provides a flythrough presenting seabed bathymetry compilations for the Australian Antarctic margin. The bathymetry data is derived from a combination of multibeam, singlebeam, and satellite data (ETOPO2). Images showing different types of seabed communities are included for the George V margin and Davis coastline.
Two geospatial image maps of Antarctica for the 2003-2004 austral summer season: one of snow grain size and one of surface morphology. The maps were produced by the National Aeronautics and Space Administration from composited Moderate Resolution Imaging Spectroradiometer (MODIS) swath data. They are provided at two spatial resolutions, 125 meters and 750 meters.
Records of counter-fraud work undertaken by Trafford Council from 2014/15 to 2024/25. The dataset is provided in CSV format and is licensed under the OGL-UK-3.0 open license.
An inventory of government instruments for competitiveness and innovation, published by datos.gov.co. The portfolio includes details on coordination, objectives, target populations, and monitoring for each policy tool. The dataset was last updated on 2026-05-18.
Hydrochemistry analyses conducted at CMAR Floreat on behalf of WAMSI projects. The dataset is provided by the Australian Ocean Data Network and was last updated on 2026-06-23. The specific parameters, sample locations, and temporal coverage are not detailed in the available metadata.
Acoustic index calculations derived from WAV files. The dataset includes results for ACI, BI, ADI, SH, and NSDI indices. It was created by Leonardo Vivar and last updated on 2026-05-28.
NASA's Bowen Ratio Surface Flux Observations (GSFC) Data Set contains surface energy flux measurements collected in 1987. The data originate from a major collection effort involving 16 stationary sites equipped with Bowen ratio equipment within the FIFE area. Measurements from a single grazed upland site include fluxes of net radiation, sensible heat, latent heat, and several micrometeorological parameters.
Boreal forest sites in Canada's Northern Study Area (NSA) provide the context for this 1994 data collection. The dataset contains point measurements of canopy biochemistry, including lignin, nitrogen, cellulose, starch, and fiber concentrations. It was collected to study spatial and temporal changes in forest cover and to validate high-resolution radiative transfer models for remote sensing of biochemical properties.
A list of public and confidential information assets, sourced from the Colombian open data portal www.datos.gov.co. The dataset includes columns for asset classification, custodians, owners, and security attributes like confidentiality and integrity. It was last updated on 2026-05-18.
Historical records of goods and services planned for acquisition by the ANM (Agencia Nacional de Minería). The dataset includes 20 columns detailing contract values, timelines, responsible units, and procurement status. It is hosted on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-19.
Replication data and programs for the academic paper 'In Their Shoes: Empathy Through Information' by Andries, Bursztyn, Chaney, Djourelova, and Imas. The dataset is hosted on The Quarterly Journal of Economics Dataverse and was last updated on 2026-06-27. It is intended to replicate the tables and figures from the published study.