General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
153,813 datasets
472 pregnant women undergoing labor induction were randomized to receive 2 g of oral azithromycin (n=236) or no treatment (n=236) in a single-center trial. The dataset contains primary and secondary outcomes measuring perinatal infection rates, maternal and neonatal complications, delivery mode, and safety parameters. It was authored by Huimin Cao and last updated on 2026-05-24.
An index of information from the Colombian National Unit for Disaster Risk Management (UNGRD) that is restricted by law or regulated for specific classes of persons, identified as classified or reserved. The dataset includes 13 columns detailing the legal basis, responsible parties, and duration of classification. It was published on datos.gov.co and last updated on 2026-05-18.
The Canadian Environmental Sustainability Indicators program provides data tracking terrestrial snow cover over Canada. Indicators include spring snow cover extent, annual snow cover duration, and March snow water equivalent, with data presented in maps, charts, and CSV tables. The dataset is produced by Environment and Climate Change Canada and was last updated on 2026-04-23.
Prox-E ShapeTalk Benchmark is a subset of 600 random samples from the ShapeTalk dataset, curated for evaluating 3D shape editing models. It is the official benchmark for the SIGGRAPH'26 paper 'Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions'. The dataset was created by author 'haopt' and last updated on May 30, 2026.
Produced during the GRIP Field Experiment, this dataset contains satellite-derived overshooting top magnitudes for tropical storms and hurricanes. It was created by NASA for use with the Real Time Mission Monitor tool to study storm formation and intensification. The data is visualized as color-coded overlays in Google Earth.
San José de Cúcuta municipality provides data on free public Wi-Fi zone usage from 2021 to 2022. The dataset likely contains session counts and user demographics, including gender, age, device type, and operating system. It originates from the Colombian open data portal www.datos.gov.co and was last updated in May 2026.
Northern California and Nevada are the geographic scope for this data release. It contains raw timeseries and metadata for 827 minidisk infiltrometer measurements conducted across nine burned areas and nearby unburned areas. Scott McCoy authored the dataset, which covers measurements from 2018 to 2023.
1995 to 2014 monthly gridded climatologies of total lightning flash rates derived from two satellite-based sensors, the Optical Transient Detector (OTD) and Lightning Imaging Sensor (LIS). The dataset provides a merged, long-term record, with robust tropical and subtropical coverage from LIS and high-latitude data from OTD. It is produced by the National Aeronautics and Space Administration and is available in formats including BIN, ISO, HTML, and PDF.
ARTPARK-IISc's Vaani Benchmark V1.0 is a curated Hindi automatic speech recognition (ASR) evaluation set. It contains 5,343 audio segments from 1,103 speakers across 104 Indian districts, totaling approximately 11.7 hours. Each audio segment includes three independent human transcriptions.
NASA's MEaSUREs program provides a daily record of global landscape freeze/thaw status at 6 km resolution. The data is derived from microwave radiometer observations by JAXA's AMSR-E and AMSR2 instruments. This dataset is maintained by NASA and is available on multiple government platforms.
Geoscience Australia's Science Principles document outlines six foundational principles guiding its scientific work. The principles, which include Relevance to Government and Quality Science, are embedded into the agency's long-term strategic planning and daily operations. The document, published by the Australian Ocean Data Network, was last updated on May 5, 2026.
1,507 episodes of robot demonstrations for the Tower of Hanoi puzzle, comprising 3,264,454 frames at 60 frames per second. The dataset was created using LeRobot and is hosted on Hugging Face by the author jellyho. It was last updated on 2026-06-13.
Global lightning signatures were detected from visible channel imagery by the Defense Meteorological Satellite Program (DMSP) Operational Linescan System (OLS) flown on satellite F12. The dataset contains extracted time and location data for each lightning streak, stored in monthly HDF files. It was produced by the National Aeronautics and Space Administration and covers a seven-month period from May through November 1995.
1.6 MB of mass spectrometry data from a study identifying photoproducts of the antibiotic sulfamethoxazole (SMX) under environmentally relevant UV irradiation (300–350 nm). The dataset was authored by Pavla Fojtíková and last updated in June 2026. Files are provided in XML and MGF formats.
NSIDC satellite data aids investigations of variability and trends in sea ice cover. It provides measurements of sea ice concentration, extent, ice-covered area, persistence, and monthly climatologies. The dataset is produced by the National Aeronautics and Space Administration (NASA).
Contract disclosure reports for the Department of Sport, Racing and Olympic and Paralympic Games for the first two quarters of the 2025-26 financial year. The data was published by the Queensland Government department of the same name and was last updated on May 29, 2026. It is available as a CSV file under a Creative Commons Attribution 4.0 license.
Queensland Corrective Services publishes monthly counts of specific incident types within custodial centers for the year 2020. The dataset likely contains tabular time-series data tracking incidents over 12 months. It is available under a Creative Commons license and is published on multiple government data platforms.
Data from data.colorado.gov lists all state liaisons (lobbyists) and the years they were registered with the Colorado Department of State (CDOS). The dataset includes lobbyist names, contact information, associated state agencies, and registration status. It was last updated on 2026-05-29 11:09:30.
December 2018 to April 2019 model results from a 4km-resolution regional-scale biogeochemistry and sediments simulation of the Great Barrier Reef. The dataset, part of the eReefs simulation suite, represents a hindcast run with a pre-industrial catchment scenario, forced by a hydrodynamic model and specific catchment inputs. It serves as a comparative benchmark alongside baseline and reduced-load catchment scenarios.
Estadísticas Solicitudes Demanda lists all restitution requests (lawsuits) filed by the Administrative Unit for Dispossessed and Abandoned Lands before specialized land restitution judges, as mandated by Law 1448 of 2011. The dataset is published on www.datos.gov.co and was last updated on 2026-05-18. It likely contains counts of legal demands broken down by regional office, territorial office, year, and month.