General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
153,224 datasets
SIVIGILA 2019 provides systematic information on the dynamics of events affecting the health of the Colombian population. The dataset likely contains case counts aggregated by event type, municipality, and epidemiological week. Columns such as COD_EVE, COD_MUN_O, and SEMANA suggest a structure for tracking disease incidence across administrative divisions over time.
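As a rough illustration of the structure described above, the following sketch tallies records by event code, municipality code, and epidemiological week. It assumes one row per reported case and a CSV export; only the column names COD_EVE, COD_MUN_O, and SEMANA come from the description, while the sample values and the `count_cases` helper are hypothetical. If the published file already holds aggregated counts, the tally would need to sum a count column instead.

```python
import csv
import io
from collections import Counter


def count_cases(rows):
    """Tally rows by (event code, municipality code, epidemiological week)."""
    counts = Counter()
    for row in rows:
        # Assumption: each row is a single reported case.
        key = (row["COD_EVE"], row["COD_MUN_O"], int(row["SEMANA"]))
        counts[key] += 1
    return counts


# Made-up sample for illustration only; these are not real SIVIGILA records.
SAMPLE_CSV = """COD_EVE,COD_MUN_O,SEMANA
210,001,1
210,001,1
210,001,2
300,520,1
"""

weekly_counts = count_cases(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for (event, municipality, week), n in sorted(weekly_counts.items()):
    print(event, municipality, week, n)
```

Codes are kept as strings so that leading zeros in municipality codes are not lost.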
HHBlits DHFR alignment with gaps present is a collection of dihydrofolate reductase (DHFR) protein sequences. The dataset, created by Francois D. Rouleau, contains sequences with less than 70% homology to PjDHFR to avoid skewing GEMME analysis with highly similar sequences. It was last updated on 2026-05-27 and is available in FASTA format.
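The 70% cutoff described above could be approximated along the following lines. This is only a sketch: it substitutes pairwise percent identity over aligned, non-gap columns for "homology", which may differ from the measure the author actually used. The sequences, the `parse_fasta` and `percent_identity` helpers, and the record name used for the reference are all hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def percent_identity(ref, seq):
    """Percent of identical residues over columns where neither sequence has a gap."""
    compared = 0
    matches = 0
    for a, b in zip(ref, seq):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up aligned sequences for illustration; not taken from the dataset.
SAMPLE_FASTA = """>PjDHFR
MKLV-AAG
>seqA
MKLVCAAG
>seqB
MRIT-SAG
"""

records = parse_fasta(SAMPLE_FASTA)
reference = records["PjDHFR"]

# Keep only sequences below the 70% identity threshold relative to the reference.
kept = {
    name: seq
    for name, seq in records.items()
    if name != "PjDHFR" and percent_identity(reference, seq) < 70.0
}
print(sorted(kept))
```

In this sample, seqA matches the reference at every compared column and is dropped, while seqB matches at 3 of 7 compared columns and is kept.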
Yumin Yan published classification performance results on a 10% sample of the Today's Headlines dataset on May 27, 2026. The dataset is a 5.5 KB Excel file hosted on figshare under a CC-BY-4.0 license.
Experimental results (5.5 KB) comparing different models on a 10% sample of 'Today's headlines'. The dataset was authored by Yumin Yan and last updated on 2026-05-27. It is available under a CC-BY-4.0 license on the figshare platform.
Supplementary Table S1 from a 2026 study on colorectal cancer risk factors. This 12.4 KB Excel file details the distribution of cases and controls across participating studies, genotyping platforms, and study sites. Joel Sanchez Mendez authored this table, which is shared under a CC-BY-4.0 license.
Supplementary Table S2 provides summary statistics for colorectal cancer-related risk factors in a study cohort. The 13.0 KB XLSX file, authored by Joel Sanchez Mendez, was last updated on June 2, 2026 and is shared under a CC-BY-4.0 license on figshare. The dataset likely contains aggregated participant data used to analyze the interaction between red meat intake and a novel pathway-based polygenic risk score.
This dataset provides detailed sources of tax revenue for 41 countries in Sub-Saharan Africa from 1980 to 2010. It was constructed to identify problems with tax revenue statistics in the region and suggest improvements. The data supports analysis of how revenue evolved by income level and by trade bloc, and of the relative importance of extractive industry taxes.
A 26.8 MB raster dataset containing the uncertainty associated with a stand basal area map. The spatial resolution of the uncertainty map across forest cover is 0.027° × 0.027°. It was authored by ANKITA MITRA and last updated on June 3, 2026.
Emserpa E.I.C.E. E.S.P.'s Information Publication Schema is a metadata catalog used by obligated entities to report on information published on the entity's website and other media. The dataset includes columns such as TITULO_INFORMACION, FORMATO, FRECUENCIA_ACTUALIZACION, and RESPONSABLE_DE_LA_INFORMACION. It is hosted on the www.datos.gov.co platform via Socrata and was last updated on 2026-05-27.
Snow cover days per year and 10-day snow depth means from stations across Estonia. The dataset provides over a century of measurements, with snow cover data spanning 1891-1994 and depth means from 1891-1990. It was compiled by the National Aeronautics and Space Administration.
Seven epochs of land cover data for the state of Victoria, Australia, spanning from 1987-1990 to 2015-2019. The dataset classifies each pixel into one of 19 land cover classes using Landsat satellite imagery and local calibration data. It is provided by the Department of Energy, Environment and Climate Action and was last updated in April 2026.
Facilities Management annually prequalifies contractors and subcontractors for public works projects. The list includes company details, trade classifications, and maximum contract dollar values. This dataset is published by data.delaware.gov and was last updated on 2026-05-29.
DAPO-MATH-17k-oss-reasoning contains reasoning trajectories produced by the gpt-oss-120b model on the DAPO-Math-17k dataset. The dataset includes trajectories generated under three distinct effort levels (Low, Medium, and High), with corresponding average token counts of 1,300, 2,936, and 8,419. It was created by user thuzhizhi and last updated on June 2, 2026.
This prototype dataset provides daily 4 km sea ice concentration for the Arctic from July 2012 to present. It is a blended product combining ice coverage from the Multisensor Analyzed Sea Ice Extent (MASIE) product with ice concentration from the Advanced Microwave Scanning Radiometer 2 (AMSR2). The data was developed to provide greater accuracy and higher resolution for initializing operational sea ice forecast models.
Fuel price data from service stations in Queensland, Australia. The dataset is provided by the Queensland Treasury and was last updated on May 28, 2026. It is available for download in CSV and TXT formats.
RAILLINE_OFFICIAL is a geospatial data layer containing the official rail line network for the province of Saskatchewan. The layer was developed by the Saskatchewan Ministry of Highways & Infrastructure and includes the linear geometry of CN, CP, and provincial/shortline rail lines. This version of the network is used for official mapping, querying, and plotting purposes.
Global burned area polygons derived from MODIS satellite imagery, providing information on spatial and temporal attributes of areas affected by fires. The data is provided as GeoJSON by Copernicus and was last updated on 2026-05-21. Fires are mapped using a semi-automatic procedure, though the reported dates may not correspond to the exact ignition and extinction times.
S-MODE Shipboard Radiometer Measurements Version 1 contains air-sea interaction data collected during the Sub-Mesoscale Ocean Dynamics Experiment pilot campaign. Air-Sea Interaction METeorology sensors on the R/V Oceanus recorded shortwave and longwave radiation fluxes approximately 300 km offshore of San Francisco over two weeks in October 2021. These measurements support the S-MODE mission to understand how small-scale ocean dynamics influence vertical exchanges of physical and biological variables.
Ceilometer measurements of cloud fraction, cloud height, and surface-based lifting condensation level from the BOREAS project in northern Saskatchewan and Manitoba. The National Aeronautics and Space Administration collected this information at the NSA-OJP site in 1994 and at both NSA-OJP and SSA-OBS sites in 1996. Data formats include HTML, PDF, PNG, BIN, ISO, ZIP, and TEXT files.
A database of registered and/or renewed SMEs from the last decade, provided by the Chamber of Commerce of Honda, Guaduas and North Tolima in response to a citizen request. The dataset includes 24 columns such as RAZON SOCIAL, IDENTIFICACION, ACTIVIDAD, and FEC-MATRICULA. It was last updated on the platform on 2026-05-18.