General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
168,883 datasets
Metadata for a research study, provided by the author, Ankita Mitra. The dataset is a 44.3 KB DOCX file published under a CC-BY-4.0 license. It was last updated on June 3, 2026.
The Join is a collection of 650 relational databases spanning domains like academia, e-commerce, finance, sports, and biomedicine. It is ported to the RelBench manifest format and built for pretraining relational and tabular foundation models. The dataset was created by relbench and was last updated on 2026-06-12.
Omni Dreams Samples is a curated dataset of single-view driving sequences for evaluating the NVIDIA AlpaDreams world model. The dataset includes ground truth videos, HD-map rasterized conditioning videos, first-frame RGB images, and text prompts. It was authored by NVIDIA and last updated on June 4, 2026.
GOES-16 and GOES-17 satellite data captures lightning flash and event characteristics over a four-day period in March 2021. The National Aeronautics and Space Administration provides both raw (L1b events) and processed (L2 flashes) data from its Geostationary Lightning Mapper instrument. Data is stored in netCDF-4 and HDF-5 formats.
Originally classified as Confidential, these topographic maps were created primarily for military purposes between 1981 and 1989. The dataset includes variants for topographic maps (TK) and topographic city maps (TSP) at a scale of 1:25,000. It was produced by the Bundesamt für Kartographie und Geodäsie and last updated in 1987.
An information publication schema from the Municipal Administration of Versailles, Valle del Cauca, Colombia. It catalogs published and to-be-published information under proactive disclosure laws, with 10 columns describing metadata like format, frequency, and responsible parties. The dataset is hosted on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18.
NASA's Crustal Dynamics Data Information System (CDDIS) archives high-rate broadcast ephemeris data from multiple Global Navigation Satellite Systems (GNSS). The dataset includes sub-hourly files for the Indian Regional Navigation Satellite System (IRNSS) and the Russian GLONASS system, each containing 15 minutes of navigation data in RINEX format from a global network of receivers. Since 2011, the archive has expanded to include data from Galileo, Beidou, QZSS, and Satellite Based Augmentation Systems (SBAS).
NASA's CDDIS provides daily files of 30-second sampled Global Navigation Satellite System (GNSS) observation summaries from a permanent global network of ground receivers. Since 2011, the archive has expanded beyond GPS and GLONASS to include data from Galileo, Beidou, QZSS, IRNSS, and SBAS. These RINEX-format files contain all distinct navigation messages received per site per day, supporting precise positioning and Earth science research.
Global ground-based receivers collect daily broadcast ephemeris data from multiple satellite navigation systems, including GPS, GLONASS, Galileo, Beidou, and IRNSS. The data is stored in RINEX format, with one file generated per receiver site per day. This dataset supports precise geospatial positioning and Earth science research.
NEGOCIOS VERDES is a dataset from datos.gov.co detailing companies certified as green businesses in Colombia. It includes columns for business identification, location, sector, and descriptions of positive environmental impact. The dataset was last updated on 2026-05-18.
Scottish Local Authorities maintain lists of libraries within their council areas. The data likely includes both static buildings and mobile library services with scheduled stops. The dataset is provided by the Scottish Government via SpatialData.gov.scot and was last updated in May 2026.
Official evaluation assets for the NuPlan / MTGS Track of the AlpaSim End-to-End Closed-Loop Challenge 2026. The dataset is managed by a trusted evaluator and consists of 15 tar.gz archive files. It was published by OpenDriveLab and last updated on June 12, 2026.
Raw input and output data for THAMES v5.1.0 simulations of gypsum dissolution, portlandite-to-calcite conversion, and hydration of portland cement paste. The data was authored by Jeff Bullard and last updated on 2026-06-29. It originates from the Texas Data Repository Harvested Dataverse.
A 584 KB compilation of available data on the jaguar, aggregated from multiple databases. The dataset was created by Iulian Denis Viorica and last updated on May 30, 2026. It is provided as an XLSX file under a CC-BY-4.0 license.
Casos Atendidos por la Comisaría de Familia records cases handled by Family Commissioner offices in Colombia. The dataset includes columns for YEAR, INTERVENTION RESULT, ATTENTION DATE, CASE TYPE, APPLICANT AGE, OBSERVATIONS, MUNICIPALITY, and APPLICANT GENDER. It is hosted by www.datos.gov.co and was last updated on 2026-05-18.
Gachancipá's publication schema details information proactively disclosed by the municipality under Colombia's Law 1712 of 2014. The dataset catalogs available records with columns for title, responsible area, format, and update frequency. It is published via the Colombian open data portal, datos.gov.co, and was last updated on May 18, 2026.
Colombia's Higher Education Development Fund (FODESEP) publishes this transparency scheme in compliance with Law 1712 of 2014. The dataset lists information series, responsible areas, and publication sites. It was last updated on 2026-05-18 and is hosted on the Colombian open data portal.
CTD profile measurements of temperature and salinity were collected at Station 1 in the Derwent Estuary between August 2012 and January 2013. The data is provided by the Australian Ocean Data Network and was last updated on the platform in June 2026. It is available in formats including NetCDF, HTML, and PNG.
CTD profile measurements of temperature and salinity were collected at Station 2 in the Derwent Estuary. The data collection period spans from August 2012 to January 2013. The dataset is provided by the Australian Ocean Data Network.
Victoria, Australia's public land management data, derived from tabular and spatial sources including PORTAL, PRIMS, PARKRES, VEACRECS25, and CL_TENURE. The dataset describes primary management, land manager, and VEAC recommendations for state forests, parks, Crown land, and coastal areas to a 3 nautical mile limit. It is maintained by the Department of Energy, Environment and Climate Action and was last updated on 2026-04-09.