General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
168,192 datasets
JSON annotations for works from medieval manuscripts and printed editions, created by scholars using the IMAGO Annotation Tool. The 11.7 MB repository includes structured information on authors, toponyms, genres, and holding libraries. A corresponding JSON schema is provided as a TXT file detailing the data structure.
An 11.7 MB JSON file containing structured annotations of works from medieval manuscripts and printed editions. The data was produced by scholars using the IMAGO Annotation Tool for the IMAGO project and was last updated on 2026-04-23. It includes information such as authors, toponyms, genres, and holding libraries.
Temperature logger data collected from deployments around Pelorus Island in the Great Barrier Reef. The data set was collected by the Australian Ocean Data Network and spans from 04 August 1993 to 26 February 2026. The record was last updated on 04 June 2026.
Data from data.ct.gov lists courses approved for pre-licensing education for real estate salespersons and brokers in Connecticut. The dataset includes details on course providers, delivery methods, hours, and validity dates. It is available on multiple government data platforms.
ProbioSML is a machine learning-derived genomic dataset containing 1,072 non-redundant protein-coding sequences. It was created by Diego Lucas Neres Rodrigues through pangenomic analysis and supervised machine learning of bacterial genomes from taxa frequently reported as probiotics and reference gut-associated bacteria. The dataset, last updated in April 2026, is publicly available under a CC-BY-4.0 license.
1,072 non-redundant protein-coding sequences form a genomic dataset derived from comparative analyses of bacterial genomes. The ProbioSML dataset, created by Diego Lucas Neres Rodrigues and released in 2026, was generated using pangenomic analysis combined with supervised machine learning approaches like Random Forest and Support Vector Machine. It includes gene presence-absence matrices and functional annotations for taxa frequently reported as probiotics and reference gut-associated bacteria.
1,072 non-redundant protein-coding sequences derived from comparative genomic analysis of bacteria frequently reported as probiotics. The dataset, named ProbioSML, was created by Diego Lucas Neres Rodrigues using pangenomic analysis and supervised machine learning models like Random Forest. It was last updated on April 22, 2026, and is available under a CC-BY-4.0 license.
Temperature loggers deployed around Masig Island in the Torres Strait collected this environmental data over nearly a decade, from July 2013 to March 2023. The dataset is hosted by the Australian Ocean Data Network and was last updated in June 2026. It likely contains time-series records of sea water temperature at one or more deployment sites.
Temperature loggers collected data at Elizabeth Reef in the Lord Howe Province from 16 February 2006 to 03 March 2018. The Australian Ocean Data Network aggregated this data set. The data was last updated on the platform in June 2026.
From 14 December 2006 to 28 September 2007, temperature loggers deployed around Lennox Head collected this sea water data. The dataset is provided by the Australian Ocean Data Network. It was last updated on 4 June 2026.
A water-level gauging network operated in the Free State of Saxony provides quantitative hydrological data on watercourses. The base network continuously records water flow for important rivers and performs flood reporting functions. Observations from a control network are included to supplement these records, primarily for monitoring dams, reservoirs, and flood retention basins.
From April 25, 2010, to April 29, 2012, temperature loggers collected sea water data at the E Lowendal site in the Montebello Islands, North Western Australia. The dataset was aggregated by the Australian Ocean Data Network and last updated on June 4, 2026.
SkillTrustBench is a benchmark dataset for evaluating the security analysis of AI agent skills. It was created by author cuhk-zhuque and last updated on June 15, 2026. Each case follows an agent-skill-style layout with a SKILL.md entrypoint defining usage.
OSNI Open Data provides a 1:1,000,000 scale raster map image of Northern Ireland, suitable for use as background mapping. The dataset includes place names for towns, cities, and other administrative locations. It is the smallest scale raster product from OSNI, offering a broad overview of the region's geography.
The Australian Ocean Data Network collected this sea water temperature dataset from one or more loggers deployed at Daydream Island. The data spans from 26 June 1996 to 04 June 2025. The dataset was last updated on 04 June 2026.
Temperature loggers deployed by the Australian Ocean Data Network collected sea water data at Night Island in the Great Barrier Reef. The dataset covers a 17-year period from December 1996 to October 2013. The data likely contains time-series measurements of water temperature, a key environmental variable for coral reef health.
Temperature and salinity samples measured via CTD profile at Station 6 in the Derwent Estuary, Australia. The data was collected by the Australian Ocean Data Network between August 2012 and January 2013. The dataset is available in formats including NetCDF, HTML, and PNG.
A registry of companies and independent persons dedicated to providing health services in the municipality of Yopal, Colombia. The dataset is published via the Socrata platform and was last updated on 2026-05-18. It includes columns for institution details, contact information, and geospatial coordinates.
Temperature loggers deployed northeast of Bridled Island in the Montebello Islands collected sea water temperature data from 15 November 2010 to 05 February 2012. The Australian Ocean Data Network aggregated this data. The dataset was last updated on 4 June 2026.
A dataset of 943-band hyperspectral signatures for precision agriculture, containing both real field-collected and model-generated synthetic data. It was created by Manoj Kaushik to support the SpectraGeni framework, a convolutional conditional variational autoencoder for data generation under class imbalance. The data is provided in Parquet format and was last updated on 2026-04-19.