General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
168,374 datasets
An 18.3 KB Excel file quantifying immunohistochemistry staining in xenograft tumors, created by Xing Wei and last updated on June 1, 2026. The dataset supports research on RAS signaling inhibition and immune checkpoint blockade in KRAS-G12C mutant non-small cell lung cancer. It is shared under a CC-BY-4.0 license on the figshare platform.
Pharmacokinetic data from a preclinical cancer study investigating KRAS G12C-mutant non-small cell lung cancer. The 13.6 KB Excel file contains measurements from xenograft tumor-bearing mice, published by Xing Wei under a CC-BY-4.0 license. It was last updated on June 1, 2026.
Supplementary tables from a study on METTL3 methylation and endogenous retroelement transcripts. The dataset, authored by Xiaowei She, is hosted on figshare and was last updated on June 1, 2026. It consists of Excel files totaling 35.6 KB.
Trafford Council publishes data on paid time off granted to trade union representatives for union duties. The dataset covers the fiscal years from 2016/17 to 2024/25. It is available in CSV and HTML formats under the OGL-UK-3.0 license.
A spreadsheet used for conceptual analysis, shared on figshare. The dataset is 6.8 KB in size and was last updated on May 31, 2026. It was authored by an anonymous researcher and is shared under a CC-BY-4.0 license.
York City Council's monthly percentage of positive feedback from customers contacting its Customer Centre by phone or in person at West Offices. The data is reported by the Government Digital Service and can be analyzed alongside related waiting time metrics. The dataset is provided in CSV format under the OGL-UK-3.0 license.
URSA benchmarking sets are benchmark targets for single-step retrosynthesis from Zagribelnyy et al. (2026). The collection includes three CSV files containing product molecules for precursor prediction, with target counts ranging from 100 to 4,972. The dataset was authored by insilicomedicine and last updated on Hugging Face in May 2026.
Brazilian Portuguese message-only distilled trajectories for training tool-using Text-to-SQL agents. The dataset was created by Boakpe and contains trajectories selected from LLM-judged correct conversations, preserving the agent protocol from the released code. It was last updated on June 15, 2026.
A list of contractors working for the Municipal Mayor's Office during the administrative period Juntos por el Cambio from 2020 to 2023. The dataset is published on the Colombian open data portal www.datos.gov.co and was last updated on 2026-05-18. It includes columns for contractor name, dependency, contract object, and contact details.
A preprocessed version of the Spanish Wikipedia dump from May 2026, totaling 8.4 GB. The dataset was created by raj2708 and includes articles parsed from wikitext to plain text. It is intended for use in large language model pretraining.
Municipality of Armenia, Colombia, consolidated records of property loans (comodatos) from 2015 to 2020. The dataset includes details on the loan process, property location, and usage. It is published by www.datos.gov.co and was last updated on 2026-05-18.
Boakpe's benchmark dataset provides anonymized metadata and gold labels for evaluating agentic Portuguese Text-to-SQL systems. It is derived from a real PostgreSQL/PostGIS environmental-registry database, though the underlying production data is not released. The dataset was last updated on June 15, 2026.
Recycle collection schedule for the City of Greater Dandenong in 2021. The data is provided by the City of Greater Dandenong and was last updated on the platform in May 2026. Formats include GeoJSON, WFS, KML, and WMS, indicating a geospatial dataset.
Morbidity data from 2019, 2020, and 2021, sourced from www.datos.gov.co. The dataset includes columns for ICD-10 descriptions, patient sex, service names, and aggregated cause categories. It was last updated on 2026-05-18.
BDAPPV is a dataset of aerial images showing rooftop photovoltaic installations. It includes segmentation masks and installation metadata, sourced from two aerial imagery providers, Google and IGN. The dataset was created by Gabriel Kasmi and published in Scientific Data in 2023.
A research dataset containing scored cued-recall responses from two experiments on episodic and semantic memory. The data was collected by author Rujuta Pradhan and published on figshare in April 2026. It includes responses from younger (18–33 years) and older adults (65–85 years) who viewed episodes of BBC's Sherlock.
Scored recall data (658.1 KB) from two experiments investigating the 'within > across' memory effect. Younger (18–33 years) and older adults (65–85 years) viewed BBC's Sherlock and completed a cued-recall task, with responses scored for episodic and semantic details. The dataset, authored by Rujuta Pradhan and last updated in April 2026, reveals a reversed effect for gist details across experiments.
Data-Gouv-ML provides a catalogue of datasets from the French open data platform data.gouv.fr. The dataset's structure suggests it contains metadata linking each data.gouv.fr dataset to a corresponding Hugging Face repository. It was last updated on 2026-06-09.
A list of administrative procedures and OPAS offered by the District of Barranquilla, Colombia, registered in the SUIT system. The data includes the responsible department, number of locations where offered, and operational details. The dataset was last updated on 2026-05-18.
QLeave, a Queensland government organization, discloses contracts valued over $10,000 for the 2025-2026 financial year. The data is provided in an XLSX file format and was last updated in June 2026. It is published under a Creative Commons Attribution 4.0 license.