General ML benchmarks, tabular data, AutoML, recommendation systems, anomaly detection, evaluation suites
142,359 datasets
A European sea bass population dataset for predicting resistance to Viral Nervous Necrosis (VNN). The study applies machine learning techniques to high-dimensional, low-sample-size genetic data. It was authored by Giovanni Faldani and last updated on 2026-05-20. The dataset is a 620.5 KB PDF document describing the research and its associated data.
Lincolnshire County Council Children's Services provides annual counts of children in care and children in need as of 31 March of each financial year. The dataset includes figures for Lincolnshire overall and by district, with categories for children in need, children in care by home address, and children in care by placement address. Numbers below five are suppressed for confidentiality, so some records may be omitted and district figures may not sum exactly to the totals.
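Why suppressed figures stop tallying can be shown with a minimal Python sketch. The threshold of five comes from the description; the district names and counts are hypothetical, and the sketch assumes suppressed cells are simply left out of any sum a user computes from the published figures (the council's own totals may be derived differently).

```python
from typing import Optional

SUPPRESSION_THRESHOLD = 5  # counts below five are suppressed, per the dataset note


def suppress(count: int) -> Optional[int]:
    """Return None for counts below the threshold, otherwise the count itself."""
    return None if count < SUPPRESSION_THRESHOLD else count


def published_total(district_counts: dict[str, int]) -> int:
    """Sum only the district figures that survive suppression."""
    return sum(c for c in map(suppress, district_counts.values()) if c is not None)


# Hypothetical district counts, for illustration only (not real Lincolnshire figures).
districts = {"District A": 120, "District B": 3, "District C": 47, "District D": 4}
true_total = sum(districts.values())
shown_total = published_total(districts)
print(true_total, shown_total)  # 174 167
```

The gap of 7 is exactly the two suppressed districts, which is why a sum of published district rows can fall short of the true county figure.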
69 hypothetical proteins from the predicted proteome of the fungal pathogen Microbotryum intermedium 1389 BM 12 12. The dataset, created by Michael Perlin, contains predicted amino acid sequences and JGI Gene IDs for proteins lacking conserved domains, identified from an initial pool of 296 predicted small secreted proteins. It was last updated on May 31, 2026.
Itemized late independent expenditures of $1,000 or more e-filed on California's FPPC Form 496 from 2011 onward. The data is sourced from the Fair Political Practices Commission and includes details on filers, candidates, amounts, and jurisdictions. It is current as of the dataset's last modification on 2026-05-28.
A 2024 systematic review synthesizing literature from 2000 to 2024 on modelling approaches for skin neglected tropical diseases. The review, authored by Mesoud A. Bushara, analyzed 68 studies selected from 2,870 retrieved records, focusing on the methodologies, predictors, and data sources used for diseases such as cutaneous leishmaniasis and lymphatic filariasis.
A 2024 systematic review synthesizes 68 studies from 2000–2024 on modelling the distribution of skin neglected tropical diseases. The review, authored by Mesoud A. Bushara and published on figshare, analyzes the prevalent methodologies, environmental covariates, and data sources used in these predictive models.
A 252.1 KB research paper and associated materials from figshare, authored by Hongyu Cheng and last updated on May 9, 2026. It presents a methodological framework for estimating the multivariate coefficient of variation under multiplicative distortion measurement errors. The approach is validated using simulated data and a real-world analysis of the Regensburg Pediatric Appendicitis dataset.
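For orientation, one widely used definition of the multivariate coefficient of variation (due to Voinov and Nikulin) and its plug-in sample estimator are written out below. The description does not say which definition the paper adopts or how its estimator corrects for multiplicative distortion, so this is background under that assumption, not the paper's method. Here $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the population mean vector and covariance matrix, and $\bar{\mathbf{x}}$ and $\mathbf{S}$ their sample counterparts.

```latex
\gamma_{\mathrm{VN}} = \left(\boldsymbol{\mu}^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}\right)^{-1/2},
\qquad
\hat{\gamma}_{\mathrm{VN}} = \left(\bar{\mathbf{x}}^{\top}\mathbf{S}^{-1}\bar{\mathbf{x}}\right)^{-1/2},
\qquad
p = 1:\ \left(\frac{\mu^{2}}{\sigma^{2}}\right)^{-1/2} = \frac{\sigma}{\lvert\mu\rvert}
```

The last expression checks that the definition reduces to the familiar univariate coefficient of variation when there is a single variable.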
A dataset comparing automated, interactive machine learning, and deep learning pipelines for quantifying lipid droplets and mitochondria in live human osteosarcoma cells. The data was created by Chloé Daul and last updated on 2026-05-08. It includes images from fluorescence microscopy and label-free holotomography, with standardized downstream feature extraction applied across different analysis workflows.
Matbench Discovery v1 provides the data files for a benchmark of machine-learning energy models that predict inorganic crystal stability from unrelaxed structures. The dataset includes relaxed structures from the Materials Project training set, initial and relaxed structures from the WBM test set, and model checkpoints. It was authored by Janosh Riebesell and updated in May 2026.
Daily incident records broken down by call type group for each fire station response area in Montgomery County. The dataset includes columns such as Incident Number, Date, Time, Call Type Description, Location, and Fire Station Number. It is updated daily and appears on multiple government data platforms.
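A daily breakdown like this one can be reproduced from incident-level rows with a minimal Python sketch. The column names are taken from the description, but their exact spelling in the published file is an assumption, the sample rows are hypothetical, and the de-duplication on Incident Number assumes an incident might appear on more than one row.

```python
from collections import Counter
from typing import Iterable, Mapping

# Column names as listed in the dataset description; the exact header
# spelling in the published file is an assumption.
INCIDENT_COL = "Incident Number"
DATE_COL = "Date"
STATION_COL = "Fire Station Number"
CALL_TYPE_COL = "Call Type Description"


def daily_counts(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count distinct incidents per (date, station, call type) combination."""
    # A set drops exact repeats of the same incident, in case one incident
    # appears on several rows (an assumption about the file layout).
    unique = {
        (row[INCIDENT_COL], row[DATE_COL], row[STATION_COL], row[CALL_TYPE_COL])
        for row in rows
    }
    return Counter((date, station, call_type) for _, date, station, call_type in unique)


def make_row(incident: str, date: str, station: str, call_type: str) -> dict:
    """Build a hypothetical row shaped like csv.DictReader output."""
    return {
        INCIDENT_COL: incident,
        DATE_COL: date,
        STATION_COL: station,
        CALL_TYPE_COL: call_type,
    }


# Hypothetical rows, not real incident records; incident "2" appears twice.
sample = [
    make_row("1", "2026-05-01", "1", "EMS"),
    make_row("2", "2026-05-01", "1", "EMS"),
    make_row("2", "2026-05-01", "1", "EMS"),
    make_row("3", "2026-05-01", "8", "Fire"),
]
counts = daily_counts(sample)
print(counts[("2026-05-01", "1", "EMS")])  # 2
```

With a downloaded CSV export, the same function could be fed `csv.DictReader(f)` for a file opened with `newline=""`, provided the headers match the assumed names.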
This 29.0 MB figshare dataset, authored by Ying Yang and last updated on 2026-06-02, covers Rh-catalyzed C–H activation [3+2] annulation reactions of imines and alkynes. It explores a novel function of noncoordinating halogen groups in unsymmetric alkynes: regulating regioselectivity and chemoselectivity.
Ying Yang published a dataset on figshare in June 2026 describing the role of noncoordinating halogen groups in Rh-catalyzed C–H activation [3+2] annulation reactions of imines and alkynes. The data likely contains experimental results showing how chloro groups regulate regioselectivity and chemoselectivity. The reported method achieves selectivity up to the single-regioisomer level for ketimines and controls product formation for acyclic aldimines.
Weekly photographs taken over 12 weeks track the survival and growth area of coral spat under controlled light conditions. The experiment tested two light spectra, blue and full, at four maximum intensities ranging from 5 to 160 μmol m⁻² s⁻¹. The data originates from the Australian Ocean Data Network and was last updated in June 2026.
Records of official acts issued by the Specialized Chamber for Medical Devices and In Vitro Diagnostic Reagents in Colombia. The dataset contains decisions on whether products require sanitary registration for commercialization, with records starting from 2007. It is published on the datos.gov.co platform via Socrata and was last updated on 2026-05-18.
British Columbia's 2025-26 Second Quarterly Report provides a tabular summary of the fiscal plan update for 2025/26 to 2027/28. The report includes an economic outlook and six-month financial results for April to September 2025. It was published by the Government of British Columbia and last updated on June 3, 2026.
Bingyang Zha's dataset from 2026 contains clinical data from 503 adolescent patients with major depressive disorder (MDD) who underwent electroconvulsive therapy (ECT). It was used to build a machine learning model identifying baseline factors associated with poor treatment response. The simplified model uses two features: neutrophil-to-platelet ratio (NPR) and pre-treatment HAMD score.
503 adolescent patients with major depressive disorder were retrospectively enrolled to identify baseline clinical factors associated with poor response to electroconvulsive therapy. The dataset was created by Bingyang Zha and last updated on May 7, 2026. It likely contains clinical variables used to build a machine learning model, including the neutrophil-to-platelet ratio and Hamilton Depression Scale scores.
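The neutrophil-to-platelet ratio used as a model feature is, in its usual form, the absolute neutrophil count divided by the platelet count. A minimal Python sketch follows; the description does not state the study's units or any scaling factor, so same-unit counts (for example 10^9 cells/L) are assumed, and the input values are hypothetical.

```python
def neutrophil_to_platelet_ratio(neutrophils: float, platelets: float) -> float:
    """NPR = absolute neutrophil count / platelet count.

    Both counts are assumed to be in the same units (e.g. 10^9 cells/L);
    the study's exact units or scaling are not stated.
    """
    if platelets <= 0:
        raise ValueError("platelet count must be positive")
    return neutrophils / platelets


# Hypothetical blood-count values, for illustration only.
npr = neutrophil_to_platelet_ratio(4.0, 250.0)
print(round(npr, 4))  # 0.016
```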
New York's Metro-North Railroad delay data tracks trains that are canceled, terminated, substituted by bus, or arrive late. The dataset includes specific train numbers, branches, stations, and minutes late for each incident. Records appear to be available from 2012 onward, providing a longitudinal view of service reliability.
This dataset covers onshore Great Britain, mapping landscape areas attributed by type of mass movement, such as landslips and foundered strata, at a 1:10,000 scale. The British Geological Survey (BGS) provides the data in vector polygon format, but coverage is partial: this version 2 release covers approximately 30% of England, Scotland, and Wales. BGS intends to expand coverage, focusing on large priority urban areas and transport corridors.
1996–2024 scientometric data comparing Iran with selected other research systems. Total research output is reported for the full period, while Field-Weighted Citation Impact (FWCI) and Top 10% output values are averaged over 2015–2024 and 2015–2023, respectively. The dataset was created by Ehsan Roohi and published on figshare.
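Because the two averaged indicators use different windows, a period mean should be computed over exactly the stated years. The Python sketch below assumes an unweighted mean of yearly values (the description does not say whether the averages are weighted by publication counts), and all yearly numbers are hypothetical.

```python
from statistics import mean


def window_mean(series: dict[int, float], start: int, end: int) -> float:
    """Unweighted mean of yearly values from start to end, inclusive.

    Raises KeyError if any year in the window is missing, so a short
    series cannot silently shrink the averaging period.
    """
    years = range(start, end + 1)
    missing = [y for y in years if y not in series]
    if missing:
        raise KeyError(f"missing years: {missing}")
    return mean(series[y] for y in years)


# Hypothetical yearly values, for illustration only (not from the dataset).
fwci = {year: 1.0 + 0.01 * (year - 2015) for year in range(2015, 2025)}
top10_share = {year: 10.0 + 0.5 * (year - 2015) for year in range(2015, 2024)}

fwci_avg = window_mean(fwci, 2015, 2024)          # 10-year window
top10_avg = window_mean(top10_share, 2015, 2023)  # 9-year window
print(round(fwci_avg, 3), round(top10_avg, 2))  # 1.045 12.0
```

Failing loudly on a missing year is a deliberate choice: asking for the Top 10% average over 2015–2024 with data only through 2023 raises an error instead of quietly averaging nine years.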