Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,485 datasets
Statistical modeling for correlated count data using the beta-binomial distribution, as described in Martin et al. (2020). The method lets covariates model both the mean and the overdispersion of the data. The dataset likely contains the statistical models and associated data used for the described regression analysis.
Prioritizr is an R package for systematic conservation prioritization using mixed integer linear programming (MILP). It provides a flexible interface for building and solving conservation planning problems, with solutions guaranteed to be optimal or within a specified gap. The package was authored by Jeffrey O. Hanson, with a related paper published in 2025.
A comparison of differential equation solver suites across MATLAB, R, Julia, Python, C, Mathematica, Maple, and Fortran. The dataset was created by Christopher Rackauckas and is published under an Open Access (diamond) license on Papers with Code. It aims to detail the construction and rationale of each software suite to help users select appropriate tools.
A Thurstone scaling technique was applied to construct a measure of a subject's affective reaction under field experimental conditions. The scale, developed by Robert H. Kerle, was validated using four contrived and natural situations to test its application and reliability. It detected significant affective changes in situations judged stressful by the experimenters.
afex is an R package providing convenience functions for analyzing factorial experiments. It supports ANOVA and mixed models for between-subjects, within-subjects, and mixed designs, automatically aggregating data in long format. The package, authored by Henrik Singmann, includes functions for model fitting and high-level plotting with ggplot2.
An R package for statistical procedures in agricultural research, originally presented in a Master's thesis at the National Engineering University (UNI) in Lima, Peru. It offers functionality for planning experimental designs such as lattice, alpha, cyclic, and factorial designs. The package also provides analysis facilities for treatment comparisons, non-parametric tests, and biodiversity indexes.
Research data from the National Oceanic and Atmospheric Administration focusing on the development of optimal grow-out diets for sablefish (Anoplopoma fimbria). The study uses a novel statistical mixture model and response surface analysis to test the effects of dietary protein, lipid, and digestible carbohydrate on fish growth and feed conversion efficiency. Fish in the experiments may be PIT tagged and regularly measured for length and weight.
Research data from the National Oceanic and Atmospheric Administration focuses on optimizing dietary protein, lipid, and carbohydrate levels for sablefish aquaculture. The study uses a novel statistical mixture model and response surface analysis to test commercially viable feed formulations. Raw data on rearing densities, tank conditions, water temperature, mortalities, ration, and feed size may be available.
ColQwen3.5 Optimization Trail contains 776+ MTEB evaluation results from the development of three visual document retrieval models. The dataset, published by athrael-soju, captures the full development process including seeds, ablations, and variants for models using ColBERT-style late interaction with Qwen3.5-VL.
GLC_FCS30D is a global land-cover monitoring product developed by Liangyun Liu of the Chinese Academy of Sciences. It provides 30-meter resolution data for 35 land-cover categories from 1985 to 2022, with updates every 5 years before 2000 and annually thereafter. The product was validated to achieve an overall accuracy of 80.88% for a 10-category system and 73.24% for a 17-category system.
META-R is a set of R programs for statistical analysis of plant breeding trials. It calculates BLUEs, BLUPs, genetic correlations, and broad-sense heritability, and can generate boxplots and histograms. The software was authored by Gregorio Alvarado and includes a graphical Java interface for user interaction.
Seven types of evidence indicate that subjective well-being contributes to better health and longevity. The review by Ed Diener of the University of Illinois Urbana-Champaign synthesizes prospective longitudinal studies, experimental research, and naturalistic studies. It discusses causality, effect sizes, and the controversial link between well-being and longevity in populations with certain diseases.
Country-year data for 2012-2023 on tuberculosis treatment outcomes, processed for Bayesian model comparison. The dataset originates from the World Health Organization and covers global TB treatment success rates. It is intended for comparing statistical models.
Ukrainian state statistical observations are described in this metadata collection. The dataset is provided by the States site of Ukraine and was last updated on March 5, 2026. It is available in JSON and Excel XLSX formats.
Chun-Hui He's research paper presents a mathematical model for the Fangzhu, an ancient Chinese device for collecting water from air. The model elucidates the device's possible surface-geometric properties and identifies key factors affecting its effectiveness. The dataset likely contains parameters and results from this mathematical analysis of the hydrophilic-hydrophobic hierarchical surface.
A balanced dataset was used to help participants classify potential customers who might churn. This challenge was part of the pre-qualification for the 2018 Data Science Nigeria all-expenses-paid bootcamp and hackathon scheduled for October 10-15, 2018. The data is described as academic and focuses on the business problem of customer retention in telecommunications.
A balanced dataset for predicting customer churn in the European telecommunications sector. The data was used as a pre-qualification challenge for the 2018 Data Science Nigeria bootcamp and hackathon. It is described as an academic dataset intended to help participants classify potential customers who might leave their service provider.
A dataset from OpenML by Varsha Pandey concerning a premium club's customer membership. The description indicates the club has faced membership cancellations in recent years and aims to use statistical methods to identify at-risk customers. The dataset's specific size, time range, and geographic scope are not detailed.
A dataset from OpenML by Varsha Pandey concerning a premium club's customer membership. The description indicates the club has faced significant membership cancellations in recent years and aims to use statistical methods to identify at-risk customers. The dataset's specific scale, such as row count and column details, is unknown.
The Mallard Model is a stochastic computer model from CEOS_EXTRA, hosted on NASA EarthData. It is designed to aid waterfowl managers in predicting outcomes for various management scenarios to maximize mallard and upland nesting waterfowl productivity. The dataset's specific size, format, and update history are not provided.