Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,469 datasets
Clam performs 'classical' age-depth modelling for dated sediment deposits, a step prior to applying more sophisticated Bayesian techniques. The tool calibrates radiocarbon-dated depths and constructs models by repeatedly sampling dated levels to draw age-depth curves. It was created by Maarten Blaauw, with methodology detailed in a 2010 publication.
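The repeated-sampling idea can be sketched as follows. This is a minimal illustration, not clam's code: it assumes each dated level's age is drawn from a normal distribution (a stand-in for a calibrated radiocarbon distribution), joins the sampled ages by linear interpolation, and discards iterations with age reversals. The function names, depths, and ages are hypothetical.

```python
import random
from statistics import median

def interpolate(depths, ages, target):
    """Linearly interpolate the age at `target` depth between dated levels."""
    points = list(zip(depths, ages))
    for (d0, a0), (d1, a1) in zip(points, points[1:]):
        if d0 <= target <= d1:
            return a0 + (a1 - a0) * (target - d0) / (d1 - d0)
    raise ValueError("target depth lies outside the dated range")

def age_at_depth(depths, means, sds, target, n_iter=2000, seed=1):
    """Sample each dated level repeatedly and summarise the age at `target`."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_iter):
        # Assumption: normal draws replace sampling from calibrated distributions.
        ages = [rng.gauss(m, s) for m, s in zip(means, sds)]
        # Skip curves in which a deeper level comes out younger than a shallower one.
        if any(deeper < shallower for shallower, deeper in zip(ages, ages[1:])):
            continue
        estimates.append(interpolate(depths, ages, target))
    if not estimates:
        raise ValueError("every sampled curve contained an age reversal")
    ordered = sorted(estimates)
    low = ordered[int(0.025 * len(ordered))]
    high = ordered[int(0.975 * len(ordered))]
    return median(estimates), low, high

# Hypothetical dated levels: depth in cm, age in years with one-sigma errors.
mid, low, high = age_at_depth([0, 50, 100], [100, 1500, 3000], [30, 40, 50], target=25)
```

The median and the 2.5% and 97.5% quantiles of the sampled curves give a point estimate and an approximate 95% range for the age at the chosen depth.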
Several example datasets are included for demonstrating the repeated measures correlation technique, first introduced by Bland and Altman in 1995. The associated R package provides functions for computing this correlation, diagnostics, p-values, effect sizes with confidence intervals, and graphing. The package and its documentation were authored by Jonathan Z. Bakdash and Laura R. Marusich, with a key paper published in 2017.
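One way to see what the repeated measures correlation measures: it equals the Pearson correlation of the two variables after each subject's own mean has been subtracted from that subject's observations. The sketch below illustrates this with the standard library only; `rmcorr` here is a hypothetical helper, not the R package's function, and the data are made up.

```python
from collections import defaultdict
from math import sqrt

def rmcorr(subjects, x, y):
    """Correlate x and y after removing each subject's mean from both."""
    groups = defaultdict(list)
    for s, xi, yi in zip(subjects, x, y):
        groups[s].append((xi, yi))
    sxx = syy = sxy = 0.0
    for pairs in groups.values():
        mean_x = sum(p[0] for p in pairs) / len(pairs)
        mean_y = sum(p[1] for p in pairs) / len(pairs)
        for xi, yi in pairs:
            # Accumulate within-subject sums of squares and cross-products.
            sxx += (xi - mean_x) ** 2
            syy += (yi - mean_y) ** 2
            sxy += (xi - mean_x) * (yi - mean_y)
    return sxy / sqrt(sxx * syy)

# Toy data: y rises with x inside each subject, although subject "b"
# has higher x and lower y overall than subject "a".
subjects = ["a", "a", "a", "b", "b", "b"]
x = [1, 2, 3, 11, 12, 13]
y = [5, 6, 7, 1, 2, 3]
r = rmcorr(subjects, x, y)
```

On this toy data the within-subject association is perfectly positive, whereas a correlation pooled across all six points would be negative because of the difference between subjects.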
An efficient implementation of the Scalable Bayesian Rule Lists algorithm, a competitor to decision tree algorithms. The model builds from pre-mined association rules and has a logical structure identical to a decision list. The algorithm, developed by Hongyu Yang, Cynthia Rudin, and Margo Seltzer in 2017, is fully optimized over rule lists to balance accuracy, interpretability, and computational speed.
Frontier is a dataset for Maximum Likelihood Estimation of Stochastic Frontier production and cost functions. It implements two model specifications: the error components specification with time-varying efficiencies and a model where firm effects are influenced by explanatory variables. It was authored by Tim Coelli and Arne Henningsen, based on the methodologies of Battese and Coelli from 1992 and 1995.
rcarbon provides a statistical framework for building demographic and longitudinal inferences from aggregate radiocarbon date lists. The package includes functions for calibration, uncalibration, and plotting, as well as Monte-Carlo simulation and spatial permutation tests. It was authored by Andrew Bevan and is often used for archaeological research.
Nathan Nunn from Harvard University analyzes the differential effect of rugged terrain on income for all countries worldwide, with a focus on Africa. The study shows that ruggedness had a statistically significant and economically meaningful positive effect on income in Africa, which is fully accounted for by the history of the slave trades. The dataset likely contains country-level geographic and economic indicators used to support this analysis.
Indicspecies provides functions for assessing the statistical relationship between species occurrence or abundance and groups of sites, as described by De Cáceres & Legendre (2009). It also includes methods for measuring species niche breadth using resource categories, based on De Cáceres et al. (2011). The package is authored by Miquel De Cáceres and is available via the paperswithcode platform.
VineCopula provides a suite of tools for the statistical analysis of regular vine copula models, referencing foundational works by Aas et al. (2009) and Dissmann et al. (2013). The package includes functions for parameter estimation, model selection, simulation, goodness-of-fit tests, and visualization. It was authored by Thomas Nagler and is hosted on the paperswithcode platform.
Bayesplot provides a suite of plotting functions for posterior analysis, MCMC diagnostics, and predictive checks, supporting the applied Bayesian workflow. The package is designed by Jonah Gabry to offer convenient functionality for users and a common set of tools for developers of R packages for Bayesian modeling. It is particularly intended for use with packages interfacing with the 'Stan' probabilistic programming language.
Math Logic Ru is a synthetic dataset for training models to formalize Russian natural language text into mathematical logic. The dataset covers 10 sections, from propositional logic to multi-step deductive chains. It was created by s85io and last updated on Hugging Face on March 13, 2026.
Market basket optimization data likely contains records of retail transactions for association rule mining. The dataset is hosted on Kaggle, but its specific size, origin, and update history are unknown. Columns and sample data are unavailable for inspection.
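As a rough illustration of association rule mining on transaction records, the sketch below counts item pairs and reports support and confidence for each rule. The transactions are invented toy data (the dataset's actual columns are unknown), and `pair_rules` is a hypothetical helper limited to two-item rules.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicate items within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Confidence of a -> b is the share of baskets with a that also hold b.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules

# Toy transactions, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rules = pair_rules(transactions)
```

Full association rule miners such as Apriori extend the same counting to larger itemsets while pruning candidates that fall below the support threshold.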
A dataset titled 'market_optimization' published on Kaggle. The content likely contains variables relevant to optimization problems in market contexts, such as pricing, allocation, or logistics. Specific details regarding its size, creator, and update history are not provided in the available metadata.
Market basket optimization data likely contains transaction records for analyzing product co-occurrence. The dataset is published on Kaggle, but its specific origin, size, and creation date are unknown. Columns and sample data are unavailable for review.
Benchmark data from the vHive open-source framework characterizes snapshot-based serverless infrastructure using Containerd and Firecracker. The analysis reveals a 95% average increase in execution time for functions started from snapshots versus memory-resident ones and demonstrates a 3.7x reduction in cold-start delays with the REAP prefetching mechanism. The work was conducted by Dmitrii Ustiugov of the University of Edinburgh.
Model data and scripts for recreating power flow data and visualizations from a human factors study. The study evaluated contour and glyph visualizations for an urban distribution model and a large-scale transmission model. The dataset also includes the study results and statistical analysis.
A conceptual framework designed to diffuse binary logic and oversimplifications by refracting polarized signals into nuanced perspectives. The dataset, authored by ronniross, was last updated on March 19, 2026. Its structure and specific data content require inspection from the source page.
A 'scenario-emotion-behavior' dataset includes experimental results and statistical analysis. The dataset, created by Yi-bo Chen, is available under a CC BY 4.0 license and was last updated in March 2026.
A dataset related to first-order theorem proving, but no specific details on content, size, or structure are available.
A multisensory model for dynamic spatial orientation developed by Joshua Borah. The model translates aircraft or simulator motion into stimuli processed by dynamic models of visual, vestibular, tactile, and proprioceptive sensors, feeding into a central estimator modeled as a steady-state Kalman Filter. The computer program has predicted qualitative characteristics of human orientation under combined visual and platform motion.
P-curve analysis addresses publication bias and p-hacking by examining the distribution of statistically significant p-values across a set of studies. The method was introduced by Uri Simonsohn of the University of Pennsylvania. It is designed to help determine whether reported effects reflect true findings or selective reporting.
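A minimal sketch of a right-skew test on significant p-values, assuming Fisher's method is used to combine them: if there is no true effect, significant p-values are uniform on (0, 0.05), so each p divided by 0.05 is uniform on (0, 1) and the combined statistic follows a chi-square distribution. The function names are hypothetical and this is not the authors' p-curve software.

```python
from math import exp, log

def chi2_sf_even_df(x, k):
    """Survival function of a chi-square variable with 2*k degrees of freedom."""
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(k):
        total += term           # adds (x/2)**i / i!
        term *= half / (i + 1)
    return exp(-half) * total

def right_skew_test(p_values, alpha=0.05):
    """Fisher-combine the significant p-values after rescaling them by alpha."""
    significant = [p for p in p_values if 0 < p < alpha]
    if not significant:
        raise ValueError("no statistically significant p-values to analyse")
    rescaled = [p / alpha for p in significant]  # uniform(0, 1) if no true effect
    statistic = -2.0 * sum(log(v) for v in rescaled)
    return statistic, chi2_sf_even_df(statistic, len(significant))

# Illustrative inputs: one set piled up near zero, one spread evenly below 0.05.
stat_skewed, p_skewed = right_skew_test([0.001, 0.002, 0.003])
stat_flat, p_flat = right_skew_test([0.01, 0.02, 0.03, 0.04])
```

A small combined p-value indicates more very low p-values than a flat distribution would produce, which is the pattern expected when the studied effects are real.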