Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,485 datasets
MESS is a mixed collection of statistical functions by Claus Thorn Ekstrøm, some of which are referenced in his book, The R Primer. The collection includes useful and semi-useful scripts for statistical analysis, and it is hosted on the paperswithcode platform.
Mixture and flexible discriminant analysis data, likely used to illustrate methods from the seminal textbook 'Elements of Statistical Learning'. The dataset is associated with authors Trevor Hastie, Robert Tibshirani, and Jerome Friedman. It is referenced in the context of multivariate adaptive regression splines (MARS), BRUTO, and vector-response smoothing splines.
Shinystan provides a graphical user interface for Markov chain Monte Carlo diagnostics and posterior sample analysis. The tool, created by Jonah Gabry, is powered by the Shiny framework and works with output from MCMC programs in any language, with extended support for Stan models via rstan and rstanarm packages.
The 'pwrss' R package provides functions for statistical power and minimum required sample size calculations. It is authored by Metin Buluş and is designed for a wide range of commonly used hypothesis tests in psychological, biomedical, and social sciences. The dataset's size, row count, and last update date are unknown.
ISLR2 is a collection of datasets used in the textbook 'An Introduction to Statistical Learning with Applications in R, Second Edition'. The collection includes datasets from the first edition, some with minor changes, and some new datasets. The data was compiled by author Gareth James for educational purposes.
Daniel Lüdecke's easystats is a meta-package providing a unifying framework for statistical analysis in R. It bundles multiple packages to offer consistent modeling, visualization, and reporting workflows. The collection includes teaching articles for instructors and a dashboard for new users to access summaries and visualizations with minimal programming.
The 'patchwork' package by Thomas Lin Pedersen extends the 'ggplot2' API to compose multiple plots. It provides mathematical operators for combining plots, addressing a need also targeted by packages like 'gridExtra' and 'cowplot'. The dataset likely contains examples or metadata related to this plot composition functionality.
Explanatory Combinatorial Dictionary is a formalized, semantically-based lexicon designed as part of a linguistic model of natural language. The paper describes its main properties, the structure of a lexical entry, groupings of entries, and principles for compilation, illustrated with a series of entries for an English ECD. The platform indicates it is related to lexicography, combinatorics, and natural language processing.
A simulation-based dataset for machine learning-driven optimization of perovskite solar cells. The dataset is described as large-scale, suggesting it contains a substantial number of simulated material or device configurations. It was sourced from Kaggle, but specific authorship, creation date, and exact size are not provided.
Metropolitan Area Look-Up is a system from the U.S. Department of Housing and Urban Development that allows users to determine if a selected county is part of an OMB-defined Core Based Statistical Area. It provides a mapping of state and county combinations to their FY 2009 CBSA status.
A collection of academic papers analyzing the economic impact of the 1930s Great Depression across Latin America. The work, edited by Rosemary Thorp, includes case studies on Argentina, Brazil, Chile, Colombia, Mexico, Peru, and Central America. The analysis covers topics such as the shift from export-led to import-substituting economies, the role of state policy, and international economic pressures.
Statistical data on student performance, likely containing metrics related to academic outcomes. The dataset is hosted on Kaggle, a platform for data science projects. Details regarding its specific source, size, and creation date are not provided in the available metadata.
An R package by Ethan Heinzen providing functions for generating large-scale statistical summaries. The toolkit includes functions for creating Table-1-like summaries, frequency tables, model summaries, and data frame comparisons, designed to integrate with R and RStudio reporting tools. Its primary functions are tableby(), paired(), modelsum(), freqlist(), comparedf(), and write2().
A collection of statistical tools for biologists, authored by Ken Aho. The description mentions parameters for parent distributions including normal, t, exponential, and uniform. The dataset likely contains statistical functions or parameters for biological analysis.
This 2026 dataset provides the official tariff rates and statistical categories for all merchandise imported into the United States, based on the international Harmonized System. It is maintained by the US International Trade Commission and includes all revisions for the current year.
A collection of elementary-level math problems presented in Vietnamese, likely containing both textual questions and illustrative images. The dataset is hosted on Kaggle, but details on its size, creation date, and authorship are currently unknown. Columns and specific content require verification after download.
Bayesian structural time series models for regression, fit using Markov chain Monte Carlo methods. The methodology is described in the 2014 paper by Scott and Varian. The dataset's specific size, columns, and temporal coverage are not detailed in the provided metadata.
LaplacesDemon is a software environment for Bayesian inference created by Byron Hall. The description mentions it provides a variety of different statistical samplers for performing inference. The specific data format, size, and update frequency are not detailed in the provided metadata.
Routines for astrochronologic testing, astronomical time scale construction, and time series analysis, as described in a 2018 paper by Stephen Meyers. The tool also includes a range of statistical analysis and modeling routines relevant to time scale development and paleoclimate analysis.
Matthew Harper published this mathematical dataset in 2026 to provide computed values of the Links-Gould polynomial. While the total record count is not specified, the data serves as a computational supplement to the research findings in arXiv:2509.16868 regarding quantum invariants in knot theory.