Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,485 datasets
An interactive document on basic statistical analysis created by Kartikeya Bolar. It is built using the 'rmarkdown' and 'shiny' packages, with runtime examples provided in the package function and via a hosted web application. The content is designed to demonstrate statistical concepts interactively.
User-facing R functions parse, compile, test, estimate, and analyze Stan models by accessing the header-only Stan library. The Stan project develops a probabilistic programming language implementing full Bayesian inference via Markov Chain Monte Carlo, variational approximation, and maximum likelihood estimation, using automatic differentiation for gradient evaluation. The interface is authored by Jiqiang Guo and hosted on the paperswithcode platform.
An interactive document on basic statistical analysis created by Kartikeya Bolar. It uses the 'rmarkdown' and 'shiny' packages to provide runtime examples, accessible via a live web application. The document likely contains examples and exercises for foundational statistical concepts.
The 'entropy' package by Jean Hausser implements multiple estimators for entropy, mutual information, and related quantities. It includes the shrinkage estimator by Hausser and Strimmer (2009), maximum likelihood, Miller-Madow, Bayesian, and Chao-Shen estimators, and provides an R interface to the NSB estimator. The package also offers functions for estimating Kullback-Leibler divergence, chi-squared divergence, and for discretizing continuous random variables.
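To make two of the estimators named in this entry concrete, here is a minimal Python sketch of the maximum likelihood (plug-in) entropy estimate and the Miller-Madow bias correction. It is an illustration only, not the package's R code: the function names `ml_entropy` and `miller_madow_entropy` and the example counts are assumptions made for this sketch, and results are in nats.

```python
import math


def ml_entropy(counts):
    """Maximum likelihood (plug-in) entropy estimate, in nats.

    `counts` holds the observed count for each bin; empty bins are skipped.
    """
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def miller_madow_entropy(counts):
    """Plug-in estimate plus the Miller-Madow correction (m - 1) / (2n).

    m is the number of non-empty bins and n is the total number of observations.
    """
    n = sum(counts)
    m = sum(1 for c in counts if c > 0)
    return ml_entropy(counts) + (m - 1) / (2 * n)


if __name__ == "__main__":
    # Hypothetical bin counts, chosen only for illustration.
    example_counts = [4, 2, 3, 0, 2, 4, 0, 0, 2, 1, 1]
    print("ML:", ml_entropy(example_counts))
    print("Miller-Madow:", miller_madow_entropy(example_counts))
```

The plug-in estimate tends to underestimate entropy when samples are small, so the Miller-Madow value is never lower than the plug-in value computed from the same counts.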
Bayesian approaches for analyzing multivariate data in ecology, using Markov Chain Monte Carlo (MCMC) methods via JAGS. The boral package fits three types of models: independent column GLMs, latent variable models for ordination, and correlated GLMs with latent variables. It was authored by Francis K.C. Hui and is featured on the paperswithcode platform.
mlrMBO is a flexible R toolbox for model-based optimization, also known as Bayesian optimization, authored by Bernd Bischl. It implements the Efficient Global Optimization Algorithm for single- and multi-objective problems with mixed continuous, categorical, and conditional parameters. The toolbox integrates with the 'mlr' machine learning library for regression modeling and provides features for parallel execution, visualization, and logging.
An R package by David Stanley for creating American Psychological Association (APA) style tables from statistical output. The package generates Word (.doc) files to minimize manual transcription errors and reduce the number of R commands needed. It is designed for researchers who need to format results from several types of analyses.
A system for writing hierarchical statistical models largely compatible with 'BUGS' and 'JAGS', developed by Perry de Valpine. It includes default methods for MCMC, Laplace Approximation, and Monte Carlo Expectation Maximization, and allows users to write nimbleFunctions to operate models and compile them via custom-generated C++. The system extends the 'BUGS'/'JAGS' language by making it extensible, enabling the addition of new distributions and functions.
The third edition of Edward Gibbon's seminal historical work, 'The History of the Decline and Fall of the Roman Empire'. The description notes the first print-run was 1000 copies, and this edition features improved layout with footnotes at the bottom of each page and chapter numbers in the margins. The text is sourced from the paperswithcode platform.
The Ross Sea Marine Protected Area ecosystem research data is documented in the final report from the CCAMLR Working Group on Ecosystem Monitoring and Management. The report likely contains analysis of monitoring data and management agenda items for the Antarctic region. The dataset originates from the AMD_KOPRI organization and is hosted on the NASA Earthdata platform.
Statistical data on penguin diving duration compiled from published literature. The dataset aggregates observational data into a structured internet database known as The Penguiness Book. It is provided by the organization SCIOPS and hosted on the NASA Earthdata platform.
Statistical data on penguin diving depth and duration compiled from published literature. The dataset is a compilation from the internet database known as The Penguiness Book. It was aggregated by the organization SCIOPS.
Breeding success data for Adelie penguins is collected within the Ross Sea Marine Protected Area, a critical Antarctic conservation zone. The dataset is registered with the CCAMLR Ecosystem Monitoring Program (CEMP) and originates from the Korea Polar Research Institute (KOPRI). Specific temporal coverage and data volume are not detailed in the available metadata.
A domestic scientific research roadmap outlines analysis agendas for ecosystem conservation in the Ross Sea Marine Protected Area. The dataset originates from the Korea Polar Research Institute (AMD_KOPRI) and is hosted on NASA's Earthdata platform. Temporal coverage and data volume are unspecified.
QEDBench is a benchmark for evaluating large language models on formal proof generation and evaluation. It contains 272 proof-based problems spanning 10 distinct mathematical domains, created by researcher Quanquan C. Liu. The dataset was published in February 2026.
Statistical tables by Erika Harrell detail the level and rates of nonfatal violent victimization against persons with and without disabilities from 2009 to 2013. The report describes types of disabilities and compares victim characteristics. Estimates are based on 2-year rolling averages to improve reliability.
NOAA/WDS Paleoclimatology provides a Bayesian ANOVA scheme for calculating climate anomalies. The data covers parameters from a global geographic location over a period from -11 to -40 calendar years before present. This archived study is maintained by NOAA National Centers for Environmental Information.
Precipitation data from stations across the USSR, compiled by RIHMI-WDC in 1982. The original records are held by the Russian State Fund of data and were stored on 800 bit/inch magnetic tape. It contains maximum and minimum monthly and annual precipitation amounts of various statistical significance.
A 1995 geospatial dataset maps the fraction of land equipped for irrigation within 5-minute by 5-minute grid cells globally. The map was created by the CEOS_EXTRA organization, combining statistical data from administrative units with geographical information on irrigated areas. It provides a snapshot of irrigation infrastructure at a resolution of approximately 9.25 km x 9.25 km at the equator.
Irrigation-equipped land fraction is mapped for South America at a 5-minute by 5-minute grid cell resolution, approximately 9.25 km square at the equator. The dataset combines statistical administrative data with geographical point, polygon, and raster information to estimate irrigation coverage. It was produced by CEOS_EXTRA and represents conditions around the year 1995.
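The two irrigation entries above quote a 5-minute grid cell as approximately 9.25 km on a side at the equator. The short Python sketch below shows where a figure of that size comes from. It assumes a spherical Earth with a mean radius of 6371 km; the constant and the function name `cell_size_km` are choices made for this illustration and are not taken from the dataset documentation.

```python
import math

# Assumption: spherical Earth with mean radius 6371 km.
EARTH_RADIUS_KM = 6371.0


def cell_size_km(arc_minutes, latitude_deg=0.0):
    """Return (east_west_km, north_south_km) for a square lat/lon grid cell.

    The north-south extent does not depend on latitude on a sphere;
    the east-west extent shrinks with the cosine of the latitude.
    """
    angle_rad = math.radians(arc_minutes / 60.0)
    north_south = angle_rad * EARTH_RADIUS_KM
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


if __name__ == "__main__":
    for lat in (0.0, 30.0, 60.0):
        ew, ns = cell_size_km(5, lat)
        print(f"latitude {lat:>4}: {ew:.2f} km x {ns:.2f} km")
```

Under this spherical assumption a 5-minute cell is about 9.27 km on a side at the equator, close to the quoted 9.25 km, and its east-west width narrows to roughly half that at 60 degrees latitude.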