Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,485 datasets
CanadaFiresD is a subset of wildfire data referenced in the SOAK paper by Hocking et al. published in Statistical Analysis and Data Mining (2026). The dataset is described as a time/space subset and is a downsampled version of a larger collection. It is shared under a CC-BY-4.0 license on the OpenML platform.
A dataset of user interaction records likely intended for optimizing user interface layouts. Published on Kaggle, the dataset's specific size, collection method, and temporal coverage are not detailed in the available metadata. The raw description suggests it contains records related to user behavior and UI optimization.
An R package designed to replace statistical tables from the first two editions of the textbook 'Nonparametric Statistical Methods' by Hollander, Wolfe, and Chicken. The package provides functions to perform exact, Monte Carlo, or asymptotic nonparametric procedures, with wrappers for base R functions to standardize output. It was authored by Grant Schneider.
An extensible framework for pipeable sequences of feature engineering steps, created by Max Kühn. The framework provides preprocessing tools where statistical parameters can be estimated from an initial dataset and applied to others. The processed output is designed for use as input to statistical or machine learning models.
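The estimate-then-apply idea in the entry above can be sketched in a few lines of Python. This is an illustration only, not the framework's API: the `CenterScale` class and its `fit`/`transform` methods are hypothetical names, and the use of a population standard deviation is an assumption made for brevity.

```python
from statistics import mean, pstdev


class CenterScale:
    """Hypothetical preprocessing step: learn mean and spread once, reuse later."""

    def fit(self, values):
        # Parameters are estimated from the initial dataset only.
        self.mean_ = mean(values)
        self.sd_ = pstdev(values) or 1.0  # fall back to 1.0 if the spread is zero
        return self

    def transform(self, values):
        # The stored parameters are applied unchanged to any dataset.
        return [(v - self.mean_) / self.sd_ for v in values]


train = [2.0, 4.0, 6.0, 8.0]
new = [5.0, 10.0]

step = CenterScale().fit(train)
train_scaled = step.transform(train)
new_scaled = step.transform(new)  # scaled with the training mean and spread
```

The point of the design is that `new` is never used to estimate anything, so the same transformation can be replayed on data that arrives later.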
Arithmetic for arbitrary precision floating point numbers, including transcendental functions. The package interfaces to the 'LGPL' licensed 'MPFR' Library, which itself is based on the 'GMP' Library. It was authored by Martin Maechler.
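To give a feel for what arbitrary precision arithmetic with transcendental functions looks like, here is a small sketch using Python's standard `decimal` module. It is not the package described above and does not call MPFR; `decimal` works in base 10, whereas MPFR uses binary floating point, so the sketch only illustrates the general idea of user-selected precision.

```python
from decimal import Decimal, getcontext

# Ask for 50 significant digits instead of the roughly 16 of a double.
getcontext().prec = 50

e = Decimal(1).exp()       # exponential function at the chosen precision
root2 = Decimal(2).sqrt()  # square root at the chosen precision
ln2 = Decimal(2).ln()      # natural logarithm at the chosen precision

# Squaring the root recovers 2 up to rounding in the last digits.
residual = abs(root2 * root2 - Decimal(2))

print(e)
print(root2)
print(ln2)
print(residual)
```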
Functions for simulation and inference for stochastic differential equations (SDEs) accompany the book 'Simulation and Inference for Stochastic Differential Equations: With R Examples' by Stefano Maria Iacus, published in 2008. The dataset's specific size, format, and temporal coverage are not detailed in the provided metadata.
James E. Johndrow developed this code for performing Bayesian model averaging in capture-recapture studies. The package includes functions to stratify records, check strata for suitable overlap, and plot estimated population sizes. It is hosted on the Papers with Code platform.
Statistical analysis methods for environmental data are implemented, with a particular focus on robust methods and methods for compositional data. Larger data sets from geochemistry are provided. The statistical methods are described in the book 'Statistical Data Analysis Explained' by Reimann, Filzmoser, Garrett, and Dutter (2008).
Armadillo is a templated C++ linear algebra library designed for speed and ease of use, with syntax similar to Matlab. The RcppArmadillo package provides header files and bindings, allowing R users to leverage Armadillo's efficient vector, matrix, and cube classes, support for dense/sparse matrices, and integration with LAPACK. It is authored by Dirk Eddelbuettel and is licensed under GNU GPL version 2 or later.
Claus Bendtsen provides an implementation of particle swarm optimization consistent with the standard PSO 2007/2011 by Maurice Clerc. The dataset likely contains code or parameters for testing and evaluating the algorithm. Ancillary routines are included for easy testing and graphics.
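For readers unfamiliar with particle swarm optimization, the following Python sketch shows a simplified global-best variant. It is not the standard PSO 2007/2011 algorithm and not the package's code: the function name `pso`, the swarm size, the iteration count, and the coefficient values are all assumptions chosen for illustration.

```python
import random


def pso(objective, bounds, n_particles=20, n_iter=200,
        w=0.72, c1=1.19, c2=1.19, seed=0):
    """Simplified global-best PSO (illustrative, not SPSO 2007/2011)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]  # swarm-wide best

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                lo, hi = bounds[d]
                r1, r2 = rng.random(), rng.random()
                # Inertia plus pulls toward the personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Move, then clamp to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i][:], val
    return gbest_pos, gbest_val


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = pso(sphere, [(-5.0, 5.0)] * 2)
print(best_x, best_f)
```

The fixed seed makes runs repeatable, which is convenient when comparing parameter settings on test functions.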
Dimitris Rizopoulos developed this software for fitting joint models under a Bayesian framework. It accommodates multiple longitudinal outcomes of mixed type and multiple event times, including competing risks and multi-state processes. The methodology is based on the author's 2012 book (ISBN:9781439872864).
spatialEco is a collection of utilities for spatial data manipulation, query, sampling, and modelling in ecological applications. The functions include models for species population density, spatial smoothing, multivariate separability, and point process models for creating pseudo-absences. The package was authored by Jeffrey S. Evans.
Routines for the statistical analysis of indirectly measured haplotypes, developed by Daniel Schaid. The methods assume that all subjects are unrelated and that haplotypes are ambiguous because the linkage phase is unknown. The main functions include haplo.em(), haplo.glm(), haplo.score(), and haplo.power().
A collection of robust statistical methods based on Wilcox' WRS functions. It implements robust t-tests, ANOVA, correlation, mediation, and nonparametric ANCOVA models. The collection was authored by Patrick Mair and is hosted on the Papers with Code platform.
A network estimation procedure for binary data, developed by Claudia van Borkulo and Sacha Epskamp. The method, named eLasso, fits Ising models using l1-regularized logistic regression and selects relevant variable relationships with the Extended Bayesian Information Criterion (EBIC). The resulting network represents variables as nodes and their relevant relationships as edges.
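The selection step can be illustrated with a short Python sketch. It assumes the form of the EBIC commonly given for nodewise regression, EBIC = -2 log L + k log(n) + 2 gamma k log(p - 1), where k is the number of selected neighbours, n the number of observations, and p the number of variables. The function name `ebic`, the value of `gamma`, and the log-likelihood values are hypothetical and not taken from the package.

```python
import math


def ebic(log_likelihood, n_neighbors, n_obs, n_candidates, gamma=0.25):
    """EBIC for one node's regression (assumed form, see lead-in).

    n_candidates is the number of possible neighbours, i.e. p - 1.
    """
    return (-2.0 * log_likelihood
            + n_neighbors * math.log(n_obs)
            + 2.0 * gamma * n_neighbors * math.log(n_candidates))


# Hypothetical fits along a regularization path for a single node:
# (log-likelihood, number of nonzero neighbours).
candidates = [(-70.0, 0), (-60.0, 1), (-58.5, 2), (-58.0, 3)]

scores = [ebic(ll, k, n_obs=100, n_candidates=9) for ll, k in candidates]
best_index = min(range(len(scores)), key=scores.__getitem__)
best_k = candidates[best_index][1]
print(scores, best_k)
```

Larger values of `gamma` penalize additional neighbours more heavily, which yields sparser networks.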
BrainGraph is a set of tools for performing graph theory analysis on brain MRI data. It works with outputs from Freesurfer, diffusion tensor tractography, and resting-state fMRI analyses. The package was authored by Christopher G. Watson and includes a graphical user interface for visualization and figure generation.
Latent Semantic Analysis (LSA) is a technique for uncovering the underlying semantic structure of text, which word usage (for example synonyms or polysemy) tends to obscure. The dataset likely contains text documents processed into a conceptual index via a truncated singular value decomposition of a document-term matrix. It was authored by Fridolin Wild and is hosted on the Papers with Code platform.
pscl is a collection of datasets and statistical tools developed by Simon Jackman for writing and teaching. The data supports analysis of item-response theory models, roll call data, and zero-inflated count models. It is associated with the Political Science Computational Laboratory.
R wrappers around the C libraries cubature and Cuba provide adaptive multivariate integration over hypercubes. The package includes scalar and vector interfaces for deterministic and Monte Carlo integration methods. It was authored by Balasubramanian Narasimhan.
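As a rough picture of Monte Carlo integration over a hypercube, the sketch below uses plain uniform sampling in Python. It is not the adaptive algorithms implemented in cubature or Cuba, and the function name `mc_integrate`, the sample count, and the seed are assumptions made for illustration.

```python
import random


def mc_integrate(f, lower, upper, n_samples=100_000, seed=0):
    """Plain Monte Carlo estimate of the integral of f over a hypercube.

    Returns the estimate and its estimated standard error.
    """
    rng = random.Random(seed)
    volume = 1.0
    for lo, hi in zip(lower, upper):
        volume *= hi - lo

    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        # Draw one point uniformly from the hypercube.
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        y = f(x)
        total += y
        total_sq += y * y

    mean = total / n_samples
    variance = max(total_sq / n_samples - mean * mean, 0.0)
    return volume * mean, volume * (variance / n_samples) ** 0.5


# Integral of x * y over [0, 1] x [0, 2]; the exact value is 1.
estimate, std_error = mc_integrate(lambda x: x[0] * x[1], [0.0, 0.0], [1.0, 2.0])
print(estimate, std_error)
```

The standard error shrinks with the square root of the sample count, which is why deterministic adaptive rules are often preferred in low dimensions.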
The lubridate package provides functions for parsing, extracting, and manipulating date-time and time-span data in R. Functions enable fast parsing of date-time data and algebraic operations on date-time objects. The package was authored by Vitalie Spinu.