Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,470 datasets
A dataset for market basket analysis, likely containing retail transaction records. It is hosted on Kaggle, but the author, organization, and specific collection details are unknown. The data's size, structure, and time period are not specified in the available metadata.
Market-Basket-Optimization is a dataset published on Kaggle, a popular platform for data science competitions and projects. The title suggests it contains transactional data suitable for analyzing product co-occurrence. Specific details on the data's origin, size, and collection period are not provided in the available metadata.
Interior image features for spatial optimization and authenticity analysis. The dataset is hosted on Kaggle, but details about its size, creation, and update history are not provided. Its specific content and structure must be inspected after download.
Satellite-derived sea-surface height measurements from TOPEX/Poseidon/ERS, Jason-1/Envisat, and Jason-2/Envisat sensors, combined with a Niiler climatology to obtain absolute heights. The data provides global coverage at approximately 0.25-degree spatial resolution from 1993 to the present, with weekly and monthly science-quality data. Geostrophic current components are mathematically derived from this data.
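For readers unfamiliar with how currents are derived from sea-surface height, the standard geostrophic balance gives a rough sketch. The symbols here are assumptions for illustration and are not taken from the dataset's documentation: \eta is the absolute sea-surface height, g the gravitational acceleration, f the Coriolis parameter, \Omega the Earth's rotation rate, \varphi the latitude, and (u, v) the eastward and northward current components.

```latex
u = -\frac{g}{f}\,\frac{\partial \eta}{\partial y},
\qquad
v = \frac{g}{f}\,\frac{\partial \eta}{\partial x},
\qquad
f = 2\,\Omega \sin\varphi
```

Because f approaches zero at the equator, this relation is not usable at very low latitudes; how the data provider handles that region is not stated here.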
FineProofs SFT contains 7,777 mathematical Olympiad samples featuring chain-of-thought reasoning and formal proofs distilled from DeepSeek-Math-V2. Developed by lm-provers and updated in February 2026, the dataset sources 4,300 unique problems from international competitions and Art of Problem Solving (AoPS).
An R package by Alboukadel Kassambara providing a pipe-friendly framework for statistical analysis. It supports basic tests like t-tests, ANOVA, and correlation analyses, with outputs automatically formatted as tidy data frames. The package includes functions for effect size calculation, outlier identification, and assumption checking for factorial experimental designs.
AI-ready structured dataset for statistical analysis, prediction, and machine learning. The dataset is hosted on Kaggle. The specific source, author, and temporal coverage are unknown.
Ipl_predictions2020 contains match-by-match data from the Indian Premier League (IPL) cricket tournament. The data covers 11 seasons from 2008 through 2019, aggregated from sources including CricSheet.org and the official IPL T20 website. The dataset is intended for statistical analysis and modeling of team and player performance.
European-Soccer-Dataset-by-Role is a modified version of the European Soccer Database, containing 25,000 matches from 11 European countries' top championships between 2008 and 2016. Researchers Carpita, M., Ciavolino, E., and Pasca, P. created 25 role-based performance indicators from EA Sports' FIFA game attributes, averaged by match and player role. The dataset was used to test predictive models in a 2019 academic study.
AS/RS-Datasets is a curated repository of benchmark test instances and algorithmic implementations for Automated Storage and Retrieval Systems, developed by Zakarya Amara and hosted on the AS/RS Research Dataverse. Updated in March 2026, the collection provides standardized data derived from scientific literature to support the modeling and optimization of warehouse automation. It specifically includes data related to Flow Rack AS/RS configurations and operations research problems.
Georob is an R package providing functions for fitting linear models with spatially correlated errors using robust and Gaussian (Restricted) Maximum Likelihood methods. The package includes utilities for variogram modeling, cross-validation, conditional simulation of Gaussian processes, and robust point and block external-drift Kriging predictions. It was authored by Andreas Papritz and references foundational statistical work from 1977 to 2013.
A dataset likely containing retail transaction records for market basket analysis. The data is hosted on Kaggle, but its specific origin, size, and creation date are unknown. Columns and sample data are unavailable, limiting immediate assessment of its content and structure.
ScottKnottESD implements a mean comparison approach using hierarchical clustering to partition treatment means into statistically distinct groups. The method is described in a 2018 paper by Chakkrit Tantithamthavorn, published in IEEE Transactions on Software Engineering. The dataset likely contains results or parameters for applying this statistical test to fields like model evaluation or feature importance analysis.
Functions for performing the Bayesian bootstrap as introduced by Rubin (1981). The implementation can handle summary statistics that work on a weighted version of the data and on a resampled data set. The package was authored by Rasmus Bååth.
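As a rough illustration of the weighted-statistic variant, the sketch below draws Dirichlet(1, ..., 1) weights over the observations and computes a weighted mean for each draw. It is written in Python using only the standard library and does not reproduce the package's own interface; the function name `bayesian_bootstrap_mean` and its parameters are hypothetical.

```python
import random


def bayesian_bootstrap_mean(data, n_draws=1000, seed=0):
    """Return posterior draws of the mean via the Bayesian bootstrap.

    Hypothetical helper for illustration; not the package's API.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Dirichlet(1, ..., 1) weights: normalize independent Exp(1) variates.
        raw = [rng.expovariate(1.0) for _ in data]
        total = sum(raw)
        weights = [r / total for r in raw]
        # The summary statistic is evaluated on the weighted data.
        draws.append(sum(w * x for w, x in zip(weights, data)))
    return draws


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    posterior = sorted(bayesian_bootstrap_mean(sample, n_draws=2000, seed=1))
    # Approximate 95% credible interval from the sorted posterior draws.
    print(posterior[50], posterior[1949])
```

Each draw is a convex combination of the observations, so the posterior draws always lie between the smallest and largest data values. Statistics that cannot take weights would instead be computed on a data set resampled according to those weights.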
Statistical modeling for correlated count data using the beta-binomial distribution, described in Martin et al. (2020). The method allows for covariates to model both the mean and overdispersion of the data. The dataset likely contains the statistical models and associated data used for the described regression analysis.
Prioritizr is an R package for systematic conservation prioritization using mixed integer linear programming (MILP). It provides a flexible interface for building and solving conservation planning problems, with solutions guaranteed to be optimal or within a specified gap. The package was authored by Jeffrey O. Hanson, with a related paper published in 2025.
A comparison of differential equation solver suites across MATLAB, R, Julia, Python, C, Mathematica, Maple, and Fortran. The dataset was created by Christopher Rackauckas and is published under an Open Access (diamond) license on Papers with Code. It aims to detail the construction and rationale of each software suite to help users select appropriate tools.
A Thurstone scaling technique was applied to construct a measure of a subject's affective reaction under field experimental conditions. The scale, developed by Robert H. Kerle, was validated using four contrived and natural situations to test its application and reliability. It detected significant affective changes in situations judged stressful by the experimenters.
An R package for statistical procedures in agricultural research, originally presented in a Master's thesis at the National Engineering University (UNI) in Lima, Peru. It offers functionality for planning experimental designs like lattice, Alpha, Cyclic, and factorial designs. The package also provides analysis facilities for treatment comparisons, non-parametric tests, and biodiversity indexes.
afex is an R package providing convenience functions for analyzing factorial experiments. It supports ANOVA and mixed models for between-subjects, within-subjects, and mixed designs, automatically aggregating data in long format. The package, authored by Henrik Singmann, includes functions for model fitting and high-level plotting with ggplot2.