Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,469 datasets
A software package for Bayesian Variable Selection and Model Averaging in linear and generalized linear models, developed by Merlise A. Clyde. It implements stochastic or deterministic sampling without replacement from posterior distributions, supporting various prior distributions and model selection criteria. The work was supported by the National Science Foundation under grant 1106891.
Implements a James-Stein-type shrinkage estimator for covariance matrices, with separate shrinkage for variances and correlations. The method is described in Schafer and Strimmer (2005) and Opgen-Rhein and Strimmer (2007). The package provides functions for computing partial correlations, matrix inverses, powers, singular value decomposition, and checking matrix properties.
A statistical software package for performing Bayesian model-averaged meta-analysis, as proposed by Gronau et al. (2017). It computes posterior model probabilities for null, fixed-, and random-effects models, allowing for a wide range of priors on effect size and heterogeneity. The tool, authored by Daniel W. Heck, supports models with continuous and discrete moderators using pre-compiled Stan models.
A collection of visualization primitives for the 'ggplot2' R package, designed to represent statistical distributions and uncertainty. The package, authored by Matthew Kay, supports both frequentist and Bayesian modes, handling analytical distributions and sample-based representations. It includes methods such as eye plots, quantile dot plots, and fit curves with multiple uncertainty ribbons.
A software package implementing methods for analyzing survey data collected via the item count technique, also known as the list experiment. The package includes Bayesian MCMC regression for standard and multiple sensitive item designs, hierarchical models, combined experiment models, and diagnostic tests for response error. It is authored by Graeme Blair and implements methods from a series of academic papers published between 2011 and 2018.
Daily gap-free sea surface temperature maps for the Mediterranean Sea at a 0.0625-degree horizontal resolution. The data are derived from satellite infrared radiometer measurements and statistical interpolation. It is the nominal operational sea surface temperature product from CMEMS.
The Mediterranean Sea is covered by daily gap-free sea surface temperature maps at a 0.01-degree horizontal resolution. These L4 analysis data are derived from infrared satellite radiometer measurements and statistical interpolation. The product is the nominal operational sea surface temperature dataset from the Copernicus Marine Environment Monitoring Service.
Geochemical and Bayesian modeling data that identify and quantify sediment sources for the Fitzroy River coastal zone in Queensland, Australia. The study reveals a recent increase in contributions of basaltic material, now the dominant source, with an estimated enrichment of approximately three times its catchment abundance. The analysis indicates consistent weathering and transport regimes throughout the Holocene.
Radiance measurements from the CrIS instrument and collocated VIIRS imager data from the NOAA-21 satellite. The dataset includes 2,223 hyperspectral channels across shortwave, midwave, and longwave infrared bands, with VIIRS providing 22 bands of high-resolution imagery and cloud mask statistics for each CrIS field of view. Data is produced in six-minute granules by the GES DISC, with a documented update in March 2026.
AlphaXiv's dataset contains prepared splits of the GSM8K grade school math problems for comparing Evolution Strategies and Group Relative Policy Optimization methods in LLM fine-tuning. It includes 6,725 training samples, 1,867 validation samples, and 200 test samples. The dataset was last updated on March 5, 2026.
NOAA Operational Forecast System-Related Guidance Products provide daily and next-day probability forecasts for Vibrio vulnificus bacteria in the Chesapeake Bay. Forecasts are generated by forcing a statistical model with temperature and salinity averaged over the top 1 meter and 24 hours centered on 06Z. The data is produced by NOAA_NCEI.
Quarterly and annual reports from the Australian Taxation Office detail Self-Managed Superannuation Funds (SMSFs). These reports present tables on member demographics, fund assets, contributions, benefit payments, and performance derived from SMSF annual returns lodged with the ATO.
figshare hosts a 3.1 KB CSV supplementary file for the article 'Evaluating Tuberculosis Intervention Strategies Through Mathematical Models: A Systematic Review'. Authored by Mohammad Hanif Takal, this data supports the review's findings.
Empirical mean, standard deviation, and standard error of the mean for model parameters estimated using a Bayesian dynamic linear model (BDLM). The 9.5 KB Excel file, created by George Bamwebaze and last updated in March 2026, contains summary statistics likely derived from a study on neonatal mortality in Uganda. The model's description suggests it incorporates temporal dimensions and related covariates to provide forecasts.
A 2026 dataset by Iman Hindi comparing the behavior of reinforcement learning algorithms for stochastic greenhouse control in agricultural automation. It is a 5.5 KB Excel file of time series data; row and column counts are not stated.
Statistical results from Mann-Whitney U tests comparing soil temperature and moisture measurements. The dataset, authored by Jan-Michael Schönebeck, is a 5.5 KB Excel file published in March 2026 under a CC BY 4.0 license.
marginaleffects is an R package for computing predictions, comparisons, slopes, and marginal means from statistical models. It supports over 100 classes of models and calculates uncertainty via the delta method, bootstrapping, or simulation. The package is authored by Vincent Arel-Bundock and detailed in a 2024 Journal of Statistical Software article.
ggstatsplot is an R package that extends 'ggplot2' to create graphics with statistical test details included directly in the plots. The package, authored by Indrajeet Patil and referenced in a 2021 JOSS publication, provides a simplified syntax for generating plots for continuous and categorical data analysis. It supports common statistical approaches including parametric, nonparametric, robust, and Bayesian versions of tests like t-tests, ANOVA, correlation, and regression.
Monthly gaming machine data for all Statistical Area 2 areas in Queensland. The dataset is provided by the Queensland Department of Justice and was last updated in March 2026. It likely contains metrics on gaming machine activity aggregated by small statistical regions.
Raw data for calculations and graph plotting related to the functionalization of bark lignins. The dataset, authored by Véronic Landry and last updated in March 2026, includes files for process optimization and life cycle insights. It is a 9.4 MB collection of files in formats including XLSX, R, and PNG.