DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Mathematics & Statistics Datasets | DataSalon

Mathematics & Statistics

Mathematical datasets, statistical benchmarks, probability, optimization, operations research

2,485 datasets

Math Strategy Diversity Evaluation Framework: LLM Reasoning Assessment

Math Strategy Diversity Evaluation Framework is a dataset for evaluating Large Language Model mathematical reasoning. It likely contains problems and reference solutions based on the American Mathematics Competitions (AMC/AIME) and the Art of Problem Solving (AoPS) platform. The dataset's author, organization, and exact size are unknown.

Tabular, Mathematical Reasoning, Benchmark, LLM Evaluation, AMC AIME, AoPS Strategies, +1


LearnBayes: Functions for Learning Bayesian Statistical Inference

A collection of functions for learning Bayesian statistical inference, created by Jim Albert. It contains functions for summarizing basic one and two parameter posterior and predictive distributions. The collection also includes MCMC algorithms for user-defined posteriors, plus functions for regression models, hierarchical models, Bayesian tests, and Gibbs sampling illustrations.

Tabular, Bayesian Inference, Inference, Bayesian Probability, Machine Learning, Hierarchical Models, Computer Science, Artificial Intelligence, Statistical Functions, MCMC Algorithms, Regression Models, +1


LDA: Implementation of Topic Models with Collapsed Gibbs Sampling

Jonathan Chang's software implements latent Dirichlet allocation and related models like sLDA and corrLDA. The core inference for these models is performed via a fast collapsed Gibbs sampler written in C. Utility functions for reading/writing topic model data and examining posterior distributions are also included.

Text, Bayesian Probability, Bayesian Statistics, Gibbs Sampling, Computer Science, Mathematics, Latent Dirichlet Allocation, Computer Vision, Artificial Intelligence, Topic Modeling, Statistics, Sampling Signal Processing, +1


Steepness: Dominance Hierarchy Slope Analysis

Steepness is a statistical property of dominance hierarchies defined as the slope fitted to normalized David's scores. The steepness package computes this metric from observed sociomatrices and estimates statistical significance via randomization tests. Authors David Leiva and Han de Vries developed this method for analyzing dyadic dominance indices.

Tabular, Animal Behavior, Econometrics, Computer Science, Mathematics, Biology, Social Behavior, Dominance Genetics, Sociomatrix, Dominance Hierarchies, +1


Statistical Shape Analysis Routines for Landmark Data

Routines for the statistical analysis of landmark shapes, including Procrustes analysis, graphical displays, and principal components analysis. The dataset is based on methods from the 2016 textbook 'Statistical Shape Analysis, with Applications in R' by Ian Dryden and K.V. Mardia. It is hosted on the Papers with Code platform.

Tabular, Computer Science, Thin Plate Spline, Mathematics, Statistical Analysis, Procrustes Analysis, Statistics, Morphometrics, Statistical Shape Analysis, Landmark Data, +1


pROC: Tools for ROC Curve Visualization and Statistical Comparison

Xavier Robin's pROC provides tools for visualizing, smoothing, and comparing receiver operating characteristic (ROC) curves. The package allows for statistical comparison of (partial) area under the curve (AUC) values using U-statistics or bootstrap methods. Confidence intervals can be computed for (p)AUC or the ROC curves themselves.

Tabular, Computer Science, Data Visualization, Computer Graphics Images, ROC Curves, Machine Learning Metrics, +1


ISMEV: Functions for Statistical Modeling of Extreme Values

Functions supporting the computations in the textbook 'An Introduction to Statistical Modeling of Extreme Values' by Stuart Coles. The package, authored by Janet E. Heffernan, provides tools for maxima/minima, order statistics, peaks over thresholds, and point processes. The dataset's size, temporal coverage, and row count are not provided in the available metadata.

Tabular, R Package, Extreme Value Analysis, Computer Science, Computational Methods, Statistics, +1


WebPower: Statistical Power Analysis Tools for Common Models

A collection of tools for conducting statistical power analysis across methods including correlation, t-tests, ANOVA, regression, and structural equation modeling. The collection was created by Zhiyong Zhang and serves as the engine for the WebPower online analysis platform.

Tabular, Computer Science, Data Science, Psychometrics, Mathematics, Statistical Analysis, Statistical Modeling, Power Analysis, Statistics, Computer Security, +1


mvabund: Statistical Methods for Multivariate Abundance Data in Ecology

A set of tools for displaying, modeling, and analysing multivariate abundance data in community ecology. The package is implemented with the GNU Scientific Library and Rcpp R/C++ classes. Author Yi Wang developed this statistical package for ecological research.

Tabular, Ecology, Abundance Ecology, Mathematics, Biology, Multivariate Statistics, Community Ecology, Multivariate Analysis, Geography, Statistics, Abundance Data, +1


rbacon: Bayesian Age-Depth Models for Deposits

An approach to age-depth modelling that uses Bayesian statistics to reconstruct the accumulation histories of deposits. The method combines radiocarbon dates with prior information on accumulation rates and their variability, as described by Blaauw & Christen (2011). The dataset likely contains parameters and outputs from such Bayesian reconstructions.

Tabular, Bayesian Inference, Bayesian Probability, Bayesian Statistics, Computer Science, Mathematics, Statistics, Accumulation History, Age Depth Modelling, Paleoenvironment, +1


ncf: Spatial Covariance Functions and Geostatistical Tools

Spatial (cross-)covariance and related geostatistical tools, including the nonparametric (cross-)covariance function, the spline correlogram, and the nonparametric phase coherence function. The dataset, authored by Ottar N. Bjørnstad, is sourced from the Papers with Code platform. Details on data volume, format, and temporal coverage are not provided in the metadata.

Geospatial, Covariance Intersection, Spatial Statistics, Covariance Functions, Computer Science, Mathematics, Covariance Function, Nonparametric Methods, Statistics, Covariance, Matérn Covariance Function, Geostatistics, +1


Rtsne: R Wrapper for Barnes-Hut t-SNE Implementation

Rtsne is an R package wrapper for the fast Barnes-Hut implementation of t-distributed Stochastic Neighbor Embedding (t-SNE) by Laurens van der Maaten. The package was authored by Jesse H. Krijthe. The dataset likely contains high-dimensional data suitable for dimensionality reduction via this t-SNE algorithm.

Tabular, Machine Learning, Dimensionality Reduction, Computer Science, Mathematics, Embedding, Artificial Intelligence, Statistics, +1


Conformal Field Theory Textbook with Exercises and Background Material

A pedagogical text that develops conformal field theory from first principles. The treatment is self-contained and includes background material on quantum field theory, statistical mechanics, and Lie algebras. It is intended for graduate students and researchers in theoretical high-energy physics, mathematical physics, and condensed matter theory.

Text, Field Theory Psychology, Graduate Students, Mathematics, Psychology, Complement Music, Quantum Field Theory, Mathematical Physics, Conformal Map, Physics, Geometry, Quantum Mechanics, Conformal Field Theory, Pure Mathematics, Theoretical Physics, Field Mathematics, +1


PolyMath: 11,090 High-Difficulty Competition Math Problems

PolyMath is a curated dataset of 11,090 high-difficulty mathematical problems designed for training reasoning models. It was created by AIMO-Corpus for the AIMO Math Corpus Prize and was last updated on February 9, 2026. The dataset addresses noise and usability issues found in other math datasets.

Text, Mathematics, AI Training, Reasoning, Natural Language Processing, Competition Problems, +1


Fractured Rock Flow Models for Multi-Scale Heterogeneity

Project details describe novel numerical methods for modeling flow and transport in fractured rock systems. The research aims to bridge mathematical advances in fractal systems with hydrogeological needs for modeling multi-scale heterogeneity and connectivity. The project is led by the British Geological Survey and was last updated in March 2026.

NERC DDC, Hydrogeology, Fluid dynamics, Fluid flow, +1


Negative Control Falsification Tests: Data and Code for IV Designs

This replication package contains the data and code for implementing negative control falsification tests in instrumental variable (IV) designs, authored by Oren Danieli and updated in 2026. The collection facilitates conditional independence tests between negative control variables and IVs or outcomes to validate identification assumptions and functional form.

Placebo Tests, Instrumental Variables, Negative Control, Falsification Tests, +1


Cognitive Performance and Biometric Optimization Data

A dataset from Kaggle exploring the relationship between bio-rhythms, deep work habits, and daily focus scores. The author and organization are unknown. The last update date and specific data volume are not provided.

Tabular, Focus Prediction, Cognitive Performance, Biometrics, Behavioral Science, +1


MedProofX H100 Stable Sourcehouse: AI-Generated Medical Proof Data

MedProofX H100 Stable Sourcehouse is a dataset published on Kaggle. The title suggests it contains data related to medical proof or validation, potentially generated using AI models like Stable Diffusion. Its specific content, size, and structure require verification after download.

Tabular, Medical Proof, H100, Source Code, Stable Diffusion, AI Training, +1


MedProofX H100 SourceHouse: Medical Evidence Data

A dataset titled 'medproofx_h100_sourcehouse' published on Kaggle. The title suggests a focus on medical evidence or source information, possibly related to clinical or pharmaceutical data. The specific content, size, and authorship details are not provided in the available metadata.

Tabular, Medical Proof, Source House, Clinical Data, +1


Pandera: Statistical Data Validation for Pandas Dataframes

A talk by Niels Bantilan introduces pandera, an open source Python package for validating pandas dataframes. The presentation covers data validation theory and practice, using a case study analysis of the Fatal Encounters dataset to demonstrate how the tool can improve reproducibility and reliability in data analysis and machine learning.

Tabular, Machine Learning, R Package, Programming Language, Computer Science, Python, Artificial Intelligence, Data Validation, Software, Open Source, Python Programming Language, Pandas, Reproducible Research, Data Mining, +1
