DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Mathematics & Statistics Datasets | DataSalon

Mathematics & Statistics

Mathematical datasets, statistical benchmarks, probability, optimization, operations research

2,485 datasets

North American Cross-Border Freight Data by Mode and Commodity

The Bureau of Transportation Statistics TransBorder Freight program provides U.S. cross-border freight data with Canada and Mexico. Data includes mode of transportation, commodity type, and geographic detail for exports and imports, used for trade corridor studies and infrastructure planning. BTS publishes a monthly statistical release highlighting key trends.

Freight Trucking

nimbleSCR: Bayesian Spatial Capture-Recapture Modeling Utilities

nimbleSCR provides utility functions, distributions, and fitting methods for Bayesian Spatial Capture-Recapture (SCR) and Open Population Spatial Capture-Recapture (OPSCR) modeling. The package, authored by Richard Bischof, is built using the nimble package and was motivated by the need for flexible and efficient analysis of large-scale SCR data.

Tabular, Medicine, Bayesian Statistics, Environmental science, Ecology, Nimble, Computer Science, Mark And Recapture, Spatial Capture Recapture, Large Scale, Population Modeling

astsa: Applied Statistical Time Series Analysis Data and Scripts

Data sets and scripts for analyzing time series in both the frequency and time domains, including state space modeling. The collection supports the textbooks 'Time Series Analysis and Its Applications: With R Examples' (5th ed, 2025) and 'Time Series: A Data Analysis Approach Using R' (2nd ed, 2026). Most scripts are designed to require minimal input to produce aesthetically pleasing output for ease of use in live demonstrations and course work.

Time Series, Frequency Domain, R Language, Computer Science, Geology, Mathematics, Statistical Analysis, Series Stratigraphy, Statistics, Paleontology

Blandr: Bland-Altman Method Comparison Analysis Package

blandr is a 2015 software package created by Deepankar Datta to carry out Bland-Altman analyses, also known as Tukey mean-difference plots. The package was developed to address the lack of confidence interval calculations in existing functions and to create reproducible plots. A module is also available for the 'jamovi' statistical spreadsheet.

Tabular, Limits Of Agreement, Medicine, Nuclear Medicine, Computer Science, Mathematics, Bland Altman, Medical Research, Bland-Altman Plot, Statistics, Method Comparison
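
For readers unfamiliar with the method, the following is a minimal sketch of the core Bland-Altman calculation: the mean difference (bias) between two measurement methods and the 95% limits of agreement. It is plain Python, not the blandr API. The function name `bland_altman` and the sample values are hypothetical, and the confidence intervals that the package adds are omitted here.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b, z=1.96):
    """Return the bias and limits of agreement for paired measurements.

    Hypothetical helper for illustration only; not part of blandr.
    """
    if len(method_a) != len(method_b):
        raise ValueError("both methods need the same number of measurements")
    if len(method_a) < 2:
        raise ValueError("at least two paired measurements are required")

    # Per-subject difference between the two methods.
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)   # systematic offset between the methods
    sd = stdev(diffs)    # sample standard deviation of the differences

    return {
        "bias": bias,
        "lower_loa": bias - z * sd,  # lower limit of agreement
        "upper_loa": bias + z * sd,  # upper limit of agreement
    }


if __name__ == "__main__":
    # Made-up paired readings from two measurement methods.
    a = [10.0, 12.0, 14.0, 16.0]
    b = [9.0, 12.0, 13.0, 17.0]
    print(bland_altman(a, b))
```

In a Bland-Altman plot, the differences are drawn against the per-pair means, with horizontal lines at the bias and at the two limits.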

KernelBot: 100K+ Optimized AMD MI300 GPU Kernel Submissions

GPUMODE released this dataset in early 2026, containing between 100,000 and 1,000,000 GPU kernel submissions from the KernelBot competition platform. The collection focuses on optimized code specifically targeting AMD MI300 hardware and includes subsets for successful and deduplicated entries.

Modality: text, Size Categories: 100K<n<1M, Code, Modality: tabular, License: CC BY 4.0, Region: us

Qwen3-1.7B Summarization Task Arithmetic: A Benchmark for LLM Evaluation

Kaggle hosts this dataset, which appears to be a benchmark for evaluating the Qwen3-1.7B language model. The title suggests it involves tasks combining summarization and arithmetic reasoning. The dataset's author, size, and specific contents are not detailed in the provided metadata.

Text, Arithmetic Reasoning, LLM Evaluation, Summarization, Task Benchmark

Phosphorus Behavior in Marine Sediments During CO2 Release Experiment

This dataset comes from a 2012 sub-seabed CO2 controlled release experiment in Ardmucknish Bay, Scotland, which assessed impacts on sedimentary phosphorus. The study, published in the International Journal of Greenhouse Gas Control, found no statistically significant effects on solid-phase P content during the experiment. Laboratory analyses using the SEDEX sequential extraction technique revealed differences in P release potential among sediment types.

Tabular, Time Series, Phosphorus Cycling, Marine Sediment, United Kingdom, Carbon Capture, Geological Storage

Community Development Block Grant Grantee Areas: Federal Funding Boundaries

The dataset denotes boundaries for Community Development Block Grant (CDBG) Entitlement Communities and State Administered Non-Entitlement grantees. CDBG is a federal block grant distributed via formula to states and local governments for housing, economic development, and public improvement efforts serving low and moderate-income communities. The Department of Housing and Urban Development maintains this dataset, last updated on March 11, 2026.

Geospatial, CPD, HUD Official Content, Government Funding, Housing, U.S. Department Of Housing And Urban Development, HUD, Community Planning And Development, Urban Planning, Finance, Community Development Programs, Community Planning And Development Grantee Areas, Community Development Block Grants, NGDA, CDBG, Community Development

Open Source Mathematics Olympiad Textbook Collection

This collection features chunked content from 12 open-source mathematics textbooks, including works like 'An Infinitely Large Napkin' and 'Mathematical Reasoning: Writing and Proof'. It is intended for retrieval-augmented generation, embedding, and math reasoning research. The source code for the data pipeline is publicly available on GitHub.

optimized-parquet, Parquet, Size Categories: 1K<n<10K, Library: polars, Task Categories: question-answering, Textbooks, Language: en, Modality: text, Algebra, Mathematics, Task Categories: text-retrieval, Modality: tabular, Library: mlcroissant, Proofs, License: CC BY-SA 4.0, Library: datasets, Library: pandas, Topology, Region: us

Customer Insights Statistical Investigation

This dataset supports a statistical investigation of customer insights and likely contains data for analysis. It is hosted on Kaggle, but its specific origin and creation date are unknown. The number of records and features is not specified in the available metadata.

Tabular, Business Intelligence, Customer Insights, Statistical Analysis

Smart Tourism Service Quality Dataset with IoT Optimization Records

Smart Tourism Service Quality Dataset is a collection of records related to tourism service optimization, likely gathered via Internet of Things (IoT) devices. The dataset is hosted on Kaggle, but details about its creator, size, and specific contents are not provided. Its structure and specific variables are unknown from the available metadata.

Tabular, Service Quality, Tourism, Optimization, IoT

Bayesian Hierarchical Model Data

Bayesian hierarchical model data likely contains parameters, hyperparameters, or simulated observations for statistical analysis. The dataset is hosted on Kaggle, a platform for data science projects. Its specific source, size, and creation date are unknown.

Tabular, Bayesian Statistics, Hierarchical Models, Statistical Modeling

Historical Dutch Population Records Linked to Modern Registers

This dataset links historical life trajectories from the Historical Sample of the Netherlands (HSN) for individuals born between 1812 and 1922 to contemporary outcomes in the System of Social statistical Datasets (SSD). It represents a Proof of Concept linkage, with a revised strategy successfully linking 77% of linkable HSN records. The linkage is based on matching birth dates of the individual, father, and mother, marriage date, and sex.

Arts And Humanities, Social Sciences

SciCode Programming Problems: 22,532 AI-Generated Scientific Computing Tasks

This dataset contains 22,532 AI-generated programming problems inspired by real scientific computing code snippets. Each problem is paired with a solution and focuses on concepts like numerical algorithms, data analysis, and mathematical modeling. The dataset was created by SciCode and was last updated on 2026-02-19.

Text, Scientific Computing, AI Generated, Programming Problems, Code Generation, Large Scale, Synthetic

UK Superficial Deposit Thickness Models at 50-Meter Resolution

Three national geological models covering Great Britain estimate the thickness of Quaternary and younger deposits. The British Geological Survey derived these 50 m x 50 m grids by interpolating borehole records and map data. Models provide indicative thickness values and proximity to source data for geohazard assessment.

Geological Models, Geological Data, NERC DDC, Quaternary, Surficial geology

Deepfake Detection: 15,000 Multi-Modal Tensors for Direct Training

15,000 multi-modal tensors combine CLIP embeddings with statistical features for deepfake detection. The dataset is optimized for direct training of machine learning models. The author, organization, and last update date are unknown.

Multimodal, Machine Learning, Multimodal Tensors, Computer Vision, Deepfake Detection

MathVision-Latex: Handwritten Mathematical Expressions with LaTeX

MathVision-Latex pairs images of handwritten mathematical expressions with corresponding LaTeX code. The dataset appears designed for training models to recognize and transcribe mathematical handwriting. Its source and scale are not detailed in the provided metadata.

Image, Multimodal, LaTeX, Mathematical Expressions, Handwritten Math, OCR

Accuracy Analysis of Numerical Solutions for Ordinary Differential Equations

IOSR Journals presents a dataset from a paper analyzing numerical solutions to initial value problems for ordinary differential equations. The data likely contains results from solving several example problems using the Euler method, comparing approximate and exact solutions. The analysis investigates and computes error for different step sizes.

Tabular, Mathematical Analysis, Initial Value Problem, Ordinary Differential Equations, Computer Science, Mathematics, Eulers Formula, Ordinary Differential Equation, Differential Equation, Value Mathematics, Statistics, ODE, Numerical Analysis, Monotonic Function, Euler Equations, Applied mathematics, Backward Euler Method, Euler Method
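
As an illustration of the kind of comparison described above, the sketch below applies the forward Euler method to a single test problem and reports the absolute error against the exact solution for several step sizes. The test problem (y' = y, y(0) = 1, with exact value e at t = 1), the step counts, and the function names are assumptions chosen for illustration. They are not taken from the paper or the dataset.

```python
import math


def euler(f, t0, y0, t_end, n_steps):
    """Forward Euler approximation of y(t_end) for y' = f(t, y), y(t0) = y0."""
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y += h * f(t, y)  # advance one step along the local slope
        t += h
    return y


def error_table(step_counts=(10, 20, 40, 80)):
    """Rows of (step size, approximation, absolute error) at t = 1.

    Uses the assumed test problem y' = y, y(0) = 1, whose exact value is e.
    """
    rows = []
    for n in step_counts:
        approx = euler(lambda t, y: y, 0.0, 1.0, 1.0, n)
        rows.append((1.0 / n, approx, abs(math.e - approx)))
    return rows


if __name__ == "__main__":
    for h, approx, err in error_table():
        print(f"h={h:.4f}  approx={approx:.6f}  error={err:.6f}")
```

Because forward Euler is a first-order method, halving the step size should roughly halve the error on this problem.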

Haplotype Data and Methods for DNA Sequence Analysis and Parsimony Networks

Caner Aktas provides S4 classes and methods for reading and manipulating aligned DNA sequences. The package supports indel-coding, shows base substitutions and indels, calculates pairwise distances, and collapses sequences into haplotypes. It also includes methods for estimating genealogical relationships among haplotypes using statistical parsimony and plotting parsimony networks.

Tabular, DNA Sequences, Combinatorics, Phylogenetics, DNA Sequencing, Maximum Parsimony, Mathematics, Evolutionary Biology, Genetics, Biology, Computational Biology, Clade, Genotype, DNA, Haplotype, Gene, Rank Graph Theory

rdrobust: Data-Driven Inference for Regression-Discontinuity Designs

rdrobust is a package for statistical inference in regression-discontinuity (RD) designs, a quasi-experimental method popular in social, behavioral, and natural sciences. It provides tools for point estimation, robust confidence intervals, bandwidth selection, and exploratory data analysis in Sharp, Fuzzy, and Kink RD settings. The package was authored by Sebastian Calonico.

Tabular, Inference, Regression Discontinuity Design, Econometrics, Computer Science, Mathematics, Regression, Statistical Inference, Artificial Intelligence, Statistics, Robust Regression, Causal Inference, Data Mining, Regression Discontinuity
