DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Mathematics & Statistics Datasets | DataSalon

Mathematics & Statistics

Mathematical datasets, statistical benchmarks, probability, optimization, operations research

3,064 datasets

Spider DPO 1040: Preference Pairs for Text-to-SQL Model Training

Spider DPO 1040 is a compact training dataset containing 1,040 preference pairs for Direct Preference Optimization, derived from frontier-model disagreements on the Spider V1 benchmark. It also includes 7,000 supervised training examples from Spider formatted for use with LLaMA-Factory. The dataset was created by jk200201 and was last updated on July 5, 2026.

Text, Language Model Finetuning, Preference Learning, Text To Sql, Benchmark, Sql Generation, +1

A Unified Bayesian Framework for Cross-Technology Collision Cross Section Postcalibration

Yi-Hui Zhou presents a hierarchical Bayesian framework for harmonizing collision cross section (CCS) measurements across ion mobility platforms. The dataset includes 840 measurements for 347 compounds from a multilaboratory study, used to validate the framework. The approach reduced intertechnology CCS variability by approximately 95% and lowered median absolute percentage error from 8.9% to 3.2%.

Tabular, Excel, Bayesian Statistics, Collision Cross Section, Analytical Chemistry, Instrument Harmonization, Ion Mobility Spectrometry, +1

UK Alcohol Duty Clearances and Receipts Monthly Bulletin

Monthly statistics from HM Revenue and Customs track clearances of alcohol products and the resulting duty receipts for the United Kingdom. The data is designated as National Statistics, indicating it meets official quality standards. It is published in HTML format and was last updated in June 2026.

Tabular, Time Series, Tax Revenue, Monthly Data, Uk Economy, Alcohol Duty, Government Statistics, +1

PlanetGSD 1.0: Grain-Size Distribution Data for Earth, Moon, and Mars

PlanetGSD 1.0 is a cross-planetary grain-size distribution dataset containing separate tables for Terrestrial, Lunar, and Martian soils. The dataset includes 3.7 MB of data in TXT and XLSX formats, authored by Jun Zhang and last updated on 2026-05-22. It also provides site descriptions, image references for Martian soils, and statistical simulation and fitting code.

Tabular, Text, Excel, Geology, Soil analysis, Planetary Science, Grain Size Distribution, +1

Average 1-NHV Values of QMOHH and Variants

Average 1-NHV values of QMOHH and its variants are presented in a 17.5 KB Excel file. The data was uploaded by Haoyi Zhao to figshare and last updated on May 21, 2026. It likely contains performance metrics from a computational study on reconfigurable assembly line scheduling.

Tabular, Excel, Multi Objective Optimization, Benchmark Instances, Benchmark, Manufacturing Scheduling, Hyper Heuristic, Reconfigurable Assembly, Synthetic, +1

Pareto Solutions for a Reconfigurable Assembly Line Scheduling Model

This dataset contains a set of Pareto-optimal solutions obtained from a multi-objective mathematical model for reconfigurable assembly line scheduling. Shared by Haoyi Zhao on figshare, it was last updated on 2026-05-21. It contains results from a numerical case study and computational experiments on 120 benchmark instances.

Tabular, Excel, Multi Objective Optimization, Benchmark, Assembly Line, Manufacturing Scheduling, Hyper Heuristic, Pareto Front, Synthetic, +1

Part-Requirement Matrices for Reconfigurable Assembly Line Scheduling

5.5 KB of part-requirement matrices for different product types, supporting a study on reconfigurable assembly line scheduling. The dataset, authored by Haoyi Zhao and last updated in May 2026, was used to formulate a multi-objective mathematical model minimizing reconfiguration cost and balancing production workloads. A computational study on 120 generated benchmark instances demonstrated the performance of a proposed Q-learning-based hyper-heuristic algorithm.

Tabular, Excel, Scheduling, Benchmark, Assembly Line, Part Requirement, Optimization, Manufacturing, Synthetic, +1

Benchmark Instances for Reconfigurable Assembly Line Scheduling Optimization

pone.0348884.t011 contains benchmark instances for a reconfigurable assembly line scheduling problem. The dataset supports a study proposing a Q-learning-based multi-objective hyper-heuristic algorithm. It was authored by Haoyi Zhao and last updated on 2026-05-21.

Tabular, Excel, Multi Objective Optimization, Benchmark Instances, Benchmark, Manufacturing Scheduling, Hyper Heuristic, Q Learning, Synthetic, +1

Reconfigurable Assembly Line Scheduling Benchmark Instances

120 generated benchmark instances for a reconfigurable assembly line scheduling problem. The dataset supports a study proposing a Q-learning-based multi-objective hyper-heuristic algorithm to optimize product sequencing, reconfiguration cost, workload equalization, and logistics leveling. It was authored by Haoyi Zhao and last updated on 2026-05-21.

Tabular, Excel, Multi Objective Optimization, Benchmark, Assembly Line, Manufacturing Scheduling, Hyper Heuristic, Q Learning, Synthetic, +1

pone.0348884.t009: Reconfigurable Assembly Line Scheduling Benchmark Instances

120 generated benchmark instances for a reconfigurable assembly line scheduling problem. The dataset supports a study proposing a Q-learning-based multi-objective hyper-heuristic algorithm and is provided by author Haoyi Zhao in an XLS file. It was last updated on 2026-05-21.

Tabular, Excel, Multi Objective Optimization, Benchmark, Manufacturing Scheduling, Hyper Heuristic, Q Learning, Reconfigurable Assembly, Synthetic, +1

pone.0348884.t008: Reconfigurable Assembly Line Scheduling Benchmark Results

A figshare dataset by Haoyi Zhao, last updated on 2026-05-21. It contains results from a numerical case study and computational experiments on 120 generated benchmark instances for a reconfigurable assembly line scheduling problem. The study formulates a multi-objective mathematical model and proposes a Q-learning-based hyper-heuristic algorithm.

Tabular, Excel, Multi Objective Optimization, Benchmark, Manufacturing Scheduling, Hyper Heuristic, Q Learning, Reconfigurable Assembly, Synthetic, +1

Rat Toxicology Simulation Data on Virtual vs. Concurrent Control Agreement

Supplementary file 1_Analysis of rat toxicology studies presents simulation results comparing virtual control groups (VCGs) to concurrent control groups (CCGs) for detecting treatment effects on liver enzymes and bodyweight. The dataset, authored by Guillemette Duchateau-Nguyen and last updated in April 2026, contains results from 100 VCGs generated per reanalyzed study to assess statistical agreement.

Tabular, Animal Studies, Toxicology, Large Scale, Statistical Simulation, Liver Enzymes, Virtual Controls, Synthetic, +1

InP/InAsP Microring Laser Data for Multi-Objective Bayesian Optimization

1.9 GB of datasets linking epitaxial growth conditions, device geometry, and lasing performance for InP/InAsP multi-quantum well microring lasers. The data includes optical microscopy images, binarized ring images, and field-level statistics for variance-aware optimization. Mihir Rajendra Athavale published this dataset on figshare in 2026.

Image, Tabular, HDF5, Semiconductor Physics, Machine Learning, Benchmark, Bayesian Optimization, Computer Vision, Laser Performance, Epitaxy, +1

Vitamin D-Related Genetic Variants and Breast Cancer Risk in Jordanian Women

300 patients with breast cancer and 300 healthy controls were genotyped for polymorphisms in CYP2R1, CYP27B1, CYP24A1, and DBP genes. Laith N. Al-Eitan authored this case-control study, which was last updated on 2026-05-28. The results suggest associations between specific genotypes and altered breast cancer risk.

Tabular, Excel, Genetic Variants, Case Control Study, Jordanian Population, Vitamin D, Breast cancer, +1

Bayesian Model Search for Nonstationary Periodic Time Series

A methodology paper with applications in e-Health and sleep research, proposing a Bayesian approach for analyzing nonstationary time series with oscillatory behavior. The method uses a trans-dimensional Markov chain Monte Carlo algorithm to estimate change-points and periodicities. The work is authored by Beniamino Hadj-Amar and is available under an Open Access license.

Time Series, E Health, Time Series Analysis, Healthcare, Nonstationary Signals, Bayesian methods, Change Point Detection, +1

Double Entry Volumetric Models for Eucalyptus Trees in Brazil

180 tree samples of Eucalyptus urophylla and Eucalyptus grandis aged 5-7 years were used to develop volumetric models for quantifying wood volume in Brazil. The study, authored by Valdir Carlos Lima de Andrade, evaluated model performance using statistical criteria including standard error of the estimate and adjusted coefficient of determination. A volumetric model adapted from the form factor equation to the Gompertz biomathematical model was selected.

Tabular, 🇧🇷 Brazil, Eucalyptus, Biomathematics, Forestry, Volumetric Models, +1

CRISP: Bayesian Model Data for Sex Estimation from Cremated Human Remains

A prehistoric Italian sample of 155 individuals with gender-specific grave goods was used to develop a Bayesian model for sex assessment in cremation contexts. The model is based on 21 postcranial metric variables and was validated using an independent Austrian prehistoric sample of 45 individuals. The dataset supports the CRISP open-source software and demonstrates the model's 89% accuracy in sex prediction.

Tabular, CSV, Anthropology, Sex Estimation, Bioarchaeology, Bayesian model, Cremated Remains, +1

Lactulose Synthesis Optimization from Whey Permeate Using Factorial Design

50.06 grams of lactulose per 100 grams of whey powder was obtained under optimal conditions. This dataset likely contains experimental results from a study optimizing lactulose synthesis from bovine whey permeate using response surface methodology. The study, authored by Fernanda Caspers Zimmer, investigated reaction time and isomerization type to maximize yield.

Tabular, Whey Permeate, Lactulose Synthesis, Factorial design, Response Surface Methodology, Dairy Industry, +1

Survey Data on Smart Open Innovation Adoption Among Manufacturing SMEs in Singapore

A mixed-methods study from 2026 by Sanmugam Annamalah explores factors influencing Smart Open Innovation adoption. It uses survey data collected directly from Singaporean manufacturing SMEs producing technology-based products. The research employs a positivist paradigm and statistical methods to analyze relationships between variables like digital readiness and sustainability orientation.

Tabular, CSV, Singapore, Supply Chain Collaboration, Digital Transformation, Manufacturing Smes, Smart Open Innovation, +1

Emotional Manifestation in Mathematics Learning: 349 Chilean Secondary Students

349 secondary school students from six schools in a Chilean region participated in a quantitative study linking cognitive and affective variables. The research by Verónica Marín Díaz investigates how emotions manifest while students learn to model the linear function. Results indicate a link between mathematics performance and emotionality variables, but no relationship between gender and emotions.

Tabular, Chilean Education, Student Emotions, Mathematics Education, Affective Domain, Quantitative Research, +1
