Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,458 datasets
A cleaned mathematics supervised fine-tuning dataset built by kaushik-harsh-99 and last updated on 2026-05-17. It contains instruction-solution pairs, mathematical proofs, derivations, and olympiad-style solutions for theorem reasoning and stepwise explanations. The dataset is designed specifically for mathematical supervised fine-tuning and removes explicit chain-of-thought tags.
949,100 Monte Carlo simulation replications accompany a methodological study on stabilizer variables for measurement invariance. The results are organized into six phases, including a core performance evaluation of 800,000 replications and sensitivity analyses. The dataset, created by Salim Yılmaz, was updated in March 2026.
A figshare-hosted dataset by Jihang Jia, last updated March 2026, containing materials for a statistical test of the mean matrix. The 3.4 MB resource includes PDF, ZIP, and TXT files detailing a projection-based method that incorporates structural information of matrix-valued data.
A 39.8 KB dataset from figshare, authored by Daniel Ernesto Rojas Ventura and last updated on 2026-04-24. It contains consolidated architectural correlation tables mapping physical constants like the Bekenstein Bound and Planck Area to signed integer limits. The dataset also includes a forensic validation of the 1986 Chernobyl disaster as a macro-scale arithmetic overflow event.
CO-OPS water level stations across coastal U.S. states and territories provide annual exceedance probability levels for extreme high and low water events. The National Oceanic and Atmospheric Administration produced this dataset by analyzing historical data from stations with at least 30 years of records. The statistical analysis focuses on storm tides, excluding wave runup and tsunami peaks.
Rui Yang provides a 17.0 MB ZIP file supplementing the paper 'Compact photonic spiking neuron with inherent stochasticity based on phase-change material for probabilistic computing'. The dataset was last updated on 2026-04-24 and is shared under a CC-BY-4.0 license on figshare. The specific data types and row counts are not detailed in the available metadata.
Farshad Farahbod of Islamic Azad University, Firoozkooh Branch conducted research on applying nanoparticles for cooling in drilling operations. The dataset likely contains experimental measurements and numerical results from a mathematical model predicting temperature profiles in the drilling zone. Studies show that nanofluid injection into a heat pipe significantly reduces the temperature profile.
A mathematical dataset from a paper by Xiangling Li of Hebei University of Architecture. It supports theorems on strong coupled fixed points via cyclic contractive mappings in fuzzy metric spaces. The theorems are applied to establish the existence of a common solution for two Urysohn-type integral equations.
A stochastic programming model combined with an agent-based spread model for the Asian Papaya Fruit Fly in Queensland, Australia. It was developed by Hoa Thi Minh Nguyen of the Australian National University and is calibrated to a highly detailed land-use raster map (50m×50m) and weather-related data. The model is validated against a historical outbreak and used to assess current surveillance levels.
A working paper from the Reality Drift Archive examines the concept of 'workslop'—AI-assisted output that appears polished but lacks substantive advancement. The paper situates this within the Optimization Trap, analyzing how efficiency gains reshape perceptions of value and authenticity. It was last updated on April 26, 2026.
Allaberen Ashyralyev from Near East University authored a paper studying the space identification problem for the elliptic-telegraph differential equation in Hilbert spaces. The work proves a main theorem on the stability of this problem. Applications include establishing stability theorems for three source identification problems for one-dimensional and multidimensional elliptic-telegraph differential equations.
This dataset contains annual population figures from 1960 to 2018 for six countries: the USA, France, Britain, Italy, Spain, and Turkey. It was created by Ertuğrul Karaçuha of Istanbul Technical University to model and predict population trends using fractional calculus and the Least Squares method. The data is intended for research into mathematical modeling techniques applied to demographic forecasting.
Scottish Sea Fisheries Statistical Districts reflect the area of responsibility of local fishery offices. The dataset shows the approximate extent of each district using the 1:50,000 scale OS Meridian 2 mean high water spring coastline. Districts were updated in 2013 to reflect changes in the Ullapool fishery office and rename the Pittenweem district as Anstruther.
Sediment sources to the Fitzroy River coastal zone have been identified and quantified using an integrated geochemical and modelling approach. The dataset, provided by the Australian Ocean Data Network, indicates a sediment composition consistent with derivation from mixed catchment sources. A Bayesian statistical model revealed changes in catchment sediment sources over time, with the proportion of basaltic material increasing and becoming dominant with an estimated enrichment of ca. 3 relative to catchment abundances.
An R script authored by Luigi Baciadonna, last updated on June 2, 2026. It contains the code used to perform all statistical analyses for a research manuscript on temporal consistency in bumblebee judgment biases. The script is shared under a CC-BY-4.0 license on figshare.
A 2026 study from figshare details the structure-activity relationship of over 80 sulfonyl-containing analogs designed as influenza A virus hemagglutinin inhibitors. The dataset includes potency measurements (EC50) for compounds like SHJ-027 and (S)-63, with the lead compound showing over 10-fold enhanced potency against an oseltamivir-resistant H1N1 strain. In vivo efficacy data from a lethal mouse model is also provided, showing compounds achieving 20–30% survival.
GEmO (Gemini-Empowered Olympiad Math Dataset) is a programmatically verified collection of advanced mathematical problem-solving trajectories. It contains exactly 11,500 unique rows of college-level, honors, and Olympiad-level mathematics across 11 specialized domains. The dataset was released by the Surpem team and was last updated on 2026-05-17.
A 1967–1999 time series on counterbalancing and coup-proofing strategies introduced in academic research by Pilster and Böhmelt. The dataset is the most recent version of data used in studies analyzing the relationship between regime type, civil-military relations, and military effectiveness. It is hosted by the Harvard Dataverse and was last updated in May 2026.
Five distinct seabed sediment classes were statistically defined for Keppel Bay, a macrotidal interface between the Fitzroy River and the Great Barrier Reef shelf. The Australian Ocean Data Network published this assessment, which combined sediment sampling with acoustic seabed mapping. The dataset was last updated in April 2026.
Australian Ocean Data Network provides a dataset describing the use of cross-spectral techniques and admittance functions to model isostatic processes. The dataset likely contains mathematical filters representing the relationship between gravity and topography for different lithospheric rheologies. It was last updated on 2026-04-16.