Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,486 datasets
A dataset from paperswithcode concerning solutions to the confluent hypergeometric differential equation. The description details Kummer functions and regularized confluent hypergeometric functions, which are analytic at a regular singular point. The author, organization, and temporal coverage are unknown.
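For orientation, the functions this entry names can be written out using their standard textbook definitions. The dataset's own notation is not given in the metadata, so the symbols $a$, $b$, $z$, $w$ below are an assumption, not the dataset's convention.

```latex
% Kummer's (confluent hypergeometric) equation; z = 0 is a regular singular point
z\,\frac{d^{2}w}{dz^{2}} + (b - z)\,\frac{dw}{dz} - a\,w = 0

% Kummer function: power series about z = 0, with (a)_n the Pochhammer symbol
M(a,b,z) = \sum_{n=0}^{\infty} \frac{(a)_n}{(b)_n}\,\frac{z^{n}}{n!},
\qquad b \neq 0,-1,-2,\dots

% Regularized form: dividing by \Gamma(b) removes the restriction on b
\mathbf{M}(a,b,z) = \frac{M(a,b,z)}{\Gamma(b)}
  = \sum_{n=0}^{\infty} \frac{(a)_n}{\Gamma(b+n)}\,\frac{z^{n}}{n!}
```

Both series converge for every finite $z$, which is the sense in which these solutions are analytic at the regular singular point $z = 0$.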
Statistical information about the participation of women, minorities, and persons with disabilities in science and engineering education and employment. The dataset is sourced from the National Science Foundation (NSF) and aggregated on the paperswithcode platform. The last update date and specific temporal coverage are unknown.
A Kaggle-hosted corpus of mathematical problems focusing on integer-based challenges typical of competition settings. The dataset likely contains problem statements and potentially solutions or annotations, though its exact structure and size are unspecified. Metadata is minimal; the actual content and its utility for training or analysis require verification after download.
A dataset titled 'Theorem - Proof Updated' is hosted on Kaggle. The dataset's specific content, size, and structure are not detailed in the provided metadata. Further inspection after download is required to verify its actual scope and utility.
Observations from 2022 and 2023 field trials assess fall armyworm and corn earworm damage on maize genotypes. The dataset includes categorical 1-9 scale scores for leaf damage, plant aspect, ear aspect, seed set, and ear rot, plus continuous ear damage measurements in mm. It contains the raw data and R code for reproducing Bayesian statistical models from the associated study.
An R package providing unified plotting tools for statistical analysis results. The package, authored by Masaaki Horikoshi, offers a single interface for visualizing results from methods like GLM, time series, PCA, clustering, and survival analysis using the 'ggplot2' style. The dataset's specific size, temporal coverage, and geographic scope are not detailed in the provided metadata.
BayesianTools provides general-purpose Markov Chain Monte Carlo and Sequential Monte Carlo samplers for Bayesian statistics. The package, authored by Florian Härtig, includes various Metropolis MCMC variants, the T-walk, differential evolution MCMCs, DREAM MCMCs, and an SMC particle filter. It focuses on calibrating complex system models and offers plot and diagnostic functions.
A project poster from the UKCCSRC Call 2 initiative, presented at the Cardiff Biannual in September 2014. The poster details research on CO2 flow metering using multi-modal sensing and statistical data fusion techniques, with grant number UKCCSRC-C2-218. It was contributed by the British Geological Survey.
A source of files for replicating figures from a research article on variance deltas for posterior uncertainty. It includes Stan models, corresponding input data in JSON format, and Python scripts for running the sampler and generating variance deltas. The dataset was authored by Collin Cademartori.
Kaggle hosts a dataset focused on optimizing manufacturing processes for titanium alloys. The data likely contains parameters for reinforcement learning applications in materials science. The author, organization, and specific data scale are not provided in the metadata.
WithinUsAI released 1,000,000 JSONL records on January 5, 2026. The dataset is designed for training and evaluating prompt orchestration techniques. Each record references at least one evidence capsule with a public source.
Replication data and code for the paper "(Empirical) Bayes Approaches to Parallel Trends" by Soonwoo Kwon and Jonathan Roth, published in AEA Papers and Proceedings. The specific contents, including row count, column count, and data structure, are not detailed in the provided metadata.
Version 8 includes updated data schemas, performance metrics, and a Monte Carlo simulation architecture for analyzing football betting odds. The dataset is associated with a transparency dashboard for odds data. It is hosted on Kaggle under the title 'OddsFlow Transparency: Schemas & Dashboard'.
Berlin-Brandenburg's Office for Statistics provides block side (Blockseite) boundaries from its Regional Reference System (RBS). A block side is an edge that bounds a statistical block within the RBS framework. The data is served via a Web Feature Service (WFS) and was last updated on 2025-12-31.
GSM8K-Hi is a Hindi-translated version of the English GSM8K test set for mathematical reasoning. The dataset, created by NVIDIA and last updated in January 2026, contains problems requiring 2 to 8 steps to solve using basic arithmetic operations. Samples were translated via Google Cloud Platform and subsequently reviewed and corrected by human annotators for quality.
DeepScaler is a collection of challenging mathematical reasoning problems. The dataset was created by author Tyrion279 and was last updated on Hugging Face in February 2026. Its specific size, structure, and content require inspection after loading.
The Novel Proof Cognitive Benchmark (NPCB) is a dataset hosted on Kaggle. Its title suggests it likely contains problems or tasks related to mathematical proofs and cognitive reasoning. The dataset's specific content, size, and authorship are unknown.
Kaggle hosts a dataset focused on elliptic differential equations and large language model reasoning. The dataset's specific content, size, and origin are not detailed in the provided metadata. Its primary application appears to be in the intersection of mathematical modeling and AI reasoning tasks.
A dataset related to optimizing Retrieval-Augmented Generation (RAG) systems for cost efficiency. The dataset is hosted on Kaggle, but its specific contents, size, and authorship are not detailed in the provided metadata. Further details about the data's structure and origin require inspection after download.
A Kaggle dataset focused on optimizing Retrieval-Augmented Generation systems for cost efficiency. The dataset likely contains metrics, configurations, or performance results related to RAG model tuning. Metadata is minimal; actual content and scale require verification after download.