Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,461 datasets
NLCO is a benchmark dataset containing 6,450 total samples across 43 tasks for evaluating large language models on natural-language combinatorial optimization problems. The dataset, created by summer142857jiang and last updated in April 2026, is organized into 129 CSV files with 50 samples per file and three difficulty tiers: Set-S, Set-M, and Set-L.
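The counts quoted for NLCO are mutually consistent, which a quick check makes visible. The sketch below assumes one CSV file per task and difficulty tier; the description does not state how files map to tasks, so that mapping is an assumption.

```python
# Figures quoted in the NLCO description
tasks = 43
tiers = ["Set-S", "Set-M", "Set-L"]
samples_per_file = 50

# Assumption: one CSV file per (task, tier) pair
files = tasks * len(tiers)
total_samples = files * samples_per_file

print(files, total_samples)
```

Under that assumption, the file count and sample total match the 129 files and 6,450 samples stated in the description.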
Statistically significant metabolites identified from a comparison of winter and spring samples in the SEA group. The dataset, created by Julia Drespling and last updated in April 2026, is a 9.5 KB XLSX file containing results from a sectional study on airway metabolome profiles.
Metabolite concentrations from bronchoalveolar lavage fluid samples of CUA horses, analyzed for seasonal differences. The dataset, created by Julia Drespling and last updated in April 2026, is an 8.8 KB XLSX file containing results from a statistical comparison between warm and cold seasons. It likely includes metabolite names and associated statistical significance values.
Statistically significant metabolites from a winter vs. spring comparison in the CUA group. The dataset, authored by Julia Drespling and last updated in April 2026, is an 8.6 KB XLSX file containing results from a statistical comparison of metabolite profiles across seasons.
A dataset from figshare containing metabolite profiles from bronchoalveolar lavage fluid (BALF) samples. The data, created by Julia Drespling and last updated in April 2026, statistically compares metabolite concentrations between warm and cold seasons in a group of horses. The dataset is 9.0 KB in size and is available in XLSX format.
A 9.8 KB dataset listing metabolites with statistically significant differences between winter and spring seasons in the MEA group of horses. The data was authored by Julia Drespling and last updated on figshare in April 2026. It is licensed under CC-BY-4.0 and is available in XLSX format.
Julia Drespling's dataset contains metabolites identified as statistically significant between warm and cold seasons in the MEA group of horses. The data is stored in a 10.2 KB XLSX file and was last updated on April 3, 2026. It originates from a study analyzing bronchoalveolar lavage fluid (BALF) samples using NMR to profile the airway metabolome.
Xiaoting Ma provides raw data supporting statistical analyses and figures for a study on transdermal lidocaine delivery. The 24.6 KB Excel file contains measurements related to cytotoxicity, permeation efficiency, and transdermal absorption. This dataset was published on figshare in April 2026.
Lake Chad region geospatial data covering parts of seven African countries: Cameroon, Chad, Nigeria, Niger, Sudan, Central African Republic, and Libya. It was developed by UNEP/GRID for the Lake Chad Commission on Sustainable Development, with source materials including the 1977 FAO/UNESCO Soil Map of the World. The dataset was published in December 1988.
Kaihua Bao authored a paper proving a local equivariant index theorem for sub-signature operators. This work generalizes a previous index theorem established by Weiping Zhang. The dataset likely contains the mathematical paper and its associated research content.
Vessel details are recorded for statistical analysis. The dataset is provided by the Government Digital Service via the eu_open_data platform. The specific volume, time range, and update frequency are not detailed in the available metadata.
The HR Recruitment and Redeployment database is a collection of personnel records from the UK Government Digital Service. The description indicates it contains personal information such as names and National Insurance numbers, along with statistical data. The dataset's specific size, update frequency, and detailed structure are not provided in the available metadata.
Statistics on the Franco-Ontarian population include breakdowns by year, age, gender, language, and geography. All population counts are rounded to a base of five. The dataset is produced by the Government of Ontario and was last updated in March 2026.
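One plausible reading of "rounded to a base of five" is rounding each count to the nearest multiple of five. The sketch below illustrates that reading only; the function name `round_to_base` is hypothetical, and the publisher's exact procedure (for example, random rounding) is not stated in the description.

```python
def round_to_base(count: int, base: int = 5) -> int:
    """Round a non-negative integer count to the nearest multiple of `base`.

    Assumption: plain nearest-multiple rounding. With base 5 and integer
    counts there are no ties, so no tie-breaking rule is needed.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return base * ((count + base // 2) // base)


# Illustrative values, not taken from the dataset
for raw in (0, 12, 13, 1232, 1233):
    print(raw, round_to_base(raw))
```

A consequence of any such rounding is that published totals may not equal the sum of their published parts.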
50,000 synthetic samples of math problems written in Bahasa Indonesia. The dataset includes explicit chain-of-thought reasoning traces for each problem, designed to train language models on arithmetic problem solving. It was created by Sandroeth and last updated in April 2026.
50,000 synthetic Indonesian-language samples for training language models on arithmetic problem solving. The dataset was created by Sandroeth and last updated in April 2026. Each sample includes a math problem and an explicit chain-of-thought reasoning trace.
Mo(Wa)²TER Datasets provides replication data for a submitted research paper on optimizing dynamic cloth media filtration in primary wastewater treatment. The data and code are intended to reproduce the results of the study, which uses Response Surface Methodology. The dataset was last updated on April 21, 2026.
Geoscience Australia Data provides a dataset on the spectral representation of isostatic models. The data describes the use of admittance functions, or mathematical filters, to model the relationship between gravity anomalies and topography based on lithospheric rheology. This representation offers a computationally efficient alternative to conventional line-integral methods for calculating free-air gravity anomalies.
CIFAR-10H extends the CIFAR-10 test set by adding human-annotated label distributions. It provides probability distributions, raw vote counts, and majority-choice labels for each image. The dataset was created by MKZuziak and uploaded to Hugging Face in April 2026.
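The three quantities named in the CIFAR-10H description are related: a probability distribution and a majority-choice label can both be derived from raw vote counts. The sketch below shows that relationship for a single image. The function name `summarize_votes` and the vote counts are hypothetical illustrations, not the dataset's actual schema or values.

```python
def summarize_votes(votes):
    """Turn per-class vote counts into (probabilities, majority class index)."""
    total = sum(votes)
    if total == 0:
        raise ValueError("at least one vote is required")
    probs = [v / total for v in votes]
    # Index of the class with the most votes; ties resolve to the lowest index
    majority = max(range(len(votes)), key=votes.__getitem__)
    return probs, majority


# Hypothetical counts over the 10 CIFAR-10 classes for one image
example_votes = [0, 1, 0, 45, 0, 4, 0, 0, 0, 0]
probs, majority = summarize_votes(example_votes)
print(probs, majority)
```

Keeping the full distribution rather than only the majority label preserves how much annotators disagreed on an image.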
A seizure prediction framework based on hypergraph convolutions and Kuramoto oscillator dynamics was evaluated on data from 22 pediatric patients. The dataset, authored by Masoud Amiri and last updated in April 2026, summarizes model contributions with statistical significance. It is a small 5.5 KB XLS file.
ACTG Statistical and Data Analysis Center provides data from 235 participants across 6 Analytic Treatment Interruption (ATI) studies. The dataset includes HIV-1 RNA level data for all participants and HIV-persistence measures for a subset of 124 individuals. These data were examined in multiple statistical and modeling papers from 2016 to 2025.