Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,487 datasets
A hierarchical representation of all UK statistical geographies within their geography groups and geographical areas, as at June 2021. The dataset is provided by the Government Digital Service and was last updated on the platform in August 2025. The primary file is a PDF document scaled to A3 paper size, with a file size of 4 MB.
The Hierarchical Representation of UK Statistical Geographies dataset from the Government Digital Service shows all UK statistical geographies within their groups and geographical areas. The data represents the state of these geographies as of June 2020 and is provided in a PDF scaled to A3 paper size. The file size is 5 MB.
A single PDF file scaled to A3 paper size details the hierarchical relationships between all UK statistical geographies as of April 2020. The 5 MB document, provided by the Government Digital Service, shows geographies within their groups and geographical areas. Its structure is intended for visual reference of administrative and statistical boundaries.
Multi-objective optimization results for the splint design case study, including Pareto Front and Analytic graphs. The dataset was authored by García-Domínguez, Amabel and last updated on October 14, 2025.
OLMo Mix 1124 is a collection of data used to train the OLMo-2-1124 models, released in November 2024. The majority of the dataset, 3.70 trillion tokens, comes from the DCLM-Baseline source. It was created by AllenAI and includes components such as ArXiv papers, pes2o, StarCoder, and Algebraic-stack.
2023–2025 records from the ZEN AI Pioneer Program, a verifiable youth AI literacy initiative in the U.S. The repository, authored by ZENLLC, preserves historic milestones and proofs of students as young as 11 building cloud-hosted AI applications. Data is anchored to the Bitcoin blockchain for public permanence.
A 645 KB HTML document provides the user guide for Standard Area Measurements (SAM) products. The guide, published by the Government Digital Service, includes information on measurement types, data tolerance, accuracy, currency, and conditions of use. It was last updated on August 11, 2025.
Government Digital Service provides a 561 KB user guide for Standard Area Measurements (SAM) products. The document, updated on 2025-08-11, details measurement types, data tolerance, accuracy, currency, and conditions of use. It offers guidance on applying these measurements for statistical purposes.
December 2021 boundaries for the Census statistical geography hierarchy in England and Wales. The dataset includes Output Areas (OAs), Lower layer Super Output Areas (LSOAs), and Middle layer Super Output Areas (MSOAs). Boundaries are generalised (20m) and clipped to the coastline, provided by the Government Digital Service.
The 2021 population-weighted centroids are described in this document from the Government Digital Service, last updated on August 11, 2025. The resource provides information on the centroids and the methodology used to produce them. The file is available in HTML format and is 356 KB in size.
U.S. data provides a statistical overview of export activities for small and medium-sized enterprises with fewer than 500 employees. The dataset includes only identifiable exports linked to individual companies via U.S. export declarations.
Multiple creativity assessment tasks and human preference ratings are provided for evaluating model-generated creative content. The data includes human-generated responses and qualitative scores used in the Creative Preference Optimization framework.
England's former standard statistical regions (SSR) as recorded on 31 December 2005. This dataset contains names and codes for these superseded administrative units. The data is provided by the Government Digital Service and was last updated on the platform in August 2025.
GPT-oss-120B-Distilled-Reasoning-math is a dataset of mathematical problems with generated reasoning processes and answers. The data was created by author Jackrong using the gpt-oss-120b model and was last updated on August 17, 2025. The dataset is formatted in JSON Lines and includes fields for the question, category, reasoning steps, and final answer.
A story map from the Government Digital Service, last updated on August 11, 2025, explains the creation and application of a new statistical geography for major towns and cities. The resource details how and why the boundaries were defined and provides guidance on their use for statistical purposes. The primary format is an interactive HTML story map.
We-Math is a benchmark dataset of 6,500 visual math problems, spanning 67 hierarchical knowledge concepts and 5 layers of knowledge granularity, introduced at ACL 2025. It was created by the We-Math team and last updated on the Hugging Face platform in August 2025. The dataset is designed to explore problem-solving principles beyond end-to-end performance.
Government Digital Service documentation details the evolution of census geography from 2011 to 2021. The report explains changes to Output Areas (OAs) and Super Output Areas (SOAs), the fundamental building blocks for UK census statistics. This HTML document was last updated on August 11, 2025.
39,764 examples of formal mathematical problems paired with their step-by-step solutions. The dataset, created by TamasSimonds, consists of prompt-completion pairs in English, sourced from a cleaned CSV file and last updated on August 17, 2025.
Analysis data from the paper 'Hidden Dynamics of Massive Activations in Transformer Training' characterizes the emergence patterns of large scalar values in transformer hidden states. The dataset provides detailed measurements and mathematical characterizations across the Pythia model family during training. It was created by Aimpoint-Digital and last updated on August 14, 2025.
NuminaMath-LEAN is a large-scale dataset of 100,000 mathematical competition problems formalized in the Lean 4 theorem prover language. It was created by AI-MO and is derived from a challenging subset of the NuminaMath 1.5 dataset, focusing on problems from competitions like the IMO and USAMO. The dataset was last updated on July 31, 2025.