Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,487 datasets
Mathematically informative IPython notebooks were collated from sources including OpenWebMath, RedPajama, and the Algebraic Stack. The AutoMathText effort used Qwen 72B to score text excerpts based on mathematical quality. This dataset was created by author 'casey-martin' and last updated on Hugging Face in May 2024.
'Dolma V1 7 Algebraic Stack Train' is a dataset authored by emozilla and published on the Hugging Face platform on 2024-05-29. The title suggests it contains text data related to algebraic concepts, possibly intended for training language models. Its size, format, and exact content are not specified in the available metadata.
Expert trajectories for the ETO (Exploration-based Trajectory Optimization) framework, a method for training large language model agents through iterative trial and error. The dataset was created by authors Yifan Song, Da Yin, Xiang Yue, Jie Huang, Sujian Li, and Bill Yuchen Lin, with a paper published on arXiv in March 2024. It was uploaded to Hugging Face by the user 'agent-eto' on April 9, 2024.
This collection features multiple benchmark instances for the Unit Commitment (UC) problem across various power system scales. It includes standardized data for generator thermal properties, hourly load demands, and transmission network topologies.
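The following minimal sketch shows how two of the data categories above, generator thermal properties and hourly load demands, feed a unit commitment solve. It is not the benchmark's solver or schema. The `Generator` fields, the toy values, and the functions `dispatch_cost` and `solve_uc` are hypothetical, and the sketch ignores the transmission network, ramp limits, minimum up/down times, and reserves. It enumerates on/off states with dynamic programming, which only scales to a handful of units. Realistic instances are usually formulated as mixed-integer programs.

```python
from itertools import product
from typing import NamedTuple


class Generator(NamedTuple):
    # Hypothetical fields; the benchmark's actual thermal-property schema may differ.
    name: str
    p_min: float     # MW output floor while committed
    p_max: float     # MW output ceiling
    marginal: float  # cost per MWh produced
    no_load: float   # cost per hour while committed
    startup: float   # cost per off -> on transition


# Illustrative toy data, not taken from any benchmark instance.
GENS = [
    Generator("G1", 50.0, 200.0, 10.0, 100.0, 500.0),
    Generator("G2", 20.0, 100.0, 25.0, 50.0, 100.0),
]


def dispatch_cost(state, demand):
    """Cheapest single-hour cost for an on/off state, or None if infeasible."""
    on = [g for g, u in zip(GENS, state) if u]
    lo = sum(g.p_min for g in on)
    hi = sum(g.p_max for g in on)
    if not lo <= demand <= hi:
        return None
    # Committed units run at least at p_min; fill the rest cheapest-first (merit order).
    cost = sum(g.no_load + g.marginal * g.p_min for g in on)
    remaining = demand - lo
    for g in sorted(on, key=lambda g: g.marginal):
        extra = min(remaining, g.p_max - g.p_min)
        cost += g.marginal * extra
        remaining -= extra
    return cost


def solve_uc(demands, initial_state):
    """Exhaustive DP over hourly commitment states; returns (cost, schedule) or None."""
    states = list(product((0, 1), repeat=len(GENS)))
    best = {tuple(initial_state): (0.0, [])}  # state -> (cost so far, schedule)
    for demand in demands:
        nxt = {}
        for s in states:
            hour_cost = dispatch_cost(s, demand)
            if hour_cost is None:
                continue
            for prev, (prev_cost, schedule) in best.items():
                # Pay a startup cost for every unit switching from off to on.
                startups = sum(g.startup for g, was, now in zip(GENS, prev, s) if now and not was)
                total = prev_cost + startups + hour_cost
                if s not in nxt or total < nxt[s][0]:
                    nxt[s] = (total, schedule + [s])
        if not nxt:
            return None  # no commitment can meet this hour's demand
        best = nxt
    return min(best.values(), key=lambda t: t[0])
```

For example, `solve_uc([150.0, 250.0, 100.0], initial_state=(0, 0))` commits only G1 in hours 1 and 3 and both units in hour 2, for a total cost of 6700.0 in these made-up units.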
Statistical classification of currencies (SCV) is a dataset from the State Statistics Service of Ukraine. The dataset has been marked as invalid due to the approval of a new official list of currency codes in January 2020. The data was last updated on the eu_open_data platform in February 2024.
This dataset contains gesture-form coding from a study by Kate Mesh. Statistical analysis was performed using R version 3.6.1 and the lme4 package. The dataset was last updated on March 18, 2024.
British New Woman novelist Violet Hunt produced seventeen novels and three short story collections. This collection includes digitized proofs and letters by Hunt, who was part of a Pre-Raphaelite social circle that included Oscar Wilde and Radclyffe Hall. The data was contributed by Jullianne Ballou and digitized as part of Project REVEAL (Read and View English & American Literature).
A collection of manuscripts and letters written by American poet Hart Crane, digitized as part of Project REVEAL. The collection includes a corrected typescript and galley proofs for his epic poem 'The Bridge' (1930), as well as ten letters to his family and to fellow poet Samuel Loveman. The dataset was contributed to the Texas Data Repository and last updated in March 2024.
This dataset is part of the SRSD-Feynman collection, designed to evaluate Symbolic Regression for Scientific Discovery (SRSD). It contains the 'Hard' set of formulas from the Feynman Symbolic Regression Database, with variables sampled within realistic ranges to assess whether methods can (re)discover physical laws.
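Sampling variables within realistic ranges can be illustrated with a minimal sketch for Coulomb's law, F = q1·q2 / (4π ε0 r²), used here only as an example physical law. The ranges in `RANGES`, the fixed vacuum permittivity, and the choice of log-uniform sampling are assumptions for illustration, not SRSD-Feynman's published settings, and the function names are hypothetical. Each generated row pairs the sampled inputs with the exact target value, which a symbolic regression method would then try to map back to the formula.

```python
import math
import random

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m, held fixed in this sketch

# Hypothetical "realistic" ranges (SI units); SRSD defines its own per-variable ranges.
RANGES = {
    "q1": (1e-11, 1e-9),  # charge in C
    "q2": (1e-11, 1e-9),  # charge in C
    "r": (1e-2, 1.0),     # separation in m
}


def log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def coulomb_force(q1, q2, r):
    """Ground-truth formula that a symbolic regression method should recover."""
    return q1 * q2 / (4 * math.pi * EPS0 * r ** 2)


def sample_dataset(n, seed=0):
    """Return n (inputs, target) pairs with inputs drawn log-uniformly from RANGES."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    rows = []
    for _ in range(n):
        x = {name: log_uniform(rng, lo, hi) for name, (lo, hi) in RANGES.items()}
        rows.append((x, coulomb_force(**x)))
    return rows
```

Log-uniform sampling gives each order of magnitude equal weight, which matters for physical quantities that span several decades; a uniform draw over the same range would concentrate almost all samples near the upper bound.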
MetaMath_DPO_FewShot is an augmented version of the GSM8K dataset, containing grade school math word problems designed to test mathematical reasoning. The dataset, created by abacusai, partitions problems into queries and step-by-step logical responses. It was last updated on the Hugging Face platform in February 2024.
This dataset is designed for evaluating Symbolic Regression methods in the context of Scientific Discovery. It is derived from the Feynman Symbolic Regression Database, with formulas and variables reviewed to define realistic sampling ranges, and is categorized as the 'Medium' set within the SRSD-Feynman collection.
The Lattice Green's Functions dataset provides lattice Green's functions for two-dimensional square lattices, computed with recurrence relations implemented in high-precision arithmetic. It was created by Michael Marder for a research project and last updated on March 18, 2024.
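The quantity in this dataset can be illustrated with a short double-precision sketch. It does not reproduce Marder's recurrence-relation method or its high-precision arithmetic. Instead it assumes a square lattice with unit nearest-neighbour hopping and an energy E above the band edge (E > 4), and evaluates the Brillouin-zone integral G(m,n;E) = (1/π²) ∫₀^π ∫₀^π cos(m kx) cos(n ky) / (E - 2 cos kx - 2 cos ky) dkx dky with the midpoint rule. The value at the origin is checked against the known closed form G(0,0;E) = (2/(πE)) K(4/E), with the complete elliptic integral K evaluated through the arithmetic-geometric mean. All function names are hypothetical.

```python
import math


def agm(a, b):
    """Arithmetic-geometric mean of a and b (quadratically convergent)."""
    for _ in range(64):
        if abs(a - b) <= 1e-15 * abs(a):
            break
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a


def ellipk(k):
    """Complete elliptic integral of the first kind K(k), modulus 0 <= k < 1."""
    return math.pi / (2 * agm(1.0, math.sqrt(1 - k * k)))


def g00_closed_form(energy):
    """G(0,0;E) for the square lattice with unit hopping, valid for E > 4."""
    return 2 / (math.pi * energy) * ellipk(4 / energy)


def g_quadrature(m, n, energy, points=128):
    """G(m,n;E) by the midpoint rule on [0, pi]^2; assumes E > 4 (no real poles)."""
    h = math.pi / points
    total = 0.0
    for i in range(points):
        kx = (i + 0.5) * h
        cx = 2 * math.cos(kx)
        wx = math.cos(m * kx)
        for j in range(points):
            ky = (j + 0.5) * h
            total += wx * math.cos(n * ky) / (energy - cx - 2 * math.cos(ky))
    return total * h * h / math.pi ** 2
```

The lattice equation E·G(0,0) - 4·G(1,0) = 1, which follows from (E - H)G = 1, gives a second consistency check. Because the integrand's poles approach the real axis as E nears 4, the quadrature needs more points near the band edge, and inside the band it does not apply without giving E a small imaginary part.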
This Texas Data Repository dataset supports the manuscript 'Stochastic and Deterministic Analysis of Reactivity Ratios in the Partially Reversible Copolymerization of Lactide and Glycolide'. The collection includes Nuclear Magnetic Resonance (NMR), Gel Permeation Chromatography (GPC), and model data. Author Louise Kuehster deposited the data, which was last updated on March 18.
This collection includes thirty-seven personal letters from novelist Charles Brockden Brown to his wife, Elizabeth Linn Brown. The Charles Brockden Brown collection also contains manuscript prose, poetry, mathematical calculations, notes, and architectural drawings by the American novelist, historian, and editor. Jullianne Ballou contributed the collection to the Texas Data Repository; it was last updated on March 18, 2024.
The Baron Alfred Tennyson Collection includes manuscripts, fragments, and proofs of works and letters written by the English poet. The collection was contributed by author Jullianne Ballou and last updated on March 18, 2024. Manuscripts represent works such as Enid, The Falcon, Gareth and Lynette, Idylls of the King, The Lover's Tale, Poems (1869), and The Promise of May.
Reconstruction of GRACE satellite mass change time series using a Bayesian framework. The data spans 206 months from April 2002 to August 2020, with measurements in centimeters. Rateb Ashraf published this reconstruction in 2021, hosted on the Texas Data Repository.
This dataset from garage-bAInd, last updated on 2024-01-24, focuses on improving the logical reasoning skills of large language models. It was used to train the Platypus2 models and is composed of several filtered datasets, including PRM800K, MATH, ScienceQA, SciBench, ReClor, and TheoremQA.
Supplemental Data for a paper by Weiss and Martindale. The data likely contains survey questions and responses from students and professors, along with statistical analyses of the results. The dataset was authored by Rowan C. Martindale and last updated on March 18, 2024.
Data from Anselmetti and Eberli (1997) and Fabricius et al. (2010) used for a Bayesian analysis of inclusion models. The dataset was authored by Kyle Spikes and is hosted by the Texas Data Repository. It was last updated on March 18, 2024.
This dataset contains data from Teapot Dome, a well-known geological site, processed with windowing and rotation techniques. It includes statistical wavelets, suggesting a focus on signal analysis and feature extraction. Sean Bader authored the dataset, which was last updated on March 18, 2024.