Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,487 datasets
Data from Anselmetti and Eberli (1997) and Fabricius et al. (2010) used for a Bayesian analysis of inclusion models. The dataset was authored by Kyle Spikes and is hosted by the Texas Data Repository. It was last updated on March 18, 2024.
Data from Teapot Dome, a known geological site, processed with windowing and rotation techniques. The dataset includes statistical wavelets, suggesting a focus on signal analysis and feature extraction. Sean Bader authored this dataset, which was last updated on March 18, 2024.
Unalignment Toxic Dpo V0.2 Zh Cn is a multilingual dataset intended to illustrate the use of Direct Preference Optimization (DPO) for model unalignment. The dataset was created by tastypear and last updated on 2024-01-31. Its description states it contains highly toxic or harmful examples.
Snowpack simulations for a domain in the Swiss Alps (Dischma) are generated using the Flexible Snow Model (FSM2oshd) with wind and snow redistribution (FSM2trans). The simulations are forced by atmospheric data downscaled to 250m, 100m, and 50m resolutions using the HICAR and COSD methods. The dataset was published by ENVIDAT and updated in 2024.
64,727 one-second .wav audio files containing 30 to 35 distinct spoken English words and background noise. The collection includes ten core directional and action commands alongside auxiliary words and a dedicated _silence_ class for noise simulation.
Toxic-DPO v0.2 is a dataset created by 'unalignment' to illustrate the use of Direct Preference Optimization for de-aligning language models. It contains a collection of text examples labeled as toxic or harmful, including profanity. The dataset was uploaded to Hugging Face on January 9, 2024.
Supplementary materials for a nanophotonics research paper focused on inverse design and plasmonic switches, updated in March 2024 by author ehsan20e20e. It includes data and code for training artificial neural networks using TensorFlow, Keras, and MATLAB to optimize photonic structures.
635,000 short math textbooks covering topics like algebra, calculus, geometry, logic, probability, and statistics. The dataset was created by author nampdn-ai and last updated on January 27, 2024. Its specific source and compilation method are not detailed in the provided metadata.
2023-12-26 dataset from unalignment illustrates using direct preference optimization (DPO) to de-censor language models. It contains toxic and harmful text examples, many with attached warnings or disclaimers.
The Harmonized Tariff Schedule of the United States (2023) provides the official tariff rates and statistical categories for all merchandise imported into the United States. It is maintained by the US International Trade Commission and is based on the international Harmonized System for global trade in goods. The dataset includes all revisions for the 2023 year.
7,473 training and 1,319 testing samples of Chinese mathematical word problems, translated from the English GSM8K dataset. The dataset was created by the author 'meta-math' using GPT-3.5-Turbo with few-shot prompting and was last updated on December 4, 2023. It is intended for supervised fine-tuning and evaluation of models on mathematical reasoning in Chinese.
Tribal Designated Statistical Areas (TDSA) are geospatial boundaries for statistical purposes. The data is provided as an OGC Web Map Service (WMS) layer by the Bundesamt für Kartographie und Geodäsie. Its vintage is from January 1, 2022, and it was last updated on the platform in November 2023.
55 billion tokens of mathematical text across three categories: arXiv papers, OpenWebMath web content, and the Algebraic Stack code repository. The collection integrates LaTeX-formatted scientific documents with formal proof scripts and general mathematical discourse.
Over 20 network datasets and Python tutorials curated for the textbook 'A First Course in Network Science' by Menczer, Fortunato, and Davis. The data includes edge lists and node attributes for diverse systems such as social interactions, biological pathways, and technological infrastructures used for educational demonstrations.
Autoevaluate generated these model predictions for the algebra__linear_1d configuration of the math_dataset. The predictions were produced by the umarkhalid96/t5-small-train model on the train split for a summarization task. The dataset card was last updated on October 4, 2023.
A tokenized test dataset for mathematical proofs, likely derived from the Proofpile corpus. The dataset was uploaded by author 'emozilla' to the Hugging Face platform and was last updated on October 7, 2023. Its specific size, row count, and column structure are not documented.
Statistical data on citizens' appeals received by the Department of Housing and Communal Services of the Svyatoshinsky district state administration in the city of Kiev. The dataset likely contains records related to the implementation of the Law of Ukraine "On Access to Public Information". It was published on the States site of Ukraine and last updated on August 22, 2023.
Proof Pile is a text dataset focused on mathematical content, created by the hoskinson-center. It was last updated on Hugging Face in August 2023. The dataset's specific size, format, and exact content require verification after download.
DISCOVR consortium data, including annual weather, algae cultivation composition, and pond water chemistry measurements from 2018 to 2021. The data supports the State of Technology analysis for the Department of Energy's Bioenergy Technologies Office.
277 power system test cases ranging from 3 to 70,000 buses across various network topologies. The dataset provides detailed electrical parameters including branch impedances, bus shunts, and generator cost functions for the Optimal Power Flow (OPF) problem.