DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Mathematics & Statistics Datasets | DataSalon

Mathematics & Statistics

Mathematical datasets, statistical benchmarks, probability, optimization, operations research

2,487 datasets

Replication Data for Bayesian Analysis of Inclusion Models from Geological Studies

Data from Anselmetti and Eberli (1997) and Fabricius et al. (2010) used for a Bayesian analysis of inclusion models. The dataset was authored by Kyle Spikes and is hosted by the Texas Data Repository. It was last updated on March 18, 2024.

Tabular, Bayesian Statistics, Geological Data, Inclusion Models, Replication Data, +1

Teapot Dome Well Tie: Windowed and Rotated Seismic Data with Statistical Wavelets

Seismic data from Teapot Dome, a known geological site, processed with windowing and rotation techniques. The dataset includes statistical wavelets, suggesting a focus on signal analysis and feature extraction. Sean Bader authored this dataset, which was last updated on March 18, 2024.

Time Series, Teapot Dome, Wavelet Analysis, Geophysics, Seismic Data, +1

Toxic-DPO: Preference Data for Model Unalignment

Unalignment Toxic DPO v0.2 Zh Cn is a multilingual dataset intended to illustrate the use of Direct Preference Optimization (DPO) for model unalignment. The dataset was created by tastypear and last updated on 2024-01-31. Its description states that it contains highly toxic or harmful examples.

Text, Multilingual, Model Unalignment, DPO, Preference Optimization, +1

High-Resolution Snowpack Simulations for Swiss Alpine Terrain

Snowpack simulations for a domain in the Swiss Alps (Dischma), generated using the Flexible Snow Model (FSM2oshd) with wind and snow redistribution (FSM2trans). The simulations are forced by atmospheric data downscaled to 250 m, 100 m, and 50 m resolutions using the HICAR and COSD methods. The dataset was published by EnviDat and updated in 2024.

Time Series, Geospatial, Geospatial Downscaling, Snowpack Simulation, Hydrological Modeling, Alpine Terrain, +1

Speech Commands

64,727 one-second .wav audio files containing 30 to 35 distinct spoken English words and background noise. The collection includes ten core directional and action commands alongside auxiliary words and a dedicated _silence_ class for noise simulation.

Source Datasets: original, Language Creators: crowdsourced, Language: en, Size Categories: 100K<n<1M, Task Categories: audio classification, License: CC BY 4.0, Region: us, Task IDs: keyword spotting, Multilinguality: monolingual, Annotations Creators: other, arXiv: 1804.03209, +1

Toxic Preference Data for Direct Preference Optimization

Toxic-DPO v0.2 is a dataset created by 'unalignment' to illustrate the use of Direct Preference Optimization for de-aligning language models. It contains a collection of text examples labeled as toxic or harmful, including profanity. The dataset was uploaded to Hugging Face on January 9, 2024.

Text, Model Alignment, Machine Learning Safety, Natural Language Processing, Preference Optimization, +1

Circularrr Aops: Nanophotonic Inverse Design Data for Plasmonic Switches

Supplementary materials for a nanophotonics research paper focused on inverse design and plasmonic switches, updated in March 2024 by author ehsan20e20e. It includes data and code for training artificial neural networks using TensorFlow, Keras, and MATLAB to optimize photonic structures.

Scientific Reports, Inverse Design, Nanophotonics, Code, Anaconda, TensorFlow, Neural Networks, Python, Keras, Artificial Intelligence, Supplementary Information, Supplementary Materials, MATLAB, Artificial Neural Network, Plasmonic Switch, Deep Learning, +1

Tiny Math Textbooks: 635k Short Educational Texts on Core Topics

635,000 short math textbooks covering topics like algebra, calculus, geometry, logic, probability, and statistics. The dataset was created by author nampdn-ai and last updated on January 27, 2024. Its specific source and compilation method are not detailed in the provided metadata.

Text, Textbooks, Algebra, Mathematics, Education, Calculus, +1

Toxic Preference Optimization Dataset for Model De-alignment

A dataset from unalignment, dated 2023-12-26, that illustrates using direct preference optimization (DPO) to de-censor language models. It contains toxic and harmful text examples, many with attached warnings or disclaimers.

Parquet, Library: polars, Size Categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, License: CC BY 4.0, Region: us, Not For All Audiences, +1

Harmonized Tariff Schedule for U.S. Imports (2023)

The Harmonized Tariff Schedule of the United States (2023) provides the official tariff rates and statistical categories for all merchandise imported into the United States. It is maintained by the US International Trade Commission and is based on the international Harmonized System for global trade in goods. The dataset includes all revisions for 2023.

Product Classification, HTS, Tariff Schedule, +1

GSM8K_zh: Chinese-Language Grade School Math Word Problems for Fine-Tuning

7,473 training and 1,319 testing samples of Chinese mathematical word problems, translated from the English GSM8K dataset. The dataset was created by the author 'meta-math' using GPT-3.5-Turbo with few-shot prompting and was last updated on December 4, 2023. It is intended for supervised fine-tuning and evaluation of models on mathematical reasoning in Chinese.

Text, Mathematical Reasoning, Translation, Benchmark, Question Answering, Education, Chinese Language, +1

TIGERweb Tribal Designated Statistical Areas: 2022 Vintage

Tribal Designated Statistical Areas (TDSA) are geospatial boundaries for statistical purposes. The data is provided as an OGC Web Map Service (WMS) layer by the Bundesamt für Kartographie und Geodäsie. Its vintage is from January 1, 2022, and it was last updated on the platform in November 2023.

Geospatial, Government Data, Administrative Regions, Geospatial Boundaries, Tribal Statistics, +1

Proof Pile 2: 55-Billion-Token Mathematical Corpus

55 billion tokens of mathematical text across three categories: arXiv papers, OpenWebMath web content, and the Algebraic Stack code repository. The collection integrates LaTeX-formatted scientific documents with formal proof scripts and general mathematical discourse.

Task Categories: text generation, Language: en, Size Categories: 10B<n<100B, arXiv: 2310.10631, Region: us, arXiv: 2310.06786, Math, +1

FirstCourseNetworkScience

Over 20 network datasets and Python tutorials curated for the textbook 'A First Course in Network Science' by Menczer, Fortunato, and Davis. The data includes edge lists and node attributes for diverse systems such as social interactions, biological pathways, and technological infrastructures used for educational demonstrations.

Tutorials, Indiana University, Network Science, Textbook, Python, NetworkX, Social Network, +1

Autotrain Evaluator: Model Predictions for Algebra Linear 1D Math Problems

Autoevaluate generated these model predictions for the algebra__linear_1d configuration of the math_dataset. The predictions were produced by the umarkhalid96/t5-small-train model on the train split for a summarization task. The dataset card was last updated on October 4, 2023.

Text, Algebra, AutoTrain, Mathematics, Model Evaluation, Benchmark, Synthetic, +1

Proofpile Test Tokenized Mistral: A Tokenized Mathematical Proof Dataset

A tokenized test dataset for mathematical proofs, likely derived from the Proofpile corpus. The dataset was uploaded by author 'emozilla' to the Hugging Face platform and was last updated on October 7, 2023. Its specific size, row count, and column structure are not documented.

Text, Mathematics, Proofs, Tokenized Text, Test Set, +1

Citizen Appeals Statistics for Kyiv Housing Services Department

Statistical data on citizen appeals received by the Department of Housing and Communal Services of the Svyatoshinsky District State Administration in the city of Kyiv. The dataset likely contains records related to the implementation of the Law of Ukraine "On Access to Public Information". It was published on the States site of Ukraine and last updated on August 22, 2023.

Tabular, Ukraine, Citizen Appeals, Public Information, Government Statistics, Housing Services, +1

Proof Pile: A Dataset of High Quality Mathematical Text

Proof Pile is a text dataset focused on mathematical content, created by the hoskinson-center. It was last updated on Hugging Face in August 2023. The dataset's specific size, format, and exact content require verification after download.

Text, Mathematics, Proofs, AI Training, Text Corpus, +1

Algae Cultivation Weather and Chemistry Data 2018-2021

DISCOVR consortia data includes annual weather, algae cultivation composition, and pond water chemistry measurements from 2018 to 2021. The data supports the State of Technology analysis for the Department of Energy's Bioenergy Technologies Office.

Algae, Biofuel, Bioenergy, SOT, Composition, Algae Cultivation, Weather, Crop Rotation, Chemistry, Cultivation, Power, Energy, Raw Data, Alternative Fuel, Outdoor Cultivation, Algae Strain Screening, Pond Water, +1

PGLib-OPF: Power Grid Library for Optimal Power Flow Benchmarks

277 power system test cases ranging from 3 to 70,000 buses across various network topologies. The dataset provides detailed electrical parameters including branch impedances, bus shunts, and generator cost functions for the Optimal Power Flow (OPF) problem.

Optimal Power Flow, MATPOWER, Benchmark, +1
