Mathematical datasets, statistical benchmarks, probability, optimization, operations research
2,489 datasets
Annual statistical information from timely filed New York State personal income tax returns, beginning with tax year 1999. The dataset provides major tax structure components like income, deductions, and tax liability, broken down by size of income and filer's permanent place of residence. It is produced by the New York State Department of Taxation and Finance.
COBAYN is a framework for automatically tuning compiler optimization parameters using Bayesian networks. The project was created by amirjamez and last updated in May 2022. It was developed as part of the European Union's Antarex project.
Watercourses subject to the Water Act in Nièvre (58) is a geospatial layer distributed through a direct download service (WFS) by the Bureau de Recherches Géologiques et Minières (BRGM). It classifies watercourses in the Nièvre department of France based on legal criteria from Articles L.214-1 to L.214-6 of the French Environmental Code and case law. The dataset was last updated on February 9, 2022.
Data from a 2021 scientific paper that evaluated the conformance of spatial planning goals with outcomes for urban compactness, services, and nature conservation in São Paulo State, Brazil. The dataset includes rasterized land use and cover variables derived from Landsat 5 and Landsat 8 satellite imagery classified for 2005 and 2015. The study used Partial Least Squares Path Modelling to analyze the relationship between 2005-2006 planning strategies and land-use change ten years later.
Quarterly and Annual Analytical and Statistical Reporting of Children's Services is a dataset from the States site of Ukraine. The dataset was last updated on February 3, 2022. It likely contains tabular data on children's services, available in XML, CSV, Word (DOC), and Excel (XLS) formats.
A collection of handwritten mathematical expression images paired with LaTeX code, created by Azu and hosted on Hugging Face. The dataset contains between 10,000 and 100,000 samples, as indicated by its size category, and was last updated in March 2022. It is designed for training models to convert visual mathematical notation into structured text markup.
A dataset containing code and output data for simulating one-dimensional heat and mass transport in snow using the FEniCS finite element library. The data was created by ENVIDAT to reproduce a key figure from a 2022 publication in The Cryosphere journal. It includes the solver code and resulting simulation data.
A dataset titled 'Ising2D', authored by yonesuke. It was last updated on January 18, 2022. The dataset's size, row count, column structure, and specific content are unknown.
Statistical information on the receipt and handling of public information requests submitted to the Ministry of Digital Information of Ukraine. The dataset covers the year 2020 and was published on the EU Open Data portal in November 2021. It likely contains metrics on request volumes, processing outcomes, and satisfaction rates.
Text data for document summarization tasks, sourced from AI Hub. The dataset contains between 10,000 and 100,000 entries, as indicated by its size category, and was last updated in December 2021.
Published on September 20, 2021, this dataset lists educational institutions in Ukraine, including preschools, secondary schools, and vocational schools. It is provided by the States site of Ukraine and likely contains statistical information about these institutions. The data is available in tabular formats such as Excel and CSV.
2017 geographic boundaries for U.S. states and equivalent entities, extracted from the Census Bureau's MAF/TIGER Database. The dataset includes the fifty states, the District of Columbia, Puerto Rico, and U.S. Island Areas, designed as standalone shapefiles that can be combined for national coverage.
Geospatial data on green areas within Swedish urban agglomerations, produced by Statistics Sweden. The dataset defines green areas as contiguous, publicly available spaces of at least 0.5 hectares, excluding arable land but including pasture. It covers two reference years, 2010 for the 37 largest agglomerations and 2015 for all agglomerations, with data last updated in December 2020.
Simulation results and code supporting research on modeling snow saltation, specifically the effects of grain size and interparticle cohesion. The data originates from a study published in the Journal of Geophysical Research: Atmospheres and was contributed by ENVIDAT. The supporting code uses a Large Eddy Simulation flow solver coupled with a Lagrangian Stochastic Model.
Data from 2020 recording total electron content in the ionosphere over the Jang Bogo Station in Antarctica. The dataset supports the study of statistical characteristics of the ionosphere at southern high latitudes. It was created by AMD_KOPRI and published via NASA EarthData in October 2020.
A 2020 cluster randomized controlled trial in two Moroccan districts analyzed data from 210 recruited pregnant women to evaluate a primary care intervention for gestational diabetes. The study assessed outcomes including birthweight, maternal weight gain, glucose control, and pregnancy complications. It was conducted by Bettina Utz and registered under NCT02979756.
A Bayesian method for detecting mass-extinction impacts on molecular phylogenies, developed by Michael R. May and published in 2020. The CoMET model analyzes lineage diversification rates using a compound Poisson process to distinguish background rate variation from extinction events. An empirical application identified a major mass-extinction event in conifers approximately 23 million years ago.
Microsatellite genotype data from 15 female and 174 hatchling spectacled caimans (Caiman crocodilus) across 20 nests. The dataset was used to investigate mating systems, demonstrating a 95% frequency of multiple paternity over four reproductive seasons from 2007 to 2010 in the Brazilian Amazon.
458 specimens of the wild wheat relative Aegilops triuncialis from 31 populations in a 60 km x 20 km area in Southern Spain were genotyped to estimate wheat admixture levels. The data, generated by Mila Pajkovic and published in 2020, includes results from Approximate Bayesian Computation modeling to estimate selfing rates and the magnitude and tempo of wheat allele introgression.
Motion data from 12 dyads of scientists recorded during face-to-face conversations using depth-sensing cameras. It was created to analyze dyadic modes and motion motifs, such as synchronized parallel torso motion and still segments. The data supports the study of individuality in motion modes, which was maintained for at least 6 months in a subset of 5 dyads.