Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
39,903 datasets
January 2026 survey data from 2,591 German adults aged 18-74 on wearable device usage and adoption motivations. The dataset was created by André Hajek and includes statistics on current use of smartwatches, fitness trackers, and other devices, as well as factors motivating future use. It was last updated on June 2, 2026.
A dataset of 4,499 mineral-associated organic carbon observations from 364 sites across global drylands. It combines machine-learning prediction, segmented regression, and multi-scenario projections to assess drought-induced regime shifts from 1991 to 2094. The dataset was authored by Zhaoxin Li and published on figshare.
NASA's VALERI project provides Leaf Area Index (LAI) maps for a 3x3-km protected forest site in Larose, Ontario, Canada. The dataset contains 30-meter resolution geospatial imagery derived from ground measurements and Landsat TM/ETM+ satellite data using a developed transfer function. These maps are intended for validating satellite-derived biophysical products and modeling surface-atmosphere exchanges.
Over 3,000 sediment samples from Geoscience Australia's MARS database provide a regional synthesis of the inter-reefal seabed, which comprises 95% of the Great Barrier Reef Marine Park area. This dataset offers new quantitative information on surface sediment trends and geomorphic features, refining the facies model for the mixed carbonate-siliciclastic margin. The analysis reveals regional patterns and local-scale characteristics of sediment distribution, including gravel, sand, and mud concentrations across the shelf.
A scored subset of approximately 18,400 underwater images collected by an Autonomous Underwater Vehicle (AUV) in the Tasman Fracture Commonwealth Marine Reserve during a 2014/15 pilot study. The images were scored for the proportion cover of visible macrobiota using 25 random points per image. The data was contributed by the Australian Ocean Data Network as part of the National Marine Biodiversity Hub's monitoring program.
A 128.1 KB Excel file uploaded by Yongfu Lou on June 3, 2026, presents genetic evidence for causal pathways between gut microbiota, circulating inflammatory proteins, and degenerative lumbar spine disorders. The data was generated using two-sample Mendelian randomization on FinnGen R12 summary statistics and validated with rat model experiments. It includes results from mediation analyses quantifying indirect effects across 13 genetically supported pathways.
473 genetically predicted gut microbiota taxa and 91 circulating inflammatory proteins were analyzed for causal links to three degenerative lumbar spine disorders using FinnGen R12 summary statistics. The dataset includes results from two-sample and two-step Mendelian randomization mediation analyses, with experimental validation from rat models. It was authored by Yongfu Lou and last updated on June 3, 2026.
Performance Evaluation using DistilRoBERTa Model is a manually annotated dataset of Amazon product reviews. The reviews are annotated according to perceived complexity using the attitudinal categories of appraisal theory: appreciation and judgment. The dataset was created by Shoukat Ullah and last updated on 2026-05-26.
A manually annotated dataset of Amazon product reviews labeled for perceived complexity using attitudinal categories from appraisal theory. The dataset likely contains reviews annotated for appreciation (user-friendliness) and judgment (effectiveness). The author is Shoukat Ullah, and the dataset was last updated on May 26, 2026.
A manually annotated dataset of Amazon product reviews, labeled according to perceived complexity using appraisal theory categories of appreciation and judgment. The dataset was created by Shoukat Ullah and last updated on May 26, 2026. It is used to evaluate a hybrid deep learning model combining DistilRoBERTa and BiGRU.
Individual participant data from 27 studies involving 91,404 asymptomatic women with singleton pregnancies, used to analyze the association between mid-trimester cervical length and spontaneous preterm birth. The data was collected by Kelly Hughes for an IPD meta-analysis, with a search updated in November 2025. The mean cervical length was 40 mm, and 4,442 preterm births before 37 weeks were recorded.
Geochemical data from the Northern Denison Trough in Australia's Bowen Basin, collected by the Queensland Government's ZeroGen project. The dataset includes vitrinite reflectance measurements and Rock-Eval analysis results for Permian sandstone reservoirs, assessing their thermal maturity and CO2 storage potential. It was published via the Australian Ocean Data Network and last updated in June 2026.
CausalBGM is an AI-powered Bayesian generative modeling approach for estimating individual treatment effects in observational studies. The method, developed by Qiao Liu, uses a low-dimensional latent feature representation to mitigate confounding in high-dimensional covariate scenarios. The 2.6 MB release includes code, documentation, and supplementary materials for reproducing the work.
A methodological paper and supplementary materials for a novel online inference algorithm in high-dimensional generalized linear models. The work, authored by Ruijian Han and last updated on 2026-06-04, includes simulation experiments and a real-world application to spam email classification. The associated files are 1.1 MB in size and available under a CC-BY-4.0 license.
Multimodal telemetry data from 37 participants in a four-scene XR experience at the santralistanbul Energy Museum. The dataset includes logged movement, head orientation, and hand movements used to classify users into one of four interaction profiles. It was collected by Başak Çakmak in 2026 to evaluate a curatorially constrained, profile-aware procedural content generation framework.
A single-group deployment study with 37 participants at the santralistanbul Energy Museum in 2026. The dataset likely contains telemetry logs of user behavior, including movement, head orientation, and hand movements, used to classify interaction profiles and adapt an XR experience. It was created by Başak Çakmak and includes pre/post questionnaire results on heritage perception.
36 Science Dataset layers provide black-sky and white-sky albedo values at 1 kilometer resolution for nine VIIRS moderate bands (M1-M5, M7-M). The product is generated daily using a 16-day rolling window of VIIRS data, weighted to the ninth day, and employs the RossThick/Li-Sparse-Reciprocal BRDF model to correct for surface anisotropic effects. It is part of a suite designed to continue the MODIS BRDF/Albedo data record.
Cook County, Illinois, provides detailed records of deaths falling under the jurisdiction of its Medical Examiner's Office from August 2014 onward. The dataset includes demographic information, dates of incident and death, and detailed cause-of-death analysis with specific flags for gun, opioid, heat, cold, and transportation-related fatalities. Records are updated daily and reflect changes in jurisdictional policy, such as the exclusion of most hospital and hospice COVID-19 deaths after April 1, 2022.
A 5.5 KB Excel dataset compares the performance of a proposed Dynamic Reserve Power Point Tracking (DRPPT) algorithm against conventional methods for grid-tied solar systems. The data likely contains simulation and hardware test results, including metrics like Total Harmonic Distortion (THD) reduction. Authored by Sajjan Kumar and last updated on June 1, 2026, it is shared under a CC-BY-4.0 license.
Sajjan Kumar's dataset contains parameters for photovoltaic panels related to a proposed Dynamic Reserve Power Point Tracking control algorithm. The data, last updated in June 2026, is stored in a 5.5 KB XLS file. The algorithm was tested on hardware and in simulation to improve grid stability by dynamically adjusting solar reserve power.