Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
44,346 datasets
This seabed survey covers the Forster area, from Cape Hawke to Black Head, in New South Wales, Australia. The dataset contains 32-bit floating point GeoTIFF files of bathymetry and backscatter data at 5-meter resolution, acquired by the NSW Department of Planning and Environment between 27 February 2019 and 14 October 2020. Data were collected using an R2Sonic 2022 multibeam sonar onboard the RV Bombora as part of the SeabedNSW program.
A multi-scale dataset from 2000 to 2022 integrates LULUCF greenhouse gas inventories, forest mortality data, and continental disturbance records across 18 countries. Gabriel Osei Forkuo parameterized the DREM framework with this data to evaluate past drivers and model future carbon sink trajectories through 2050. The analysis applied multiple regression, generalized linear mixed models, and empirical scenario projections under contrasting climate and management pathways.
3.9 GB of reference databases, benchmarking resources, and a reproducibility container for the CoMR workflow. The deposit includes FASTA databases, BLAST resources, orthogroup alignments, HMM profiles, and benchmarking scripts for yeast and a protist. It was authored by Julie Boisard and last updated on 2026-04-21.
The Bureau of Mineral Resources initiated a program of systematic reconnaissance geological surveys of the continental shelf following a 1967 monograph. The results include 1:1,000,000 lithofacies maps of shelf sediments, with three sheets printed by early 1974 and work on two further sheets advanced. Users should refer to Bulletin 83 (GeoCat #163) for interpretation guidance, as the map does not distinguish between modern and relict sediments.
Mineral Deposits by Property from the Government of Yukon provides quantitative analysis of commodities like gold and silver in Yukon's hard rock deposits. It details total tonnage, average grades, and contained metal ounces, differentiating between NI 43-101, JORC, and historical resource estimates.
Irina Zlotnikova developed a manually curated and annotated corpus of 1,700 Setswana sentences for training a grammar checking system. The dataset supports a Long Short-Term Memory (LSTM) model that achieved 96% classification accuracy for grammatical correctness. The work was published on figshare in April 2026.
12,000 filtered samples for generating concise, descriptive titles from a user's first message in a conversation. The dataset was curated by SupraLabs for training the experimental Supra Title model family. It was last updated on June 12, 2026.
SpeechJBB is an audio benchmark for evaluating safety alignment and comprehension in large audio language models. It tests model responses to harmful spoken requests across monolingual speech, code-switched speech, and code-switched speech with pseudo-word obfuscation. The dataset was created by virginiaceccatelli and was last updated on 2026-06-16.
An inventory of public information generated, obtained, acquired, or controlled by the Colombian Financial Superintendency (Superfinanciera) that has been classified as confidential or reserved. The dataset includes metadata such as series, generation date, responsible leader, classification date, and legal justification. It is published by datos.gov.co and was last updated on May 18, 2026.
Functional traits for six Neotropical woody bamboo species were measured to analyze differences between savanna and forest-associated species. The dataset includes aboveground characteristics like culm height and biomass, and belowground features such as rhizome length and bud counts. Elizabeth McMurchie created this 1.0 MB dataset, which was last updated on April 30, 2026.
Xiao-Tian Li published a clinical case report on figshare in 2026. The document describes a Chinese female patient with parkinsonism caused by two novel compound heterozygous mutations (c.313G>T and c.23T>A) in the PLA2G6 gene. The report details the patient's clinical presentation, response to therapy, and discusses the phenotypic complexity of PLA2G6-associated disorders.
Rodrigo Sandoval's research data from 2026 investigates the role of Cyclin-dependent kinase 5 in purinergic receptor-mediated pain. The dataset likely contains results from calcium imaging in trigeminal neurons and behavioral assays in mice. Findings indicate Cdk5 modulates pain by influencing P2X2/3 receptor kinetics without affecting membrane expression.
A research paper proposing an energy-based limit curve for reinforced concrete moment-resisting frames equipped with steel damper columns. The study by Kenji Fujii uses incremental critical pseudo-multi impulse analyses on three eight-story structures and validates the framework with nonlinear time-history analyses of recorded ground motions, including sequences from the 2011 Tohoku and 2016 Kumamoto earthquakes. The dataset is an 11.3 MB PDF file, last updated on 2026-05-04.
A National Aeronautics and Space Administration paper introduces a novel reasoning methodology for expert troubleshooting of complex military and industrial processes. The methodology combines models and measurements to identify and locate faulty system components, using a Model Based Reasoning paradigm and a Dynamic Case Based Reasoning method as an intelligent database. A case study employs a helicopter Intermediate Gearbox to illustrate the efficacy of the approach.
Egomnia S.p.A. developed this proprietary dataset from content sourced from userprompt.ai. The version available on Hugging Face is a free preview representing only a small portion of the full Italian text corpus focused on artificial intelligence. The complete dataset can be purchased from the author's website.
A Probabilistic Tsunami Hazard Assessment (PTHA) report developed by Geoscience Australia in partnership with the Queensland Fire Department for the Gladstone region. The report details modelling validated against three historic tsunami events and provides conservative inundation zone estimates corresponding to current tsunami warning categories. The dataset was last updated on 2026-05-14.
Bathymetry and seabed feature data acquired during the SSCN Subsea Cable Network route survey. The survey was conducted by EGS onboard the RV Bold Explorer in April 2024, focusing on the South-west Corner Marine Park. A subsequent geotechnical sampling program was also executed.
77 academic articles on AI-induced technostress, curated for a systematic literature review. The dataset was compiled by Sunet Eybers using Harzing's Publish or Perish and Google Scholar searches, focusing on articles published up to 2026. The final list excludes 17 pre-2020 articles due to the rapid evolution of Generative AI.
A high-resolution 100-meter bathymetry grid compiles three decades of scientific data for the Cape Darnley region in East Antarctica. The compilation, published in Antarctic Science in 2021, integrates single-beam, multibeam, and chart data from numerous institutions. This detailed seafloor morphology forms a baseline for oceanographic and glaciological modeling.
BeyondMasks evaluates whether video object removal methods also remove causal and physical aftereffects like shadows and reflections. The dataset repository supports research accepted to ECCV 2026. It was created by authors from multiple institutions and last updated on 2026-06-26.