Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
39,937 datasets
Leaf Area Index (LAI) maps report half the total live foliage area per unit ground surface for selected Canadian sites at 30-meter resolution. The data, produced by NASA using algorithms based on ground measurements and Landsat TM/ETM+ imagery, are formatted as TIFF files. These maps are intended for validating other LAI products and modeling surface-atmosphere exchanges.
Supplementary file 1_Human-made vs. AI-generated: how provenance labels drive strategic curation via perceived effort.docx is a 23.5 MB document containing data from a between-subjects experiment with 618 short-form video users. The study, authored by Han Sol Lim and last updated in June 2026, examined how AI-generated and human-made labels affect perceived creator effort and strategic curation intentions. The data is shared under a CC-BY-4.0 license on figshare.
A 44.8 KB Excel file contains the specimen list for the cream-coloured giant squirrel, Ratufa affinis, from the Natural History Museum, London (NHMUK) collection. It was published as supplementary data for a 2026 research article in the Raffles Bulletin of Zoology by author Shivaram Rasu. The dataset supports the article's conclusion that the original holotype is lost and a neotype from Singapore was designated.
Six sedimentary cycles, each hundreds of metres thick, have been identified in the Surat Basin, corresponding to nine global sea-level oscillations. The cycles from the Jurassic and Cretaceous periods are described, detailing depositional environments like braided streams, deltas, and marine settings. This dataset is provided by the Australian Ocean Data Network and was last updated in June 2026.
Eighteen trace gases were quantified from biofuel burning in Zambia during September 2000. This dataset provides emission ratios and factors for compounds like carbon dioxide, methane, and nitrogen oxides, based on ground-based open-path Fourier transform infrared spectroscopy measurements. The data was collected by the University of Montana as part of the SAFARI 2000 initiative.
FlowER datasets provide mechanistic data for retraining the FlowER model for chemical reaction prediction. The repository contains two datasets: one for full reproducibility of the original paper and a revised version with corrected templates. The data, released under an MIT license by author Tim Pinkhassik, is stored in ZIP files totaling 292.8 MB.
A geological dataset describes the morphology, sediment types, and structural features of the continental shelf off southeast Australia between Sugarloaf Point and Gabo Island. The data likely contains information on shelf width variations, sediment composition, and submarine canyon features. It was published by the Australian Ocean Data Network.
2.7 KB of production-hardened C++ source code for a Fixed-Point Hanning Window Generator Core, part of the Davis Logic V2 DULLEA framework. Authored by Jamie Davis and published on figshare under a CC-BY-4.0 license, it was last updated on 2026-06-01. The module is engineered for deterministic, ultra-low-latency signal conditioning on 32-bit bare-metal architectures.
Three 1:1,000,000 scale lithofacies maps of shelf sediments published by early 1974, covering the Rowley Shoals, Scott Reef, and Arafura Sea areas. The maps result from systematic reconnaissance geological surveys initiated by the Bureau of Mineral Resources following a 1967 monograph. Users are directed to Bulletin 83 for interpretation, as the maps do not distinguish modern sediments from relics of earlier regimes.
3090 km of high-quality seismic data and 7584 km of bathymetric data were collected from the Kenn Plateau off northeast Australia. The voyage, managed by Geoscience Australia and CSIRO, recovered the first ancient rocks from this frontier offshore region. The data are intended to improve geological understanding and bathymetric maps of a poorly known part of Australia's marine jurisdiction.
Shuo Xu's dataset, last updated May 29, 2026, summarizes test performance for five model families classifying emotional tone in cancer peer-support text. The 5.5 KB Excel file contains results from a study using the 'Mental Health Insights: Vulnerable Cancer Survivors & Caregivers' dataset, comparing models like TF-IDF Logistic Regression, Random Forest, LightGBM, GRU, and fine-tuned ALBERT. It includes performance metrics such as weighted F1 and macro one-vs-rest AUC with bootstrap confidence intervals.
A dataset of patient-authored and caregiver-authored text from online cancer peer-support communities, annotated for emotional tone. The dataset includes original labels and parallel AI-generated labels from a large language model, along with structured context variables. It was created by Shuo Xu and last updated on May 29, 2026.
A dataset containing torsion angle correlation analyses for a nine-component protein degradation complex. The data was generated by Carolina Escobar Palacio and published on figshare on June 2, 2026. It applies the TDTAC framework to map directional, sequential conformational motions within the dBET70 PROTAC, BRD4, and CRL4A E3 ligase assembly.
Three experiments tested the Pleasure-Interest Model of Aesthetic Liking to locate the source of bias against AI-generated art. The data, shared by Yongquan Wang on figshare under CC-BY-4.0, indicates the bias emerges predominantly during controlled, not automatic, cognitive processing. Providing interpretive semantic cues significantly mitigated the negative bias.
564 bytes of tabular data from three experiments investigating bias in aesthetic evaluation. The data, published by Yongquan Wang on figshare under CC-BY-4.0, examines how negative bias toward AI-generated art emerges during controlled versus automatic cognitive processing. Results indicate the bias was smaller under automatic processing conditions and was mitigated by providing interpretive semantic cues.
A study on Ruditapes variegatus clams established a 3x3 diallel cross using inner shell color phenotypes (red, white, orange). The dataset likely contains measurements of heterosis, survival rates under heat stress, and heritability estimates for growth and thermal tolerance traits. The data was authored by Jinlong Li and last updated on May 28, 2026.
An interim Population Consequence of Disturbance (iPCoD) model adapted for blue whales and southern right whales off the southern Australian coast. The model provides a framework for policymakers and industry to assess population-level impacts from offshore wind farm developments, specifically for data-poor marine mammal populations. Developed under the NESP MaC 4.9 project and hosted by the Australian Ocean Data Network, this methodology is designed to be updated as more baseline data becomes available.
28,684 participants were recruited in a retrospective cross-sectional study from March to August 2018 in two Chinese counties. The research found an inverse association between the albumin-globulin ratio (AGR) and peripheral arterial disease (PAD), with the relationship more significant among smokers and people with a history of stroke. The dataset, authored by Chao Yu and shared under a CC-BY-4.0 license, was last updated in May 2026.
Nine high-risk pediatric patients in India showed subtherapeutic trough asparaginase activity during induction therapy for acute lymphoblastic leukemia. The prospective pilot study by Himanshu Dhanda measured serum enzyme activity at pre-dose and 24-hour post-dose time points, correlating levels with biochemical toxicities. The document, last updated in 2026, presents evidence of potentially inadequate drug exposure in a resource-limited setting.
An 8-week pilot study assessed the feasibility of a group-based, video conference-delivered behavioral intervention for cognitive fatigability in people with multiple sclerosis. The dataset, published by Tamanna Islam in 2026, includes results from 18 enrolled participants, tracking metrics like eligibility, recruitment, adherence, and satisfaction. Feasibility was evaluated using a traffic-light framework to determine if progression to a full-scale randomized controlled trial is advised.