Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
39,931 datasets
Yang Zhou published a dataset on figshare in 2026 to accompany research on covariance testing for discretely observed functional data. The dataset includes simulation studies and real data examples used to evaluate a pool-smoothing, FPC-based test statistic. The materials are available for download in PDF and ZIP formats under a CC-BY-4.0 license.
Supplementary files for a study analyzing Joyner's endurance performance model in 888 recreational to world-class athletes. The data includes physiological test results for 495 runners and 393 cyclists, measuring VO2max, lactate thresholds, and exercise economy. The files were authored by Lois Mougin and last updated on June 2, 2026.
37.8 MB of supplementary materials for a paper submitted to the 33rd International Conference on Geoinformatics 2026. The package contains spatial data inputs, derived indices, and processing scripts focused on urban heat vulnerability in Melbourne, Victoria, Australia. It was authored by Ryan Turner and last updated on 2026-05-26.
Supplementary data from a study on reconfigurable nonlinear-processing units (RNPUs) with capacitive control. The dataset likely contains simulation results demonstrating a tenfold reduction in power consumption and classification accuracies of 95.9% ± 0.2% for MNIST and 86.7% ± 0.2% for FashionMNIST. The data was authored by R. J. C. Cool and last updated on 2026-05-26.
Ayesha Siddiqua's dataset contains experimental results from propagating corals from the Arabian Gulf. The data includes daily skeletal growth measurements and survivorship for 1,540 coral ramets from seven species, reared for 130 days under three light-color treatments. It was last updated on 2026-06-02 and is shared under a CC-BY-4.0 license.
A 2025 survey of 32 European National Anti-Doping Organization (NADO) leaders, representing a 91% response rate, explored strategic priorities and operational challenges. The study, authored by Fredrik Lauritzen, provides evidence-based insights into shared challenges like funding and human resources. The data was last updated on 2026-06-02 and is available as a DOCX file.
A tomato lateral shoot image dataset was constructed using RGB imaging in greenhouse environments. The dataset was used to develop a YOLOv8n instance segmentation model with a Convolutional Block Attention Module, achieving a mAP of 98.1%. The dataset was created by Yingchun Jiang and last updated in June 2026.
Raw data from a study investigating the role of the orphan G protein-coupled receptor GPR37 in systemic glucose regulation. The dataset includes measurements from heterozygous Gpr37 +/- mice and wild-type controls, assessing body weight, glucose tolerance, and insulin sensitivity. Authored by Mariam Ahmed and published on figshare under a CC-BY-4.0 license in May 2026.
Records from 1991 onward detail non-equity pre-seed grants awarded by the New York State Energy Research and Development Authority (NYSERDA) to support clean energy research and development. The dataset lists companies, consultants, and entrepreneurs, providing details on project titles, technologies, award amounts, and contractor locations. It is maintained by NYSERDA and updated quarterly.
King County's Swim Beach program monitors temperature and bacteria levels at freshwater beaches, primarily on Lake Washington and Lake Sammamish. Each row represents a single observation, with data typically collected from mid-May to late September each year. The dataset tracks both current conditions and 30-day statistical summaries for public health and recreational safety.
Cross-dataset generalization results present performance metrics for a novel hybrid adversarially-trained deep learning framework. The dataset, authored by Sudip Saha and last updated in June 2026, contains results from experiments on reconnaissance, shellcode, and worms datasets. It reports model accuracy on clean data and under FGSM and PGD adversarial attacks.
SGDiff-OS is a synthetic dataset for marine oil spill remote sensing, containing generated pairs of SAR backscatter coefficient (σ⁰) fields and corresponding binary oil spill mask labels. The dataset was created by Rui Zhang and published on figshare in May 2026. It was constructed to address the limited availability of well-annotated SAR oil spill samples that preserve structural consistency and realistic scattering representation.
The Australian Financial Security Authority publishes quarterly statistics on transactions from the national Personal Property Securities Register (PPSR). The PPSR, which began operations on 30 January 2012, records security interests in personal property, which includes all property other than land, buildings, and fixtures. Data is provided in PDF, CSV, and XLSX formats under a CC-BY-4.0 license.
A dataset of 1,879 individuals used to investigate integrating NLP with clinical data for type 2 diabetes risk prediction. The data includes structured clinical variables and unstructured textual entries processed with a BERT-based NLP pipeline. The study, authored by Yaoyan Lu and last updated in 2026, validated the integrated model on a post-2020 cohort of 939 individuals.
A 2026 study by Yaoyan Lu analyzes a public dataset of 1,879 individuals to improve type 2 diabetes risk prediction. It integrates structured clinical variables like BMI and HbA1c with unstructured medical text processed via a BERT-based NLP pipeline. The hybrid model's performance was validated on a post-2020 cohort of 939 individuals.
A research dataset of 1,879 individuals used to investigate integrating natural language processing with traditional clinical data for type 2 diabetes risk prediction. The dataset includes structured variables like BMI and HbA1c alongside unstructured textual entries such as symptom descriptions and lifestyle notes. It was created by Yaoyan Lu and last updated on 2026-05-28.
A dataset summarizing 107 studies, published between 2010 and 2025, that were included in a scoping review on biofabrication for craniomaxillofacial reconstruction. The data was compiled by Shantanu Dixit using a four-dimensional analytical framework to evaluate fabrication strategy, construct complexity, and translational maturity. The dataset is hosted on figshare under a CC-BY-4.0 license.
The Consumer Insights Tracker is a monthly online survey commissioned by the Food Standards Agency and administered by YouGov since July 2023. It monitors the behavior and attitudes of adult consumers aged 16+ in England, Wales, and Northern Ireland regarding food availability, affordability, concerns, and confidence in the supply chain. Data is published quarterly on science.food.gov.uk, with additional ad hoc topics included periodically.
Kerisa Hall's 2026 research report investigates the molecular basis for the reduced neutralizing activity of the monoclonal antibody bezlotoxumab against Clostridioides difficile toxin B variants. The report identifies four major variants (TcdB1 to TcdB4) that account for approximately 99.9% of circulating C. difficile strains globally and links susceptibility to a single amino acid substitution at position 2033. The dataset is a 76.3 KB PDF published under a CC-BY-4.0 license.
A single-case clinical report details a 58-year-old male with hepatocellular carcinoma and T11 vertebral metastasis causing paraplegia. The dataset includes temporal data on targeted immunotherapy, electroacupuncture rehabilitation, and serial cytokine measurements over a 28-month follow-up period. Author Xionghao Pang published the data on figshare under a CC-BY-4.0 license.