Text classification, translation, QA, summarization, dialogue, sentiment analysis, language modeling, text corpora
41,487 datasets
Statistics Canada provides a quarterly survey measuring the percentage of their purchases that Canadian businesses made directly from U.S. suppliers. The data is broken down by NAICS industry classification, business employment size, type of business, activity, and majority ownership for the second quarter of 2026. It is available in XML, CSV, and HTML formats under the OGL-CA-2.0 license.
A prospective cohort study by Xue Wei, published on figshare, followed 231 singleton pregnant women to investigate early predictors of gestational diabetes mellitus (GDM). The dataset likely contains measurements of serum agouti signalling protein (ASIP), the triglyceride-glucose (TyG) index, and routine metabolic parameters taken during the first (8–12 weeks) and second (24–28 weeks) trimesters. The study found that elevated first-trimester ASIP and TyG index were independent risk factors for GDM, with their combination showing predictive value.
1,497 health assessments were conducted for people seeking asylum in North-Central London from July 2021 to March 2023. The data, published by Paola Cinardo on figshare, includes clinical findings and interview data, which show high rates of physical and mental health needs: 83.2% of attendees had at least one identified health need.
Senior management expense reports from The City of Calgary, released twice per year in spring and fall. The data includes line item details for the City Manager, general managers, and directors. Budgets for these positions are reviewed and approved annually by City Council.
Geoscience Australia and the Australian Ocean Data Network provide a regional synthesis of inter-reefal seabed environments in the Great Barrier Reef Marine Park. The dataset integrates over 3,000 sediment samples from the MARS database with geomorphic feature data, offering the first such synthesis since the 1980s. It reveals regional trends and local-scale characteristics in sediment distribution, including gravel, sand, and mud concentrations across the shelf.
115.6 MB of standardized experimental data generated by Benhui Pang for a study on interface-engineered glass fibers. The dataset includes raw and processed data for index properties, compaction behavior, tensile performance, mix optimization, strength tests, durability assessments, and microstructural analyses. It is shared to support transparency and reproducibility and was last updated on 2026-05-21.
8.3 MB of model input data, simulation outputs, and Python code supporting a study on land-use change and terrestrial carbon stocks across Belt and Road countries. The dataset covers a historical period from 2000 to 2022 and includes scenario simulations for future policy pathways. It was authored by Lulu Qu and last updated on 2026-05-21.
A 23.5 GB dataset for visual teach-and-repeat (VTR) navigation designed to operate robustly in environments with variable or low light levels. The data, authored by Fuhai Ling and last updated in May 2026, supports a framework integrating deep-learned descriptors, stereo imaging, and event-based cameras. Experiments demonstrate the system's performance in night-time urban environments for both indoor and outdoor navigation.
Polygon extents represent bathymetry compilation products delivered by Geoscience Australia as of June 2019. The compilations were generated from numerous data sources including survey data, lidar, and interpolation. Each polygon's attributes contain information regarding data sources, product details, and access methods.
307,629 high-quality somatic mutations were identified in litchi embryogenic callus treated with pingyangmycin. The dataset, authored by Guo Wang and last updated in May 2026, contains whole-genome resequencing results from treated callus and 40 regenerated mutant lines, reporting mutation frequencies of 1.8×10⁻⁴ and 1.4×10⁻⁴ per site.
A research document details the establishment of a pingyangmycin-induced mutagenesis system for litchi using in vitro-cultured embryogenic callus. The study identified 307,629 high-quality somatic mutations in treated callus and over 1.2 million variants in regenerated mutant lines, with mutation frequencies exceeding typical EMS-induced rates. The document was authored by Guo Wang and last updated on 2026-05-08.
A research dataset from a prospective cohort study using UK Biobank data. It examines the relationship between frailty status, its longitudinal changes, and the incident risk of degenerative bone and joint diseases and their multimorbidity. The study was authored by Minghao Jin and the dataset was last updated in May 2026.
A 12-year prospective cohort study of 2,370 Japanese patients with nonalcoholic fatty liver disease (NAFLD) evaluates the predictive performance of twelve metabolic composite indices for incident type 2 diabetes mellitus. The dataset, authored by Nan’nan Chen and last updated in May 2026, likely contains patient-level data used to calculate indices like TyG-WC, TyG-WHtR, and VAI, and their association with diabetes onset via Cox models and ROC analysis.
A 12-year prospective cohort study of 2,370 Japanese patients with nonalcoholic fatty liver disease (NAFLD) evaluates twelve metabolic composite indices for predicting incident type 2 diabetes. The triglyceride–glucose–waist–height ratio (TyG-WHtR) demonstrated the highest predictive accuracy with an AUC of 0.680 and an optimal cut-off of 4.54. Authored by Nan’nan Chen and shared under CC-BY-4.0, this research dataset was last updated on May 1, 2026.
A 12-year prospective cohort study of 2,370 Japanese patients with nonalcoholic fatty liver disease (NAFLD) evaluates the predictive ability of twelve metabolic composite indices for incident type 2 diabetes mellitus. The dataset, authored by Nan’nan Chen and last updated in 2026, likely contains patient-level clinical and outcome data used to calculate indices like TyG-WC, TyG-WHtR, and VAI. Results indicate the TyG-WHtR index had the highest predictive accuracy for diabetes onset in this population.
A secondary analysis of 2,370 NAFLD patients from a prospective Japanese cohort study evaluates the predictive ability of twelve metabolic composite indices for incident type 2 diabetes over a 12-year follow-up. The dataset, authored by Nan’nan Chen and shared under a CC-BY-4.0 license, includes hazard ratios and AUC values for indices like TyG-WC, TyG-WHtR, and VAI. It was last updated on 2026-05-01.
Nan’nan Chen's research dataset contains results from a 12-year prospective cohort study of 2,370 Japanese patients with nonalcoholic fatty liver disease (NAFLD). It compares the predictive ability of twelve metabolic composite indices for the onset of type 2 diabetes mellitus (T2DM). The dataset was last updated on 2026-05-01.
A Japanese prospective cohort of 2,370 patients with nonalcoholic fatty liver disease (NAFLD) was used to evaluate twelve metabolic composite indices for predicting incident type 2 diabetes mellitus (T2DM). The study, authored by Nan’nan Chen, was last updated on May 1, 2026. It found the triglyceride–glucose–waist–height ratio (TyG-WHtR) had the highest predictive accuracy with an AUC of 0.680.
Colombian municipal and district-level financial transfers from the General System for poor, uninsured populations from 2015 to 2021. The dataset includes columns for payer, payment orders, identification numbers, payment dates, concepts, funding sources, and transferred values. It is hosted by the Colombian government's open data portal, datos.gov.co, and was last updated in May 2026.
Supplementary materials for a pilot evaluation of 17 open-weight large language models screening RNA-seq metadata. The dataset includes performance metrics like AUPRC and F1 scores, runtime distributions, and reproducibility data across 150 projects per model. Mitsuo Shintani authored this CC-BY-4.0 licensed dataset, last updated in May 2026.