DNA/RNA sequences, gene expression, protein structures, metagenomics, single-cell sequencing
23,758 datasets
A 2.7 MB dataset released by Jonathan Xu on April 15, 2026, containing the SpatialFinder framework. It is a human-in-the-loop vision-language model designed to predict gene expression heterogeneity and rank high-value regions of interest from H&E tissue slides. The framework was evaluated across four Visium HD tissue types.
SpatialFinder is a human-in-the-loop vision-language framework for ranking high-value regions of interest (ROIs) on H&E tissue slides for spatial transcriptomics. The dataset, authored by Jonathan Xu and last updated in April 2026, contains evaluation results from four Visium HD tissue types. The framework aims to make spatial transcriptomics more cost-effective by predicting gene expression heterogeneity from standard histology images.
Jonathan Xu's SpatialFinder framework combines a biomedical vision-language model with human-in-the-loop optimization to predict gene expression heterogeneity and rank high-value regions of interest (ROIs) across H&E tissue slides. The dataset, last updated in April 2026, contains results from evaluating the framework across four Visium HD tissue types, where it outperformed baseline models for ROI ranking. The framework aims to make spatial transcriptomics more cost-effective and clinically actionable.
SpatialFinder is a 3.5 MB dataset containing a framework that combines a biomedical vision-language model with a human-in-the-loop pipeline to predict gene expression heterogeneity from H&E tissue slides. It was authored by Jonathan Xu and last updated on April 15, 2026. The framework was evaluated across four Visium HD tissue types, with Spearman’s ρ reaching up to 0.89.
A 113.8 KB PDF authored by Jonathan Xu and last updated on April 15, 2026, describes the SpatialFinder framework. This framework combines a biomedical vision-language model with a human-in-the-loop pipeline to predict gene expression heterogeneity from H&E tissue slides. It aims to make spatial transcriptomics more cost-effective by identifying smaller, high-value regions of interest for sequencing.
A retrospective time-lapse study of 3,103 transferred autologous blastocysts, authored by Emma Adolfsson and last updated in April 2026. It evaluates how early cleavage patterns, morula compaction behavior, and blastocyst quality influence clinical pregnancy and live birth rates. The analysis includes unadjusted and multivariable models adjusting for maternal age and blastocyst developmental day.
Listed Buildings in York is a geospatial dataset from the City of York Council, published via the Government Digital Service. The data is provided as a live API link to the council's GIS server, so changes to the master copy are reflected immediately. It is available in GeoJSON, KML, and CSV formats under an OGL-UK-3.0 license.
VOSviewer files, Draw.io diagrams, and Python scripts support the bibliometric and content analyses of a study on corporate external financing in emerging economies. The dataset includes original bibliographic records from Scopus and Web of Science, a merged dataset, and analysis results. Hector Julian Diaz Aránzazu uploaded these materials to Harvard Dataverse on June 10, 2026, to ensure research transparency and reproducibility.
A 2026 study isolates and characterizes a goose astrovirus (GAstV) strain, designated GAstV/HNJZ, from infected goslings in Henan Province, China. The complete genome is 7,183 nucleotides in length, sharing 99.6% nucleotide identity with a virulent strain from Anhui Province. Author Wang Dong provides this data under a CC-BY-4.0 license.
Table 1_Gestational and postpartum maternal consequences of gestational diabetes mellitus.docx contains results from a nationwide population-based study of 206,831 pregnant women in Taiwan. The study, authored by Chung-Kuan Wu and published on figshare under CC-BY-4.0, analyzes the association between gestational diabetes mellitus (GDM) and maternal health outcomes using logistic and Cox regression. It reports odds and hazard ratios for conditions like preterm labor, preeclampsia, type 2 diabetes, and chronic kidney disease.
Phillip Zhou's supplemental table provides functional data from a prime editing screen of 5,184 LDLR coding variants, linked to a 2025 preprint. The dataset includes experimentally derived functional scores, ACMG evidence scores, plasmid lists, and primer sequences. It is a 4.5 MB XLSX file shared under a CC-BY-4.0 license on figshare.
An integrated database assembled from 27 large-scale cancer proteomics sources by Yuanyu Huang, last updated in May 2026. It provides a unified catalog of 15,964 MS-supported human proteins harmonized to UniProtKB-Swiss-Prot. The resource spans human tumor cohorts, cancer cell lines, and patient-derived xenograft models.
15,964 human proteins detected by mass spectrometry across 27 cancer proteomics sources are unified in this resource. The MSCP database, created by Yuanyu Huang and updated in May 2026, harmonizes evidence from tumor cohorts, cell lines, and patient-derived xenografts to a UniProtKB reference. It identifies 525 proteins newly supported by MS evidence in a cancer context, enabling standardized comparisons for translational studies.
15,964 mass spectrometry-supported human proteins integrated from 27 large-scale cancer proteomics sources. The MSCP resource, created by Yuanyu Huang and updated in May 2026, harmonizes protein identifications to UniProtKB and benchmarks against reference proteomes. It identifies 525 proteins newly supported by MS evidence in a cancer context.
Mass Spectrometric Detected Cancer Proteins (MSCP) is an integrated database assembled from 27 large-scale cancer proteomics sources. The resource contains a unified catalog of 15,964 MS-supported human proteins, harmonized to UniProtKB-Swiss-Prot, and was created by Yuanyu Huang, last updated in May 2026.
15,964 MS-supported human proteins were harmonized from 27 large-scale cancer proteomics sources. The MSCP resource, created by Yuanyu Huang and updated in May 2026, integrates data from human tumor cohorts, cell lines, and patient-derived xenograft models. Benchmarking identified 525 proteins newly supported by mass spectrometry evidence in an integrated cancer context.
154,695 eligible activity windows from 196 subjects, covering 211.6 hours of real in-the-wild IMU data. The benchmark is built from real wearable IMU streams in the Nymeria dataset and provides unseen-subject and cross-device evaluation settings for wearable motion recognition. It was created by the CRUISEResearchGroup and last updated in May 2026.
Geoscience Australia Data provides a palynological report on the A.O.G. Wentworth No. 1 well in New South Wales. The report details the examination of cores and cuttings, indicating transitions from Tertiary to Lower Cretaceous and into Lower Permian sediments. It also compares the Oaklands-Coorabin coalfield coal measures to the Upper Coal Measures of the Sydney Basin.
Participatory workshops involving people with aphasia and clinicians generated principles for implementing psychological care in Ireland. The 9.5 KB XLS file, authored by Molly X. Manning and last updated in April 2026, contains meeting data from this engagement process. Findings offer early guidance for developing coordinated, interdisciplinary aphasia psychological care.
All-atom molecular dynamics simulations examine how amphipathic helix domains from the Cdc12 septin protein interact with lipid bilayers. The dataset, authored by S. Mahsa Mofidi and last updated in April 2026, contains results from simulations of single and paired peptide domains. Findings suggest a cooperative binding mechanism where linked domains induce lipid packing defects on planar membranes.