DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Genomics & Bioinformatics Datasets | DataSalon

Genomics & Bioinformatics

DNA/RNA sequences, gene expression, protein structures, metagenomics, single-cell sequencing

23,758 datasets

Data Sheet 1_SpatialFinder: a human-in-the-loop vision-language framework for prioritizing

This 2.7 MB dataset, released by Jonathan Xu on April 15, 2026, contains the SpatialFinder framework. SpatialFinder is a human-in-the-loop vision-language framework designed to predict gene expression heterogeneity and rank high-value regions of interest from H&E tissue slides. The framework was evaluated across four Visium HD tissue types.

Multimodal, ZIP, Vision Language Model, Human In The Loop, Bioinformatics, Computer Vision, Spatial Transcriptomics, Histopathology, +1 more

Data Sheet 2_SpatialFinder: a human-in-the-loop vision-language framework for prioritizing

SpatialFinder is a human-in-the-loop vision-language framework for ranking high-value regions of interest (ROIs) on H&E tissue slides for spatial transcriptomics. The dataset, authored by Jonathan Xu and last updated in April 2026, contains evaluation results from four Visium HD tissue types. The framework aims to make spatial transcriptomics more cost-effective by predicting gene expression heterogeneity from standard histology images.

Multimodal, ZIP, Vision Language Model, Human In The Loop, Biomedical Imaging, Computer Vision, Genomics, Spatial Transcriptomics, +1 more

Data Sheet 4_SpatialFinder: a human-in-the-loop vision-language framework for prioritizing

Jonathan Xu's SpatialFinder framework combines a biomedical vision-language model with human-in-the-loop optimization to predict gene expression heterogeneity and rank high-value regions of interest (ROIs) across H&E tissue slides. The dataset, last updated in April 2026, contains results from evaluating the framework across four Visium HD tissue types, where it outperformed baseline models for ROI ranking. The framework aims to make spatial transcriptomics more cost-effective and clinically actionable.

Multimodal, ZIP, Vision Language Model, Human In The Loop, Bioinformatics, Computer Vision, Spatial Transcriptomics, Tissue Analysis, +1 more

Data Sheet 3_SpatialFinder: a human-in-the-loop vision-language framework for prioritizing

This 3.5 MB dataset contains SpatialFinder, a framework that combines a biomedical vision-language model with a human-in-the-loop pipeline to predict gene expression heterogeneity from H&E tissue slides. It was authored by Jonathan Xu and last updated on April 15, 2026. The framework was evaluated across four Visium HD tissue types, reaching a Spearman’s ρ of up to 0.89.

Image, Multimodal, ZIP, Gene Expression, Vision Language Model, Human In The Loop, Biomedical Imaging, Computer Vision, Spatial Transcriptomics, +1 more

SpatialFinder: Framework for Prioritizing High-Value Regions in Spatial Transcriptomics

A 113.8 KB PDF authored by Jonathan Xu and last updated on April 15, 2026, describes the SpatialFinder framework. This framework combines a biomedical vision-language model with a human-in-the-loop pipeline to predict gene expression heterogeneity from H&E tissue slides. It aims to make spatial transcriptomics more cost-effective by identifying smaller, high-value regions of interest for sequencing.

Multimodal, Vision Language Model, Human In The Loop, Bioinformatics, Computer Vision, Spatial Transcriptomics, Histopathology, +1 more

Embryo Development Patterns and Live Birth Outcomes from 3,103 Blastocyst Transfers

A retrospective time-lapse study of 3,103 transferred autologous blastocysts, authored by Emma Adolfsson and last updated in April 2026. It evaluates how early cleavage patterns, morula compaction behavior, and blastocyst quality influence clinical pregnancy and live birth rates. The analysis includes unadjusted and multivariable models adjusting for maternal age and blastocyst developmental day.

Tabular, Time Lapse Imaging, Healthcare, IVF Outcomes, Embryology, Clinical Research, +1 more

Listed Buildings in York, UK, with Live GIS Updates

Listed Buildings in York is a geospatial dataset from the City of York Council, published via the Government Digital Service. The data is provided as a live API link to the council's GIS server, meaning changes to the master copy are reflected immediately. It is available in GEOJSON, KML, and CSV formats under an OGL-UK-3.0 license.

Geospatial, CSV, Listed Buildings, Heritage, Urban Planning, +1 more

Influence of Behavioral, Technological, and Institutional Factors Involved in External Fin

VOSviewer files, Draw.io diagrams, and Python scripts support the bibliometric and content analyses of a study on corporate external financing in emerging economies. The dataset includes original bibliographic records from Scopus and Web of Science, a merged dataset, and analysis results. Hector Julian Diaz Aránzazu uploaded these materials to Harvard Dataverse on June 10, 2026, to ensure research transparency and reproducibility.

Text, Tabular, Content Analysis, Research Transparency, Emerging Economies, External Financing, Bibliometric Analysis, Synthetic, +1 more

Genomic Characterization of a Goose Astrovirus Strain from Henan Province, China

A 2026 study isolates and characterizes a goose astrovirus (GAstV) strain, designated GAstV/HNJZ, from infected goslings in Henan Province, China. The complete genome is 7,183 nucleotides in length, sharing 99.6% nucleotide identity with a virulent strain from Anhui Province. Author Wang Dong provides this data under a CC-BY-4.0 license.

Text, Virology, Healthcare, Finance, Large Scale, Avian disease, Pathogen Characterization, Genomic Analysis, +1 more

Gestational Diabetes Maternal Consequences in Taiwan, 2004-2015

Table 1_Gestational and postpartum maternal consequences of gestational diabetes mellitus.docx contains results from a nationwide population-based study of 206,831 pregnant women in Taiwan. The study, authored by Chung-Kuan Wu and published on figshare under CC-BY-4.0, analyzes the association between gestational diabetes mellitus (GDM) and maternal health outcomes using logistic and Cox regression. It reports odds and hazard ratios for conditions like preterm labor, preeclampsia, type 2 diabetes, and chronic kidney disease.

Tabular, Epidemiology, Maternal Health, Gestational Diabetes, Healthcare, Taiwan Health Data, +1 more

LDLR Variant Functional Scores and ACMG Evidence from Prime Editing Screening

Phillip Zhou's supplemental table provides functional data from a prime editing screen of 5,184 LDLR coding variants, linked to a 2025 preprint. The dataset includes experimentally derived functional scores, ACMG evidence scores, plasmid lists, and primer sequences. It is a 4.5 MB XLSX file shared under a CC-BY-4.0 license on figshare.

Tabular, Excel, Functional Genomics, Variant Classification, LDLR Gene, Healthcare, Prime Editing, +1 more

MSCP: Integrated Catalog of 15,964 Mass Spectrometry-Detected Cancer Proteins

An integrated database assembled from 27 large-scale cancer proteomics sources by Yuanyu Huang, last updated in May 2026. It provides a unified catalog of 15,964 MS-supported human proteins harmonized to UniProtKB-Swiss-Prot. The resource spans human tumor cohorts, cancer cell lines, and patient-derived xenograft models.

Tabular, Excel, Mass Spectrometry, Healthcare, Bioinformatics Resource, Cancer Proteomics, Large Scale, Protein Database, Synthetic, +1 more

MSCP: Integrated Catalog of 15,964 Mass Spectrometry-Detected Cancer Proteins

15,964 human proteins detected by mass spectrometry across 27 cancer proteomics sources are unified in this resource. The MSCP database, created by Yuanyu Huang and updated in May 2026, harmonizes evidence from tumor cohorts, cell lines, and patient-derived xenografts to a UniProtKB reference. It identifies 525 proteins newly supported by MS evidence in a cancer context, enabling standardized comparisons for translational studies.

Tabular, Excel, Biomedical Research, Mass Spectrometry, Healthcare, Cancer Proteomics, Large Scale, Protein Database, Synthetic, +1 more

MSCP: Integrated Cancer Proteomics Database from 27 Sources

MSCP integrates 15,964 mass spectrometry-supported human proteins from 27 large-scale cancer proteomics sources. The resource, created by Yuanyu Huang and updated in May 2026, harmonizes protein identifications to UniProtKB and benchmarks them against reference proteomes. It identifies 525 proteins newly supported by MS evidence in a cancer context.

Tabular, Excel, Mass Spectrometry, Healthcare, Bioinformatics Resource, Cancer Proteomics, Large Scale, Protein Database, Synthetic, +1 more

MSCP: Mass Spectrometry-Detected Cancer Proteins from 27 Proteomics Sources

Mass Spectrometric Detected Cancer Proteins (MSCP) is an integrated database assembled from 27 large-scale cancer proteomics sources. The resource contains a unified catalog of 15,964 MS-supported human proteins, harmonized to UniProtKB-Swiss-Prot. It was created by Yuanyu Huang and last updated in May 2026.

Tabular, Excel, Mass Spectrometry, Healthcare, Bioinformatics Resource, Cancer Proteomics, Large Scale, Protein Database, Synthetic, +1 more

MSCP: Mass Spectrometry-Detected Cancer Proteins from 27 Proteomics Sources

15,964 MS-supported human proteins were harmonized from 27 large-scale cancer proteomics sources. The MSCP resource, created by Yuanyu Huang and updated in May 2026, integrates data from human tumor cohorts, cell lines, and patient-derived xenograft models. Benchmarking identified 525 proteins newly supported by mass spectrometry evidence in an integrated cancer context.

Tabular, Excel, Biomedical Research, Mass Spectrometry, Healthcare, Cancer Proteomics, Large Scale, Protein Database, Synthetic, +1 more

AnyMo Bench: A Fine-Grained In-the-Wild Human Activity Recognition Benchmark

AnyMo Bench contains 154,695 eligible activity windows from 196 subjects, covering 211.6 hours of real in-the-wild IMU data. The benchmark is built from real wearable IMU streams in the Nymeria dataset and provides unseen-subject and cross-device evaluation settings for wearable motion recognition. It was created by the CRUISEResearchGroup and last updated in May 2026.

Time Series, Wearable Sensors, IMU Data, Benchmark, Human Activity Recognition, +1 more

A.O.G. Wentworth No. 1: Palynological Report on Permian Sediments in the Murray Basin

Geoscience Australia Data provides a palynological report on the A.O.G. Wentworth No. 1 well in New South Wales. The report details the examination of cores and cuttings, indicating transitions from Tertiary to Lower Cretaceous and into Lower Permian sediments. It also compares the Oaklands-Coorabin coalfield coal measures to the Upper Coal Measures of the Sydney Basin.

Text, 🇦🇺 Australia, Geology, Palynology, Permian, Sediment Analysis, +1 more

Sequence and Focus of Meetings for Aphasia Psychological Care Implementation

Participatory workshops involving people with aphasia and clinicians generated principles for implementing psychological care in Ireland. The 9.5 KB XLS file, authored by Molly X. Manning and last updated in April 2026, contains meeting data from this engagement process. Findings offer early guidance for developing coordinated, interdisciplinary aphasia psychological care.

Tabular, Excel, Mental Health, Clinical Workshops, Aphasia, Healthcare, Computer Vision, Participatory Research, Synthetic, +1 more

Molecular Dynamics Simulation Results for Septin Protein Membrane Binding

All-atom molecular dynamics simulations examine how amphipathic helix domains from the Cdc12 septin protein interact with lipid bilayers. The dataset, authored by S. Mahsa Mofidi and last updated in April 2026, contains results from simulations of single and paired peptide domains. Findings suggest a cooperative binding mechanism where linked domains induce lipid packing defects on planar membranes.

Tabular, Excel, Molecular Dynamics, Protein Membrane Interaction, Septin Proteins, Lipid Bilayers, +1 more
