Name: Telomere-to-Telomere Genome Assembly of Wild Soybean Glycine soja HAAS216
Creator: Muhammad Asad
Published: 2026-04-14T07:21:38
License: CC-BY-4.0
Keywords: Genome Assembly, Soybean, Bioinformatics, Healthcare, Text, Genomics, Plant Genetics

Description

Muhammad Asad's dataset provides a high-quality telomere-to-telomere genome assembly and annotation for the wild soybean Glycine soja accession HAAS216. The assembly spans 1,018.17 Mb, integrates PacBio HiFi, ONT ultra-long reads, and Hi-C data, and contains 48,390 predicted protein-coding genes. It was last updated on April 14, 2026.
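
One rough way to sanity-check a telomere-to-telomere claim after download is to look for telomeric repeats at both ends of each chromosome sequence. The sketch below assumes the canonical plant telomere repeat TTTAGGG (reverse complement CCCTAAA). The function names, the 5,000 bp window, and the 50-copy threshold are illustrative assumptions, not values taken from the dataset.

```python
from typing import Tuple

# Assumption: canonical plant-type telomere repeat and its reverse complement.
TELOMERE_UNIT = "TTTAGGG"
TELOMERE_UNIT_RC = "CCCTAAA"


def count_units(seq: str) -> int:
    """Count non-overlapping repeat units, taking the better of the two orientations."""
    seq = seq.upper()
    return max(seq.count(TELOMERE_UNIT), seq.count(TELOMERE_UNIT_RC))


def has_telomeres(seq: str, window: int = 5000, min_units: int = 50) -> Tuple[bool, bool]:
    """Report whether the first and last `window` bases each hold >= min_units repeats.

    Both thresholds are arbitrary illustrative defaults.
    """
    start_ok = count_units(seq[:window]) >= min_units
    end_ok = count_units(seq[-window:]) >= min_units
    return start_ok, end_ok
```

A chromosome for which both values are True is consistent with, though not proof of, a complete telomere-to-telomere sequence.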

Use Cases

Comparative genomics between wild and cultivated soybean based on the high-quality assembly.
Structural variation detection based on the resolved repetitive and complex genomic regions.
Identification of genomic regions associated with stress tolerance and agronomic traits based on the functional gene annotations.

Strengths

Assembly spans 1,018.17 Mb, of which 55.84% is repetitive sequence.
Contains 48,390 predicted protein-coding genes, with 98.30% functionally annotated.
Integrates multiple high-fidelity data sources (PacBio HiFi, ONT ultra-long reads, Hi-C) for chromosome-scale scaffolding.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.

Provenance

Source: Muhammad Asad via figshare.
Collection Method: Assembly integrates PacBio HiFi long reads, Oxford Nanopore Technologies ultra-long reads, and Hi-C data.
Time Range: Not specified
Freshness: Last updated 2026-04-14 08:29:24; freshness should be verified.
Geography: Not specified

Data is provided in GFF3 (annotation) and FASTA (.fa, sequence) file formats, which require bioinformatics tools or parsers for analysis.
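
As a minimal sketch of a first look at these two formats using only the Python standard library, the code below totals sequence lengths from a FASTA file and counts feature types in a GFF3 file. It assumes uncompressed text files and that genes are labelled `gene` in the third GFF3 column. The function names and any file paths passed to `summarize` are hypothetical, since the actual file names in the figshare record are not listed here.

```python
from collections import Counter
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID is the first whitespace-delimited token of the header.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def gff3_feature_counts(lines: Iterable[str]) -> Counter:
    """Count features by type (column 3) in GFF3-formatted lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:  # a GFF3 feature line has nine tab-separated columns
            counts[fields[2]] += 1
    return counts


def summarize(fasta_path: str, gff3_path: str) -> None:
    """Print assembly size and gene count; both paths are supplied by the caller."""
    with open(fasta_path) as fa:
        lengths = fasta_lengths(fa)
    with open(gff3_path) as gff:
        counts = gff3_feature_counts(gff)
    print(f"{len(lengths)} sequences, {sum(lengths.values()) / 1e6:.2f} Mb")
    print(f"{counts['gene']} gene features")
```

Calling `summarize` with the downloaded file paths gives figures that can be compared against the 1,018.17 Mb assembly size and 48,390 protein-coding genes stated above, bearing in mind that `gene` features may also include non-coding genes.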
