Hepatocellular Carcinoma Transcriptomic and Clinical Data for Prognostic Modeling
by Songquan Huang · Updated 2mo ago
336.2 MB · 1 file
Description
Transcriptomic data for hepatocellular carcinoma (HCC) patients from the TCGA-LIHC cohort: 424 samples, of which 374 are tumor and 50 are normal tissue. The dataset also includes prognostic information, somatic mutation data, tumor mutation burden, and clinical variables such as age, gender, TNM stage, and overall survival. An independent validation cohort of 243 HCC samples from the ICGC-LIRI-JP database is provided as well.
Use Cases
Predict overall survival (OS) time and survival status from clinical features like age, gender, and TNM stage using Cox regression models.
Identify gene expression signatures correlated with tumor grade and lymph node metastasis status from transcriptomic data.
Analyze the relationship between tumor mutation burden and patient prognosis across the 365 cases with grouping information.
Validate prognostic models built on the TCGA cohort using the external ICGC-LIRI-JP cohort of 243 samples.
Investigate associations between somatic mutation data and specific clinical outcomes or transcriptomic patterns.
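As an illustration of the tumor mutation burden use case above, the sketch below estimates a Kaplan-Meier survival curve for each of two groups. It is a minimal pure-Python version, not the method used by the dataset authors. The group names `high_tmb` and `low_tmb` and all follow-up times are made-up toy values, since the column names and units in the archive are not described.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times  -- follow-up time per patient (any consistent unit)
    events -- 1 if death was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # patients leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= exits[t]
    return curve


# Toy data only (assumption): four patients per hypothetical group.
high_tmb = kaplan_meier([5, 8, 12, 20], [1, 1, 0, 1])
low_tmb = kaplan_meier([10, 15, 30, 40], [1, 0, 0, 1])

for label, curve in (("high TMB", high_tmb), ("low TMB", low_tmb)):
    for t, s in curve:
        print(f"{label}: S({t}) = {s:.2f}")
```

Censored patients lower the number at risk without producing a step in the curve, which is why the curve only records death times. A formal group comparison would add a log-rank test or a Cox model on top of these estimates.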
Strengths
Includes 424 samples from TCGA (374 tumor, 50 normal), enough for primary analysis and tumor-versus-normal comparison.
Contains multiple integrated data types: transcriptomics, somatic mutations, tumor mutation burden, and detailed clinical data.
Provides an independent external validation cohort of 243 HCC samples from ICGC for model robustness testing.
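One common way to test a prognostic model on an external cohort is to score each validation patient with the trained model and compute Harrell's concordance index (C-index). The sketch below is a minimal pure-Python version under stated assumptions: `risk_scores` stands in for the output of a model fitted on the TCGA cohort, the numbers are toy values, and pairs with tied follow-up times are skipped for simplicity.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable patient pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before the follow-up time of patient j. The pair is
    concordant when patient i also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in this cohort")
    return concordant / comparable


# Toy validation cohort (assumption): times, event flags, model risk scores.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.6, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(f"C-index: {c_index:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. With few validation cases that have complete survival information, the number of comparable pairs is small and the estimate is correspondingly noisy.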
Limitations
The validation cohort from ICGC has only 24 cases with complete subgroup and survival information, limiting statistical power for external validation.
Clinical data granularity may be limited: the description does not mention treatment or comorbidity details.
Potential geographic bias, as the validation cohort (ICGC-LIRI-JP) is drawn exclusively from a Japanese population.
Provenance
Source
Primary data from The Cancer Genome Atlas (TCGA) LIHC project and the International Cancer Genome Consortium (ICGC) LIRI-JP cohort.
Collection Method
Data downloaded from official portals (portal.gdc.cancer.gov, dcc.icgc.org); transcriptomics generated via high-throughput sequencing.
Freshness
Data was last updated on the platform in April 2026, though the underlying TCGA and ICGC data collection periods are not specified.
Geography
TCGA data is multinational; ICGC validation cohort is from Japan (ICGC-LIRI-JP).
Data is packaged in a ZIP archive; specific file formats and structures within are not described. License is CC-BY-4.0, requiring attribution.
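Because the archive contents are not described, a reasonable first step is to list the member files and tally their extensions before extracting anything. The sketch below uses Python's standard `zipfile` module. The function name `summarize_archive` and the demo file names are hypothetical stand-ins, and the demo builds a tiny in-memory archive so the code runs without the real download.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(path_or_file):
    """Return (listing, extensions) for a ZIP archive.

    listing    -- [(member_name, uncompressed_size_in_bytes)]
    extensions -- Counter of lower-cased file extensions
    """
    with zipfile.ZipFile(path_or_file) as archive:
        members = [m for m in archive.infolist() if not m.is_dir()]
    listing = [(m.filename, m.file_size) for m in members]
    extensions = Counter(
        PurePosixPath(name).suffix.lower() or "(none)" for name, _ in listing
    )
    return listing, extensions


# Demo archive with hypothetical member names (assumption, not the real files).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("example/expression.tsv", "gene\tsample_1\n")
    demo.writestr("example/clinical.csv", "id,age\n")
buffer.seek(0)

listing, extensions = summarize_archive(buffer)
for name, size in listing:
    print(f"{size:>8} bytes  {name}")
print(dict(extensions))
```

For the downloaded dataset, pass the path to the ZIP file instead of the in-memory buffer. The extension tally gives a quick hint about which parsers are needed before any file is loaded.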