Description
Aaliah Aly prepared these Python pipeline input files from the TCGA Lower Grade Glioma (LGG) dataset and shared them on figshare in May 2026. The 100 MB dataset contains selected clinical, sample-level, mutation, copy number alteration (CNA), and hypoxia-related source files from cBioPortal. The files are intended for downstream data cleaning, harmonization, validation, and database construction.
Use Cases
Clean and harmonize clinical and sample-level data using the combined patient and sample information file.
Validate mutation and copy number alteration records using the provided mutation and CNA data files.
Build a relational database for glioma research from the structured input files prepared for SQL table generation.
Analyze hypoxia-related clinical factors using the supplemental hypoxia data file.
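A minimal Python sketch of the validation and database-construction use cases is shown here. It assumes, without confirmation from this listing, that the files follow cBioPortal's tab-separated layout with #-prefixed metadata lines, that the combined clinical file has SAMPLE_ID and PATIENT_ID columns, that the mutation file uses the MAF columns Hugo_Symbol and Tumor_Sample_Barcode, and that the CNA file is a gene-by-sample matrix of discrete calls from -2 to 2. The helpers read_tsv, validate, and build_database, the SQLite schema, and the toy inputs are all hypothetical.

```python
import csv
import sqlite3

# Assumption: discrete CNA calls use the GISTIC-style scale
# (-2 deep deletion, -1 shallow loss, 0 diploid, 1 gain, 2 amplification).
VALID_CNA_CALLS = {"-2", "-1", "0", "1", "2"}
# Assumption: gene-identifier columns that precede the sample columns in the CNA matrix.
CNA_GENE_COLUMNS = {"Hugo_Symbol", "Entrez_Gene_Id"}


def read_tsv(text):
    """Parse tab-separated text, skipping '#'-prefixed metadata lines (assumed cBioPortal layout)."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


def validate(samples, mutations, cna):
    """Return human-readable problems; all column names here are assumptions."""
    problems = []
    known = {row["SAMPLE_ID"] for row in samples}
    # Every mutation must point at a sample listed in the combined clinical/sample file.
    for i, row in enumerate(mutations):
        if row["Tumor_Sample_Barcode"] not in known:
            problems.append(f"mutation row {i}: unknown sample {row['Tumor_Sample_Barcode']}")
    if cna:
        sample_cols = [c for c in cna[0] if c not in CNA_GENE_COLUMNS]
        # Every CNA sample column must also be a known sample.
        for col in sample_cols:
            if col not in known:
                problems.append(f"CNA column {col}: unknown sample")
        # Every cell must hold a valid discrete call ("NA" and blanks are reported too).
        for row in cna:
            for col in sample_cols:
                if row[col] not in VALID_CNA_CALLS:
                    problems.append(f"CNA {row['Hugo_Symbol']}/{col}: invalid call {row[col]!r}")
    return problems


def build_database(samples, mutations):
    """Load samples and mutations into an in-memory SQLite database (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # enforce mutation -> sample links
    con.execute("CREATE TABLE sample (sample_id TEXT PRIMARY KEY, patient_id TEXT NOT NULL)")
    con.execute(
        "CREATE TABLE mutation (id INTEGER PRIMARY KEY, "
        "sample_id TEXT NOT NULL REFERENCES sample(sample_id), gene TEXT NOT NULL)"
    )
    con.executemany(
        "INSERT INTO sample VALUES (?, ?)",
        [(r["SAMPLE_ID"], r["PATIENT_ID"]) for r in samples],
    )
    con.executemany(
        "INSERT INTO mutation (sample_id, gene) VALUES (?, ?)",
        [(r["Tumor_Sample_Barcode"], r["Hugo_Symbol"]) for r in mutations],
    )
    con.commit()
    return con


# Toy inputs (made up, not taken from the dataset).
samples = read_tsv("#Sample Identifier\tPatient Identifier\nSAMPLE_ID\tPATIENT_ID\nS1\tP1\nS2\tP2\n")
mutations = read_tsv("Hugo_Symbol\tTumor_Sample_Barcode\nIDH1\tS1\nTP53\tS3\n")
cna = read_tsv("Hugo_Symbol\tS1\tS2\nCDKN2A\t-2\t0\nEGFR\t1\tNA\n")

for problem in validate(samples, mutations, cna):
    print(problem)

# Load only mutations whose sample is known; the foreign key would reject the rest.
known_ids = {r["SAMPLE_ID"] for r in samples}
con = build_database(samples, [m for m in mutations if m["Tumor_Sample_Barcode"] in known_ids])
print(con.execute("SELECT COUNT(*) FROM mutation").fetchone()[0])
```

With the toy inputs, validate flags the mutation recorded for the unlisted sample S3 and the NA cell in the CNA matrix. Declaring the sample link as a foreign key in the schema, rather than checking it only in Python, means a later unfiltered load fails with sqlite3.IntegrityError instead of silently storing orphaned records.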
Strengths
The files were prepared specifically for a Python data processing workflow, which suggests a structured, reproducible pipeline.
Clinical and sample-level information is combined into a single input file, which helps keep patient and sample records consistently linked.
Source files are derived from the authoritative TCGA LGG dataset via cBioPortal.
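Whether the combined patient and sample file actually keeps records consistently linked can be checked directly after download. The sketch below uses a hypothetical helper, check_links, that takes rows as dictionaries (for example from csv.DictReader). It assumes SAMPLE_ID and PATIENT_ID columns holding TCGA-style barcodes, where a sample barcode such as TCGA-XX-0001-01 begins with its 12-character patient barcode TCGA-XX-0001. These column names are assumptions; the listing does not document them.

```python
from collections import defaultdict


def check_links(rows):
    """Report samples whose patient link looks inconsistent.

    Assumes hypothetical SAMPLE_ID and PATIENT_ID columns holding TCGA-style
    barcodes, where a sample barcode begins with its patient barcode.
    """
    patients_of = defaultdict(set)
    for row in rows:
        patients_of[row["SAMPLE_ID"]].add(row["PATIENT_ID"])
    issues = []
    for sample, patients in sorted(patients_of.items()):
        if len(patients) > 1:
            # The same sample appears under more than one patient.
            issues.append(f"{sample}: linked to several patients {sorted(patients)}")
        elif not sample.startswith(next(iter(patients))):
            # The barcode prefix disagrees with the recorded patient.
            issues.append(f"{sample}: barcode does not start with its patient ID")
    return issues


# Toy rows (made up, not taken from the dataset).
rows = [
    {"SAMPLE_ID": "TCGA-XX-0001-01", "PATIENT_ID": "TCGA-XX-0001"},
    {"SAMPLE_ID": "TCGA-XX-0002-01", "PATIENT_ID": "TCGA-XX-0003"},
    {"SAMPLE_ID": "TCGA-XX-0004-01", "PATIENT_ID": "TCGA-XX-0004"},
    {"SAMPLE_ID": "TCGA-XX-0004-01", "PATIENT_ID": "TCGA-XX-0005"},
]
for issue in check_links(rows):
    print(issue)
```

Grouping by sample first means a sample listed twice with the same patient is not reported, while a sample attached to two different patients is.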
Limitations
Row counts are not documented, which makes it harder to judge suitability before download.
Column-level documentation is absent; field semantics must be inferred after download.
The dataset is not the full original TCGA LGG download; it contains only the files selected for a specific project.
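Because row counts and column semantics are undocumented, a quick profile of each file after download is a reasonable first step. This minimal sketch counts rows and reports, per column, how many values look missing and whether the remaining values all parse as numbers. It assumes tab-separated TXT files with optional #-prefixed metadata lines; the missing-value markers and the profile and is_number helpers are assumptions, and the XLSX files would need a separate reader such as pandas.read_excel.

```python
import csv

# Assumed missing-value markers: bracketed codes follow TCGA clinical conventions,
# "NA" is common in cBioPortal files. Adjust after inspecting the actual files.
MISSING = {"", "NA", "[Not Available]", "[Not Applicable]", "[Unknown]"}


def is_number(value):
    """True if the string parses as a float."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def profile(text):
    """Return the row count and, per column, a missing-value count and a rough type guess."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]  # skip '#' metadata
    reader = csv.DictReader(lines, delimiter="\t")
    rows = list(reader)
    columns = {}
    for col in reader.fieldnames or []:
        # Short rows yield None for trailing columns; treat those as missing too.
        present = [
            row[col] for row in rows
            if row[col] is not None and row[col].strip() not in MISSING
        ]
        columns[col] = {
            "missing": len(rows) - len(present),
            "type": "numeric" if present and all(is_number(v) for v in present) else "text",
        }
    return {"rows": len(rows), "columns": columns}


# Toy file contents (made up, not taken from the dataset).
demo = "#Patient Identifier\tAge\tGrade\nPATIENT_ID\tAGE\tGRADE\nP1\t34\tG2\nP2\tNA\tG3\nP3\t51\t[Not Available]\n"
report = profile(demo)
print(report["rows"])
for name, info in report["columns"].items():
    print(name, info)
```

For a downloaded file, the same function can be applied to its decoded text, for example profile(open(path, encoding="utf-8").read()), where path points at whichever TXT file is being inspected.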
Provenance
Source
TCGA Lower Grade Glioma dataset via cBioPortal.
Collection Method
Selected source files were processed to create Python pipeline input files.
Freshness
Last updated 2026-05-07 02:15:03; verify that this is still the current version before use.
The license is CC-BY-4.0. Files are provided in TXT and XLSX formats.