Oral Submucosal Fibrosis Multi-Omics Data for Biomarker Discovery via Machine Learning
by Chinmay Nitin Mokal · Updated 1mo ago
3.6 MB · 1 file
Available on 1 platform
Description
A multi-omics dataset for oral submucosal fibrosis (OSF) research, integrating RNA-seq, microarray, epigenomic, and single-cell RNA-seq data from the Gene Expression Omnibus (GEO). The dataset was created by Chinmay Nitin Mokal and last updated in May 2026. It contains results from differential expression analysis, immune cell infiltration analysis, and machine learning models that identify 11 potential diagnostic biomarkers.
Use Cases
Developing diagnostic biomarkers for oral submucosal fibrosis based on the 11 identified gene signatures.
Training machine learning models for disease classification using the described differentially expressed genes (DEGs).
Investigating immune cell infiltration patterns in OSF based on CIBERSORTx analysis results.
Performing integrative transcriptomic and methylome analysis to identify therapeutic targets.
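As a starting point for the classification use case above, here is a minimal sketch of selecting DEGs from a DESeq2-style results table to use as candidate model features. The column names `log2FoldChange` and `padj` follow DESeq2's standard results output; the gene IDs, the numbers, and the cutoffs (`padj < 0.05`, `|log2FC| >= 1`) are illustrative assumptions and are not taken from this dataset.

```python
import csv
import io

# Hypothetical DESeq2-style results; gene IDs and values are made up for illustration.
EXAMPLE_CSV = """gene,baseMean,log2FoldChange,padj
GENE_A,812.4,2.31,0.0004
GENE_B,95.1,-1.72,0.012
GENE_C,410.0,0.35,0.0009
GENE_D,58.7,1.90,0.21
GENE_E,12.3,-2.50,NA
"""


def select_degs(handle, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return (gene, log2FC) pairs passing both cutoffs, largest |log2FC| first."""
    selected = []
    for row in csv.DictReader(handle):
        # DESeq2 can report NA for adjusted p-values (e.g. filtered genes); skip those rows.
        if row["padj"] in ("NA", ""):
            continue
        lfc = float(row["log2FoldChange"])
        if float(row["padj"]) < padj_cutoff and abs(lfc) >= lfc_cutoff:
            selected.append((row["gene"], lfc))
    # Rank by effect size so the strongest candidates come first.
    return sorted(selected, key=lambda item: abs(item[1]), reverse=True)


degs = select_degs(io.StringIO(EXAMPLE_CSV))
print(degs)
```

The cutoffs are exposed as parameters because the thresholds used by the dataset's author are not stated in this listing.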
Strengths
Integrates data from 7 distinct sources, including 3 RNA-seq and 2 microarray datasets.
Machine learning models trained on the top 5 features reportedly achieved AUROC and AUPRC scores of 0.99.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may make it harder to assess suitability.
The primary data file is a 3.6 MB DOCX document, which is not a standard structured data format.
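Because the reported AUROC and AUPRC scores cannot be checked from this listing alone, the following minimal sketch shows how both metrics can be recomputed from a model's predictions. It uses only the standard library; the labels and scores are made-up values, and AUPRC is approximated here by average precision, which is one common estimator and an assumption about how the reported figure was obtained.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive sample."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits


# Illustrative predictions only: 1 = OSF, 0 = control.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

roc = auroc(y_true, y_score)
ap = average_precision(y_true, y_score)
print(round(roc, 3), round(ap, 3))
```

Scores this close to 1.0 on small cohorts are worth re-evaluating on held-out samples before relying on them.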
Provenance
Source
Gene Expression Omnibus (GEO) database
Collection Method
Data downloaded from GEO and analyzed using DESeq2, CIBERSORTx, and machine learning models.
Time Range
Not specified
Freshness
Last updated 2026-05-04 05:30:18; freshness should be verified.
Geography
Not specified
The dataset is provided as a DOCX file; users may need to extract structured data from the document. All code and ML models are provided in a linked GitHub repository.
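One way to pull structured data out of the DOCX file is to read its tables directly. The sketch below relies only on the facts that a DOCX file is a ZIP archive and that its main content lives in `word/document.xml`; it assumes the data of interest is stored in Word tables, which this listing does not confirm. The helper names and the in-memory demo file are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_tables(source):
    """Return each table as a list of rows; each row is a list of cell strings.

    Nested tables are not handled separately in this sketch.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.iter(W + "tr"):
            # Join all text runs inside a cell into one string.
            rows.append(["".join(t.text or "" for t in tc.iter(W + "t"))
                         for tc in tr.iter(W + "tc")])
        tables.append(rows)
    return tables


def make_demo_docx():
    """Build a tiny in-memory archive with one table (enough for this parser only)."""
    ns = 'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
    cell = "<w:tc><w:p><w:r><w:t>{}</w:t></w:r></w:p></w:tc>"
    rows = [("gene", "log2FC"), ("GENE_A", "2.31")]  # made-up content
    body = "<w:tbl>" + "".join(
        "<w:tr>" + "".join(cell.format(v) for v in r) + "</w:tr>" for r in rows
    ) + "</w:tbl>"
    xml = f"<w:document {ns}><w:body>{body}</w:body></w:document>"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


tables = docx_tables(make_demo_docx())
print(tables)
```

For the downloaded file, `docx_tables` can be called with its local path instead of the demo buffer; the extracted rows can then be written out with the `csv` module.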