Mycobacterium Tuberculosis Respiratory Chain Mutations Linked to Drug Resistance
by Qiang Ji · Updated 18d ago
43.3 KB · 1 file
Available on 1 platform
Description
A genomic analysis of 13,402 Mycobacterium tuberculosis isolates, including 4,051 multidrug-resistant (MDR) and 1,044 single-drug resistant (SDR) strains, conducted by Qiang Ji and last updated in May 2026. The study identifies specific single nucleotide polymorphisms (SNPs) in respiratory chain genes associated with the phylogenetic clustering and development of MDR tuberculosis. It uses whole-genome sequencing data analyzed with random forest, gradient boosting decision tree, and generalized linear mixed models.
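For anyone planning to train a classifier on these isolates, the class balance implied by the counts above can be worked out directly. The sketch below assumes the MDR and SDR groups do not overlap and that the remaining isolates fall into neither group; the listing does not state either point, and the label "other" is only a placeholder.

```python
# Counts taken from the description; the "other" group is an assumption
# (isolates that are neither MDR nor SDR, assuming the two groups are disjoint).
TOTAL_ISOLATES = 13_402
counts = {"MDR": 4_051, "SDR": 1_044}
counts["other"] = TOTAL_ISOLATES - counts["MDR"] - counts["SDR"]

# Share of each group in the full cohort, as a fraction of 1.
shares = {group: n / TOTAL_ISOLATES for group, n in counts.items()}

for group, share in shares.items():
    print(f"{group}: {counts[group]} isolates ({share:.1%})")
```

Under these assumptions MDR isolates make up roughly 30% of the cohort and SDR isolates roughly 8%, so a model trained on isolate-level data would likely need to account for class imbalance.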
Use Cases
Identifying genetic markers for multidrug-resistant tuberculosis based on respiratory chain gene mutations.
Training predictive models for drug resistance classification using the identified SNPs.
Investigating the phylogenetic clustering of resistant strains based on whole-genome sequencing data.
Prioritizing potential therapeutic targets by analyzing mutations in genes like atpH, cydA, and qcrB.
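As a starting point for the classification use case above, SNP labels can be turned into binary features. This is a minimal sketch, not the author's method: it assumes labels follow the shape "gene, reference base, position, alternate base" (as in atpH A428G), which the listing does not define, and the names `Snp`, `parse_snp`, and `encode_isolate` are hypothetical.

```python
import re
from typing import NamedTuple


class Snp(NamedTuple):
    gene: str
    ref: str
    position: int
    alt: str


# Assumed label shape: "<gene> <ref base><position><alt base>", e.g. "atpH A428G".
_SNP_PATTERN = re.compile(r"^(\w+)\s+([ACGT])(\d+)([ACGT])$")


def parse_snp(label: str) -> Snp:
    """Split a label such as 'atpH A428G' into its parts."""
    match = _SNP_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized SNP label: {label!r}")
    gene, ref, position, alt = match.groups()
    return Snp(gene, ref, int(position), alt)


def encode_isolate(observed: set[str], panel: list[str]) -> list[int]:
    """Return a 0/1 vector marking which panel SNPs an isolate carries."""
    return [1 if label in observed else 0 for label in panel]


# Hypothetical usage: the panel holds the two SNPs named in this listing.
panel = ["atpH A428G", "qcrB G1250T"]
features = encode_isolate({"qcrB G1250T"}, panel)
```

The resulting vectors could feed any standard classifier, provided per-isolate genotypes are actually available; the file size suggests this dataset may only hold summary results.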
Strengths
Analysis is based on a large sample of 13,402 Mycobacterium tuberculosis isolates.
Reports specific SNPs, such as atpH A428G and qcrB G1250T, as statistically significant.
Data and findings are shared under a permissive CC-BY-4.0 license.
Limitations
The dataset is very small at 43.3 KB, suggesting it may contain summary results rather than raw sequence data.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it hard to assess suitability for direct machine learning use.
Provenance
Source
figshare, author Qiang Ji.
Collection Method
Whole-genome sequencing performed on clinical isolates, with analysis using random forest, gradient boosting decision tree, and generalized linear mixed models.
Freshness
Last updated 2026-05-20 04:38:18; freshness should be verified.
The primary data file is a DOCX document, which may contain formatted text and tables rather than a machine-readable data table.
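If the DOCX file does contain tables, they can be pulled out with the Python standard library alone, since a DOCX file is a ZIP archive whose main body lives in `word/document.xml`. The sketch below is one possible approach under that assumption; the function name `read_docx_tables` is hypothetical, and whether this particular file contains any tables is unknown.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_tables(source) -> list[list[list[str]]]:
    """Return every table in a DOCX file as rows of cell strings.

    `source` is a file path or a binary file-like object.
    Caveat: a table nested inside a cell is returned as its own table,
    and its rows and text also appear in the enclosing table.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    tables = []
    for tbl in root.iter(f"{W}tbl"):
        rows = []
        for tr in tbl.iter(f"{W}tr"):
            # Join all text runs inside a cell into one string.
            row = [
                "".join(t.text or "" for t in tc.iter(f"{W}t"))
                for tc in tr.iter(f"{W}tc")
            ]
            rows.append(row)
        tables.append(rows)
    return tables
```

Each returned table is a list of rows that can be written out with the `csv` module or loaded into a data frame for inspection before deciding whether the contents are usable as machine-readable data.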