run_dbcan: CAZyme and CGC Annotation Database for Microbiome Analysis
Available on 1 platform
Description
A database used by the run_dbcan tool to annotate carbohydrate-active enzymes (CAZymes) and CAZyme gene clusters (CGCs). Besides CAZymes, the data includes annotations for transporters, transcription factors, and other functional proteins. It is hosted on AWS Open Data and was contributed by researchers Xinpeng Zhang, Haidong Yi, and Yanbin Yin.
Use Cases
Assign carbohydrate-active enzyme families to protein sequences using the CAZyme annotations.
Identify Polysaccharide Utilization Loci (PULs) for functional genomics studies.
Annotate auxiliary microbial proteins such as transporters and peptidases.
Benchmark gene cluster prediction tools using the CGC annotation data.
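For the benchmarking use case, one common way to compare a tool's predicted clusters against reference clusters is cluster-level precision and recall with a gene-overlap matching rule. The sketch below is illustrative only: it assumes each cluster has already been parsed into a set of gene IDs, and the `jaccard` and `benchmark` helpers and the 0.5 overlap threshold are hypothetical choices, not part of run_dbcan or this dataset.

```python
from typing import Dict, FrozenSet, List

# Hypothetical representation: a cluster is a frozenset of gene IDs.
# Real CGC annotation files would first need to be parsed into this form.
Cluster = FrozenSet[str]


def jaccard(a: Cluster, b: Cluster) -> float:
    """Jaccard similarity of two gene sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def benchmark(predicted: List[Cluster], reference: List[Cluster],
              threshold: float = 0.5) -> Dict[str, float]:
    """Cluster-level precision, recall, and F1.

    A cluster counts as matched if its Jaccard similarity with at least one
    cluster in the other set is >= threshold (an assumed matching rule).
    """
    matched_pred = sum(
        any(jaccard(p, r) >= threshold for r in reference) for p in predicted)
    matched_ref = sum(
        any(jaccard(r, p) >= threshold for p in predicted) for r in reference)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_ref / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Jaccard matching tolerates small boundary differences between tools; a stricter benchmark could require exact set equality or score boundary offsets instead.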
Strengths
Data is hosted on AWS Open Data, facilitating cloud-based access and processing.
Covers multiple protein categories, including CAZymes, transporters, and transcription factors.
The license places no restrictions on data use.
Limitations
Descriptive metadata is sparse; data quality must be checked manually after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count and dataset size are unknown, which makes it harder to judge suitability before downloading.
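Because column semantics and row counts must be inferred after download, a quick profiling pass can help. The sketch below assumes a downloaded file is tab-separated with a header row; that is an assumption, since the file formats are not documented here and some files may not be tabular at all. The `profile_tsv` and `infer_type` helpers are hypothetical and report, per column, a coarse inferred type, the number of distinct values, and the number of empty cells.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Ordering used to widen a column's type: int < float < str.
_RANK = {"int": 0, "float": 1, "str": 2}


def _cell_type(value: str) -> str:
    """Classify a single non-empty cell as 'int', 'float', or 'str'."""
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return name
        except ValueError:
            pass
    return "str"


def infer_type(values: Iterable[str]) -> str:
    """Widest type seen among non-empty cells; 'empty' if every cell is blank."""
    kind = "empty"
    for v in values:
        if v == "":
            continue
        t = _cell_type(v)
        if kind == "empty" or _RANK[t] > _RANK[kind]:
            kind = t
    return kind


def profile_tsv(handle: TextIO) -> Dict[str, object]:
    """Count rows and summarize each column of a headered TSV stream."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns: Dict[str, List[str]] = {n: [] for n in reader.fieldnames or []}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            # Short rows yield None for missing fields; treat them as empty.
            columns[name].append(row.get(name) or "")
    summary = {
        name: {"type": infer_type(vals),
               "distinct": len(set(vals)),
               "missing": vals.count("")}
        for name, vals in columns.items()
    }
    return {"rows": rows, "columns": summary}
```

For a file on disk, pass `open(path, newline="")` as the handle.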
Provenance
Source
aws_open_data
Collection Method
Likely compiled for use with the run_dbcan bioinformatics tool.
Time Range
Not specified
Freshness
Last update date is unknown; freshness unverified.
Geography
Not specified
Data is stored in Amazon S3 (an object storage service, not a file format); access requires the AWS CLI, an AWS SDK, or another S3-compatible client.
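The simplest route is usually the AWS CLI, e.g. `aws s3 ls --no-sign-request --recursive --summarize s3://<bucket>/`, which also prints the total object count and size and so addresses the unknown dataset size. The bucket name is not given in this listing and must be taken from the AWS Open Data registry. As a dependency-free alternative, the sketch below pages through a public bucket with S3's ListObjectsV2 HTTP API using only the Python standard library. It assumes the bucket permits anonymous listing, which is typical for AWS Open Data but not confirmed here; `parse_listing` and `list_bucket` are hypothetical helper names.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple, Union

# XML namespace used in S3 ListObjectsV2 responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_listing(xml_doc: Union[str, bytes]) -> Tuple[List[Tuple[str, int]], Optional[str]]:
    """Parse one ListObjectsV2 page into (key, size_in_bytes) pairs plus the
    continuation token for the next page (None when the listing is complete)."""
    root = ET.fromstring(xml_doc)
    objects = [
        (c.findtext("s3:Key", namespaces=S3_NS),
         int(c.findtext("s3:Size", default="0", namespaces=S3_NS)))
        for c in root.findall("s3:Contents", S3_NS)
    ]
    truncated = root.findtext("s3:IsTruncated", default="false",
                              namespaces=S3_NS) == "true"
    token = (root.findtext("s3:NextContinuationToken", namespaces=S3_NS)
             if truncated else None)
    return objects, token


def list_bucket(bucket: str, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Anonymously list every object under prefix (requires network access)."""
    token: Optional[str] = None
    while True:
        params = {"list-type": "2", "prefix": prefix}
        if token:
            params["continuation-token"] = token
        url = f"https://{bucket}.s3.amazonaws.com/?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url) as resp:
            objects, token = parse_listing(resp.read())
        yield from objects
        if token is None:
            break
```

For example, `sum(size for _, size in list_bucket("EXAMPLE-BUCKET"))` would return the total size in bytes, where `EXAMPLE-BUCKET` is a placeholder for the real bucket name.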