CATH: Protein Domain Classification Database | DataSalon

Computer Vision

Name: CATH: Protein Domain Classification Database
Creator: LiteFold
Published: 2026-05-13T08:13:42
Keywords: Bioinformatics, Protein Structure, Tabular, Domain Classification, CATH Database

by LiteFold · Updated 26d ago

Available on 1 platform

Description

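601,328 protein domain structures, hierarchically classified by the four CATH levels: class, architecture, topology, and homologous superfamily. The dataset, created by LiteFold, includes a deterministic, S35-cluster-aware split with 541,123 rows for training and 60,205 for testing (601,328 in total). In CATH, an S35 cluster groups domains that share at least 35% sequence identity. The split keeps all domains from the same homologous superfamily and S35 cluster on the same side, so closely related domains do not appear in both the training and test sets.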
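A deterministic, group-aware split of this kind can be sketched by hashing a group key, so that every domain sharing the key lands on the same side. This is an illustration only, not LiteFold's actual procedure. The field names `domain_id`, `superfamily`, and `s35_cluster`, the sample values, and the 10% test fraction are assumptions; the fraction is merely suggested by 60,205 / 601,328, which is roughly 10%.

```python
import hashlib


def assign_split(superfamily: str, s35_cluster: str, test_fraction: float = 0.1) -> str:
    """Map a (superfamily, S35 cluster) group to 'train' or 'test' deterministically."""
    key = f"{superfamily}|{s35_cluster}".encode("utf-8")
    # SHA-256 is stable across runs and machines, unlike Python's built-in hash().
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


# Hypothetical rows; the real column names are not documented in the listing.
rows = [
    {"domain_id": "dom_a", "superfamily": "3.40.50.300", "s35_cluster": "1"},
    {"domain_id": "dom_b", "superfamily": "3.40.50.300", "s35_cluster": "1"},
    {"domain_id": "dom_c", "superfamily": "1.10.8.10", "s35_cluster": "4"},
]

for row in rows:
    row["split"] = assign_split(row["superfamily"], row["s35_cluster"])
    print(row["domain_id"], row["split"])
```

Because the assignment depends only on the group key, `dom_a` and `dom_b` always receive the same label. A hash-based scheme hits the target fraction only approximately, since whole groups of different sizes move together.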

Use Cases

Training protein structure prediction models based on hierarchical domain classification.
Benchmarking clustering algorithms based on the S35-cluster-aware data splits.
Studying protein domain evolution and relationships based on homologous superfamily labels.
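For the use cases above, it often helps to treat a CATH label as four nested levels rather than one opaque string. The sketch below assumes labels are stored as dotted CATH codes such as `3.40.50.300`; the listing does not document the column format, so that storage format and the names `CathCode` and `parse_cath_code` are hypothetical.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """The four CATH levels, in order from coarsest to finest."""
    cath_class: int
    architecture: int
    topology: int
    superfamily: int

    def prefix(self, depth: int) -> str:
        """Return the dotted code truncated to the first `depth` levels (1 to 4)."""
        if not 1 <= depth <= 4:
            raise ValueError("depth must be between 1 and 4")
        return ".".join(str(part) for part in self[:depth])


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted code like '3.40.50.300' into its four integer levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {code!r}")
    return CathCode(*(int(p) for p in parts))


code = parse_cath_code("3.40.50.300")
print(code.prefix(1))  # class-level label: 3
print(code.prefix(3))  # topology-level label: 3.40.50
```

Truncating to a given depth yields a coarser target, for example training a classifier at the topology level before attempting the superfamily level.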

Strengths

At 601,328 classified protein domains, the dataset offers substantial scale.
The deterministic, S35-cluster-aware split with 541,123 training and 60,205 test rows helps prevent data leakage.
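The leakage claim can be checked after download by confirming that no group occurs on both sides of the split. The following is a minimal sketch with made-up rows; the column names `superfamily` and `s35_cluster` are assumptions, since the listing does not document the schema.

```python
from typing import Iterable, Mapping, Set, Tuple

GroupKey = Tuple[str, str]


def group_keys(rows: Iterable[Mapping[str, str]]) -> Set[GroupKey]:
    """Collect the distinct (superfamily, S35 cluster) pairs present in the rows."""
    return {(row["superfamily"], row["s35_cluster"]) for row in rows}


def leaked_groups(
    train_rows: Iterable[Mapping[str, str]],
    test_rows: Iterable[Mapping[str, str]],
) -> Set[GroupKey]:
    """Return the groups that appear in both splits; an empty set means no overlap."""
    return group_keys(train_rows) & group_keys(test_rows)


# Toy data for illustration only.
train = [
    {"superfamily": "3.40.50.300", "s35_cluster": "1"},
    {"superfamily": "3.40.50.300", "s35_cluster": "2"},
]
test = [
    {"superfamily": "1.10.8.10", "s35_cluster": "4"},
]

print(leaked_groups(train, test))  # set()
```

An empty result only shows that no group key is shared; it does not by itself rule out similarity between domains in different groups.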

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The listing reports split row counts (541,123 train, 60,205 test) but no file size, schema, or per-class distribution, which may limit suitability assessment.
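Since field semantics have to be inferred after download, a quick per-column profile is a reasonable first step. The sketch below uses only the standard library and works on any iterable of dictionaries; the sample rows and their column names are invented for illustration and are not the dataset's actual fields.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def profile_columns(rows: Iterable[Mapping[str, object]], max_examples: int = 3) -> Dict[str, dict]:
    """Summarize each column: value types seen, distinct count, missing count, examples.

    Assumes cell values are hashable scalars (str, int, float, None).
    """
    profile: Dict[str, dict] = {}
    for row in rows:
        for name, value in row.items():
            col = profile.setdefault(
                name, {"types": Counter(), "distinct": set(), "missing": 0}
            )
            if value is None or value == "":
                col["missing"] += 1
                continue
            col["types"][type(value).__name__] += 1
            col["distinct"].add(value)
    return {
        name: {
            "types": dict(col["types"]),
            "n_distinct": len(col["distinct"]),
            "missing": col["missing"],
            "examples": sorted(map(str, col["distinct"]))[:max_examples],
        }
        for name, col in profile.items()
    }


# Invented sample rows; replace with rows read from the downloaded files.
sample = [
    {"domain_id": "dom_a", "cath_code": "3.40.50.300", "length": 212},
    {"domain_id": "dom_b", "cath_code": "3.40.50.300", "length": None},
    {"domain_id": "dom_c", "cath_code": "1.10.8.10", "length": 97},
]

for name, summary in profile_columns(sample).items():
    print(name, summary)
```

Columns with few distinct values relative to the row count are likely labels or cluster identifiers, while near-unique columns are likely domain identifiers.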

Provenance

Source: CATH hierarchical classification database.
Freshness: Last updated 2026-05-27 13:02:55; freshness should be verified.

Quality Score

D37

Community

74 downloads

1 like

0 views

Dataset Info

Author: LiteFold
Created: May 13, 2026
Updated: May 27, 2026
Last synced: Jun 3, 2026
