Name: TEDBench: Large-Scale Protein Fold Classification Benchmark with 965 Topology Classes
Creator: TEDBench
Published: 2026-05-13T09:16:19
Keywords: CATH Topology, Machine Learning Benchmark, Protein Fold Classification, Structural Bioinformatics, AlphaFold Database, Benchmark, Tabular, Large Scale

Description

TEDBench is a benchmark for protein fold classification comprising 369,740 protein structures for training, 46,217 for validation, and 46,218 for testing. It is built from TED (Encyclopedia of Domains) annotations projected onto the Foldseek-clustered AlphaFold Database, and was presented in the paper 'Protein Fold Classification at Scale: Benchmarking and Pretraining'.
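The split sizes above imply a total size and split ratio that the listing does not state directly. The short sketch below derives both from the three published counts; it uses only the numbers given in the description and assumes nothing about the file layout.

```python
# Split sizes as stated in the dataset description.
splits = {"train": 369_740, "validation": 46_217, "test": 46_218}

# Total number of structures across all three splits.
total = sum(splits.values())

# Fraction of the total that each split represents.
fractions = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name:>10}: {count:>7,} ({fractions[name]:.2%})")
print(f"{'total':>10}: {total:>7,}")
```

The counts sum to 462,175 structures, which corresponds to an approximately 80/10/10 train/validation/test split.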

Use Cases

Benchmarking protein fold classification models on the 965 CATH topology classes.
Pretraining models for structural bioinformatics using the large-scale, non-redundant set of protein structures.
Analyzing the distribution and characteristics of rare protein topologies among the 965 classes.
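As an illustration of the rare-topology use case, the sketch below counts examples per class and lists the classes at or below a chosen threshold. It is a minimal example under stated assumptions: the function name `rare_classes` is hypothetical, the toy labels are written in CATH class.architecture.topology notation purely for illustration, and how TEDBench actually encodes its labels is not documented in this listing.

```python
from collections import Counter


def rare_classes(labels, max_count):
    """Return {label: count} for classes with at most max_count examples."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n <= max_count}


# Toy labels for illustration only; they are not drawn from TEDBench.
toy_labels = ["3.40.50", "3.40.50", "3.40.50", "1.10.10", "1.10.10", "2.60.40"]

# Classes represented by a single example in the toy data.
rare = rare_classes(toy_labels, max_count=1)
print(rare)
```

With real data, the same count table could also be used to check whether every class appears in all three splits before reporting per-class metrics.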

Strengths

Large scale with 369,740 training structures, 46,217 validation structures, and 46,218 test structures.
Covers 965 distinct CATH topology (T-level) classes, including rare topologies.
Designed as a non-redundant benchmark, which may reduce bias from near-duplicate structures.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported in the dataset metadata; the split sizes in the description sum to 462,175 structures, which should be confirmed after download.
Last updated 2026-05-20 06:39:14; freshness should be verified.
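Because column-level documentation is absent, a first pass after download might profile each field before relying on it. The sketch below is one way to do that for records loaded as a list of dictionaries. The function `summarize_columns` and the column names `domain_id`, `topology`, and `split` are hypothetical placeholders; the actual file format and field names are not described in this listing.

```python
def summarize_columns(rows, max_examples=3):
    """Profile each column: value types, missing count, distinct count, examples."""
    # Collect every column name that appears in any record.
    columns = sorted({name for row in rows for name in row})
    summary = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        distinct = sorted({str(v) for v in present})
        summary[name] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
            "distinct": len(distinct),
            "examples": distinct[:max_examples],
        }
    return summary


# Toy records with hypothetical column names; not drawn from TEDBench.
toy_rows = [
    {"domain_id": "d1", "topology": "3.40.50", "split": "train"},
    {"domain_id": "d2", "topology": "1.10.10", "split": "train"},
    {"domain_id": "d3", "topology": "", "split": "test"},
]

report = summarize_columns(toy_rows)
for column, info in report.items():
    print(column, info)
```

Comparing the number of records and the distinct label count from such a profile against the stated 462,175 structures and 965 classes is a quick sanity check on the download.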

Provenance

Source: TEDBench, built from Encyclopedia of Domains (TED) annotations and the AlphaFold Database.
Collection Method: Annotations were projected onto the Foldseek-clustered AlphaFold Database.
Freshness: Last updated 2026-05-20 06:39:14.

License is unknown; users should verify terms before use.
