AlphaFoldDB provides over 246 million predicted protein 3D structures, massively expanding structural coverage for known protein sequences. The dataset, created by LiteFold, is split into deterministic train and test sets based on UniProt accession hashes. It was last updated on May 27, 2026.
Use Cases
- Train machine learning models for protein structure prediction using the provided train/test split.
- Benchmark protein folding algorithms using the predicted structures and confidence scores.
- Analyze protein function and interactions based on predicted 3D structural data.
- Develop tools for structural biology and drug discovery using the expanded structural coverage.
Strengths
- Contains 246,689,516 predicted protein structures, offering massive scale.
- Provides confidence scores for each predicted structure, indicating reliability.
- Uses a deterministic split (hash-based) for consistent train/test separation.
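The card does not say which hash function or split ratio is used, so the following is only a minimal sketch of how a hash-based split on UniProt accessions can work in general. The function name `assign_split`, the choice of SHA-256, and the 10% test fraction are all assumptions made for illustration, not the dataset's actual procedure.

```python
import hashlib


def assign_split(accession: str, test_fraction: float = 0.1) -> str:
    """Deterministically map a UniProt accession to 'train' or 'test'.

    Assumption: SHA-256 and a 10% test fraction are illustrative choices;
    the dataset's real hash function and ratio are not documented here.
    """
    digest = hashlib.sha256(accession.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and scale to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


if __name__ == "__main__":
    # The same accession always lands in the same split, on any machine.
    for acc in ["P69905", "Q8WZ42", "A0A024R1R8"]:
        print(acc, assign_split(acc))
```

Because the assignment depends only on the accession string, adding or removing other entries never moves an existing entry between splits, which is the property that makes such a split reproducible.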
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Description metadata is limited; actual data quality requires manual inspection after download.
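Since field semantics have to be inferred after download, a first pass over a sample of records can at least show which fields exist, what types they hold, and how often they are missing. The sketch below assumes the records have already been loaded as Python dictionaries; the helper `summarize_fields` and the field names `accession` and `confidence` in the sample are hypothetical and not taken from the dataset.

```python
from typing import Any, Dict, Iterable


def summarize_fields(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Collect observed types, a missing-value count, and one example per field."""
    stats: Dict[str, Dict[str, Any]] = {}
    total = 0
    for record in records:
        total += 1
        for name, value in record.items():
            info = stats.setdefault(
                name, {"types": set(), "non_null": 0, "example": None}
            )
            if value is None:
                continue
            info["types"].add(type(value).__name__)
            info["non_null"] += 1
            if info["example"] is None:
                info["example"] = value
    # A field counts as missing when it is absent from a record or set to None.
    for info in stats.values():
        info["missing"] = total - info["non_null"]
    return stats


# Hypothetical sample rows; the real field names are not documented.
sample_rows = [
    {"accession": "P69905", "confidence": 98.1},
    {"accession": "Q8WZ42", "confidence": None},
    {"accession": "A0A024R1R8"},
]

if __name__ == "__main__":
    for field, info in summarize_fields(sample_rows).items():
        print(field, sorted(info["types"]), "missing:", info["missing"])
```

Running this on a few thousand rows before any large-scale processing is usually enough to spot unexpected types or sparsely populated fields.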
Provenance
- Source
  - AlphaFoldDB, an open database of predicted protein structures.
- Collection Method
  - Predictions generated by AlphaFold, a protein structure prediction AI system.
- Freshness
  - Last updated 2026-05-27 13:02:12; freshness should be verified.