Description
The Protein Data Bank (PDB), established in 1971, is an archive of over 230,000 experimentally determined 3D structures of biological macromolecules. It contains atomic coordinates for proteins, nucleic acids, and their complexes, determined by methods including X-ray crystallography, cryo-EM, and NMR. The dataset is hosted by LiteFold on Hugging Face.
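The file format of this distribution is not documented. As a minimal sketch, assuming the structures are available as text in the legacy fixed-column PDB format, the following Python reads `ATOM` and `HETATM` records into atomic coordinates. The wwPDB archive's master format is PDBx/mmCIF, which this parser does not handle, and the legacy format cannot represent very large entries. `Atom` and `parse_pdb_atoms` are hypothetical names for illustration. The column positions follow the PDB format specification.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """One ATOM/HETATM record (hypothetical container, not a library type)."""
    record: str    # "ATOM" or "HETATM"
    serial: int    # atom serial number, columns 7-11
    name: str      # atom name, columns 13-16
    res_name: str  # residue name, columns 18-20
    chain_id: str  # chain identifier, column 22
    res_seq: int   # residue sequence number, columns 23-26
    x: float       # orthogonal coordinates in angstroms, columns 31-54
    y: float
    z: float
    element: str   # element symbol, columns 77-78 (may be blank)


def parse_pdb_atoms(text: str) -> list[Atom]:
    """Parse ATOM/HETATM records from legacy fixed-column PDB text.

    Assumes the legacy PDB format; PDBx/mmCIF files need a different parser.
    Python slices are 0-based, so columns 31-38 become line[30:38], etc.
    """
    atoms = []
    for line in text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END and other record types
        atoms.append(Atom(
            record=record,
            serial=int(line[6:11]),
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain_id=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
            element=line[76:78].strip(),
        ))
    return atoms
```

Fixed column slicing is used instead of splitting on whitespace because adjacent fields in PDB records can run together, for example large negative coordinates.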
Use Cases
Training protein structure prediction models on atomic coordinate data.
Analyzing macromolecular complexes and their interactions using 3D structural data.
Benchmarking computational methods for structural biology against experimental ground truth.
Studying the relationship between protein sequence and 3D structure.
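For the sequence-structure and benchmarking use cases, a common first step is to pair each residue's one-letter code with its C-alpha coordinate and then compare coordinate sets. The sketch below assumes legacy PDB-format text. For RMSD, it also assumes that predicted and experimental coordinates are already residue-matched and superimposed. A real benchmark would first align them, for example with the Kabsch algorithm, which is omitted here. `ca_trace` and `rmsd` are hypothetical helper names.

```python
import math

# Standard three-letter to one-letter codes for the 20 canonical amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

Point = tuple[float, float, float]


def ca_trace(pdb_text: str, chain_id: str = "A") -> tuple[str, list[Point]]:
    """Return the one-letter sequence and C-alpha coordinates of one chain.

    Assumes legacy fixed-column PDB text; non-canonical residues map to "X".
    """
    seq, coords = [], []
    for line in pdb_text.splitlines():
        if line[0:6] != "ATOM  " or line[12:16].strip() != "CA":
            continue  # only protein C-alpha atoms from ATOM records
        if line[21:22] != chain_id:
            continue
        if line[16:17] not in (" ", "A"):
            continue  # keep only the blank or first alternate location
        seq.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return "".join(seq), coords


def rmsd(a: list[Point], b: list[Point]) -> float:
    """Root-mean-square deviation of paired points.

    Assumes both sets are residue-matched and already superimposed;
    no alignment (e.g. Kabsch) is performed here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("point lists must be non-empty and equal in length")
    total = sum(
        (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
        for p, q in zip(a, b)
    )
    return math.sqrt(total / len(a))
```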
Strengths
Contains over 230,000 entries, representing a large-scale global archive.
Includes structures determined by multiple experimental methods (X-ray crystallography, cryo-EM, NMR, micro-electron diffraction, and integrative methods).
Covers a wide range of biological macromolecules (proteins, nucleic acids, complexes).
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The row count and file formats of the Hugging Face distribution are not documented, which makes its suitability harder to assess.
Last updated 2026-05-27 12:40:27; freshness should be verified.
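Because column-level documentation is absent, profiling the downloaded records can help infer field semantics. The sketch below assumes rows are available as Python dicts. For example, iterating over a Hugging Face `datasets.Dataset` yields one dict per row. `profile_fields` is a hypothetical helper. For each field, it reports how many rows contain it, how many values are `None`, and which Python types occur.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable


def profile_fields(rows: Iterable[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize each field seen across rows (hypothetical helper).

    For every key: rows that contain it ("present"), rows that lack it
    ("missing"), values that are None ("null"), and counts of the Python
    type names of the non-None values ("types").
    """
    n_rows = 0
    present: Counter = Counter()
    nulls: Counter = Counter()
    type_counts: defaultdict = defaultdict(Counter)
    for row in rows:
        n_rows += 1
        for key, value in row.items():
            present[key] += 1
            if value is None:
                nulls[key] += 1
            else:
                type_counts[key][type(value).__name__] += 1
    return {
        key: {
            "present": present[key],
            "missing": n_rows - present[key],
            "null": nulls[key],
            "types": dict(type_counts[key]),
        }
        for key in present
    }
```

A profile like this only suggests what a field holds. A column of short strings might be an entry identifier, for instance, but that reading remains an inference until checked against the source.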
Provenance
Source
Protein Data Bank (PDB), via LiteFold on Hugging Face.
Collection Method
Experimental determination via X-ray crystallography, cryo-EM, NMR, micro-electron diffraction, and integrative methods.
Time Range
Archive established in 1971; entries span 1971 to the present.
Freshness
Last updated 2026-05-27 12:40:27.
Geography
Global archive.
License
The license is unknown; verify the terms of use for both the PDB data and the Hugging Face distribution.