Description
IEDB is a curated database of experimentally characterized immune epitopes, including B cell, T cell, and MHC-binding data across infectious, allergic, autoimmune, and transplant contexts. The dataset, published by LiteFold on Hugging Face, contains a total of 7,450,811 records, split into 6,705,467 training and 745,344 test rows. It was last updated on 2026-05-27.
Use Cases
Predict MHC-binding affinity based on curated epitope sequences.
Train models for B-cell or T-cell epitope classification across disease contexts.
Analyze immune epitope characteristics in infectious, allergic, or autoimmune diseases.
Benchmark epitope prediction algorithms using the deterministic, epitope-aware train-test split.
Strengths
Large scale with over 7.4 million total records.
Deterministic, epitope-aware split that assigns records by epitope ID, preventing leakage between the roughly 6.7 million training and 745 thousand test records.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row counts are known, but the provided metadata does not specify data features or file formats.
Freshness should be verified, as the last update timestamp is in the future (2026-05-27).
Provenance
Source
LiteFold on Hugging Face, sourcing from the Immune Epitope Database (IEDB).
Collection Method
Curated database export of experimental assays.
Freshness
Last updated 2026-05-27 13:01:05
The dataset uses an epitope-aware split: a record is assigned to the test set when sha256(epitope_id) % 10 == 0. Refer to the full description on the Hugging Face page for details on how assays lacking an epitope ID are handled.
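The split rule above can be sketched in Python. This is a minimal illustration, not the publisher's implementation: the source gives only the expression sha256(epitope_id) % 10 == 0, so the string conversion, UTF-8 encoding, and big-endian reading of the digest are assumptions, and the helper names assign_split and leaked_epitopes are hypothetical. Assignments produced here may therefore differ from the published split.

```python
import hashlib


def assign_split(epitope_id, modulus=10):
    """Return "test" or "train" for an epitope ID using a hash-bucket rule.

    Assumptions (not confirmed by the dataset card): the ID is converted
    to a string, encoded as UTF-8, and the SHA-256 digest is read as a
    big-endian integer before taking the modulus.
    """
    digest = hashlib.sha256(str(epitope_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest, "big") % modulus
    # Bucket 0 goes to test, mirroring "% 10 == 0 for test".
    return "test" if bucket == 0 else "train"


def leaked_epitopes(train_ids, test_ids):
    """Return the set of epitope IDs that appear in both splits."""
    return set(train_ids) & set(test_ids)


if __name__ == "__main__":
    # Hypothetical IDs, used only to show that assignment is deterministic.
    example_ids = ["1001", "1002", "1003"]
    for eid in example_ids:
        print(eid, assign_split(eid))
```

Because the assignment depends only on the epitope ID, every assay for the same epitope lands in the same split, which is what makes the split both deterministic and leakage-free at the epitope level. After download, leaked_epitopes can serve as a quick sanity check on the ID column, once its actual name has been identified.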