Designed for fine-tuning language models on protein-ligand binding affinity and contact prediction. It contains molecular data tagged with categories such as Molecules, SMILES, and Chemistry. The dataset was authored by jglaser and last updated in May 2022.
Use Cases
- Fine-tune a language model on SMILES string representations to predict protein-ligand binding affinity.
- Train a model to predict protein-ligand contacts from the dataset's molecular data.
- Analyze binding patterns across the dataset's molecules to identify recurring structural features.
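The card does not document the dataset's columns, so the following is only a minimal sketch. It assumes the ligands are available as SMILES strings and shows one common way to split them into tokens before fine-tuning a language model. It uses a regular expression widely used in SMILES-modeling work, not a tokenizer confirmed to be used by this dataset. `SMILES_PATTERN` and `tokenize_smiles` are hypothetical names.

```python
import re

# Common SMILES tokenization regex (an assumption, not taken from this dataset).
# Bracket atoms such as [nH] or [NH4+] stay whole; Cl and Br become single tokens;
# %NN ring-closure labels are kept together.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, raising if any character is left unmatched."""
    tokens = SMILES_PATTERN.findall(smiles)
    # findall silently skips characters the pattern cannot match, so compare
    # the rejoined tokens with the input to detect unsupported characters.
    if "".join(tokens) != smiles:
        raise ValueError(f"Untokenizable characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    print(tokenize_smiles("ClCBr"))  # ['Cl', 'C', 'Br']
    print(tokenize_smiles("[NH4+].[Cl-]"))  # ['[NH4+]', '.', '[Cl-]']
```

Keeping multi-character atoms such as `Cl`, `Br`, and bracket atoms as single tokens avoids splitting one chemical element into pieces the model must learn to recombine. The round-trip check catches elements outside the organic subset, such as an unbracketed `Sn`, instead of silently dropping them.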
Strengths
- The dataset is tagged with specific, relevant categories, including Molecules, SMILES, and Chemistry.
- Last update recorded on May 14, 2022, providing a snapshot of the data at that time.
Limitations
- Key metadata such as row count, column names, and file formats are unknown, limiting usability assessment.
- The data has not been updated since May 2022, so it may not reflect data or methods published since then in a fast-moving field like computational chemistry.
Provenance
- Source: Hugging Face
- Collection Method: Not specified
- Time Range: Not specified
- Freshness: Last updated 2022-05-14.
- Geography: Not specified