A dataset for fine-tuning language models on protein-ligand binding affinity prediction. Its tags (molecules, SMILES, chemistry) indicate a focus on molecular data. It was last updated on March 12, 2022.
Use Cases
- Fine-tune a language model on SMILES string representations to predict binding affinity scores.
- Train a molecular property prediction model on the dataset's molecular data (the molecules and chemistry tags describe the content; they are not input features).
- Analyze relationships between molecular structures and binding affinity for drug discovery applications.
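As a rough illustration of the first use case, a SMILES string usually has to be split into chemically meaningful tokens before it is fed to a language model. The sketch below is a minimal, regex-based tokenizer. The pattern and the function name `tokenize_smiles` are assumptions made for illustration, not the dataset author's preprocessing, and the pattern covers only common SMILES syntax. A real fine-tuning run would more likely rely on the tokenizer that ships with the chosen model.

```python
import re

# Hypothetical token pattern covering common SMILES syntax (an assumption,
# not taken from the dataset): bracket atoms, two-letter halogens,
# single-letter organic and aromatic atoms, ring-closure labels, and
# bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[()=#\-+\\/:~@?>*$.])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; raise if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Rejoining the tokens must reproduce the input exactly; otherwise the
    # pattern silently skipped a character it does not recognise.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Aspirin, used here only as a familiar example molecule.
    print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

The round-trip check is a deliberate choice: `re.findall` skips unmatched characters without complaint, so comparing the rejoined tokens to the input is a cheap way to surface strings the pattern does not handle.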
Strengths
- Dataset is tagged for molecular data including SMILES, a standard chemical notation.
- Dataset has a known last-updated date (March 12, 2022), so it can serve as a fixed historical snapshot.
Limitations
- The dataset's size, row count, and column structure are unknown, which limits any assessment of its scope.
- The data may be stale relative to rapidly evolving computational chemistry research.
Provenance
- Source: Hugging Face, author jglaser
- Collection Method: not specified
- Time Range: not specified
- Freshness: last updated 2022-03-12
- Geography: not specified