This collection contains the datasets used to develop and evaluate ATOMICA, a model for learning universal representations of intermolecular interactions. The data were authored by Ada Fang and are hosted on Harvard Dataverse. The dataset was last updated on April 13, 2026.
Use Cases
- Train machine learning models to predict intermolecular interactions.
- Benchmark new molecular representation methods against the evaluation datasets.
- Develop universal representations of molecular interactions from the model's training data.
Strengths
- Data is directly linked to a published preprint (biorxiv.org/10.1101/2025.04.02.646906).
- Dataset is hosted on the authoritative Harvard Dataverse platform.
- The last-updated date is explicitly recorded (2026-04-13).
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported, which makes it harder to assess suitability before download.
- Description metadata is limited; actual data quality requires manual inspection after download.
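Because the file layout and field semantics are undocumented, a first step after download is simply to take stock of what arrived. The following is a minimal sketch using only the Python standard library. It assumes the files have already been downloaded and unpacked into a local directory; the function names `inventory_files` and `count_by_extension` are hypothetical helpers, not part of ATOMICA or Harvard Dataverse, and nothing here assumes a particular file format.

```python
from collections import Counter
from pathlib import Path


def inventory_files(root):
    """Return (relative_path, size_in_bytes) for every file under root, sorted by path."""
    root = Path(root)
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


def count_by_extension(inventory):
    """Count files per lowercase extension; files without one are keyed as ''."""
    return Counter(Path(name).suffix.lower() for name, _ in inventory)
```

Calling `inventory_files` on the download directory and passing the result to `count_by_extension` gives a quick picture of which formats are present and how large they are, which helps decide which loader to try for column-level inspection.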
Provenance
- Source: Harvard Dataverse
- Freshness: Last updated 2026-04-13 01:05:44.