This dataset contains 1,819 drug compounds refined from an initial set of 12,457 DrugBank entries using Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction. It was created by Rui Zhang and last updated on 2026-05-22. The selection prioritizes molecular diversity by eliminating structurally redundant compounds.
Use Cases
- Similarity matching against FOLH1 ligands, which is the dataset's stated purpose
- Training or benchmarking molecular property prediction models on a structurally diverse compound set
- Analyzing chemical space coverage for drug discovery, given the UMAP-based selection process
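As a rough illustration of the similarity-matching use case, the sketch below ranks library compounds against a query ligand by Tanimoto similarity. It assumes each compound has already been converted to a set of fingerprint on-bit indices; the dataset does not document which fingerprint, if any, it ships with, so the compound IDs and bit sets here are hypothetical placeholders.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query, library):
    """Return (compound_id, score) pairs sorted from most to least similar."""
    scores = [(cid, tanimoto(query, bits)) for cid, bits in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical on-bit sets; real fingerprints would come from a
# cheminformatics toolkit applied to the dataset's structures.
query_ligand = frozenset({1, 4, 7, 9})
library = {
    "cmpd_A": frozenset({1, 4, 7, 9}),
    "cmpd_B": frozenset({1, 4, 8}),
    "cmpd_C": frozenset({2, 3, 5}),
}

ranking = rank_by_similarity(query_ligand, library)
for compound_id, score in ranking:
    print(f"{compound_id}: {score:.2f}")
```

In practice the query set would be derived from a known FOLH1 ligand and the library from the 1,819 compounds, with a similarity cutoff chosen for the task at hand.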
Strengths
- 1,819 compounds provide a focused set for analysis
- Selection of 1,819 compounds from 12,457 initial entries (about 15% retained) indicates substantial filtering
- Explicit use of UMAP dimensionality reduction for diversity selection
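The description does not say how compounds were chosen once they had been projected with UMAP. One plausible approach, shown here purely as an assumption, is greedy max-min selection over the 2D embedding coordinates: each new pick is the point farthest from everything already selected. The function name `maxmin_select` and the coordinates are hypothetical, and the embedding itself would come from a UMAP implementation rather than from this code.

```python
import math


def maxmin_select(points, k, start=0):
    """Greedily pick k indices, each farthest from the points already chosen."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    chosen_set = {start}
    # Distance from every point to its nearest chosen point so far.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        candidates = (i for i in range(len(points)) if i not in chosen_set)
        nxt = max(candidates, key=nearest.__getitem__)
        chosen.append(nxt)
        chosen_set.add(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


# Hypothetical 2D embedding coordinates (stand-ins for UMAP output).
embedding = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),  # tight cluster of near-duplicates
    (5.0, 5.0), (5.1, 5.0),              # second cluster
    (10.0, 0.0),                         # isolated compound
]

picked = maxmin_select(embedding, 3)
print(picked)
```

With three picks, the sketch takes one point from each region and skips the near-duplicates, which is the kind of redundancy removal the description refers to.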
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Row count is not recorded in the metadata; the description implies 1,819 rows (presumably one per compound), which should be verified after download
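Both limitations can be checked quickly after download. The sketch below assumes the data is distributed as a CSV file, which the metadata does not confirm. It lists the column names and compares the row count against the 1,819 compounds stated in the description. The in-memory sample and its column names are hypothetical placeholders for the real file.

```python
import csv
import io

EXPECTED_ROWS = 1819  # compound count given in the dataset description


def summarize(handle):
    """Return (column_names, row_count) for an open CSV file handle."""
    reader = csv.DictReader(handle)
    row_count = sum(1 for _ in reader)
    return list(reader.fieldnames or []), row_count


# Stand-in for the downloaded file: hypothetical columns and placeholder rows.
sample = io.StringIO("compound_id,smiles\nID_1,CCO\nID_2,CCN\n")
columns, rows = summarize(sample)

print("columns:", columns)
print("rows:", rows, "| matches description:", rows == EXPECTED_ROWS)
```

For the real file, replace the `io.StringIO` sample with `open(path, newline="")`, where `path` points to the downloaded data.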
Provenance
- Source: DrugBank database
- Collection Method: Refined using UMAP dimensionality reduction to prioritize molecular diversity
- Time Range: not specified
- Freshness: Last updated 2026-05-22 17:42:53
- Geography: not specified