This dataset contains 6,004,131 chemical compounds drawn from a February 2026 PubChem snapshot and filtered with the Rule of Three (Ro3), a set of physicochemical cutoffs originally proposed for fragment-based lead discovery. Each entry includes a PubChem ID, a canonical SMILES string, and 43 molecular descriptors calculated with the RDKit cheminformatics toolkit. The dataset is a curated subset of the original 123 million PubChem compounds, focused on lead-like chemical space.
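As a rough illustration of what an Ro3 filter checks, the sketch below applies the commonly cited cutoffs (molecular weight of at most 300 Da, logP of at most 3, and no more than 3 hydrogen-bond donors or acceptors) to precomputed descriptor values. The exact thresholds, boundary handling, and any extra criteria used to build this dataset are not stated here, so the defaults are assumptions. The `Descriptors` class, its field names, and the example values are hypothetical and do not reflect the dataset's actual column names or contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for four precomputed descriptors."""
    mol_wt: float      # molecular weight in Da
    logp: float        # calculated octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donor count
    h_acceptors: int   # hydrogen-bond acceptor count


def passes_ro3(
    d: Descriptors,
    max_mol_wt: float = 300.0,   # assumed cutoff, inclusive
    max_logp: float = 3.0,       # assumed cutoff, inclusive
    max_h_donors: int = 3,       # assumed cutoff, inclusive
    max_h_acceptors: int = 3,    # assumed cutoff, inclusive
) -> bool:
    """Return True if every descriptor is within its Ro3 cutoff."""
    return (
        d.mol_wt <= max_mol_wt
        and d.logp <= max_logp
        and d.h_donors <= max_h_donors
        and d.h_acceptors <= max_h_acceptors
    )


# Made-up descriptor values for illustration only (not taken from the dataset).
small_example = Descriptors(mol_wt=150.2, logp=1.2, h_donors=1, h_acceptors=2)
large_example = Descriptors(mol_wt=450.0, logp=4.5, h_donors=2, h_acceptors=6)

print(passes_ro3(small_example))
print(passes_ro3(large_example))
```

Keeping the cutoffs as keyword arguments makes it easy to tighten or relax a single criterion, for example to test a stricter molecular weight limit, without rewriting the check.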
The dataset is licensed under CC-BY-4.0, which requires attribution to the author, Antonis Asiminas. It is available in CSV and Parquet formats.