Available on 1 platform
This dataset contains 7 billion small molecules in SMILES notation, accompanied by 28 billion molecular fingerprints including the MACCS, ECFP4, FCFP4, and PubChem types. It also includes pre-built USearch indexes for efficient similarity search. The dataset is hosted on AWS Open Data and published by Ash Vardanian under the Apache-2.0 license.
The data is stored in an Amazon S3 bucket. Accessing it requires S3-compatible tools, and processing the SMILES strings and fingerprints requires cheminformatics libraries.
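To illustrate what similarity search over fingerprints means, here is a minimal sketch in plain Python. It assumes fingerprints are fixed-length bit vectors and scores them with the Tanimoto (Jaccard) coefficient, a common choice in cheminformatics. The molecule names, the 8-bit toy fingerprints, and the helpers `tanimoto` and `top_k` are hypothetical and do not reflect the dataset's actual file layout or the USearch API.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity of two bit-vector fingerprints stored as ints."""
    union = bin(a | b).count("1")
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    return bin(a & b).count("1") / union


def top_k(query: int, fingerprints: dict[str, int], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force scan: score every fingerprint and keep the k most similar."""
    scored = [(name, tanimoto(query, fp)) for name, fp in fingerprints.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical 8-bit fingerprints; real MACCS/ECFP4/FCFP4/PubChem vectors are much longer.
toy = {
    "mol_a": 0b10110010,
    "mol_b": 0b10110000,
    "mol_c": 0b01001101,
}
query = 0b10110011

results = top_k(query, toy, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A linear scan like this is only practical for small collections. At the scale of billions of molecules, the pre-built USearch indexes serve the same purpose without comparing the query against every fingerprint.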