An unsupervised learning algorithm based on the Minimum Message Length (MML) principle was used to search around three million biologically relevant molecules for compressing substructures, that is, recurring fragments whose reuse shortens the encoded description of the data. The discovered substructures include most human-curated functional groups as well as novel, larger patterns. The research was authored by Ruben Sharma and last updated in March 2026.
Licensed under CC-BY-NC-4.0, which requires attribution and prohibits commercial use.
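
To make the idea of a "compressing substructure" concrete, the sketch below scores a candidate pattern with a simplified two-part message length: the bits needed to state the pattern plus the bits needed to encode the data once each occurrence is replaced by a single symbol. It is not the method used in this work. It treats molecules as flat token sequences rather than graphs, omits the cost of stating token frequencies, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def data_bits(tokens):
    """Bits to encode tokens with an ideal code built from their own frequencies."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum(c * math.log2(c / total) for c in counts.values())


def replace_pattern(tokens, pattern, symbol):
    """Replace non-overlapping occurrences of pattern (scanned left to right) with one symbol."""
    out, i, n = [], 0, len(pattern)
    while i < len(tokens):
        if tokens[i:i + n] == pattern:
            out.append(symbol)
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def message_length(tokens, patterns, alphabet_size):
    """Two-part length: bits to state the patterns plus bits to encode the data."""
    # Assumption: each pattern token costs log2(alphabet_size) bits to state.
    model_bits = sum(len(p) * math.log2(alphabet_size) for p in patterns)
    return model_bits + data_bits(tokens)


def compression_gain(tokens, pattern, symbol="<P>"):
    """Bits saved by adding pattern to the model; positive means it compresses."""
    alphabet_size = len(set(tokens))
    before = message_length(tokens, [], alphabet_size)
    rewritten = replace_pattern(tokens, pattern, symbol)
    after = message_length(rewritten, [pattern], alphabet_size)
    return before - after


# Toy data: 50 copies of an acetic-acid-like string, split into characters.
toy = list("CC(=O)O") * 50

print(compression_gain(toy, list("C(=O)O")))  # frequent fragment
print(compression_gain(toy, list("N=N")))     # fragment absent from the data
```

A pattern that recurs often yields a positive gain because the shorter rewritten data more than pays for stating the pattern once. A pattern that never occurs only adds model cost, so its gain is negative.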