Description
Processed small-molecule datasets containing chemical structures represented as SMILES strings. The data is derived from sources including CMAP, Tahoe, and CIGS, and has been standardized into a consistent CSV format. The author, Zhenghang04, uploaded the data to Hugging Face; the last recorded update is 19 April 2026.
Use Cases
Train molecular property prediction models on SMILES string representations.
Develop generative models for novel small-molecule design using the standardized chemical structure data.
Benchmark computational chemistry algorithms on processed data drawn from multiple sources.
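All three use cases start from the same step: reading SMILES strings out of the CSV and separating training data from held-out data. The following is a minimal sketch of that step using only the Python standard library. The column name `smiles` is an assumption, since the card does not document the columns, and the hash-based split is one common approach, not something the dataset author prescribes.

```python
import csv
import hashlib
import io


def split_smiles(csv_text, column="smiles", test_fraction=0.2):
    """Deduplicate SMILES strings and split them deterministically.

    `column` is a hypothetical name: the dataset card does not list columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    train, test, seen = [], [], set()
    for row in reader:
        smiles = (row.get(column) or "").strip()
        if not smiles or smiles in seen:
            continue  # skip blank cells and exact-string duplicates
        seen.add(smiles)
        # Hash the string so a given SMILES always lands in the same split,
        # regardless of row order or how many rows the file has.
        digest = hashlib.sha256(smiles.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
        (test if bucket < test_fraction else train).append(smiles)
    return train, test
```

Deduplication here compares raw strings only. Two different SMILES strings can describe the same molecule, so a real benchmark would likely canonicalize structures with a cheminformatics toolkit before splitting.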
Strengths
Data is processed and standardized into a consistent CSV format, as stated in the description.
Derived from multiple named sources (CMAP, Tahoe, CIGS), suggesting a degree of curation.
A last-updated timestamp is recorded (2026-04-19 10:43:07).
Limitations
Row count and file size are unknown, which limits suitability assessment.
Column-level documentation is absent; field semantics must be inferred after download.
The license is unspecified, so permitted uses are unclear.
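Because the card gives no row count, file size, or column definitions, a first pass after download would be to measure them directly. The sketch below shows one way to do that with the standard library; the function name `summarize_csv` and the shape of the returned dictionary are illustrative choices, not part of the dataset.

```python
import csv
import io


def summarize_csv(csv_text):
    """Report column names, row count, blank cells per column, and size."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = list(reader.fieldnames or [])  # header row, if any
    blank_cells = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            # Treat missing and whitespace-only cells as blank.
            if not (row.get(name) or "").strip():
                blank_cells[name] += 1
    return {
        "columns": columns,
        "rows": rows,
        "blank_cells": blank_cells,
        "size_bytes": len(csv_text.encode("utf-8")),
    }
```

For a file on disk, the text can be read with `open(path, newline="", encoding="utf-8").read()`, where `path` is wherever the CSV was saved. The summary only reveals structure; what each field means still has to be inferred from the values.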
Provenance
Source
Derived from CMAP, Tahoe, and CIGS sources.
Collection Method
Processed and standardized by author Zhenghang04.
Time Range
Not specified.
Freshness
Last updated 2026-04-19 10:43:07; freshness should be verified.
Geography
Not specified.
License restrictions are unknown and must be verified before use.