Four benchmark datasets contain images of chemical Markush structures from patents and their corresponding CXSMILES string representations. The largest subset, 'uspto-mol-m-54k-new', includes 54,785 training samples. The datasets were created by docling-project and were last updated in March 2026.
License information is not provided; users should verify terms of use. The full dataset description is hosted externally on Hugging Face.