Name: SA-MolNMR-SI-240K: Molecular Structures with NMR Spectral Data
Creator: SpectrumWorld
Published: 2026-06-05T07:05:43
Keywords: Spectral Data, SMILES, Tabular, Chemistry, Molecular Structures, NMR Spectroscopy

Description

This dataset contains 212,440 examples of molecular structures paired with NMR spectral information, split into training, validation, and test sets. Each example includes a SMILES string, a molecular formula, atom counts, and tokenized NMR data for both proton (1H) and carbon (13C) NMR. It was created by SpectrumWorld and last updated on Hugging Face in June 2026.

Use Cases

Train machine learning models to predict NMR spectra from molecular structure representations such as SMILES strings.
Validate computational chemistry models for spectral assignment using the provided proton and carbon NMR data.
Develop multi-task learning models that jointly predict molecular formula and spectral properties from structural inputs.
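
For the first use case, a sequence model usually needs each SMILES string broken into tokens before training. The sketch below uses a regular expression of the kind commonly applied to SMILES in chemistry sequence models. It is an illustration only: the pattern and the function name tokenize_smiles are assumptions, and nothing here describes how the dataset's own NMR fields were tokenized.

```python
import re

# Assumed pattern for illustration: bracket atoms, two-letter halogens,
# common organic-subset atoms, bonds, branches, and ring-closure digits.
# The dataset does not document its own tokenization scheme.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list:
    """Split a SMILES string into tokens (hypothetical helper).

    Raises ValueError if any character is not covered by the pattern,
    so silently dropped characters cannot reach a downstream model.
    """
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    print(tokenize_smiles("CC(=O)O"))      # acetic acid
    print(tokenize_smiles("c1ccccc1Cl"))   # chlorobenzene
```

The round-trip check (joining the tokens must reproduce the input) is a deliberate design choice: it turns an incomplete pattern into an explicit error rather than a quietly corrupted training example.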

Strengths

Contains 212,440 total examples, providing a substantial corpus for model training.
Explicitly split into 169,863 training, 21,279 validation, and 21,298 test examples, facilitating machine learning workflows.
Includes multiple molecular representations such as SMILES strings, formulas, and atom counts.
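
The split sizes quoted above can be checked against the reported total with a few lines of arithmetic. This is a minimal sketch that uses only the numbers stated in this card; the function name split_fractions is hypothetical.

```python
# Split sizes and total exactly as reported in this card.
SPLITS = {"train": 169_863, "validation": 21_279, "test": 21_298}
REPORTED_TOTAL = 212_440


def split_fractions(splits):
    """Return the summed size and each split's share of it (hypothetical helper)."""
    total = sum(splits.values())
    fractions = {name: count / total for name, count in splits.items()}
    return total, fractions


total, fractions = split_fractions(SPLITS)

# The three splits should account for every reported example.
assert total == REPORTED_TOTAL

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

The counts sum to 212,440, and the shares come out to roughly 80% training, 10% validation, and 10% test.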

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
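
Because field semantics must be inferred, a first pass over the downloaded records might simply tabulate each column's observed types, missing-value count, and one example value. The sketch below assumes the records have already been loaded as a list of dictionaries; the helper summarize_columns and the column names in the toy rows are hypothetical, not the dataset's actual schema.

```python
from collections import defaultdict


def summarize_columns(records):
    """Summarize each column in a list of dict records (hypothetical helper).

    Returns {column: {"types": [...], "missing": int, "example": value}}.
    """
    summary = defaultdict(lambda: {"types": set(), "missing": 0, "example": None})
    # Collect every column seen in any row, so sparse columns are included.
    columns = sorted({key for row in records for key in row})
    for row in records:
        for col in columns:
            value = row.get(col)
            info = summary[col]
            if value is None:
                info["missing"] += 1
                continue
            info["types"].add(type(value).__name__)
            if info["example"] is None:
                info["example"] = value
    # Convert type sets to sorted lists for stable, readable output.
    return {col: {**info, "types": sorted(info["types"])} for col, info in summary.items()}


if __name__ == "__main__":
    # Toy rows with made-up column names, for illustration only.
    toy_rows = [
        {"smiles": "CCO", "formula": "C2H6O", "num_atoms": 9},
        {"smiles": "C", "formula": "CH4", "num_atoms": None},
    ]
    for column, info in summarize_columns(toy_rows).items():
        print(column, info)
```

A column that reports more than one type, or a high missing count, is a natural place to start the manual inspection.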

Provenance

Source: SpectrumWorld on Hugging Face.
Collection Method: Likely compiled from computational chemistry simulations or public spectral databases, but the exact gathering method is not specified.
Freshness: Last updated 2026-06-05 07:06:53; freshness should be verified.

License: Unknown. Without a stated license, commercial use and redistribution may be restricted.
