Description
SpectrumWorld compiled 1,171,262 processed experimental molecular NMR records covering proton and carbon NMR data. The dataset is split into 937,022 training, 117,035 validation, and 117,205 test examples. Each entry includes a SMILES string, a molecular formula, and atom counts.
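The split sizes can be checked against the stated total with a few lines of arithmetic. This is a minimal sketch that uses only the counts quoted in the description; the variable names are illustrative.

```python
# Split sizes and total as quoted in the description.
splits = {"train": 937_022, "validation": 117_035, "test": 117_205}
stated_total = 1_171_262

# The three splits should account for every record.
total = sum(splits.values())
assert total == stated_total, f"splits sum to {total}, not {stated_total}"

# Share of each split as a percentage of all records.
shares = {name: 100 * count / total for name, count in splits.items()}
for name, pct in shares.items():
    print(f"{name:<10} {splits[name]:>9,} {pct:6.2f}%")
```

The counts add up to the stated total, and the shares come out at roughly 80/10/10.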
Use Cases
Train machine learning models to predict NMR spectra from SMILES strings and molecular formulas.
Validate computational chemistry methods by comparing predicted NMR data to the provided experimental records.
Develop models for molecular structure elucidation using the combined proton and carbon NMR data.
Benchmark algorithms for chemical data processing and tokenization using the provided splits.
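As a small example of the chemical data processing mentioned above, the sketch below tokenizes a molecular formula into per-element atom counts, which could then be compared with the atom counts stored alongside each record. It assumes flat formulas such as C2H6O with no parentheses, charges, or isotope labels; the function name is hypothetical, and the dataset's actual formula format has not been verified.

```python
import re
from collections import Counter

# One element symbol (e.g. "C", "Cl") followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def formula_atom_counts(formula: str) -> Counter:
    """Count atoms per element in a flat molecular formula.

    Hypothetical helper: raises ValueError on anything other than
    element symbols with optional integer counts.
    """
    counts = Counter()
    pos = 0
    while pos < len(formula):
        match = _TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(
                f"unsupported formula syntax at position {pos}: {formula!r}"
            )
        element, digits = match.groups()
        counts[element] += int(digits) if digits else 1
        pos = match.end()
    return counts

ethanol = formula_atom_counts("C2H6O")
print(dict(ethanol), "total atoms:", sum(ethanol.values()))
```

Whether the dataset's atom counts include hydrogens, or are reported per element or as a single total, is not documented, so any comparison should start by inspecting a few records by hand.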
Strengths
Contains over 1.17 million individual molecular records.
Includes a predefined split of 937,022 training, 117,035 validation, and 117,205 test examples.
Provides both proton NMR and carbon NMR data for each molecule.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
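Because column documentation is absent, a first pass after download might simply list each field with the value types it holds and one example. The sketch below works on any iterable of dictionary-like records; the function name and the sample field names are hypothetical and are not the dataset's actual columns.

```python
from collections import defaultdict
from itertools import islice

def summarize_fields(records, limit=1000):
    """Return, for each field, the value types seen and one example value.

    Only the first `limit` records are scanned, so the summary is a
    quick look rather than a full audit.
    """
    types = defaultdict(set)
    examples = {}
    for record in islice(records, limit):
        for field, value in record.items():
            types[field].add(type(value).__name__)
            examples.setdefault(field, value)
    return {
        field: {"types": sorted(types[field]), "example": examples[field]}
        for field in types
    }

# Made-up records for illustration only; real field names must be
# read from the downloaded data.
sample = [
    {"smiles": "CCO", "molecular_formula": "C2H6O", "atom_count": 9},
    {"smiles": "C", "molecular_formula": "CH4", "atom_count": None},
]
summary = summarize_fields(sample)
for field, info in summary.items():
    print(field, info["types"], repr(info["example"]))
```

Fields that show more than one type, such as a mix of integers and missing values, are a useful starting point for the manual quality inspection noted above.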
Provenance
Source
SpectrumWorld on Hugging Face.
Collection Method
Processed experimental molecular NMR records.
Freshness
Last updated 2026-06-05 07:24:41; freshness should be verified.
License
Unknown; terms of use must be verified before use.