Description
This dataset contains 533,595 chemical compounds with experimental NMR (nuclear magnetic resonance) peak sequences, covering both H-NMR and C-NMR data. Created by SpectrumWorld, it provides a SMILES representation and a molecular formula for each compound. It was last updated on December 8, 2025.
Use Cases
Train machine learning models to predict NMR spectra based on SMILES representations.
Validate computational chemistry methods by comparing predicted spectra to experimental peak sequences.
Analyze correlations between molecular structure (via SMILES) and observed NMR chemical shifts.
Build searchable databases for matching unknown experimental spectra against known compounds.
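The spectrum-matching use case can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's own tooling: the function names `match_score` and `rank_candidates`, the 0.5 ppm tolerance, and the layout of the toy library (SMILES string mapped to a list of C-NMR shifts in ppm) are all assumptions, and the shift values are rough illustrative numbers rather than rows taken from the dataset.

```python
from typing import Dict, List, Tuple


def match_score(query: List[float], reference: List[float], tol: float = 0.5) -> float:
    """Fraction of query peaks with an unused reference peak within tol ppm."""
    if not query:
        return 0.0
    remaining = sorted(reference)
    matched = 0
    for shift in sorted(query):
        # Greedy step: pick the closest reference peak that is still unused.
        best = min(remaining, key=lambda r: abs(r - shift), default=None)
        if best is not None and abs(best - shift) <= tol:
            matched += 1
            remaining.remove(best)
    return matched / len(query)


def rank_candidates(
    query: List[float], library: Dict[str, List[float]], tol: float = 0.5
) -> List[Tuple[str, float]]:
    """Score every library entry against the query, best match first."""
    scored = [(smiles, match_score(query, peaks, tol)) for smiles, peaks in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy library: SMILES -> approximate C-NMR shifts (ppm).
library = {
    "CCO": [18.4, 58.3],        # ethanol
    "CC(C)=O": [30.9, 207.1],   # acetone
}

ranking = rank_candidates([18.2, 58.0], library)
print(ranking[0])
```

Greedy nearest-peak matching is only one possible similarity measure; a production search would likely also account for peak intensities, solvent effects, and H-NMR multiplicities.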
Strengths
Covers 533,595 compounds, a large collection for model training and statistical analysis.
Includes both H-NMR and C-NMR experimental peak data, providing complementary structural information.
Pairs spectral data with SMILES strings and molecular formulas, enabling multi-modal analysis.
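Because each record pairs a molecular formula with peak data, simple cross-checks between the two become possible. The sketch below is one hedged example: it parses a flat formula and tests whether the number of distinct C-NMR peaks exceeds the number of carbon atoms. The helper names are hypothetical, the parser assumes formulas without parentheses, charges, or isotope labels, and the check is only a heuristic, since reported peak lists can include extra lines (for example from C-F coupling).

```python
import re
from collections import Counter
from typing import List


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C2H6O'."""
    counts: Counter = Counter()
    # Each match is an element symbol followed by an optional count.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def carbon_peaks_plausible(formula: str, c_peaks: List[float]) -> bool:
    """Heuristic: distinct C-NMR peaks should not outnumber carbon atoms."""
    return len(set(c_peaks)) <= parse_formula(formula)["C"]


print(parse_formula("C2H6O"))
print(carbon_peaks_plausible("C2H6O", [18.4, 58.3]))
```

Records that fail such a check are not necessarily wrong, but they are reasonable candidates for manual review.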
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The row count is not documented; the 533,595 compound figure should be confirmed against the downloaded data before assessing suitability.
Data may reflect source bias inherent to the OpenDataLab experimental spectra database.
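Since column documentation is absent, a first step after download is usually to list the fields and look at a few sample values. The following is a minimal sketch assuming the data can be read as CSV; the actual file format is not stated, and the column names in the demo (`smiles`, `formula`, `c_nmr`) are hypothetical placeholders, not the dataset's real schema.

```python
import csv
import io
from typing import Dict, List, TextIO


def summarize_columns(handle: TextIO, sample_rows: int = 5) -> Dict[str, List[str]]:
    """Return each column name with up to sample_rows example values."""
    reader = csv.DictReader(handle)
    samples: Dict[str, List[str]] = {name: [] for name in (reader.fieldnames or [])}
    for index, row in enumerate(reader):
        if index >= sample_rows:
            break
        for name in samples:
            samples[name].append(row[name])
    return samples


# Stand-in for a downloaded file; replace with open("<path to file>", newline="").
demo = io.StringIO('smiles,formula,c_nmr\nCCO,C2H6O,"18.4 58.3"\n')
summary = summarize_columns(demo)
for column, values in summary.items():
    print(column, values)
```

Reading only a handful of rows keeps the inspection cheap even if the full file is large.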
Provenance
Source
OpenDataLab experimental spectra database
Collection Method
Extracted experimental NMR peak sequences
Freshness
Last updated 2025-12-08 09:38:15
The license is unknown; verify the terms of use before using the data.