SMILES-IUPAC(PubChem): 1 Million Chemical Compound Identifiers | DataSalon

Chemistry

Available on 1 platform

Description

smiles-iupac(PubChem)-1-million is a dataset from Kaggle. The title suggests it contains about one million chemical compounds, each likely pairing a SMILES string with an IUPAC name sourced from the PubChem database. The listing does not provide column names, file format, or other metadata.

Use Cases

Train a model to translate between SMILES and IUPAC chemical notations (inferred from domain, verify after download)
Build a molecular embedding model using large-scale identifier data (inferred from domain, verify after download)
Validate chemical name standardization pipelines (inferred from domain, verify after download)
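
For the translation use case, one common first step is to turn the raw rows into deduplicated (SMILES, IUPAC name) pairs with a reproducible train/validation split. The following is a minimal sketch under stated assumptions: the function name `split_pairs`, the `valid_percent` parameter, and the three example molecules are illustrative and do not come from the dataset, whose actual columns must be checked after download.

```python
import hashlib
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def split_pairs(pairs: Iterable[Pair], valid_percent: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Deduplicate (SMILES, name) pairs and split them deterministically.

    The split is keyed on a hash of the SMILES string, so the same molecule
    always lands in the same partition regardless of row order.
    """
    train: List[Pair] = []
    valid: List[Pair] = []
    seen = set()
    for smiles, name in pairs:
        smiles, name = smiles.strip(), name.strip()
        # Skip incomplete rows and exact duplicates.
        if not smiles or not name or (smiles, name) in seen:
            continue
        seen.add((smiles, name))
        digest = hashlib.sha256(smiles.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100  # integer in 0..99
        if bucket < valid_percent:
            valid.append((smiles, name))
        else:
            train.append((smiles, name))
    return train, valid


if __name__ == "__main__":
    # Illustrative rows only; not taken from the dataset.
    demo = [("CCO", "ethanol"), ("C", "methane"), ("c1ccccc1", "benzene"), ("CCO", "ethanol")]
    train_pairs, valid_pairs = split_pairs(demo)
    print(len(train_pairs), len(valid_pairs))
```

Hashing the SMILES string rather than shuffling keeps the split stable across runs and across re-downloads, which helps when comparing models trained at different times.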

Strengths

Published on Kaggle, a major platform for data science.
Platform tags indicate a focus on large-scale chemistry data.

Limitations

Metadata is minimal; actual content requires verification after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unconfirmed; the '1-million' scale is inferred from the title.
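
The checks implied by these limitations can be scripted once the file is on disk. The following is a minimal sketch, assuming the download is a single CSV file with a header row; the listing does not confirm the format, and the helper name `summarize_csv` and the column names in the demo are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def summarize_csv(handle: TextIO, sample_size: int = 5) -> Dict[str, object]:
    """Return the header names, the number of data rows, and a few sample rows."""
    reader = csv.reader(handle)
    header = next(reader, [])  # empty list if the file has no rows at all
    samples: List[List[str]] = []
    row_count = 0
    for row in reader:
        row_count += 1
        if len(samples) < sample_size:
            samples.append(row)
    return {"columns": header, "row_count": row_count, "samples": samples}


if __name__ == "__main__":
    # In-memory stand-in for the real file; the column names are assumptions.
    demo = io.StringIO("smiles,iupac\nCCO,ethanol\nC,methane\n")
    summary = summarize_csv(demo)
    print(summary["columns"], summary["row_count"])
```

For the real file, the same function can be called on `open(path, newline="", encoding="utf-8")`; comparing `row_count` against the one million implied by the title and reading the sample rows is a quick way to confirm scale and field semantics.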

Provenance

Source: PubChem
Collection Method: Likely extracted or aggregated from the PubChem database.
Time Range: Not specified
Freshness: Last update date is unknown; freshness unverified.
Geography: Not specified

Tags: Tabular, PubChem, SMILES, IUPAC, Chemistry, Molecular Structures, Large Scale

Quality Score

D17
