Description
A corpus of 2,000,115 inorganic crystal structures curated from public DFT and experimental databases. It includes conditioning metadata such as space group, crystal system, magnetic ordering, formation energy, and band gap. The dataset was created by willgbryan13 and last updated on 2026-05-13.
Use Cases
Predict formation energy of new inorganic compounds based on crystal structure features.
Classify magnetic ordering in crystals based on structural metadata.
Generate novel crystal structures conditioned on space group and crystal system.
Train models to estimate band gap from structural and compositional data.
Train ML models on more balanced data, avoiding the heavy class skew present in the raw source databases.
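As a rough illustration of the class-skew use case, the sketch below measures how imbalanced a categorical label is and derives inverse-frequency sample weights. The label values, and the idea that they come from a crystal-system column, are assumptions for illustration only; the dataset's actual column names are not documented here.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to n_samples / (n_classes * class_count).

    With these weights every class contributes equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the most common class count to the least common one."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


# Hypothetical labels standing in for a crystal-system column.
example = ["cubic"] * 6 + ["hexagonal"] * 3 + ["triclinic"] * 1
weights = inverse_frequency_weights(example)
ratio = imbalance_ratio(example)
```

An imbalance ratio close to 1 suggests the label is already well balanced, in which case the weights are all near 1 and reweighting changes little.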
Strengths
Contains 2,000,115 rows of inorganic crystal structures.
Includes 21 columns covering full CIF data and metadata such as formation energy and band gap.
Curated to remove heavy class skew present in the raw merged source data.
Stored in Apache Parquet format across 41 chunks for efficient access.
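Because the data ships as multiple Parquet chunks, it can be read one chunk at a time rather than loaded whole. The sketch below assumes the chunks have already been downloaded into a local directory as `*.parquet` files and that `pyarrow` is installed; the directory layout, file naming, and any column names passed in are hypothetical.

```python
from pathlib import Path


def list_chunks(root):
    """Return the Parquet chunk files directly under `root`, sorted by name."""
    return sorted(Path(root).glob("*.parquet"))


def iter_chunk_tables(root, columns=None):
    """Yield one pyarrow Table per chunk, optionally restricted to `columns`.

    pyarrow is imported lazily so that listing chunks works without it.
    """
    import pyarrow.parquet as pq  # third-party dependency, assumed installed

    for path in list_chunks(root):
        # Selecting only the needed columns avoids reading large fields.
        yield pq.read_table(path, columns=columns)


def total_rows(root):
    """Sum row counts from the Parquet footers without reading the data."""
    import pyarrow.parquet as pq

    return sum(pq.ParquetFile(p).metadata.num_rows for p in list_chunks(root))
```

If the download is complete, `len(list_chunks(root))` should be 41 and `total_rows(root)` should be 2,000,115, matching the figures stated in this listing.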
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The specific public DFT and experimental databases used as sources are not named in the provided description.
Last updated 2026-05-13 06:26:20 (time zone not stated); check for a newer version before relying on it for current research.
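Since the columns are undocumented, profiling a sample of rows is one way to start inferring field semantics. This is a minimal sketch that assumes rows have already been loaded as dictionaries by whatever Parquet reader is in use; the field names and values in the example are invented for illustration and are not the dataset's real columns.

```python
def profile_columns(rows):
    """Summarise each field: observed Python types, null count, one example."""
    profile = {}
    for row in rows:
        for name, value in row.items():
            info = profile.setdefault(
                name, {"types": set(), "nulls": 0, "example": None}
            )
            if value is None:
                info["nulls"] += 1
                continue
            info["types"].add(type(value).__name__)
            # Keep the first non-null value as a representative example.
            if info["example"] is None:
                info["example"] = value
    return profile


# Hypothetical sample rows; real field names must be read from the files.
sample = [
    {"space_group": 225, "band_gap": 0.0, "magnetic_ordering": "NM"},
    {"space_group": 62, "band_gap": None, "magnetic_ordering": "FM"},
]
report = profile_columns(sample)
```

Fields with mixed types or many nulls are the ones most worth checking by hand, for example to confirm units for energies and band gaps.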
Provenance
Source
Multiple public DFT and experimental databases.
Collection Method
Drawn from public databases and curated to remove class skew.
Freshness
Last updated 2026-05-13 06:26:20.
License
Unknown; verify before use.