Name: NIST-Aligned Synthetic Dataset for Causal AI and Graph Modeling
Creator: MENG, WEI
Published: 2026-03-26T12:49:21
Keywords: Mathematical Sciences

Description

MENG, WEI provides a synthetic dataset supporting the Structured Variable-Relationship Modelling framework. It includes variable-level data, node and edge tables for graph construction, and a ground-truth causal graph for evaluation. The dataset is designed for research in causal AI, explainable AI, and topological analysis.
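
As a minimal sketch of how the node and edge tables might be turned into a graph, the snippet below builds a weighted, directed adjacency mapping using only the Python standard library. The column names (node_id, node_type, source, target, weight) and the inline sample rows are assumptions made for illustration; the actual schema is not documented on this page and should be checked against nodes.csv and edges.csv.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real nodes.csv and edges.csv may use different columns.
NODES_CSV = """node_id,node_type
X1,independent
M1,mediator
Y1,dependent
"""

EDGES_CSV = """source,target,weight
X1,M1,0.8
M1,Y1,0.5
X1,Y1,0.2
"""


def build_graph(nodes_file, edges_file):
    """Return (node_types, adjacency) read from node and edge tables."""
    node_types = {
        row["node_id"]: row["node_type"] for row in csv.DictReader(nodes_file)
    }
    adjacency = defaultdict(dict)
    for row in csv.DictReader(edges_file):
        src, dst = row["source"], row["target"]
        # Skip edges that reference nodes missing from the node table.
        if src in node_types and dst in node_types:
            adjacency[src][dst] = float(row["weight"])
    return node_types, dict(adjacency)


node_types, adjacency = build_graph(io.StringIO(NODES_CSV), io.StringIO(EDGES_CSV))
```

With the real files, the same function could be passed open file handles in place of the in-memory strings, once the column names are adjusted to match the published schema.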

Use Cases

Train graph neural networks using the nodes.csv and edges.csv files for structural link prediction.
Benchmark causal discovery algorithms against the ground-truth causal graph in ground_truth.json.
Perform topological data analysis on the structured variable-network data from data.csv and edges.csv.
Conduct information entropy analysis on variable values from data.csv to identify signal variables.
Validate structural equation modeling extensions using the node types from nodes.csv and relationship weights from edges.csv.
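
The causal discovery benchmark listed above could be scored roughly as follows. This is a sketch under stated assumptions: the layout of ground_truth.json is not documented on this page, so the snippet assumes a hypothetical "edges" key holding [source, target] pairs, and it scores directed edges by precision, recall, and F1 rather than any metric the dataset author prescribes.

```python
import json

# Hypothetical payload; the real ground_truth.json may be structured differently.
GROUND_TRUTH_JSON = '{"edges": [["X1", "M1"], ["M1", "Y1"], ["X1", "Y1"]]}'


def edge_scores(true_edges, predicted_edges):
    """Compare two collections of directed (source, target) edges."""
    true_set = {tuple(edge) for edge in true_edges}
    pred_set = {tuple(edge) for edge in predicted_edges}
    true_positives = len(true_set & pred_set)
    # Guard against empty edge sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


true_edges = json.loads(GROUND_TRUTH_JSON)["edges"]
# Illustrative output of some discovery algorithm; one edge is reversed.
predicted_edges = [("X1", "M1"), ("Y1", "M1"), ("X1", "Y1")]
scores = edge_scores(true_edges, predicted_edges)
```

Because edges are compared as ordered pairs, a reversed edge counts as both a false positive and a false negative, which is a deliberate choice for directed causal graphs.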

Strengths

Dataset follows NIST-aligned synthetic data generation logic, which is intended to make the generation process auditable.
Includes dedicated files for variable-level data (data.csv), nodes (nodes.csv), edges (edges.csv), and ground truth (ground_truth.json), supporting workflows from graph construction through evaluation.
Designed to represent key variable classes including independent, dependent, mediator, moderator, latent, hub, breakpoint, and signal variables.

Limitations

Dataset is fully synthetic, limiting direct applicability to real-world observational or experimental data.
Row count, column count, and sample size are not reported, which prevents assessment of statistical power for modeling.
Evaluation relies on simulated ground truth, which may not capture the complexity of real causal systems.

Provenance

Source: WEI MENG Dataverse
Collection Method: NIST-aligned synthetic data generation to simulate complex systems with structured inter-variable relationships.
Time Range: Not specified
Freshness: Last updated on 2026-03-26.
Geography: Not specified
