MENG, WEI provides a synthetic dataset supporting the Structured Variable-Relationship Modelling framework. It includes variable-level data, node and edge tables for graph construction, and a ground-truth causal graph for evaluation. The dataset is designed for research in causal AI, explainable AI, and topological analysis.
Use Cases
- Train graph neural networks using the nodes.csv and edges.csv files for structural link prediction.
- Benchmark causal discovery algorithms against the ground-truth causal graph in ground_truth.json.
- Perform topological data analysis on the structured variable-network data from data.csv and edges.csv.
- Conduct information entropy analysis on variable values from data.csv to identify signal variables.
- Validate structural equation modeling extensions using the node types from nodes.csv and relationship weights from edges.csv.
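Two of the use cases above, benchmarking against the ground-truth graph and entropy analysis of variable values, can be sketched with the standard library alone. This is a minimal illustration, not the dataset author's evaluation procedure. The schemas of ground_truth.json and data.csv are not documented here, so the sketch assumes edges can be expressed as (source, target) pairs and that variable values are discrete or already binned. The function names `edge_metrics` and `shannon_entropy` and the toy graphs are hypothetical.

```python
import math
from collections import Counter


def edge_metrics(predicted, truth):
    """Compare two collections of directed edges given as (source, target) pairs."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "extra": len(predicted - truth),    # predicted edges absent from the truth
        "missing": len(truth - predicted),  # true edges the algorithm did not find
    }


def shannon_entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy graphs; real edges would be loaded from ground_truth.json
# once its actual structure is known.
truth = {("X", "M"), ("M", "Y"), ("Z", "Y")}
predicted = {("X", "M"), ("Y", "M"), ("Z", "Y")}  # one edge has reversed direction
scores = edge_metrics(predicted, truth)
```

A reversed edge counts here as both one extra and one missing edge. Metrics that treat a flip as a single error, such as structural Hamming distance, would need a separate rule for reversed pairs.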
Strengths
- The dataset follows NIST-aligned synthetic data generation logic, which supports auditability.
- Includes multiple dedicated files for data, nodes, edges, and ground truth, supporting full analytical workflows.
- Designed to represent key variable classes including independent, dependent, mediator, moderator, latent, hub, breakpoint, and signal variables.
Limitations
- Dataset is fully synthetic, limiting direct applicability to real-world observational or experimental data.
- Row count, column count, and sample size are not documented, so statistical power for modeling cannot be assessed without inspecting the files.
- Evaluation relies on simulated ground truth, which may not capture the complexity of real causal systems.
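Because the row and column counts are not documented, they can be checked directly once the files are downloaded. The following is a minimal sketch that assumes data.csv is a comma-separated file with a single header row, which this description does not confirm. The helper name `table_shape` and the sample contents are hypothetical.

```python
import csv
import io


def table_shape(handle):
    """Return (n_rows, n_columns) for a CSV stream with one header row."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:
        return (0, 0)  # empty file
    n_rows = sum(1 for row in reader if row)  # blank lines are skipped
    return (n_rows, len(header))


# Hypothetical in-memory sample standing in for data.csv.
sample = io.StringIO("id,x,y\n1,0.5,1.2\n2,0.1,0.9\n")
shape = table_shape(sample)
```

For the real file, the same function could be called on a handle opened with `open("data.csv", newline="")`, assuming that filename and layout.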
Provenance
- Source
- WEI MENG Dataverse
- Collection Method
- NIST-aligned synthetic data generation to simulate complex systems with structured inter-variable relationships.
- Time Range
- Not specified
- Freshness
- Last updated on 2026-03-26.
- Geography
- Not specified