Replication data and code for the research paper 'Same Words, Different Laws: Computational Analysis of Semantic Inconsistency in Global AI Regulation'. The dataset contains the primary regulatory corpus from the EU AI Act, US Executive Orders, NIST AI RMF, and Chinese CAC regulations. It was authored by Harish Gupta and uploaded to Harvard Dataverse in April 2026.
Use Cases
- Measure cosine similarity of six regulatory terms across three jurisdictions using sentence embeddings from the all-MiniLM-L6-v2 model.
- Analyze semantic alignment of terms such as 'risk' or 'bias' between EU, US, and Chinese regulatory texts using the computed results provided in JSON.
- Replicate bootstrapped confidence intervals and robustness checks for semantic inconsistency findings using the provided Jupyter notebook.
- Extract and compare sentence-level embeddings from the EU AI Act, US Executive Orders, NIST AI RMF, and CAC regulations.
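As a rough illustration of the similarity measurement described in the use cases above, the sketch below computes pairwise cosine similarity between per-jurisdiction vectors for a single term. The vectors are hypothetical 3-dimensional stand-ins, not values from the dataset; in the actual pipeline they would presumably be the 384-dimensional embeddings produced by all-MiniLM-L6-v2. The function names (`cosine_similarity`, `pairwise_similarity`) and the jurisdiction keys are assumptions made for this example and are not taken from the notebook.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pairwise_similarity(embeddings):
    """Map each unordered pair of jurisdictions to the cosine similarity of their vectors."""
    return {
        (a, b): cosine_similarity(embeddings[a], embeddings[b])
        for a, b in combinations(sorted(embeddings), 2)
    }


# Hypothetical toy vectors standing in for the embedding of one term (e.g. 'risk').
toy_risk_embeddings = {
    "EU": [1.0, 0.0, 0.0],
    "US": [1.0, 1.0, 0.0],
    "CN": [0.0, 1.0, 0.0],
}

scores = pairwise_similarity(toy_risk_embeddings)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

A score near 1 would suggest that two jurisdictions use the term in similar contexts, while a score near 0 would suggest divergent usage. How the notebook aggregates sentence-level embeddings into a per-term vector is not stated in the description, so that step is left out here.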
Strengths
- Includes primary regulatory texts from four major sources: EU AI Act, US Executive Orders, NIST AI RMF, and Chinese CAC regulations.
- Provides a full analysis pipeline in a Jupyter notebook implementing text extraction, embedding, similarity calculation, and statistical checks.
- Uses a specific, documented sentence-transformers model (all-MiniLM-L6-v2) for consistent embedding generation.
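The statistical checks mentioned above include bootstrapped confidence intervals. The description does not say which bootstrap variant the notebook uses, so the sketch below shows one common choice, the percentile bootstrap for a mean, under that assumption. The function name `bootstrap_ci`, its parameters, and the toy similarity scores are all hypothetical and are not drawn from the dataset.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement, record the mean of each resample, then sort.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)
    return means[lower_idx], means[upper_idx]


# Hypothetical sentence-pair similarity scores for one term and one jurisdiction pair.
toy_scores = [0.62, 0.58, 0.71, 0.66, 0.49, 0.75, 0.60, 0.68]

low, high = bootstrap_ci(toy_scores)
print(f"mean={statistics.fmean(toy_scores):.3f}  95% CI=({low:.3f}, {high:.3f})")
```

A narrow interval would indicate that the mean similarity is stable under resampling of the underlying sentences; a wide one would suggest that the estimate depends heavily on which sentences happen to be sampled.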
Limitations
- Dataset size, row count, and specific file formats are not provided in the description.
- The corpus is limited to official English translations of non-English regulations, which may introduce translation bias.
- The analysis focuses on a specific set of six regulatory terms, limiting broader linguistic analysis of the full texts.
Provenance
- Source: Harvard Dataverse, authored by Harish Gupta.
- Collection Method: Text extraction from official regulatory documents; computational analysis using sentence embeddings and cosine similarity.
- Time Range: Not specified.
- Freshness: Last updated on April 7, 2026.
- Geography: Covers regulations from the European Union, United States, and China.