Description

Replication data and code for the research paper 'Same Words, Different Laws: Computational Analysis of Semantic Inconsistency in Global AI Regulation'. The dataset contains the primary regulatory corpus from the EU AI Act, US Executive Orders, NIST AI RMF, and Chinese CAC regulations. It was authored by Harish Gupta and uploaded to Harvard Dataverse in April 2026.

Use Cases

Measure cosine similarity of six regulatory terms across three jurisdictions using sentence embeddings from the all-MiniLM-L6-v2 model.
Analyze semantic alignment of terms like 'risk' or 'bias' between EU, US, and Chinese regulatory texts via computed results in JSON.
Replicate bootstrapped confidence intervals and robustness checks for semantic inconsistency findings using the provided Jupyter notebook.
Extract and compare sentence-level embeddings from the EU AI Act, US Executive Orders, NIST AI RMF, and CAC regulations.
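The similarity measurement described in these use cases can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function names, the idea of averaging sentence vectors into one vector per term and jurisdiction, and the toy 3-dimensional vectors are all assumptions made for the example. Real vectors from all-MiniLM-L6-v2 have 384 dimensions, and the optional `embed_sentences` helper requires the sentence-transformers package.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Average several equal-length vectors into one (assumed aggregation step)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pairwise_term_similarity(term_vectors):
    """Map each unordered jurisdiction pair to the similarity of its term vectors.

    term_vectors: dict of jurisdiction label -> vector for a single term.
    """
    return {
        (a, b): cosine_similarity(term_vectors[a], term_vectors[b])
        for a, b in combinations(sorted(term_vectors), 2)
    }


def embed_sentences(sentences):
    """Hypothetical helper: embed sentences with all-MiniLM-L6-v2.

    Imported lazily so the rest of the sketch runs without the package.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(sentences).tolist()


# Toy vectors standing in for real embeddings of one term (made-up values).
toy = {"EU": [1.0, 0.0, 0.0], "US": [1.0, 1.0, 0.0], "CN": [0.0, 1.0, 0.0]}
scores = pairwise_term_similarity(toy)
for pair, value in scores.items():
    print(pair, round(value, 3))
```

With real data, each jurisdiction's vector would come from `mean_vector(embed_sentences(...))` applied to the sentences that mention the term, assuming that is how a term is represented.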

Strengths

Includes primary regulatory texts from four major sources: EU AI Act, US Executive Orders, NIST AI RMF, and Chinese CAC regulations.
Provides a full analysis pipeline in a Jupyter notebook implementing text extraction, embedding, similarity calculation, and statistical checks.
Uses a specific, documented sentence-transformers model (all-MiniLM-L6-v2) for consistent embedding generation.
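The statistical checks mentioned above include bootstrapped confidence intervals. The sketch below shows one common variant, a percentile bootstrap for the mean of a set of similarity scores. It is an assumption that the notebook uses this variant; the function name, the resample count, and the sample scores are invented for illustration and are not taken from the dataset.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Returns (point_estimate, lower_bound, upper_bound).
    """
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = int((1 - alpha / 2) * n_resamples) - 1
    return statistics.fmean(values), means[lower_index], means[upper_index]


# Made-up similarity scores, not results from the dataset.
sims = [0.62, 0.58, 0.71, 0.66, 0.49, 0.73, 0.55, 0.68]
point, lower, upper = bootstrap_ci(sims)
print(f"mean={point:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

A narrow interval suggests the similarity estimate is stable under resampling; a wide one suggests it depends heavily on which sentences were sampled.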

Limitations

Dataset size, row count, and a full file listing are not provided in the description; only JSON results and a Jupyter notebook are mentioned.
Non-English regulations are included only as official English translations, which may introduce translation bias.
The analysis focuses on a specific set of six regulatory terms, limiting broader linguistic analysis of the full texts.

Provenance

Source: Harvard Dataverse, authored by Harish Gupta.
Collection Method: Text extraction from official regulatory documents; computational analysis using sentence embeddings and cosine similarity.
Time Range: Not specified.
Freshness: Last updated on April 7, 2026.
Geography: Covers regulations from the European Union, United States, and China.

Text, Computational Law, Natural Language Processing, Legal Text Analysis, AI Regulation, Policy Comparison

Global AI Regulation Corpus with Semantic Similarity Analysis
