Vox Dei is a coded dataset of pre-presidential populist discourse on Twitter/X. It contains a corpus of tweets by Andrés Manuel López Obrador of Mexico and Nayib Bukele of El Salvador, posted between 2009 and 2019. The dataset, created by Julio César Chavelas and hosted on Harvard Dataverse, extends a raw tweet corpus with automated coding, performed in R through the Gemini 2.5 Flash API, and with inter-coder validation measured by Cohen's κ.
Use Cases
- Analyze populist discourse features based on automated coding of tweet content.
- Compare political communication styles between leaders based on a pre-presidential social media corpus.
- Train or validate NLP models for political text classification based on a validated dataset.
- Study the evolution of populist rhetoric over time based on a decade-long time series.
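As a rough illustration of the time-series use case, the sketch below computes the yearly share of tweets flagged as populist for each author. The record layout (author, ISO date string, boolean flag) and the names `yearly_share`, `leader_a`, and `leader_b` are assumptions made for this example; the dataset's actual fields are not documented here and must be checked after download.

```python
from collections import defaultdict


def yearly_share(records):
    """Return {(author, year): share of records flagged True}.

    Each record is a hypothetical (author, "YYYY-MM-DD", flag) tuple;
    this layout is an assumption, not the dataset's documented schema.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for author, date, is_populist in records:
        key = (author, int(date[:4]))  # year is the first four characters
        totals[key] += 1
        if is_populist:
            flagged[key] += 1
    return {key: flagged[key] / totals[key] for key in sorted(totals)}


# Invented sample rows, for illustration only.
records = [
    ("leader_a", "2010-05-02", True),
    ("leader_a", "2010-09-14", False),
    ("leader_a", "2018-01-30", True),
    ("leader_b", "2018-06-11", False),
]
shares = yearly_share(records)
for (author, year), share in shares.items():
    print(author, year, round(share, 2))
```

Grouping by (author, year) keeps the two leaders separate, so their trajectories can be compared side by side.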
Strengths
- Dataset covers a 10-year time range from 2009 to 2019.
- Automated coding was validated with Cohen's κ, a chance-corrected measure of inter-coder agreement.
- Corpus is derived from a raw dataset available on Harvard Dataverse, providing traceable provenance.
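For readers unfamiliar with Cohen's κ, the following minimal sketch shows how the statistic is computed from two coders' labels: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The labels and the function name `cohens_kappa` are invented for illustration; this is not the author's validation code, and the dataset's actual coding categories may differ.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of items both coders labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label proportions.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels from a human coder and an automated coder.
human = ["populist", "neutral", "populist", "neutral", "populist", "neutral"]
model = ["populist", "neutral", "populist", "populist", "populist", "neutral"]
kappa = cohens_kappa(human, model)
print(round(kappa, 3))
```

In this toy example the coders agree on 5 of 6 items (p_o ≈ 0.833) while chance agreement is 0.5, giving κ ≈ 0.667.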
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not documented, so the dataset's size and its suitability for a given task cannot be assessed before download.
- Coverage is limited to two Latin American leaders, so findings may not generalize to other regions or political contexts.
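Because column semantics and row count are undocumented, a first inspection pass after download is advisable. The sketch below assumes the data is available as a CSV file, which is not confirmed by the source; the column names in the sample (`author`, `date`, `text`, `code`) and the function name `summarize_table` are hypothetical placeholders.

```python
import csv
import io


def summarize_table(handle):
    """Count rows and non-empty values per column in a CSV file object."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    row_count = 0
    non_empty = {name: 0 for name in columns}
    for row in reader:
        row_count += 1
        for name in columns:
            if (row.get(name) or "").strip():
                non_empty[name] += 1
    return {"rows": row_count, "columns": columns, "non_empty": non_empty}


# Invented sample standing in for the downloaded file (assumed CSV layout).
sample = io.StringIO(
    "author,date,text,code\n"
    "leader_a,2012-03-01,example tweet one,1\n"
    "leader_b,2015-07-19,example tweet two,\n"
)
summary = summarize_table(sample)
print(summary["rows"], summary["columns"])
print(summary["non_empty"])
```

With a real file, the `io.StringIO` sample would be replaced by `open(path, newline="", encoding="utf-8")`; the per-column counts help spot fields that are sparsely populated before any analysis is built on them.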
Provenance
- Source: Harvard Dataverse
- Collection Method: Automated coding with Gemini 2.5 Flash via API in R, applied to a raw Twitter/X corpus.
- Time Range: 2009-2019
- Freshness: Last updated 2026-05-13 17:00:59; freshness should be verified.
- Geography: Mexico, El Salvador