A collection of four annotated datasets created by Marc Riven Herrera, submitted to ICWSM'25 and hosted on Harvard Dataverse. The datasets cover political and non-political content from Philippine Facebook pages and are intended to support analysis of user engagement, sentiment, civility, and misinformation. Applications include social media analysis, political communication research, and machine learning.
Use Cases
- Classifying political versus non-political content.
- Analyzing user engagement patterns on Philippine Facebook pages.
- Detecting misinformation in online political discussions using the annotation labels.
- Studying sentiment and civility in bilingual (English-Tagalog) social media discourse.
- Training machine learning models for multi-label classification of social media posts.
Strengths
- Four distinct annotated datasets providing multiple analytical dimensions.
- Focus on a bilingual (English-Tagalog) social media context, which is less common.
- Submitted for peer review to a major conference (ICWSM'25).
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row counts, file formats, and license terms are not stated.
- The data covers only Philippine Facebook pages, so it may carry geographic and platform bias and may not generalize to other countries or platforms.
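Because column-level documentation is absent, a quick per-column profile after download can help with inferring field semantics. The following is a minimal sketch, not part of the dataset's own tooling. It assumes the files can be read as delimited text (CSV or TSV), which the source does not confirm, and the function names `summarize_rows` and `summarize_csv` are hypothetical.

```python
import csv
from collections import defaultdict


def summarize_rows(rows, max_examples=3):
    """Summarize each column of an iterable of dict-like rows.

    Returns {column: {"non_empty": int, "distinct": int, "examples": [str, ...]}}.
    """
    seen = defaultdict(set)        # column -> distinct non-empty values
    non_empty = defaultdict(int)   # column -> number of non-empty cells
    examples = defaultdict(list)   # column -> first few distinct values, in order

    for row in rows:
        for column, value in row.items():
            seen[column]  # register the column even if every cell is empty
            text = "" if value is None else str(value).strip()
            if not text:
                continue
            non_empty[column] += 1
            if text not in seen[column]:
                seen[column].add(text)
                if len(examples[column]) < max_examples:
                    examples[column].append(text)

    return {
        column: {
            "non_empty": non_empty[column],
            "distinct": len(values),
            "examples": examples[column],
        }
        for column, values in seen.items()
    }


def summarize_csv(path, delimiter=",", encoding="utf-8"):
    """Profile a delimited text file; the format is an assumption, not documented."""
    with open(path, newline="", encoding=encoding) as handle:
        return summarize_rows(csv.DictReader(handle, delimiter=delimiter))


# Example (the path is a placeholder for a file downloaded from Dataverse):
# for name, stats in summarize_csv("downloaded_file.csv").items():
#     print(name, stats)
```

Columns with only a handful of distinct values are likely annotation labels, while columns with mostly unique values are more likely post text or identifiers. The sketch keeps every distinct value in memory, so it suits a first look rather than very large files.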
Provenance
- Source: Harvard Dataverse
- Collection Method: Collected and annotated from Philippine Facebook pages.
- Freshness: Last updated 2026-03-22 09:34:43; freshness should be verified.
- Geography: Philippines