This dataset contains 12,800 human-labeled short statements collected through the PolitiFact.com API. A PolitiFact editor rated each statement for truthfulness and wrote a detailed analysis report to support the judgment. The dataset was created by DomLoyer and last updated in April 2026.
Use Cases
- Classify statement truthfulness using labels like 'pants-fire' and editor-provided analysis reports.
- Analyze patterns in political speech and misinformation across the labeled statement corpus.
- Train text classifiers to detect fake news based on the statement text and associated metadata.
- Study the relationship between statement content and the editor's justification within the analysis reports.
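A simple baseline for the classification use cases above is a multinomial Naive Bayes model with add-one (Laplace) smoothing over bag-of-words features. The card does not prescribe any method; this baseline is chosen here only for illustration. The sketch below assumes the statement texts and their labels have already been loaded into parallel Python lists, because the card does not specify a file format or field names. The names `tokenize` and `NaiveBayesTruthClassifier` are hypothetical, the toy statements are made up, and metadata and analysis reports are ignored.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase, then keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTruthClassifier:
    """Hypothetical multinomial Naive Bayes over bag-of-words, with add-one smoothing."""

    def fit(self, statements, labels):
        self.class_counts = Counter(labels)      # number of statements per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()
        for text, label in zip(statements, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / self.n_docs)  # log prior P(label)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip tokens never seen during training
                count = self.word_counts[label][tok]
                # Smoothed log likelihood of the token given the label.
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy statements for illustration only; not drawn from the dataset.
statements = [
    "taxes went down for every family",
    "taxes went down for most families",
    "the moon is made of cheese",
    "aliens run the city council",
]
labels = ["true", "true", "pants-fire", "pants-fire"]

clf = NaiveBayesTruthClassifier().fit(statements, labels)
print(clf.predict("taxes went down"))     # true
print(clf.predict("the moon is cheese"))  # pants-fire
```

Naive Bayes is used here because it has no dependencies and trains in a single pass over the data. On the real corpus, an evaluation split stratified by label would help keep the smaller 'pants-fire' class represented in both partitions.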
Strengths
- 12,800 human-labeled statements provide a substantial text corpus.
- The label distribution is relatively balanced: most labels have between 2,063 and 2,638 instances.
Limitations
- The 'pants-fire' label has only 1,050 instances, indicating a potential class imbalance for that specific category.
- The dataset's temporal coverage and geographic scope are not specified, limiting analysis of trends over time or region.
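One common mitigation for the 'pants-fire' imbalance noted above is to weight each class inversely to its frequency during training. The sketch below assumes the labels are available as a Python list of strings. The helper names are hypothetical, and the formula n_samples / (n_classes * count) is the inverse-frequency heuristic that scikit-learn calls "balanced", reimplemented here in plain Python. The toy labels are made up and do not reflect the dataset's actual counts.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: {label: n_samples / (n_classes * count)}."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def imbalance_ratio(labels):
    """Hypothetical helper: most frequent label's count over the least frequent one's."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy labels for illustration only; not drawn from the dataset.
toy_labels = ["true", "true", "false", "false", "pants-fire"]
weights = balanced_class_weights(toy_labels)
print(weights)                      # 'pants-fire' gets twice the weight of 'true' or 'false'
print(imbalance_ratio(toy_labels))  # 2.0
```

Under this scheme, a class with half as many examples as another receives twice its weight, and the per-sample weights sum back to the total number of samples.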
Provenance
- Source: PolitiFact.com API
- Collection Method: Statements were evaluated and labeled by PolitiFact.com editors, with each judgment supported by an analysis report.
- Time Range: Not specified
- Freshness: Last updated in April 2026.
- Geography: Not specified