87,026 claims are annotated with evidence from Wikipedia sentences and table cells. Based on this evidence, each claim is labeled supported, refuted, or not enough information. The dataset was created by Rami Aly at the University of Cambridge for fact verification research.
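A minimal sketch of how one annotated record might be held in memory, assuming each claim has a text, one verdict label, and a list of evidence items tagged as either sentence or table-cell evidence. The card does not document the schema, so `ClaimRecord`, `EvidenceItem`, `Verdict`, and all field names here are hypothetical; map them onto the real fields after inspecting the download.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Verdict(Enum):
    # The three verdict labels named in this card.
    SUPPORTED = "supported"
    REFUTED = "refuted"
    NOT_ENOUGH_INFO = "not enough information"


@dataclass
class EvidenceItem:
    # Hypothetical: kind is "sentence" or "table_cell"; real field names are undocumented.
    kind: str
    text: str


@dataclass
class ClaimRecord:
    # Hypothetical field names; check the downloaded files for the actual ones.
    claim: str
    label: Verdict
    evidence: List[EvidenceItem] = field(default_factory=list)


def label_distribution(records: List[ClaimRecord]) -> Counter:
    """Count how many records carry each verdict label."""
    return Counter(r.label for r in records)


def evidence_modalities(record: ClaimRecord) -> Set[str]:
    """Return the evidence kinds (e.g. sentence, table_cell) a record draws on."""
    return {item.kind for item in record.evidence}


# Illustrative toy records, not taken from the dataset.
toy = [
    ClaimRecord("Toy claim A.", Verdict.SUPPORTED, [EvidenceItem("sentence", "...")]),
    ClaimRecord(
        "Toy claim B.",
        Verdict.REFUTED,
        [EvidenceItem("sentence", "..."), EvidenceItem("table_cell", "...")],
    ),
    ClaimRecord("Toy claim C.", Verdict.NOT_ENOUGH_INFO),
]
```

Keeping the modality on each evidence item makes it easy to separate claims that need only text, only tables, or both.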
Use Cases
- Train fact verification models on the annotated claims, their evidence, and their verdict labels.
- Benchmark information retrieval systems on extracting evidence from both text and tables.
- Develop multimodal reasoning models that combine unstructured text with structured tabular data.
- Study the challenge of verifying claims where evidence is insufficient.
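For the verification use cases, predicted verdicts can be scored against the gold labels. A minimal sketch, assuming labels are plain strings matching the three verdicts above; the card does not prescribe a metric, so accuracy and macro-averaged F1 are our choice here. This scores verdicts only and ignores whether the model used the right evidence.

```python
from typing import Dict, Sequence

# Verdict labels as named in this card; the on-disk spelling may differ.
LABELS = ("supported", "refuted", "not enough information")


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of claims whose predicted verdict matches the gold verdict."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("empty input")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_label_f1(gold: Sequence[str], pred: Sequence[str],
                 labels: Sequence[str] = LABELS) -> Dict[str, float]:
    """One-vs-rest F1 for each verdict label (0.0 when undefined)."""
    scores: Dict[str, float] = {}
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of per-label F1, so the rarer verdicts count equally."""
    scores = per_label_f1(gold, pred, labels)
    return sum(scores.values()) / len(scores)
```

Macro F1 is included alongside accuracy because a model that rarely predicts "not enough information" can still score well on accuracy if that label is uncommon.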
Strengths
- 87,026 annotated claims provide substantial scale for model training.
- Evidence is drawn from both unstructured sentences and structured table cells in Wikipedia.
- Each claim has a single verdict label: supported, refuted, or not enough information.
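Because evidence comes from two sources, retrieval quality can be reported separately for sentences and table cells. A minimal sketch, assuming each evidence item can be identified by a hypothetical `(modality, item_id)` pair; the card does not specify how evidence is identified, so the identifiers below are placeholders.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical evidence identifier: (modality, item_id), e.g. ("sentence", "s1").
Evidence = Tuple[str, str]


def recall_by_modality(gold: Iterable[Evidence],
                       retrieved: Iterable[Evidence]) -> Dict[str, float]:
    """Fraction of gold evidence items that were retrieved, per modality."""
    retrieved_set = set(retrieved)
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for modality, item_id in set(gold):  # deduplicate gold items
        totals[modality] = totals.get(modality, 0) + 1
        if (modality, item_id) in retrieved_set:
            hits[modality] = hits.get(modality, 0) + 1
    return {m: hits.get(m, 0) / n for m, n in totals.items()}
```

Splitting recall this way shows whether a retriever that looks strong overall is in fact missing most table-cell evidence.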
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Last update date is unknown; freshness unverified.
- Data may reflect the temporal and topical bias inherent to the Wikipedia snapshot used.
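Since field semantics must be inferred after download, a quick first step is to tabulate which top-level fields appear and with what JSON value types. A minimal sketch, assuming the files are JSON Lines (one JSON object per line); the card does not state the file format, so adapt the reader if it differs.

```python
import json
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable


def infer_schema(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Map each top-level field to {JSON value type name: number of records}."""
    schema: DefaultDict[str, Counter] = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            continue  # only JSON objects have named fields
        for key, value in record.items():
            schema[key][type(value).__name__] += 1
    return {key: dict(counts) for key, counts in schema.items()}
```

A file handle can be passed directly, e.g. `infer_schema(open(path, encoding="utf-8"))` with `path` pointing at a downloaded split. Fields that appear in only some records, or with mixed types, are the ones most worth checking by hand.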
Provenance
- Source: University of Cambridge
- Collection Method: Claims were annotated with evidence from Wikipedia.