Created by JHU-CLSP and released in 2024, this multilingual dataset provides a structured view of Wikipedia articles across 20 languages. It integrates article text with external web citations, source quality estimates, and cross-lingual translations based on May 2024 Wikipedia dumps.
The dataset contains only the metadata and links for cited web sources, not their text. To download the actual text of the cited sources, users must run the rehydrate-citations.py script provided by the authors.
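To illustrate what "rehydrating" a citation means in general, here is a minimal Python sketch. It is not the authors' rehydrate-citations.py script and does not reproduce its interface or the dataset's schema. The record layout (a dict with a `url` key), the added `text` key, and the function names `fetch_url` and `rehydrate` are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional
from urllib.request import Request, urlopen


def fetch_url(url: str, timeout: float = 10.0) -> Optional[str]:
    """Download a page and return its decoded body, or None on any failure."""
    try:
        # The User-Agent string is an arbitrary placeholder.
        req = Request(url, headers={"User-Agent": "citation-rehydration-sketch"})
        with urlopen(req, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except (OSError, ValueError, LookupError):
        # Covers network errors, timeouts, malformed URLs, unknown charsets.
        return None


def rehydrate(
    citations: Iterable[Dict[str, str]],
    fetch: Callable[[str], Optional[str]] = fetch_url,
) -> List[Dict[str, Optional[str]]]:
    """Return copies of the citation records with a 'text' field added.

    ASSUMPTION: each record carries its link under a hypothetical 'url' key;
    the real dataset may name or nest this field differently.
    """
    rehydrated = []
    for record in citations:
        text = fetch(record["url"])  # None when the source is unreachable
        rehydrated.append({**record, "text": text})
    return rehydrated
```

Calling `rehydrate([{"url": "https://example.org/"}])` would attempt a single download and return the record with its `text` field filled in, or set to `None` if the page cannot be retrieved. Passing a custom `fetch` function makes it possible to add caching, rate limiting, or archived snapshots without changing the loop. For actual use of this dataset, rely on the authors' script.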