This cleaned Wikipedia corpus combines Serbian and Croatian Wikipedia articles. The Croatian text has been transliterated into Cyrillic script, and wiki markup, infoboxes, and stub articles have been removed. The corpus was compiled by RafaelUI and is available on Hugging Face.
Source data is under CC BY-SA 4.0; corpus compilation is under CC BY 4.0.
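
The Latin-to-Cyrillic transliteration step mentioned above could look roughly like the sketch below. It is only an illustration, not the compiler's actual pipeline: the function name `latin_to_cyrillic` and both mapping tables are assumptions introduced here. It treats every `lj`, `nj`, and `dž` as a digraph, so rare words where those letters belong to separate morphemes (for example `injekcija`) would be converted incorrectly without an exception list.

```python
# Illustrative sketch only; names and tables are hypothetical,
# not taken from the corpus author's code.

# Digraphs must be matched before single letters.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}

SINGLES = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д",
    "đ": "ђ", "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и",
    "j": "ј", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "š": "ш", "t": "т", "u": "у",
    "v": "в", "z": "з", "ž": "ж",
}


def latin_to_cyrillic(text: str) -> str:
    """Transliterate Croatian/Serbian Latin text to Serbian Cyrillic."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        cyr = DIGRAPHS.get(pair.lower()) if len(pair) == 2 else None
        if cyr is not None:
            # Capitalize when the first Latin letter is uppercase (Lj, LJ).
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = SINGLES.get(ch.lower())
        if cyr is None:
            # Digits, punctuation, and foreign letters pass through unchanged.
            out.append(ch)
        else:
            out.append(cyr.upper() if ch.isupper() else cyr)
        i += 1
    return "".join(out)
```

Calling `latin_to_cyrillic("Zagreb")` with this sketch yields `Загреб`. Matching digraphs first is the key design choice: handling letters one at a time would turn `lj` into `лј` rather than `љ`.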