This dataset contains 4.1 GB of cleaned Arabic text from a May 2026 Wikipedia dump, processed by raj2708. It is a preprocessed version of articles from ar.wikipedia.org: articles were filtered by length and converted from wikitext to plain text. It is intended for large language model pretraining.
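The exact preprocessing used for this dataset is not documented here. As a rough illustration of what "converted from wikitext to plain text" and "filtered for length" can mean in practice, the following is a minimal sketch and not the author's pipeline. The function names `strip_wikitext` and `keep_article` and the `MIN_CHARS` threshold are hypothetical, and the regex cleanup only handles simple, non-nested markup.

```python
import re

# Hypothetical threshold: the dataset description does not state the real cutoff.
MIN_CHARS = 200


def strip_wikitext(text: str) -> str:
    """Very rough wikitext-to-plain-text pass (illustrative only)."""
    # Drop simple, non-nested templates such as {{...}}.
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # Replace [[target|label]] and [[target]] links with their visible text.
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    # Remove bold/italic quote markup ('' and ''').
    text = re.sub(r"'{2,}", "", text)
    # Turn "== Heading ==" lines into bare heading text.
    text = re.sub(r"^=+\s*(.*?)\s*=+\s*$", r"\1", text, flags=re.M)
    # Collapse runs of spaces and tabs.
    return re.sub(r"[ \t]+", " ", text).strip()


def keep_article(text: str, min_chars: int = MIN_CHARS) -> bool:
    """Keep an article only if its cleaned text reaches the length threshold."""
    return len(strip_wikitext(text)) >= min_chars
```

A production pipeline would more likely rely on a dedicated wikitext parser, since nested templates, tables, and references are not handled by patterns this simple.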
The license is CC BY-SA 4.0, which requires attribution and share-alike licensing for derivative works.