Wikipedia German — Preprocessed is a cleaned version of the German Wikipedia dump, totaling 13.5 GB. The dataset was processed by raj2708 from a dump dated May 2026, sourced from de.wikipedia.org. It is intended for use in large language model pretraining.
The dataset is licensed under CC BY-SA 4.0; derivative works must be shared under the same license.