A curated collection of 4.79 million Wikipedia articles drawn from the 2008 and 2010 snapshot releases, cleaned and compressed for efficient large-scale language model pretraining. The dataset was created by the author 'adhyanshaa' and was last updated on the platform in May 2026.
The license is unknown; verify the terms of use before using the dataset.