7.21 billion tokens of Kazakh-dominant text, curriculum-ordered so quality rises toward the end. The corpus, created by TilQazyna, is split into a bulk tier of 4.83 billion tokens and a premium anneal tier of 2.38 billion tokens.
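As a quick check on the figures above, the two tier sizes can be summed and converted to shares of the whole corpus. This is only an illustrative sketch: the names `TIER_TOKENS_B` and `tier_shares` are hypothetical and not part of any published tooling for this dataset, and the only inputs are the token counts stated in the description.

```python
# Token counts in billions, copied from the dataset description.
# The dict keys are illustrative labels, not official split names.
TIER_TOKENS_B = {"bulk": 4.83, "anneal": 2.38}


def tier_shares(tiers):
    """Return the total token count and each tier's fraction of it."""
    total = sum(tiers.values())
    return total, {name: count / total for name, count in tiers.items()}


total_b, shares = tier_shares(TIER_TOKENS_B)
print(f"total: {total_b:.2f}B tokens")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The two tiers sum to the stated 7.21 billion tokens, with the bulk tier making up roughly two thirds of the corpus and the anneal tier the remaining third.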
The license is unknown; verify the terms of use before using the corpus.