Available on 1 platform
This dataset contains pretokenized text chunks, packed into sequences of 513 tokens each with no cross-document bleeding. It was created by Beetle-Data, and its metadata was last updated on May 18, 2026. Shards are added incrementally, and a marker file is committed once the dataset is finalized.
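As a rough illustration of what 513-token packing without cross-document bleeding could look like, the sketch below chunks each document separately and then splits every chunk into 512 inputs and 512 shifted targets. The function names, the choice to drop a short trailing remainder, and the reading of 513 as 512 + 1 for next-token prediction are all assumptions made for illustration; the dataset's actual packing procedure is not documented here.

```python
from typing import Iterable, List, Tuple

SEQ_LEN = 513  # sequence length stated in the dataset description


def chunk_documents(docs: Iterable[List[int]], seq_len: int = SEQ_LEN) -> List[List[int]]:
    """Split each document into full windows of seq_len token ids.

    Assumption: a window never spans two documents, and a trailing
    remainder shorter than seq_len is dropped. The real policy used
    by the dataset is not stated.
    """
    chunks: List[List[int]] = []
    for tokens in docs:
        # Step through one document at a time so no window mixes documents.
        for start in range(0, len(tokens) - seq_len + 1, seq_len):
            chunks.append(tokens[start:start + seq_len])
    return chunks


def to_inputs_and_targets(chunk: List[int]) -> Tuple[List[int], List[int]]:
    """Shift a chunk by one position for next-token prediction (assumed use)."""
    return chunk[:-1], chunk[1:]


# Hypothetical token ids standing in for two tokenized documents.
docs = [list(range(1030)), list(range(2000, 2600))]
chunks = chunk_documents(docs)
inputs, targets = to_inputs_and_targets(chunks[0])
```

Under these assumptions, a 513-token sequence yields exactly 512 input positions with a target for each, which is one plausible reason for the otherwise unusual length.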
The license is unknown, which may restrict commercial or research use.