Long-Data-Col-rp_pile_pretrain is a subset of the togethercomputer/Long-Data-Collections dataset, built from two files in its pretrain split: rp_sub.jsonl.zst and pile_sub.jsonl.zst. The dataset was created by BEE-spoke-data and last updated on December 29, 2025. To keep the focus on long text, rows whose text contains fewer than 250 characters were dropped.
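The length filter described above can be sketched roughly as follows. This is a minimal illustration, not the curators' actual script: the `text` field name, the helper name `keep_long_rows`, and the sample rows are assumptions, and the source does not say how characters were counted (Python's `len` counts Unicode code points).

```python
MIN_CHARS = 250  # threshold stated in the dataset description


def keep_long_rows(rows, min_chars=MIN_CHARS, text_key="text"):
    """Return only the rows whose text has at least `min_chars` characters.

    `text_key` is an assumed field name; rows missing it are treated as empty.
    """
    return [row for row in rows if len(row.get(text_key, "")) >= min_chars]


# Hypothetical sample rows, for illustration only.
sample = [
    {"text": "short"},      # dropped: fewer than 250 characters
    {"text": "x" * 249},    # dropped: one character under the threshold
    {"text": "x" * 250},    # kept: exactly 250 characters
]

kept = keep_long_rows(sample)
```

Rows with exactly 250 characters are kept here, since the description only says rows with fewer than 250 characters were dropped.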
License details are unknown; refer to the source dataset and its source datasets for licensing information.