Pre-tokenized binary evaluation splits used in the HiCI paper for hierarchical construction-integration in long-context LLMs. The dataset, authored by ZengXiangyu and last updated on 2026-04-21, contains PG19 test and validation sets tokenized with both Llama-2 and Llama-3 tokenizers.
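The data is distributed as pre-tokenized binary files (.bin). Reading them requires tooling that matches the file layout, and interpreting the token IDs requires the same tokenizer (Llama-2 or Llama-3) that produced each file.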
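As a rough illustration of what loading such a file can look like, the sketch below memory-maps a .bin file with NumPy and slices it into fixed-length evaluation windows. The file layout is an assumption, not something this card documents: the code assumes a flat, headerless array of token IDs. The dtype is also an assumption. `uint16` is wide enough for the Llama-2 vocabulary (32,000 entries) but not for the Llama-3 vocabulary (128,256 entries), so a wider type such as `uint32` may be needed for the Llama-3 files. The helper names `load_token_bin` and `iter_windows` are hypothetical and are not part of any released tooling.

```python
import numpy as np


def load_token_bin(path, dtype=np.uint16):
    """Memory-map a binary file of token IDs without reading it all into RAM.

    Assumption: the file is a flat, headerless array of integers of `dtype`.
    Check the dataset's own documentation for the real layout and dtype.
    """
    return np.memmap(path, dtype=dtype, mode="r")


def iter_windows(tokens, seq_len):
    """Yield consecutive, non-overlapping windows of `seq_len` token IDs.

    Trailing tokens that do not fill a whole window are dropped.
    """
    n_windows = len(tokens) // seq_len
    for i in range(n_windows):
        # Copy the slice out of the memory map into a regular array.
        yield np.array(tokens[i * seq_len:(i + 1) * seq_len])
```

A call such as `load_token_bin("test.bin", dtype=np.uint32)` would then return an array-like view that can be passed to `iter_windows` with the desired context length; the file name here is a placeholder. Memory-mapping is used so that long token streams can be windowed without loading the full file.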