Parameter Golf FineWeb Export contains 15,318,808 training and 50,000 validation documents of English text. The dataset is a Parquet-converted fork of a ~10-billion-token subset derived from the 100-billion-token FineWeb corpus, intended for parameter-golf experiments. It was created by mishig and last updated in March 2026.
The full dataset structure and column details are available only on the Hugging Face dataset page.
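As a rough sanity check on the figures quoted above, the following sketch derives the validation share and an approximate average document length. It is illustrative arithmetic only: it assumes the "~10-billion-token" figure is exactly 10 billion and covers both splits, neither of which the description confirms, and the name `split_summary` is hypothetical.

```python
# Document counts quoted in the dataset description.
TRAIN_DOCS = 15_318_808
VAL_DOCS = 50_000

# Assumption: "~10-billion-token" is treated as exactly 1e10 tokens and as
# spanning both splits. The true token count may differ.
APPROX_TOKENS = 10_000_000_000


def split_summary(train_docs: int, val_docs: int, approx_tokens: int) -> dict:
    """Return total document count, validation share, and mean tokens per document."""
    total_docs = train_docs + val_docs
    return {
        "total_docs": total_docs,
        "val_fraction": val_docs / total_docs,
        "approx_tokens_per_doc": approx_tokens / total_docs,
    }


summary = split_summary(TRAIN_DOCS, VAL_DOCS, APPROX_TOKENS)
print(f"total documents:      {summary['total_docs']:,}")
print(f"validation fraction:  {summary['val_fraction']:.4%}")
print(f"approx tokens / doc:  {summary['approx_tokens_per_doc']:.0f}")
```

Under these assumptions, the validation split is about 0.3% of all documents, and the average document is on the order of 650 tokens.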