A script for training a large language model on streamed data, authored by uv-scripts and last updated in January 2026. It demonstrates training a Qwen model on Latin using 1.47 million texts streamed directly from the FineWeb-2 dataset on the Hugging Face Hub. The associated blog post explains how to train on massive datasets without downloading them locally.
This appears to be a script or tutorial resource, not a dataset in the traditional sense; users should expect code and instructions rather than a direct data download.
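As a rough illustration of what "streamed directly from the Hub" means, the sketch below opens FineWeb-2 lazily with the `datasets` library's `streaming=True` mode and groups texts into batches as they arrive, so nothing is written to disk up front. This is not the script's own code: the dataset id `HuggingFaceFW/fineweb-2`, the Latin config name `lat_Latn`, the `text` column, and the helpers `batched_texts` and `stream_latin_fineweb2` are assumptions for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def batched_texts(
    records: Iterable[dict], batch_size: int, max_records: Optional[int] = None
) -> Iterator[List[str]]:
    """Hypothetical helper: group the 'text' field of records into batches.

    Records are pulled one at a time from any iterable (a list, a generator,
    or a streaming IterableDataset), so the full dataset is never held in memory.
    """
    source = records if max_records is None else islice(records, max_records)
    batch: List[str] = []
    for record in source:
        batch.append(record["text"])  # assumption: each record has a 'text' column
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def stream_latin_fineweb2():
    """Hypothetical helper: open the Latin subset of FineWeb-2 as a stream.

    Assumptions: the dataset id and config name below; requires the
    `datasets` package and network access when called.
    """
    from datasets import load_dataset  # imported lazily so the module loads without it

    return load_dataset(
        "HuggingFaceFW/fineweb-2",
        name="lat_Latn",
        split="train",
        streaming=True,  # returns an IterableDataset; records are fetched on iteration
    )
```

For example, `batched_texts(stream_latin_fineweb2(), batch_size=8, max_records=32)` would yield four batches of eight texts, assuming the stream contains at least 32 records. Keeping the batching logic independent of the data source makes it easy to check offline on a small in-memory list before pointing it at the Hub.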