The dataset is a curated collection of English-language mathematics documents, created by OctoThinker and last updated in July 2025. It is derived from the MegaMath-Web corpus, and its documents were annotated with the Llama-3.1-70B-Instruct model. It is intended for natural language processing research and large language model training, and only documents whose quality score meets a set threshold are retained.
The full description is available only on the Hugging Face dataset page. The specific filtering threshold and the detailed preprocessing steps are not documented here.
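The description does not say which score field or cutoff is used. As a rough illustration only, filtering annotated documents by a quality-score threshold could look like the sketch below. The `Doc` type, the `quality_score` field, the sample texts and the cutoff of 3.0 are all hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Doc:
    text: str
    # Hypothetical field: a score assigned to the document by an annotator model.
    quality_score: float


def filter_by_quality(docs: Iterable[Doc], threshold: float) -> List[Doc]:
    """Keep only documents whose score is at or above the threshold."""
    return [d for d in docs if d.quality_score >= threshold]


# Hypothetical sample documents and scores, for illustration only.
corpus = [
    Doc("Proof that sqrt(2) is irrational ...", 4.0),
    Doc("Buy cheap calculators now!", 1.0),
    Doc("Worked example: solving x^2 - 5x + 6 = 0 ...", 3.0),
]

# Hypothetical cutoff; the real threshold is not documented.
kept = filter_by_quality(corpus, threshold=3.0)
print([d.text for d in kept])
```

With this assumed cutoff, the two mathematical documents are kept and the advertisement is dropped. A real pipeline would apply the same comparison over the full corpus, using whatever score field and threshold the dataset authors chose.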