MegaMath is an open math pretraining dataset of more than 300 billion tokens, curated from diverse, math-focused sources. It was created by the LLM360 Team as part of TxT360, using data re-extracted from Common Crawl with math-oriented extraction optimizations and filtering.
The full description is hosted externally on the dataset's Hugging Face page, and users should review it for complete details. This listing does not state a license, so verify the licensing terms before use.