MegaScience contains 1.25 million instances for scientific reasoning post-training, released by GAIR-NLP in July 2025. The collection aggregates multiple public datasets using data selection methods optimized through ablation studies to improve model performance in STEM domains.
Released under the CC BY-NC-SA 4.0 license; see arXiv paper 2507.16812 for methodology details.