Sign in to view source links and access this dataset
Description
GLM-5.1-Reasoning-1M-Cleaned is a cleaned and reformatted derivative of the Kassadin88/GLM-5.1-1000000x dataset, prepared by EngMuhammadAtef. It preserves the original four-subset layout (main, PHD-Science, Multilingual-STEM, Math) while converting examples into a unified schema with explicit conversation, input, output, domain, and meta fields.
Use Cases
Fine-tuning language models for instruction-following based on the unified SFT-ready schema.
Training models on specialized reasoning tasks based on the PHD-Science and Math subsets.
Developing multilingual STEM question-answering systems based on the Multilingual-STEM subset.
Strengths
Derived from a source dataset containing 1,000,000 examples.
Organized into four distinct subsets: main, PHD-Science, Multilingual-STEM, and Math.
Data is reformatted into a unified schema with explicit fields for model training.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Freshness should be verified as the last update timestamp is 2026-04-21 00:35:22.
Provenance
Source
Derivative of Kassadin88/GLM-5.1-1000000x dataset.
Collection Method
Cleaned and reformatted from the original source.
Freshness
Last updated 2026-04-21 00:35:22.
License is unknown; terms of use must be verified.