Description
GLM-5.1-Reasoning-1M-Cleaned is a cleaned and reformatted derivative of the Kassadin88/GLM-5.1-1000000x dataset, containing reasoning examples for language models. It was prepared by ansulev and last updated on 2026-04-19. The dataset preserves the original four-subset layout (main, PHD-Science, Multilingual-STEM, and Math), with examples converted into a unified schema for supervised fine-tuning.
Use Cases
Supervised fine-tuning of language models using the unified, SFT-ready conversation schema.
Training models for scientific reasoning with the PHD-Science subset.
Improving multilingual STEM question answering with the Multilingual-STEM subset.
Enhancing mathematical problem solving with the Math subset.
Strengths
Data is cleaned and reformatted into a unified, SFT-ready schema with explicit fields.
Preserves a structured four-subset layout covering distinct reasoning domains.
Derived from a known source dataset (Kassadin88/GLM-5.1-1000000x), providing traceable lineage.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is not stated, which makes it harder to assess suitability before download.
Description metadata is limited; actual data quality requires manual inspection after download.
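Because field semantics and row counts are undocumented, a quick inspection pass is a reasonable first step. The sketch below is one possible way to do that, not an official loader: `summarize_fields` and `stream_subset` are hypothetical helper names, the repository ID `ansulev/GLM-5.1-Reasoning-1M-Cleaned` is an assumption inferred from the preparer's name, and the config name `Math` and split `train` are guesses that should be checked against the dataset page. It assumes the dataset can be loaded with the Hugging Face `datasets` library.

```python
from collections import Counter, defaultdict
from itertools import islice
from typing import Any, Dict, Iterable, Mapping


def summarize_fields(
    records: Iterable[Mapping[str, Any]], limit: int = 1000
) -> Dict[str, Dict[str, int]]:
    """Count, per field name, how often each Python type appears
    in the first `limit` records."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for record in islice(records, limit):
        for name, value in record.items():
            summary[name][type(value).__name__] += 1
    return {name: dict(counts) for name, counts in summary.items()}


def stream_subset(repo_id: str, config: str, split: str = "train"):
    """Stream one subset from the Hugging Face Hub without a full download.

    Requires `pip install datasets`. The repo_id, config, and split values
    passed in are assumptions to verify against the dataset page.
    """
    # Imported lazily so summarize_fields has no third-party dependency.
    from datasets import load_dataset

    return load_dataset(repo_id, config, split=split, streaming=True)


# Example usage (needs network access; all three names are assumptions):
# rows = stream_subset("ansulev/GLM-5.1-Reasoning-1M-Cleaned", "Math")
# print(summarize_fields(rows, limit=200))
```

Streaming keeps the check cheap: only the first few hundred records are fetched, which is enough to see which fields exist and whether their types are consistent, though it says nothing about the total row count.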
Provenance
Source
Derivative of the Kassadin88/GLM-5.1-1000000x dataset, published on Hugging Face.
Collection Method
Cleaned and reformatted from the original dataset.
Freshness
Last updated 2026-04-19 23:06:31 (time zone not stated); check the dataset page for a newer revision before relying on this date.
License
Unknown; users should verify permissions before use.