Name: GLM-5.1-Reasoning-1M-Cleaned: Instruction-Tuning Data for Language Model Reasoning
Creator: ansulev
Published: 2026-04-19T23:06:31
Keywords: Text, Multilingual, Language Model, Reasoning, Sft Data

Description

GLM-5.1-Reasoning-1M-Cleaned is a cleaned and reformatted derivative of the Kassadin88/GLM-5.1-1000000x dataset, containing examples for language model reasoning. It was prepared by ansulev and last updated on 2026-04-19. The dataset preserves an original four-subset layout covering main, PHD-Science, Multilingual-STEM, and Math topics, with examples converted into a unified schema for supervised fine-tuning.

Use Cases

Supervised fine-tuning of language models based on the unified SFT-ready conversation schema.
Training models for scientific reasoning based on the PHD-Science subset mentioned in the description.
Improving multilingual STEM question-answering capabilities based on the Multilingual-STEM subset.
Enhancing mathematical problem-solving in language models based on the Math subset.

Strengths

Data is explicitly cleaned and reformatted into a unified SFT-ready schema with explicit fields.
Preserves a structured four-subset layout covering distinct reasoning domains.
Derived from a known source dataset (Kassadin88/GLM-5.1-1000000x), providing traceable lineage.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.

Provenance

Source: Derivative of Kassadin88/GLM-5.1-1000000x dataset, published on Hugging Face.
Collection Method: Cleaned and reformatted from the original dataset.
Freshness: Last updated 2026-04-19 23:06:31; freshness should be verified.

License is unknown; users should verify permissions before use.

Text Multilingual Language Model Reasoning Sft Data

GLM-5.1-Reasoning-1M-Cleaned: Instruction-Tuning Data for Language Model Reasoning

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info