Name: GLM-5.1-OpenThoughts3-Distill: A Reasoning Dataset for Science, Code, and Math
Creator: Kassadin88
Published: 2026-04-28T09:20:25
Keywords: Mathematics, Code, Text, Science, Reasoning Distillation, Synthetic

Description

GLM-5.1 generated this distilled reasoning dataset from 1.2 million prompts in the OpenThoughts3 collection. The dataset covers three domains, with the Science split containing 56,974 distilled responses from 100,000 original prompts. Author Kassadin88 last updated the dataset on Hugging Face in April 2026.

Use Cases

Fine-tuning language models for scientific reasoning based on distilled Physics, Chemistry, and Biology prompts.
Evaluating model performance on programming and algorithm problems based on the Code domain subset.
Training models for advanced mathematical reasoning based on prompts covering Competition Math, Proof, and Algebra.
Studying knowledge distillation techniques for reasoning tasks using the dataset's structure of original prompts and distilled responses.

Strengths

The Science domain split is marked as complete with 56,974 distilled responses.
The dataset is derived from a substantial source corpus of 1.2 million original prompts.
Covers three distinct, high-value domains for AI reasoning: Science, Code, and Mathematics.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The Code and Math splits are listed as in progress or pending, indicating incomplete coverage.
Row count for the full dataset is unknown, which may limit suitability assessment.

Provenance

Source: Distilled by the GLM-5.1 model from the OpenThoughts3-1.2M prompt collection.
Collection Method: Likely involves a knowledge distillation process where a large model generates reasoning traces or responses.
Time Range: null
Freshness: Last updated 2026-04-28 11:47:02; freshness should be verified.
Geography: null

License is unknown; users must verify terms before use.

Text Mathematics Code Science Reasoning Distillation Synthetic

GLM-5.1-OpenThoughts3-Distill: A Reasoning Dataset for Science, Code, and Math

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info