Description
GLM-5.1 generated this distilled reasoning dataset from 1.2 million prompts in the OpenThoughts3 collection. The dataset covers three domains, with the Science split containing 56,974 distilled responses from 100,000 original prompts. Author Kassadin88 last updated the dataset on Hugging Face in April 2026.
Use Cases
Fine-tuning language models for scientific reasoning using the distilled responses to Physics, Chemistry, and Biology prompts.
Evaluating model performance on programming and algorithm problems using the Code domain subset, once that split is available.
Training models for advanced mathematical reasoning using prompts covering Competition Math, Proof, and Algebra, once that split is available.
Studying knowledge distillation techniques for reasoning tasks using the dataset's structure of original prompts and distilled responses.
Strengths
The Science domain split is marked as complete with 56,974 distilled responses.
The dataset is derived from a substantial source corpus of 1.2 million original prompts.
Covers three distinct, high-value domains for AI reasoning: Science, Code, and Mathematics.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The Code and Math splits are listed as in progress or pending, indicating incomplete coverage.
Row count for the full dataset is unknown, which makes it harder to assess suitability.
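Because column-level documentation is absent, one way to infer field semantics after download is to profile a handful of records. The following is a minimal sketch in plain Python, not a description of the dataset's actual schema: the helper `summarize_fields` and the field names `prompt`, `response`, and `domain` are hypothetical placeholders chosen for illustration, and the sample records are invented.

```python
from collections import Counter, defaultdict


def summarize_fields(records, max_examples=2):
    """Count how often each field appears, which value types it holds,
    and keep a few truncated example values per field."""
    summary = defaultdict(
        lambda: {"count": 0, "types": Counter(), "examples": []}
    )
    total = 0
    for record in records:
        total += 1
        for name, value in record.items():
            info = summary[name]
            info["count"] += 1
            info["types"][type(value).__name__] += 1
            if len(info["examples"]) < max_examples:
                # Truncate long values so the summary stays readable.
                info["examples"].append(repr(value)[:80])
    return total, dict(summary)


# Hypothetical records; the real field names must be read from the download.
sample = [
    {"prompt": "State Newton's second law.", "response": "F = ma", "domain": "science"},
    {"prompt": "Define pH.", "response": None, "domain": "science"},
]

total, fields = summarize_fields(sample)
for name, info in fields.items():
    print(name, info["count"], dict(info["types"]), info["examples"])
```

Fields that appear in fewer records than the total, or that mix types such as `str` and `NoneType`, are worth checking by hand before using the data for training.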
Provenance
Source
Distilled by the GLM-5.1 model from the OpenThoughts3-1.2M prompt collection.
Collection Method
Likely involves a knowledge distillation process where a large model generates reasoning traces or responses.
Time Range
Not specified.
Freshness
Last updated 2026-04-28 11:47:02; freshness should be verified.
Geography
Not specified.
License is unknown; users must verify terms before use.