Name: DeepSeek-V4-Distill-8100x: 7,716 Examples for Reasoning-Oriented Model Distillation
Creator: Jackrong
Published: 2026-04-24T07:17:06
Keywords: Text, Language Model, Reasoning Distillation, Synthetic Data, Synthetic

Description

This supervised fine-tuning dataset contains 7,716 high-quality JSONL examples for reasoning-oriented distillation. The question prompts come from the GLM-5.1-Reasoning-1M-Cleaned dataset, and the answers were generated by the teacher model DeepSeek-V4-Flash. Jackrong authored the dataset, which was last updated on April 24, 2026.

Use Cases

Fine-tuning student language models on reasoning tasks, using the teacher-generated answers as training targets.
Benchmarking model reasoning capabilities on the curated question-answer pairs.
Studying how effective synthetic, teacher-generated data is for instruction tuning.

Strengths

Contains 7,716 high-quality examples after a cleaning process.
Answers were all generated by a single, named teacher model (DeepSeek-V4-Flash).
Question prompts are sourced from a known, cleaned reasoning dataset (GLM-5.1-Reasoning-1M-Cleaned).

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The dataset has only 7,716 rows, which may be too few for some distillation tasks.
The description notes the answer pool was cleaned to remove real-time questions, but the specific criteria are not detailed here.
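
Because the fields are undocumented, a first step after download is to see which keys the records actually contain. The following is a minimal sketch in Python, not part of the dataset's own tooling: the function name summarize_jsonl_fields and the sample keys "question", "answer", and "meta" are hypothetical and chosen only for illustration; the real field names are not stated on this page.

```python
import json
from collections import Counter, defaultdict
from typing import Iterable


def summarize_jsonl_fields(lines: Iterable[str]) -> dict[str, Counter]:
    """Count the Python value types seen under each top-level key."""
    field_types: dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not isinstance(record, dict):
            continue  # only JSON objects have named fields
        for key, value in record.items():
            field_types[key][type(value).__name__] += 1
    return dict(field_types)


if __name__ == "__main__":
    # Hypothetical records; the real dataset's keys may differ.
    sample = [
        '{"question": "What is 2 + 2?", "answer": "4"}',
        '{"question": "Name a prime.", "answer": "7", "meta": {"source": "demo"}}',
    ]
    for field, types in summarize_jsonl_fields(sample).items():
        print(field, dict(types))
```

To run this on the downloaded file, pass an open file handle (opened with encoding="utf-8") instead of the sample list. A key that appears in fewer records than the others, or with mixed types, is a sign that the field is optional or inconsistently populated.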

Provenance

Source: Jackrong on Hugging Face.
Collection Method: Questions sourced from Jackrong/GLM-5.1-Reasoning-1M-Cleaned; answers generated synthetically by DeepSeek-V4-Flash model.
Freshness: Last updated 2026-04-24 08:32:56; check the Hugging Face dataset page for later revisions before use.

The full description and details on the cleaning process for real-time questions are on the Hugging Face dataset page.
