Sign in to view source links and access this dataset
Description
1,739,249 tokens of text data generated by the Qwen3.6-plus model for knowledge distillation. The dataset covers topics including coding, mathematics, finance, medicine, and economics, with a maximum sequence length of 6,500 tokens per row. It was created by author 'ansulev' and last updated on April 8, —.
Use Cases
Training student models via distillation based on the high-reasoning outputs from the Qwen3.6-plus teacher model.
Fine-tuning language models for specialized domains based on the described coverage of coding, mathematics, and finance.
Benchmarking model reasoning capabilities based on the described multi-domain content.
Creating synthetic training data for instruction-following models based on the distillation-focused generation method.
Strengths
Contains 1,739,249 tokens of generated text.
Covers multiple high-reasoning domains: coding, mathematics, finance, medicine, and economics.
Uses a high-performance teacher model (Qwen3.6-plus) for generation.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
Provenance
Source
huggingface
Collection Method
Prepared for distillation using the Qwen3.6-plus model.
Time Range
null
Freshness
Last updated 2026-04-08 17:08:09; freshness should be verified.
Geography
null
License is unknown; usage restrictions should be verified.