Name: Qwen3.6-Plus Multi-Domain Reasoning Dataset for Distillation
Creator: khazarai
Published: 2026-04-02T23:00:58
Keywords: Task Categories: text-generation, Library: polars, Language: en, Text Generation, Size Categories: n<1K, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Distillation, Text, Multi Domain, Region: us, Reasoning, JSON, Knowledge Distillation, License: apache-2.0

Description

The dataset contains 876,205 tokens of multi-domain text prepared by khazarai for knowledge distillation from the Qwen3.6-Plus teacher model. It covers coding, mathematics, finance, medicine, and economics, and was last updated in April 2026.

Use Cases

Distill reasoning capabilities from the Qwen3.6-Plus teacher model into a student model for text generation tasks, using the 876,205-token corpus.
Fine-tune language models on specific domains like coding or mathematics using the categorized text sequences.
Benchmark model performance on reasoning tasks across finance, medicine, and economics domains present in the data.
Train student models to generate long-form content by leveraging examples with a maximum sequence length of 6,500 tokens.
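
The domain-specific and long-form use cases above both come down to selecting examples by domain and by length. The following is a minimal sketch of that selection, not the author's pipeline. The field names `domain` and `text` are hypothetical, since the dataset's columns are not documented, and whitespace splitting is only a crude stand-in for the tokenizer that produced the 6,500-token figure.

```python
from collections import defaultdict

MAX_TOKENS = 6500  # maximum sequence length quoted in this card


def rough_token_count(text):
    # Whitespace split: a crude proxy, not the real tokenizer.
    return len(text.split())


def select_by_domain(records, domains, max_tokens=MAX_TOKENS):
    """Group records by domain, keeping only those within the token budget.

    Assumes each record is a dict with hypothetical keys "domain" and "text".
    """
    selected = defaultdict(list)
    for rec in records:
        within_budget = rough_token_count(rec["text"]) <= max_tokens
        if rec["domain"] in domains and within_budget:
            selected[rec["domain"]].append(rec)
    return dict(selected)


# Made-up records for illustration only; not taken from the dataset.
sample = [
    {"domain": "coding", "text": "Write a function that reverses a list."},
    {"domain": "mathematics", "text": "Prove that the sum of two even numbers is even."},
    {"domain": "finance", "text": "word " * 7000},  # deliberately over budget
]

subset = select_by_domain(sample, {"coding", "mathematics", "finance"})
print({domain: len(recs) for domain, recs in subset.items()})
```

With real data, the proxy count should be replaced by the student model's own tokenizer, since token counts differ between tokenizers.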

Strengths

Corpus contains 876,205 tokens for distillation training.
Data spans five distinct high-reasoning domains: coding, mathematics, finance, medicine, and economics.
Maximum sequence length of 6,500 tokens supports training for long-context generation.

Limitations

Specific row count, column structure, and sample data are unknown, limiting reproducibility.
Dataset scope is defined by the single teacher model's outputs, potentially introducing model-specific biases.
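
Because the row count and column structure are not documented, a first step after downloading the files might be to tally them directly. The sketch below assumes the data is stored as JSON Lines with one object per line, which the card does not confirm (it only lists JSON as a tag). The function name and the demo records are hypothetical.

```python
import json
from collections import Counter


def summarize_structure(lines):
    """Return (row_count, field_counts) for an iterable of JSON Lines strings.

    Assumes each non-empty line is a JSON object; blank lines are skipped.
    """
    field_counts = Counter()
    row_count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        row_count += 1
        field_counts.update(record.keys())
    return row_count, field_counts


# Made-up lines for illustration only; not taken from the dataset.
demo_lines = [
    '{"prompt": "What is 2 + 2?", "response": "4"}',
    "",
    '{"prompt": "Define GDP.", "response": "...", "domain": "economics"}',
]

rows, fields = summarize_structure(demo_lines)
print(rows, dict(fields))
```

An open file handle can be passed in place of `demo_lines`, since iterating over a text file yields one line at a time. Fields that appear in fewer rows than `rows` indicate an inconsistent schema.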

Provenance

Source: huggingface
Collection Method: Prepared for distillation using outputs from the Qwen3.6-Plus teacher model.
Time Range: null
Freshness: Last updated in April 2026.
Geography: null

License is listed as Apache 2.0 in platform tags, but the raw description does not confirm it; verification is recommended. The dataset's internal structure (columns, file formats) is unspecified.
