Jackrong created this cleaned derivative of the ianncity/KIMI-K2.5-1000000x dataset, last updated on April 17, 2026. It preserves the original four-config layout and rewrites each record into a unified reasoning-SFT schema with fields like conversations, input, output, domain, and meta. The dataset is intended for supervised fine-tuning, with the teacher model KIMI-K2.5 recorded in the metadata.
Use Cases
- Supervised fine-tuning of language models based on the unified reasoning-SFT schema.
- Training models on structured conversational reasoning based on the 'conversations' field.
- Analyzing reasoning patterns across different domains based on the 'domain' field.
- Benchmarking model performance against a known teacher model based on the 'meta.teacher_model' metadata.
Strengths
- Derived from a source dataset containing 1,000,000 records.
- Provides a cleaned and unified schema with fields like id, conversations, input, output, domain, and meta.
- Records the specific teacher model (KIMI-K2.5) used in the metadata.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count for the cleaned version is unknown, which may limit suitability assessment.
- Description metadata is limited; actual data quality requires manual inspection after download.
Provenance
- Source
- ianncity/KIMI-K2.5-1000000x
- Collection Method
- Cleaned derivative preserving original layout and rewritten into a unified schema.
- Freshness
- Last updated 2026-04-17 16:27:02; freshness should be verified.