Name: KIMI-K2.5-550000x: 550,000 Reasoning Traces from a Language Model
Creator: ansulev
Published: 2026-04-03T09:49:47
Keywords: Reasoning Traces, Math Problems, Text, Language Model, Code Generation, Science Qa

Description

KIMI-K2.5-550000x contains 550,000 reasoning traces distilled from the KIMI-K2.5 language model on reasoning-intensive tasks. The collection totals 2 billion tokens, distributed across coding (60%), science (15%), math (10%), computer science (5%), logical questions (5%), and creative writing (5%). It was created by ansulev and last updated on Hugging Face in April 2026.
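
As a rough sizing aid, the stated percentages can be turned into approximate per-domain counts. This is a minimal sketch that assumes the percentages refer to trace counts rather than token counts (the description does not say which). The domain keys are illustrative labels, not confirmed field values from the dataset.

```python
# Approximate per-domain sizes for KIMI-K2.5-550000x.
# Assumption: the published percentages apply to trace counts, not tokens.
TOTAL_TRACES = 550_000
TOTAL_TOKENS = 2_000_000_000

# Illustrative (hypothetical) domain labels; the dataset's own labels may differ.
DOMAIN_PERCENT = {
    "coding": 60,
    "science": 15,
    "math": 10,
    "computer_science": 5,
    "logic": 5,
    "creative_writing": 5,
}


def traces_per_domain(total: int, percents: dict) -> dict:
    """Split a total count by integer percentages using floor division."""
    if sum(percents.values()) != 100:
        raise ValueError("percentages must sum to 100")
    return {name: total * pct // 100 for name, pct in percents.items()}


if __name__ == "__main__":
    counts = traces_per_domain(TOTAL_TRACES, DOMAIN_PERCENT)
    for name, n in counts.items():
        print(f"{name:>17}: {n:,} traces")
    # Average trace length if tokens were spread evenly across all traces.
    print(f"avg tokens/trace: {TOTAL_TOKENS / TOTAL_TRACES:,.0f}")
```

Under that assumption the coding subset is about 330,000 traces and the average trace is roughly 3,600 tokens. If the percentages are token shares instead, the per-domain trace counts will differ.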

Use Cases

Training or fine-tuning language models for code generation using the coding subset (60% of the data).
Benchmarking model reasoning capabilities on science problems using the Physics, Chemistry, and Biology traces.
Studying step-by-step logical reasoning processes for math and logical questions.
Analyzing the structure of model-generated reasoning traces across domains, including creative writing.
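
Because the field names are undocumented, selecting a single domain (for example the coding subset for fine-tuning) has to be adapted after inspecting the files. The sketch below streams a JSON Lines file and keeps records whose `domain` field matches a target label. Both the JSON Lines layout and the `domain` key are hypothetical assumptions, not confirmed properties of the dataset.

```python
import io
import json
from typing import Iterator, TextIO


def iter_domain(lines: TextIO, domain: str, key: str = "domain") -> Iterator[dict]:
    """Yield records from a JSON Lines stream whose `key` field equals `domain`.

    Assumptions (hypothetical): one JSON object per line, and a string
    field named `key` that holds the domain label.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get(key) == domain:
            yield record


if __name__ == "__main__":
    sample = io.StringIO(
        '{"domain": "coding", "prompt": "p1"}\n'
        '{"domain": "math", "prompt": "p2"}\n'
        "\n"
        '{"domain": "coding", "prompt": "p3"}\n'
    )
    coding = list(iter_domain(sample, "coding"))
    print([r["prompt"] for r in coding])  # ['p1', 'p3']
```

Streaming line by line keeps memory use flat, which matters at 2 billion tokens; if the real label lives under a different name, pass it through the `key` parameter.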

Strengths

Large scale with 550,000 distinct reasoning traces.
Broad domain coverage across six distinct categories, with coding being the largest at 60%.
Substantial token volume of 2 billion tokens for training or analysis.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
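The row count is not listed in the dataset metadata; the 550,000 figure comes from the description and should be confirmed after download before judging suitability.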
Description metadata is limited; actual data quality requires manual inspection after download.
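
Since neither the columns nor the row count are documented, a first pass over the downloaded files can establish both. This minimal sketch assumes the data is stored as JSON Lines (not confirmed) and reports, for each field, the fraction of rows that contain it and which Python value types occur. The function name `profile_records` is hypothetical.

```python
import io
import json
from collections import Counter
from typing import Iterable


def profile_records(lines: Iterable[str]):
    """Count records and summarize every field seen in a JSON Lines stream.

    Returns (row_count, profile), where profile maps each field name to
    the fraction of rows containing it and a count of observed type names.
    """
    rows = 0
    presence = Counter()
    types = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        rows += 1
        for field, value in record.items():
            presence[field] += 1
            types.setdefault(field, Counter())[type(value).__name__] += 1
    profile = {
        field: {"present": presence[field] / rows, "types": dict(types[field])}
        for field in presence
    }
    return rows, profile


if __name__ == "__main__":
    sample = io.StringIO('{"prompt": "a", "response": "b"}\n{"prompt": "c"}\n')
    rows, profile = profile_records(sample)
    print(rows)
    for field, info in profile.items():
        print(field, info)
```

A field present in well under 100% of rows, or one with mixed types, is a signal to inspect those records by hand before training on them.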

Provenance

Source: Distilled from the KIMI-K2.5 language model.
Collection Method: Collected using a modified Datagen tool, as referenced in the original dataset description.
Freshness: Last updated 2026-04-03 09:49:47; freshness should be verified.

License is unknown; users must verify terms of use before download.
