Name: Qodex Teacher Codex R1 48K: Code Execution Trajectories for Instruction Tuning
Creator: bytkim
Published: 2026-05-06T23:46:26
Keywords: Chatml, Text, Code Execution

Description

528 ChatML-formatted rows contain trajectories of code execution, rendered for instruction tuning with a 48,000-token budget. The dataset, created by user 'bytkim', was last updated on Hugging Face in May 2026. It includes prompt/completion expansions for supervised fine-tuning tooling alongside token-budget and lineage reports.

Use Cases

Fine-tuning language models for code generation based on the provided teacher trajectories.
Training models on structured ChatML-formatted conversations for improved instruction following.
Supervised fine-tuning (SFT) of code-capable models using the prompt/completion expansion files.
Analyzing token budget allocation and model training lineage using the included reports.

Strengths

Contains 528 structured ChatML rows for direct use in training pipelines.
Includes explicit prompt/completion expansions tailored for supervised fine-tuning (SFT) tooling.
Provides auxiliary reports for token-budget analysis and run-lineage tracking.
Artifacts are verified with SHA-256 checksums for integrity.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is limited to 528 ChatML rows, which may be a small sample for some training needs.
Description metadata is limited; actual data quality and content require manual inspection.

Provenance

Source: Canonical archived source referenced but not specified; hosted on Hugging Face by user 'bytkim'.
Collection Method: Rendered from Codex CLI teacher trajectories.
Freshness: Last updated 2026-05-07 01:08:24.

License is unknown and should be verified before use.

Text Chatml Code Execution

Qodex Teacher Codex R1 48K: Code Execution Trajectories for Instruction Tuning

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info