Sign in to view source links and access this dataset
Description
528 ChatML-formatted rows contain trajectories of code execution, rendered for instruction tuning with a 48,000-token budget. The dataset, created by user 'bytkim', was last updated on Hugging Face in May 2026. It includes prompt/completion expansions for supervised fine-tuning tooling alongside token-budget and lineage reports.
Use Cases
Fine-tuning language models for code generation based on the provided teacher trajectories.
Training models on structured ChatML-formatted conversations for improved instruction following.
Supervised fine-tuning (SFT) of code-capable models using the prompt/completion expansion files.
Analyzing token budget allocation and model training lineage using the included reports.
Strengths
Contains 528 structured ChatML rows for direct use in training pipelines.
Includes explicit prompt/completion expansions tailored for supervised fine-tuning (SFT) tooling.
Provides auxiliary reports for token-budget analysis and run-lineage tracking.
Artifacts are verified with SHA-256 checksums for integrity.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is limited to 528 ChatML rows, which may be a small sample for some training needs.
Description metadata is limited; actual data quality and content require manual inspection.
Provenance
Source
Canonical archived source referenced but not specified; hosted on Hugging Face by user 'bytkim'.
Collection Method
Rendered from Codex CLI teacher trajectories.
Freshness
Last updated 2026-05-07 01:08:24.
License is unknown and should be verified before use.