318,000 agent trajectories for instruction tuning of large language models in software engineering. The dataset was synthesized using the Qwen3-Coder-480B-A35B-Instruct model and collected via the OpenHands framework. NVIDIA authored the dataset, which was last updated on May 5, 2026.
Use Cases
- Supervised fine-tuning of software engineering agents based on synthesized agent trajectories.
- Improving code generation and problem-solving capabilities of LLMs based on instruction-tuning data.
- Benchmarking agent performance in software engineering tasks based on trajectory data.
- Training models for autonomous code repair or generation based on the described agentic trajectories.
Strengths
- Contains 318,000 agent trajectories, providing a substantial volume of training data.
- Specifically curated for supervised fine-tuning (SFT) of LLMs.
- Synthesized using a state-of-the-art, large language model (Qwen3-Coder-480B-A35B-Instruct).
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, which may limit suitability assessment.
- Description metadata is limited; actual data quality requires manual inspection after download.
Provenance
- Source
- NVIDIA via Hugging Face.
- Collection Method
- Synthesized using the Qwen3-Coder-480B-A35B-Instruct model and collected via the OpenHands framework.
- Time Range
- null
- Freshness
- Last updated 2026-05-05 19:34:05; freshness should be verified.
- Geography
- null