3,000 reasoning-focused coding prompts and responses generated by the DeepSeek V3.2 model. The dataset covers the initial subset of the nvidia/OpenCodeReasoning collection and totals 47 million tokens of input and output text.
Use Cases
- Fine-tune large language models on complex coding logic using the reasoning traces included in the responses
- Benchmark DeepSeek V3.2 against other models using the 3,000 prompts from the nvidia/OpenCodeReasoning source
- Analyze token efficiency and cost-per-reasoning-step based on the $19.2 USD generation cost and 47M token count
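As a rough starting point for the cost analysis above, the headline figures can be turned into per-token and per-prompt averages. This is a minimal sketch that uses only the three numbers stated in this card (3,000 prompts, 47M tokens, $19.2 USD); the function and key names are illustrative, and because the 47M figure is rounded and combines input and output tokens, the results are blended approximations, not exact per-direction prices.

```python
# Headline figures taken from this dataset card (the token count is rounded).
TOTAL_COST_USD = 19.2
TOTAL_TOKENS = 47_000_000   # combined input + output tokens
NUM_PROMPTS = 3_000


def summarize_cost(cost_usd: float, tokens: int, prompts: int) -> dict:
    """Return blended average cost and size statistics (hypothetical helper)."""
    return {
        "usd_per_million_tokens": cost_usd / (tokens / 1_000_000),
        "avg_tokens_per_prompt": tokens / prompts,
        "usd_per_prompt": cost_usd / prompts,
    }


stats = summarize_cost(TOTAL_COST_USD, TOTAL_TOKENS, NUM_PROMPTS)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With these inputs the blended rate works out to roughly $0.41 per million tokens, about 15,700 tokens per prompt-response pair, and about $0.0064 per pair. A per-reasoning-step figure would additionally require counting steps in each response, which this card does not report.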
Strengths
- Contains 3,000 unique prompts sourced from the nvidia/OpenCodeReasoning dataset
- Includes 47 million tokens of combined input and output data
- Features reasoning-heavy responses generated by the DeepSeek V3.2 model
- Generated at a total cost of $19.2 USD