Three datasets of reasoning and math problems, each paired with Chain-of-Thought (CoT) traces generated by Llama 3.1 8B Instruct. The collection provides step-level correctness annotations across the arithmetic, boolean logic, and math domains to support training reasoning verifiers.
Use Cases
- Train a step-level verifier using the correctness annotations to identify specific points of failure in reasoning chains
- Analyze error propagation in LLMs by comparing the Llama 3.1 8B CoT traces with the provided step-level labels
- Develop step-level reward models for reinforcement learning pipelines such as Reinforcement Learning from Human Feedback (RLHF), using the annotated reasoning steps
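As a rough illustration of the first two use cases, the sketch below turns one annotated trace into per-step training pairs for a verifier and locates the first incorrect step. The `Trace` layout, its field names (`problem`, `steps`, `step_labels`), and the toy record are assumptions made for this example; the actual datasets may use a different schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Trace:
    # Hypothetical record layout; the real datasets' field names may differ.
    problem: str
    steps: List[str]
    step_labels: List[bool]  # True means the step was annotated as correct


def first_error_index(trace: Trace) -> Optional[int]:
    """Return the index of the first incorrect step, or None if every step is correct."""
    for i, ok in enumerate(trace.step_labels):
        if not ok:
            return i
    return None


def verifier_examples(trace: Trace) -> List[Tuple[str, int]]:
    """Build one (input_text, label) pair per step.

    Each input is the problem followed by all steps up to and including
    the step being judged; the label is 1 for correct, 0 for incorrect.
    """
    examples = []
    for i, ok in enumerate(trace.step_labels):
        context = "\n".join([trace.problem] + trace.steps[: i + 1])
        examples.append((context, int(ok)))
    return examples


# Made-up record for illustration only, not taken from the datasets.
toy = Trace(
    problem="Compute 2 + 3 * 4.",
    steps=["3 * 4 = 12", "2 + 12 = 15", "The answer is 15."],
    step_labels=[True, False, False],
)

print(first_error_index(toy))       # position where the chain first goes wrong
print(len(verifier_examples(toy)))  # one training pair per annotated step
```

Feeding the verifier the full prefix rather than the isolated step is one possible design choice; it lets the model judge a step in the context of what came before, which also makes it easier to study how an early error carries into later steps.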
Strengths
- Provides a correctness annotation for every intermediate reasoning step in the CoT traces
- Uses a single generator for the CoT traces, Llama 3.1 8B Instruct
- Covers three distinct reasoning domains: arithmetic, boolean logic, and math