Stitched-Reasoning-Trajectories-7M is a large-scale synthetic dataset containing 7 million multi-hop reasoning trajectories. It was algorithmically constructed by stitching together discrete reasoning traces from the glaiveai/reasoning-v1-20m dataset into continuous, coherent, and logically structured multi-agent trajectories. The dataset was created by ajibawa-2023 and was last updated on May 6, 2026.
Use Cases
- Training language models for multi-step reasoning on the stitched trajectories.
- Benchmarking model performance on complex, logically structured reasoning chains.
- Studying the properties of synthetic reasoning data generated from algorithmic stitching of discrete traces.
- Developing multi-agent reasoning systems based on the continuous trajectory format.
Strengths
- Massive scale of 7 million reasoning trajectories.
- Algorithmically constructed for coherence and logical structure from a source dataset of 20 million traces.
- Focus on continuous multi-agent trajectories, a distinctive format for reasoning data.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The exact row count is not documented beyond the nominal 7 million figure, which may limit suitability assessment.
- Data is synthetic, which may introduce artifacts not present in human-generated reasoning.
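Because column-level documentation is absent, a first step after download is usually to inspect a handful of records and note which fields appear and what types they hold. The following is a minimal sketch of that inspection, not part of the dataset's tooling. The helper name `summarize_fields` and the field names `trajectory` and `hops` in the sample are hypothetical placeholders; the real column names are unknown.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the sorted list of Python type names observed."""
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


# Hypothetical sample rows; the real dataset's columns are undocumented.
sample = [
    {"trajectory": "first stitched trace", "hops": 3},
    {"trajectory": "second stitched trace", "hops": None},
]

summary = summarize_fields(sample)
for field, types in summary.items():
    print(field, types)
```

Running the same helper over a few hundred real rows should show which fields are always present and which contain missing values.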
Provenance
- Source: Derived from the glaiveai/reasoning-v1-20m dataset.
- Collection Method: Algorithmically stitched from discrete reasoning traces by mapping keyword overlaps and extracting sub-questions.
- Time Range: Not specified.
- Freshness: Last updated 2026-05-06 08:45:07; freshness should be verified.
- Geography: Not specified.
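The collection method under Provenance mentions mapping keyword overlaps between discrete traces. The sketch below illustrates one way such stitching could work; it is an assumption for illustration only, not the dataset author's actual pipeline. The `question` and `answer` field names, the stopword list, the Jaccard scoring, and the greedy chaining strategy are all hypothetical, and sub-question extraction is not modeled here.

```python
import re

# Small illustrative stopword set (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "of", "to", "is", "in", "and", "it",
             "what", "how", "why", "does", "do", "for", "on"}


def keywords(text):
    """Return lowercase alphabetic tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def overlap(a, b):
    """Jaccard similarity between the keyword sets of two strings."""
    ka, kb = keywords(a), keywords(b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def stitch(traces, start=0, min_overlap=0.1, max_hops=3):
    """Greedily chain trace indices: hop from the current trace's answer
    to the unused trace whose question shares the most keywords."""
    chain, used = [start], {start}
    while len(chain) < max_hops:
        current = traces[chain[-1]]
        best, best_score = None, 0.0
        for i, trace in enumerate(traces):
            if i in used:
                continue
            score = overlap(current["answer"], trace["question"])
            if score >= min_overlap and score > best_score:
                best, best_score = i, score
        if best is None:  # no sufficiently related trace remains
            break
        chain.append(best)
        used.add(best)
    return chain


# Hypothetical traces; field names are placeholders.
traces = [
    {"question": "What gas do plants absorb?",
     "answer": "Plants absorb carbon dioxide during photosynthesis."},
    {"question": "How does photosynthesis use carbon dioxide?",
     "answer": "It converts carbon dioxide and water into glucose."},
    {"question": "What is the capital of France?",
     "answer": "Paris."},
]

print(stitch(traces))
```

In this toy example the first two traces share the keywords "carbon", "dioxide", and "photosynthesis", so they are chained, while the unrelated third trace is left out. A real pipeline at the 20-million-trace scale would need an inverted index or similar structure rather than the pairwise scan used here.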