Name: Stitched Reasoning Trajectories 7M: Algorithmically Stitched Multi-Hop Reasoning Data
Creator: ajibawa-2023
Published: 2026-05-06T06:57:30
Keywords: Language Model Training, Text, Large Scale, Multi Hop Reasoning, Synthetic Data, Reasoning Trajectories, Synthetic

Description

Stitched-Reasoning-Trajectories-7M is a large-scale synthetic dataset of 7 million multi-hop reasoning trajectories. It was constructed algorithmically by stitching together discrete reasoning traces from the glaiveai/reasoning-v1-20m dataset into continuous, coherent, logically structured multi-agent trajectories. The dataset was created by ajibawa-2023 and last updated on May 6, 2026.

Use Cases

Training language models for multi-step reasoning on the stitched trajectories.
Benchmarking model performance on complex, logically structured reasoning chains.
Studying the properties of synthetic reasoning data generated from algorithmic stitching of discrete traces.
Developing multi-agent reasoning systems based on the continuous trajectory format.

Strengths

Massive scale of 7 million reasoning trajectories.
Algorithmically constructed for coherence and logical structure from a source dataset of 20 million traces.
Focus on continuous multi-agent trajectories, a distinctive format for reasoning data.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
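Row count is not reported in the dataset metadata, so the stated figure of 7 million trajectories should be confirmed after download; this may limit suitability assessment.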
Data is synthetic, which may introduce artifacts not present in human-generated reasoning.
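
Because column-level documentation is absent, a first step after download is usually to profile a sample of rows. The following is a minimal sketch of that step, assuming the rows have already been loaded as Python dictionaries by whatever loader is in use. The field names "trajectory" and "hops" are hypothetical placeholders, not the dataset's actual columns.

```python
from collections import defaultdict


def summarize_fields(rows):
    """Report, for each field, the value types seen and how often it is missing.

    A field counts as missing in a row when the key is absent or its value is None.
    """
    types = defaultdict(set)
    present = defaultdict(int)
    total = 0
    for row in rows:
        total += 1
        for key, value in row.items():
            seen = types[key]  # registers the key even if the value is None
            if value is not None:
                seen.add(type(value).__name__)
                present[key] += 1
    return {
        key: {"types": sorted(types[key]), "missing": total - present[key]}
        for key in types
    }


# Hypothetical sample rows; real field names must be read from the data itself.
sample_rows = [
    {"trajectory": "Q1 ... A1 ... Q2 ... A2", "hops": 2},
    {"trajectory": "Q1 ... A1", "hops": None},
    {"trajectory": "Q1 ... A1 ... Q2 ... A2 ... Q3 ... A3"},
]

summary = summarize_fields(sample_rows)
for name, info in summary.items():
    print(name, info)
```

Counting rows during the same pass is also a simple way to check the stated size against what was actually downloaded.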

Provenance

Source: Derived from the glaiveai/reasoning-v1-20m dataset.
Collection Method: Algorithmically stitched from discrete reasoning traces by mapping keyword overlaps and extracting sub-questions.
Time Range: Not specified
Freshness: Last updated 2026-05-06 08:45:07; freshness should be verified.
Geography: Not specified
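
The collection method above mentions mapping keyword overlaps between discrete traces. The sketch below illustrates one way such a mapping could chain traces into a multi-hop trajectory. It is not the creator's actual pipeline: the stop-word list, the Jaccard score, the greedy selection, the threshold, and the "question"/"answer" field names are all assumptions made for illustration, and the sub-question extraction step is not covered.

```python
import re

# Hypothetical stop-word list; the source does not say how keywords were chosen.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is", "and",
              "what", "how", "does", "for", "on", "why"}


def keywords(text):
    """Lower-cased alphabetic tokens with stop words removed."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS}


def overlap(a, b):
    """Jaccard similarity between two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stitch(traces, start=0, min_overlap=0.2, max_hops=3):
    """Greedily chain trace indices.

    Each hop picks the unused trace whose question shares the most keywords
    with the previous trace's answer, and stops when no candidate reaches
    min_overlap or max_hops traces have been chained.
    """
    chain = [start]
    used = {start}
    while len(chain) < max_hops:
        prev_kw = keywords(traces[chain[-1]]["answer"])
        best, best_score = None, 0.0
        for i, trace in enumerate(traces):
            if i in used:
                continue
            score = overlap(prev_kw, keywords(trace["question"]))
            if score >= min_overlap and score > best_score:
                best, best_score = i, score
        if best is None:
            break
        chain.append(best)
        used.add(best)
    return chain


# Toy traces invented for this example; they are not taken from the dataset.
traces = [
    {"question": "What gas do plants absorb during photosynthesis?",
     "answer": "Plants absorb carbon dioxide and release oxygen."},
    {"question": "Why is carbon dioxide a greenhouse gas?",
     "answer": "Carbon dioxide traps infrared radiation in the atmosphere."},
    {"question": "Who wrote Hamlet?",
     "answer": "William Shakespeare wrote Hamlet."},
    {"question": "How does infrared radiation warm the atmosphere?",
     "answer": "Infrared radiation is absorbed by molecules and re-emitted as heat."},
]

print(stitch(traces))           # chain starting from the photosynthesis trace
print(stitch(traces, start=2))  # the Hamlet trace has no overlapping neighbour
```

A greedy chain keeps the example short; a production pipeline over millions of traces would more plausibly use an inverted keyword index rather than comparing every pair.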
