SWE-bench Trajectory Quality Subsets is a collection of curated trajectory subsets intended for ablation studies in software engineering evaluation. Created by davongluck and last updated on March 28, 2026, the subsets are derived from a parent dataset using a v3 quality scoring framework.
Use Cases
- Conducting ablation studies on trajectory quality metrics using subsets such as Ablation-NoB2-500.
- Evaluating models fine-tuned for software problem-solving, using the reported resolved-rate and mean-score metrics.
- Analyzing how specific efficiency criteria (B2, B3) affect trajectory performance, based on how each subset was selected.
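A minimal sketch of the kind of ablation comparison described above, assuming each downloaded trajectory record can be mapped to a dict with hypothetical keys `subset`, `resolved`, and `score` (the card does not document the real column names). It groups records by subset, computes the resolved rate and mean score per subset, and reports the difference between two subsets. The subset labels and values in the usage lines are toy data for illustration only, not figures from the dataset.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_subset(records):
    """Compute per-subset resolved rate and mean score.

    Assumes (hypothetically) each record is a dict with keys
    'subset' (str), 'resolved' (bool), and 'score' (float).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["subset"]].append(rec)

    summary = {}
    for name, recs in groups.items():
        summary[name] = {
            "n": len(recs),
            # Fraction of trajectories marked as resolved.
            "resolved_rate": sum(1 for r in recs if r["resolved"]) / len(recs),
            "mean_score": mean(r["score"] for r in recs),
        }
    return summary


def ablation_delta(summary, baseline, variant):
    """Metric differences (variant minus baseline) between two subsets."""
    b, v = summary[baseline], summary[variant]
    return {
        "resolved_rate": v["resolved_rate"] - b["resolved_rate"],
        "mean_score": v["mean_score"] - b["mean_score"],
    }


# Toy records with hypothetical subset labels, for illustration only.
toy = [
    {"subset": "subset_a", "resolved": True, "score": 0.75},
    {"subset": "subset_a", "resolved": True, "score": 0.25},
    {"subset": "subset_b", "resolved": True, "score": 1.0},
    {"subset": "subset_b", "resolved": False, "score": 0.5},
]
summary = summarize_by_subset(toy)
print(ablation_delta(summary, "subset_a", "subset_b"))
# → {'resolved_rate': -0.5, 'mean_score': 0.25}
```

Keeping the summary step separate from the comparison step means the same per-subset table can be reused when more than two subsets are compared.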
Strengths
- Subsets are curated using a defined v3 quality scoring framework.
- Individual subsets report mean scores; for example, Ablation-NoB3-500 has a mean score of 0.7253.
- Some subsets, such as Ablation-NoB2-500, report a 100% resolved rate under their selection criteria.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count is not documented, which makes it harder to judge suitability before download.
- Description metadata is limited; actual data quality can only be judged by manual inspection after download.
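Because field semantics and row counts must be inferred after download, a quick first pass is to count rows and tally the value types seen in each field. The sketch below assumes, hypothetically, that a subset has been exported as JSON Lines (one JSON object per line); the card does not state the file format, and the field names in the sample are invented. It only reports field presence and type frequencies, not meaning.

```python
import json
from collections import Counter, defaultdict


def infer_schema(jsonl_lines):
    """Count rows and, for each top-level field, the JSON value types seen.

    Assumes (hypothetically) one JSON object per non-blank line.
    """
    type_counts = defaultdict(Counter)
    n_rows = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        n_rows += 1
        for key, value in row.items():
            type_counts[key][type(value).__name__] += 1
    return n_rows, {key: dict(counts) for key, counts in type_counts.items()}


# Sample lines with hypothetical field names, for illustration only.
sample = [
    '{"instance_id": "a", "resolved": true, "score": 0.5}',
    "",
    '{"instance_id": "b", "resolved": false}',
]
n_rows, schema = infer_schema(sample)
print(n_rows, schema)
```

An open file object can be passed directly in place of `sample`, since iterating over it yields lines. Fields that appear in fewer rows than `n_rows` are optional or sparsely populated, which is worth checking before relying on them.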
Provenance
- Source: Hugging Face
- Collection Method: Curated subsets constructed from nebius/SWE-rebench-openhands-trajectories.
- Time Range: Not specified
- Freshness: Last updated 2026-03-28 13:23:01; freshness should be verified.
- Geography: Not specified
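Verifying freshness can start with how old the recorded update timestamp is. This minimal sketch parses the "Last updated" value from the Freshness field and compares its age against a threshold. The card gives no time zone, so the sketch assumes UTC, and the 90-day `max_age_days` default is a hypothetical choice rather than anything the card specifies.

```python
from datetime import datetime, timezone

# Value from the Freshness field; the time zone is not stated, so UTC is assumed.
LAST_UPDATED = "2026-03-28 13:23:01"


def days_since_update(last_updated, now=None):
    """Whole days elapsed since the recorded update (UTC assumed)."""
    updated = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - updated).days


def is_stale(last_updated, max_age_days=90, now=None):
    """True if the data is older than a hypothetical age threshold."""
    return days_since_update(last_updated, now) > max_age_days


print(days_since_update(LAST_UPDATED), is_stale(LAST_UPDATED))
```

Passing an explicit `now` makes the check reproducible, for example when logging which snapshot an experiment used.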