SWE-bench Trajectory Quality Subsets for Fine-Tuning Evaluation

Government & Legal

Name: SWE-bench Trajectory Quality Subsets for Fine-Tuning Evaluation
Creator: davongluck
Published: 2026-02-25T21:12:46
Keywords: Software Engineering, Benchmarking, Benchmark, Tabular, Fine Tuning

Available on 1 platform

Description

SWE-bench Trajectory Quality Subsets are curated subsets intended for ablation studies in software engineering evaluation. Created by davongluck and last updated on March 28, 2026, they are derived from the parent dataset nebius/SWE-rebench-openhands-trajectories using a v3 quality scoring framework.

Use Cases

Running ablation studies on trajectory quality metrics with subsets such as Ablation-NoB2-500.
Evaluating fine-tuned models on software problem-solving using the resolved rate and mean score metrics.
Analyzing how the efficiency criteria B2 and B3 affect trajectory performance, based on the subset selection methodology.
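
The resolved rate and mean score mentioned above can be recomputed from a subset once it is downloaded. The following is a minimal sketch, not the dataset author's method: the field names `resolved` and `score` are hypothetical (the listing does not document column names), and the toy records are made-up values used only to show the arithmetic.

```python
from statistics import mean


def summarize_subset(rows, resolved_key="resolved", score_key="score"):
    """Return (resolved_rate, mean_score) for a list of trajectory records.

    `resolved_key` and `score_key` are assumed field names; replace them
    with whatever the downloaded subset actually uses.
    """
    if not rows:
        raise ValueError("subset is empty")
    # Fraction of records whose resolved flag is truthy.
    resolved_rate = sum(1 for row in rows if row[resolved_key]) / len(rows)
    # Arithmetic mean of the per-trajectory quality scores.
    mean_score = mean(row[score_key] for row in rows)
    return resolved_rate, mean_score


# Toy records with invented values, purely for illustration.
toy_rows = [
    {"resolved": True, "score": 0.5},
    {"resolved": True, "score": 1.0},
    {"resolved": False, "score": 0.0},
    {"resolved": True, "score": 0.5},
]
rate, avg = summarize_subset(toy_rows)
print(f"resolved rate: {rate:.2%}, mean score: {avg:.4f}")
```

Running the same function on two ablation subsets gives directly comparable numbers, provided both subsets expose the same two fields.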

Strengths

Subsets are curated with a defined v3 quality scoring framework.
Mean scores are reported for specific subsets, for example 0.7253 for Ablation-NoB3-500.
Some subsets, such as Ablation-NoB2-500, report a 100% resolved rate under their selection criteria.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
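
Because column documentation and row counts are missing, a first pass after download is usually a quick profile of the records. The sketch below assumes the subset has already been loaded into a list of dictionaries; the field names in the toy data (`instance_id`, `score`, `resolved`) are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import defaultdict


def profile_columns(rows):
    """Summarize each field: Python type names observed and count of missing values.

    A field counts as missing in a row when it is absent or set to None.
    """
    fields = set()
    for row in rows:
        fields.update(row)

    types = defaultdict(set)
    missing = defaultdict(int)
    for row in rows:
        for name in fields:
            value = row.get(name)
            if value is None:
                missing[name] += 1
            else:
                types[name].add(type(value).__name__)

    return {
        name: {"types": sorted(types[name]), "missing": missing[name]}
        for name in sorted(fields)
    }


# Toy records with invented field names and values.
toy_records = [
    {"instance_id": "a", "score": 0.5},
    {"instance_id": "b", "score": None},
    {"instance_id": "c", "score": 1, "resolved": True},
]
profile = profile_columns(toy_records)
print(f"rows: {len(toy_records)}")
for name, info in profile.items():
    print(name, info)
```

Mixed types or many missing values in a field are a signal to inspect it manually before relying on it in an evaluation.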

Provenance

Source: Hugging Face
Collection Method: Curated subsets constructed from nebius/SWE-rebench-openhands-trajectories.
Time Range: not specified
Freshness: Last updated 2026-03-28 13:23:01; freshness should be verified.
Geography: not specified

Quality Score

C44

Community

102 downloads

1 like

0 views

Dataset Info

Author: davongluck
Created: Feb 25, 2026
Updated: Mar 28, 2026
Last synced: Apr 17, 2026
