SWE-bench Verified: 500 Human-Validated GitHub Issue-PR Pairs

Name: SWE-bench Verified: 500 Human-Validated GitHub Issue-PR Pairs
Creator: SWE-bench
Published: 2025-04-29T20:42:32
Keywords: library: polars, benchmark: official, size category: n<1K, modality: text, library: mlcroissant, library: datasets, library: pandas, Parquet, benchmark: eval YAML, region: US

Description

500 human-validated GitHub Issue-Pull Request pairs from popular Python repositories, curated by the SWE-bench team. This subset of the original benchmark focuses on high-quality samples verified for evaluation accuracy through manual review. It serves as a rigorous test for autonomous systems attempting to solve real-world software engineering tasks.

Use Cases

Evaluating autonomous agents on resolving GitHub issues, given the issue description and checked against the paired pull request
Benchmarking code generation models against unit test verification suites
Analyzing software engineering task success rates across different Python repositories
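As an illustration of the per-repository analysis mentioned above, here is a minimal sketch that aggregates success rates by repository. The record layout (a `repo` string and a boolean `resolved` flag) and the function name `resolve_rate_by_repo` are assumptions made for this example, not a format defined by the dataset or its evaluation tooling.

```python
from collections import defaultdict


def resolve_rate_by_repo(results):
    """Return {repo: fraction of instances resolved} for a list of result records.

    Each record is assumed (hypothetically) to be a dict with a "repo" string
    and a boolean "resolved" flag.
    """
    totals = defaultdict(int)
    solved = defaultdict(int)
    for record in results:
        totals[record["repo"]] += 1
        if record["resolved"]:
            solved[record["repo"]] += 1
    return {repo: solved[repo] / totals[repo] for repo in totals}


# Made-up sample results, for illustration only.
sample = [
    {"repo": "django/django", "resolved": True},
    {"repo": "django/django", "resolved": False},
    {"repo": "sympy/sympy", "resolved": True},
]
rates = resolve_rate_by_repo(sample)
```

With only 500 records in total, per-repository rates rest on small counts, so differences between repositories should be read with caution.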

Strengths

500 human-validated samples ensuring high-quality ground truth
Unit test verification using post-PR behavior as a reference
Sourced from popular, real-world Python repositories

Limitations

Small sample size of 500 records compared to the full benchmark
Restricted exclusively to the Python programming language
Snapshot-based data may not reflect the current state of the source repositories

Provenance

Source: SWE-bench
Collection Method: Human-validated subset of scraped GitHub Issue-Pull Request pairs
Freshness: Last updated February 2026.

Evaluation requires a specific unit test verification environment as described in the SWE-bench documentation to execute the post-PR behavior checks.
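The check described above can be sketched roughly as follows. This is not the official harness. It assumes each record carries `FAIL_TO_PASS` and `PASS_TO_PASS` test lists, possibly stored as JSON-encoded strings, and that test outcomes have already been collected into a hypothetical `test_status` mapping of test name to status string. The helper names and the sample instance are invented for illustration.

```python
import json


def parse_test_list(value):
    """Accept either a JSON-encoded string or an already-parsed list of test names."""
    if isinstance(value, str):
        return json.loads(value)
    return list(value)


def is_resolved(instance, test_status):
    """Return True if every required test passed after the candidate patch was applied.

    `test_status` is a hypothetical mapping of test name -> "PASSED" / "FAILED";
    tests missing from the mapping count as not passed.
    """
    required = parse_test_list(instance["FAIL_TO_PASS"]) + parse_test_list(
        instance["PASS_TO_PASS"]
    )
    return all(test_status.get(name) == "PASSED" for name in required)


# Made-up instance and outcomes, for illustration only.
instance = {
    "FAIL_TO_PASS": '["tests/test_bug.py::test_fixed"]',
    "PASS_TO_PASS": '["tests/test_core.py::test_existing"]',
}
all_pass = {
    "tests/test_bug.py::test_fixed": "PASSED",
    "tests/test_core.py::test_existing": "PASSED",
}
regression = {
    "tests/test_bug.py::test_fixed": "PASSED",
    "tests/test_core.py::test_existing": "FAILED",
}
resolved_ok = is_resolved(instance, all_pass)
resolved_regression = is_resolved(instance, regression)
```

Requiring both lists to pass means a patch that fixes the issue but breaks previously passing tests does not count as a success.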

Quality Score

Overall: C44
Description: 51
Source: 39
Reputation: 56
Access: 22

Community

117.9K downloads

24 likes

0 views

Dataset Info

Author: SWE-bench
Created: Apr 29, 2025
Updated: Feb 27, 2026
Last synced: Jun 6, 2026
