Description
The dataset contains 4,926,930 pull requests from AI coding assistants such as Claude, Codegen, Codex, and Copilot, as well as from human developers, giving a large-scale view of software development activity. It includes metadata such as repository snapshots, modified files, and standardized pull-request exports. It was created by AISE-TUDelft and last updated in May 2026.
Use Cases
Analyze AI assistant adoption and impact by comparing pull-request volume across sources.
Compare code change patterns between human and AI-generated contributions based on the provided activity metadata.
Study repository evolution and refactoring trends using the included repository snapshots and modified file data.
Benchmark the performance of different AI coding tools based on metrics like merge rates and code additions.
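The volume, change-pattern, and merge-rate analyses above all reduce to grouping pull requests by their source. The sketch below assumes hypothetical field names (`agent`, `merged`, `additions`) and a hypothetical `"Human"` label. The card documents no columns, so map these onto the real schema after download.

```python
from collections import defaultdict
from statistics import median


# Hypothetical schema: "agent" (source label), "merged" (bool) and
# "additions" (int) are assumptions, not documented column names.
def summarize_by_source(records):
    """Return per-source PR count, share of all PRs, merge rate and median additions."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["agent"]].append(rec)

    total = sum(len(prs) for prs in groups.values())
    summary = {}
    for source, prs in groups.items():
        merged = sum(1 for pr in prs if pr["merged"])
        summary[source] = {
            "count": len(prs),                      # adoption: raw PR volume
            "share": len(prs) / total,              # fraction of all PRs in the sample
            "merge_rate": merged / len(prs),        # benchmark: merged / opened
            "median_additions": median(pr["additions"] for pr in prs),
        }
    return summary


# Toy records for illustration only; they are not drawn from the dataset.
sample = [
    {"agent": "Copilot", "merged": True, "additions": 12},
    {"agent": "Copilot", "merged": False, "additions": 40},
    {"agent": "Human", "merged": True, "additions": 5},
    {"agent": "Claude", "merged": True, "additions": 30},
]

for source, stats in summarize_by_source(sample).items():
    print(source, stats)
```

The median is used for additions because pull-request sizes are typically heavy-tailed, so a few very large changes would distort a mean.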
Strengths
Large scale with 4,926,930 total pull requests.
Includes contributions from multiple named AI assistants (Claude, Codegen, Codex, Copilot) and humans for comparison.
Contains rich activity metadata such as repository snapshots and modified files.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
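Per-cohort row counts (for example, pull requests per assistant versus humans) are not stated, which makes it hard to judge in advance whether a given subset is large enough for a study.
The published description is brief; data quality can only be assessed by inspecting the data after download.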
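Because field semantics must be inferred, a first step after download is to tally which fields appear and what types they hold across a sample of rows. In practice the rows might come from streaming a split with the Hugging Face `datasets` library (`load_dataset(..., streaming=True)`), but this card does not give the repository ID. The sketch below therefore works on any iterable of dicts. The toy rows and their field names are hypothetical.

```python
from collections import Counter, defaultdict


def infer_schema(rows):
    """Map each field name to a Counter of the Python type names observed for it."""
    schema = defaultdict(Counter)
    for row in rows:
        for field, value in row.items():
            schema[field][type(value).__name__] += 1
    return dict(schema)


def missing_rate(rows, field):
    """Fraction of rows in which `field` is absent or None."""
    rows = list(rows)
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


# Hypothetical sample rows; real field names must be read from the download.
rows = [
    {"agent": "Codex", "additions": 10, "merged_at": None},
    {"agent": "Human", "additions": 3, "merged_at": "2025-01-02T10:00:00Z"},
]

print(infer_schema(rows))
print(missing_rate(rows, "merged_at"))
```

A field that mixes types (such as `NoneType` and `str` above) or has a high missing rate is a candidate for closer manual inspection.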
Provenance
Source
AISE-TUDelft via Hugging Face.
Collection Method
Likely collected from a code-hosting platform such as GitHub, given the focus on pull requests; the description does not confirm this.
Freshness
Last updated 2026-05-13 23:10:32; freshness should be verified.
License
The license is unknown; verify the terms of use before using the data.