Apache Java Projects with Code, Dependencies, and Pull Requests for Technical Debt Study
by Ciprian-Viorel Stupinean · Updated 1mo ago
1.2 GB · 1 file
Available on 1 platform
Description
A 1.2 GB dataset of mature open-source Java projects from the Apache Software Foundation, selected for their diverse use of third-party libraries. It includes source code, dependency information, and GitHub pull request discussions, with static analysis results from PMD and Designite to capture code- and design-level technical debt indicators. The dataset was created by Ciprian-Viorel Stupinean and last updated on 2026-04-25.
Use Cases
Identify code-level technical debt indicators based on static analysis results from PMD.
Analyze design-level technical debt indicators based on static analysis results from Designite.
Study the relationship between third-party library usage and technical debt based on dependency information.
Examine socio-technical signals related to library-induced debt based on GitHub pull request discussions.
Strengths
Includes source code, dependency information, and pull request discussions for a multi-faceted analysis.
Static analysis results from two tools (PMD and Designite) provide both code- and design-level debt indicators.
Projects are mature and from the Apache Software Foundation, suggesting a degree of quality and real-world relevance.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Data may reflect bias inherent to the selected Apache projects and the figshare platform.
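Because column-level documentation is absent, a first pass over the downloaded archive can help infer its layout. The following is a minimal sketch, not the dataset author's tooling: it assumes the download is a standard ZIP file and that at least some analysis results are stored as CSV, which the listing does not confirm. The function names and the file name in the usage note are hypothetical.

```python
import csv
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count archive members by file extension without extracting them."""
    with zipfile.ZipFile(zip_path) as archive:
        return Counter(
            PurePosixPath(info.filename).suffix.lower() or "(none)"
            for info in archive.infolist()
            if not info.is_dir()
        )


def peek_csv_headers(zip_path, limit=5):
    """Return {member name: header row} for the first `limit` CSV members.

    Assumption: CSV members are UTF-8 text with a header in the first row.
    """
    headers = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if len(headers) >= limit:
                break
            if info.is_dir() or not info.filename.lower().endswith(".csv"):
                continue
            with archive.open(info) as raw:
                text = io.TextIOWrapper(
                    raw, encoding="utf-8", errors="replace", newline=""
                )
                # Read only the first row; an empty file yields an empty list.
                headers[info.filename] = next(csv.reader(text), [])
    return headers
```

Calling `summarize_archive("dataset.zip")` (a placeholder path) gives a quick view of which file types dominate the archive, and `peek_csv_headers("dataset.zip")` exposes candidate column names whose semantics still need to be checked against the PMD and Designite documentation.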
Provenance
Source
figshare
Collection Method
Projects selected from the Apache Software Foundation; static analysis performed with PMD and Designite.
Freshness
Last updated 2026-04-25 08:28:21; freshness should be verified.
Data is provided in a 1.2 GB ZIP file; license is CC-BY-4.0.