This dataset contains 21.7 million rows of development metadata from 17 public GitHub repositories, fetched via the GitHub REST and GraphQL APIs. The data is organized into 8 tables covering issues, pull requests, comments, and other events, and totals 1.5 GB of compressed Parquet. It was created by open-index and last updated on 2026-04-10.
Use Cases
- Analyze issue resolution patterns based on timeline events and comments.
- Study pull request review processes based on code reviews and file changes.
- Model CI/CD pipeline outcomes based on status check data.
- Investigate developer collaboration based on comment and review activity.
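As an illustration of the first use case, the sketch below computes time-to-close for issues. The field names `created_at` and `closed_at` and the ISO 8601 timestamp format are assumptions modeled on GitHub REST API issue objects; this card does not document the dataset's columns, so check the real schema first. The sample rows are hypothetical, not taken from the dataset.

```python
from datetime import datetime
from statistics import median


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as '2024-01-05T12:00:00Z'."""
    # Replace the trailing 'Z' so older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def resolution_hours(issues):
    """Return hours from creation to close for each closed issue."""
    hours = []
    for issue in issues:
        # Assumed field names; issues without a close time are skipped.
        if issue.get("closed_at") is None:
            continue
        delta = parse_ts(issue["closed_at"]) - parse_ts(issue["created_at"])
        hours.append(delta.total_seconds() / 3600)
    return hours


# Hypothetical sample rows for illustration only.
sample_issues = [
    {"number": 1, "created_at": "2024-01-01T00:00:00Z", "closed_at": "2024-01-01T06:00:00Z"},
    {"number": 2, "created_at": "2024-01-02T00:00:00Z", "closed_at": "2024-01-03T00:00:00Z"},
    {"number": 3, "created_at": "2024-01-04T00:00:00Z", "closed_at": None},
]

hours = resolution_hours(sample_issues)
print(f"closed issues: {len(hours)}, median hours to close: {median(hours)}")
```

The same pattern would extend to the other use cases (for example, time from pull request creation to first review) once the relevant column names are confirmed.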
Strengths
- 21.7 million rows provide substantial scale for analysis.
- Data is structured across 8 specific tables (e.g., issues, comments, timeline events) for focused queries.
- 1.5 GB of Zstd-compressed Parquet offers efficient storage and access.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The dataset covers only 17 repositories, which may limit generalizability.
- The last update was on 2026-04-10; verify freshness before relying on the data for recent activity.
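Because column semantics have to be inferred after download, a quick profile of each table can be a useful first step. The sketch below assumes a table has already been loaded into a list of dictionaries by whatever Parquet reader you use; `profile_columns` and the sample rows are hypothetical and only show the idea.

```python
from collections import Counter, defaultdict


def profile_columns(rows):
    """Summarize observed value types and null counts per column."""
    types = defaultdict(Counter)
    nulls = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                nulls[column] += 1
            else:
                types[column][type(value).__name__] += 1
    columns = sorted(set(types) | set(nulls))
    return {c: {"types": dict(types[c]), "nulls": nulls[c]} for c in columns}


# Hypothetical rows; real column names must be read from the downloaded files.
sample_rows = [
    {"id": 1, "state": "open", "closed_at": None},
    {"id": 2, "state": "closed", "closed_at": "2024-01-03T00:00:00Z"},
]

for column, summary in profile_columns(sample_rows).items():
    print(column, summary)
```

Columns that are mostly null or that mix several types are good candidates for closer manual inspection.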
Provenance
- Source: 17 public GitHub repositories, via the GitHub REST and GraphQL APIs.
- Collection Method: Fetched from the APIs and converted to Parquet.
- Time Range: Not specified.
- Freshness: Last updated 2026-04-10 20:47:35.
- Geography: Not specified.
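A freshness check against the last-updated timestamp listed under Provenance could look like the sketch below. The timestamp's time zone is not stated, so UTC is assumed here, and the 90-day threshold is an arbitrary illustrative choice rather than a recommendation from the dataset authors.

```python
from datetime import datetime, timezone

# Taken from the Freshness entry; the time zone is an assumption (UTC).
LAST_UPDATED = "2026-04-10 20:47:35"


def age_in_days(last_updated, now):
    """Return whole days elapsed between last_updated and now."""
    updated = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S")
    updated = updated.replace(tzinfo=timezone.utc)
    return (now - updated).days


def is_stale(last_updated, now, max_age_days=90):
    """Return True if the data is older than max_age_days (hypothetical threshold)."""
    return age_in_days(last_updated, now) > max_age_days


now = datetime.now(timezone.utc)
print(f"age in days: {age_in_days(LAST_UPDATED, now)}, stale: {is_stale(LAST_UPDATED, now)}")
```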