A collection of 40 million GitHub repository records, compiled by ibragim-bad from GH Archive public event streams. The dataset provides per-repository statistics, including stars, forks, and pull requests, as of early 2026.
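The source does not describe the aggregation pipeline itself. As a rough illustration only, the sketch below assumes per-repository counts come from tallying GH Archive events: `WatchEvent` (GitHub's event for starring a repository), `ForkEvent`, and `PullRequestEvent` with action `opened`. GH Archive publishes hourly gzipped files with one JSON event per line. The names `EVENT_TO_STAT`, `iter_events`, and `aggregate_repo_stats` are hypothetical, and this is not the author's method. Tallying events only approximates totals; for example, un-starring a repository produces no event.

```python
import gzip
import json
from collections import Counter, defaultdict
from typing import Iterable, Iterator

# Hypothetical mapping from GH Archive event type to the statistic it increments.
EVENT_TO_STAT = {
    "WatchEvent": "stars",  # GitHub emits WatchEvent when a repository is starred
    "ForkEvent": "forks",
    "PullRequestEvent": "pull_requests",
}


def iter_events(path: str) -> Iterator[dict]:
    """Yield events from one hourly GH Archive file (gzipped JSON lines)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def aggregate_repo_stats(events: Iterable[dict]) -> dict:
    """Tally stars, forks, and opened pull requests per repository name."""
    stats = defaultdict(Counter)
    for event in events:
        stat = EVENT_TO_STAT.get(event.get("type"))
        if stat is None:
            continue  # event type not tracked here (e.g. PushEvent)
        # PullRequestEvent also fires on close/reopen; count only newly opened PRs.
        if stat == "pull_requests" and event.get("payload", {}).get("action") != "opened":
            continue
        stats[event["repo"]["name"]][stat] += 1
    return {repo: dict(counts) for repo, counts in stats.items()}
```

For a locally downloaded hourly file, `aggregate_repo_stats(iter_events("events.json.gz"))` returns a dictionary keyed by `owner/name`; per-file results could then be summed across hours.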
The dataset is distributed in Parquet format under the MIT license. Because Parquet is a columnar format that both Polars and Dask read natively, the files are well suited to large-scale analysis with those libraries.