CommitPack is a 4TB dataset of GitHub commits that the BigCode project scraped from permissively licensed repositories. Released alongside the OctoPack paper (arXiv:2308.07124), it pairs code changes with the natural language descriptions that accompany them.
The dataset is released under the MIT license. For the filtering and preprocessing steps the authors applied, see the OctoPack paper (arXiv:2308.07124).