CommitPackFT is a 2 GB collection of high-quality code commit messages, filtered to resemble natural-language instructions. It contains between 100,000 and 1,000,000 records. Developed by BigCode and released in August 2023, it is the fine-tuning variant of the larger CommitPack dataset and targets instruction-following tasks. The dataset accompanies the OctoPack paper (arXiv:2308.07124).
The dataset is released under the MIT license and is intended for use with the Hugging Face datasets library. Users should refer to the OctoPack paper for specific filtering heuristics used to define 'high-quality' messages.
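Loading the data through the Hugging Face datasets library might look like the sketch below. Several details are assumptions rather than facts stated on this page: the Hub repository ID `bigcode/commitpackft`, the per-language configuration name `python`, and the record fields `message`, `old_contents`, and `new_contents`. The helper names `stream_commits` and `to_instruction_pair` are hypothetical and exist only for illustration. Check the dataset card before relying on any of them.

```python
from typing import Any, Dict, Iterator

# Assumed Hub repository ID; verify against the dataset card.
DATASET_ID = "bigcode/commitpackft"


def stream_commits(language: str = "python") -> Iterator[Dict[str, Any]]:
    """Lazily iterate over records for one language configuration.

    The configuration name (e.g. "python") is an assumption. Streaming
    avoids downloading the whole split before reading the first record.
    """
    # Imported here so the rest of the module works without the library.
    from datasets import load_dataset

    dataset = load_dataset(DATASET_ID, language, split="train", streaming=True)
    return iter(dataset)


def to_instruction_pair(record: Dict[str, Any]) -> Dict[str, str]:
    """Turn one commit record into an instruction-style example.

    Assumes the record has "message", "old_contents", and "new_contents"
    fields: the commit message becomes the instruction, the file before
    the commit becomes the input, and the file after it becomes the output.
    """
    return {
        "instruction": record["message"].strip(),
        "input": record["old_contents"],
        "output": record["new_contents"],
    }


if __name__ == "__main__":
    # Requires network access and the `datasets` package.
    first = next(stream_commits("python"))
    print(to_instruction_pair(first)["instruction"])
```

Depending on the installed version of the datasets library, loading may need additional arguments or a different access path, so treat the call above as a starting point.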