This dataset contains raw data extracted from GitHub, GitLab, and Bitbucket repositories and used to train models for the CoCoNuT program repair paper. The data is not shuffled or tokenized, and the newest commit in the dataset is from 2005.
The dataset is intended for research on program repair. The C code is raw and unstructured, so users should expect to run their own preprocessing, such as tokenization, before using it for training.
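As an illustration of the kind of tokenization step mentioned above, here is a minimal sketch in Python. It is not the tokenizer used in the CoCoNuT paper; the `TOKEN_RE` pattern and the `tokenize_c` function are hypothetical names introduced for this example, and the pattern covers only a simplified subset of C (for instance, hexadecimal literals and preprocessor directives are not handled specially).

```python
import re
from typing import List

# Hypothetical, simplified token pattern for C source text.
# Assumption: this is an illustrative subset, not the CoCoNuT tokenizer.
TOKEN_RE = re.compile(
    r"""
    /\*.*?\*/                 # block comment
  | //[^\n]*                  # line comment
  | "(?:\\.|[^"\\])*"         # string literal
  | '(?:\\.|[^'\\])*'         # character literal
  | [A-Za-z_]\w*              # identifier or keyword
  | \d+(?:\.\d+)?             # integer or simple decimal literal
  | ==|!=|<=|>=|&&|\|\||\+\+|--|->|<<|>>|[-+*/%&|^]=   # multi-character operators
  | \S                        # any other single non-space character
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize_c(source: str) -> List[str]:
    """Split raw C source into a flat list of tokens, dropping comments."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        text = match.group(0)
        if text.startswith("/*") or text.startswith("//"):
            continue  # comments carry no tokens in this sketch
        tokens.append(text)
    return tokens


if __name__ == "__main__":
    line = "if (x >= 10) { y = x->val; } // done"
    print(" ".join(tokenize_c(line)))
```

Whitespace is skipped implicitly because no alternative in the pattern matches it. Whether to keep or drop comments, and how finely to split literals, depends on the downstream model, so treat those choices here as placeholders.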