Raw Java source code extracted from GitHub, GitLab, and Bitbucket repositories for training program repair models. It was used in the CoCoNuT research paper and includes commits up to the year 2006. The data has not been shuffled or tokenized.
The dataset is intended for research on program repair. Because the source code is raw, users need to do any shuffling and tokenization themselves before training. The full description is available on the Hugging Face dataset page.
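Since the data is neither shuffled nor tokenized, a preprocessing pass along the following lines may be a useful starting point. This is only a minimal sketch: the regex tokenizer, the function names `tokenize` and `shuffle_and_tokenize`, and the sample lines are all hypothetical illustrations, not the tokenization used in the CoCoNuT paper and not part of the dataset itself.

```python
import random
import re

# Hypothetical, deliberately simple tokenizer for Java-like source lines:
# identifiers/keywords, integer literals, then any single non-space symbol.
# It does not handle string literals, comments, or multi-character operators.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")


def tokenize(line):
    """Split one raw source line into a list of coarse tokens."""
    return TOKEN_RE.findall(line)


def shuffle_and_tokenize(lines, seed=0):
    """Return the lines in a reproducible random order, each tokenized."""
    rng = random.Random(seed)   # local RNG so the global state is untouched
    shuffled = list(lines)      # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    return [tokenize(line) for line in shuffled]


# Made-up sample input, standing in for raw lines read from the dataset.
sample = ["int x = a + 1;", "return foo(bar);"]
result = shuffle_and_tokenize(sample, seed=0)
```

Fixing the seed keeps the shuffled order reproducible across runs. A real pipeline would likely swap the regex for a proper Java lexer.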