This dataset encompasses 115 million code files from GitHub, totaling 1 TB of text data. It includes code in 32 programming languages across 60 file extensions and was created by codeparrot from the GitHub dataset on BigQuery.
The dataset's license is unknown, which is a critical consideration for any use. The specific file formats and internal structure of the 1 TB of data are not documented.