Available on 1 platform
This dataset contains 16 terabytes of uncompressed source code spanning 22 programming languages and 23 file extensions. It is drawn from the public GitHub dataset on Google BigQuery and is intended for large-scale code modeling tasks.