In late 2022, BigCode released The Stack, a 3TB collection of source code spanning 30+ programming languages and 193 permissive licenses. The dataset contains between 100 million and 1 billion near-deduplicated records of public code files scraped from the web.
The data is provided in Parquet format. Developers can check whether their code is included, or request its removal, via the BigCode 'Am I in The Stack' tool.
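Because the data ships as Parquet, a downloaded shard can be inspected without loading it fully into memory. The following is a minimal sketch, not an official loader: it assumes the third-party `pyarrow` package is installed, and the shard path and the `lang` column name used in the usage note are hypothetical placeholders that should be checked against the actual schema.

```python
from collections import Counter
from typing import Any, Iterable, Iterator, Mapping, Optional, Sequence


def iter_parquet_records(
    path: str,
    columns: Optional[Sequence[str]] = None,
    batch_size: int = 1000,
) -> Iterator[dict]:
    """Yield rows of a Parquet file as dicts, one batch at a time."""
    # Imported lazily so the counting helper below works without pyarrow.
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(path)
    # Reading in batches (and only the requested columns) keeps memory bounded.
    for batch in parquet_file.iter_batches(batch_size=batch_size, columns=columns):
        yield from batch.to_pylist()


def count_by_field(records: Iterable[Mapping[str, Any]], field: str) -> Counter:
    """Count records per value of `field`; absent fields count as '<missing>'."""
    counts: Counter = Counter()
    for record in records:
        counts[record.get(field, "<missing>")] += 1
    return counts
```

For example, `count_by_field(iter_parquet_records("shard.parquet", columns=["lang"]), "lang")` would tally files per language in one shard, assuming the shard exposes a column with that name.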