LangMap-TheStack-python-100M is a dataset of 100 million tokens of sanitized Python source code, intended for code fine-tuning. It was created by MultilingualUnigramLM by streaming from the bigcode/the-stack repository, and was last updated on April 13, 2026.
License restrictions are unknown.