LangMap-TheStack-csharp-100M is a dataset containing 100 million tokens of C# source code, intended for code finetuning. The data was streamed from the bigcode/the-stack repository and tokenized using the allenai/OLMo-3-1025-7B tokenizer. It was created by MultilingualUnigramLM and last updated on Hugging Face on April 13, 2026.
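The stream-then-tokenize process described above can be sketched roughly as follows. This is a minimal illustration, not the dataset creator's actual script: the helper names `take_token_budget` and `stream_csharp_tokens` are hypothetical, and the `data/c-sharp` directory, the `content` field, and the truncation of the final document are assumptions. The source dataset is gated, so the Hub wiring also needs network access and accepted terms.

```python
from typing import Callable, Iterable, Iterator, List


def take_token_budget(
    texts: Iterable[str],
    tokenize: Callable[[str], List[int]],
    budget: int,
) -> Iterator[List[int]]:
    """Yield token-id lists until `budget` tokens have been emitted.

    The last document is truncated so the total never exceeds the budget
    (an assumption; the original pipeline may handle the boundary differently).
    """
    remaining = budget
    if remaining <= 0:
        return
    for text in texts:
        ids = tokenize(text)[:remaining]
        if ids:
            remaining -= len(ids)
            yield ids
        if remaining == 0:
            return


def stream_csharp_tokens(budget: int = 100_000_000) -> Iterator[List[int]]:
    """Hypothetical wiring to the Hugging Face Hub (requires network access)."""
    # Imported lazily so the pure helper above works without these packages.
    from datasets import load_dataset
    from transformers import AutoTokenizer

    # Assumption: C# files live under data/c-sharp, with source text in "content".
    rows = load_dataset(
        "bigcode/the-stack",
        data_dir="data/c-sharp",
        split="train",
        streaming=True,  # iterate lazily instead of downloading everything
    )
    tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-3-1025-7B")

    def encode(text: str) -> List[int]:
        # Skip special tokens so only source-code tokens count toward the budget.
        return tokenizer(text, add_special_tokens=False)["input_ids"]

    return take_token_budget((row["content"] for row in rows), encode, budget)
```

Keeping the budget logic separate from the Hub calls means it can be checked offline with any toy tokenizer before running the full stream.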
The license is unknown, which may restrict commercial or research use.