Available on 1 platform
DATA1 is a large-scale domain-specific code dataset containing over 1.1 billion lines of code. It was collected from GitHub repositories by SciCodePile and covers 178 interdisciplinary topics in fields like biology, chemistry, and materials science. The dataset was last updated on March 13, 2026.
The license is unknown, which may restrict how the dataset can be used.