Dclm Baseline 1B is a 1-billion-token sample of the mlfoundations/dclm-baseline-1.0 dataset, created by codelion. It was drawn with reservoir sampling, which selects items uniformly at random from a stream of unknown length, so the subset is statistically representative of the source's filtered, diverse web content. The dataset was last updated on November 2, 2025.
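Reservoir sampling (the classic Algorithm R) keeps a fixed-size buffer of k items during a single pass over the source. After i items have been seen, each one has probability k/i of being in the buffer. The following is a minimal Python sketch of that textbook algorithm. It assumes the stream consists of discrete items such as documents. The function name `reservoir_sample` and its parameters are hypothetical, and the card does not say how codelion mapped the procedure onto a 1-billion-token target.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Hypothetical helper: Algorithm R, a uniform sample of k items from a stream.

    The stream is read once and its length need not be known in advance.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # seeded generator for reproducible samples
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 5 "documents" from a stream of one million.
docs = (f"doc-{n}" for n in range(1_000_000))
print(reservoir_sample(docs, k=5, seed=0))
```

Memory use is O(k) no matter how long the stream is. That property is what makes the method practical for drawing a fixed-size subset from a web-scale corpus in a single pass.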
The license is unknown; users must verify the terms before use.