A collection of between 1M and 10M text records for supervised fine-tuning (SFT) of the OLMo language model, released by the Allen Institute for AI (allenai) in early 2026. The collection is distributed in optimized Parquet format for high-performance processing with libraries such as Polars and Dask.
Licensed under Open Data Commons Attribution License v1.0 (ODC-By); users must follow AllenAI's Responsible Use Guidelines and provide proper attribution.