OpenDataArena released this collection of 25 million instruction-following samples across 63 source datasets in March 2026. Each record is annotated with 30 distinct quality metrics to support fine-grained data selection for Supervised Fine-Tuning (SFT).
The data is provided in Parquet format and licensed under Apache 2.0. At 25 million records, a lazy or out-of-core engine such as Polars or Dask is a better fit than loading the whole collection into memory at once.