Available on 1 platform
Prompt Variations and LLM Responses contains the prompt variants and model outputs used to evaluate the Stability-Generalization Score (SGS). The dataset was created by naghamo and was last updated on June 5, 2026. It draws on six QA and instruction benchmarks, including TruthfulQA and Natural Questions, and covers responses from eleven large language models.
The license is unknown; verify the terms of use before using the dataset.