Name: Prompt Variations and LLM Responses for Stability-Generalization Score Evaluation
Creator: naghamo
Published: 2026-06-05T08:59:15
Keywords: Prompt Engineering, Question Answering, LLM Evaluation, Text, Instruction Tuning

Description

This dataset contains prompt variants and the corresponding model outputs used to evaluate the Stability-Generalization Score (SGS). It was created by naghamo and last updated on June 5, 2026. The data draws on six QA and instruction benchmarks, including TruthfulQA and Natural Questions, and covers responses from eleven large language models.

Use Cases

Benchmarking LLM robustness to stylistic prompt perturbations
Analyzing response consistency across open-source and closed-source models
Studying the Stability-Generalization Score (SGS) metric across different question-answering tasks
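As an illustration of the consistency analysis mentioned above, here is a minimal sketch that measures how often a model gives the same answer across variants of one question. The field names (model, question_id, response) and the sample rows are hypothetical, since the dataset's columns are undocumented, and exact-match agreement is only a simple stand-in: it is not the SGS definition, which this card does not give.

```python
from collections import defaultdict
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def consistency_by_model(rows):
    """Return, per model, the mean fraction of variant pairs whose answers match exactly.

    Each row is a dict with the hypothetical keys
    'model', 'question_id', and 'response'.
    """
    # Group normalized responses by (model, question).
    grouped = defaultdict(list)
    for row in rows:
        key = (row["model"], row["question_id"])
        grouped[key].append(normalize(row["response"]))

    per_model = defaultdict(list)
    for (model, _question_id), answers in grouped.items():
        pairs = list(combinations(answers, 2))
        if not pairs:  # a single variant gives no pair to compare
            continue
        agreeing = sum(a == b for a, b in pairs)
        per_model[model].append(agreeing / len(pairs))

    # Average the per-question agreement rates for each model.
    return {model: sum(v) / len(v) for model, v in per_model.items()}


# Made-up rows for illustration only; not taken from the dataset.
rows = [
    {"model": "model-a", "question_id": "q1", "response": "Paris"},
    {"model": "model-a", "question_id": "q1", "response": "paris "},
    {"model": "model-a", "question_id": "q1", "response": "Lyon"},
    {"model": "model-b", "question_id": "q1", "response": "Paris"},
    {"model": "model-b", "question_id": "q1", "response": "Paris"},
]
scores = consistency_by_model(rows)
```

Exact matching is strict for free-form answers; a semantic similarity measure may be more appropriate once the actual response format is known.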

Strengths

Includes responses from eleven large language models, comprising eight open-source and three closed-source models
Covers six distinct QA and instruction benchmarks, including TruthfulQA and Natural Questions
Applies six families of stylistic perturbations to prompts for evaluation

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Row count is not stated, which makes suitability harder to assess before download
Description metadata is limited; actual data quality requires manual inspection after download
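Because the columns are undocumented, a first pass after download might simply profile each field. The sketch below assumes the data has already been loaded as a list of dicts; the loading step is omitted because the file format is not stated here, and the sample rows and field names are hypothetical.

```python
def summarize_columns(rows):
    """Report, per column, the observed value types, the count of
    None or empty-string values, and one non-empty example."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            info = summary.setdefault(
                name, {"types": set(), "missing": 0, "example": None}
            )
            if value is None or value == "":
                info["missing"] += 1
                continue
            info["types"].add(type(value).__name__)
            if info["example"] is None:
                info["example"] = value
    return summary


# Made-up rows for illustration only; real field names may differ.
sample = [
    {"prompt": "What is 2+2?", "variant": "formal", "response": "4"},
    {"prompt": "what's 2+2", "variant": "casual", "response": ""},
]
summary = summarize_columns(sample)
row_count = len(sample)  # the row count also has to be checked by hand
```

Reading the example values alongside the type and missing counts is usually enough to guess which field holds the original prompt, the perturbed variant, and the model output.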

Provenance

Source: huggingface
Collection Method: Likely compiled from model inference outputs on benchmark datasets under controlled prompt variations.
Freshness: Last updated 2026-06-05 09:07:18; verify against the source before use

License is unknown; terms of use must be verified before application.
