This dataset contains per-question data collected by running the google/gemma-3-12b-it model on the reward_bench dataset. It includes structured outputs, among them reward scores and model completions generated across a range of temperatures. It was authored by 'wtd' and last updated on 2026-05-23.
Use Cases
- Benchmarking reward model performance using the structured reward scores.
- Analyzing how generation temperature affects model outputs, using the listed temperature schedule.
- Comparing LLM completions across different model configurations using the provided completion lists.
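As an illustration of the temperature analysis above, the sketch below averages reward scores per sampling temperature. Because column-level documentation is absent, it assumes a flattened layout with one record per completion and hypothetical field names `temperature` and `reward`. Both the layout and the names must be checked against the actual download and adjusted.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Iterable, Mapping


def mean_reward_by_temperature(
    rows: Iterable[Mapping[str, Any]],
    temp_key: str = "temperature",  # hypothetical field name, verify after download
    reward_key: str = "reward",     # hypothetical field name, verify after download
) -> dict[float, float]:
    """Group records by sampling temperature and average their reward scores."""
    buckets: dict[float, list[float]] = defaultdict(list)
    for row in rows:
        temp = row.get(temp_key)
        reward = row.get(reward_key)
        # Skip records missing either field instead of guessing a default value.
        if temp is None or reward is None:
            continue
        buckets[float(temp)].append(float(reward))
    # Return temperatures in ascending order so the schedule reads naturally.
    return {t: mean(scores) for t, scores in sorted(buckets.items())}
```

If the real data stores several completions per question, each with its own temperature and score, the records would first need to be flattened into this one-record-per-completion shape.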
Strengths
- Includes a structured temperature schedule for generation, ranging from 0.0 to 1.0.
- Data was collected using a specific, named model (google/gemma-3-12b-it) on a known benchmark (reward_bench).
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, which makes it harder to judge whether the dataset is large enough for a given use.
- Descriptive metadata is sparse; actual data quality must be checked manually after download.
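Since field semantics and row count must be established after download, a quick profiling pass can help. The following is a minimal sketch that assumes the records have already been loaded as a list of dictionaries, for example parsed from a JSON or JSON Lines export; the loading step and the function name `profile_rows` are illustrative, not part of the dataset.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable, Mapping


def profile_rows(
    rows: Iterable[Mapping[str, Any]],
) -> tuple[int, dict[str, dict[str, int]]]:
    """Count records and, for each field, how often each Python type appears."""
    n_rows = 0
    field_types: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        n_rows += 1
        for key, value in row.items():
            # Record the type name so mixed or null-heavy fields stand out.
            field_types[key][type(value).__name__] += 1
    summary = {key: dict(counts) for key, counts in sorted(field_types.items())}
    return n_rows, summary
```

A field that appears in fewer records than `n_rows`, or with several type names, is a good candidate for closer manual inspection.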
Provenance
- Source: huggingface
- Collection method: Data collected by running the google/gemma-3-12b-it model on the reward_bench dataset.
- Freshness: Last updated 2026-05-23 09:48:18; freshness should be verified.