Description

EvalAwareBench is a factor-controlled benchmark for studying evaluation awareness in language models, in which eight psychology-grounded trigger factors can be manipulated independently. The dataset was created by researchers from ETH Zürich, the Max Planck Institute for Intelligent Systems, and other institutions, and was last updated on the platform in May 2026.
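
To make "independently manipulated" concrete, the sketch below enumerates a full factorial design over eight factors and builds the variants that differ from a baseline in exactly one factor. It assumes each factor is binary (on or off) and uses placeholder names `factor_1` through `factor_8`; neither the levels nor the names come from the dataset, whose actual schema is not documented on this page.

```python
from itertools import product

# Placeholder names: the real factor names are not documented on this page.
FACTORS = [f"factor_{i}" for i in range(1, 9)]


def full_factorial(factors):
    """Return every on/off combination of the given factors as a list of dicts."""
    return [
        dict(zip(factors, levels))
        for levels in product([False, True], repeat=len(factors))
    ]


def single_factor_variants(base, factors):
    """Return copies of `base` that each flip exactly one factor."""
    variants = []
    for name in factors:
        flipped = dict(base)
        flipped[name] = not base[name]
        variants.append(flipped)
    return variants


conditions = full_factorial(FACTORS)
baseline = conditions[0]  # all factors off under the binary assumption
variants = single_factor_variants(baseline, FACTORS)
print(len(conditions), len(variants))
```

Under the binary assumption the design has 2^8 = 256 conditions, and each of the eight variants isolates a single factor against the baseline.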

Use Cases

Studying how the eight psychology-grounded trigger factors affect evaluation awareness in language models.
Running controlled experiments on model behavior that vary one factor while holding the others fixed.
Benchmarking how robust a language model's behavior is to specific psychological triggers.
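
One way a controlled experiment of this kind could be analyzed is by estimating a main effect per factor: the mean score when the factor is on minus the mean score when it is off. The sketch below assumes binary factors and one numeric score per condition (for example, the rate at which a model signals that it is being evaluated). The record layout, the names `factor_a` and `factor_b`, and the scores are all hypothetical and not drawn from the dataset.

```python
from statistics import mean


def main_effect(records, factor):
    """Mean score with `factor` on minus mean score with it off.

    `records` is a list of dicts holding a boolean entry per factor and a
    numeric "score" entry. This layout is an assumption for illustration.
    """
    on = [r["score"] for r in records if r[factor]]
    off = [r["score"] for r in records if not r[factor]]
    if not on or not off:
        raise ValueError(f"need records with {factor!r} both on and off")
    return mean(on) - mean(off)


# Toy, made-up scores for a balanced two-factor design.
toy = [
    {"factor_a": False, "factor_b": False, "score": 0.10},
    {"factor_a": True, "factor_b": False, "score": 0.30},
    {"factor_a": False, "factor_b": True, "score": 0.20},
    {"factor_a": True, "factor_b": True, "score": 0.40},
]
effect_a = main_effect(toy, "factor_a")
effect_b = main_effect(toy, "factor_b")
print(round(effect_a, 2), round(effect_b, 2))
```

Because the toy design is balanced, the other factor contributes equally to both groups and cancels out of the difference, which is the practical benefit of a factor-controlled layout.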

Strengths

Designed with eight psychology-grounded trigger factors that can be independently manipulated, suggesting a structured experimental design.
Created by a multi-institutional research team including ETH Zürich and the Max Planck Institute for Intelligent Systems.
Last updated on the platform in May 2026, indicating recent maintenance.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, so suitability for large-scale training is hard to assess before download.
The full description is hosted on an external page, so complete documentation requires following the link.
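
Since column semantics and row count have to be worked out after download, a first pass could simply tabulate what each column contains. The sketch below is a minimal, standard-library example that works on rows already loaded as a list of dicts, however they were obtained; the column names and values in `rows` are made up and do not reflect the dataset's real schema.

```python
from collections import Counter, defaultdict


def summarize_records(records):
    """Return the row count and, per column, observed type names and a sample value."""
    types = defaultdict(Counter)
    samples = {}
    for row in records:
        for column, value in row.items():
            types[column][type(value).__name__] += 1
            samples.setdefault(column, value)  # keep the first value seen
    columns = {
        name: {"types": dict(counts), "sample": samples[name]}
        for name, counts in types.items()
    }
    return {"rows": len(records), "columns": columns}


# Made-up rows standing in for a downloaded split; column names are hypothetical.
rows = [
    {"prompt": "example text", "factor_1": True, "label": 0},
    {"prompt": "another example", "factor_1": False, "label": None},
]
summary = summarize_records(rows)
print(summary["rows"], sorted(summary["columns"]))
```

Columns that show more than one type, such as the mix of `int` and `NoneType` for `label` here, are a useful first signal of missing values or inconsistent encoding.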

Provenance

Source: aisa-group on Hugging Face
Collection Method: Not stated; likely constructed as a controlled research benchmark, judging from the description.
Freshness: Last updated 2026-05-20 14:01:40

Tags: Text, Language Models, Evaluation Awareness, Psychology, Benchmark, Factor Controlled

EvalAwareBench: A Factor-Controlled Benchmark for Language Model Evaluation Awareness
