Available on 1 platform
A factor-controlled benchmark for studying evaluation awareness in language models, in which eight psychology-grounded trigger factors can each be manipulated independently. The dataset was created by researchers from ETH Zürich, the Max Planck Institute for Intelligent Systems, and other institutions, and was last updated on the platform in May 2026.
The full dataset description is hosted on an external page; see the provided URL for complete details.