Available on 1 platform
A benchmark dataset for evaluating graders across text, multimodal, and agent scenarios. It supports the OpenJudge framework with labeled preference pairs for quality-assured grader development. The dataset was created by agentscope-ai and last updated on March 4, —.
The license is unknown; verify the terms of use before using the dataset.