DAFT Math (Difficult Automatically-scorable Free-response Tasks for Math) is a dataset of 199 challenging mathematical problems designed to be at the limit of current LLM abilities. It was created by metr-evals and last updated on July 17, 2025. It is presented as a research artifact for a niche use case.
The dataset description strongly recommends reading the limitations section before use.