Available on 1 platform
A benchmark of 1,100 unpublished university-level math problems sourced from real teaching materials, designed to evaluate the mathematical reasoning of large language models. The benchmark is balanced across six core topics, and 20% of its problems are multimodal, with visual elements. It was created by Toloka and last updated on 2026-01-30.
License information is unknown and should be verified before use.