Description
MMTutorBench is the first multimodal benchmark for AI math tutoring, containing 770 carefully curated problems paired with 1,414 images. The dataset provides structured reference answers and per-instance rubrics for evaluating large language models along three pedagogical axes: Insight, Operation Formulation, and Operation Execution. It was created by Tangchiu and last updated on May 22, 2026.
Use Cases
Benchmarking AI tutoring models on multimodal math problems with paired images
Evaluating pedagogical reasoning against the Insight, Operation Formulation, and Operation Execution rubrics
Training or fine-tuning multimodal AI assistants using structured reference answers
Conducting research on LLM-as-judge evaluation methods for educational content
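As a rough illustration of the benchmarking and LLM-as-judge use cases, the sketch below averages per-instance judge scores along the three axes. The key names (`insight`, `operation_formulation`, `operation_execution`), the 0 to 1 score scale, and the example values are assumptions made for illustration; the dataset's actual rubric format is not documented here and should be checked after download.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical axis keys; the real rubric field names may differ.
AXES = ("insight", "operation_formulation", "operation_execution")


def aggregate_axis_scores(judgments):
    """Average per-instance judge scores for each pedagogical axis.

    `judgments` is an iterable of dicts mapping an axis name to a numeric
    score. Axes missing from an instance are skipped for that instance.
    """
    per_axis = defaultdict(list)
    for judgment in judgments:
        for axis in AXES:
            if axis in judgment:
                per_axis[axis].append(float(judgment[axis]))
    # Report only axes that received at least one score.
    return {axis: mean(per_axis[axis]) for axis in AXES if per_axis[axis]}


# Made-up judge outputs for two instances (not real benchmark results).
example = [
    {"insight": 1.0, "operation_formulation": 0.5, "operation_execution": 0.0},
    {"insight": 0.0, "operation_formulation": 0.5, "operation_execution": 1.0},
]
summary = aggregate_axis_scores(example)
```

Reporting the three axes separately, rather than as one combined number, keeps the distinction the rubrics are designed to measure.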
Strengths
Contains 770 carefully curated multimodal math tutoring problems
Includes 1,414 images paired with the problems
Provides structured reference answers and per-instance evaluation rubrics
Limitations
Column-level documentation is absent; field semantics must be inferred after download
Row count is not reported in the metadata, so it should be checked against the 770 problems cited in the description after download
Description metadata is limited; actual data quality requires manual inspection after download
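Because column documentation and row count are missing, a first inspection pass after download might look like the sketch below. It assumes the records have already been loaded as a list of dictionaries (for example, by iterating over a split loaded with the Hugging Face `datasets` library); the field names and values in `sample` are invented placeholders, not the dataset's real schema.

```python
from collections import Counter, defaultdict


def summarize_fields(records):
    """Report the observed value types and missing count for each field."""
    types = defaultdict(Counter)
    for record in records:
        for key, value in record.items():
            types[key][type(value).__name__] += 1
    total = len(records)
    return {
        key: {"types": dict(counts), "missing": total - sum(counts.values())}
        for key, counts in types.items()
    }


# Placeholder records with hypothetical field names.
sample = [
    {"question": "Solve x + 1 = 2", "images": ["img_0.png"]},
    {"question": "Find the area", "images": [], "rubric": "placeholder"},
]
report = summarize_fields(sample)
row_count = len(sample)  # compare against the 770 problems cited above
```

A field that is absent from some records, or that holds mixed types, is a signal to inspect those rows by hand before relying on them for evaluation.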
Provenance
Source
Tangchiu via Hugging Face
Collection Method
Curated benchmark collection, likely for research purposes as described in the associated paper.
Freshness
Last updated 2026-05-22 15:07:21
License
Unknown; users should verify terms before use.