Name: MM-MathInstruct: Multimodal Math Problem-Solving Dataset
Creator: MathLLMs
Published: 2025-05-19T03:28:15
Keywords: text generation, multiple choice, question answering, visual question answering, figure QA, textbook QA, math QA, geometry QA, multimodal QA, VQA, math reasoning, reasoning, mathematics, math word problem, geometry, geometry diagram, synthetic scene, computer vision, multimodal math, multimodal; size category: 1M to 10M rows; language: en; license: Apache 2.0; arXiv: 2505.10557; region: us

Description

The source description covers MathCoder-VL, a series of open-source large multimodal models tailored for general math problem-solving, rather than the dataset itself. Judging by its title and keywords, the dataset likely contains multimodal math problems that combine visual and textual elements. It was created by MathLLMs and last updated on October 11, 2025.

Use Cases

Train multimodal models for math problem-solving that combine vision and code.
Benchmark visual question answering performance on geometry diagrams and synthetic scenes.
Evaluate reasoning capabilities on math word problems and textbook QA.
Develop image-to-code models for mathematical figures, following the FigCodifier-8B model mentioned in the source description.
Fine-tune models for multi-modal QA tasks involving mathematics.

Strengths

Dataset is associated with a published paper (arXiv:2505.10557).
Dataset is linked to a GitHub repository (https://github.com/mathllm/MathCoder).
Dataset was last updated on October 11, 2025.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Exact row count is not stated; the size-category keyword only indicates between 1M and 10M rows, which may limit suitability assessment.
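
Because column-level documentation is absent, a first inspection pass after download could simply tally which fields appear and what value types they hold. The following is a minimal sketch under stated assumptions, not the dataset's documented schema: the helper name summarize_columns and the field names in sample_rows (image, question, answer) are hypothetical placeholders chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import islice


def summarize_columns(rows, limit=100):
    """Tally the Python type of every field over the first `limit` rows.

    `rows` is any iterable of dict-like records. Returns a mapping
    {column_name: {type_name: count}}.
    """
    summary = defaultdict(Counter)
    for row in islice(rows, limit):
        for column, value in row.items():
            summary[column][type(value).__name__] += 1
    return {column: dict(counts) for column, counts in summary.items()}


# Hypothetical sample rows: the real field names are NOT documented in this
# card and must be read from the downloaded data.
sample_rows = [
    {"image": b"\x89PNG", "question": "Find x.", "answer": "4"},
    {"image": b"\x89PNG", "question": "Area of the triangle?", "answer": None},
]

schema = summarize_columns(sample_rows)
for column, type_counts in schema.items():
    print(column, type_counts)
```

On the real data, the same helper could be fed any iterator of records, for example a streamed split obtained through the Hugging Face datasets library. The repository identifier is not stated in this card and would need to be confirmed first. Mixed type counts in a column, such as strings alongside None, are a quick signal of missing values worth checking before training.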

Provenance

Source: MathLLMs
Freshness: Last updated 2025-10-11 05:14:13; freshness should be verified.
