We-Math is a benchmark dataset of 6,500 visual math problems, spanning 67 hierarchical knowledge concepts and 5 layers of knowledge granularity, introduced at ACL 2025. It was created by the We-Math team and last updated on the Hugging Face platform in August 2025. The dataset is designed to explore problem-solving principles beyond end-to-end performance.
Use Cases
- Benchmarking AI models' structured reasoning capabilities based on the 67 knowledge concepts.
- Analyzing model performance across different layers of knowledge granularity.
- Studying the relationship between visual problem presentation and solution strategies.
- Developing educational tools that target specific mathematical knowledge concepts.
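The per-concept and per-layer analyses listed above can be sketched as a simple grouped accuracy computation. This is a minimal illustration only: the field names `concept`, `layer`, `answer`, and `prediction` are hypothetical, since the dataset's actual column names are not documented here, and exact-match scoring is an assumption rather than the benchmark's official metric.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key, pred_key="prediction", answer_key="answer"):
    """Return {group value: exact-match accuracy} for a list of dict records."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec[pred_key] == rec[answer_key]:
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


# Toy records with hypothetical field names; real rows must be mapped
# to this shape after inspecting the downloaded data.
sample = [
    {"concept": "angles", "layer": 1, "answer": "A", "prediction": "A"},
    {"concept": "angles", "layer": 1, "answer": "B", "prediction": "C"},
    {"concept": "area", "layer": 2, "answer": "D", "prediction": "D"},
]

per_concept = accuracy_by_group(sample, "concept")
per_layer = accuracy_by_group(sample, "layer")
```

The same helper serves both use cases because only the grouping key changes; any finer breakdown (for example, concept within layer) would need a composite key.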
Strengths
- Contains 6,500 visual math problems, providing a substantial evaluation corpus.
- Problems are categorized across 67 hierarchical knowledge concepts.
- Organized into 5 distinct layers of knowledge granularity for detailed analysis.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported in the dataset metadata, so the 6,500-problem figure from the description should be confirmed after download.
- Description metadata is limited; actual data quality requires manual inspection after download.
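Because field semantics have to be inferred after download, a first pass over the rows might simply tally which fields exist and what value types they hold. The sketch below assumes the rows have already been loaded as a list of dictionaries; the field names in the toy rows (`question`, `image_path`, `answer`) are hypothetical placeholders, not the dataset's real columns.

```python
from collections import Counter, defaultdict


def summarize_fields(records):
    """Map each field name to a count of the Python value types seen for it."""
    field_types = defaultdict(Counter)
    for rec in records:
        for name, value in rec.items():
            field_types[name][type(value).__name__] += 1
    return {name: dict(counts) for name, counts in field_types.items()}


# Toy rows with hypothetical field names, standing in for downloaded data.
rows = [
    {"question": "What is the area?", "image_path": "img/001.png", "answer": "B"},
    {"question": "Find the angle.", "image_path": None, "answer": "C"},
]

summary = summarize_fields(rows)
row_count = len(rows)
```

A field whose summary includes `NoneType`, as `image_path` does here, flags missing values worth checking before the data is used for evaluation.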
Provenance
- Source: We-Math team, presented at ACL 2025.
- Collection Method: Meticulously collected and categorized, as described in the associated paper.
- Time Range: Not specified.
- Freshness: Last updated 2025-08-13 13:46:29; freshness should be verified.
- Geography: Not specified.