ProcessBench: A Benchmark for Identifying Process Errors in Mathematical Reasoning

Name: ProcessBench: A Benchmark for Identifying Process Errors in Mathematical Reasoning
Creator: Qwen
Published: 2024-12-11T05:10:14
Keywords: Mathematical Reasoning, Evaluation, Benchmark, Text, Process Errors

Description

ProcessBench is a benchmark dataset proposed by the Qwen Team for evaluating the identification of process errors in mathematical reasoning. The dataset is hosted on Hugging Face and was last updated on December 27, 2024. The associated GitHub repository contains evaluation code and prompt templates used in the work.

Use Cases

Benchmarking AI models on detecting errors in mathematical reasoning, which is the dataset's stated purpose
Training models to identify logical or procedural flaws in problem-solving steps, in line with the benchmark's focus
Analyzing common error patterns in the mathematical reasoning processes contained in the dataset
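
As a rough illustration of the benchmarking use case, the sketch below scores a model's error-location predictions against reference labels. It rests on assumptions that this page does not confirm: each sample is assumed to carry an integer label giving the index of the earliest erroneous step, with -1 meaning no error, and the summary score is assumed to be the harmonic mean of accuracy on erroneous samples and accuracy on error-free samples. The function name `score_error_detection` is hypothetical.

```python
from typing import Dict, Sequence


def score_error_detection(labels: Sequence[int], predictions: Sequence[int]) -> Dict[str, float]:
    """Score predicted error locations against reference labels.

    Assumed convention (not confirmed by this page): each value is the index
    of the earliest erroneous step, or -1 when the solution has no error.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    err_hits = err_total = ok_hits = ok_total = 0
    for gold, pred in zip(labels, predictions):
        if gold == -1:
            # Error-free sample: correct only if the model also reports no error.
            ok_total += 1
            ok_hits += int(pred == gold)
        else:
            # Erroneous sample: correct only if the exact step index matches.
            err_total += 1
            err_hits += int(pred == gold)

    error_acc = err_hits / err_total if err_total else 0.0
    correct_acc = ok_hits / ok_total if ok_total else 0.0
    total = error_acc + correct_acc
    # Harmonic mean of the two accuracies (assumed summary metric).
    f1 = 2 * error_acc * correct_acc / total if total else 0.0
    return {"error_acc": error_acc, "correct_acc": correct_acc, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    print(score_error_detection([2, -1, 0, -1], [2, -1, 1, 3]))
```

Reporting the two accuracies separately keeps a model from looking strong simply by always predicting "no error" or always flagging a step.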

Strengths

Dataset is associated with a specific benchmark and research paper, providing a clear purpose
Last update timestamp (2024-12-27 14:05:30) is provided, indicating recent activity

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Row count, file formats, and sample data are unknown, which may limit suitability assessment
Description metadata is limited; actual data quality requires manual inspection after download
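
Because field semantics have to be inferred after download, a first pass over the records can at least list which fields exist and what types they hold. The sketch below is a minimal helper for that, assuming the downloaded rows are available as a sequence of dictionaries; the function name `summarize_fields` and the sample records are hypothetical and do not reflect the dataset's actual schema.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Collect, per field name, the observed value types, a count, and one example."""
    summary: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {"types": set(), "count": 0, "example": None}
    )
    for record in records:
        for name, value in record.items():
            entry = summary[name]
            entry["types"].add(type(value).__name__)
            entry["count"] += 1
            if entry["example"] is None:
                # Keep the first non-None value as a representative example.
                entry["example"] = value
    return dict(summary)


if __name__ == "__main__":
    # Hypothetical rows; the real field names must be checked after download.
    rows = [
        {"problem": "What is 1 + 1?", "steps": ["1 + 1 = 2"]},
        {"problem": "What is 2 + 2?", "label": -1},
    ]
    for field, info in summarize_fields(rows).items():
        print(field, sorted(info["types"]), info["count"])
```

Fields whose count is lower than the number of rows, or that show more than one type, are the ones most worth inspecting by hand.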

Provenance

Source: Qwen Team
Collection Method: Proposed as a benchmark; specific collection method is not detailed in the provided description.
Time Range: Not specified
Freshness: Last updated 2024-12-27 14:05:30
Geography: Not specified

Related Topics: Text, Mathematical Reasoning, Evaluation, Benchmark, Process Errors

Quality Score

Overall: D37
Description: 39
Source: 36
Reputation: 45
Access: 26

Community

6.6K downloads

59 likes

0 views

Dataset Info

Author: Qwen
Created: Dec 11, 2024
Updated: Dec 27, 2024
Last synced: Apr 18, 2026
