PutnamBench comprises over 1,300 manual formalizations of problems from the William Lowell Putnam Mathematical Competition, covering competitions held between 1965 and 2023. The formalizations target three proof assistants: Lean 4, Isabelle, and Coq. The dataset was created by amitayusht and last updated on Hugging Face in June 2024.
Use Cases
- Benchmarking theorem-proving algorithms on competition-level mathematics problems.
- Evaluating automated provers built on proof assistants such as Lean 4, Isabelle, and Coq.
- Training or testing AI models for automated mathematical reasoning.
- Studying the formalization process for complex mathematical statements.
Strengths
- Over 1300 manual formalizations provide a substantial corpus for evaluation.
- Problems are drawn from the William Lowell Putnam Mathematical Competition, spanning 1965-2023.
- Supports three major formal languages: Lean 4, Isabelle, and Coq.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
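- The exact row count is not reported (the description states only "over 1,300" formalizations), so suitability is harder to assess before download.
- Last updated 2024-06-11 21:42:44 (time zone not stated); check for newer revisions before use.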
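Because column documentation and the row count are missing, a first pass over the downloaded rows can recover both. The following is a minimal Python sketch, not part of the dataset's tooling. It assumes rows are available as plain dictionaries. The field names and values in `sample_rows` are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import defaultdict


def summarize_fields(records):
    """For each field, collect observed value types, the number of
    non-empty values, and the first non-empty value as an example."""
    summary = defaultdict(
        lambda: {"types": set(), "non_empty": 0, "example": None}
    )
    for record in records:
        for field, value in record.items():
            info = summary[field]
            info["types"].add(type(value).__name__)
            # Treat None and the empty string as "empty".
            if value is not None and value != "":
                info["non_empty"] += 1
                if info["example"] is None:
                    info["example"] = value
    return dict(summary)


# Hypothetical rows for illustration only; the real field names must be
# read from the downloaded data.
sample_rows = [
    {
        "problem_name": "example_1",
        "lean4_statement": "theorem example_1 : True := sorry",
        "isabelle_statement": None,
    },
    {
        "problem_name": "example_2",
        "lean4_statement": "",
        "isabelle_statement": "theorem example_2: True sorry",
    },
]

row_count = len(sample_rows)
report = summarize_fields(sample_rows)

print(f"rows: {row_count}")
for field, info in report.items():
    types = ", ".join(sorted(info["types"]))
    print(f"{field}: types=[{types}] non_empty={info['non_empty']}/{row_count}")
```

With the real data, `sample_rows` would be replaced by the rows actually downloaded, for example by iterating over a split loaded with the Hugging Face `datasets` library. The repository identifier is not given here and would need to be looked up.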
Provenance
- Source: William Lowell Putnam Mathematical Competition
- Collection Method: Manual formalization
- Time Range: 1965-2023
- Freshness: 2024-06-11 21:42:44