Name: NuminaMath-LEAN: 100K Mathematical Competition Problems Formalized in Lean 4
Creator: AI-MO
Published: 2025-07-31T13:52:57
Keywords: arXiv:2504.11354, Library: polars, Modality: text, Size category: 100K < n < 1M, Mathematics, Library: mlcroissant, Library: datasets, Library: pandas, Text, Parquet, Formal Verification, Region: US, Large Scale, Theorem Proving, License: Apache 2.0, Competition Problems

Description

NuminaMath-LEAN is a large-scale dataset of 100,000 mathematical competition problems formalized in the Lean 4 theorem prover language. It was created by AI-MO and is derived from a challenging subset of the NuminaMath 1.5 dataset, focusing on problems from competitions like the IMO and USAMO. The dataset was last updated on July 31, 2025.
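For readers unfamiliar with Lean 4, the snippet below sketches what a formalized statement with a proof can look like. It is an illustration only and is not taken from the dataset; the theorem name is invented, and it assumes the Mathlib library is available.

```lean
import Mathlib

-- Illustrative statement (not a dataset entry): a sum of two real squares is non-negative.
theorem sum_sq_nonneg_example (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  positivity
```

The informal problem corresponds to the `theorem` line, and the proof is everything after `:= by`.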

Use Cases

Training automated theorem provers based on the formal statements and proofs.
Evaluating the performance of formal reasoning models on competition-level problems.
Fine-tuning large language models for mathematical reasoning using formalized Lean 4 code.
Studying the structure of high-difficulty mathematical proofs from competitions like the IMO.

Strengths

Contains 100,000 formalized problems, described as the largest collection of its kind.
Focuses on a challenging subset of problems from prestigious competitions.
Data is human-annotated for formal statements and proofs.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is known and the files are tagged as Parquet, but the schema and other data structure details are not documented here.
Data may reflect bias inherent to its source, focusing on specific competition problems.
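Because column meanings are undocumented, a first step after download is usually to profile each field. The following is a minimal sketch, assuming the rows have already been loaded as dictionaries (for example by iterating over a dataset object or a Parquet reader). The field names and sample rows are invented placeholders, not the dataset's actual schema.

```python
from collections import Counter


def summarize_columns(rows):
    """Profile each column: observed value types, non-null count, first example."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            info = summary.setdefault(
                name, {"types": Counter(), "non_null": 0, "example": None}
            )
            info["types"][type(value).__name__] += 1
            if value is not None:
                info["non_null"] += 1
                if info["example"] is None:
                    info["example"] = value
    return summary


# Hypothetical rows standing in for downloaded records.
# The field names below are assumptions made for illustration only.
sample_rows = [
    {
        "statement_text": "Show that a^2 + b^2 >= 0 for real a, b.",
        "lean_statement": "theorem t (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by sorry",
        "lean_proof": None,
    },
    {
        "statement_text": "Show that n + 0 = n for natural n.",
        "lean_statement": "theorem u (n : ℕ) : n + 0 = n := by sorry",
        "lean_proof": "simp",
    },
]

summary = summarize_columns(sample_rows)
for name, info in summary.items():
    print(name, dict(info["types"]), info["non_null"], repr(info["example"])[:60])
```

Fields that are often null, or that mix types, are good candidates for closer manual inspection before training or evaluation.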

Provenance

Source: AI-MO, derived from the NuminaMath 1.5 dataset.
Collection Method: Problems are formalized in the Lean 4 theorem prover language.
Time Range: Not specified
Freshness: Last updated 2025-07-31 15:18:45; check the source for later revisions.
Geography: Not specified
