Description
Herald Proofs is a dataset of 45,000 natural language-formal language (NL-FL) proof pairs, forming the proof portion of the larger Herald dataset. It was created by a team of authors including Guoxiong Gao and Yutong Wang and presented at the International Conference on Learning Representations (ICLR) in 2025. The proofs target the Lean 4 theorem prover, specifically version v4.11.0.
Use Cases
Training models to translate natural language proofs into formal Lean 4 proofs using the NL-FL pairs.
Benchmarking automated theorem provers on a corpus of annotated formal proofs.
Developing AI assistants for interactive theorem proving in Lean 4 based on the provided proof structures.
Studying the alignment between informal mathematical language and formal logic representations.
Strengths
Contains 45,000 proof instances, providing a substantial corpus for model training.
Associated with a peer-reviewed publication (ICLR 2025), indicating academic scrutiny.
Pinned to Lean 4 version v4.11.0, so the toolchain the proofs target is known; compatibility with other Lean versions is not stated.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The exact row count is not reported in the metadata; the 45,000 figure comes from the dataset description and should be confirmed after download.
Description metadata is limited; actual data quality requires manual inspection after download.
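Because the columns are undocumented, a first pass over the downloaded rows can at least show which fields exist, what types they hold, and how many rows there are. The sketch below is one way to do that with the Python standard library. The helper name `summarize_fields` and the demo column names (`id`, `informal_proof`, `formal_proof`) are hypothetical and chosen only for illustration; the real schema may differ.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def summarize_fields(
    records: Iterable[Dict[str, Any]], sample_chars: int = 60
) -> List[Dict[str, Any]]:
    """Summarize each column: observed value types, null count, rows seen, one sample."""
    type_counts: Dict[str, Counter] = {}
    null_counts: Counter = Counter()
    samples: Dict[str, str] = {}
    total = 0
    for row in records:
        total += 1
        for key, value in row.items():
            type_counts.setdefault(key, Counter())
            if value is None:
                null_counts[key] += 1
                continue
            type_counts[key][type(value).__name__] += 1
            # Keep the first non-null value, truncated, as a readable sample.
            samples.setdefault(key, repr(value)[:sample_chars])
    return [
        {
            "field": key,
            "types": dict(type_counts[key]),
            "nulls": null_counts[key],
            "rows_seen": total,
            "sample": samples.get(key, ""),
        }
        for key in type_counts
    ]


# Hypothetical rows: the real column names are not documented and may differ.
demo_rows = [
    {"id": 0, "informal_proof": "By induction on n ...", "formal_proof": "theorem t : ... := by simp"},
    {"id": 1, "informal_proof": None, "formal_proof": "theorem u : ... := by rfl"},
]

for summary in summarize_fields(demo_rows):
    print(summary)
```

Any iterable of dictionaries works as input. A split loaded with the Hugging Face `datasets` library iterates as dictionaries, so it could be passed in directly once the repository identifier has been confirmed on the hosting page. The `rows_seen` value also gives a direct check of the 45,000 figure.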
Provenance
Source
HuggingFace user 'FrenzyMath', associated with the academic paper 'Herald: A Natural Language Annotated Lean 4 Dataset'.
Collection Method
Likely extracted or generated from the Lean 4 theorem proving environment.
Freshness
Last updated 2025-05-13 11:23:36; freshness should be verified.
License
Unknown; users should verify terms before use.