FormalRx-Test: Diagnostic Evaluation Framework for Autoformalization

Name: FormalRx-Test: Diagnostic Evaluation Framework for Autoformalization
Creator: LARK-Lab
Published: 2026-05-05T18:27:07
Keywords: Autoformalization, Error Taxonomy, Benchmark, Lean 4, Text, Natural Language, Diagnostic Evaluation

Description

FormalRx-Test is the official test split of the FormalRx diagnostic evaluation framework (Wang et al., 2025). It contains 7,030 natural-language / Lean 4 statement pairs annotated under the SCI Error Taxonomy (Semantic, Constraint, Implementation). The dataset was created by LARK-Lab and last updated on Hugging Face in May 2026.

Use Cases

Evaluate autoformalization model performance based on alignment verdicts.
Diagnose and categorize errors in formalized statements based on the SCI Error Taxonomy.
Localize errors within natural-language / Lean 4 statement pairs.
Benchmark diagnostic capabilities of formalization tools.
Train models to generate actionable feedback for formalization tasks.
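
The first two use cases can be sketched as a small scoring routine. This is only an illustration under stated assumptions: the field names (`gold_aligned`, `pred_aligned`, `gold_category`, `pred_category`) are hypothetical, since the card does not document the dataset's columns, and the toy rows are invented rather than drawn from FormalRx-Test.

```python
from collections import Counter

# Hypothetical field names; the dataset card does not document its columns.
GOLD_VERDICT = "gold_aligned"
PRED_VERDICT = "pred_aligned"
GOLD_CATEGORY = "gold_category"
PRED_CATEGORY = "pred_category"


def verdict_accuracy(rows):
    """Fraction of rows where the predicted alignment verdict matches the gold one."""
    if not rows:
        return 0.0
    correct = sum(1 for r in rows if r[PRED_VERDICT] == r[GOLD_VERDICT])
    return correct / len(rows)


def category_confusion(rows):
    """Count (gold, predicted) category pairs over rows whose gold verdict is misaligned."""
    pairs = Counter()
    for r in rows:
        if not r[GOLD_VERDICT]:
            pairs[(r[GOLD_CATEGORY], r[PRED_CATEGORY])] += 1
    return pairs


# Toy rows invented for illustration; not drawn from FormalRx-Test.
rows = [
    {GOLD_VERDICT: True, PRED_VERDICT: True, GOLD_CATEGORY: None, PRED_CATEGORY: None},
    {GOLD_VERDICT: False, PRED_VERDICT: False, GOLD_CATEGORY: "Semantic", PRED_CATEGORY: "Semantic"},
    {GOLD_VERDICT: False, PRED_VERDICT: False, GOLD_CATEGORY: "Constraint", PRED_CATEGORY: "Semantic"},
    {GOLD_VERDICT: False, PRED_VERDICT: True, GOLD_CATEGORY: "Implementation", PRED_CATEGORY: None},
]

print(verdict_accuracy(rows))
print(category_confusion(rows))
```

Keeping the verdict score separate from the category confusion counts makes it possible to tell a model that misses misaligned pairs entirely from one that detects them but assigns the wrong SCI category.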

Strengths

Provides 7,030 annotated statement pairs for evaluation.
Supports four diagnostic capabilities. Three are named here (alignment verdicts, error categorization, and error localization); the fourth is not specified.
Annotated under a structured SCI Error Taxonomy (Semantic, Constraint, Implementation).

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is known (7,030), but file formats, size, and license are unknown.
The dataset is a test split; the full training or development data is not included here.
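
Because the columns are undocumented, a first pass after download might be to summarize what each field actually contains. The sketch below assumes the rows have already been parsed into a list of dictionaries (the file format is unknown, so loading is left out). The helper `summarize_fields` and the sample field names are hypothetical.

```python
from collections import defaultdict


def summarize_fields(records, max_examples=3):
    """Summarize each field's observed types, missing count, and a few distinct sample values."""
    summary = defaultdict(lambda: {"types": set(), "missing": 0, "examples": []})
    # Collect every key first so fields absent from some rows are counted as missing.
    all_keys = set()
    for rec in records:
        all_keys.update(rec)
    for rec in records:
        for key in all_keys:
            info = summary[key]
            value = rec.get(key)
            if value is None:
                info["missing"] += 1
                continue
            info["types"].add(type(value).__name__)
            if value not in info["examples"] and len(info["examples"]) < max_examples:
                info["examples"].append(value)
    return dict(summary)


# Hypothetical records; the real field names must be read from the downloaded files.
sample = [
    {"nl_statement": "Every natural number n satisfies n + 0 = n.",
     "lean_statement": "theorem t (n : Nat) : n + 0 = n", "label": None},
    {"nl_statement": "Zero is the smallest natural number.",
     "lean_statement": "theorem u (n : Nat) : 0 ≤ n", "label": "Semantic"},
]

for name, info in sorted(summarize_fields(sample).items()):
    print(name, sorted(info["types"]), "missing:", info["missing"])
```

A summary like this shows which fields are free text, which look like categorical labels, and which are often empty, which is usually enough to form a working guess at field semantics before checking it against the FormalRx paper.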

Provenance

Source: LARK-Lab
Collection Method: Likely created as part of the FormalRx framework research (Wang et al., 2025).
Freshness: Last updated 2026-05-07 03:34:00. This listing was last synced on Jun 7, 2026, so check the source for newer revisions.

Quality Score

Overall: D38
Description: 39
Source: 41
Reputation: 38
Access: 26

Community

Downloads: 7
Likes: 1
Views: 0

Dataset Info

Author: LARK-Lab
Created: May 5, 2026
Updated: May 7, 2026
Last synced: Jun 7, 2026
