Description
The dataset contains 14,056 records of research-level mathematical problems extracted from academic papers, open-problem lists, and workshop sheets. Each record contains an original question, a rewritten self-contained statement, taxonomy labels, and open-status metadata. The dataset was created by 'amphora' and was last updated on the Hugging Face platform in May 2026.
Use Cases
Training or evaluating mathematical reasoning agents on research-level problem statements.
Classifying mathematical problems by topic using the provided taxonomy labels.
Studying the structure of open research problems using the open-status metadata.
Generating self-contained problem statements from extracted questions, with the rewritten versions as targets.
Strengths
Contains 14,056 distinct research-level mathematical problem records.
Each problem includes both an original extracted question and a rewritten self-contained statement.
Provides structured metadata including taxonomy labels and open-status information.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The row count is known, but the data format and file size are not specified.
The data may reflect temporal and source biases inherent in the academic papers and lists from which it was extracted.
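Because column-level documentation is absent, a first step after download is usually to profile the fields directly. The following is a minimal sketch, assuming the records have already been loaded as a list of Python dictionaries. The function name `summarize_fields` and the field names in the toy rows (`question`, `rewritten_statement`, `taxonomy`, `is_open`) are hypothetical placeholders, not the dataset's actual column names.

```python
from collections import Counter, defaultdict


def summarize_fields(records, sample_size=3):
    """Report, per field, the observed value types, the number of
    missing values (absent key or None), and a few example values."""
    type_counts = defaultdict(Counter)
    examples = defaultdict(list)

    for record in records:
        for field, value in record.items():
            type_counts[field][type(value).__name__] += 1
            # Keep a handful of non-null values to eyeball the semantics.
            if value is not None and len(examples[field]) < sample_size:
                examples[field].append(value)

    total = len(records)
    summary = {}
    for field, counts in type_counts.items():
        present = sum(counts.values())
        summary[field] = {
            "types": dict(counts),
            # Missing = key absent from the record, or value explicitly None.
            "missing": (total - present) + counts.get("NoneType", 0),
            "examples": examples[field],
        }
    return summary


if __name__ == "__main__":
    # Toy rows with hypothetical field names, for illustration only.
    toy_rows = [
        {"question": "Q1", "rewritten_statement": "S1",
         "taxonomy": ["number theory"], "is_open": True},
        {"question": "Q2", "rewritten_statement": "S2",
         "taxonomy": ["combinatorics"], "is_open": None},
    ]
    for name, info in summarize_fields(toy_rows).items():
        print(name, info)
```

Running a profile like this on the real records shows which fields are free text, which are label lists, and which are often empty, and that is usually enough to form a working guess at the field semantics before any training or evaluation.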
Provenance
Source
Extracted from academic papers, open-problem lists, and workshop sheets.
Collection Method
The extraction and rewriting process is described in the associated paper 'ResearchMath-14k: Leveraging Internet-Search Agents to Scale Research-Level Mathematical Reasoning'.
Freshness
Last updated 2026-05-28 02:11:42.
License
License information is unknown and should be verified before use.