CritPt contains fewer than 1,000 research-level physics reasoning tasks covering domains such as condensed matter, quantum physics, and astrophysics. Developed by the CritPt-Benchmark team and released in 2025, it serves as a frontier benchmark for testing LLM capabilities on unpublished scientific problems.
The dataset is provided in Parquet format and is associated with arXiv paper 2509.26574; refer to that publication for the evaluation metrics and methodology.
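Because the data ships as Parquet, a quick sanity check before loading can catch a truncated or mislabeled download. The sketch below is a minimal illustration, not part of the dataset's tooling: the function names `looks_like_parquet` and `load_tasks` are hypothetical, and the file path is whatever you saved the download as. It relies only on the fact that a Parquet file begins and ends with the 4-byte marker `PAR1`; the actual loading step assumes pandas and a Parquet engine such as pyarrow are installed.

```python
from pathlib import Path

# Parquet files begin and end with this 4-byte marker.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    This is a cheap plausibility check, not a full validation of the file.
    """
    p = Path(path)
    # A missing file, or one too short to hold both markers, cannot be Parquet.
    if not p.is_file() or p.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with p.open("rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), 2)  # 2 = seek relative to end of file
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def load_tasks(path):
    """Load the tasks into a DataFrame (hypothetical helper).

    Assumes pandas plus a Parquet engine (e.g. pyarrow) are installed.
    No column names are assumed here; inspect `df.columns` after loading.
    """
    import pandas as pd  # imported lazily so the check above works without it

    if not looks_like_parquet(path):
        raise ValueError(f"{path} does not look like a Parquet file")
    return pd.read_parquet(path)
```

Calling `load_tasks` on the downloaded file and then inspecting `df.shape` and `df.columns` shows the number of tasks and the available fields; the schema itself is not described on this page, so it should be read from the file rather than assumed.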