HLE-Verified is a refined benchmark containing between 1,000 and 10,000 high-difficulty reasoning items across scientific and technical domains. Released by skylenage in early 2026, this version provides a systematic revision of the original Humanity’s Last Exam (HLE) to address community-reported reliability issues.
The dataset is associated with arXiv paper 2602.13964 and is provided in Parquet format.
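Because the files are distributed as Parquet, a quick sanity check on a downloaded copy can catch truncated or mislabeled files before loading. The sketch below is an illustration only: it assumes a plain, unencrypted Parquet file, and the helper name `looks_like_parquet` is hypothetical rather than part of any dataset tooling. It relies on the Parquet convention that a file begins and ends with the 4-byte marker `PAR1`.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at both the start and end of a Parquet file


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    Hypothetical helper for illustration; it does not validate the schema
    or the contents, only the file framing.
    """
    p = Path(path)
    # Lower bound on size: header magic + 4-byte footer length + footer magic.
    if not p.is_file() or p.stat().st_size < 12:
        return False
    with p.open("rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # 2 = seek relative to end of file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

If the check passes, a reader such as `pandas.read_parquet` (with a Parquet engine like pyarrow installed) can load the file into a DataFrame. The file name and column names are not given here, so inspect the loaded schema rather than assuming particular fields.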