HLE-Verified: A Systematically Verified Revision of the Humanity's Last Exam Benchmark

Name: HLE-Verified: A Systematically Verified Revision of the Humanity's Last Exam Benchmark
Creator: skylenage-ai
Published: 2026-02-13T15:24:56
Keywords: Verification, AI Benchmark, Benchmark, Text, Scientific Domains, Reasoning Evaluation

Available on 1 platform

Description

Humanity’s Last Exam (HLE) is a high-difficulty, multi-domain benchmark for evaluating advanced reasoning. This dataset, created by skylenage-ai and last updated on 2026-02-27, represents a structured revision and verification of the original benchmark items based on community feedback.

Use Cases

Benchmarking AI reasoning on high-difficulty, multi-domain questions
Training models for scientific and technical problem-solving across the domains the benchmark covers
Studying how benchmark datasets are verified and revised

Strengths

Targets a high-difficulty, multi-domain benchmark for advanced reasoning
Reflects a systematic verification and revision process informed by community feedback
Last updated on 2026-02-27, indicating recent maintenance

Limitations

Description metadata is limited; actual data quality requires manual inspection after download
Column-level documentation is absent; field semantics must be inferred after download
Row count, file formats, and license are unknown, which may limit suitability assessment
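
Because field semantics have to be inferred after download, a first pass over the rows can at least show which fields exist, what types they hold, and how often they are empty. The following is a minimal sketch under stated assumptions: the rows are already loaded as a list of dictionaries, and the field names in `sample` (`id`, `question`, `answer`) are hypothetical placeholders, not the dataset's documented schema.

```python
def summarize_columns(records):
    """Summarize each field seen in a list of dict records.

    Returns {field: {"types": [...], "missing": int, "example": value}}.
    A field counts as missing when it is absent from a record or set to None.
    """
    # Collect field names in first-seen order.
    fields = []
    for record in records:
        for name in record:
            if name not in fields:
                fields.append(name)

    summary = {}
    for name in fields:
        values = [record.get(name) for record in records]
        present = [v for v in values if v is not None]
        summary[name] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
            "example": present[0] if present else None,
        }
    return summary


# Hypothetical rows for illustration; the real field names are undocumented.
sample = [
    {"id": "q1", "question": "What is 2 + 2?", "answer": "4"},
    {"id": "q2", "question": "Name a prime above 10.", "answer": None},
    {"id": "q3", "question": "Define entropy."},
]

report = summarize_columns(sample)
for name, info in report.items():
    print(name, info)
```

Once the actual file format is known, rows read with a matching loader could be passed to `summarize_columns` in place of `sample`; `len(records)` then gives the row count that the listing does not report.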

Provenance

Source: skylenage-ai on Hugging Face
Collection Method: Structured revision and verification of the original Humanity's Last Exam benchmark
Time Range: not specified
Freshness: Last updated 2026-02-27 10:58:01
Geography: not specified

Quality Score

Overall: C41
Description: 42
Source: 39
Reputation: 55
Access: 26

Community

Downloads: 17.4K
Likes: 16
Views: 0

Dataset Info

Author: skylenage-ai
Created: Feb 13, 2026
Updated: Feb 27, 2026
Last synced: Jun 23, 2026
