SGI-DeepResearch Gold: A Scientist-Aligned Benchmark for LLM Evaluation

Name: SGI-DeepResearch Gold: A Scientist-Aligned Benchmark for LLM Evaluation
Creator: InternScience
Published: 2026-03-10T15:58:13
Keywords: Agentic Framework, Scientific Benchmark, Benchmark, LLM Evaluation, Text, Multidisciplinary

Description

SGI-Bench is a benchmark for evaluating Scientific General Intelligence in large language models across the full inquiry cycle. It spans 10 scientific disciplines and contains more than 1,000 expert-curated samples inspired by Science's 125 Big Questions. The dataset was created by InternScience and last updated in June 2026.

Use Cases

Benchmarking LLM performance on scientific deliberation tasks within the benchmark's inquiry cycle
Evaluating model conception abilities across 10 scientific disciplines
Testing agentic action planning within a scientist-aligned framework
Assessing perception and interpretation of complex scientific questions

Strengths

More than 1,000 expert-curated samples provide a substantial evaluation corpus
Benchmark spans 10 distinct scientific disciplines for broad coverage
Evaluation framework is structured around the four-stage inquiry cycle: Deliberation, Conception, Action, and Perception
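Because the benchmark is organized by inquiry stage and discipline, results are naturally reported as a stage-by-discipline breakdown. The Python sketch below tallies accuracy per (stage, discipline) pair. It is a minimal illustration, not the benchmark's official scorer: the record fields `stage`, `discipline`, and `correct`, and the binary correct/incorrect scoring, are assumptions, since the page documents neither the schema nor the metric.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    # The four inquiry stages named on the dataset page.
    DELIBERATION = "Deliberation"
    CONCEPTION = "Conception"
    ACTION = "Action"
    PERCEPTION = "Perception"


@dataclass
class Result:
    # Hypothetical per-sample evaluation record; the real field names are undocumented.
    stage: Stage
    discipline: str
    correct: bool


def accuracy_by_stage_and_discipline(results):
    """Return {(stage, discipline): fraction of results marked correct}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        key = (r.stage, r.discipline)
        totals[key] += 1
        hits[key] += int(r.correct)
    return {key: hits[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Toy records for illustration only; not samples from the dataset.
    demo = [
        Result(Stage.DELIBERATION, "physics", True),
        Result(Stage.DELIBERATION, "physics", False),
        Result(Stage.ACTION, "chemistry", True),
    ]
    scores = accuracy_by_stage_and_discipline(demo)
    # Sort by stage name, then discipline, for a stable printout.
    for (stage, discipline), acc in sorted(
        scores.items(), key=lambda item: (item[0][0].value, item[0][1])
    ):
        print(f"{stage.value:<13} {discipline:<10} {acc:.2f}")
```

Keying on the (stage, discipline) pair keeps weak cells visible instead of folding them into one overall average.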

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Exact row count is not listed (the description gives only "more than 1,000" samples), which makes it harder to assess suitability
Description metadata is limited; data quality must be checked manually after download
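Because the page documents neither the fields nor the exact row count, a short profiling pass after download can recover both. This sketch assumes the files are JSON Lines (one JSON object per line), which the page does not confirm; it counts rows and lists each top-level key with how often it appears and which JSON value types it holds.

```python
import json
from collections import Counter, defaultdict


def profile_jsonl(lines):
    """Count rows and record which value types appear under each top-level key."""
    row_count = 0
    key_presence = Counter()
    key_types = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        row_count += 1
        for key, value in record.items():
            key_presence[key] += 1
            key_types[key].add(type(value).__name__)
    return row_count, key_presence, key_types


def print_profile(row_count, key_presence, key_types):
    print(f"rows: {row_count}")
    for key, count in key_presence.most_common():
        types = ", ".join(sorted(key_types[key]))
        print(f"  {key}: present in {count}/{row_count} rows ({types})")


if __name__ == "__main__":
    # Toy input for illustration only; not taken from the dataset.
    sample = ['{"question": "q1", "answer": "a1"}', '{"question": "q2"}']
    print_profile(*profile_jsonl(sample))
```

To profile a real download, pass an open file handle instead, for example `profile_jsonl(open(path, encoding="utf-8"))`, where `path` is wherever the file was saved. Keys present in fewer rows than the total point to optional fields.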

Provenance

Source: InternScience
Collection Method: Expert-curated samples inspired by Science's 125 Big Questions
Freshness: Last updated 2026-06-02 12:04:08; freshness should be verified

License is unknown; verify the terms of use before using the dataset.

Quality Score

Overall: D (39)
Description: 42
Source: 39
Reputation: 44
Access: 22

Community

451 downloads, 1 like, 0 views

Dataset Info

Author: InternScience
Created: Mar 10, 2026
Updated: Jun 2, 2026
Last synced: Jun 26, 2026