Name: SGI-Bench: Scientist-Aligned Benchmark for Evaluating LLMs Across 10 Disciplines
Creator: InternScience
Published: 2026-03-25T06:59:43
Keywords: Science Benchmark, Agentic Framework, Benchmark, LLM Evaluation, Scientific General Intelligence, Text

Description

SGI-Bench is a scientist-aligned benchmark for evaluating Scientific General Intelligence (SGI) in large language models. It spans 10 disciplines and contains more than 1,000 expert-curated samples inspired by Science's 125 Big Questions. The dataset was created by InternScience and was last updated on Hugging Face in June 2026.

Use Cases

Benchmarking LLM performance on scientific deliberation tasks, one stage of the benchmark's inquiry cycle.
Evaluating how well LLMs conceive and generate scientific ideas across the 10 disciplines.
Testing agentic frameworks on the action and perception stages of the inquiry cycle.
Analyzing model performance on expert-curated samples inspired by foundational scientific questions.

Strengths

More than 1,000 expert-curated samples provide a substantial evaluation corpus.
Benchmark spans 10 distinct scientific disciplines for broad coverage.
Evaluation framework is structured around the full agentic inquiry cycle: Deliberation, Conception, Action, and Perception.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Exact row count is not stated beyond "more than 1,000" samples, which makes it hard to judge suitability for large-scale training without downloading the data.
Description metadata is limited; actual data quality requires manual inspection after download.
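
Because field semantics and the exact row count have to be worked out after download, a first inspection pass can be sketched as below. This is a minimal sketch using only the Python standard library. The helper name summarize_fields and the sample rows are hypothetical; the field names in the sample are invented for illustration and are not the dataset's actual columns.

```python
from collections import Counter, defaultdict


def summarize_fields(records):
    """Count rows and, for each field, record observed value types and one example."""
    type_counts = defaultdict(Counter)
    examples = {}
    n_rows = 0
    for row in records:
        n_rows += 1
        for key, value in row.items():
            # Track which Python types appear under each field name.
            type_counts[key][type(value).__name__] += 1
            # Keep the first value seen as a representative example.
            examples.setdefault(key, value)
    fields = {
        key: {"types": dict(counts), "example": examples[key]}
        for key, counts in type_counts.items()
    }
    return {"rows": n_rows, "fields": fields}


# Hypothetical rows standing in for the real data; these field names are assumptions.
sample = [
    {"question": "Q1", "discipline": "physics", "answer": None},
    {"question": "Q2", "discipline": "biology", "answer": "42"},
]

summary = summarize_fields(sample)
print("rows:", summary["rows"])
for name, info in summary["fields"].items():
    print(name, info["types"], repr(info["example"]))
```

The helper accepts any iterable of dict-like rows, so it should also work on a split loaded with the Hugging Face datasets library, since iterating a split yields one dict per row. The repository identifier and split names are not given here and would need to be taken from the dataset page.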

Provenance

Source: Hugging Face dataset repository created by author 'InternScience'.
Collection Method: Expert-curated samples inspired by Science's 125 Big Questions.
Freshness: Last updated 2026-06-02 12:04:48; check the repository for later revisions before use.
