Name: SGI-Bench: Scientist-Aligned LLM Benchmark Across 10 Disciplines
Creator: InternScience
Published: 2026-03-25T07:00:08
Keywords: Agentic Framework, Scientific Benchmark, Benchmark, LLM Evaluation, Text, Science Questions

Description

SGI-Bench is a benchmark for evaluating Scientific General Intelligence across the full inquiry cycle. It spans 10 disciplines and contains more than 1,000 expert-curated samples inspired by Science's 125 Big Questions. The dataset, SGI-Reasoning-Lite, was created by InternScience and last updated on 2026-06-02. It uses an agentic evaluation framework to probe LLMs.

Use Cases

Benchmarking LLM performance on scientific reasoning tasks across the Deliberation, Conception, Action, and Perception inquiry cycle
Evaluating agentic AI systems with the scientist-aligned, multi-disciplinary framework
Training or fine-tuning models for scientific question answering on the 1,000+ expert-curated samples
Analyzing model capabilities across the 10 scientific disciplines
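
The per-discipline analysis mentioned in the last use case can be sketched as a simple grouped accuracy. This is a minimal sketch, not the benchmark's own scoring method: the keys `discipline` and `correct` are hypothetical field names (the dataset's columns are undocumented), and the example rows are made-up placeholders, not real benchmark output.

```python
from collections import defaultdict


def accuracy_by_discipline(results):
    """Return {discipline: fraction of correct answers}.

    `results` is an iterable of dicts with the HYPOTHETICAL keys
    'discipline' (str) and 'correct' (truthy/falsy). These names are
    assumptions for illustration, not the dataset's actual schema.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in results:
        totals[row["discipline"]] += 1
        hits[row["discipline"]] += int(bool(row["correct"]))
    return {name: hits[name] / totals[name] for name in totals}


# Made-up placeholder results, not real evaluation output.
example_results = [
    {"discipline": "physics", "correct": True},
    {"discipline": "physics", "correct": False},
    {"discipline": "chemistry", "correct": True},
]
scores = accuracy_by_discipline(example_results)
```

Plain accuracy is only one possible metric; an agentic evaluation may score some tasks differently, so the aggregation step would need to be adapted to whatever scores the evaluation actually emits.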

Strengths

More than 1,000 expert-curated samples provide a substantial evaluation corpus
Covers 10 distinct scientific disciplines for broad assessment
Structured around Science's 125 Big Questions, providing a high-level conceptual foundation

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Exact row count is not stated (the description says only "more than 1,000"), which may limit suitability assessment
Description metadata is limited; actual data quality requires manual inspection after download
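
Because column documentation and the exact row count are missing, a first inspection pass after download might look like the sketch below. It assumes the records have already been loaded as a list of dicts by whatever means is appropriate; the function name `summarize_records` and the placeholder field names are hypothetical and do not reflect the dataset's actual schema.

```python
from collections import Counter, defaultdict


def summarize_records(records):
    """Count rows and report, per field, the value types seen and how
    many rows lack the field entirely."""
    type_counts = defaultdict(Counter)
    n_rows = 0
    for rec in records:
        n_rows += 1
        for key, value in rec.items():
            type_counts[key][type(value).__name__] += 1
    columns = {}
    for key, counts in type_counts.items():
        columns[key] = {
            "types": dict(counts),
            # Rows where the key is absent (a None value still counts as present).
            "missing": n_rows - sum(counts.values()),
        }
    return {"rows": n_rows, "columns": columns}


# Placeholder records with HYPOTHETICAL field names, for illustration only.
placeholder_records = [
    {"question": "placeholder question", "discipline": "physics"},
    {"question": "another placeholder", "answer": None},
]
summary = summarize_records(placeholder_records)
```

A summary like this gives the row count and a rough schema, but field semantics still have to be confirmed by reading sample rows.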

Provenance

Source: InternScience via Hugging Face
Collection Method: Expert-curated samples inspired by Science's 125 Big Questions
Freshness: Last updated 2026-06-02 12:05:27

License is unknown; terms of use must be verified before application.
