SGI-Bench is a scientist-aligned benchmark for evaluating Scientific General Intelligence (SGI) in large language models. It spans 10 disciplines and contains more than 1,000 expert-curated samples inspired by Science's 125 Big Questions. The dataset was created by InternScience and was last updated on Hugging Face in June 2026.
Use Cases
- Benchmarking LLM performance on scientific deliberation tasks, one stage of the benchmark's inquiry cycle.
- Evaluating LLM conception abilities for generating scientific ideas across 10 disciplines.
- Testing agentic frameworks on the benchmark's scientific action and perception tasks.
- Analyzing model performance on expert-curated samples inspired by foundational scientific questions.
Strengths
- More than 1,000 expert-curated samples provide a substantial evaluation corpus.
- Benchmark spans 10 distinct scientific disciplines for broad coverage.
- Evaluation framework is structured around the full agentic inquiry cycle: Deliberation, Conception, Action, and Perception.
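Because the benchmark is organized around four stages, results are probably easier to read per stage than as one overall number. The following is a minimal sketch of that tabulation, not the benchmark's own scoring code. It assumes each evaluation result is a dict with hypothetical `stage` and `score` keys; neither name comes from the dataset, and the example values are made up.

```python
from collections import defaultdict

# Stage names taken from the benchmark description.
STAGES = ("Deliberation", "Conception", "Action", "Perception")


def mean_score_by_stage(results):
    """Average the hypothetical `score` field per stage.

    Stages with no results map to None instead of 0.0, so that
    "not evaluated" is not confused with "scored zero".
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in results:
        stage = row["stage"]
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        totals[stage] += row["score"]
        counts[stage] += 1
    return {s: (totals[s] / counts[s] if counts[s] else None) for s in STAGES}


# Made-up results, for illustration only.
example_results = [
    {"stage": "Deliberation", "score": 1.0},
    {"stage": "Deliberation", "score": 0.0},
    {"stage": "Action", "score": 0.5},
]
stage_means = mean_score_by_stage(example_results)
```

Returning `None` for unevaluated stages is a design choice for this sketch; the same grouping could be repeated per discipline if the data exposes one.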
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Exact row count is not stated beyond "more than 1,000" samples, which may limit suitability assessment for large-scale training.
- Description metadata is limited; actual data quality requires manual inspection after download.
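Since column documentation is missing, a first pass over a handful of rows can show which fields exist, what types they hold, and how often they are empty. The sketch below works on plain dict rows and makes no claim about the dataset's real columns: `summarize_fields` is a hypothetical helper, and the field names in `sample_rows` are placeholders.

```python
from collections import Counter, defaultdict


def summarize_fields(rows):
    """Summarize dict rows as {field: {"types", "missing", "example"}}.

    "types" counts the Python type names seen, "missing" counts rows where
    the field is absent or None, and "example" is the first non-None value.
    """
    rows = list(rows)
    summary = defaultdict(lambda: {"types": Counter(), "missing": 0, "example": None})
    all_fields = {key for row in rows for key in row}
    for row in rows:
        for field in all_fields:
            value = row.get(field)
            info = summary[field]
            if value is None:
                info["missing"] += 1
                continue
            info["types"][type(value).__name__] += 1
            if info["example"] is None:
                info["example"] = value
    return dict(summary)


# Made-up rows; the field names are placeholders, not the dataset's columns.
sample_rows = [
    {"question": "What is dark matter?", "discipline": "physics", "steps": ["a", "b"]},
    {"question": "How do cells age?", "discipline": None},
]
field_summary = summarize_fields(sample_rows)
```

In practice the rows would come from the downloaded data, for example the first few records of a split loaded with the Hugging Face `datasets` library. The repository ID and split names are not stated here, so they are left out of the sketch.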
Provenance
- Source
  - Hugging Face dataset repository created by author 'InternScience'.
- Collection Method
  - Expert-curated samples inspired by Science's 125 Big Questions.
- Freshness
  - Last updated 2026-06-02 12:04:48; freshness should be verified.
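One simple way to act on the freshness note is to compare the recorded timestamp against a reference date. This is a minimal sketch that assumes the timestamp is in UTC; the source gives no timezone, so that is a guess. `age_in_days` is a hypothetical helper and the reference date is arbitrary.

```python
from datetime import datetime, timezone


def age_in_days(last_updated, now=None):
    """Return the age of a 'YYYY-MM-DD HH:MM:SS' timestamp in days.

    Assumes the timestamp is UTC, which the source does not confirm.
    """
    stamp = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S")
    stamp = stamp.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - stamp).total_seconds() / 86400


# Arbitrary reference date, exactly one day after the recorded update.
reference = datetime(2026, 6, 3, 12, 4, 48, tzinfo=timezone.utc)
age = age_in_days("2026-06-02 12:04:48", now=reference)
```

The recorded timestamp only reflects when this summary was written, so the current modification date on the repository itself remains the authoritative value.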