Description
A benchmark for evaluating Scientific General Intelligence across the full inquiry cycle, spanning 10 disciplines and more than 1,000 expert-curated samples inspired by Science's 125 Big Questions. The dataset, SGI-Reasoning-Lite, was created by InternScience and last updated on 2026-06-02. It employs an agentic evaluation framework for probing LLMs.
Use Cases
Benchmarking LLM performance on scientific reasoning tasks across the Deliberation, Conception, Action, and Perception inquiry cycle
Evaluating agentic AI systems with the scientist-aligned, multi-disciplinary framework
Training or fine-tuning models for scientific question answering on the 1,000+ expert-curated samples
Analyzing model capabilities across the 10 scientific disciplines covered
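The per-discipline analysis in the last use case can be sketched as a simple grouped accuracy. This is a minimal illustration, not the benchmark's own scoring method: the function name, the `(discipline, is_correct)` pair layout, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict


def accuracy_by_discipline(results):
    """Compute accuracy per discipline.

    `results` is an iterable of (discipline, is_correct) pairs.
    This layout is hypothetical; the dataset's real schema is undocumented.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for discipline, is_correct in results:
        totals[discipline] += 1
        correct[discipline] += int(bool(is_correct))
    # Every discipline in `totals` has at least one sample, so no division by zero.
    return {d: correct[d] / totals[d] for d in totals}


# Hypothetical graded model outputs, for illustration only.
sample_results = [
    ("physics", True),
    ("physics", False),
    ("chemistry", True),
    ("chemistry", True),
]
per_discipline = accuracy_by_discipline(sample_results)
```

Reporting one score per discipline, rather than a single pooled number, keeps a strong result in one field from masking a weak result in another.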
Strengths
More than 1,000 expert-curated samples provide a substantial evaluation corpus
Covers 10 distinct scientific disciplines for broad assessment
Structured around Science's 125 Big Questions, providing a high-level conceptual foundation
Limitations
Column-level documentation is absent; field semantics must be inferred after download
Exact row count is not stated beyond "more than 1,000 samples", which may limit suitability assessment
Description metadata is limited; actual data quality requires manual inspection after download
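Because column documentation is absent, a first step after download is usually to infer the schema from the records themselves. The sketch below assumes the data has already been loaded as a list of dictionaries; the helper name and the field names in the sample (`question`, `discipline`, `answer`) are hypothetical and not taken from the dataset.

```python
def summarize_fields(records):
    """Summarize observed value types and missing counts per field.

    A record counts as "missing" a field when the key is absent or its value is None.
    """
    records = list(records)
    fields = {}
    for record in records:
        for key, value in record.items():
            info = fields.setdefault(key, {"types": set(), "non_null": 0})
            if value is not None:
                info["types"].add(type(value).__name__)
                info["non_null"] += 1
    return {
        key: {
            "types": sorted(info["types"]),
            "missing": len(records) - info["non_null"],
        }
        for key, info in fields.items()
    }


# Hypothetical records; the real field names must be checked after download.
sample_records = [
    {"question": "placeholder question 1", "discipline": "physics", "answer": "42"},
    {"question": "placeholder question 2", "discipline": "biology", "answer": None},
    {"question": "placeholder question 3", "discipline": "chemistry"},
]
summary = summarize_fields(sample_records)
row_count = len(sample_records)
```

The same pass yields the exact row count and shows which fields are sparsely populated, two things the listing does not report.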
Provenance
Source
InternScience via Hugging Face
Collection Method
Expert-curated samples inspired by Science's 125 Big Questions
Freshness
Last updated 2026-06-02 12:05:27
License is unknown; terms of use must be verified before application.