Name: SciR: A Scientific Reasoning Benchmark for LLMs
Creator: sci-reason
Published: 2026-05-08T08:08:00
Keywords: LLM Benchmark, Causal Discovery, Benchmark, Text, Inductive Discovery, Scientific Reasoning, Deductive Logic, Synthetic

Description

SciR is a multi-domain benchmark for evaluating large language models on three forms of scientific reasoning: deductive logic, inductive rule discovery, and causal discovery. It includes parametric difficulty curves and a controlled contrast between natural-language and scientific-prose-obfuscated task formats. The dataset was created by sci-reason and was last updated on Hugging Face on 2026-06-12.

Use Cases

Benchmarking LLM performance on deductive logic tasks.
Evaluating inductive rule discovery on pattern-finding tasks.
Testing causal discovery and inference on causal reasoning tasks.
Contrasting model performance on natural language versus obfuscated scientific prose.
Studying how reasoning difficulty scales along the parametric difficulty curves.
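
The last two use cases amount to grouping scored model outputs by prompt format and difficulty level. The following is a minimal sketch of that grouping, not the benchmark's own scoring code. The field names `format`, `difficulty`, and `correct` are hypothetical placeholders, since the dataset's actual columns are undocumented, and the toy records are illustrative only.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {(format, difficulty): accuracy} for scored records.

    The keys 'format', 'difficulty', and 'correct' are hypothetical
    placeholders, not documented SciR column names.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        key = (rec["format"], rec["difficulty"])
        totals[key] += 1
        hits[key] += int(bool(rec["correct"]))
    return {key: hits[key] / totals[key] for key in totals}


# Toy scored records for illustration only; not real benchmark results.
toy = [
    {"format": "natural", "difficulty": 1, "correct": True},
    {"format": "natural", "difficulty": 1, "correct": True},
    {"format": "obfuscated", "difficulty": 1, "correct": True},
    {"format": "obfuscated", "difficulty": 1, "correct": False},
]
scores = accuracy_by_group(toy)
```

Comparing `scores` entries that share a difficulty level but differ in format gives the natural-versus-obfuscated contrast; reading them across difficulty levels gives a per-format difficulty curve.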

Strengths

Covers three distinct scientific reasoning domains: deductive, inductive, and causal.
Includes parametric difficulty curves for controlled evaluation scaling.
Provides a contrast between natural language and scientific-prose-obfuscation formats.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it hard to assess suitability before download.
Description metadata is limited; actual data quality requires manual inspection after download.
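
Because field semantics must be inferred after download, a quick pass over the rows can at least recover the field names, the value types, and how often each field is populated. The sketch below assumes the rows have already been loaded as a list of dictionaries. The field names in the toy rows (`prompt`, `answer`, `level`) are hypothetical and stand in for whatever columns the dataset actually contains.

```python
def summarize_fields(rows):
    """Map each field name to the value types seen and a non-null count."""
    summary = {}
    for row in rows:
        for name, value in row.items():
            entry = summary.setdefault(name, {"types": set(), "non_null": 0})
            if value is not None:
                entry["types"].add(type(value).__name__)
                entry["non_null"] += 1
    # Convert type sets to sorted lists so the result is stable and printable.
    return {
        name: {"types": sorted(entry["types"]), "non_null": entry["non_null"]}
        for name, entry in summary.items()
    }


# Toy rows with hypothetical field names; replace with the downloaded data.
rows = [
    {"prompt": "If A then B. A holds.", "answer": "B", "level": 2},
    {"prompt": "If C then D. C holds.", "answer": None, "level": 3},
]
summary = summarize_fields(rows)
row_count = len(rows)
```

The same pass yields the row count, which covers the second limitation above once the data is on disk.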

Provenance

Source: sci-reason
Collection Method: Likely synthetically generated for benchmarking purposes.
Freshness: Last updated 2026-06-12 12:16:18; verify against the current Hugging Face listing.

License is unknown; terms of use must be verified.
