ChemCoTBench-V2: 5,620-Sample Benchmark for Chemical Reasoning in LLMs

Name: ChemCoTBench-V2: 5,620-Sample Benchmark for Chemical Reasoning in LLMs
Creator: fresnellll
Published: 2026-06-01T14:21:27
Keywords: Chemical Reasoning, Benchmark, LLM Evaluation, Text, Science QA

Description

ChemCoTBench-V2 is a public, active benchmark of 5,620 samples for evaluating chemical reasoning in large language models. Created by fresnellll, it assesses both final-answer correctness and process-level reasoning by pairing model-facing inputs with verified formal reasoning traces. It was last updated on June 3, 2026.

Use Cases

Benchmarking the final-answer accuracy of LLMs on chemical problems against the verified answers.
Evaluating the step-by-step reasoning of LLMs against the provided formal reasoning traces.
Training or fine-tuning models for improved scientific reasoning using the model-facing inputs and verified traces.
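
The first use case, final-answer benchmarking, can be sketched as an exact-match accuracy computation. This is a minimal illustration rather than the dataset's official scoring method: the field names `id` and `answer` are hypothetical (the listing does not document the columns), and the toy records are invented for the example.

```python
from typing import Iterable, Mapping


def normalize(answer: str) -> str:
    """Trim and collapse whitespace; letter case is preserved on purpose."""
    return " ".join(answer.split())


def final_answer_accuracy(
    samples: Iterable[Mapping[str, str]],
    predictions: Mapping[str, str],
    id_field: str = "id",          # hypothetical field name
    answer_field: str = "answer",  # hypothetical field name
) -> float:
    """Fraction of samples whose prediction equals the verified answer."""
    total = 0
    correct = 0
    for sample in samples:
        total += 1
        # A missing prediction is scored as incorrect.
        predicted = predictions.get(sample[id_field], "")
        if normalize(predicted) == normalize(sample[answer_field]):
            correct += 1
    return correct / total if total else 0.0


# Toy records invented for illustration; they are not taken from the dataset.
toy_samples = [
    {"id": "s1", "answer": "C2H6O"},
    {"id": "s2", "answer": "2"},
]
toy_predictions = {"s1": "  C2H6O ", "s2": "3"}

accuracy = final_answer_accuracy(toy_samples, toy_predictions)
print(f"final-answer accuracy: {accuracy:.2f}")
```

Case is left untouched because chemical notation is case-sensitive (Co is cobalt, CO is carbon monoxide). Exact string matching is only a starting point; answers such as SMILES strings or numeric values would likely need a domain-aware comparison.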

Strengths

Contains 5,620 benchmark samples for evaluation.
Each item includes a verified formal reasoning trace for process-level assessment.
Specifically designed for evaluating both answer correctness and reasoning process in chemical domains.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not confirmed by the listing metadata; the 5,620-sample figure comes from the dataset's title and description and should be checked after download.
Description metadata is limited; actual data quality requires manual inspection after download.
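
Because the columns are undocumented, a first inspection pass after download might simply tabulate which fields appear and what types they hold. The sketch below assumes the records have already been loaded as a list of dictionaries; the field names `question`, `trace`, and `answer` in the toy data are hypothetical placeholders, not the dataset's actual schema.

```python
from typing import Any, Dict, Iterable, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Report, per field, how many records contain it and which Python types appear."""
    summary: Dict[str, Dict[str, Any]] = {}
    for record in records:
        for key, value in record.items():
            entry = summary.setdefault(key, {"count": 0, "types": set()})
            entry["count"] += 1
            entry["types"].add(type(value).__name__)
    return summary


# Toy records invented for illustration; real field names must be read from the data.
toy_records = [
    {"question": "Q1", "trace": ["step 1", "step 2"], "answer": "A"},
    {"question": "Q2", "answer": "B"},
]

field_summary = summarize_fields(toy_records)
row_count = len(toy_records)

print(f"rows: {row_count}")
for name, info in sorted(field_summary.items()):
    print(f"{name}: present in {info['count']} rows, types={sorted(info['types'])}")
```

Comparing `row_count` with the advertised 5,620 samples and checking for fields that are missing from some rows are two quick sanity checks before relying on the data.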

Provenance

Source: Hugging Face user fresnellll
Collection Method: Likely curated for the purpose of model evaluation, as described in the associated research.
Freshness: Last updated 2026-06-03 06:23:18; freshness should be verified.

License is unknown; terms of use must be verified before application.

Quality Score

Overall: D38
Description: 39
Source: 39
Reputation: 40
Access: 26

Community

Downloads: 30
Likes: 1
Views: 0

Dataset Info

Author: fresnellll
Created: Jun 1, 2026
Updated: Jun 3, 2026
Last synced: Jun 11, 2026
