Name: Chimera Bench: 8,503 Multi-Step Reasoning Problems Across Four Domains
Creator: vectionlabs
Published: 2026-04-14T13:53:33
Keywords: Computer Science, Mathematics, Text, Reasoning Benchmark, Physics, Multi Step Problems

Description

VectionLabs created Chimera Bench V1, a benchmark dataset containing 8,503 articulated multi-step problems. The dataset is organized into four domains: 3,803 problems in MATH, 1,500 in CODE, 1,500 in SCIENCE, and the remainder in REASONING. The REASONING count is not stated directly; if the four domains partition the total without overlap, it is 1,700 (8,503 - 3,803 - 1,500 - 1,500). The dataset was last updated on Hugging Face on 2026-04-15.
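The per-domain figures can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes the four domains partition the dataset with no overlapping or unlabeled rows; the description does not confirm this, so the derived REASONING count is an inference, not a published figure.

```python
# Counts stated in the description; REASONING is not given directly.
TOTAL_PROBLEMS = 8503
stated_counts = {"MATH": 3803, "CODE": 1500, "SCIENCE": 1500}

# Assumption: every problem belongs to exactly one of the four domains.
reasoning_count = TOTAL_PROBLEMS - sum(stated_counts.values())
domain_counts = {**stated_counts, "REASONING": reasoning_count}

# Print each domain's count and its share of the total.
for domain, count in domain_counts.items():
    share = 100 * count / TOTAL_PROBLEMS
    print(f"{domain:<10}{count:>6}  {share:5.1f}%")
```

If the downloaded data shows a different REASONING count, the partition assumption does not hold and the domain labels deserve a closer look.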

Use Cases

Benchmarking large language model performance on multi-step word problems, using the MATH domain.
Evaluating code reasoning and algorithm design capabilities, using the CODE domain.
Testing scientific and physics-based reasoning, using the SCIENCE domain.
Training models for hybrid intelligence tasks, drawing on the dataset's articulated multi-step problem structure.

Strengths

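Contains 8,503 problems in total, which is described as larger than the GSM8K benchmark; GSM8K is itself roughly 8.5K problems, so the size claim should be checked against the split being compared.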
Covers four distinct reasoning domains: MATH, CODE, SCIENCE, and REASONING.
Includes 3,803 problems specifically in the MATH domain, focusing on multi-step word problems.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The REASONING row count is not stated explicitly and must be inferred from the total (about 1,700 if the domains do not overlap), which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
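Because the columns are undocumented, a first inspection pass after download might look like the sketch below. It assumes the rows have already been loaded as a list of dictionaries; the field names `domain`, `problem`, and `answer` and the sample values are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import Counter, defaultdict


def summarize_fields(records):
    """Report, per field, the Python types seen and how many values are empty."""
    types_seen = defaultdict(Counter)
    empty = Counter()
    for row in records:
        for field, value in row.items():
            types_seen[field][type(value).__name__] += 1
            # Treat None and the empty string as missing values.
            if value is None or value == "":
                empty[field] += 1
    return {
        field: {"types": dict(counts), "empty": empty[field]}
        for field, counts in types_seen.items()
    }


# Hypothetical rows standing in for the downloaded data.
sample = [
    {"domain": "MATH", "problem": "placeholder problem text", "answer": "42"},
    {"domain": "CODE", "problem": "placeholder problem text", "answer": ""},
    {"domain": "MATH", "problem": "placeholder problem text", "answer": None},
]

summary = summarize_fields(sample)

# Count rows per domain label to compare against the stated figures.
rows_per_domain = Counter(row.get("domain") for row in sample)

for field, info in summary.items():
    print(field, info)
print(dict(rows_per_domain))
```

Running the same summary on the real rows gives a quick view of which fields are populated and whether the per-domain row counts match the description.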

Provenance

Source: VectionLabs via Hugging Face.
Collection Method: Not documented; likely curated or synthesized for benchmarking purposes.
Time Range: Not specified.
Freshness: Last updated 2026-04-15 19:18:01; check the dataset page for later revisions before use.
Geography: Not specified.

License information is unknown; check the dataset page for usage restrictions.
