Description
VectionLabs created Chimera Bench V1, a benchmark dataset containing 8,503 articulated multi-step problems. The dataset is organized into four domains: 3,803 problems in MATH, 1,500 in CODE, 1,500 in SCIENCE, and an unspecified number in REASONING. It was last updated on the Hugging Face platform in April 2026.
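If the 8,503 total and the three stated domain counts are exact, the REASONING count can be inferred by subtraction:

```latex
N_{\text{REASONING}} = 8{,}503 - (3{,}803 + 1{,}500 + 1{,}500) = 8{,}503 - 6{,}803 = 1{,}700
```

This is a derived figure, not one given in the description, and should be confirmed against the downloaded data.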
Use Cases
Benchmarking large language models on multi-step word problems, using the MATH domain.
Evaluating code reasoning and algorithm design with the CODE domain's tasks.
Testing scientific and physics-based reasoning with the SCIENCE domain.
Training models for hybrid intelligence tasks, drawing on the dataset's articulated multi-step problem structure.
Strengths
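Contains 8,503 problems in total and is described as larger than the GSM8K benchmark, although GSM8K's train and test splits together contain 8,792 problems, so this comparison should be checked.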
Covers four distinct reasoning domains: MATH, CODE, SCIENCE, and REASONING.
Includes 3,803 problems specifically in the MATH domain, focusing on multi-step word problems.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row counts are given only for MATH, CODE, and SCIENCE; the REASONING count is not stated, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
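Because the columns are undocumented, a reasonable first pass after download is to list the fields that actually appear and tally rows per domain against the figures in the description. The sketch below assumes the rows are available as Python dictionaries (for example, by iterating a split loaded with the Hugging Face `datasets` library) and that the domain label lives in a column named `domain`; that column name and the `inspect_rows` helper are hypothetical, not part of the dataset's documented schema.

```python
from collections import Counter
from typing import Iterable, Mapping

# Per-domain counts as stated in the dataset description. REASONING is omitted
# because the description gives no figure for it.
DOCUMENTED_COUNTS = {"MATH": 3803, "CODE": 1500, "SCIENCE": 1500}
DOCUMENTED_TOTAL = 8503


def inspect_rows(rows: Iterable[Mapping[str, object]], domain_field: str = "domain") -> dict:
    """Summarize downloaded rows: observed fields, per-domain counts, and mismatches.

    `domain_field` is a hypothetical column name; the real schema is undocumented.
    """
    fields: set[str] = set()
    counts: Counter = Counter()
    total = 0
    for row in rows:
        total += 1
        fields.update(row.keys())  # collect every column name seen in any row
        counts[str(row.get(domain_field, "<missing>"))] += 1

    # Compare observed counts with the documented ones, only where a figure exists.
    mismatches = {
        domain: (expected, counts.get(domain, 0))
        for domain, expected in DOCUMENTED_COUNTS.items()
        if counts.get(domain, 0) != expected
    }
    return {
        "fields": sorted(fields),
        "total": total,
        "counts": dict(counts),
        "mismatches": mismatches,  # domain -> (documented, observed)
        "total_matches": total == DOCUMENTED_TOTAL,
    }
```

An empty `mismatches` dictionary and `total_matches == True` would indicate that the download agrees with the published counts; any `<missing>` key in `counts` suggests the domain label is stored under a different column name.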
Provenance
Source
VectionLabs via Hugging Face.
Collection Method
Likely curated or synthesized for benchmarking purposes.
Time Range
Not specified.
Freshness
Last updated 2026-04-15 19:18:01; freshness should be verified.
Geography
Not specified.
License
Unknown; check the dataset page for usage restrictions.