Name: Qwen3.5-2B-Base: Multi-Hop Reasoning Blind Spots Evaluation
Creator: chayma-rhaiem
Published: 2026-03-08T14:46:48
Keywords: Multihop Reasoning, Ai Benchmark, Benchmark, Text, Language Model, Graph, Reasoning Evaluation

Description

An evaluation dataset of 18 knowledge-graph-style reasoning probes run against the Qwen/Qwen3.5-2B-Base model. It was created by chayma-rhaiem and last updated on March 8, 2026. The dataset tests the model in its raw base form across three task groups: parametric memory, standard grounded reasoning, and advanced grounded reasoning.

Use Cases

Benchmarking parametric-memory performance with probes 1-10, which provide no passage.
Evaluating standard grounded reasoning with probes 11-15, which include a source passage.
Testing advanced reasoning that requires implicit inference or involves contradictions, using probes 16-18, which also include a source passage.
Identifying specific blind spots in the Qwen3.5-2B-Base model's multi-hop reasoning across 18 distinct tasks.
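Because the card describes the task groups only by probe number, a small helper can make that mapping explicit when slicing results. The sketch below is hypothetical: the category labels, the function names, and the assumption that probes are numbered 1-18 in the data are illustrative, not fields confirmed by the dataset.

```python
from collections import defaultdict

# Probe-number ranges as stated in the dataset description. The category
# labels are illustrative names, not values taken from the dataset itself.
CATEGORY_RANGES = [
    (range(1, 11), "parametric_memory"),   # probes 1-10, no passage
    (range(11, 16), "standard_grounded"),  # probes 11-15, passage provided
    (range(16, 19), "advanced_grounded"),  # probes 16-18, passage provided
]


def probe_category(probe_id: int) -> str:
    """Map a 1-based probe number to its reasoning category."""
    for ids, label in CATEGORY_RANGES:
        if probe_id in ids:
            return label
    raise ValueError(f"probe_id must be in 1..18, got {probe_id}")


def has_passage(probe_id: int) -> bool:
    """Probes 11-18 supply a source passage; probes 1-10 do not."""
    return probe_category(probe_id) != "parametric_memory"


def failures_by_category(results: dict[int, bool]) -> dict[str, list[int]]:
    """Group failed probe numbers by category.

    `results` is a hypothetical mapping of probe number -> whether the
    model's answer was judged correct; the dataset's real schema is undocumented.
    """
    grouped: dict[str, list[int]] = defaultdict(list)
    for probe_id, correct in sorted(results.items()):
        if not correct:
            grouped[probe_category(probe_id)].append(probe_id)
    return dict(grouped)
```

Keeping the ranges in one table means a change to the probe layout only needs to be made in one place; the grouping function then shows at a glance which task group a blind spot falls into.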

Strengths

Contains 18 distinct reasoning probes designed to test specific model capabilities.
Explicitly tests the model in its raw base form with no external knowledge graph attached.
Covers three reasoning categories: parametric memory, standard grounded, and advanced grounded.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it harder to judge whether the dataset suits a given evaluation.
Description metadata is sparse; data quality must be checked manually after download.

Provenance

Source: huggingface
Collection Method: Likely generated to evaluate the Qwen/Qwen3.5-2B-Base model's reasoning capabilities.
Freshness: Last updated 2026-03-08 15:22:27; freshness should be verified.

License is unknown; terms of use must be verified before application.
