Name: MPW-Bench: Tool-Integrated Reasoning Agent Benchmark in a Parallel World
Creator: LiAutoAISiliconLab
Published: 2026-04-09T09:22:35
Keywords: Agent Benchmark, Benchmark, Text, Tool Integrated Reasoning, Search Simulation, Synthetic Data

Description

LiAutoAISiliconLab created a benchmark for evaluating Tool-Integrated Reasoning (TIR) search agents, last updated on 2026-04-09. The benchmark uses a ParaWorld Engine that simulates a search engine grounded in fictional, future-situated facts, so that answers cannot be drawn from a model's parametric memory. This design is intended to eliminate data contamination concerns during agent evaluation.
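To make the idea of a search engine grounded in fictional facts concrete, here is a minimal Python sketch. It is not the ParaWorld Engine, whose implementation is not described here; the facts, the `Fact` class, the stopword list, and the `search` function are all hypothetical and chosen only for illustration. It assumes simple keyword-overlap ranking over a small invented fact base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    doc_id: str
    text: str


# Entirely invented facts (assumption): a stand-in for a fictional,
# future-situated fact base that no model could have memorized.
FACTS = [
    Fact("d1", "The Velmar Cup of 2041 was won by the Orin Tideglass team."),
    Fact("d2", "Orin Tideglass is headquartered in the city of Saphra."),
    Fact("d3", "Saphra hosted the 2039 Lantern Summit."),
]

# Hypothetical stopword list so that common words do not drive the ranking.
STOPWORDS = {"the", "of", "in", "is", "was", "by", "who", "a"}


def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stopwords."""
    words = {w.strip(".,?!").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def search(query, facts=FACTS, top_k=2):
    """Return up to top_k facts ranked by keyword overlap with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(f.text)), f) for f in facts]
    scored = [(score, f) for score, f in scored if score > 0]
    # Highest overlap first; ties broken by doc_id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1].doc_id))
    return [f for _, f in scored[:top_k]]
```

An agent answering "In which city is the 2041 Velmar Cup winner headquartered?" would have to call `search` twice, first to find the winner and then to find its headquarters, because neither fact exists outside the simulated world.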

Use Cases

Benchmarking agent reasoning performance through interactions with a simulated search engine.
Evaluating tool-integrated search agents in a controlled, fictional knowledge environment.
Testing whether agents answer from retrieved evidence rather than parametric memory, using a parallel-world simulation.

Strengths

Designed to eliminate data contamination concerns by using a fictional, future-situated fact base.
Provides a controlled environment for agent evaluation, intended to isolate results from model parametric memory.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
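
Because field names and row count are undocumented, a first inspection pass after download might look like the sketch below. It assumes the data can be read as JSON Lines with one object per line, which is not confirmed by the listing; the sample field names (`question`, `answer`, `hops`) are hypothetical placeholders, not the dataset's actual columns.

```python
import json
from collections import Counter, defaultdict


def summarize_records(lines):
    """Count rows and tally the value types seen for each field.

    `lines` is any iterable of JSON Lines strings (assumed format).
    """
    row_count = 0
    field_types = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        row_count += 1
        for key, value in record.items():
            field_types[key][type(value).__name__] += 1
    return row_count, {key: dict(counts) for key, counts in field_types.items()}


# Hypothetical sample rows; replace with lines read from the downloaded file.
sample = [
    '{"question": "q1", "answer": "a1", "hops": 2}',
    '{"question": "q2", "answer": null, "hops": 3}',
]
rows, schema = summarize_records(sample)
```

Fields whose type tallies include `NoneType` or mixed types are the ones most worth checking by hand before relying on them.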

Provenance

Source: LiAutoAISiliconLab via Hugging Face
Collection Method: Likely synthetically generated by a ParaWorld Engine to simulate a search environment.
Time Range: Future-situated fictional facts.
Freshness: Last updated 2026-04-09 09:49:14; freshness should be verified.
Geography: Not applicable; based on a fictional parallel world.

License restrictions are unknown and should be checked before use.
