LiAutoAISiliconLab created a benchmark for evaluating Tool-Integrated Reasoning (TIR) Search Agents, last updated on 2026-04-09. The benchmark uses a ParaWorld Engine that simulates a search engine grounded in fictional, future-situated facts, so that agents cannot answer from a model's parametric memory and must rely on search. This design is intended to eliminate data contamination concerns during agent evaluation.
License restrictions are unknown and should be checked before use.