Available on 1 platform
neulab collected 15,282 behavioral annotations of LLM and VLM reasoning traces. The dataset covers responses from 15 models across 6 benchmarks, and each row contains a correctness field and a JSON-encoded behavioral annotation. It was last updated on 2026-05-08.
License information is unknown and should be verified before use.
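
Because each row stores its behavioral annotation as a JSON string, the annotation has to be decoded before it can be inspected. The following is a minimal sketch of that step, not the dataset's documented schema: the column names `correct` and `annotation`, the `behaviors` key, and the sample rows are all hypothetical placeholders, and the correctness field is assumed to be truthy or falsy.

```python
import json


def parse_row(row, correctness_key="correct", annotation_key="annotation"):
    """Return (is_correct, annotation_dict) for a single row.

    The key names are assumptions; replace them with the dataset's
    actual column names once you have checked them.
    """
    annotation = json.loads(row[annotation_key])  # decode the JSON string
    return bool(row[correctness_key]), annotation


def accuracy(rows, correctness_key="correct"):
    """Fraction of rows whose correctness field is truthy (0.0 if empty)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(bool(r[correctness_key]) for r in rows) / len(rows)


# Made-up rows for illustration only; these are not taken from the dataset.
example_rows = [
    {"model": "model-a", "correct": True,
     "annotation": '{"behaviors": ["verification"]}'},
    {"model": "model-b", "correct": False,
     "annotation": '{"behaviors": []}'},
]

if __name__ == "__main__":
    for row in example_rows:
        is_correct, annotation = parse_row(row)
        print(row["model"], is_correct, annotation)
    print("accuracy:", accuracy(example_rows))
```

Keeping the key names as parameters means the same helpers can be pointed at the real columns without further changes once the schema is confirmed.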