Available on 1 platform
ShahzebKhoso hosts raw evaluation metrics, execution telemetry logs, and structural syntax outputs from running the Mostly Basic Python Problems (MBPP) benchmark against the StarCoder2 15B base model. The telemetry was captured from conversational evaluation loops and is meant to establish a baseline for the unaligned base model weights. The dataset was last updated on May 28, 2026.
The license is unknown, which may restrict usage.