Raw evaluation metrics, execution telemetry logs, and structural syntax outputs from running the Mostly Basic Python Problems (MBPP) benchmark against the StarCoder2 7B base model. The dataset documents how this mid-sized base model behaves in automated conversational evaluation workflows. It was authored by ShahzebKhoso and last updated on May 28, 2026.
The license is unknown; verify the terms of use before using the dataset.