Binary Build Prediction Metrics from TravisTorrent and GHALogs, 2013-2023
This dataset covers 175,706 builds drawn from two open-source CI/CD sources, TravisTorrent (100,000 builds, 2013–2017) and GHALogs (75,706 workflows, 2023), and forms the basis for evaluating machine learning models that predict build success. Created by Lalit Narayan Mishra and last updated in May 2026, it provides metrics for a study on preventing temporal data leakage in build prediction. The metrics show that removing leaky features reduces reported accuracy by up to 15.07 percentage points, revealing a more realistic performance baseline.
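To make the idea of temporal data leakage concrete, the following is a minimal Python sketch of two common safeguards: dropping features that are only known after a build finishes, and splitting data by time so that a model is never trained on builds newer than those it is tested on. The `Build` record, the feature names (`files_changed`, `build_duration_s`, `tests_failed`), and the 80/20 split are illustrative assumptions, not the dataset's actual schema or the study's exact procedure.

```python
from dataclasses import dataclass

# Hypothetical names: features assumed to be known only after a build ends.
LEAKY_FEATURES = {"build_duration_s", "tests_failed"}


@dataclass
class Build:
    started_at: int   # Unix timestamp of the build start
    features: dict    # feature name -> value
    passed: bool      # label: did the build succeed?


def drop_leaky(features: dict) -> dict:
    """Keep only features that are available before the build runs."""
    return {k: v for k, v in features.items() if k not in LEAKY_FEATURES}


def temporal_split(builds, train_fraction=0.8):
    """Train on the oldest builds and test on the newest ones."""
    ordered = sorted(builds, key=lambda b: b.started_at)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy records, invented purely for illustration.
builds = [
    Build(300, {"files_changed": 2, "build_duration_s": 40, "tests_failed": 0}, True),
    Build(100, {"files_changed": 9, "build_duration_s": 95, "tests_failed": 3}, False),
    Build(500, {"files_changed": 1, "build_duration_s": 38, "tests_failed": 0}, True),
    Build(200, {"files_changed": 4, "build_duration_s": 51, "tests_failed": 0}, True),
    Build(400, {"files_changed": 7, "build_duration_s": 88, "tests_failed": 2}, False),
]

train, test = temporal_split(builds)
clean_train = [drop_leaky(b.features) for b in train]
clean_test = [drop_leaky(b.features) for b in test]
```

A random shuffle before splitting would let later builds inform predictions about earlier ones, and features such as build duration encode the outcome itself. Either effect can inflate reported accuracy, which is the kind of gap the dataset's metrics quantify.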