Binary Build Prediction Metrics from TravisTorrent and GHALogs, 2013-2023
by Lalit Narayan Mishra · Updated 9d ago
5.5 KB, 1 file
Available on 1 platform
Description
This dataset draws on 175,706 builds from two open-source CI/CD build collections, TravisTorrent (100,000 Travis CI builds, 2013–2017) and GHALogs (75,706 GitHub Actions workflows, 2023), to evaluate machine learning models that predict build success. Created by Lalit Narayan Mishra and last updated in May 2026, it provides metrics for a study on preventing temporal data leakage in build prediction. The study reports that removing leaky features lowers reported accuracy by up to 15.07 percentage points, which gives a more realistic performance baseline.
Use Cases
Benchmarking build prediction models based on static pre-build project metadata.
Studying the impact of temporal data leakage on reported model accuracy.
Comparing predictive features across different CI/CD platforms like Travis CI and GitHub Actions.
Investigating the relationship between project maturity, build history, and build success.
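As a rough illustration of what avoiding temporal data leakage involves, the sketch below orders builds by start time before splitting them and keeps only features that would be known before a build runs. It is not the author's method: the field names (started_at, project_age_days, prior_failure_rate, build_duration_s, passed) and the toy records are invented for this example, since the listing does not document the actual columns.

```python
from datetime import datetime

# Hypothetical feature names; the real columns are not documented in the listing.
# Only values that are known before a build starts belong here.
PRE_BUILD_FEATURES = {"project_age_days", "prior_failure_rate"}


def select_pre_build(record):
    """Drop anything measured during or after the build (e.g. its duration)."""
    return {k: v for k, v in record.items() if k in PRE_BUILD_FEATURES}


def chronological_split(records, train_fraction=0.8):
    """Split so that every training build started before every held-out build."""
    ordered = sorted(records, key=lambda r: r["started_at"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Invented toy records, deliberately out of chronological order.
builds = [
    {"started_at": datetime(2016, 3, 1), "project_age_days": 540,
     "prior_failure_rate": 0.20, "build_duration_s": 410, "passed": False},
    {"started_at": datetime(2015, 1, 5), "project_age_days": 120,
     "prior_failure_rate": 0.10, "build_duration_s": 310, "passed": True},
    {"started_at": datetime(2017, 6, 9), "project_age_days": 1005,
     "prior_failure_rate": 0.25, "build_duration_s": 290, "passed": True},
    {"started_at": datetime(2015, 8, 20), "project_age_days": 347,
     "prior_failure_rate": 0.15, "build_duration_s": 505, "passed": True},
    {"started_at": datetime(2016, 11, 2), "project_age_days": 786,
     "prior_failure_rate": 0.22, "build_duration_s": 330, "passed": False},
]

train, holdout = chronological_split(builds)
X_train = [select_pre_build(r) for r in train]
y_train = [r["passed"] for r in train]
```

A random shuffle before splitting would let later builds inform predictions about earlier ones, and a feature such as build duration is only available once the outcome is already known; both are the kind of leakage that can inflate reported accuracy.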
Strengths
Draws on 175,706 build records from 2013–2017 (TravisTorrent) and 2023 (GHALogs).
Validates a methodology on two distinct data sources: TravisTorrent and GHALogs.
Provides concrete accuracy figures (e.g., 82.73% and 83.30%) for models using only legitimate pre-build features.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The row count of the evaluation metrics file is not stated, which makes suitability harder to assess before download.
The 5.5 KB file size suggests the dataset contains summary metrics, not the raw build data.
Provenance
Source
figshare, authored by Lalit Narayan Mishra.
Collection Method
Build data aggregated from the TravisTorrent and GHALogs datasets, which cover Travis CI and GitHub Actions respectively.
Time Range
2013 to 2017 (TravisTorrent) and 2023 (GHALogs)
Freshness
Last updated 2026-05-27 17:39:24; freshness should be verified.
Data is provided in XLS format; users will need compatible spreadsheet software or libraries.