RavenBuild: Context, Relevance, and Dependency Aware Build Outcome Prediction

Authors - Gengyi Sun, Sarra Habchi, Shane McIntosh
Venue - International Symposium on the Foundations of Software Engineering, pp. 45:1–45:22, 2024

Related Tags - FSE 2024 continuous integration build performance

Abstract - Continuous Integration (CI) is a common practice adopted by modern software organizations. It plays an especially important role for large corporations like Ubisoft, where thousands of build jobs are submitted daily. Indeed, the cadence of development progress is constrained by the pace at which CI services process build jobs. To provide faster CI feedback, recent work explores how build outcomes can be anticipated. Although early results show plenty of promise, the distinct characteristics of Project X—a AAA video game project at Ubisoft—present new challenges for build outcome prediction. In the Project X setting, changes that do not modify source code also incur build failures. We also observe that the code changes that have an impact that crosses the source-data boundary are more prone to build failures than code changes that do not impact data files. Since such changes are not fully characterized by the existing set of features for build outcome prediction, state-of-the-art models tend to underperform.

To incorporate the data context, we propose RavenBuild, a novel approach to build outcome prediction that leverages context-, relevance-, and dependency-aware features. In the Project X context, we observe that RavenBuild improves the F1-score of the failing class by 46%, the recall of the failing class by 76%, and the AUC by 28% with respect to the state-of-the-art BuildFast approach. To ease adoption in settings with heterogeneous project sets, we also provide a simplified alternative, RavenBuild-CR, which excludes dependency-aware features. We observe across-the-board improvements when RavenBuild-CR is applied to 22 open-source projects and Project X. On the other hand, we find that a naïve Parrot approach, which simply echoes the previous build outcome as its prediction, is surprisingly competitive with BuildFast and RavenBuild. Though Parrot fails to predict correctly when a build outcome differs from that of its immediate predecessor, Parrot serves well as a tendency indicator of the sequences in build outcome datasets. Thus, we recommend that future studies also compare to the Parrot approach as a baseline when evaluating build outcome prediction models.
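The Parrot baseline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`parrot_predict`, `failing_class_scores`), the boolean encoding (`True` = passing build), the default prediction for the first build, and the sample history are all assumptions made for illustration.

```python
from typing import List, Tuple


def parrot_predict(outcomes: List[bool], default: bool = True) -> List[bool]:
    """Echo the previous build's actual outcome as the prediction for each build.

    outcomes: chronological build outcomes (assumed encoding: True = pass, False = fail).
    default:  prediction for the first build, which has no predecessor (an assumption).
    """
    predictions = []
    previous = default
    for actual in outcomes:
        predictions.append(previous)  # predict before seeing the actual outcome
        previous = actual             # remember the outcome for the next build
    return predictions


def failing_class_scores(actual: List[bool], predicted: List[bool]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) with the failing build as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if not a and not p)  # failure predicted, build failed
    fp = sum(1 for a, p in pairs if a and not p)      # failure predicted, build passed
    fn = sum(1 for a, p in pairs if not a and p)      # pass predicted, build failed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up build history for illustration only (not data from the paper).
history = [True, True, False, False, False, True, False, True]
predictions = parrot_predict(history)
precision, recall, f1 = failing_class_scores(history, predictions)
print(predictions)
print(f"failing class: precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy history, Parrot is wrong exactly at the builds where the outcome flips relative to the preceding build, which matches the limitation the abstract notes. Runs of consecutive failures are what let it score on the failing class.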

Preprint - PDF

Bibtex

@inproceedings{sun2024fse,
  Author = {Gengyi Sun and Sarra Habchi and Shane McIntosh},
  Title = {{RavenBuild: Context, Relevance, and Dependency Aware Build Outcome Prediction}},
  Year = {2024},
  Booktitle = {Proc. of the International Symposium on the Foundations of Software Engineering (FSE)},
  Pages = {45:1--45:22}
}