Abstract - Continuous Integration (CI) is a common practice adopted by modern software organizations. It plays an especially important role for large corporations like Ubisoft, where thousands of build jobs are submitted daily. The CI process of video games, which are developed by studios like Ubisoft, involves assembling artifacts produced by personnel with various types of expertise, such as source code produced by developers, graphics produced by artists, and audio produced by musicians and sound experts. To weave these artifacts into a cohesive system, the build system, a key component in CI, processes each artifact while respecting its intra- and inter-artifact dependencies. In such projects, a change produced by any team can impact artifacts from other teams, and may cause defects if the transitive impact of changes is not carefully considered.
Therefore, to better understand the potential challenges and opportunities presented by multidisciplinary software projects, we conduct an empirical study of a recently launched video game project. The study reveals that code files make up only 2.8% of the nodes in the build dependency graph, and code-to-code dependencies make up only 4.3% of all dependencies. We also observe that the impact of 44% of the studied source code changes crosses disciplinary boundaries, highlighting the importance of analyzing inter-artifact dependencies. A comparative analysis of cross-boundary changes against changes that do not cross boundaries indicates that cross-boundary changes: (1) impact a median of 120,368 files; (2) have a 51% probability of causing build failures; and (3) have a 67% likelihood of introducing defects. All three measurements are larger than those of changes that do not cross boundaries, to statistically significant degrees. We also find that cross-boundary changes are: (4) more commonly associated with gameplay functionality and feature additions that directly impact the game experience than changes that do not cross boundaries, and (5) disproportionately produced by a single team (74% of the contributors of cross-boundary changes are associated with that team).
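Deciding whether a code change "crosses the boundary" amounts to following its transitive impact through the build dependency graph. The sketch below is illustrative only and is not the study's implementation. It assumes that the graph is stored as a hypothetical reverse-dependency map (each file mapped to the files that depend on it), and that a file counts as code purely by its extension, using a hypothetical `CODE_EXTENSIONS` set. Under those assumptions, a change crosses the boundary if any transitively impacted file is a non-code artifact.

```python
from collections import deque

# Hypothetical classification rule: files with these extensions count as code;
# everything else (textures, audio, levels, ...) is treated as a data artifact.
CODE_EXTENSIONS = (".cpp", ".h", ".cs")


def is_code(path: str) -> bool:
    """Return True if the file is treated as source code (assumed extension-based)."""
    return path.endswith(CODE_EXTENSIONS)


def impacted_files(dependents: dict[str, list[str]], changed: set[str]) -> set[str]:
    """Breadth-first search along reverse-dependency edges (file -> files that
    depend on it), returning every file transitively affected by the change,
    excluding the changed files themselves."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen - changed


def crosses_boundary(dependents: dict[str, list[str]], changed: set[str]) -> bool:
    """A change crosses the source-data boundary if any impacted file is not code."""
    return any(not is_code(f) for f in impacted_files(dependents, changed))


if __name__ == "__main__":
    # Toy graph with hypothetical file names.
    graph = {
        "src/player.cpp": ["src/game.cpp"],
        "src/game.cpp": ["levels/forest.level"],
        "src/util.cpp": ["src/tools.cpp"],
    }
    print(crosses_boundary(graph, {"src/player.cpp"}))  # reaches a level file
    print(crosses_boundary(graph, {"src/util.cpp"}))    # stays within code
```

The size of the set returned by `impacted_files` corresponds to the kind of "number of impacted files" measurement reported above. A real build graph would need a richer artifact taxonomy than file extensions.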
Next, we set out to explore whether analysis of cross-boundary changes can be leveraged to accelerate CI. Indeed, the cadence of development progress is constrained by the pace at which CI services process build jobs. To provide faster CI feedback, recent work explores how build outcomes can be anticipated. Although early results show plenty of promise, prior work on build outcome prediction has largely focused on open-source projects that are code-intensive, while the distinct characteristics of an AAA video game project at Ubisoft present new challenges and opportunities for build outcome prediction. In the video game setting, changes that do not modify source code also incur build failures. Moreover, we find that code changes whose impact crosses the source-data boundary are more prone to build failures than code changes that do not impact data files. Since such changes are not fully characterized by the existing set of build outcome prediction features, state-of-the-art models tend to underperform.
Therefore, to incorporate the data context into build outcome prediction, we propose RavenBuild, a novel approach that leverages context, relevance, and dependency-aware features. We apply the state-of-the-art BuildFast model and RavenBuild to the video game project, and observe that RavenBuild improves the F1-score of the failing class by 46%, the recall of the failing class by 76%, and AUC by 28%. To ease adoption in settings with heterogeneous project sets, we also provide a simplified alternative, RavenBuild-CR, which excludes dependency-aware features. We apply RavenBuild-CR to 22 open-source projects and the video game project, and observe across-the-board improvements as well. On the other hand, we find that a naïve Parrot approach, which simply echoes the previous build outcome as its prediction, is surprisingly competitive with BuildFast and RavenBuild. Although Parrot fails whenever a build outcome differs from its immediate predecessor, it serves well as an indicator of the sequential tendencies in build outcome datasets. Therefore, future studies should also compare against the Parrot approach as a baseline when evaluating build outcome prediction models.
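Because the Parrot baseline is fully specified by its description (echo the previous build outcome), it can be sketched in a few lines alongside the failing-class precision, recall, and F1-score mentioned above. This is a minimal sketch, not the study's code. It assumes outcomes are the hypothetical labels `"pass"` and `"fail"` in chronological order, and it skips the first build, since that build has no predecessor to echo. How the original evaluation handles that first build is not stated.

```python
def parrot_predict(outcomes: list[str]) -> list[str]:
    """Parrot baseline: the prediction for build i is the outcome of build i-1.
    The first build has no predecessor, so predictions cover outcomes[1:]."""
    return outcomes[:-1]


def failing_class_scores(actual: list[str], predicted: list[str],
                         fail: str = "fail") -> tuple[float, float, float]:
    """Precision, recall, and F1-score, treating the failing class as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == fail and p == fail for a, p in pairs)
    fp = sum(a != fail and p == fail for a, p in pairs)
    fn = sum(a == fail and p != fail for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical chronological build history.
    history = ["pass", "pass", "fail", "fail", "pass", "fail", "fail", "pass"]
    print(failing_class_scores(history[1:], parrot_predict(history)))
```

On this toy history, Parrot gets every repeated outcome right and every transition wrong, so precision, recall, and F1-score for the failing class all come out to 0.5. This illustrates why Parrot scores well on datasets where consecutive outcomes tend to repeat, yet misses exactly the pass-to-fail transitions.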