Repeated Builds During Code Review: An Empirical Study of the OpenStack Community

Authors - Rungroj Maipradit, Dong Wang, Patanamon Thongtanunam, Raula Gaikovina Kula, Yasutaka Kamei, Shane McIntosh
Venue - International Conference on Automated Software Engineering, pp. 153–165, 2023

Related Tags - ASE 2023, continuous integration, code review, resource waste

Abstract - Code review is a popular practice where developers critique each other's changes. Since automated builds can identify low-level issues (e.g., syntactic errors, regression bugs), it is not uncommon for software organizations to incorporate automated builds in the code review process. In such code review deployment scenarios, submitted change sets must be approved for integration by both peer code reviewers and automated build bots. Since automated builds may produce an unreliable signal of the status of a change set (e.g., due to 'flaky' or non-deterministic execution behaviour), code review tools, such as Gerrit, allow developers to request a 'recheck', which repeats the build process without updating the change set. We conjecture that an unconstrained recheck command will waste time and resources if it is not applied judiciously. To explore how the recheck command is applied in a practical setting, in this paper, we conduct an empirical study of 66,932 code reviews from the OpenStack community.

We quantitatively analyze (i) how often build failures are rechecked; (ii) the extent to which invoking recheck changes build failure outcomes; and (iii) how much waste is generated by invoking recheck. We observe that (i) 55% of code reviews invoke the recheck command after a failing build is reported; (ii) invoking the recheck command only changes the outcome of a failing build in 42% of the cases; and (iii) invoking the recheck command increases review waiting time by an average of 2,200% and equates to 187.4 compute years of waste, enough compute resources to compete with the oldest living land animal on earth.

Our observations indicate that the recheck command is frequently used after the builds fail, but does not achieve a high likelihood of build success. Based on a developer survey and our history-based quantitative findings, we encourage reviewer teams to think twice before rechecking and be considerate of waste. While recheck currently generates plenty of wasted computational resources and bloats waiting times, it also presents exciting future opportunities for researchers and tool builders to propose solutions that can reduce waste.
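
The three quantities named in the abstract (recheck frequency, outcome change rate, and waste) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Review` record, its field names, the choice of denominators, and the 365-day year are hypothetical and are not taken from the study's dataset or method. The toy records are made up and do not reproduce the reported figures.

```python
from dataclasses import dataclass
from typing import List

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a 365-day year


@dataclass
class Review:
    # Hypothetical record; field names are illustrative only.
    had_failing_build: bool
    rechecked_after_failure: bool
    passed_after_recheck: bool


def recheck_rate(reviews: List[Review]) -> float:
    """Share of reviews with a failing build that invoked recheck (assumed denominator)."""
    failing = [r for r in reviews if r.had_failing_build]
    if not failing:
        return 0.0
    return sum(r.rechecked_after_failure for r in failing) / len(failing)


def outcome_change_rate(reviews: List[Review]) -> float:
    """Share of rechecked failures whose build outcome changed to passing."""
    rechecked = [r for r in reviews if r.rechecked_after_failure]
    if not rechecked:
        return 0.0
    return sum(r.passed_after_recheck for r in rechecked) / len(rechecked)


def percent_increase(baseline: float, observed: float) -> float:
    """Relative increase of observed over baseline, in percent."""
    return (observed - baseline) / baseline * 100.0


def compute_years(total_build_seconds: float) -> float:
    """Convert accumulated build time in seconds to compute years."""
    return total_build_seconds / SECONDS_PER_YEAR


# Made-up toy records, not data from the paper.
toy_reviews = [
    Review(True, True, True),
    Review(True, True, False),
    Review(True, False, False),
    Review(False, False, False),
]
```

Under this definition of percent increase, a 2,200% increase corresponds to a waiting time 23 times the baseline, since `percent_increase(1.0, 23.0)` evaluates to `2200.0`.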

Preprint - PDF

Bibtex

@inproceedings{maipradit2023ase,
  Author = {Rungroj Maipradit and Dong Wang and Patanamon Thongtanunam and Raula Gaikovina Kula and Yasutaka Kamei and Shane McIntosh},
  Title = {{Repeated Builds During Code Review: An Empirical Study of the OpenStack Community}},
  Year = {2023},
  Booktitle = {Proc. of the International Conference on Automated Software Engineering (ASE)},
  Pages = {153--165}
}