Studying Practical Challenges of Automated Code Review Suggestions

Authors - Farshad Kazemi
Venue - University of Waterloo, pp. 1-155, 2024

Related Tags - Theses, 2024, code review, software quality

Abstract - Code review is a critical step in software development, focusing on systematic source code inspection. It identifies potential defects and enhances code quality, maintainability, and knowledge sharing among developers. Despite its benefits, it is time-consuming and error-prone. Therefore, approaches such as Code Reviewer Recommendation (CRR) have been proposed to streamline the process. However, when deployed in real-world scenarios, they often fail to account for various complexities, making them impractical or even harmful. This thesis aims to identify and address challenges at various stages of the code review process: validity of recommendations, quality of the recommended reviewers, and the necessity and usefulness of CRR approaches considering emerging alternative automation. We approach these challenges in three empirical studies presented in three chapters of this thesis.

First, we empirically explore the validity of the recommended reviewers by measuring the rate of stale reviewers, i.e., those who no longer contribute to the project. We observe that stale recommendations make up a considerable portion of the suggestions provided by CRR approaches: up to 33.33% of the recommendations, with a median share of 8.30%. Based on our analysis, we suggest separating the reviewer contribution recency from the other factors used by the CRR objective function. The proposed filter reduces the staleness of recommendations, with a Staleness Reduction Ratio (SRR) between 21.44% and 92.39%.
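
As a rough illustration of the idea of treating recency as a separate filter, the sketch below removes stale reviewers from an already ranked recommendation list and reports the share of stale suggestions. It is a minimal sketch under stated assumptions, not the thesis's implementation: the 180-day threshold, the function names, and the choice to treat reviewers with no recorded activity as stale are all hypothetical, and the abstract does not give the SRR formula, so it is not computed here.

```python
from datetime import date, timedelta

# Hypothetical threshold: the abstract does not say how long a reviewer
# must be inactive before counting as stale.
STALE_AFTER = timedelta(days=180)


def is_stale(last_activity, today, stale_after=STALE_AFTER):
    """Return True if the reviewer has no recent recorded activity.

    Assumption: a reviewer with no recorded activity (None) is stale.
    """
    if last_activity is None:
        return True
    return today - last_activity > stale_after


def stale_share(recommended, last_activity, today):
    """Fraction of recommended reviewers that are stale (0.0 if empty)."""
    if not recommended:
        return 0.0
    stale = sum(is_stale(last_activity.get(r), today) for r in recommended)
    return stale / len(recommended)


def filter_stale(recommended, last_activity, today):
    """Drop stale reviewers while keeping the recommender's ranking order."""
    return [r for r in recommended if not is_stale(last_activity.get(r), today)]


# Illustrative data only; names and dates are made up.
today = date(2024, 9, 1)
last_activity = {
    "alice": date(2024, 8, 20),
    "bob": date(2023, 1, 5),
    "carol": date(2024, 6, 1),
}
ranked = ["bob", "alice", "carol"]  # output of some CRR approach
filtered = filter_stale(ranked, last_activity, today)
```

Keeping the filter outside the scoring function means the recency rule can be tuned or replaced without retraining or re-weighting the recommender itself.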

While the first study assesses the validity of the recommendations, it does not measure their quality or potential unintended impacts. Therefore, we next probe the potential unintended consequences of assigning recommended reviewers. To this end, we study the impact of assigning recommended reviewers without considering the safety of the submitted changeset. We observe that existing approaches tend to improve one or two quantities of interest while degrading others. We devise an enhanced approach, Risk Aware Recommender (RAR), which increases project safety by predicting changeset bug proneness.

Given the evolving landscape of automation in code review, our final study examines whether human reviewers and, hence, recommendation tools are still beneficial to the review process. To this end, we focus on the behaviour of Review Comment Generators (RCGs), models trained to automate code review tasks, as a potential way to replace humans in the code review process. Our quantitative and qualitative study of RCG-generated interrogative comments shows that RCG-generated and human-submitted comments differ in mood, i.e., whether the comment is declarative or interrogative. Our qualitative analysis of sampled comments demonstrates that RCG-generated interrogative comments suffer from limitations in the RCGs' capacity to communicate. Our observations show that neither task-specific RCGs nor LLM-based ones can fully replace humans in asking questions. Therefore, practitioners can still benefit from using code review tools.

In conclusion, our findings highlight the need for further support of human participants in the code review process. Thus, we advocate for the improvement of code review tools and approaches, particularly code review recommendation approaches. Furthermore, tool builders can use our observations and proposed methods to address two critical aspects of existing CRR approaches.

Preprint - PDF

Bibtex

@phdthesis{kazemi2024phd,
  Author = {Farshad Kazemi},
  Title = {{Studying Practical Challenges of Automated Code Review Suggestions}},
  Year = {2024},
  School = {University of Waterloo},
  Address = {200 University Ave. W., Waterloo, ON, Canada},
  Month = {September}
}