Interrogative Comments Posed by Review Comment Generators: An Empirical Study of Gerrit

Authors - Farshad Kazemi, Maxime Lamothe, Shane McIntosh
Venue - International Symposium on Empirical Software Engineering and Measurement (ESEM), to appear, 2025

Related Tags - ESEM 2025, code review

Abstract - Background: Review Comment Generators (RCGs) are models trained to automate code review tasks. Prior work shows that RCGs can generate review comments to initiate discussion threads; however, their ability to interact with author responses is unclear. This can be especially problematic if RCGs pose interrogative comments, i.e., comments that ask questions of other review participants.

Aims: We set out to study the prevalence of RCG-generated interrogative code review comments, their similarity to the interrogative comments written by humans, and the predictability with which RCGs generate such comments.

Method: We study three task-specific RCGs and three RCGs based on Large Language Models (LLMs) on data from the Gerrit project using quantitative and qualitative methods.

Results: We find that RCGs: (1) generate interrogative comments at rates of 15.6%–65.26%; (2) differ from humans in how they pose such comments, which can stifle conversations if RCGs dissuade human reviewers from commenting deeply; and (3) produce interrogative comments with low predictability. Finally, we find that (4) the interrogative comments posed by LLM-based RCGs can differ even more substantially from human behaviour than those of task-specific RCGs do. For example, the studied LLM-based RCGs pose rhetorical questions 3.16% of the time, whereas human-submitted interrogative comments are rhetorical 8.74% of the time.

Conclusions: Our results suggest that neither task-specific nor LLM-based RCGs can yet replace human reviewers; however, we note opportunities for synergy. For example, RCGs raise pertinent questions about the exception handling of common APIs more frequently than human reviewers do. Placing greater emphasis on the technical comments that RCGs generate (rather than conversational ones, such as interrogative comments) will likely improve their perceived usefulness.

Preprint - PDF

Bibtex

@inproceedings{kazemi2025esem,
  Author = {Farshad Kazemi and Maxime Lamothe and Shane McIntosh},
  Title = {{Interrogative Comments Posed by Review Comment Generators: An Empirical Study of Gerrit}},
  Year = {2025},
  Booktitle = {Proc. of the International Symposium on Empirical Software Engineering and Measurement (ESEM)},
  Pages = {To appear}
}