Can Duplicate Questions on Stack Overflow Benefit the Software Development Community?

Best Student Presentation of the Mining Challenge track
Authors - Durham Abric, Oliver E. Clark, Matthew Caminiti, Keheliya Gallaba, Shane McIntosh
Venue - International Conference on Mining Software Repositories, Mining Challenge, pp. 230–234, 2019

Related Tags - MSR 2019 anti-patterns

Abstract - Duplicate questions on Stack Overflow are questions that are flagged as being conceptually equivalent to a previously posted question. Stack Overflow suggests that duplicate questions should not be discussed by users, but rather that attention should be redirected to their previously posted counterparts. Roughly 53% of closed Stack Overflow posts are closed due to duplication. Despite their supposed overlapping content, user activity suggests duplicates may generate additional or superior answers. Approximately 9% of duplicates receive more views than their original counterparts despite being closed.

In this paper, we analyze duplicate questions from two perspectives. First, we analyze the experience of those who post duplicates using activity and reputation-based heuristics. Second, we compare the content of duplicates in terms of both their questions and their answers to determine the degree of similarity between each duplicate pair. Through analysis of the MSR challenge dataset, we find that although duplicate questions are more likely to be created by inexperienced users, they often receive answers that are dissimilar to those of their original counterparts. Indeed, supplementary textual analysis using Natural Language Processing (NLP) techniques suggests duplicate questions provide additional information about the underlying concepts being discussed. We recommend that Stack Overflow's duplication policy be revised to account for the benefits that leaving duplicate questions open may have for the developer community.

Preprint - PDF


  Author = {Durham Abric and Oliver E. Clark and Matthew Caminiti and Keheliya Gallaba and Shane McIntosh},
  Title = {{Can Duplicate Questions on Stack Overflow Benefit the Software Development Community?}},
  Year = {2019},
  Booktitle = {Proc. of the International Conference on Mining Software Repositories (MSR)},
  Pages = {230--234}