Identifying and Understanding Header File Hotspots in C/C++ Build Processes

Authors - Shane McIntosh, Bram Adams, Meiyappan Nagappan, Ahmed E. Hassan
Venue - Automated Software Engineering, Vol. 23, No. 4, pp. 619-647, 2016

Abstract - Software developers rely on a fast build system to incrementally compile their source code changes and produce modified deliverables for testing and deployment. Header files, which tend to trigger slow rebuild processes, are most problematic if they also change frequently during the development process, and hence, need to be rebuilt often. In this paper, we propose an approach that analyzes the build dependency graph (i.e., the data structure used to determine the minimal list of commands that must be executed when a source code file is modified), and the change history of a software system to pinpoint header file hotspots: header files that change frequently and trigger long rebuild processes. Through a case study on the GLib, PostgreSQL, Qt, and Ruby systems, we show that our approach identifies header file hotspots that, if improved, will provide greater improvement to the total future build cost of a system than just focusing on the files that trigger the slowest rebuild processes, change the most frequently, or are used the most throughout the codebase. Furthermore, regression models built using architectural and code properties of source files can explain 32%-57% of these hotspots, identifying subsystems that are particularly hotspot-prone and would benefit the most from architectural refinement.
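Illustration - The abstract's core idea, combining rebuild cost from the build dependency graph with change frequency from the version history, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy graph, the per-target costs, and the scoring rule (number of changes multiplied by the triggered rebuild cost) are all hypothetical choices made for illustration.

```python
def rebuild_cost(header, dependents, cost):
    """Sum the cost of every target transitively triggered by changing `header`.

    `dependents` maps a file to the files/targets that must be rebuilt when it
    changes (an assumed, simplified form of a build dependency graph).
    `cost` maps a target to an assumed rebuild time in seconds.
    """
    seen = set()
    stack = [header]
    while stack:
        node = stack.pop()
        for nxt in dependents.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    # Files without an entry in `cost` (e.g. intermediate headers) count as 0.
    return sum(cost.get(target, 0.0) for target in seen)


def rank_hotspots(dependents, cost, change_counts):
    """Rank headers by (number of changes) x (rebuild cost), highest first.

    The product is an assumed scoring rule chosen for this sketch.
    """
    scores = {
        header: changes * rebuild_cost(header, dependents, cost)
        for header, changes in change_counts.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: three headers, three object files.
dependents = {
    "config.h": ["util.h", "main.o"],
    "util.h": ["util.o", "main.o"],
    "rare.h": ["util.o", "main.o", "net.o"],
}
cost = {"util.o": 2.0, "main.o": 3.0, "net.o": 4.0}
change_counts = {"config.h": 10, "util.h": 4, "rare.h": 1}

ranking = rank_hotspots(dependents, cost, change_counts)
for header, score in ranking:
    print(f"{header}: {score}")
```

In this toy data, rare.h triggers the most expensive rebuild but changes only once, so it ranks last. config.h triggers a cheaper rebuild but changes often, so it ranks first. This mirrors the abstract's point that looking only at the slowest rebuilds or only at the most frequent changes can miss the files that dominate future build cost.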

Preprint - PDF

Bibtex

@article{mcintosh2016ause,
  Author = {Shane McIntosh and Bram Adams and Meiyappan Nagappan and Ahmed E. Hassan},
  Title = {{Identifying and Understanding Header File Hotspots in C/C++ Build Processes}},
  Year = {2016},
  Journal = {Automated Software Engineering},
  Volume = {23},
  Number = {4},
  Pages = {619--647}
}