[Master's Thesis] Incomplete change detection based on co-change graph structures

Kaminaga, a second-year master's student (M2) in the Kobayashi Lab, has submitted a master's thesis.

Title: Incomplete change detection based on co-change graph structures
Abstract:

As software grows in scale, developers find it harder to track how far a change propagates, so some required changes are overlooked and bugs result. To help developers reduce such oversights and eliminate these bugs at an early stage, many dynamic analysis-based techniques for detecting the impact of changes have been proposed. These approaches typically find and recommend likely locations of change propagation according to the strength of association rules extracted from the change history.
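
As a rough illustration of recommendation by association-rule strength, the sketch below mines single-file rules `x -> y` from a change history and ranks candidate files by confidence. The history, the file names, the helper names `mine_rules` and `recommend`, and the `min_conf` threshold are all hypothetical; the existing techniques referred to above may define and filter rules differently.

```python
from collections import Counter
from itertools import permutations

# Hypothetical change history: each commit is the set of files it touched.
history = [
    {"a.py", "b.py"},
    {"a.py", "b.py", "c.py"},
    {"a.py", "c.py"},
    {"b.py", "d.py"},
]

def mine_rules(commits):
    """Return {(x, y): (support, confidence)} for single-file rules x -> y."""
    file_count = Counter()   # number of commits touching each file
    pair_count = Counter()   # number of commits touching both x and y
    for files in commits:
        file_count.update(files)
        pair_count.update(permutations(sorted(files), 2))
    return {
        (x, y): (n, n / file_count[x])
        for (x, y), n in pair_count.items()
    }

def recommend(rules, changed, min_conf=0.5):
    """Rank files outside `changed` by the best confidence of a rule
    whose antecedent is one of the changed files."""
    best = {}
    for (x, y), (_, conf) in rules.items():
        if x in changed and y not in changed and conf >= min_conf:
            best[y] = max(best.get(y, 0.0), conf)
    # Highest confidence first; ties broken by file name.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))

rules = mine_rules(history)
print(recommend(rules, {"a.py"}))
```

Here support is the number of commits containing both files, and confidence is that count divided by the number of commits containing the antecedent file.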

However, it has also been shown that developers often bundle unrelated changes into a single commit, creating so-called composite commits in which changes with different intents are mixed. Hence, if changes are recommended directly from the strength of association rules extracted from whole commits, without considering the relationships among the changes inside each commit, recommendation accuracy may suffer.

To address this problem, we first took a dataset of actual incomplete changes, converted its commits into co-change graphs, and investigated their graph structures. We then proposed an approach that improves recommendation accuracy by using the positional features of the overlooked files found in that investigation.
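
The abstract does not spell out how a commit is converted into a co-change graph, so the following is only a minimal sketch under an assumed construction: nodes are the files of one commit, and two files are linked when they co-changed in at least `min_support` past commits. The function names, the threshold, and the sample data are hypothetical. Connected components then stand in for parts of the commit with different intents.

```python
from itertools import combinations

def build_cochange_graph(commit_files, history, min_support=1):
    """Adjacency dict over the commit's files; an edge means the pair
    co-changed in at least `min_support` past commits (assumed rule)."""
    graph = {f: set() for f in commit_files}
    for x, y in combinations(sorted(commit_files), 2):
        support = sum(1 for past in history if x in past and y in past)
        if support >= min_support:
            graph[x].add(y)
            graph[y].add(x)
    return graph

def connected_components(graph):
    """Return the components as sets of files, largest first."""
    seen, parts = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            node = stack.pop()
            if node in part:
                continue
            part.add(node)
            stack.extend(graph[node] - part)
        seen |= part
        parts.append(part)
    return sorted(parts, key=len, reverse=True)

# Hypothetical past commits and one composite commit mixing two intents.
history = [
    {"ui.py", "view.py"},
    {"view.py", "style.css"},
    {"db.py", "schema.sql"},
]
commit = {"ui.py", "view.py", "style.css", "db.py", "schema.sql"}

graph = build_cochange_graph(commit, history)
parts = connected_components(graph)
print(parts)
```

In this toy commit the UI files and the database files fall into separate components, which is the kind of internal structure the investigation looks at.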

From our investigation of 446 graphs constructed from 18 projects, we observed that incomplete changes have a loose internal structure. In about half of the graphs, the overlooked changes were associated with only a small portion of the commit; in the other half, they were associated with multiple portions of the commit that have different intents. Most of the overlooked files can be found by checking around the central file identified by a centrality measure. In addition, overlooked files show no tendency to appear specifically in large changes or in small changes, but an overlooked file is found in the largest change with high probability.
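
The idea of checking around the central file can be sketched as follows. The abstract does not say which centrality measure was used, so degree centrality is an assumption here, as are the sample graph, the helper names, and the one-hop radius.

```python
from collections import deque

def degree_centrality(graph):
    """Degree of each node divided by (n - 1), the maximum possible degree."""
    n = len(graph)
    return {f: (len(nbrs) / (n - 1) if n > 1 else 0.0)
            for f, nbrs in graph.items()}

def within_hops(graph, source, max_hops):
    """Breadth-first search returning {file: distance} up to `max_hops`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the search radius
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

# Hypothetical co-change graph given as an adjacency dict.
graph = {
    "core.py": {"api.py", "util.py", "test_core.py"},
    "api.py": {"core.py", "docs.md"},
    "util.py": {"core.py"},
    "test_core.py": {"core.py"},
    "docs.md": {"api.py"},
}

centrality = degree_centrality(graph)
# Iterate in sorted order so ties resolve deterministically by file name.
central = max(sorted(centrality), key=centrality.get)
near = within_hops(graph, central, max_hops=1)
print(central, sorted(near))
```

Files in `near` are the candidates to inspect first; widening `max_hops` trades a larger search area for more coverage.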

Based on these findings, we proposed two recommendation approaches: one that focuses only on the major changes, and one that integrates the superior recommendation results from multiple parts of the commit with different intents. We evaluated both approaches, and the results showed that both achieve higher precision than the existing approaches.