[Master's Thesis] Deep Learning based Semantics Model for Software Defect Prediction

On February 4, Li, a second-year master's student in the Kobayashi Lab, presented their master's thesis.

Thesis title: Deep Learning based Semantics Model for Software Defect Prediction
Abstract:

Software defect prediction (SDP) is an important technique that helps developers focus on modules that are more likely to contain defects. However, traditional methods rely mainly on statistical metrics and ignore the semantics and syntax of source code. Although existing deep learning-based approaches can generate semantic features, they cannot learn the semantics of an identical token under different contexts. Besides, due to the limited amount of training data, deep learning-based models may not achieve their full predictive capability.
In this thesis, to extract the semantics of source code more deeply, we propose a technique that extracts semantic features for SDP using a BERT model. We pretrain the BERT model on a large code corpus collected from BigCode. We then convert token sequences into embedded token vectors with our BERT model and use a Bidirectional Long Short-Term Memory network (BiLSTM) to learn contextual information among tokens. After that, we apply max pooling and average pooling to generate features; both reduce the complexity of our model and thus relieve the overfitting problem. In addition, an attention mechanism is adopted to extract information from important nodes. To evaluate our approach, we divide the experiments into two types: Within-Project Defect Prediction (WPDP) and Cross-Project Defect Prediction (CPDP). We compare our model with two other deep learning-based models, a Convolutional Neural Network (CNN) and a BiLSTM; for classification we adopt a logistic regression classifier. Besides, we also compare our model with two statistical-metrics-based models; all comparisons are conducted in both the WPDP and CPDP settings. To further relieve overfitting, we adopt a data augmentation strategy to generate more training data. Apart from that, we also compare our pretrained BERT model with a Word2vec model. Finally, we evaluate two data processing methods, full-token and AST-node, by varying the sequence coverage length on each project from 50% to 90% in both the WPDP and CPDP experiments.
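To give a sense of the feature-generation step described above, here is a minimal sketch in plain Python of combining per-token vectors by element-wise max pooling and average pooling. It assumes the token vectors have already been produced by the BERT + BiLSTM encoder; the function name and toy dimensions are illustrative, not taken from the thesis.

```python
def pool_features(token_vectors):
    """Combine per-token embedding vectors into one fixed-size feature
    vector by element-wise max pooling and average pooling, then
    concatenate the two results (a sketch of the abstract's
    feature-generation step; names and dimensions are illustrative).

    token_vectors: list of equal-length lists of floats,
    e.g. one BiLSTM hidden state per token.
    """
    max_pool, avg_pool = [], []
    # Transpose so we iterate over embedding dimensions, not tokens.
    for dim in zip(*token_vectors):
        max_pool.append(max(dim))
        avg_pool.append(sum(dim) / len(dim))
    return max_pool + avg_pool  # concatenated feature vector

# Example: three tokens with 2-dimensional embeddings
feats = pool_features([[1.0, 4.0], [3.0, 2.0], [2.0, 0.0]])
# → [3.0, 4.0, 2.0, 2.0] (max pooling, then average pooling)
```

Because both pooled vectors have the embedding dimensionality regardless of sequence length, the concatenated feature vector is fixed-size, which is what allows a simple classifier such as logistic regression to be trained on top of it.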
The results show that in WPDP, the average F1 score of our proposed BERT-based fine-tuning model over the ten projects is 8.8%, 2.3%, 9%, and 7% higher than that of BiLSTM, CNN, and the two traditional models, respectively. For CPDP, the corresponding improvements in F1 score are 12.6%, 4.7%, 10.2%, and 7%. The data augmentation strategy improves the F1 score by 0.54. However, our BERT embedding does not outperform Word2vec. Finally, we found that the full-token data type achieves better performance than AST-node, as it preserves more of the semantics of the sequences.