Proceedings of ISP RAS, 2024 Volume 36, Issue 6, Pages 19–38 (Mi tisp937)

Class balancing approaches to improve for software defect prediction estimations

Á. J. Sánchez-García, X. Limón, S. Domínguez-Isidro, D. J. Olvera-Villeda, J. C. Pérez-Arriaga

Universidad Veracruzana

Abstract: Addressing software defects is an ongoing challenge in software development. Effectively managing and resolving defects is vital for software reliability, which in turn is a crucial quality attribute of any software system. Software defect prediction supported by Machine Learning (ML) methods is a promising way to address this problem. However, a common challenge in ML-based software defect prediction is data imbalance: defective modules are typically far fewer than non-defective ones. In this paper, we present an empirical study that assesses how various class balancing methods affect software defect prediction on imbalanced data. We conducted a set of experiments with nine class balancing methods and seven classifiers, using datasets from the PROMISE repository that originate from NASA software projects. To gauge the effectiveness of the balancing methods, we used AUC, Accuracy, Precision, Recall, and the F1 measure. Furthermore, we applied hypothesis testing to determine whether metric results differed significantly between datasets with balanced and unbalanced classes. Based on our findings, we conclude that balancing the classes in software defect prediction yields significant improvements in overall performance. Therefore, we strongly advocate including class balancing as a pre-processing step in this domain.
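The abstract does not list the nine balancing methods or the seven classifiers, so the following is only a minimal sketch under stated assumptions: a binary label where 1 marks a defective module, and two of the simplest balancing techniques, random oversampling and random undersampling, which may or may not be among those the study evaluated. It also shows how the metrics named above (Accuracy, Precision, Recall, F1, and AUC via the pairwise rank formulation) can be computed. All function names are hypothetical and written in plain Python for illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class has as
    many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.sample(idx, target):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, Precision, Recall and F1 from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive scores higher than a
    random negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2  # 8 clean modules, 2 defective ones
    print(Counter(random_oversample(X, y)[1]))   # Counter({0: 8, 1: 8})
    print(Counter(random_undersample(X, y)[1]))  # Counter({0: 2, 1: 2})
```

In a typical experimental setup, balancing is applied only to the training split, so that the metrics computed on the test split still reflect the original, imbalanced class distribution.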

Keywords: software defect prediction, statistical analysis, imbalanced class, PROMISE, datasets, metrics, oversampling, undersampling

Language: English

DOI: 10.15514/ISPRAS-2024-36(6)-2
