Software Quality Prediction Based on K-means and Fuzzy C-means Clustering

Harish Mittal, Rashi Gupta


Software metrics and fault data from a previous software version are typically used to build the fault prediction model for the next release of the software. Until now, various classification algorithms have been used to build such models. However, when previous fault data are not available, supervised learning approaches cannot be applied. This paper advocates the use of unsupervised learning (i.e., clustering techniques) to build a software fault prediction model. Our technique first applies the K-means and Fuzzy C-means clustering methods to group hundreds of software modules into a small number of coherent clusters. A cluster is then predicted to be fault-prone if at least one metric of its mean vector is higher than the threshold value of that metric. The quality of each cluster is then assessed according to whether it is labeled fault-prone or not fault-prone. Two datasets, collected from NASA software projects, were used for validation.
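The two steps described in the abstract (cluster the modules, then label each cluster by comparing its mean vector against per-metric thresholds) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the three metrics, the threshold values, the synthetic data, and all function names are assumptions made for the example, and both clustering routines are plain textbook versions written with NumPy.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Textbook Lloyd's k-means. Returns (centers, hard labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Euclidean distance from every module to every center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


def fuzzy_cmeans(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means with fuzzifier m. Returns (centers, hard labels)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each module sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dists = np.fmax(dists, 1e-12)  # avoid division by zero
        inv = dists ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Harden the fuzzy memberships by taking the strongest one.
    return centers, U.argmax(axis=1)


def fault_prone_clusters(centers, thresholds):
    """A cluster is fault-prone if ANY metric of its mean vector exceeds its threshold."""
    return (centers > thresholds).any(axis=1)


# Synthetic modules with three assumed metrics:
# lines of code, cyclomatic complexity, unique operators.
data_rng = np.random.default_rng(42)
small = data_rng.normal([20, 3, 8], [5, 1, 2], size=(50, 3))
large = data_rng.normal([200, 25, 40], [20, 3, 5], size=(50, 3))
X = np.vstack([small, large])

# Hypothetical threshold values, chosen only for this illustration.
thresholds = np.array([65.0, 10.0, 25.0])

km_centers, km_labels = kmeans(X, k=2)
km_flags = fault_prone_clusters(km_centers, thresholds)
km_module_fp = km_flags[km_labels]  # per-module prediction

fcm_centers, fcm_labels = fuzzy_cmeans(X, c=2)
fcm_flags = fault_prone_clusters(fcm_centers, thresholds)
fcm_module_fp = fcm_flags[fcm_labels]

print("k-means fault-prone clusters:", km_flags)
print("fuzzy c-means fault-prone clusters:", fcm_flags)
```

Each module inherits the label of the cluster it belongs to, so no fault data from a previous release is needed. In practice the thresholds would come from the metrics literature or from domain experts rather than from the placeholder values used here.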


Software Fault Prediction, Clustering, Metrics thresholds, K-means, Fuzzy c-means





