Classification of Breast Cancer Using Logistic Regression

Ude, Anthony Anene (2019-06-16)


Breast cancer is a prevalent disease that mostly affects women, and early diagnosis expedites its treatment. In recent times, Machine Learning (ML) techniques have been employed in biomedicine and informatics to help fight breast cancer. This research work proposes an ML model for the classification of breast cancer. To achieve this, we employed logistic regression (LR) and compared our model's performance with that of other extant ML models, namely Support Vector Machine (SVM), Naïve Bayes (NB), and Multilayer Perceptron (MLP). The original Wisconsin Diagnostic Breast Cancer (WDBC) dataset was used. Performance was evaluated in two phases: Phase 1, in which the WDBC dataset is scaled (feature scaling), and Phase 2, in which the dataset is not scaled. Without feature scaling, all models except MLP performed well, with f1-scores of LR = 97%, SVM = 97%, NB = 95%, and MLP = 52%. With feature scaling, all four models achieved f1-scores above 90% (SVM = 98%, LR = 97%, NB = 97%, MLP = 97%). Notably, the f1-score for LR was the same in both cases; we therefore conclude that LR, given its simplicity and low time complexity, is a good model to employ for binary classification.
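
The abstract does not describe the implementation, so the following is only a minimal sketch of the three ingredients it names: feature scaling, logistic regression, and the f1-score. Everything in it is an assumption made for illustration. The eight two-feature rows are made-up toy values, not WDBC records; z-score standardization stands in for whichever scaling method was used; and the plain-Python gradient descent (with hypothetical helper names such as `standardize`, `fit_logistic`, and `evaluate`) stands in for whatever library the author used. The score is computed on the same rows the model was fitted on, so it is not a held-out evaluation and should not be compared with the percentages reported above.

```python
import math


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sigmoid(z):
    """Logistic function, written so math.exp never receives a large positive value."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Label a row 1 when the predicted probability is at least 0.5."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(X, y):
    """Fit on X and score on the same rows (illustration only, no train/test split)."""
    w, b = fit_logistic(X, y)
    return f1_score(y, predict(X, w, b))


# Made-up toy rows: two features on very different scales; 1 = malignant, 0 = benign.
X_raw = [
    [400.0, 0.080], [450.0, 0.090], [500.0, 0.085], [420.0, 0.095],
    [1000.0, 0.120], [1100.0, 0.110], [950.0, 0.130], [1200.0, 0.125],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

f1_scaled = evaluate(standardize(X_raw), y)  # with feature scaling
f1_raw = evaluate(X_raw, y)                  # without feature scaling
print(f"f1 with scaling:    {f1_scaled:.2f}")
print(f"f1 without scaling: {f1_raw:.2f}")
```

Running `evaluate` on both the standardized and the raw rows mirrors the two-phase comparison in spirit only; how much scaling matters here depends on the optimizer and learning rate chosen in this sketch, not on the study's setup.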