Machine Learning Techniques for Malaria Incidence and Tuberculosis Prediction
This research proposes machine learning models to support decision-making in health informatics. It focuses on applying efficient machine learning techniques to the two main disease burdens in Africa: malaria and tuberculosis. In 2019, there were an estimated 229 million malaria cases and 409,000 malaria deaths worldwide, with Africa accounting for 94% of these cases and deaths. In the same year, Nigeria, the Niger Republic, the Democratic Republic of the Congo (DRC), Burkina Faso, Tanzania, and Mozambique together accounted for approximately 50% of malaria deaths worldwide. Climate variability is one of the leading factors influencing malaria prevalence and transmission, especially in Africa; however, its effect varies across geographical locations. Implementing a surveillance system that can predict possible malaria outbreaks is one of the efforts toward eradicating malaria. Such a system is location-specific, since what works for one location may not work for another. This research employed the eXtreme Gradient Boosting (XGBoost) algorithm, a machine learning method, to predict the incidence of malaria in the six malaria-endemic countries of sub-Saharan Africa. XGBoost is scalable, memory-efficient, and achieves fast learning through parallel and distributed computing. It is used here to develop a malaria incidence classification system that enables early detection of malaria outbreaks or epidemics and helps policymakers make informed decisions on malaria intervention. Tuberculosis caused an estimated 10 million cases and about 1.4 million deaths in 2019, while multidrug-resistant tuberculosis remains a public health crisis and a threat to health security. Methods of diagnosing tuberculosis are sometimes invasive, time-consuming, and dependent on the presence of an expert.
Therefore, this research applies the Frequent Pattern growth (FP-growth) algorithm to discover hidden recurring patterns in drug-resistant tuberculosis symptoms and to generate relevant association rules, which are then used to fit a logistic regression model that classifies each patient into one of two target classes. The resulting knowledge-capturing system assists in the quick diagnosis of drug-resistant tuberculosis and may lead to breakthroughs in treatment based on correlations found through the collection and integration of drug-resistant tuberculosis data. Diagnostic accuracy is crucial in the medical field because a wrong diagnosis or prediction can have severe consequences. The performance of the proposed models was evaluated using the area under the receiver operating characteristic curve (AUC-ROC), classification accuracy, precision, recall, and F1-score. The models were also compared with other models using the same metrics, and the Akaike information criterion (AIC) was used to select the best model. The comparisons showed that the proposed models performed better than the other models for the intended applications, demonstrating their effectiveness. This work presents a unique knowledge-based decision support system that can aid physicians, governments, and other health policymakers in making informed clinical decisions.
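As an illustration of the evaluation metrics named above, the following is a minimal sketch in plain Python of how accuracy, precision, recall, F1-score, AUC, and AIC can be computed for a binary classifier. It is not the implementation used in this research: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and none of the numbers come from the thesis data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Toy data (hypothetical, for illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

When candidate models are compared with AIC, the model with the lowest value is preferred, since the criterion rewards a higher likelihood and penalizes additional parameters.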