Ensemble Learning for URL Phishing Detection
Phishing is a social engineering attack that has been perpetrated for a long time and remains prominent, with a correspondingly high number of victims. A successful attack gives phishers easy access to sensitive information about a company or an individual. This research compares the importance of four feature types in detecting phishing URLs: lexical features, domain-name-based features, HTML features, and URL tokenization. Experiments were designed to compare the performance of the four approaches, each used separately, on three machine learning models and five ensemble learning classifiers. URLs are classified using K-Nearest Neighbour, Decision Tree, and Logistic Regression as the machine learning models, and Random Forest, Bagging, Stacking, AdaBoost, and Gradient Boosting as the ensemble classifiers. The results show that URL tokenization performs better than the other approaches for both the machine learning and the ensemble learning classifiers.
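To make two of the compared feature types concrete, the following minimal sketch illustrates URL tokenization and a handful of lexical features. It is an assumption-laden illustration rather than the feature set used in this research: the function names `tokenize_url` and `lexical_features`, the choice of delimiters, and the specific lexical features are all hypothetical.

```python
import re
from urllib.parse import urlparse


def tokenize_url(url: str) -> list[str]:
    """Split a URL into lowercase alphanumeric tokens.

    Assumption: any run of non-alphanumeric characters is a delimiter.
    """
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url.lower()) if tok]


def lexical_features(url: str) -> dict[str, int]:
    """Compute a few illustrative lexical features of a URL.

    Assumption: these features are examples only, not the paper's list.
    """
    host = urlparse(url).hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        # Labels to the left of the registered domain, assuming a
        # two-label domain such as example.com.
        "num_subdomains": max(host.count(".") - 1, 0),
    }


if __name__ == "__main__":
    sample = "http://secure-login.example.com/verify?id=42"
    print(tokenize_url(sample))
    print(lexical_features(sample))
```

In a full pipeline, the token list would typically be converted to a bag-of-words or similar vector representation before being passed to a classifier, while the lexical features could be used directly as a numeric feature vector.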