Mobility, accessibility, and portability continue to drive the adoption of smartphones, while also increasing the exposure of users to phishing. Some users are more susceptible than others because of their media presence and personality traits, and many studies therefore seek to uncover the lures and cues that make these attacks successful. Web content is commonly classified as either genuine or malicious. Our study seeks to identify such cues and lures by applying sentiment analysis with a tree-based gradient boosting algorithm, XGBoost, to a dataset scraped from users' online presence and activity on social networking sites. The dataset is scraped using the Python Google Scrapper and divided into training and test sets. The aim is to help users classify content from social networking sites as either a malicious phishing attack or genuine content. Results show that the ensemble yields a prediction accuracy of 97% and an F1-score of 98.19%, correctly classifying 2,089 instances and misclassifying 85 instances of the test dataset.
Author(s): Rume Elizabeth Yoro (1), Okpako Abugor Ejaita (2) and Edun Ogechi Peace (3)
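As a rough cross-check of the reported test-set figures, the accuracy can be recomputed from the two counts stated in the abstract (2,089 correctly and 85 incorrectly classified instances). The sketch below is illustrative only: the helper name `accuracy` is hypothetical, and it assumes the two counts together make up the whole test set. The F1-score cannot be recovered this way, because the abstract does not give the per-class breakdown of the errors.

```python
def accuracy(correct: int, incorrect: int) -> float:
    """Fraction of instances classified correctly (hypothetical helper)."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("accuracy is undefined for an empty test set")
    return correct / total


# Counts as stated in the abstract; assumed to cover the full test set.
correct, incorrect = 2089, 85
total = correct + incorrect
acc = accuracy(correct, incorrect)

print(f"test instances: {total}")
print(f"accuracy: {acc:.4f}")
```

Under that assumption the test set holds 2,174 instances and the recomputed accuracy is about 96.1%, close to the 97% quoted. The small gap may come from rounding or from an evaluation detail the abstract does not specify.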