Machine Learning Techniques, Features, Datasets, and Algorithm Performance Parameters for Sentiment Analysis: A Systematic Review

Ondara, Bernard; Waithaka, Stephen; Kandiri, John; Muchemi, Lawrence

Center for Open Access in Science (COAS)
OPEN JOURNAL FOR INFORMATION TECHNOLOGY (OJIT)
ISSN (Online) 2620-0627 * ojit@centerprode.com

2022 - Volume 5 - Number 1

Bernard Ondara * ondara.bernard@ku.ac.ke * ORCID: 0000-0002-8125-4082
Kenyatta University, School of Engineering and Technology, Nairobi, KENYA

Stephen Waithaka * waithaka.stephen@ku.ac.ke * ORCID: 0000-0003-2113-3382
Kenyatta University, School of Engineering and Technology, Nairobi, KENYA

John Kandiri * kandiri.john@ku.ac.ke * ORCID: 0000-0002-3641-3603
Kenyatta University, School of Engineering and Technology, Nairobi, KENYA

Lawrence Muchemi * lmuchemi@uonbi.ac.ke * ORCID: 0000-0001-5911-5679
University of Nairobi, Department of Computing and Informatics, Nairobi, KENYA

Open Journal for Information Technology, 2022, 5(1), 1-16 * https://doi.org/10.32591/coas.ojit.0501.01001o
Received: 29 November 2021 ▪ Revised: 20 February 2022 ▪ Accepted: 28 March 2022

LICENCE: Creative Commons Attribution 4.0 International License.

ABSTRACT:
This paper reviews studies on current machine learning techniques for sentiment analysis, with the primary aim of identifying the most suitable combinations of techniques, datasets, data features, and algorithm performance parameters used in most applications. To this end, we performed a systematic review of 24 articles published between 2013 and 2020 that cover machine learning techniques for sentiment analysis. The review shows that Support Vector Machine and Naïve Bayes are the most popular machine learning techniques, word stems and n-grams are the most extensively applied features, and Twitter data is the predominant dataset. The review further shows that the performance of machine learning algorithms depends on many factors, including the dataset, the extracted features, and the size of the data used. Accuracy is the most commonly used algorithm performance metric. These findings can help researchers and businesses select suitable techniques, features, and datasets for sentiment analysis in business applications such as brand reputation monitoring.
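As a rough illustration of the kind of combination the review describes (n-gram features, a Naïve Bayes classifier, and accuracy as the performance metric), the sketch below may help. It is not taken from the reviewed studies: the toy sentences, the labels, and the names `ngrams`, `NaiveBayes`, and `accuracy` are all assumptions made for this example, and a real study would use a much larger dataset and a proper train/test split.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for feat in ngrams(text):
                if feat in self.vocab:               # skip n-grams never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data, invented for illustration only.
train_texts = [
    "great service loved it",
    "really great flight",
    "terrible delay hated it",
    "really terrible service",
]
train_labels = ["pos", "pos", "neg", "neg"]
test_texts = ["great flight", "terrible delay"]
test_labels = ["pos", "neg"]

model = NaiveBayes().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]
print("predictions:", predictions)
print("accuracy:", accuracy(test_labels, predictions))
```

Accuracy is used here only because the review reports it as the most common metric. On imbalanced data, precision, recall, or F1 would usually be reported alongside it.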

KEY WORDS: sentiment analysis; machine learning technique; machine learning algorithm; sentiment classification technique; sentiment classification algorithm.

CORRESPONDING AUTHOR:
Bernard Ondara, Kenyatta University, School of Engineering and Technology, Nairobi, KENYA. E-mail: ondara.bernard@ku.ac.ke.
