A Hybrid Model for Text Summarization Using Natural Language Processing

Mugi Karanja, James; Matheka, Abraham

Center for Open Access in Science (COAS)
OPEN JOURNAL FOR INFORMATION TECHNOLOGY (OJIT)
ISSN (Online) 2620-0627 * ojit@centerprode.com

2022 - Volume 5 - Number 2

James Mugi Karanja * karanja.mugi@ku.ac.ke * ORCID: 0000-0002-1016-1962
Kenyatta University, Department of Computing and Information Technology, Nairobi, KENYA

Abraham Matheka * mutua.abraham@ku.ac.ke
Kenyatta University, Department of Computing and Information Technology, Nairobi, KENYA

Open Journal for Information Technology, 2022, 5(2), 65-80 * https://doi.org/10.32591/coas.ojit.0502.03065k
Received: 7 October 2022 ▪ Revised: 20 November 2022 ▪ Accepted: 28 November 2022

LICENCE: Creative Commons Attribution 4.0 International License.

ABSTRACT:
Text summarization plays an important role in natural language processing. The worldwide demand for information to solve specific problems grows daily, and the challenge is compounded by the exponential growth of the data stored on the internet. Finding the relevant data and summarizing it manually within a short time is a challenging and tedious task for a human. Text summarization aims to compress the source text into a more concise form while preserving its overall meaning. There are two major categories of text summarization methods: extractive and abstractive. Extractive techniques identify key themes through frequency analysis of the sentences in the source text. Abstractive methods write a new summary made up of newly generated sentences that do not appear verbatim in the source text. This paper presents a hybrid model for text summarization that combines both techniques: Term Frequency-Inverse Document Frequency (TF-IDF) was used for extractive summarization and the T5 Transformer model for abstractive summarization. The study adopted an iterative incremental methodology. Contrary to the study's hypothesis, the hybrid model did not outperform the standalone extractive and abstractive models when the accuracy and execution time of the generated summaries were considered.
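The frequency-based extractive step and the hybrid pipeline described in the abstract can be illustrated with a short TF-IDF sentence scorer. This is a minimal sketch under stated assumptions, not the authors' implementation: each sentence is treated as a "document" when computing inverse document frequency, so a term t in sentence s gets weight tf(t, s) x log(N / df(t)), where N is the number of sentences and df(t) is the number of sentences containing t; sentences are split and tokenized with simple regular expressions; and a sentence's score is the sum of its term weights, with tf normalised by sentence length. The helper names (split_sentences, tokenize, tfidf_summary, hybrid_summary) are hypothetical, and the hybrid ordering (extract first, then rewrite abstractively) is an assumption. The study used T5 for the abstractive stage; here any string-to-string callable stands in for it.

```python
import math
import re
from collections import Counter
from typing import Callable


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercased alphabetic tokens only; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_summary(text: str, num_sentences: int = 3) -> str:
    """Extractive summary: keep the num_sentences highest-scoring sentences."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency, treating each sentence as a "document" (assumption).
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Sum of TF-IDF weights, with TF normalised by sentence length.
        scores.append(sum((c / len(tokens)) * idf[t] for t, c in tf.items()))

    # Pick the top-scoring sentences (ties keep source order), then restore order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))


def hybrid_summary(text: str, abstractive: Callable[[str], str],
                   num_sentences: int = 3) -> str:
    # Assumed ordering: TF-IDF extraction first, then an abstractive rewrite
    # (T5 in the study; here any str -> str callable stands in for it).
    return abstractive(tfidf_summary(text, num_sentences))


if __name__ == "__main__":
    sample = ("Summaries save time. Summaries save time. "
              "Extractive summarizers rank every sentence by term weights.")
    print(tfidf_summary(sample, 1))
    # → Extractive summarizers rank every sentence by term weights.
```

The repeated sentence scores low because its terms occur in several sentences and so receive a small IDF, while the sentence with distinctive terms ranks first. Restoring source order after selection keeps the extract readable. Passing a trivial callable such as str.upper to hybrid_summary is enough to check the plumbing; in practice the callable could wrap a T5 checkpoint (for example through the Hugging Face transformers summarization pipeline), which is a tooling assumption rather than the study's code.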

KEY WORDS: extractive model, abstractive model, hybrid model, natural language processing.

CORRESPONDING AUTHOR:
James Mugi Karanja, Kenyatta University, School of Engineering, Nairobi, KENYA. E-mail: karanja.mugi@ku.ac.ke.
