Research Article
Automatic Multi-Document Arabic Text Summarization Using Clustering and Keyphrase Extraction

Hamzah Noori Fejer and Nazlia Omar

Journal of Artificial Intelligence, 2015, 8(1), 1-9.


Automatic text summarization has become important because the volume of text is growing rapidly and it is very difficult for people to summarize large documents manually. Forming an ideal summary requires a full understanding of the document, but such understanding is difficult or impossible for computers. The most common technique in automated text summarization is therefore extractive: important sentences are selected from the original text and presented as the summary. Arabic natural language processing lacks the tools and resources needed to advance research in Arabic text summarization, and the field has received little research attention. Existing Arabic text summarization systems still suffer from low accuracy because they use simple summarization techniques. The aim of this research is to improve Arabic text summarization by using clustering and keyphrase extraction. This study proposes a combined clustering method to group Arabic documents into several clusters. A keyphrase extraction module is applied to extract important keyphrases from each cluster, which helps to identify the most important sentences and to find similar sentences using several similarity algorithms. These algorithms are used to extract one sentence from each group of similar sentences while ignoring the rest. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics were used for evaluation, with the DUC2002 corpus as the summarization dataset. The model achieved an accuracy of 43.4%. The experiments showed that the proposed model performs better than other work.
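The abstract does not specify the scoring or similarity functions, so the following is only a minimal sketch of the redundancy step it describes: rank sentences by keyphrase hits, then keep one sentence from each group of similar sentences. The names `cosine` and `select_sentences`, the keyphrase-count score, the bag-of-words cosine measure, and the `threshold` and `max_sentences` parameters are all assumptions made for illustration and are not taken from the paper. English sentences and whitespace tokenization are used for readability; Arabic text would likely need normalization and stemming first.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(sentences, keyphrases, threshold=0.5, max_sentences=3):
    """Rank sentences by keyphrase hits, then greedily skip near-duplicates.

    Hypothetical helper: the score and the similarity measure are
    illustrative assumptions, not the method of the paper.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    # Score each sentence by how many keyphrases it contains.
    scores = [
        sum(1 for k in keyphrases if k.lower() in s.lower())
        for s in sentences
    ]
    # Highest score first; ties keep their original order (stable sort).
    order = sorted(range(len(sentences)), key=lambda i: -scores[i])
    chosen = []
    for i in order:
        if len(chosen) == max_sentences:
            break
        # Keep a sentence only if it is not too similar to any kept one.
        if all(cosine(vectors[i], vectors[j]) < threshold for j in chosen):
            chosen.append(i)
    # Return the kept sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "the river flooded the old town",
    "the river flooded the old town yesterday",
    "rescue teams arrived at night",
]
summary = select_sentences(sentences, ["river", "rescue teams"])
```

In this toy input the first two sentences are near-duplicates, so only the first of them is kept alongside the third. A greedy pass like this is simple but order-dependent: which member of a similar group survives depends on the ranking.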

ASCI-ID: 33-152

Related Articles

Automated Clustering of Cancer Cells Using Fuzzy C Means with Repulsions in Ultrasound Images

Journal of Artificial Intelligence, 2012, 5(1), 14-25.

Applicability of Ensemble Clustering and Ensemble Classification Algorithm for User Navigation Pattern Prediction

Journal of Artificial Intelligence, 2013, 6(3), 210-219.

Fuzzy Honey Bees Foraging Optimization: Swarm Intelligence Approach for Clustering

Journal of Artificial Intelligence, 2014, 7(1), 13-23.

Cited By

Arabic web page clustering: a review

Journal of King Saud University - Computer and Information Sciences, 2017. DOI: 10.1016/j.jksuci.2017.06.002

Narzędzia do automatycznego streszczania tekstów w języku polskim. Stan badań naukowych i prac wdrożeniowych [Tools for automatic text summarization in Polish: the state of research and implementation work]

e-mentor, 2021, 89(2), 67. DOI: 10.15219/em89.1513

An Arabic Multi-source News Corpus: Experimenting on Single-document Extractive Summarization

Arabian Journal for Science and Engineering, 2021, 46(4), 3925. DOI: 10.1007/s13369-020-05258-z

Extractive Multi-Document Arabic Text Summarization Using Evolutionary Multi-Objective Optimization With K-Medoid Clustering

IEEE Access, 2020, 8, 228206. DOI: 10.1109/ACCESS.2020.3046494

Automatic Arabic Text Summarization Using Analogical Proportions

Cognitive Computation, 2020, 12(5), 1043. DOI: 10.1007/s12559-020-09748-y

Automatic Text Summarization using Maximum Marginal Relevance for Health Ethics Protocol Document in Bahasa

2021 13th International Conference on Information & Communication Technology and System (ICTS), 2021, 324. DOI: 10.1109/ICTS52701.2021.9607951