Arabic document classification using multiword features

Abuaiadah, Diab (2013) Arabic document classification using multiword features. International Journal of Computer and Communication Engineering (IJCCE), 2 (6). pp. 659-664. ISSN 2010-3743

Official URL: http://dx.doi.org/10.7763/IJCCE.2013.V2.269

Abstract or Summary

We investigate the use of multiword features to improve Arabic document classification. Because the Arabic language is both morphologically rich and highly inflected, raising Arabic information retrieval to a level comparable to that of English is more challenging. The multiword features are modeled as combinations of words appearing within windows of varying sizes. Our experiments show that multiword features combined with the Dice similarity function outperform the cosine similarity function and produce results comparable to the TF-IDF representation. Multiword features are under-explored, and we believe they have the potential to improve Arabic information retrieval and, in particular, Arabic document classification.
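The abstract does not give the exact feature definition or classifier, so the following is only a minimal Python sketch under stated assumptions: a "multiword feature" is taken to be an unordered pair of words that co-occur within a window of `window` consecutive tokens, the Dice coefficient is computed on the sets of such pairs, cosine similarity on their raw counts, and a document is labelled by its single most similar training document (1-nearest-neighbour). The names `multiword_features`, `dice`, `cosine` and `classify` are hypothetical, and the toy tokens stand in for preprocessed Arabic text.

```python
import math
from collections import Counter


def multiword_features(tokens, window):
    """Count unordered word pairs that fall inside the same window of
    `window` consecutive tokens (an assumed feature scheme)."""
    feats = Counter()
    for i, first in enumerate(tokens):
        # Pair token i with every following token still inside the window.
        for second in tokens[i + 1:i + window]:
            feats[tuple(sorted((first, second)))] += 1
    return feats


def dice(a, b):
    """Set-based Dice coefficient: 2|A & B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(doc_tokens, training, window=3, sim=dice):
    """Assign the label of the most similar training document
    (an assumed 1-NN classifier, not necessarily the paper's)."""
    query = multiword_features(doc_tokens, window)
    best_label, best_score = None, -1.0
    for tokens, label in training:
        score = sim(query, multiword_features(tokens, window))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    (["sport", "match", "goal", "team"], "sport"),
    (["economy", "market", "price", "bank"], "economy"),
]
print(classify(["match", "goal", "team"], training))  # prints sport
```

Calling `multiword_features` with different `window` values (for example 2, 3 and 4) corresponds to the windows of varying sizes described in the abstract; swapping `sim=dice` for `sim=cosine` lets the two similarity functions be compared on the same features.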

Item Type: Journal article
Keywords that describe the item: Information retrieval, TF-IDF, Arabic document classification, multiword features, Dice similarity function
Subjects: T Technology > T Technology (General)
Divisions: Schools > Centre for Business, Information Technology and Enterprise > School of Information Technology
ID Code: 2710
Deposited On: 02 Sep 2013 06:21
Last Modified: 02 Sep 2013 06:21