On the impact of dataset characteristics on Arabic document classification

Abuaiadah, Diab and El-Sana, Jihad and Abusalah, Walid (2014) On the impact of dataset characteristics on Arabic document classification. International Journal of Computer Applications, 101 (7). pp. 31-38. ISSN 973-93-80883-68-9

PDF - Published Version
Restricted to registered users only


Official URL: http://www.ijcaonline.org/archives/volume101/numbe...

Abstract or Summary

This paper describes the impact of dataset characteristics on the results of Arabic document classification algorithms using TF-IDF representations. The experiments compared different stemmers, different categories and different training set sizes, and found that different dataset characteristics produced widely differing results, in one case attaining a remarkable 99% recall (accuracy). The use of a standard dataset would eliminate this variability and enable researchers to gain comparable knowledge from the published results.

Item Type: Journal article
Keywords that describe the item: Dataset, TF-IDF representation, Arabic Stemmers, Arabic document classification
Subjects: T Technology > T Technology (General)
Divisions: Schools > Centre for Business, Information Technology and Enterprise > School of Information Technology
ID Code: 3353
Deposited By:
Deposited On: 01 Oct 2014 20:27
Last Modified: 01 Oct 2014 20:27
