Clustering Arabic Tweets for Sentiment Analysis

Citation: UNSPECIFIED.

There is a more recent version of this item available.

PDF (Programme)
final-detailed-program-17102017.pdf - Supplemental Material
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Official URL: https://www.ieee.org/conferences_events/conference...

Abstract

This study evaluates the impact of linguistic preprocessing and similarity functions on clustering Arabic tweets. The experiments apply an optimized version of the standard K-Means algorithm to assign tweets to positive and negative categories. The results show that root-based stemming has a significant advantage over light stemming in all settings. The Averaged Kullback-Leibler Divergence similarity function clearly outperforms the Cosine, Pearson Correlation, Jaccard Coefficient and Euclidean functions. The combination of the Averaged Kullback-Leibler Divergence and root-based stemming achieved the highest purity of 0.764, while the second-best purity was 0.719. These results are important because they run contrary to findings for normal-sized documents, where, in many information retrieval applications, light stemming performs better than root-based stemming and the Cosine function is commonly used.
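
The purity scores quoted in the abstract can be read against the standard definition of cluster purity: each cluster is credited with its most frequent true label, and purity is the fraction of items covered by those majority labels. The abstract does not state the exact formula used in the paper, so the following is only a minimal sketch under that assumption. The function name `purity`, its parameters, and the toy data are illustrative and are not taken from the paper.

```python
from collections import Counter


def purity(cluster_ids, true_labels):
    """Return the fraction of items that carry the majority label of their cluster.

    Assumption: this is the textbook purity measure, not necessarily the
    exact variant used in the paper.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = {}
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts.setdefault(cluster, Counter())[label] += 1

    # Each cluster contributes the size of its largest label group.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: seven tweets assigned to two clusters.
toy_clusters = [0, 0, 0, 1, 1, 1, 1]
toy_labels = ["pos", "pos", "neg", "neg", "neg", "neg", "pos"]
toy_purity = purity(toy_clusters, toy_labels)
```

In the toy data, cluster 0 holds two positive tweets and one negative, and cluster 1 holds three negative tweets and one positive, so five of the seven tweets match their cluster's majority label. A purity of 1.0 would mean every cluster contains tweets of a single sentiment.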

Item Type:	Paper presented at a conference, workshop or other event, and published in the proceedings
Uncontrolled Keywords:	Sentiment analysis, Arabic stemmers, Clustering algorithms, K-Means clustering algorithm, Bisect K-Means clustering algorithm
Subjects:	Q Science > QA Mathematics > QA76 Computer software
Divisions:	Schools > Centre for Business, Information Technology and Enterprise > School of Information Technology
Depositing User:	Diab Abuaiadah
Date Deposited:	19 Dec 2017 22:37
Last Modified:	21 Jul 2023 04:46
URI:	http://researcharchive.wintec.ac.nz/id/eprint/5537

Available Versions of this Item

Clustering Arabic Tweets for Sentiment Analysis. (deposited 19 Dec 2017 22:37) [Currently Displayed]
- Clustering Arabic Tweets for Sentiment Analysis. (deposited 10 Apr 2018 01:48)
