Murdoch University Research Repository

R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora

Blamey, B., Crick, T. and Oatley, G. (2012) R U :-) or :-( ? Character- vs. Word-Gram Feature Selection for Sentiment Classification of OSN Corpora. In: Bramer, M. and Petridis, M., (eds.) Research and Development in Intelligent Systems XXIX. Springer Verlag, pp. 207-212.

Link to Published Version: http://dx.doi.org/10.1007/978-1-4471-4739-8_16
*Subscription may be required

Abstract

Binary sentiment classification, or sentiment analysis, is the task of computing the sentiment of a document, i.e. whether it contains broadly positive or negative opinions. The topic is well studied, and the intuitive approach of using words as classification features is the basis of most techniques documented in the literature. The alternative character n-gram language model has been applied successfully to a range of NLP tasks, but its effectiveness at sentiment classification seems to be under-investigated, and results are mixed. We present an investigation of the application of the character n-gram model to text classification of corpora from online social networks (OSNs), where text is known to be rich in so-called unnatural language; this is the first such documented study, and it also introduces a novel corpus of Facebook photo comments. Although we hoped that the flexibility of the character n-gram approach would be well suited to unnatural language phenomena, we find little improvement over the baseline algorithms employing the word n-gram language model.
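To make the contrast between the two feature types concrete, the following is a minimal Python sketch of character n-gram and word n-gram extraction. It is an illustration only, not the authors' implementation: the function names, the whitespace tokenisation, the choice of n, and the sample comment are all assumptions introduced here.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams (hypothetical helper).

    Spaces and punctuation are kept, so emoticons and
    non-standard spellings survive as features.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams (hypothetical helper).

    Tokenisation is a plain whitespace split, which is an assumption;
    a real system would likely use a more careful tokeniser.
    """
    tokens = text.split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Invented example of "unnatural language" in a short comment.
comment = "gr8 pic :-) luv it"

char_features = char_ngrams(comment, 3)  # includes the trigram ":-)"
word_features = word_ngrams(comment, 1)  # one feature per whitespace token
```

Either set of counts could then be passed to a standard classifier as a feature vector. The character model produces sub-word features such as `luv` or `:-)` without needing a tokeniser that understands them, which is the flexibility the abstract refers to.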

Item Type: Book Chapter
Publisher: Springer Verlag
Copyright: © 2012 Springer-Verlag London
URI: http://researchrepository.murdoch.edu.au/id/eprint/36127