Káñina Print ISSN: 0378-0473 Electronic ISSN: 2215-2636

OAI: https://revistas.ucr.ac.cr/index.php/kanina/oai
Evaluation of potential features present in short texts in Spanish in order to classify them by polarity

Keywords

sentiment analysis
information gain
feature vectors

How to cite

Casasola Murillo, E., Leoni de León, A., & Marín Raventós, G. (2017). Evaluation of potential features present in short texts in Spanish in order to classify them by polarity. Káñina, 40(4), 21–32. https://doi.org/10.15517/rk.v40i4.30223


This work describes the process of identifying and evaluating potential text markers for sentiment analysis. It presents the evaluation of these markers and their use in the feature extraction step that sentiment analysis requires when working from plain text. Markers obtained through systematic analysis of one corpus were evaluated on a second corpus, which showed that emphasized positive words tend to appear in positive posts. The second corpus also allowed us to evaluate the relation between the polarity of morphological text markers and the polarity of the texts in which they appear. For the polarity detection task, the markers combined with a polarized dictionary produced an average polarity classification precision of 0.56 % using only three markers. These results are promising when compared with the top result of 0.69 %, obtained using more features and specialized dictionaries for the same task.
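The abstract does not spell out how each marker was scored. Since information gain is listed among the keywords, the following is a minimal sketch of how a binary text marker could be scored against polarity labels. The function names, the "emphasized word" marker, and the toy data are hypothetical illustrations and are not taken from the article.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(has_marker, labels):
    """Reduction in label entropy after splitting texts on a binary marker."""
    with_marker = [y for m, y in zip(has_marker, labels) if m]
    without_marker = [y for m, y in zip(has_marker, labels) if not m]
    total = len(labels)
    conditional = (
        (len(with_marker) / total) * entropy(with_marker)
        + (len(without_marker) / total) * entropy(without_marker)
    )
    return entropy(labels) - conditional


# Toy data (made up): polarity of six short texts and whether each one
# contains a hypothetical "emphasized word" marker.
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
emphasized = [True, True, False, False, False, False]

gain = information_gain(emphasized, labels)
print(f"information gain of the marker: {gain:.3f} bits")
```

A marker that splits the texts perfectly by polarity would score 1 bit on a balanced two-class sample, while a marker that is unrelated to polarity would score close to 0.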





