Evaluation of potential features present in short texts in Spanish in order to classify them by polarity

Édgar Casasola Murillo; Antonio Leoni de León; Gabriela Marín Raventós

doi:10.15517/rk.v40i4.30223

Vol. 40 No. 4 (2016), Articles

Published August 16, 2017

Édgar Casasola Murillo

Universidad de Costa Rica. Escuela de Ciencias de la Computación, Programa de Posgrado en Computación e Informática y Centro de Investigaciones en Tecnologías de la Información y Comunicación (CITIC).

Antonio Leoni de León

Universidad de Costa Rica, Profesor, Escuela de Filología, Lingüística y Literatura.

Gabriela Marín Raventós

Universidad de Costa Rica. Centro de Investigaciones en Tecnologías de la Información y Comunicación (CITIC).

How to Cite

Casasola Murillo, E., Leoni de León, A., & Marín Raventós, G. (2017). Evaluation of potential features present in short texts in Spanish in order to classify them by polarity. Káñina, 40(4), 21–32. https://doi.org/10.15517/rk.v40i4.30223

Abstract

This work describes the process of identifying and evaluating potential text markers for sentiment analysis, and their use in the feature extraction from plain text that sentiment analysis requires. Markers obtained through systematic analysis of one corpus were evaluated on a second corpus, which allowed us to identify that emphasized positive words tend to appear in positive posts. The second corpus also allowed us to evaluate the relation between the polarity of morphological text markers and that of the text they appear in. Used for the polarity detection task in combination with a polarized dictionary, the markers produced an average polarity classification precision of 0.56 using only three markers. These are promising results compared with the top value of 0.69 obtained using more features and specialized dictionaries for the same task.
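As a rough illustration of the idea in the abstract, the following minimal Python sketch combines a polarized dictionary with three emphasis markers to label a short text. It is not the authors' implementation: the abstract does not name the three markers, so the ones used here (letter elongation, all-uppercase words, and a trailing exclamation mark) are assumptions, as are the toy lexicon `LEXICON`, the `boost` weight, and every function name.

```python
import re

# Hypothetical toy lexicon; the polarized dictionary used in the paper is not reproduced here.
LEXICON = {
    "bueno": 1, "excelente": 1, "genial": 1, "feliz": 1,
    "malo": -1, "terrible": -1, "triste": -1, "horrible": -1,
}

PUNCTUATION = "¡!¿?.,;:"
# A word character repeated three or more times in a row, as in "geniaaal".
ELONGATION = re.compile(r"(\w)\1{2,}")


def normalize(word):
    """Lowercase the word and collapse elongated runs to a single character."""
    return ELONGATION.sub(r"\1", word.lower())


def markers(token):
    """Return the three assumed emphasis markers for one whitespace-delimited token."""
    word = token.strip(PUNCTUATION)
    return {
        "elongated": bool(ELONGATION.search(word)),
        "uppercase": len(word) > 1 and word.isupper(),
        "exclaimed": token.endswith("!"),
    }


def classify(text, boost=2):
    """Sum lexicon polarities, weighting emphasized words by `boost` (an assumed value)."""
    score = 0
    for token in text.split():
        polarity = LEXICON.get(normalize(token.strip(PUNCTUATION)), 0)
        emphasized = any(markers(token).values())
        score += polarity * (boost if emphasized else 1)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("servicio malo pero comida EXCELENTE!!!"))
    print(classify("fue terrible"))
```

In this sketch an emphasized lexicon word simply counts more than a plain one, so a text with one plain negative word and one emphasized positive word leans positive. A real system would more likely feed the markers to a trained classifier as features instead of using a fixed weight.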
