Evaluation of potential features present in short texts in Spanish in order to classify them by polarity

Édgar Casasola Murillo; Antonio Leoni de León; Gabriela Marín Raventós

doi:10.15517/rk.v40i4.30223

Vol. 40 No. 4 (2016), Articles

Published August 16, 2017

Édgar Casasola Murillo

Universidad de Costa Rica. Escuela de Ciencias de la Computación, Programa de Posgrado en Computación e Informática, and Centro de Investigaciones en Tecnologías de la Información y Comunicación (CITIC).

Antonio Leoni de León

Universidad de Costa Rica. Professor, Escuela de Filología, Lingüística y Literatura.

Gabriela Marín Raventós

Universidad de Costa Rica. Centro de Investigaciones en Tecnologías de la Información y Comunicación (CITIC).

Keywords

sentiment analysis
information gain
feature vectors
polarity
classification

How to cite

Casasola Murillo, E., Leoni de León, A., & Marín Raventós, G. (2017). Evaluation of potential features present in short texts in Spanish in order to classify them by polarity. Káñina, 40(4), 21–32. https://doi.org/10.15517/rk.v40i4.30223

Abstract

This work describes the process of identifying and evaluating potential text markers for sentiment analysis. We present the evaluation of the markers and their use in the feature extraction from plain text that sentiment analysis requires. Markers obtained through systematic analysis of one corpus were evaluated on a second corpus, which showed that emphasized positive words tend to appear in positive posts. The second corpus also allowed us to evaluate the relation between the polarity of morphological text markers and the text they appear in. Used for the polarity detection task in combination with a polarized dictionary, the markers produced an average polarity classification precision of 0.56 using only three markers. These are promising results when compared with the top precision of 0.69 obtained using more features and specialized dictionaries for the same task.
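
The keywords list information gain, but this page does not say how the authors computed it. As a minimal sketch, assuming each marker is reduced to a binary feature (present or absent in a post) and each post carries a polarity label, the information gain of one marker could be computed as follows. The function names, the toy labels, and the `has_emphasis` marker are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(marker_present, labels):
    """Entropy reduction obtained by splitting the labels on a binary marker."""
    with_marker = [y for m, y in zip(marker_present, labels) if m]
    without_marker = [y for m, y in zip(marker_present, labels) if not m]
    total = len(labels)
    conditional = (
        (len(with_marker) / total) * entropy(with_marker)
        + (len(without_marker) / total) * entropy(without_marker)
    )
    return entropy(labels) - conditional


# Hypothetical toy data: six posts, not taken from the authors' corpora.
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
# Assumed marker: the post contains an emphasized word.
has_emphasis = [True, True, False, False, False, False]

print(round(information_gain(has_emphasis, labels), 3))
```

A marker that appears only in posts of one polarity yields a higher gain than one spread evenly across both classes, which is one way to rank candidate markers before building feature vectors.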
