Text Mining in the National Transparency Survey 2019

Keywords

opinion surveys
open questions
text mining
supervised machine learning

How to Cite

Centeno-Mora, O., & Gónzalez-Évora, F. (2022). Text mining in the National Transparency Survey 2019. Revista De Matemática: Teoría Y Aplicaciones, 29(2), 261–287. https://doi.org/10.15517/rmta.v29i2.46379

Abstract

Coding and analyzing open-ended questions from opinion surveys is often time-consuming, and text mining offers an alternative. The data come from the open-ended questions of the 2019 National Survey of Perception on Transparency. Text mining is applied with both a descriptive and a predictive approach; the predictive approach focuses on automatically coding responses into categories using supervised machine learning. The algorithms used are support vector machines, the naïve Bayes classifier, random forests, XGBoost, and nearest neighbors. The descriptive analysis improves the descriptions, visualizations, and relationships available when analyzing the open-ended questions. In the predictive analysis, the algorithms selected most often across the open-ended questions were the naïve Bayes classifier and random forests, with accuracies between 48% and 76%. Similar results were obtained compared with the pre-established categories. Satisfactory results are seen in the comprehensive analysis of the 12 survey questions.
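The predictive step summarized in the abstract, automatically assigning a category code to each open-ended response, can be illustrated with a minimal sketch. This is not the authors' implementation, which is not shown on this page. It is a small multinomial naïve Bayes classifier in plain Python, and the class name `NaiveBayesCoder`, the whitespace tokenizer, the toy responses, and the category labels are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()


class NaiveBayesCoder:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, responses, codes):
        self.class_counts = Counter(codes)        # documents per category
        self.word_counts = defaultdict(Counter)   # word frequencies per category
        self.vocabulary = set()
        for text, code in zip(responses, codes):
            tokens = tokenize(text)
            self.word_counts[code].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_code, best_score = None, -math.inf
        for code, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[code].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[code][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_code, best_score = code, score
        return best_code


def accuracy(model, responses, codes):
    """Share of responses whose predicted code matches the manual code."""
    hits = sum(model.predict(t) == c for t, c in zip(responses, codes))
    return hits / len(codes)


# Invented toy responses and category labels, for illustration only.
train_responses = [
    "more access to public information",
    "publish budgets and public information online",
    "punish corruption in institutions",
    "investigate corruption cases faster",
]
train_codes = ["access", "access", "corruption", "corruption"]

coder = NaiveBayesCoder().fit(train_responses, train_codes)
predicted = coder.predict("publish more information")
```

In a realistic setting the accuracy would be estimated on held-out, manually coded responses rather than on the training data, and the text would first go through preprocessing such as stop-word removal and stemming.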

https://doi.org/10.15517/rmta.v29i2.46379

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
