Abstract
Mexico's Sistema Nacional de Investigadores (SNI, National System of Researchers) evaluates, selects, and recognizes, through a financial stipend, the country's human capital engaged in high-quality research. This process can be regarded as a form of project selection, which necessarily entails choosing specialized human capital. This article uses the data analysis and grouping technique known as clustering (k-means) to examine the criteria the SNI follows in choosing researchers. Once the productivity profile of each SNI-defined designation (nombramiento) is known, the Hamming distance is used to compare the estimated and actual data associated with each designation. The estimates lead to the conclusion that the current classification into four groups (designations) is not justified, perhaps because SNI evaluators use information that is not captured in the variables reported in the applications. The article also demonstrates the need to improve the statistical information used as the database for evaluation, points out the differences among the classifications estimated for the seven areas of knowledge defined by the SNI, and recommends some of the results as a complement to the peer evaluations currently carried out, provided that the quantity and quality of the available information improve. This should help make the future selection of research and development projects under a public research policy program in Mexico more efficient.
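The comparison described in the abstract, clustering researchers with k-means and then measuring disagreement with the actual designations through the Hamming distance, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy data, the two productivity features, the labels "A" and "B", the fixed initial centroids, and the majority-vote step that maps cluster ids onto designations. It is not the article's actual procedure or data.

```python
from collections import Counter


def hamming_distance(a, b):
    """Number of positions at which two equal-length label sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def k_means(points, centroids, max_iter=100):
    """Plain Lloyd iteration starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def relabel_by_majority(cluster_labels, actual):
    """Name each cluster after the most common actual label inside it
    (an illustrative assumption, needed because cluster ids are arbitrary)."""
    mapping = {}
    for c in set(cluster_labels):
        votes = Counter(a for cl, a in zip(cluster_labels, actual) if cl == c)
        mapping[c] = votes.most_common(1)[0][0]
    return [mapping[c] for c in cluster_labels]


# Hypothetical toy data: (publications, theses supervised) per researcher.
points = [(1, 0), (2, 1), (1, 1), (8, 5), (9, 6), (10, 5)]
actual = ["A", "A", "B", "B", "B", "B"]  # hypothetical actual designations

labels, _ = k_means(points, centroids=[points[0], points[3]])
estimated = relabel_by_majority(labels, actual)
mismatches = hamming_distance(estimated, actual)
print(estimated, mismatches)
```

In this toy case the third researcher is clustered with the low-productivity group although the actual designation is "B", so the Hamming distance between the estimated and actual label vectors is 1. A large distance on real data would indicate, as the abstract argues, that the reported variables alone do not reproduce the assigned designations.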