Abstract
k-NN criteria are nonparametric methods of statistical classification. They are accurate, versatile and distribution-free. However, their computational cost can be prohibitive, especially for large sample sizes. We present a new condensation algorithm based on the binormal model for ROC curves. It transforms the training sample into a small set of low-dimensional vectors. In contrast with other condensation techniques described in the literature, our proposal helps control the trade-off between classification accuracy and the degree of condensation of the training sample. The results of a Monte Carlo study show that its performance can be very competitive in several realistic scenarios, yielding better training samples than other frequently used methods.
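As a rough illustration of the two ingredients named above, the sketch below pairs a brute-force k-NN classifier with the binormal ROC model. It is not the proposed condensation algorithm, which the abstract does not specify. The function names and the moment-based (rather than maximum-likelihood) fit of the binormal parameters are assumptions made for illustration. Under the binormal model, negative-class scores follow N(mu0, sigma0^2) and positive-class scores follow N(mu1, sigma1^2). With a = (mu1 - mu0)/sigma1 and b = sigma0/sigma1, the curve is ROC(t) = Phi(a + b Phi^{-1}(t)) and its area is AUC = Phi(a / sqrt(1 + b^2)).

```python
import heapq
import math
from collections import Counter
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Brute force: every call measures the distance to every training point,
    so the cost per query grows linearly with the training-sample size.
    Condensation methods try to shrink that sample.
    """
    nearest = heapq.nsmallest(
        k, zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def binormal_params(neg_scores, pos_scores):
    """Hypothetical moment-based fit of the binormal parameters (a, b).

    Assumes negative scores ~ N(mu0, sigma0^2) and positive scores
    ~ N(mu1, sigma1^2); then a = (mu1 - mu0) / sigma1, b = sigma0 / sigma1.
    A maximum-likelihood fit is the usual alternative.
    """
    mu0, sigma0 = mean(neg_scores), stdev(neg_scores)
    mu1, sigma1 = mean(pos_scores), stdev(pos_scores)
    return (mu1 - mu0) / sigma1, sigma0 / sigma1


def binormal_roc(t, a, b):
    """True-positive rate at false-positive rate t (requires 0 < t < 1)."""
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(t))


def binormal_auc(a, b):
    """Closed-form area under the binormal ROC curve: Phi(a / sqrt(1 + b^2))."""
    return STD_NORMAL.cdf(a / math.sqrt(1.0 + b * b))


if __name__ == "__main__":
    xs = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    ys = [0, 0, 0, 1, 1, 1]
    print(knn_predict(xs, ys, (0.5, 0.5)))  # 0
    print(knn_predict(xs, ys, (5.5, 5.5)))  # 1

    a, b = binormal_params([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
    print(a, b)  # 1.0 1.0
    print(round(binormal_auc(a, b), 4))  # 0.7602
```

Comparing `binormal_auc` with a numerical integral of `binormal_roc` over (0, 1) is a quick way to check the closed form. The brute-force `knn_predict` makes the linear per-query cost explicit, which is the cost a condensed training sample is meant to reduce.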