Abstract
This research presents a methodology for improving analysis strategies in situations where supervised classification is the fundamental tool for business decisions. The need to assign each new customer to one of several groups, according to the customer's characteristics, is addressed by comparing classifiers on their error rates. Programs were written in the statistical software package R to calculate the error rate of each of nine classifiers, using 10-fold cross-validation (Stone, 1974) on 50 permutations of the data under consideration. For each data set analyzed, ANOVA showed significant differences among the classifiers' mean error rates (p = 0.00); it is therefore concluded that the best classifier is the one with the lowest error rate.
References
Antipov, E., & Pokryshevskaya, E. (2010). Applying CHAID for logistic regression diagnostics and classification accuracy improvement. Journal of Targeting, Measurement and Analysis for Marketing, 18 (2), 109–117.
Blake, C. L., & Merz, C. J. (1998). Churn Data Set. University of California, Department of Information and Computer Science, Irvine, CA. Retrieved from http://www.sgi.com/tech/mlc/db/churn.data
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Boca Raton, FL: CRC Press LLC.
Dobson, A. (2002). An Introduction to Generalized Linear Models. Boca Raton, FL: CRC Press LLC. doi:10.1002/sim.1493
Hothorn, T., Hornik, K., van de Wiel, M., & Zeileis, A. (2006). A Lego System for Conditional Inference. The American Statistician, 60 (3), 257–263. doi:10.1198/000313006X118430
Manning, C., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. London: Cambridge University Press.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. London: Cambridge University Press.
Smith, C. (1947). Some examples of discrimination. Annals of Eugenics, 13, 272–282.
Stone, M. (1974). Cross-validatory choice and the assessment of statistical predictions (with discussion). Journal of the Royal Statistical Society, Series B, 36, 111–133.
Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S. New York, NY: Springer-Verlag. doi:10.1007/978-0-387-21706-2
Witten, I., Frank, E., & Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques. Burlington, MA: Morgan Kaufmann.