Abstract
Many data analysis problems involve the unsupervised partitioning of a data set into non-empty clusters that are well separated from one another and internally homogeneous. An ideal partition is obtained when every object can be assigned to a class without ambiguity. This paper has two main parts. First, we present several methods and heuristics for finding the number of clusters that gives an optimal partition of a data set. We then propose a new heuristic and compare it with those methods on well-known data sets in order to evaluate its advantages. We end the paper with some concluding remarks.