Abstract
This paper presents a variant of existing clustering methods: a genetic algorithm for clustering built on the tools of symbolic data analysis. The implementation avoids two problems of classical clustering methods: convergence to local minima and dependence on a single data type, namely numerical vectors (continuous data).
The proposed method was programmed in MATLAB® and uses an interesting encoding operator. We compare the resulting clusterings by their intra-cluster inertia. We used the following measures for symbolic data types: the Ichino-Yaguchi dissimilarity measure, the Gowda-Diday dissimilarity measure, the Euclidean distance, and the Hausdorff distance.
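To make the comparison criterion concrete, the following is a minimal sketch of intra-cluster inertia for interval-valued symbolic data under the Hausdorff distance. It is written in Python for illustration and is not the authors' MATLAB implementation. Several choices here are assumptions not stated in the abstract: the per-variable Hausdorff distances are summed across variables, each cluster prototype is the interval vector of mean lower and upper bounds, and inertia is the sum of squared distances to that prototype. The names `hausdorff`, `prototype` and `intra_cluster_inertia` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]          # (lower bound, upper bound)
IntervalVector = Sequence[Interval]     # one interval per variable


def hausdorff(x: IntervalVector, y: IntervalVector) -> float:
    """Hausdorff distance between two interval vectors.

    For two intervals [a1, b1] and [a2, b2] the Hausdorff distance is
    max(|a1 - a2|, |b1 - b2|). Summing over variables is an assumption.
    """
    return sum(max(abs(a1 - a2), abs(b1 - b2))
               for (a1, b1), (a2, b2) in zip(x, y))


def prototype(members: List[IntervalVector]) -> List[Interval]:
    """Assumed prototype: mean of lower bounds and mean of upper bounds."""
    n = len(members)
    p = len(members[0])
    return [(sum(m[j][0] for m in members) / n,
             sum(m[j][1] for m in members) / n)
            for j in range(p)]


def intra_cluster_inertia(data: List[IntervalVector],
                          labels: Sequence[int]) -> float:
    """Sum over clusters of squared distances from members to the prototype."""
    clusters: Dict[int, List[IntervalVector]] = {}
    for item, label in zip(data, labels):
        clusters.setdefault(label, []).append(item)

    total = 0.0
    for members in clusters.values():
        center = prototype(members)
        total += sum(hausdorff(m, center) ** 2 for m in members)
    return total


# Three objects described by a single interval-valued variable.
data = [[(0.0, 1.0)], [(0.0, 3.0)], [(10.0, 12.0)]]
good = intra_cluster_inertia(data, [0, 0, 1])   # similar intervals together
bad = intra_cluster_inertia(data, [0, 1, 1])    # distant intervals together
```

Under these assumptions, a partition that groups the two nearby intervals yields a lower inertia than one that groups distant intervals, which is the sense in which a lower intra-cluster inertia indicates a better clustering. Any of the other measures listed above could be substituted for `hausdorff` in the same scheme.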