An alternative to classical latent class models selection methods for sparse binary data: an illustration with simulated data

Carlomagno Araya Alpizar

doi:10.15517/rmta.v23i1.22448

Vol. 23 No. 1 (2016), Articles

Published April 19, 2017

Carlomagno Araya Alpizar

Sede de Occidente, Universidad de Costa Rica, San Ramón, Costa Rica

Keywords

sparse data
latent class
goodness-of-fit
binary data
datos escasos
clases latentes
bondad de ajuste
datos binarios

How to Cite

Araya Alpizar, C. (2017). An alternative to classical latent class models selection methods for sparse binary data: an illustration with simulated data. Revista De Matemática: Teoría Y Aplicaciones, 23(1), 199–220. https://doi.org/10.15517/rmta.v23i1.22448

Abstract

Within the context of a latent class model with binary manifest variables, we propose an alternative method that solves the problem of estimating the empirical distribution when contingency tables are sparse and the chi-square approximation for goodness-of-fit is not valid. We analyze sparse binary data, in which many response patterns have very small expected frequencies, using several data sets that vary in degree of sparseness from 1 to 5. The degree of sparseness, defined as d = n/2^p = n/R, is mentioned in almost all prior literature as an important determinant of how well the distribution is represented by the chi-squared distribution. The proposed approach produced valid and reliable results under these problematic data conditions. The results presented compare the Type I error rates of traditional goodness-of-fit tests. We also show that, with data density d ≤ 5, Pearson’s statistic (χ²) should not be used to select latent class models with the Patterns Method, because its Type I error probability exceeds 5%. Comparing the Patterns Method and the Parametric Bootstrap at data density d = 2, we show that the Patterns Method has more accurate Type I error probabilities, since the likelihood ratio, Read-Cressie and Freeman-Tukey statistics yield values of α < 0.05. In contrast, the Parametric Bootstrap gives values for these statistics that exceed 5%.
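As a rough illustration of the quantities named in the abstract, the following Python sketch computes the data density d = n/2^p and the four goodness-of-fit statistics from their textbook formulas, and then estimates a parametric bootstrap p-value. It rests on several assumptions that are not taken from the article: n is read as the sample size and p as the number of binary variables; all function names are hypothetical; the expected frequencies are a placeholder vector rather than the output of a fitted latent class model; and the bootstrap keeps the null distribution fixed instead of re-estimating the model on each replicate. It does not implement the Patterns Method.

```python
import math
import random


def data_density(n, p):
    """Degree of sparseness d = n / 2^p = n / R for p binary variables."""
    return n / 2 ** p


def pearson(obs, exp):
    """Pearson chi-square: sum of (O - E)^2 / E over response patterns."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def likelihood_ratio(obs, exp):
    """Likelihood ratio statistic; empty cells contribute zero."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)


def read_cressie(obs, exp, lam=2.0 / 3.0):
    """Power-divergence statistic with lambda = 2/3.

    This short form assumes sum(obs) == sum(exp).
    """
    total = sum(o * ((o / e) ** lam - 1.0) for o, e in zip(obs, exp) if o > 0)
    return 2.0 / (lam * (lam + 1.0)) * total


def freeman_tukey(obs, exp):
    """Freeman-Tukey statistic: 4 * sum of (sqrt(O) - sqrt(E))^2."""
    return 4.0 * sum((math.sqrt(o) - math.sqrt(e)) ** 2 for o, e in zip(obs, exp))


def bootstrap_p_value(stat, obs, exp, n_boot=500, seed=0):
    """Parametric bootstrap p-value under a fixed null (simplifying assumption).

    Samples of size n are drawn from the multinomial with cell
    probabilities proportional to exp; the p-value is the share of
    simulated statistics at least as large as the observed one.
    """
    rng = random.Random(seed)
    n = sum(obs)
    cells = range(len(exp))
    observed_value = stat(obs, exp)
    exceed = 0
    for _ in range(n_boot):
        sim = [0] * len(exp)
        for c in rng.choices(cells, weights=exp, k=n):
            sim[c] += 1
        if stat(sim, exp) >= observed_value:
            exceed += 1
    return exceed / n_boot


# Made-up example: p = 3 binary variables, R = 8 patterns, n = 16.
observed = [3, 1, 0, 2, 4, 0, 1, 5]
expected = [2.0] * 8  # placeholder, not from a fitted latent class model
d = data_density(sum(observed), p=3)
x2 = pearson(observed, expected)
p_boot = bootstrap_p_value(pearson, observed, expected)
```

The statistics are written out separately rather than through a single power-divergence formula so that empty cells, which are frequent in sparse tables, do not cause division or logarithm errors.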
