Abstract
Most data in the social sciences and education contain missing values caused by study attrition or nonresponse. Methods for handling missing data have improved dramatically in recent years, and statistical software now offers a variety of sophisticated options. Despite the wide availability of well-justified methods, many researchers still rely on outdated imputation techniques that can produce biased analyses. This article presents a conceptual introduction to missing data patterns. It then introduces the handling and analysis of missing data using two modern methods: full information maximum likelihood (FIML) and multiple imputation (MI). It also includes an introduction to planned missing data designs and to new computational tools such as the Quark function and the semTools package. We hope this article encourages the use of modern methods for the analysis of missing data.