Do not Be Afraid of Missing Data: Modern Approaches to Handle Missing Information

Authors

  • Esteban Montenegro-Montenegro, Institute for Measurement, Methodology, Analysis and Policy (IMMAP), Texas Tech University, https://orcid.org/0000-0003-4572-7142
  • Youngha Oh, Texas Tech University
  • Steven Chesnut, University of Southern Mississippi

DOI:

https://doi.org/10.15517/ap.v29i119.18812

Keywords:

missing data, maximum likelihood estimation, full-information maximum likelihood, multiple imputation, planned missingness, psychometrics.

Abstract

Most social and educational data sets contain missing observations due to attrition or nonresponse. Missing data methodology has improved dramatically in recent years, and popular statistical software now offers a variety of sophisticated options. Despite the widespread availability of theoretically justified methods, many researchers still rely on older imputation techniques that can produce biased results. This article provides a conceptual introduction to the patterns of missing data and then shows how to handle and analyze missing information with two modern methods: full-information maximum likelihood (FIML) and multiple imputation (MI). It also introduces planned missing designs and mentions new computational tools such as the Quark function and the semTools package. The authors hope that this paper encourages researchers to implement modern methods for analyzing missing data.

Author Biographies

  • Esteban Montenegro-Montenegro, Institute for Measurement, Methodology, Analysis and Policy (IMMAP), Texas Tech University

    Doctoral student in Educational Psychology in the Research, Evaluation, Measurement, and Statistics (REMS) concentration.

    Assistant at the Institute for Measurement, Methodology, Analysis & Policy (IMMAP).

  • Youngha Oh, Texas Tech University

    Institute for Measurement, Methodology, Analysis and Policy.

  • Steven Chesnut, University of Southern Mississippi

    Institute for Measurement, Methodology, Analysis and Policy.

References

Allison, P. D. (2012). Handling Missing Data by Maximum Likelihood, SAS Global Forum, paper 312-2012. Retrieved from: http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf.

Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 243–277). Mahwah, NJ: Lawrence Erlbaum.

Baraldi, A. N., & Enders, C. K. (2010). An introduction to modern missing data analyses. Journal of School Psychology, 48(1), 5–37. DOI: https://doi.org/10.1016/j.jsp.2009.10.001

Boker, S., Neale, M., Maes, H. H., Wilde, M., Spiegel, M., Brick, T., . . . Fox, J. (2011). OpenMx: An open source extended structural equation modeling framework. Psychometrika, 76, 306–317. DOI: https://doi.org/10.1007/s11336-010-9200-6

Chesnut, S. R., Squire, D., Little, T. D., & Wang, E. W. (2014). Quark: An R library for preparing large datasets for multiple imputation with auxiliary variables. [SOFTWARE ADD-ON], USA, Texas Tech University, Institute of Measurement, Methodology, and Policy (IMMAP).

Collins, L. M., Schafer, J. L., & Kam, C. M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6(4), 330–351. DOI: https://doi.org/10.1037//1082-989X.6.4.330

Davey, A., & Savla, J. (2008). Estimating Statistical Power With Incomplete Data. Organizational Research Methods, 12(2), 320–346. DOI: https://doi.org/10.1177/1094428107300366

Enders, C.K. & Bandalos, D.L. (2001). The Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models. Structural Equation Modeling, 8(3), 430–457. DOI: https://doi.org/10.1207/S15328007SEM0803_5

Enders, C. (2010). Applied Missing Data Analysis-Methodology in Social Sciences. New York: Guilford Press.

Graham, J. W., Hofer, S. M., & MacKinnon, D. P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 31, 197–218. DOI: https://doi.org/10.1207/s15327906mbr3102_3

Graham, J. W. (2003). Adding missing-data relevant variables to FIML-based structural equation models. Structural Equation Modeling, 10, 80–100. DOI: https://doi.org/10.1207/S15328007SEM1001_4

Graham, J. W., Taylor, B. J., & Olchowski, A. E. (2006). Planned missing data designs in psychological research. Psychological Methods, 11(4), 323–343. DOI: https://doi.org/10.1037/1082-989X.11.4.323

Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549–576. DOI: https://doi.org/10.1146/annurev.psych.58.110405.085530

Graham, J. W. (2012). Missing data: Analysis and design. New York: Springer. DOI: https://doi.org/10.1007/978-1-4614-4018-5

Harel, O., Stratton, J., & Aseltine, R. (2011). Designed missingness to better estimate efficacy of behavioral studies (Technical Report 11-15). Storrs, CT: Department of Statistics, University of Connecticut.

Howard, W. J., Little, T. D., & Rhemtulla, M. (in press). Using principal component analysis (PCA) to obtain auxiliary variables for missing data estimation in large data sets. Multivariate Behavioral Research.

Jia, F., Moore, E. W. G., Kinai, R., Crowe, K. S., Schoemann, A. M., & Little, T. D. (2014). Planned missing data design on small sample size: How small is too small? International Journal of Behavioral Development, 38(5), 435–452. DOI: https://doi.org/10.1177/0165025414531095

Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis (5th ed.). Upper Saddle River, NJ: Prentice Hall.

Kaiser, H. F. (1970). A second generation little jiffy. Psychometrika, 35, 401-415. DOI: https://doi.org/10.1007/BF02291817

Kline, R. B. (2010). Principles and Practice of Structural Equation Modeling (3rd ed.). New York: The Guilford Press.

Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd ed.). Hoboken, NJ: Wiley-Interscience. DOI: https://doi.org/10.1002/9781119013563

Little, T. D. (2013). Longitudinal structural equation modeling. New York, NY: Guilford.

Little, T. D., Jorgensen, T. D., Lang, K. M., & Moore, E. W. (2014). On the Joys of Missing Data. Journal of Pediatric Psychology, 39(2), 151–162. DOI: https://doi.org/10.1093/jpepsy/jst048

Garnier-Villarreal, M., Rhemtulla, M., & Little, T. D. (2014). Two-method planned missing designs for longitudinal research. International Journal of Behavioral Development, 38(5), 411–422. DOI: https://doi.org/10.1177/0165025414542711

Mooijaart, A. (2003). Estimating the statistical power in small samples by empirical distributions. In H. Yanai, A. Okada, K. Shigemasu, Y. Kano, & J. J. Meulman (Eds.), New developments in psychometrics (pp. 149–156). DOI: https://doi.org/10.1007/978-4-431-66996-8_15

Muthén, L. K., & Muthén, B. O. (1998-2013). Mplus User's Guide. Los Angeles, CA: Muthén & Muthén.

Muthén, L. K., & Muthén, B.O. (2002). How to use a Monte Carlo Study to decide on sample size and determine power. Structural Equation Modeling, 9, 599-620. DOI: https://doi.org/10.1207/S15328007SEM0904_8

Pornprasertmanit, S., Miller, P., & Schoemann, A. (2014). simsem: Simulated structural equation modeling. R package version 0.5–8. Retrieved from: http://www.simsem.org

Pornprasertmanit, S., Miller, P., Schoemann, A., & Rosseel, Y. (2015). semTools: Useful tools for structural equation modeling. R package version 0.4–6. Available at: http://CRAN.R-project.org/package=semTools.

Popham, W. J. (1993). Circumventing the high costs of authentic assessment. Phi Delta Kappan, 74(6), 470-473.

R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

Raghunathan, T. E., & Grizzle, J. E. (1995). A split questionnaire survey design. Journal of the American Statistical Association, 90, 54–63. DOI: https://doi.org/10.1080/01621459.1995.10476488

Rhemtulla, M., Jia, F., Wu, W., & Little, T. D. (2014). Planned missing designs to optimize the efficiency of latent growth parameter estimates. International Journal of Behavioral Development, 38(5), 423–434. DOI: https://doi.org/10.1177/0165025413514324

Rhemtulla, M., & Little, T. (2012). Tools of the Trade: Planned Missing Data Designs for Research in Cognitive Development. Journal of Cognition and Development, 13(4), 425–438. DOI: https://doi.org/10.1080/15248372.2012.717340

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48, 1–36. Available at: http://lavaan.ugent.be/ DOI: https://doi.org/10.18637/jss.v048.i02

Rubin, D. B. (1976). Inference and Missing Data. Biometrika, 63(3), 581–592. DOI: https://doi.org/10.1093/biomet/63.3.581

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley. DOI: https://doi.org/10.1002/9780470316696

Schafer, J. L. (1997). Analysis of incomplete multivariate data. New York: Chapman & Hall. DOI: https://doi.org/10.1201/9781439821862

Schafer, J. L., & Graham, J. W. (2002). Missing Data: Our View of the State of the Art. Psychological Methods, 7(2), 147–177. DOI: https://doi.org/10.1037/1082-989X.7.2.147

Shoemaker, D. M. (1973). Principles and procedures of multiple matrix sampling. Cambridge, MA: Ballinger.

Sirotnik, K. A. (1974). Introduction to matrix sampling for the practitioner. In W. J. Popham (Ed.), Evaluation in Education. Berkeley, CA: McCutchan Publishing Corp.

van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: CRC Press. DOI: https://doi.org/10.1201/b11826

Yuan, K., & Hayashi, K. (2003). Bootstrap approach to inference and power analysis based on three statistics for covariance structure models. British Journal of Mathematical and Statistical Psychology, 56, 93–110. DOI: https://doi.org/10.1348/000711003321645368

Published

2015-11-13

How to Cite

Montenegro-Montenegro, E., Oh, Y., & Chesnut, S. (2015). Do not Be Afraid of Missing Data: Modern Approaches to Handle Missing Information. Actualidades En Psicología, 29(119), 29-42. https://doi.org/10.15517/ap.v29i119.18812