Abstract
This paper describes how a Rasch model, Many-Facet Rasch Measurement (MFRM), can be applied to performance assessment, focusing on the analysis of examinees, raters, tasks, and variables. The article provides an introduction to MFRM, a description of the analysis procedures, and an illustrative example that uses the FACETS program to examine the effects of various sources of variability on students’ performance on a writing test. The results highlight the usefulness of MFRM for detecting raters with extreme values on the severity/leniency continuum and for providing objective measurement of examinees (scores free of rater severity).