Actualidades Investigativas en Educación, electronic ISSN: 1409-4703

Assessing artificial intelligence and teacher calibration in English as a foreign language writing courses at a Costa Rican public university

Keywords

artificial intelligence
higher education
second language instruction
writing (composition)

How to cite

Charpentier-Jiménez, W. (2024). Assessing artificial intelligence and teacher calibration in English as a foreign language writing courses at a Costa Rican public university. Actualidades Investigativas en Educación, 24(1), 1–25.


This article explores the assessment of artificial intelligence (AI) in English as a foreign language (EFL) writing courses and the importance of calibration in writing assessment. The role of calibration has received little attention in language contexts, whereas artificial intelligence has gained greater recognition in recent years. The research was conducted from August 2022 to March 2023 and involved eight TESOL students in a bachelor's program in EFL at a public university in Costa Rica, ten university-level TESOL teachers, and one AI software program. A quantitative quasi-experimental design and language elicitation data collection were used. Data were collected with a rubric that measured written production and were analyzed using descriptive statistics. The analysis indicates that 1) human-written paragraphs (X̄ = 7.56) and AI writing (X̄ = 7.61) produce similar results; 2) some criteria may favor either human creativity or rule-oriented writing; and 3) teachers are inconsistent in their grading, particularly of human writing. These findings show that AI matches human writing skills, at least at a basic level. The data also show that students may be falling behind in areas such as grammar, vocabulary, and punctuation. Finally, the analysis indicates that teachers' grading lacks consistency and that a calibration model should be incorporated into their training.
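The descriptive comparison summarized in the abstract can be illustrated with a minimal sketch. The scores below are hypothetical and are not the study's data; the 0 to 10 scale, the function name `summarize`, and the use of the per-paragraph standard deviation as a rough indicator of rater disagreement are all assumptions made for illustration, not the author's reported procedure.

```python
from statistics import mean, stdev

# Hypothetical rubric scores (0-10), NOT the study's data:
# each inner list holds the scores several raters gave one paragraph.
human_scores = [[7.0, 8.5, 6.0], [9.0, 6.5, 8.0]]
ai_scores = [[7.5, 8.0, 7.5], [8.0, 7.5, 8.0]]


def summarize(scores_by_paragraph):
    """Return the overall mean score and the average per-paragraph rater spread."""
    all_scores = [s for paragraph in scores_by_paragraph for s in paragraph]
    overall_mean = mean(all_scores)
    # Sample standard deviation across raters for each paragraph; a larger
    # value means the raters disagreed more about that paragraph.
    rater_spread = mean(stdev(paragraph) for paragraph in scores_by_paragraph)
    return overall_mean, rater_spread


human_mean, human_spread = summarize(human_scores)
ai_mean, ai_spread = summarize(ai_scores)
print(f"human: mean={human_mean:.2f}, rater spread={human_spread:.2f}")
print(f"AI:    mean={ai_mean:.2f}, rater spread={ai_spread:.2f}")
```

In this made-up example the two means are close while the spread is larger for the human-written paragraphs, which is the kind of pattern a calibration session would aim to reduce. A formal inter-rater reliability coefficient would be a more rigorous choice than a raw standard deviation.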




Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright 2023 William Charpentier Jiménez

