Abstract
This article explores the assessment of artificial intelligence (AI) in English as a foreign language (EFL) writing courses and the importance of calibration in writing assessment. The role of calibration has received little attention in language contexts, whereas artificial intelligence has gained increasing recognition in recent years. The research was conducted from August 2022 to March 2023 and involved eight TESOL students in an EFL bachelor's program at a public university in Costa Rica, ten university-level TESOL teachers, and one AI software program. A quantitative quasi-experimental design and language elicitation data collection were used. Data were collected by means of a rubric that measured written production and were analyzed using descriptive statistics. The data analysis indicates that 1) human-written paragraphs (X̄ = 7.56) and AI writing (X̄ = 7.61) yield similar results; 2) some criteria may favor human creativity or rule-oriented writing; and 3) teachers show inconsistencies when grading human writing in particular. These findings indicate that AI matches human writing skills, at least at a basic level. Moreover, the data show that students may be falling behind in areas such as grammar, vocabulary, and punctuation. Finally, the analysis indicates that teachers' grading lacks consistency and that a calibration model should be incorporated into their training.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright 2023 William Charpentier Jiménez