El treebank del español "IPROCOLDI": componente anotado del corpus CODIMEP-CR

Carla Victoria Jara Murillo

doi:10.15517/rfl.v39i2.15094

Vol. 39 No. 2 (2013), Linguistics

Universidad de Costa Rica, Departamento de Lingüística


Keywords

treebanks
Spanish treebank
corpus linguistics
Spanish syntax
NLP
morphosyntactically annotated corpus

How to Cite

Jara Murillo, C. V. (2014). El treebank del español "IPROCOLDI": componente anotado del corpus CODIMEP-CR. Journal of Philology and Linguistics of the University of Costa Rica, 39(2), 143–171. https://doi.org/10.15517/rfl.v39i2.15094

Abstract

This paper describes the process followed to create a Spanish treebank within research project No. 745-B1-244, Interfaz para el procesamiento de corpus lingüísticos digitales (IPROCOLDI; Interface for the Processing of Digital Language Corpora). The treebank data were extracted from the Corpus de Mensajes Presidenciales Costarricenses (CODIMEP-CR). The interface and the treebank are available at http://163.178.116.145/iprocoldi/.


