Abstract
This paper focuses on collocations, which have long been studied in computational linguistics because they are a key factor in natural language processing. For instance, they usually pose a challenge for automatic translation, because the association between two terms is not easily computed. We propose that the parser be given access to a lexical database so that collocations can be identified more effectively during parsing. We assessed this claim on a corpus of 6,000 sentences taken from the British magazine The Economist Espresso. The corpus was parsed twice, first with the collocation detection component turned on and then with it turned off, and the Fips tagger was used to make the comparison. The results show an improvement in quality when the parser has access to collocation knowledge.