An XML-based Tool for Tracking English Inclusions in German Text

Authors: Claire Grover, Beatrice Alex


Abstract: The use of lexicons and corpora advances both linguistic research and the performance of current natural language processing (NLP) systems. We present a tool that exploits such resources, specifically English and German lexical databases and the World Wide Web, to recognise English inclusions in German newspaper articles. The tool's output can assist lexical resource developers in monitoring changing patterns of inclusion usage. The corpus used for the classification covers three different domains. We report the classification results and illustrate their value to NLP research.
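The lexicon-lookup step the abstract describes can be sketched as follows. This is my illustration, not the authors' implementation: the tiny English and German wordlists are hypothetical stand-ins for the full lexical databases the paper uses, and the real tool additionally consults Web frequencies to resolve ambiguous tokens.

```python
# Minimal sketch of lexicon-based detection of English inclusions in German
# text. The wordlists below are illustrative placeholders, not real resources.

ENGLISH_LEXICON = {"software", "update", "online", "computer", "team"}
GERMAN_LEXICON = {"die", "der", "das", "und", "ist", "neue", "heute"}


def find_english_inclusions(text):
    """Return tokens found in the English lexicon but not the German one."""
    tokens = text.lower().split()
    return [t for t in tokens
            if t in ENGLISH_LEXICON and t not in GERMAN_LEXICON]


print(find_english_inclusions("Das neue Software Update ist heute online"))
# → ['software', 'update', 'online']
```

Tokens found in both lexicons (interlingual homographs) are the hard cases; the paper's approach of consulting Web search counts is one way to disambiguate them.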
