Building Domain-Specific Taggers without Annotated (Domain) Data

Authors: Manabu Torii, John Miller, K. Vijay-Shanker

DOI:

Keywords: Natural language processing, Artificial intelligence, Domain (biology), Hidden Markov model, Computer science, Part-of-speech tagging, Component (UML), Lexicon, Speech recognition

Abstract: Part of speech tagging is a fundamental component in many NLP systems. When taggers developed in one domain are used in another domain, the performance can degrade considerably. We present a method for developing taggers for new domains without requiring POS annotated text in the new domain. Our method involves using raw domain text and identifying related words to form a domain specific lexicon. This lexicon provides the initial lexical probabilities for EM training of an HMM model. We evaluate the method by applying it in the Biology domain and show that we achieve results that are comparable with some taggers developed for this domain.
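The abstract's central step, using a lexicon to seed the lexical (emission) probabilities of an HMM before EM training, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the toy lexicon, corpus and tag set, the uniform split of each word's mass over its allowed tags, the uniform initial transition probabilities, and the function names `init_emissions`, `forward_backward`, `em_step` and `train`. The sketch also assumes every corpus word appears in the lexicon.

```python
import math
from collections import defaultdict

def normalise(counts):
    """Scale non-negative counts to sum to 1 (left unchanged if all zero)."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else dict(counts)

def init_emissions(lexicon, tags):
    """Initial P(word | tag): each word spreads unit mass uniformly over the
    tags the lexicon allows for it; each tag's row is then normalised."""
    emit = {t: defaultdict(float) for t in tags}
    for word, allowed in lexicon.items():
        for t in allowed:
            emit[t][word] = 1.0 / len(allowed)
    return {t: normalise(emit[t]) for t in tags}

def forward_backward(sent, tags, start, trans, emit):
    """Scaled forward and backward passes for one sentence."""
    alpha, scale = [], []
    for i, w in enumerate(sent):
        row = {}
        for t in tags:
            prior = start[t] if i == 0 else sum(alpha[i - 1][s] * trans[s][t] for s in tags)
            row[t] = prior * emit[t].get(w, 0.0)
        z = sum(row.values())  # assumed > 0: every word is in the lexicon
        scale.append(z)
        alpha.append({t: v / z for t, v in row.items()})
    beta = [None] * len(sent)
    beta[-1] = {t: 1.0 for t in tags}
    for i in range(len(sent) - 2, -1, -1):
        nxt = sent[i + 1]
        beta[i] = {s: sum(trans[s][t] * emit[t].get(nxt, 0.0) * beta[i + 1][t]
                          for t in tags) / scale[i + 1] for s in tags}
    return alpha, beta, scale

def em_step(corpus, tags, start, trans, emit):
    """One Baum-Welch iteration. Returns re-estimated parameters and the
    corpus log-likelihood under the parameters passed in."""
    s_cnt = {t: 0.0 for t in tags}
    t_cnt = {s: {t: 0.0 for t in tags} for s in tags}
    e_cnt = {t: defaultdict(float) for t in tags}
    loglik = 0.0
    for sent in corpus:
        alpha, beta, scale = forward_backward(sent, tags, start, trans, emit)
        loglik += sum(math.log(z) for z in scale)
        for i, w in enumerate(sent):
            for t in tags:
                gamma = alpha[i][t] * beta[i][t]  # posterior P(tag_i = t | sent)
                e_cnt[t][w] += gamma
                if i == 0:
                    s_cnt[t] += gamma
                if i + 1 < len(sent):
                    for u in tags:  # expected count of transition t -> u
                        t_cnt[t][u] += (alpha[i][t] * trans[t][u]
                                        * emit[u].get(sent[i + 1], 0.0)
                                        * beta[i + 1][u] / scale[i + 1])
    return (normalise(s_cnt), {s: normalise(t_cnt[s]) for s in tags},
            {t: normalise(e_cnt[t]) for t in tags}, loglik)

def train(corpus, lexicon, tags, iterations=5):
    """EM training from lexicon-seeded emissions and uniform transitions."""
    start = {t: 1.0 / len(tags) for t in tags}
    trans = {s: dict(start) for s in tags}
    emit = init_emissions(lexicon, tags)
    history = []
    for _ in range(iterations):
        start, trans, emit, loglik = em_step(corpus, tags, start, trans, emit)
        history.append(loglik)
    return start, trans, emit, history

# Hypothetical toy data, for illustration only.
tags = ["DT", "NN", "VB"]
lexicon = {"the": ["DT"], "protein": ["NN"], "receptor": ["NN"],
           "binds": ["VB", "NN"], "signals": ["NN", "VB"]}
corpus = [["the", "protein", "binds", "the", "receptor"],
          ["the", "receptor", "signals"],
          ["the", "protein", "signals", "the", "receptor"]]
start, trans, emit, history = train(corpus, lexicon, tags)
```

In this sketch a word's emission probability stays exactly zero for any tag the lexicon does not list, so the lexicon acts as a hard constraint throughout EM and the raw text only redistributes probability among the allowed tags. The `history` list records the log-likelihood before each update, which should not decrease from one iteration to the next.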

References (14)
Yoshimasa Tsuruoka, Yuka Tateishi, Jin-Dong Kim, Tomoko Ohta, John McNaught, Sophia Ananiadou, Jun’ichi Tsujii. Developing a Robust Part-of-Speech Tagger for Biomedical Text. Advances in Informatics, pp. 382-392 (2005). doi:10.1007/11573036_36
Q.I. Wang, D. Schuurmans. Improved Estimation for Unsupervised Part-of-Speech Tagging. International Conference Natural Language Processing, pp. 219-224 (2005). doi:10.1109/NLPKE.2005.1598738
Mitch Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, vol. 19, pp. 313-330 (1993). doi:10.21236/ADA273556
Matthew Lease, Eugene Charniak. Parsing Biomedical Literature. Lecture Notes in Computer Science, pp. 58-69 (2005). doi:10.1007/11562214_6
David Elworthy. Does Baum-Welch Re-estimation Help Taggers? Conference on Applied Natural Language Processing, pp. 53-58 (1994). doi:10.3115/974358.974371
Lawrence H. Smith, Thomas C. Rindflesch, W. John Wilbur. The Importance of the Lexicon in Tagging Biological Text. Natural Language Engineering, vol. 12, pp. 335-351 (2006). doi:10.1017/S1351324905003967
Silviu Cucerzan, David Yarowsky. Language Independent, Minimally Supervised Induction of Lexical Probabilities. Proceedings of the 38th Annual Meeting on Association for Computational Linguistics - ACL '00, pp. 270-277 (2000). doi:10.3115/1075218.1075253
Michele Banko, Robert C. Moore. Part of Speech Tagging in Context. Proceedings of the 20th International Conference on Computational Linguistics - COLING '04, pp. 556-561 (2004). doi:10.3115/1220355.1220435
Julian Kupiec. Robust Part-of-Speech Tagging Using a Hidden Markov Model. Computer Speech & Language, vol. 6, pp. 225-242 (1992). doi:10.1016/0885-2308(92)90019-Z