Building Domain-Specific Taggers without Annotated (Domain) Data

Authors: Manabu Torii, John Miller, K. Vijay-Shanker

DOI:

Keywords: Natural language processing, Artificial intelligence, Domain (biology), Hidden Markov model, Computer science, Part-of-speech tagging, Component (UML), Lexicon, Speech recognition

Abstract: Part of speech tagging is a fundamental component in many NLP systems. When taggers developed in one domain are used in another domain, the performance can degrade considerably. We present a method for developing taggers for new domains without requiring POS annotated text in the new domain. Our method involves using raw domain text and identifying related words to form a domain specific lexicon. This lexicon provides the initial lexical probabilities for EM training of an HMM model. We evaluate the method by applying it in the Biology domain and show that we achieve results that are comparable with some taggers developed for this domain.
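The general idea of seeding EM training of an HMM from a lexicon can be illustrated with a minimal sketch. The abstract does not say how the initial lexical probabilities are derived from the lexicon, so the uniform initialization below, the function names (`init_emissions`, `em_step`, `train`), and the toy tag set are all assumptions made for illustration and not the authors' implementation. The sketch uses plain Baum-Welch re-estimation without scaling, which is adequate only for short sentences.

```python
import math
from collections import defaultdict

def normalize(counts, fallback):
    """Scale counts to sum to 1; keep the old distribution if nothing was counted."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total > 0 else dict(fallback)

def init_emissions(lexicon, tags):
    """Assumed initialization: P(word | tag) uniform over the words the lexicon allows."""
    emit = {t: {} for t in tags}
    for word, allowed in lexicon.items():
        for t in allowed:
            emit[t][word] = 1.0
    return {t: normalize(emit[t], emit[t]) for t in tags}

def forward_backward(sent, tags, start, trans, emit):
    """Unscaled forward and backward tables plus the sentence likelihood."""
    n = len(sent)
    alpha = [{} for _ in range(n)]
    beta = [{} for _ in range(n)]
    for t in tags:
        alpha[0][t] = start[t] * emit[t].get(sent[0], 0.0)
        beta[n - 1][t] = 1.0
    for i in range(1, n):
        for t in tags:
            alpha[i][t] = emit[t].get(sent[i], 0.0) * sum(
                alpha[i - 1][s] * trans[s][t] for s in tags)
    for i in range(n - 2, -1, -1):
        for s in tags:
            beta[i][s] = sum(trans[s][t] * emit[t].get(sent[i + 1], 0.0)
                             * beta[i + 1][t] for t in tags)
    return alpha, beta, sum(alpha[n - 1][t] for t in tags)

def em_step(corpus, tags, start, trans, emit):
    """One Baum-Welch iteration; returns updated parameters and the log-likelihood."""
    s_cnt = {t: 0.0 for t in tags}
    t_cnt = {s: {t: 0.0 for t in tags} for s in tags}
    e_cnt = {t: defaultdict(float) for t in tags}
    loglik = 0.0
    for sent in corpus:
        alpha, beta, z = forward_backward(sent, tags, start, trans, emit)
        if z == 0.0:
            continue  # sentence contains a word the lexicon does not cover
        loglik += math.log(z)
        for i, word in enumerate(sent):
            for t in tags:
                gamma = alpha[i][t] * beta[i][t] / z  # posterior of tag t at position i
                e_cnt[t][word] += gamma
                if i == 0:
                    s_cnt[t] += gamma
            if i + 1 < len(sent):
                nxt = sent[i + 1]
                for s in tags:
                    for t in tags:
                        t_cnt[s][t] += (alpha[i][s] * trans[s][t]
                                        * emit[t].get(nxt, 0.0) * beta[i + 1][t] / z)
    new_start = normalize(s_cnt, start)
    new_trans = {s: normalize(t_cnt[s], trans[s]) for s in tags}
    new_emit = {t: normalize(e_cnt[t], emit[t]) for t in tags}
    return new_start, new_trans, new_emit, loglik

def train(corpus, lexicon, tags, iterations=5):
    """Run EM on raw (untagged) sentences, starting from lexicon-based emissions."""
    start = {t: 1.0 / len(tags) for t in tags}
    trans = {s: {t: 1.0 / len(tags) for t in tags} for s in tags}
    emit = init_emissions(lexicon, tags)
    history = []
    for _ in range(iterations):
        start, trans, emit, loglik = em_step(corpus, tags, start, trans, emit)
        history.append(loglik)
    return start, trans, emit, history
```

In this sketch a word keeps zero probability under any tag the lexicon does not list for it, because EM never revives a zero emission. This is one way a lexicon can constrain unsupervised training; how the paper actually builds and weights its domain lexicon is not described in the abstract.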
