Extending an existing specialized semantic lexicon

Authors: Benoît Habert, Adeline Nazarenko, Pierre Zweigenbaum, J. Bouaud

Abstract: There is a constant need to extend and tune specialized vocabularies to account for new words and new word usages. This paper addresses the issue of characterizing the semantic class of such words. We test the hypothesis that the analysis of the distribution of words in a representative corpus, as obtained by robust NLP tools, can help to identify words with similar meanings and to decide on the most likely semantic category of a given word based on the categories of its neighbors. We report on an experiment on a moderate-size corpus of patient discharge summaries collected during the MENELAS project, taking as semantic categories the high-level axes of the SNOMED nomenclature and processing the corpus with the ZELLIG suite of tools. We attempt to quantify the extent to which this process succeeds in proposing the correct category while we vary several parameters of the method. The percentage of correctly categorized words (precision) ranges between 50 and 75 %, and the best recall is 37 % for the whole categorization process. Categorization results are significantly above chance, but not sufficient for a fully-automated process. We discuss possible uses and further directions of improvement.
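The idea in the abstract of assigning a word the most likely category among its neighbors can be illustrated with a small sketch. This is not the ZELLIG pipeline or the authors' implementation: the function names, the tie-handling rule (abstain on ties), the toy neighbor lists, and the category labels are all assumptions made for illustration. Recall is taken here as the share of all test words that receive the correct category, which is also an assumption about the paper's definition.

```python
from collections import Counter

def propose_category(word, neighbors, known_categories):
    """Majority vote over the categories of a word's already-categorized neighbors.

    Returns None when the word has no categorized neighbor or the vote is tied
    (abstaining on ties is an illustrative choice, not taken from the paper).
    """
    votes = Counter(
        known_categories[n]
        for n in neighbors.get(word, ())
        if n in known_categories
    )
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def evaluate(test_words, neighbors, known_categories, gold):
    """Return (precision, recall) of the proposals against a gold standard."""
    proposed = {w: propose_category(w, neighbors, known_categories) for w in test_words}
    answered = [w for w in test_words if proposed[w] is not None]
    correct = [w for w in answered if proposed[w] == gold[w]]
    precision = len(correct) / len(answered) if answered else 0.0
    recall = len(correct) / len(test_words) if test_words else 0.0
    return precision, recall

# Toy data (hypothetical): distributional neighbors and category labels.
neighbors = {
    "stenosis": ["occlusion", "lesion", "artery"],
    "angioplasty": ["surgery", "bypass"],
    "catheter": ["artery", "vein"],
    "aspirin": [],
}
known_categories = {
    "occlusion": "morphology",
    "lesion": "morphology",
    "artery": "topography",
    "vein": "topography",
    "surgery": "procedure",
    "bypass": "procedure",
}
gold = {
    "stenosis": "morphology",
    "angioplasty": "procedure",
    "catheter": "device",
    "aspirin": "chemical",
}

precision, recall = evaluate(list(gold), neighbors, known_categories, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sketch a word with no categorized neighbor is left uncategorized, which lowers recall without affecting precision. That mirrors the gap between the two figures reported in the abstract.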
