作者: Max De Wilde , Roser Morante , W. Daelemans
DOI:
关键词:
摘要: Short common words such as an and I were systematically ruled out in order to avoid being mistaken for medical abbreviations. Unrecognized further stemmed with a Python implementation of the Porter Stemmer identify more terms. It is important understand that lemmatization was not option here it would have increased processing time dramatically. Each UMLS ID subsequently mapped semantic type, each group turn broader group, recommended challenge guidelines.