Author: Jan Šnajder
DOI:
Keywords:
Abstract: Knowledge about derivational morphology has proven useful for a number of natural language processing (NLP) tasks. We describe the construction and evaluation of DerivBase.hr, a large-coverage derivational morphology resource for Croatian. DerivBase.hr groups 100k lemmas from the web corpus hrWaC into 56k clusters of derivationally related lemmas, so-called derivational families. We focus on suffixal derivation between and within nouns, verbs, and adjectives. We propose two approaches: an unsupervised approach and a knowledge-based approach that relies on a hand-crafted morphology model but uses no additional lexico-semantic resources. The acquisition procedure consists of three steps: corpus preprocessing, acquisition of an inflectional lexicon, and induction of derivational families. We describe an evaluation methodology based on manually constructed derivational families, from which we sample and annotate pairs of lemmas. We evaluate DerivBase.hr on the so-obtained sample and show that the knowledge-based version attains good clustering quality: 81.2% precision, 76.5% recall, and 78.8% F1-score. As with similar resources for other languages, we expect DerivBase.hr to be useful for a number of NLP tasks.
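The abstract reports precision, recall, and F1 measured on annotated lemma pairs. The following is a minimal sketch of how such pairwise clustering scores could be computed, not the author's evaluation code. The function names, the toy lemmas, and the cluster ids are all illustrative assumptions; only the figures 81.2 and 76.5 come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pairwise_scores(gold_pairs, cluster_of):
    """Score a clustering on annotated lemma pairs.

    gold_pairs: dict mapping (lemma_a, lemma_b) -> True if the pair was
                annotated as derivationally related, else False.
    cluster_of: dict mapping lemma -> cluster id assigned by the system.
    """
    tp = fp = fn = 0
    for (a, b), related in gold_pairs.items():
        same_cluster = cluster_of[a] == cluster_of[b]
        if same_cluster and related:
            tp += 1          # correctly grouped together
        elif same_cluster and not related:
            fp += 1          # grouped together but unrelated
        elif related:
            fn += 1          # related but split across clusters
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy data (hypothetical, for illustration only).
cluster_of = {"pisati": 0, "pisac": 0, "pismo": 1, "čitati": 2, "čitatelj": 2}
gold_pairs = {
    ("pisati", "pisac"): True,      # same cluster, related
    ("pisati", "pismo"): True,      # related, but split by the system
    ("čitati", "čitatelj"): True,   # same cluster, related
    ("pisac", "čitatelj"): False,   # different clusters, unrelated
}
p, r, f = pairwise_scores(gold_pairs, cluster_of)

# Consistency check on the figures reported in the abstract.
reported_f1 = f1_score(81.2, 76.5)
print(round(reported_f1, 1))  # 78.8
```

The final check confirms that the reported F1-score is the harmonic mean of the reported precision and recall.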