Authors: Ruhi Sarikaya, Paola Virga, Yuqing Gao
DOI:
Keywords:
Abstract: This paper describes a bootstrapping methodology for semi-automatic semantic annotation of a “mini-corpus” that is conventionally annotated manually to train an initial parser used in natural language understanding (NLU) systems. We propose to cast the problem of semantic annotation as a classification problem: each word is assigned a unique set of semantic tag(s) and/or label(s) from the universal tag/label set. This approach enables “local” annotation, resulting in partially annotated sentences. The proposed method reduces the annotation time and cost, which form a major bottleneck in the development of NLU systems. We present a set of experiments conducted on a medical domain “mini-corpus” that contains 10K hand-annotated sentences. Three annotation methods are compared: parser-based (baseline), similarity-based, and classification-based annotations. The support vector machine (SVM) based classification scheme is shown to outperform both similarity-based and parser-based annotation.