Authors: Yi-cheng Pan, Yu-ying Liu, Lin-shan Lee
DOI: 10.1109/ASRU.2005.1566535
Keywords:
Abstract: In this paper, we propose two efficient approaches for named entity recognition (NER) from spoken documents. The first approach used a very efficient data structure, the PAT trees, to extract global evidences from the whole spoken documents, to be used with the well-known local (internal and external) evidences popularly used by conventional approaches. The basic idea is that a named entity (NE) may not be easily recognized in certain contexts, but may become much more easily recognized when its repeated occurrences in all the different sentences in the same spoken document are considered jointly. This approach is equally useful for NER from text and spoken documents. The second approach is to try to recover some named entities (NEs) which are out-of-vocabulary (OOV) words and thus can't be obtained in the transcriptions. The basic idea is to use reliable and important words in the transcription to construct queries to retrieve relevant text documents from external knowledge sources (such as the Internet). Matching the NEs obtained from these retrieved relevant text documents with some selected sections of the phone lattice of the spoken documents can recover some NEs which are OOV words. The experiments were performed on Mandarin Chinese by incorporating the two approaches into a hybrid statistic/rule based NER system for the Chinese language. Very significant performance improvements were obtained.