Named entity recognition from spoken documents using global evidences and external knowledge sources with applications on Mandarin Chinese

Authors: Yi-cheng Pan, Yu-ying Liu, Lin-shan Lee

DOI: 10.1109/ASRU.2005.1566535

Abstract: In this paper, we propose two efficient approaches for named entity recognition (NER) from spoken documents. The first approach uses a very efficient data structure, the PAT tree, to extract global evidences from whole documents, to be used together with the well-known local (internal and external) evidences popularly used by conventional approaches. The basic idea is that a named entity (NE) may not be easily recognized in certain contexts, but may become much more easily recognized when its repeated occurrences in all the different sentences of the same document are considered jointly. This approach is equally useful for NER from text and from spoken documents. The second approach tries to recover some named entities (NEs) which are out-of-vocabulary (OOV) words and thus can't be obtained in the transcriptions. The basic idea is to use reliable and important words in the transcription to construct queries that retrieve relevant text documents from external knowledge sources (such as the Internet). Matching the NEs obtained from these retrieved documents with selected sections of the phone lattice of the spoken documents can recover some NEs which are OOV words. The experiments were performed on Mandarin Chinese by incorporating the two approaches into a hybrid statistic/rule based NER system for the Chinese language. Very significant performance improvements were obtained.
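The idea of global evidence, counting how often a candidate string recurs across a whole document, can be illustrated with a minimal sketch. It is not the authors' implementation: a plain sorted suffix array stands in for the PAT tree, and the function names, thresholds, and the sample document are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, suffix_array, pattern):
    """Count occurrences of pattern by binary search over the sorted suffixes."""
    k = len(pattern)
    # Truncating sorted suffixes to k characters keeps them in sorted order.
    prefixes = [text[i:i + k] for i in suffix_array]
    return bisect_right(prefixes, pattern) - bisect_left(prefixes, pattern)


def repeated_substrings(text, suffix_array, min_len=2, min_count=2):
    """Collect substrings of at least min_len characters occurring at least min_count times."""
    candidates = set()
    for a, b in zip(suffix_array, suffix_array[1:]):
        # Longest common prefix of two adjacent suffixes in sorted order.
        lcp = 0
        while (a + lcp < len(text) and b + lcp < len(text)
               and text[a + lcp] == text[b + lcp]):
            lcp += 1
        for n in range(min_len, lcp + 1):
            candidates.add(text[a:a + n])
    counts = {s: count_occurrences(text, suffix_array, s) for s in candidates}
    return {s: c for s, c in counts.items() if c >= min_count}


# Hypothetical three-sentence document in which one name recurs.
document = "台積電宣布擴廠。分析師看好台積電。台積電股價上漲。"
sa = build_suffix_array(document)
evidence = repeated_substrings(document, sa)
```

In this sketch `evidence` maps each repeated character string to its document-wide count. A recognizer could treat such counts as one extra feature alongside the local evidence of each sentence. How the paper actually combines the two is not specified in the abstract.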

References (10)
Daniel M. Bikel, Richard Schwartz, Ralph M. Weischedel. An Algorithm that Learns What's in a Name. Machine Learning, vol. 34, pp. 211-231 (1999). DOI: 10.1023/A:1007558221122
Andreas Stolcke, Lidia Mangu, Eric Brill. Finding consensus among words: Lattice-based word error minimization. Conference of the International Speech Communication Association (1999).
Ricardo A. Baeza-Yates, Gaston H. Gonnet, Tim Snider. New indices for text: PAT Trees and PAT arrays. Information Retrieval, pp. 66-82 (1992).
Pi-Chuan Chang, Lin-Shan Lee. Improved language model adaptation using existing and derived external resources. IEEE Automatic Speech Recognition and Understanding Workshop, pp. 531-536 (2003). DOI: 10.1109/ASRU.2003.1318496
David D. McDonald. Internal and external evidence in the identification and semantic categorization of proper names. Corpus Processing for Lexical Acquisition, pp. 21-39 (1996).
Lee-Feng Chien. PAT-tree-based keyword extraction for Chinese information retrieval. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 31, pp. 50-58 (1997). DOI: 10.1145/258525.258534
Ming-Yi Tsai, Lin-Shan Lee. Pronunciation variation analysis based on acoustic and phonemic distance measures with application examples on Mandarin Chinese. IEEE Automatic Speech Recognition and Understanding Workshop, pp. 117-122 (2003). DOI: 10.1109/ASRU.2003.1318414
S. Cox, S. Dasmahapatra. High-level approaches to confidence estimation in speech recognition. IEEE Transactions on Speech and Audio Processing, vol. 10, pp. 460-471 (2002). DOI: 10.1109/TSA.2002.804304
F. Wessel, R. Schluter, K. Macherey, H. Ney. Confidence measures for large vocabulary continuous speech recognition. IEEE Transactions on Speech and Audio Processing, vol. 9, pp. 288-298 (2001). DOI: 10.1109/89.906002
Jianfeng Gao, Ming Zhou, Jian Sun. A Class-based Language Model Approach to Chinese Named Entity Identification. International Journal of Computational Linguistics & Chinese Language Processing, vol. 8, no. 2, pp. 1-28 (2003).