Named Entity Extraction using Information Distance

Authors: Girish Palshikar, Sangameshwar Patil, Sachin Pawar

Abstract: The Named Entity extraction (NEX) problem consists of automatically constructing a gazette containing instances for each NE of interest. NEX is important for domains which lack a corpus with tagged NEs. In this paper, we propose a new unsupervised (bootstrapping) technique, based on a variant of the Multiword Expression Distance (MED) (Bu et al., 2010) and information distance (Bennett 1998). The efficacy of our method is shown using a comparison with BASILISK and PMI in the agriculture domain. Our method discovered 8 diseases that are not found in Wikipedia.

References (14)
Sangameshwar Patil, Sachin Pawar, Girish K. Palshikar, Savita Bhat, Rajiv Srivastava. Unsupervised Gazette Creation Using Information Distance. Natural Language Processing and Information Systems, pp. 388-391 (2013). DOI: 10.1007/978-3-642-38824-8_45
Xiaoyan Zhu, Ming Li, Fan Bu. Measuring the Non-compositionality of Multiword Expressions. International Conference on Computational Linguistics, pp. 116-124 (2010).
Sachin Pawar, Rajiv Srivastava, Girish Keshav Palshikar. Automatic gazette creation for named entity recognition and application to resume processing. Bangalore Annual Compute Conference, pp. 15- (2012). DOI: 10.1145/2459118.2459133
Wenhui Liao, Sriharsha Veeramachaneni. A simple semi-supervised algorithm for named entity recognition. Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing (SemiSupLearn '09), pp. 58-65 (2009). DOI: 10.3115/1621829.1621837
Jae-Ho Kim, In-Ho Kang, Key-Sun Choi. Unsupervised named entity classification models and their ensembles. Proceedings of the 19th International Conference on Computational Linguistics, pp. 1-7 (2002). DOI: 10.3115/1072228.1072316
Antonio Jimeno, Ernesto Jimenez-Ruiz, Vivian Lee, Sylvain Gaudan, Rafael Berlanga, Dietrich Rebholz-Schuhmann. Assessment of disease named entity recognition on a corpus of annotated sentences. BMC Bioinformatics, vol. 9, pp. 1-10 (2008). DOI: 10.1186/1471-2105-9-S3-S3
Fien De Meulder, Walter Daelemans. Memory-based named entity recognition using unannotated data. North American Chapter of the Association for Computational Linguistics, pp. 208-211 (2003). DOI: 10.3115/1119176.1119211
Partha Pratim Talukdar, Thorsten Brants, Mark Liberman, Fernando Pereira. A Context Pattern Induction Method for Named Entity Extraction. Conference on Computational Natural Language Learning, pp. 141-148 (2006). DOI: 10.3115/1596276.1596303
M. Li, X. Chen, X. Li, B. Ma, P.M.B. Vitanyi. The similarity metric. IEEE Transactions on Information Theory, vol. 50, pp. 3250-3264 (2004). DOI: 10.1109/TIT.2004.838101
Oren Etzioni, Michael Cafarella, Doug Downey, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates. Unsupervised named-entity extraction from the Web: An experimental study. Artificial Intelligence, vol. 165, pp. 91-134 (2005). DOI: 10.1016/J.ARTINT.2005.03.001