MapMarker: Extraction of Postal Addresses and Associated Information for General Web Pages

Authors: Chia-Hui Chang, Shu-Ying Li

DOI: 10.1109/WI-IAT.2010.64

Abstract: Address information is essential to people's daily lives. People often need to look up the addresses of unfamiliar locations on the Web and then use map services to mark those locations for direction purposes. Although both address information and map services are available online, they are not well combined. Users usually copy an individual address from one Web site and paste it into another site with map services to locate it. Such copy-and-paste operations have to be repeated if multiple addresses are listed on a single page, such as a public school list or an apartment list. Furthermore, the information associated with each address has to be copied and included in each marker for better comprehension. Our research is devoted to automating the above process and making this combination an easier task for users. The main techniques applied here include postal address extraction and associated information extraction. We apply a sequence labeling algorithm based on Conditional Random Fields (CRFs) to train models for address extraction. Meanwhile, using the extracted addresses as landmarks, we apply pattern mining to identify the boundaries of address blocks and extract the information associated with each address. The experimental results show a high F-score of 91% for postal address extraction and 87% accuracy for associated information extraction.
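To make the sequence-labeling idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a BIO tagging scheme (`B-ADDR`, `I-ADDR`, `O`) and a handful of illustrative token features. The function names, feature set, sample sentence, and labels are all hypothetical. The labels are written by hand here, where a trained CRF model would normally predict them.

```python
import re


def token_features(tokens, i):
    """Hypothetical per-token features of the kind a CRF tagger might consume."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        "is_capitalized": tok[:1].isupper(),
        # Assumption: a five-digit token is treated as a possible postal code.
        "looks_like_zip": bool(re.fullmatch(r"\d{5}", tok)),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def spans_from_bio(tokens, labels):
    """Collect contiguous B-ADDR/I-ADDR runs into address strings."""
    spans, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "B-ADDR":
            # A new address starts; close any address still open.
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif lab == "I-ADDR" and current:
            current.append(tok)
        else:
            # Any other label ends the current address, if one is open.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Hand-labeled sample (assumption): a trained model would supply these labels.
tokens = "Visit us at 12 Main St , Springfield 12345 today".split()
labels = ["O", "O", "O", "B-ADDR", "I-ADDR", "I-ADDR",
          "I-ADDR", "I-ADDR", "I-ADDR", "O"]
addresses = spans_from_bio(tokens, labels)
```

In a full pipeline, the extracted address spans would then act as the landmarks that the abstract describes for locating address blocks and their associated information.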
