Unsupervised Metadata Extraction in Scientific Digital Libraries Using A-Priori Domain-Specific Knowledge.

Authors: Alexander Ivanyukovich, Maurizio Marchese

Abstract: Information extraction from unstructured sources is a crucial step in the semantic annotation of content. The challenge is to support a high-quality automatic (or at least semi-automatic) approach in order to sustain the scalability of future semantic-enabled services. Unsupervised information extraction encompasses a number of underlying research problems, such as natural language processing, integration of heterogeneous sources, knowledge representation, and others that have been and remain under investigation. In this paper we concentrate on the problem of unsupervised metadata extraction in the Digital Libraries domain. We propose and present a novel approach that focuses on improving the quality of metadata extraction without involving external information sources (oracles, manually prepared databases, etc.), relying instead on the information in the document itself and in its corresponding context. More specifically, we focus on quality improvements in metadata extraction from scientific papers (mainly in the computer science domain) collected from various sources over the Internet. Finally, we compare the results of our approach with the state of the art in the domain and discuss future work.
