Adapting Web Archive Catalogues for Dynamic Change

Authors: Tamsir P. Ichsan, Paul H-J Wu, Ngoc Giang Nguyen

Keywords: Metadata, Web content, Web intelligence, Web standards, Web 2.0, Information retrieval, Computer science, Web page, World Wide Web, Semantic Web, WAR

Abstract: Web archives are an important source of information. However, before an archive can be properly utilized, it needs to be catalogued. This is to ensure that the accessed materials yield the historical understanding intended by the researcher. At the same time, the dynamic nature of the Web will easily render these catalogues outdated, and there is a constant need to monitor when catalogue records become irrelevant upon a change in content. This means that a substantial amount of human effort is required to maintain catalogue records for web archives, adding an additional burden to any institution that maintains them. In this paper, we propose an automatic mechanism for monitoring changes in content, so that this workload is reduced. The system combines two component technologies to make this possible: (1) a contextualized annotation module and (2) an evidence detection module. Contextualized annotation enables the cataloguing process to link the content on the page (the evidence) to the value assigned to an element of the metadata schema. Thus, each value is "supported" by certain content, which functions as the evidence for the cataloguing decision. Regardless of changes to the webpage outside the evidence, the catalogue record remains valid as long as all of the evidence stays the same. In order to achieve evidence-specific change detection, we extend the traditional Longest Common Subsequence (LCS) based Diff engine with a Page Coordinate translation algorithm, which, as we argue through a survey, is the first such approach among the many existing monitoring approaches.
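The abstract does not spell out how the evidence detection module works, so the following is only a minimal sketch, under our own assumptions, of the general idea of evidence-specific change detection on top of an LCS diff. It assumes a page is a list of text lines, that the evidence behind a catalogue value is stored as a half-open range of line indices (a stand-in for the paper's page coordinates), and that the LCS between the catalogued version and a new crawl is used to translate that range into the new version. `Evidence`, `lcs_matches` and `translate_evidence` are hypothetical names. This is not the authors' Page Coordinate translation algorithm, and the plain O(n*m) dynamic program used here is simpler than the Hunt-McIlroy algorithm that classic Unix diff builds on.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence anchor: half-open range [start, end) of line
    indices in the page version that was catalogued."""
    start: int
    end: int


def lcs_matches(old: List[str], new: List[str]) -> Dict[int, int]:
    """Return old_index -> new_index for the lines on one longest common
    subsequence, computed with the classic O(n*m) dynamic program."""
    n, m = len(old), len(new)
    # table[i][j] holds the LCS length of old[i:] and new[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table forward to recover one LCS as index pairs.
    matches: Dict[int, int] = {}
    i = j = 0
    while i < n and j < m:
        if old[i] == new[j]:
            matches[i] = j
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return matches


def translate_evidence(old: List[str], new: List[str],
                       ev: Evidence) -> Optional[Evidence]:
    """Map an evidence range from the old page version to the new one.

    Returns the translated range if every evidence line survives unchanged
    and stays contiguous. Returns None if the evidence changed, meaning the
    catalogue value it supports should be flagged for human review."""
    if ev.end <= ev.start:
        raise ValueError("evidence range must be non-empty")
    matches = lcs_matches(old, new)
    mapped = [matches.get(k) for k in range(ev.start, ev.end)]
    if any(pos is None for pos in mapped):
        return None  # an evidence line was deleted or edited
    # Reject lines inserted inside the evidence: mapped lines must be adjacent.
    if any(b != a + 1 for a, b in zip(mapped, mapped[1:])):
        return None
    return Evidence(mapped[0], mapped[-1] + 1)


if __name__ == "__main__":
    # Illustrative page versions, not data from the paper.
    old = ["title: Annual Report", "publisher: Agency X", "news: item 1"]
    new = ["nav: Home", "title: Annual Report", "publisher: Agency X",
           "news: item 2"]
    print(translate_evidence(old, new, Evidence(0, 2)))  # Evidence(start=1, end=3)
    print(translate_evidence(old, new, Evidence(2, 3)))  # None
```

In the usage example, changes outside the evidence (the new navigation line, the edited news item) leave the first record valid and merely shift its coordinates, while a change inside the evidence returns `None`. Line granularity is itself an assumption; an archive would more plausibly anchor evidence to DOM nodes or character offsets.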

References (7)
M. D. McIlroy, J. W. Hunt. An Algorithm for Differential File Comparison. 2008.
Ling Liu, Wei Tang, David Buttler, Calton Pu. Information Monitoring on the Web: A Scalable Solution. World Wide Web, vol. 5, pp. 263-304, 2002. doi:10.1023/A:1021028509335
Fred Douglis, Thomas Ball, Yih-Farn Chen, Eleftherios Koutsofios. The AT&T Internet Difference Engine: Tracking and viewing changes on the web. World Wide Web, vol. 1, pp. 27-44, 1998. doi:10.1023/A:1019243126596
Paul H. J. Wu, Adrian K. H. Heok, Ichsan P. Tamsir. Annotating the web archives: an exploration of web archives cataloging and semantic web. International Conference on Asian Digital Libraries, vol. 4312, pp. 12-21, 2006. doi:10.1007/11931584_4
A. K. H. Heok, I. P. Tamsir, P. H. J. Wu. Annotating Web archives: structure, provenance, and context through archival cataloguing. The New Review of Hypermedia and Multimedia, vol. 13, pp. 55-75, 2007. doi:10.1080/13614560701423620
Emmanuel Desmontils, Christine Jacquin, Ludovic Simon. Dinosys: An Annotation Tool for Web-Based Learning. Advances in Web-Based Learning, ICWL 2004, vol. 3143, pp. 59-66, 2004. doi:10.1007/978-3-540-27859-7_8
Alpa Sachde, Jyoti Jacob, Sharma Chakravarthy. CX-DIFF: A change detection algorithm for XML content and change presentation issues for WebVigiL. Lecture Notes in Computer Science, vol. 2814, pp. 273-284, 2003.