Authors: Tamsir P. Ichsan, Paul H-J Wu, Ngoc Giang Nguyen
DOI:
Keywords: Metadata, Web content, Web intelligence, Web standards, Web 2.0, Information retrieval, Computer science, Web page, World Wide Web, Semantic Web, WAR
Abstract: Web archives are an important source of information. However, before a web archive can be properly utilized, it needs to be catalogued. This is to ensure that the accessed materials yield the historical understanding intended by the researcher. At the same time, the dynamic nature of the web will easily render these catalogues outdated, and there is a constant need to monitor when the catalogues become irrelevant upon a change in the web content. This means a substantial amount of human effort is required to maintain catalogue records for web archives, adding an additional burden to any institutions that maintain them. In this paper, we propose an automatic mechanism to monitor changes in web content, so that the workload is reduced. The system combines two component technologies to make this possible: (1) a contextualized annotation module and (2) an evidence change detection module. Contextualized annotation enables the cataloguing process to link the content on the web page (the evidence) to the value assigned to an element of the metadata schema. Thus, the value is “supported” by certain evidence, which functions as the basis for the cataloguing decision. Regardless of changes in the webpages outside the evidence, the catalogue remains valid as long as all the evidence remains the same. In order to achieve evidence-specific change detection, we extend the traditional Longest Common Subsequence (LCS) based Diff engine using a Page Coordinate translation algorithm, which, we argue through a survey, is the first among many other web monitoring approaches.
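
To illustrate the idea of evidence-specific change detection on top of an LCS diff, the following is a minimal sketch only, not the authors' implementation. It assumes pages are reduced to whitespace-separated tokens and that the evidence is a token range in the old page; the Page Coordinate translation algorithm itself is not described in the abstract and is not reproduced here. The names `lcs_pairs` and `evidence_intact` are hypothetical.

```python
def lcs_pairs(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def evidence_intact(old_tokens, new_tokens, start, end):
    """Check whether old_tokens[start:end] survives unchanged and contiguous.

    Assumption: the evidence counts as intact when every evidence token is
    matched by the LCS alignment and the matched positions in the new page
    are adjacent. Changes outside the range are ignored.
    """
    old_to_new = dict(lcs_pairs(old_tokens, new_tokens))
    positions = [old_to_new.get(i) for i in range(start, end)]
    if None in positions:
        return False
    return all(q == p + 1 for p, q in zip(positions, positions[1:]))


old = "Archive news Founded in 1999 by Lee footer".split()
edited_elsewhere = "Breaking Archive headlines Founded in 1999 by Lee new footer".split()
edited_evidence = "Archive news Founded in 2001 by Lee footer".split()

# The evidence is the phrase "Founded in 1999", i.e. old[2:5].
print(evidence_intact(old, edited_elsewhere, 2, 5))
print(evidence_intact(old, edited_evidence, 2, 5))
```

Because an LCS alignment is not unique, this sketch can report a change when repeated tokens are aligned differently from what a reader would expect; a production system would need a more careful mapping between old and new page positions.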