Improving the Compression Efficiency for News Web Service Using Semantic Relations Among Webpages

Authors: Xiao Wei, Xiangfeng Luo, Qing Li

DOI: 10.4018/IJCINI.2013040104

Keywords: Information retrieval, World Wide Web, Web service, Search engine, Service (systems architecture), Semantic compression, Duplicate content, Compression ratio, Computer science, Reading (process), Web page

Abstract: Both compression and decompression play important roles in a web service system. A high compression ratio helps to save storage, while fast decompression contributes to decreasing the response time of the service. Focusing specifically on news service, this paper proposes a mechanism to improve compression and decompression efficiency simultaneously by taking advantage of semantic relations among webpages. First, webpages are clustered into topics according to the similarity relation among them. Webpages belonging to the same topic have much duplicate content, which can improve the compression ratio when using delta-compression. Second, associated webpages are detected with the help of a multiple-semantics link network of topics. Associated webpages are compressed into one zip file, which may decrease the number of decompressions given users' Web reading habits. The authors apply the proposed mechanism to a practical search engine, and experimental results show that it achieves a high compression ratio and fast decompression speed as well.
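
The two ideas in the abstract, delta-compression within a topic and bundling associated webpages into one zip file, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes zlib's preset-dictionary feature as a stand-in for delta-compression, and the page template, sample pages, and function names (`compress_against`, `decompress_against`, `bundle_associated`) are hypothetical.

```python
import io
import zipfile
import zlib

# Hypothetical news-page template; pages in one topic share most of it.
TEMPLATE = (
    "<html><head><title>Example Daily News</title></head><body>"
    "<nav>Home | World | Business | Technology | Sports | Weather</nav>"
    "<h1>{headline}</h1><p>{body}</p>"
    "<footer>Copyright Example Daily News. "
    "Contact the editor for corrections.</footer></body></html>"
)
page_a = TEMPLATE.format(
    headline="Storm closes coastal highway",
    body="Officials reported flooding near the harbor on Monday.",
).encode("utf-8")
page_b = TEMPLATE.format(
    headline="Storm damage assessed along coast",
    body="Officials reported repairs near the harbor on Tuesday.",
).encode("utf-8")


def compress_against(page: bytes, reference: bytes) -> bytes:
    """Compress `page` using `reference` as a preset dictionary.

    Assumption: this approximates delta-compression against another
    page of the same topic; the paper may use a different encoder.
    """
    compressor = zlib.compressobj(level=9, zdict=reference)
    return compressor.compress(page) + compressor.flush()


def decompress_against(blob: bytes, reference: bytes) -> bytes:
    """Invert compress_against; the same reference page is required."""
    decompressor = zlib.decompressobj(zdict=reference)
    return decompressor.decompress(blob) + decompressor.flush()


def bundle_associated(pages: dict) -> bytes:
    """Store a group of associated pages in one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in pages.items():
            archive.writestr(name, data)
    return buffer.getvalue()


standalone = zlib.compress(page_b, 9)
delta = compress_against(page_b, page_a)
bundle = bundle_associated({"a.html": page_a, "b.html": page_b})
```

In this sketch `delta` is smaller than `standalone` because the shared template text is referenced from `page_a` instead of being encoded again, and one read of `bundle` yields both associated pages, so a reader who opens one and then the other triggers a single decompression.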
