Advanced Techniques in Web Data Pre-processing and Cleaning

Authors: Pablo E. Román, Robert F. Dell, Juan D. Velásquez

DOI: 10.1007/978-3-642-14461-5_2

Abstract: Central to successful e-business is the construction of web sites that attract users, capture user preferences, and entice them into making a purchase. Web mining applies data mining techniques to diverse web data to categorize both web content and structure, with the goal of aiding e-business. Web mining requires knowledge of the web site structure (hyperlink graph), web content (vector model), and web user sessions (the sequence of pages visited by each user on the site). Much of the data used for web mining can be noisy. The noise comes from many sources, for example undocumented changes in web site content, differing understanding of text and media semantics, and web logs without individual user identification. There may be no record of how many times a specific page was visited in a session, because the page may have been served from a proxy or browser cache. Such noise presents a challenge for web mining. This chapter presents issues and approaches for cleaning and preparing web data for analysis.
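The abstract notes that web logs often lack individual user identification, so sessions must be reconstructed before analysis. One common heuristic in the web usage mining literature treats each (IP address, user agent) pair as one visitor. It then starts a new session after a period of inactivity, often 30 minutes. This is not necessarily the approach the chapter itself takes. The `LogEntry` record, the `sessionize` function, and the 30-minute default are all hypothetical, and the sketch assumes the server log has already been parsed:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical pre-parsed log record; no user ID is available."""
    ip: str           # client IP address
    user_agent: str   # browser string, combined with the IP as a proxy for "user"
    timestamp: float  # seconds since the epoch
    url: str


def sessionize(entries, timeout=30 * 60):
    """Split log entries into sessions with a time-oriented heuristic (assumed).

    Entries that share (ip, user_agent) are treated as one visitor. A new
    session starts whenever the gap between consecutive requests from that
    visitor exceeds `timeout` seconds. Returns a list of URL sequences.
    """
    by_visitor = defaultdict(list)
    for e in entries:
        by_visitor[(e.ip, e.user_agent)].append(e)

    sessions = []
    for visits in by_visitor.values():
        visits.sort(key=lambda e: e.timestamp)
        current = [visits[0]]
        for prev, cur in zip(visits, visits[1:]):
            if cur.timestamp - prev.timestamp > timeout:
                # Inactivity gap too long: close the current session.
                sessions.append([e.url for e in current])
                current = []
            current.append(cur)
        sessions.append([e.url for e in current])
    return sessions


log = [
    LogEntry("10.0.0.1", "Mozilla", 0, "/index"),
    LogEntry("10.0.0.1", "Mozilla", 60, "/products"),
    LogEntry("10.0.0.1", "Mozilla", 60 + 3600, "/index"),
    LogEntry("10.0.0.2", "Mozilla", 30, "/about"),
]
print(sessionize(log))  # [['/index', '/products'], ['/index'], ['/about']]
```

This heuristic cannot recover page views served from a proxy or browser cache, because those requests never reach the server log. That is exactly the kind of noise the abstract describes.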

References (117)
Rossitza Setchi, Ivan Jordanov, Robert J. Howlett, Lakhmi C. Jain. Knowledge-Based Intelligent Information and Engineering Systems (2004).
Adam Jatowt, Mitsuru Ishizuka. Temporal multi-page summarization. Web Intelligence and Agent Systems: An International Journal, vol. 4, pp. 163-180 (2006).
Raúl Peña-Ortiz, Julio Sahuquillo, Ana Pont, José A. Gil. Dweb model: Representing Web 2.0 dynamism. Computer Communications, vol. 32, pp. 1118-1128 (2009). DOI: 10.1016/J.COMCOM.2009.01.002
Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, Geri Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. ACM Transactions on Information Systems, vol. 25, article 7 (2007). DOI: 10.1145/1229179.1229181
V. Kryssanov, K. Kakusho, E. Kuleshov, M. Minoh. Modeling hypermedia-based communication. Information Sciences, vol. 174, pp. 37-53 (2005). DOI: 10.1016/J.INS.2004.08.006
Mark Levene, José Borges, George Loizou. Zipf's law for Web surfers. Knowledge and Information Systems, vol. 3, pp. 120-129 (2001). DOI: 10.1007/PL00011657
Rayid Ghani, Rosie Jones, Dunja Mladenić. Mining the web to create minority language corpora. Proceedings of the Tenth International Conference on Information and Knowledge Management (CIKM '01), pp. 279-286 (2001). DOI: 10.1145/502585.502633
Junghoo Cho, Hector Garcia-Molina. Estimating frequency of change. ACM Transactions on Internet Technology, vol. 3, pp. 256-290 (2003). DOI: 10.1145/857166.857170
S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, J. Kleinberg. Mining the Web's link structure. Computer, vol. 32, pp. 60-67 (1999). DOI: 10.1109/2.781636