Authors: Hongzhou Sha, Tingwen Liu, Peng Qin, Yong Sun, Qingyun Liu
DOI: 10.1016/J.PROCS.2013.05.104
Keywords:
Abstract: Data cleaning is an important step performed in the preprocessing stage of web usage mining, and is widely used in many data mining systems. Despite many efforts on data cleaning for web server logs, it is still an open question for enterprise proxy logs. With unlimited accesses to websites, enterprise proxy logs trace web requests from multiple clients to multiple web servers, which makes them quite different from web server logs in both location and content. Therefore, many irrelevant items, such as software updating requests, cannot be filtered out by traditional data cleaning methods. In this paper, we propose the first method, named EPLogCleaner, that can filter out plenty of irrelevant items based on the common prefix of their URLs. We evaluate EPLogCleaner with a real network traffic trace captured from one enterprise proxy. Experimental results show that EPLogCleaner can improve the data quality of enterprise proxy logs by further filtering out more than 30% of URL requests compared with traditional data cleaning methods.
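
The abstract does not describe how EPLogCleaner derives or applies its URL prefixes, so the following is only a minimal sketch of the general idea of prefix-based filtering, not the authors' algorithm. The prefix list, the example hostnames, and the helper names `normalize` and `clean` are all hypothetical and chosen for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical prefixes of irrelevant requests (assumption: the abstract
# does not say how such prefixes are chosen or what they look like).
IRRELEVANT_PREFIXES = (
    "update.example.com/",
    "ads.example.net/banner/",
)


def normalize(url: str) -> str:
    """Drop the scheme and lowercase the host so prefixes compare consistently."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path


def clean(urls, prefixes=IRRELEVANT_PREFIXES):
    """Keep only URLs whose normalized form starts with none of the prefixes."""
    prefixes = tuple(prefixes)
    return [u for u in urls if not normalize(u).startswith(prefixes)]


# Made-up proxy log entries, for illustration only.
requests = [
    "http://update.example.com/v2/patch.bin",
    "http://news.example.org/index.html",
    "http://ADS.example.net/banner/1.gif",
]

print(clean(requests))
```

In this sketch a request is dropped as soon as its host-plus-path string begins with any listed prefix, so one prefix entry can remove a whole family of update or advertisement requests regardless of the file requested.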