Detecting and Characterizing Web Bot Traffic in a Large E-commerce Marketplace

Authors: Haitao Xu, Zhao Li, Chen Chu, Yuanmi Chen, Yifan Yang

DOI: 10.1007/978-3-319-98989-1_8

Abstract: A certain amount of web traffic is attributed to web bots on the Internet. Web bot traffic has raised serious concerns among website operators, because it usually consumes considerable resources at web servers, resulting in high workloads and longer response times, while not bringing in any profit. Even worse, the content of the pages it crawled might later be used for other fraudulent activities. Thus, it is important to detect web bot traffic and characterize it. In this paper, we first propose an efficient approach to detect web bot traffic in a large e-commerce marketplace and then perform an in-depth analysis of the characteristics of web bot traffic. Specifically, our proposed bot detection approach consists of the following modules: (1) an Expectation Maximization (EM)-based feature selection method to extract the most distinguishable features, (2) a gradient-based decision tree to calculate the likelihood of being a bot IP, and (3) a threshold estimation mechanism aiming to recover a reasonable amount of non-bot traffic flow. The detection approach has been applied on the Taobao/Tmall platforms, and its detection capability has been demonstrated by identifying a considerable amount of web bot traffic. Based on data samples of traffic originating from web bots and normal users, we conduct a comparative analysis to uncover the behavioral patterns of web bots that differ from those of normal users. The analysis results reveal their differences in terms of active time, search queries, item and store preferences, and many other aspects. These findings provide new insights for public websites to further improve web bot traffic detection for protecting valuable web contents.
