Authors: Haitao Xu, Zhao Li, Chen Chu, Yuanmi Chen, Yifan Yang
DOI: 10.1007/978-3-319-98989-1_8
Keywords:
Abstract: A certain amount of web traffic is attributed to web bots on the Internet. Web bot traffic has raised serious concerns among website operators, because it usually consumes considerable resources at web servers, resulting in high workloads and longer response time, while not bringing in any profit. Even worse, the content of the pages it crawled might later be used for other fraudulent activities. Thus, it is important to detect web bot traffic and characterize it. In this paper, we first propose an efficient approach to detect web bot traffic in a large e-commerce marketplace and then perform an in-depth analysis of the characteristics of web bot traffic. Specifically, our proposed bot detection approach consists of the following modules: (1) an Expectation Maximization (EM)-based feature selection method to extract the most distinguishable features, (2) a gradient-based decision tree to calculate the likelihood of being a bot IP, and (3) a threshold estimation mechanism aiming to recover a reasonable amount of non-bot traffic flow. The detection approach has been applied on the Taobao/Tmall platforms, and its detection capability has been demonstrated by identifying web bot traffic. Based on data samples of traffic originating from web bots and normal users, we conduct a comparative analysis to uncover the behavioral patterns of web bots that differ from those of normal users. The analysis results reveal their differences in terms of active time, search queries, item and store preferences, and many other aspects. These findings provide new insights for public websites to further improve web bot traffic detection for protecting valuable web contents.