Adscape: harvesting and analyzing online display ads

Authors: Paul Barford, Igor Canadi, Darja Krushevskaja, Qiang Ma, S. Muthukrishnan

DOI: 10.1145/2566486.2567992

Keywords:

Abstract: Over the past decade, advertising has emerged as the primary source of revenue for many web sites and apps. In this paper we report a first-of-its-kind study that seeks to broadly understand the features, mechanisms and dynamics of display advertising on the web - i.e., the Adscape. Our study takes the perspective of users who are the targets of display ads shown on web sites. We develop a scalable crawling capability that enables us to gather the details of display ads including creatives and landing pages. Our crawling strategy is focused on maximizing the number of unique ads harvested. Of critical importance to our study is the recognition that a user's profile (i.e., browser cookies) can have a significant impact on which ads are shown. We deploy our crawler over a variety of websites and profiles, which yields 175K distinct display ads. We find that while targeting is widely used, there remain instances in which delivered ads do not depend on the user profile; further, ads vary more over user profiles than over websites. We also assess the population of advertisers seen and identify 3.7K entities from a variety of business segments. Finally, we find that when targeting is used, the specific types of ads delivered generally correspond with the details of user profiles, and also with users' patterns of visit.

References (13)
B. Pinkerton. Finding What People Want: Experiences with the WebCrawler. Proc. of the Second International WWW Conference (1994).
Claude Castelluccia, Mohamed-Ali Kaafar, Minh-Dung Tran. Betrayed by Your Ads!: Reconstructing User Profiles from Targeted Ads. Privacy Enhancing Technologies, pp. 1-17 (2012). DOI: 10.1007/978-3-642-31680-7_1
Jon Feldman, Monika Henzinger, Nitish Korula, Vahab S. Mirrokni, Cliff Stein. Online Stochastic Packing Applied to Display Ad Allocation. European Symposium on Algorithms, pp. 182-194 (2010). DOI: 10.1007/978-3-642-15775-2_16
Andrei Z. Broder, Marc Najork, Janet L. Wiener. Efficient URL Caching for World Wide Web Crawling. Proceedings of the Twelfth International Conference on World Wide Web - WWW '03, pp. 679-689 (2003). DOI: 10.1145/775152.775247
D. Eichmann. The RBSE Spider - Balancing Effective Search Against Web Load. Computer Networks and ISDN Systems, vol. 27, pp. 308- (1994). DOI: 10.1016/S0169-7552(94)90151-1
O.A. McBryan. GENVL and WWWW: Tools for Taming the Web. Computer Networks and ISDN Systems, vol. 27, pp. 308- (1994). DOI: 10.1016/S0169-7552(94)90149-X
Franziska Roesner, David Wetherall, Tadayoshi Kohno. Detecting and Defending Against Third-Party Tracking on the Web. Networked Systems Design and Implementation, p. 12 (2012).
Minghai Liu, Rui Cai, Ming Zhang, Lei Zhang. User Browsing Behavior-Driven Web Crawling. Proceedings of the 20th ACM International Conference on Information and Knowledge Management - CIKM '11, pp. 87-92 (2011). DOI: 10.1145/2063576.2063593
Chia-Hui Chang, M. Kayed, M.R. Girgis, K.F. Shaalan. A Survey of Web Information Extraction Systems. IEEE Transactions on Knowledge and Data Engineering, vol. 18, pp. 1411-1428 (2006). DOI: 10.1109/TKDE.2006.152
Saikat Guha, Bin Cheng, Paul Francis. Challenges in Measuring Online Advertising Systems. Internet Measurement Conference, pp. 81-87 (2010). DOI: 10.1145/1879141.1879152