Can web pages be classified using anonymized TCP/IP headers?

Authors: Sean Sanders, Jasleen Kaur

DOI: 10.1109/INFOCOM.2015.7218614

Abstract: Web page classification is useful in many domains, including ad targeting, traffic modeling, and intrusion detection. In this paper, we investigate whether learning-based techniques can be used to classify web pages based only on anonymized TCP/IP headers of traffic generated when a web page is visited. We do this in three steps. First, we select informative TCP/IP features for a given downloaded web page, and study which of these features remain stable over time and are also consistent across client browser platforms. Second, we use the selected features to evaluate four different labeling schemes and learning-based classification methods for web page classification. Lastly, we empirically study the effectiveness of the classification methods for real-world applications.
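
The first step described in the abstract, keeping only features that remain stable across repeated downloads of the same page, might be sketched as follows. This is a minimal illustration and not the authors' method: the feature names (`num_tcp_connections`, `total_bytes`, `num_servers`), the sample values, and the coefficient-of-variation threshold are all assumptions made for the example, since the abstract does not list the actual features or the stability criterion.

```python
from statistics import mean, pstdev


def stable_features(samples, max_cv=0.2):
    """Return the names of features whose coefficient of variation
    (population std dev / |mean|) across repeated downloads is <= max_cv.

    `samples` is a list of dicts, one per download of the same page,
    mapping a (hypothetical) feature name to a numeric value.
    """
    keep = []
    for name in samples[0].keys():
        values = [s[name] for s in samples]
        mu = mean(values)
        if mu == 0:
            # The ratio is undefined for a zero mean; skip the feature.
            continue
        if pstdev(values) / abs(mu) <= max_cv:
            keep.append(name)
    return sorted(keep)


# Hypothetical per-download features derived from anonymized TCP/IP headers.
downloads = [
    {"num_tcp_connections": 12, "total_bytes": 840_000, "num_servers": 5},
    {"num_tcp_connections": 12, "total_bytes": 1_310_000, "num_servers": 5},
    {"num_tcp_connections": 13, "total_bytes": 450_000, "num_servers": 5},
]

print(stable_features(downloads))
```

In this made-up data the byte count varies widely between downloads and is dropped, while the connection and server counts are kept. The same check could be repeated per browser platform to approximate the cross-platform consistency the abstract mentions.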
