Authors: Sean Sanders, Jasleen Kaur
DOI: 10.1109/INFOCOM.2015.7218614
Keywords:
Abstract: Web page classification is useful in many domains, including ad targeting, traffic modeling, and intrusion detection. In this paper, we investigate whether learning-based techniques can be used to classify web pages based only on anonymized TCP/IP headers of the traffic generated when a web page is visited. We do this in three steps. First, we select informative TCP/IP features for a given downloaded web page, and study which of these remain stable over time and are also consistent across client browser platforms. Second, we use the selected features to evaluate four different labeling schemes and learning-based classification methods for web page classification. Lastly, we empirically study the effectiveness of the classification methods for real-world applications.