Authors: Tudor Dumitras, Iulian Neamtiu
DOI:
Keywords: Computer science, Honeypot, Empirical research, Metadata, Software evolution, Process (engineering), Data collection, Computer security, Malware, Cluster analysis
Abstract: Rigorous experiments and empirical studies hold the promise of empowering researchers and practitioners to develop better approaches for cyber security. For example, understanding the provenance and lineage of polymorphic malware strains can lead to new techniques for detecting and classifying unknown attacks. Unfortunately, many challenges stand in the way: the lack of sufficient field data (e.g., malware samples and contextual information about their impact in the real world), the lack of metadata about the collection process of the existing data sets, the lack of ground truth, and the difficulty of developing tools and methods for rigorous data analysis. As a first step towards rigorous experimental methods, we introduce two techniques for reconstructing the phylogenetic trees and dynamic control-flow graphs of unknown binaries, inspired by research in software evolution, bioinformatics, and time series analysis. Our approach is based on the observation that the long evolution histories of open source projects provide an opportunity for creating precise models of lineage and provenance, which can be used for detecting and clustering malware as well. As a second step, we present experimental methods that combine the use of a representative corpus of malware and contextual information (gathered from end hosts rather than from network traces or honeypots) with sound data collection and analysis techniques. While our experimental methods serve a concrete purpose (understanding lineage and provenance), they also provide a general blueprint for addressing the threats to the validity of cyber security studies.