A comparative study of file-type identification techniques

Authors: Nasser S. Alamri, William H. Allen

DOI: 10.1109/SECON.2015.7132993

Keywords: Data mining, Digital forensics, Artificial intelligence, Feature extraction, Machine learning, Work (electrical), Computer science, File format, Identification (information)

Abstract: Research in file-type identification has employed a number of different approaches to classify unknown files according to their actual file type. However, due to the lack of implementation details in much published research and the use of private datasets for many of those projects, it is often not possible to compare new techniques with prior work. In this paper, we present a comparison of five common approaches, along with the parameters used to perform the comparisons. All were evaluated on the same dataset, which was drawn from public or widely-available sources. Our results show that each approach can produce good classification rates (88% to 97%), but achieving these rates requires “tuning” the inputs to the classifiers.

References (15)
Simson Garfinkel, Paul Farrell, Vassil Roussev, George Dinolt. Bringing science to digital forensics with standardized forensic corpora. Digital Investigation, vol. 6 (2009). DOI: 10.1016/J.DIIN.2009.06.016
Irfan Ahmed, Kyung-suk Lhee, Hyunjung Shin, ManPyo Hong. On Improving the Accuracy and Performance of Content-Based File Type Identification. Australasian Conference on Information Security and Privacy, pp. 44-59 (2009). DOI: 10.1007/978-3-642-02620-1_4
William C. Calhoun, Drue Coles. Predicting the types of file fragments. Digital Investigation, vol. 5 (2008). DOI: 10.1016/J.DIIN.2008.05.005
Luigi Sportiello, Stefano Zanero. Context-Based File Block Classification. International Conference on Digital Forensics, pp. 67-82 (2012). DOI: 10.1007/978-3-642-33962-2_5
Gilbert Harman, Sanjeev Kulkarni. An Elementary Introduction to Statistical Learning Theory. Wiley Publishing (2011).
Gregory Conti, Erik Dean, Matthew Sinda, Benjamin Sangster. Visual Reverse Engineering of Binary and Data Files. Visualization for Computer Security, pp. 1-17 (2008). DOI: 10.1007/978-3-540-85933-8_1
Irfan Ahmed, Kyung-Suk Lhee, Hyun-Jung Shin, Man-Pyo Hong. Fast content-based file type identification. International Conference on Digital Forensics, pp. 65-75 (2011). DOI: 10.1007/978-3-642-24212-0_5
Mehdi Chehel Amirani, Mohsen Toorani, Sara Mihandoost. Feature-based Type Identification of File Fragments. Security and Communication Networks, vol. 6, pp. 115-128 (2013). DOI: 10.1002/SEC.553
Kyung-suk Lhee, ManPyo Hong, Irfan Ahmed, Hyunjung Shin. Content-based File-type Identification Using Cosine Similarity and a Divide-and-Conquer Approach. IETE Technical Review, vol. 27, pp. 465-477 (2010). DOI: 10.4103/02564602.2010.10876780
Nasser S. Alamri, William H. Allen. A taxonomy of file-type identification techniques. ACM Southeast Regional Conference, pp. 49- (2014). DOI: 10.1145/2638404.2638524