Random access in nondelimited variable-length record collections for parallel reading with Hadoop

Authors: Jason Anderson, Christopher Gropp, Linh Ngo, Amy Apon

DOI: 10.23919/INM.2017.7987424

Abstract: The industry-standard Packet CAPture (PCAP) format for storing network packet traces is normally readable only in serial due to its lack of delimiters, indexing, or blocking. This presents a challenge for parallel analysis of large networks, where packet traces can be many gigabytes in size. In this work we present RAPCAP, a novel method for random access into variable-length record collections like PCAP that identifies a record boundary within a small number of bytes of the access point. Unlike related heuristic methods that limit scalability with a nonzero probability of error, the new method offers a correctness guarantee for a well-formed file and does not rely on prior knowledge of the file contents. We include a practical implementation of the algorithm as an extension to the Hadoop framework, and a performance comparison of ingestion. Finally, similar storage types could use a modified version of RAPCAP for random access.
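
The abstract does not spell out how RAPCAP locates a boundary, so the Python sketch below does not reproduce it. It only illustrates the problem and a generic heuristic of the kind the abstract contrasts RAPCAP with (not taken from the paper or its related work). A classic PCAP file is a 24-byte global header followed by records, each a 16-byte header (`ts_sec`, `ts_usec`, `incl_len`, `orig_len`) plus `incl_len` bytes of packet data. Nothing marks where a record starts, so a parallel reader handed an arbitrary byte offset (for example, the start of a Hadoop input split) must search for one. The hypothetical `find_boundary` function accepts a candidate offset only if several consecutive 16-byte windows pass simple sanity checks and chain together through their `incl_len` fields. The checks, the chain length of 3, and all function names are illustrative assumptions, and only microsecond-resolution files are handled.

```python
import struct
from typing import List, Optional, Tuple

GLOBAL_HEADER_LEN = 24  # classic libpcap global header size in bytes
RECORD_HEADER_LEN = 16  # ts_sec, ts_usec, incl_len, orig_len (4 bytes each)


def read_global_header(buf: bytes) -> Tuple[str, int]:
    """Return (struct byte-order prefix, snaplen) from a microsecond PCAP header."""
    magic = buf[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file was written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file was written big-endian
    else:
        raise ValueError("not a classic microsecond-resolution PCAP file")
    # snaplen follows magic(4) + version_major(2) + version_minor(2)
    # + thiszone(4) + sigfigs(4), i.e. it starts at byte offset 16.
    snaplen = struct.unpack_from(endian + "I", buf, 16)[0]
    return endian, snaplen


def plausible_header(buf: bytes, off: int, endian: str, snaplen: int) -> bool:
    """Heuristic sanity checks on 16 bytes interpreted as a record header (assumed checks)."""
    if off + RECORD_HEADER_LEN > len(buf):
        return False
    _ts_sec, ts_usec, incl_len, orig_len = struct.unpack_from(endian + "IIII", buf, off)
    return ts_usec < 1_000_000 and incl_len <= snaplen and incl_len <= orig_len


def find_boundary(buf: bytes, start: int, endian: str, snaplen: int,
                  chain: int = 3) -> Optional[int]:
    """Hypothetical heuristic: first offset >= start where `chain` plausible
    record headers line up back to back (or the data ends exactly)."""
    for cand in range(start, len(buf) - RECORD_HEADER_LEN + 1):
        off, ok = cand, True
        for _ in range(chain):
            if off == len(buf):  # records ended exactly at end of data
                break
            if not plausible_header(buf, off, endian, snaplen):
                ok = False
                break
            incl_len = struct.unpack_from(endian + "I", buf, off + 8)[0]
            off += RECORD_HEADER_LEN + incl_len  # jump to the next header
            if off > len(buf):  # a record would run past the data
                ok = False
                break
        if ok:
            return cand
    return None


def build_pcap(payloads: List[bytes], snaplen: int = 65535) -> Tuple[bytes, List[int]]:
    """Build a little-endian PCAP in memory and return each record's offset."""
    # magic, version 2.4, thiszone, sigfigs, snaplen, linktype 1 (Ethernet)
    out = bytearray(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, snaplen, 1))
    offsets = []
    for i, payload in enumerate(payloads):
        offsets.append(len(out))
        out += struct.pack("<IIII", 1_500_000_000 + i, 0, len(payload), len(payload))
        out += payload
    return bytes(out), offsets


if __name__ == "__main__":
    data, offsets = build_pcap([b"\xff" * n for n in (40, 50, 60, 30)])
    endian, snaplen = read_global_header(data)
    print(offsets)  # [24, 80, 146, 222]
    # Start in the middle of the first packet's payload, as a reader
    # handed an arbitrary byte offset would.
    print(find_boundary(data, offsets[0] + 20, endian, snaplen))  # 80
```

Because packet payloads can contain bytes that happen to pass these checks, a heuristic like this can still accept a false boundary; raising `chain` makes that less likely but never impossible. That residual error probability is what the abstract's correctness guarantee for well-formed files is meant to eliminate.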
