Authors: Jason Anderson, Christopher Gropp, Linh Ngo, Amy Apon
DOI: 10.23919/INM.2017.7987424
Keywords:
Abstract: The industry standard Packet CAPture (PCAP) format for storing network packet traces is normally only readable in serial due to its lack of delimiters, indexing, or blocking. This presents a challenge for parallel analysis of large networks, where packet traces can be many gigabytes in size. In this work we present RAPCAP, a novel method for random access into variable-length record collections like PCAP by identifying a record boundary within a small number of bytes of the access point. Unlike related heuristic methods that limit scalability with a nonzero probability of error, the new method offers a correctness guarantee for a well-formed file and does not rely on prior knowledge of the contents. We include a practical implementation of the algorithm as an extension to the Hadoop framework, and a performance comparison of ingestion. Finally, similar storage types could utilize a modified version of RAPCAP for random access.
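
To illustrate why the abstract calls PCAP "only readable in serial", the following minimal Python sketch walks a classic PCAP buffer record by record. It is not the RAPCAP algorithm, whose details are not given in this record. It only shows the baseline problem: each record's position is known only after reading the length field of the record before it. The helper names `build_demo` and `record_offsets` are hypothetical, and the sketch assumes the classic microsecond-resolution PCAP layout (24-byte file header, 16-byte record headers).

```python
import struct

GLOBAL_HEADER_LEN = 24  # classic PCAP file header
RECORD_HEADER_LEN = 16  # ts_sec, ts_usec, incl_len, orig_len (4 bytes each)


def build_demo(payloads):
    """Build a tiny little-endian PCAP buffer in memory (hypothetical helper)."""
    # magic, version 2.4, thiszone, sigfigs, snaplen, link type 1 (Ethernet)
    out = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    for i, payload in enumerate(payloads):
        out += struct.pack("<IIII", i, 0, len(payload), len(payload))
        out += payload
    return out


def record_offsets(data):
    """Return the byte offset of every record header by scanning from the start."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond PCAP buffer")

    offsets = []
    pos = GLOBAL_HEADER_LEN
    while pos + RECORD_HEADER_LEN <= len(data):
        # No delimiter marks a record: only incl_len says where the next one starts.
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, pos)
        offsets.append(pos)
        pos += RECORD_HEADER_LEN + incl_len
    return offsets


if __name__ == "__main__":
    demo = build_demo([b"abc", b"", b"hello world"])
    print(record_offsets(demo))
```

A worker handed an arbitrary byte offset in the middle of such a file has no marker to tell header bytes from payload bytes, which is the gap a random access method must close.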