Authors: Yeonhee Lee, Youngseok Lee
Keywords:
Abstract: Internet traffic measurement and analysis has long been used to characterize network usage and user behaviors, but it faces the problem of scalability under the explosive growth of Internet traffic and high-speed access. Scalable analysis is difficult because a large data set requires matching computing and storage resources. Hadoop, an open-source computing platform of MapReduce and a distributed file system, has become a popular infrastructure for massive data analytics because it facilitates scalable data processing and storage services on a distributed computing system consisting of commodity hardware. In this paper, we present a Hadoop-based traffic monitoring system that performs IP, TCP, HTTP, and NetFlow analysis of multi-terabytes of Internet traffic in a scalable manner. From experiments with a 200-node testbed, we achieved 14 Gbps throughput for 5 TB files with IP and HTTP-layer analysis MapReduce jobs. We also explain the performance issues related to the traffic analysis MapReduce jobs.
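The kind of per-IP traffic aggregation such a system performs maps naturally onto the MapReduce pattern the abstract describes. Below is a minimal illustrative sketch only, not code from the paper: it simulates the map and reduce phases in pure Python over hypothetical packet records (in the actual system, records would be parsed from packet traces or NetFlow data stored in HDFS and processed by Hadoop jobs).

```python
from collections import defaultdict

# Hypothetical packet records: (source IP, packet size in bytes).
packets = [
    ("10.0.0.1", 1500),
    ("10.0.0.2", 40),
    ("10.0.0.1", 600),
    ("10.0.0.3", 1500),
    ("10.0.0.2", 1500),
]

def map_phase(records):
    """Map: emit one (ip, byte_count) key-value pair per packet."""
    for ip, nbytes in records:
        yield ip, nbytes

def reduce_phase(pairs):
    """Reduce: sum byte counts per IP.
    (In Hadoop, the framework groups pairs by key between the phases.)"""
    totals = defaultdict(int)
    for ip, nbytes in pairs:
        totals[ip] += nbytes
    return dict(totals)

per_ip_bytes = reduce_phase(map_phase(packets))
print(per_ip_bytes)
# → {'10.0.0.1': 2100, '10.0.0.2': 1540, '10.0.0.3': 1500}
```

Because both phases are embarrassingly parallel over independent records and keys, the same logic scales out across commodity nodes, which is the property the paper exploits for multi-terabyte traces.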