A measurement-based study of big-data movement

Authors: Ranjana Addanki, Sourav Maji, Malathi Veeraraghavan, Chris Tracy

DOI: 10.1109/EUCNC.2015.7194115

Abstract: Parallel TCP connections are used for large scientific dataset transfers to increase throughput. Therefore, to accurately characterize big-data movement, it is important to reconstruct parallel flowsets from traffic measurements. In this work, we start with NetFlow records collected in an operational research-and-education network across which datasets are moved routinely, identify individual elephant flows from the records, and assemble these flows into parallel flowsets. Our findings are as follows. The top 1% of flowset sizes were in the hundreds of GBs to low TBs range, 95% of flowsets had rates less than 2.5 Gbps, and 99% had durations shorter than 4 hours. The median flowset rate increases and the variance decreases with an increasing number of per-flowset component flows. Such a characterization is useful for network planning and engineering, and for improving user performance, since big-data movement is among the most demanding applications.

References (10)
Péter Megyesi, Sándor Molnár. Analysis of Elephant Users in Broadband Network Traffic. Meeting of the European Network of Universities and Companies in Information and Communication Engineering, pp. 37-45 (2013). DOI: 10.1007/978-3-642-40552-5_4
Tiago Fioreze, Aiko Pras. Self-management of hybrid optical and packet switching networks. Integrated Network Management, pp. 946-951 (2011). DOI: 10.1109/INM.2011.5990527
Arif Merchant, Mustafa Uysal, Pradeep Padala, Xiaoyun Zhu, Sharad Singhal, Kang Shin. Maestro. Proceedings of the 8th ACM International Conference on Autonomic Computing - ICAC '11, pp. 245-254 (2011). DOI: 10.1145/1998582.1998638
Eli Dart, Lauren Rotman, Brian Tierney, Mary Hester, Jason Zurawski. The Science DMZ: a network design pattern for data-intensive science. IEEE International Conference on High Performance Computing Data and Analytics, vol. 22, pp. 85- (2013). DOI: 10.1145/2503210.2503245
Bryce Allen, John Bresnahan, Lisa Childers, Ian Foster, Gopi Kandaswamy, Raj Kettimuthu, Jack Kordas, Mike Link, Stuart Martin, Karl Pickett, Steven Tuecke. Software as a service for data scientists. Communications of the ACM, vol. 55, pp. 81-88 (2012). DOI: 10.1145/2076450.2076468
Hyunchul Kim, KC Claffy, Marina Fomenkov, Dhiman Barman, Michalis Faloutsos, KiYoung Lee. Internet traffic classification demystified: myths, caveats, and the best practices. Conference on Emerging Network Experiment and Technology, pp. 11- (2008). DOI: 10.1145/1544012.1544023
Tian Jin, Chris Tracy, Malathi Veeraraghavan. Characterization of high-rate large-sized flows. IEEE International Black Sea Conference on Communications and Networking, pp. 73-76 (2014). DOI: 10.1109/BLACKSEACOM.2014.6849008
Yeonhee Lee, Youngseok Lee. Toward scalable internet traffic measurement and analysis with Hadoop. ACM Special Interest Group on Data Communication, vol. 43, pp. 5-13 (2012). DOI: 10.1145/2427036.2427038
Tiago Fioreze, Lisandro Zambenedetti Granville, Aiko Pras, Anna Sperotto, Ramin Sadre. Self-management of hybrid networks: Can we trust netflow data? Integrated Network Management, pp. 577-584 (2009). DOI: 10.1109/INM.2009.5188864