Toward query-friendly compression of rapid graph streams

作者: Arijit Khan , Charu Aggarwal

DOI: 10.1007/S13278-017-0443-4

关键词:

摘要: We study the problem of summary construction massive graph streams arriving in real time. Many graphs such as those formed by activity on social networks, communication and telephone networks are defined dynamically rapid edge a domain nodes. In these streams, it is often not possible to estimate frequency individual items (e.g., edges, nodes) with complete accuracy. Nevertheless, sketch-based stream summaries Count-Min can preserve information high-frequency reasonable However, sketches lose underlying structure unless one keeps about start end nodes all which prohibitively expensive. For example, existing methods identify but they unable answer more complex structural queries reachability edges. To this end, we design three-dimensional sketch, gMatrix that summarizes time, while also retaining behavior dataset. demonstrate how gMatrix, coupled one-time reverse hash mapping, able important properties, e.g., over edges an online manner theoretical performance guarantees. Our experimental results using large-scale attest capable answering both frequency-based high accuracy efficiency.

参考文章(46)
Philip S. Yu, Charu C. Aggarwal, Yuchen Zhao, On Clustering Graph Streams. siam international conference on data mining. pp. 478- 489 ,(2010)
Moses Charikar, Kevin Chen, Martin Farach-Colton, Finding Frequent Items in Data Streams international colloquium on automata languages and programming. ,vol. 312, pp. 693- 703 ,(2002) , 10.1016/S0304-3975(03)00400-6
Evimaria Terzi, Kristen LeFevre, GraSS: Graph Structure Summarization. siam international conference on data mining. pp. 454- 465 ,(2010)
Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi, Efficient Computation of Frequent and Top-k Elements in Data Streams Database Theory - ICDT 2005. pp. 398- 412 ,(2004) , 10.1007/978-3-540-30570-5_27
Andrew McGregor, Kook Jin Ahn, Sudipto Guha, Analyzing graph structure via linear measurements symposium on discrete algorithms. pp. 459- 467 ,(2012) , 10.5555/2095116.2095156
Neil Shah, Danai Koutra, Tianmin Zou, Brian Gallagher, Christos Faloutsos, TimeCrunch: Interpretable Dynamic Graph Summarization knowledge discovery and data mining. pp. 1055- 1064 ,(2015) , 10.1145/2783258.2783321
Atish Das Sarma, Sreenivas Gollapudi, Rina Panigrahy, Estimating PageRank on graph streams Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS '08. pp. 69- 78 ,(2008) , 10.1145/1376916.1376928
Isabelle Stanton, Gabriel Kliot, Streaming graph partitioning for large distributed graphs Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '12. pp. 1222- 1230 ,(2012) , 10.1145/2339530.2339722
Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Yin Zhang, Peter A. Dinda, Ming-Yang Kao, Gokhan Memik, Reversible sketches: enabling monitoring and analysis over high-speed data streams IEEE ACM Transactions on Networking. ,vol. 15, pp. 1059- 1072 ,(2007) , 10.1109/TNET.2007.896150
Darko Anicic, Paul Fodor, Sebastian Rudolph, Nenad Stojanovic, EP-SPARQL Proceedings of the 20th international conference on World wide web - WWW '11. pp. 635- 644 ,(2011) , 10.1145/1963405.1963495