Query-friendly compression of graph streams

作者: Charu Aggarwal , Arijit Khan

DOI: 10.5555/3192424.3192448

关键词:

摘要: We study the problem of synopsis construction massive graph streams arriving in real-time. Many graphs such as those formed by activity on social networks, communication and telephone networks are defined dynamically rapid edge a domain nodes. In these streams, it is often not possible to estimate frequency individual items (e.g., edges, nodes) with complete accuracy. Nevertheless, sketch-based stream summaries Count-Min can preserve information high-frequency reasonable However, sketch lose underlying structure unless one keeps about start end nodes all which prohibitively expensive. For example, existing methods identify but they unable answer more complex structural queries reachability edges. To this end, we design 3-dimensional sketch, gMatrix that summarizes real-time, while also retaining behavior dataset. demonstrate how gMatrix, coupled onetime reverse hash mapping, able important properties, e.g., over high edges an online manner theoretical performance guarantees. Our experimental results using large-scale attest capable answering both frequency-based accuracy efficiency.

参考文章(22)
Moses Charikar, Kevin Chen, Martin Farach-Colton, Finding Frequent Items in Data Streams international colloquium on automata languages and programming. ,vol. 312, pp. 693- 703 ,(2002) , 10.1016/S0304-3975(03)00400-6
Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi, Efficient Computation of Frequent and Top-k Elements in Data Streams Database Theory - ICDT 2005. pp. 398- 412 ,(2004) , 10.1007/978-3-540-30570-5_27
Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Yin Zhang, Peter A. Dinda, Ming-Yang Kao, Gokhan Memik, Reversible sketches: enabling monitoring and analysis over high-speed data streams IEEE ACM Transactions on Networking. ,vol. 15, pp. 1059- 1072 ,(2007) , 10.1109/TNET.2007.896150
Peixiang Zhao, Charu C. Aggarwal, Min Wang, gSketch Proceedings of the VLDB Endowment. ,vol. 5, pp. 193- 204 ,(2011) , 10.14778/2078331.2078335
Qun Chen, Andrew Lim, Kian Win Ong, D(k)-index: an adaptive structural summary for graph-structured data international conference on management of data. pp. 134- 144 ,(2003) , 10.1145/872757.872776
Kook Jin Ahn, Sudipto Guha, Andrew McGregor, Graph sketches Proceedings of the 31st symposium on Principles of Database Systems - PODS '12. pp. 5- 14 ,(2012) , 10.1145/2213556.2213560
Andrew McGregor, Graph stream algorithms: a survey international conference on management of data. ,vol. 43, pp. 9- 20 ,(2014) , 10.1145/2627692.2627694
Graham Cormode, Marios Hadjieleftheriou, Finding frequent items in data streams very large data bases. ,vol. 1, pp. 1530- 1541 ,(2008) , 10.14778/1454159.1454225
Chris Jermaine, Minos Garofalakis, Peter J. Haas, Graham Cormode, Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches ,(2012)
Sutanay Choudhury, Lawrence Holder, George Chin, Abhik Ray, Sherman Beus, John Feo, None, StreamWorks: a system for dynamic graph search international conference on management of data. pp. 1101- 1104 ,(2013) , 10.1145/2463676.2463697