ShatterPlots: Fast Tools for Mining Large Graphs.

作者: Jure Leskovec , Andrew Tomkins , Deepayan Chakrabarti , Christos Faloutsos , Ana Paula Appel

DOI:

关键词:

摘要: Graphs appear in several settings, like social networks, recommendation systems, computer communication gene/protein biological among others. A deep, recurring question is “What do real graphs look like?” That is, how can we separate ones from synthetic or with masked portions? The main contribution of this paper ShatterPlots, a simple and powerful algorithm to extract patterns that help us spot fake/masked graphs. idea shatter graph, by deleting edges, force it reach critical (“Shattering”) point, study the properties at point. One most striking “30-per-cent”: Shattering all have about 30% more nodes than edges. our discriminative “NodeShatteringRatio ”, which almost perfectly extensive collection. Additional contributions are (a) careful, scalable design algorithm, requires only O(E) time, (b) experiments large collection (19 total), up hundreds thousands million (c) wealth observations patterns, show distinguish ones.

参考文章(45)
Yiming Yang, Bryan Klimt, Introducing the Enron Corpus. conference on email and anti-spam. ,(2004)
Richard Durrett, Random graph dynamics ,(2007)
Anthony H. Dekker, Bernard D. Colbert, Network robustness and graph topology ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26. pp. 359- 368 ,(2004)
Deepayan Chakrabarti, AutoPart: parameter-free graph partitioning and outlier detection european conference on principles of data mining and knowledge discovery. pp. 112- 124 ,(2004) , 10.1007/978-3-540-30116-5_13
Matthew Richardson, Rakesh Agrawal, Pedro Domingos, Trust management for the semantic web international semantic web conference. pp. 351- 368 ,(2003) , 10.1007/978-3-540-39718-2_23
S. Milgram, The Small World Problem Psychology today. ,vol. 1, pp. 60- 67 ,(1967)
Milena Mihail, Christos Papadimitriou, On the Eigenvalue Power Law randomization and approximation techniques in computer science. pp. 254- 262 ,(2002) , 10.1007/3-540-45726-7_20
Ricard Solé, Brian Goodwin, Signs Of Life: How Complexity Pervades Biology ,(2000)
Matei Ripeanu, Adriana Iamnitchi, Ian T. Foster, Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design arXiv: Distributed, Parallel, and Cluster Computing. ,(2002)
M. Girvan, M. E. J. Newman, Community structure in social and biological networks Proceedings of the National Academy of Sciences of the United States of America. ,vol. 99, pp. 7821- 7826 ,(2002) , 10.1073/PNAS.122653799