The Graph Structure in the Web – Analyzed on Different Aggregation Levels

Authors: Robert Meusel, Sebastiano Vigna, Oliver Lehmberg, Christian Bizer

Abstract: Knowledge about the general graph structure of the World Wide Web is important for understanding the social mechanisms that govern its growth, for designing ranking methods, for devising better crawling algorithms, and for creating accurate models of its structure. In this paper, we analyze a large web graph. The graph was extracted from a large publicly accessible web crawl that was gathered by the Common Crawl Foundation in 2012. The graph covers over 3.5 billion web pages and 128.7 billion hyperlinks. We compare, among other features, degree distributions, connectivity, average distances, and the structure of weakly/strongly connected components. We conduct our analysis on three different levels of aggregation: page, host, and pay-level domain (PLD) (one “dot level” above public suffixes). Our analysis shows that, as evidenced by previous research (Serrano et al., 2007), some of the features previously observed by Broder et al. (2000) are very dependent on artifacts of the crawling process, whereas others appear to be more structural. We confirm the existence of a giant strongly connected component; we however find, as other researchers (Donato et al., 2005; Boldi et al., 2002; Baeza-Yates and Poblete, 2003), very different proportions of nodes that can reach or that can be reached from the giant component, suggesting that the “bow-tie structure” as described by Broder et al. is strongly dependent on the crawling process and, to the best of our current knowledge, is not a structural property of the Web. More importantly, statistical testing and visual inspection of size-rank plots show that the distributions of indegree, outdegree, and sizes of strongly connected components of the page and host graphs are not power laws, contrary to what was previously reported for much smaller crawls, although they might be heavy tailed. If we aggregate at the pay-level domain, however, a power law emerges. We also provide for the first time an accurate measurement of distance-based features, using recently introduced algorithms that scale to the size of our crawl (Boldi and Vigna, 2013).
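
The aggregation and bow-tie notions in the abstract can be illustrated on a toy graph. The following is only a minimal sketch, not the authors' pipeline: the function names (`aggregate_to_hosts`, `reachable`, `bow_tie`) and the sample URLs are hypothetical, the component search is a naive quadratic one suitable only for tiny inputs, and only host-level aggregation is shown because PLD aggregation would additionally require a public suffix list.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def aggregate_to_hosts(page_edges):
    """Collapse page-level links into a set of host-level edges (self-loops dropped)."""
    host_edges = set()
    for src, dst in page_edges:
        s, d = urlsplit(src).hostname, urlsplit(dst).hostname
        if s and d and s != d:
            host_edges.add((s, d))
    return host_edges


def reachable(start, adj):
    """Return all nodes reachable from `start` by following `adj` (iterative DFS)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bow_tie(edges):
    """Return (core, in_set, out_set) around the largest strongly connected component."""
    fwd, bwd, nodes = defaultdict(set), defaultdict(set), set()
    for s, d in edges:
        fwd[s].add(d)
        bwd[d].add(s)
        nodes.update((s, d))
    best, assigned = (set(), set(), set()), set()
    for n in sorted(nodes):
        if n in assigned:
            continue
        f, b = reachable(n, fwd), reachable(n, bwd)
        scc = f & b  # nodes that n can reach and that can reach n
        assigned |= scc
        if len(scc) > len(best[0]):
            best = (scc, b - scc, f - scc)
    return best


# Hypothetical sample data, for illustration only.
page_edges = [
    ("http://a.example/1", "http://b.example/x"),
    ("http://b.example/x", "http://c.example/"),
    ("http://c.example/", "http://b.example/y"),
    ("http://c.example/z", "http://d.example/"),
    ("http://a.example/1", "http://a.example/2"),  # intra-host link, dropped
]
host_edges = aggregate_to_hosts(page_edges)
core, in_set, out_set = bow_tie(host_edges)
print(sorted(core), sorted(in_set), sorted(out_set))
```

On this toy input the core is the two hosts that link to each other, with one host that can only reach the core and one that can only be reached from it. The abstract's point is that the relative sizes of these sets depend on how the crawl was performed.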

References (25)

Stefano Millozzi, Stefano Leonardi, Debora Donato, Panayiotis Tsaparas. Mining the inner structure of the Web graph. International Workshop on the Web and Databases, pp. 145-150 (2005).

Walter Willinger, David Alderson, John C. Doyle. Mathematics and the Internet: A Source of Enormous Confusion and Great Potential. American Mathematical Society (2009).

Yu Hirate, Shin Kato, Hayato Yamana. Web Structure in 2005. Workshop on Algorithms and Models for the Web-Graph, pp. 36-46 (2007). DOI: 10.1007/978-3-540-78808-9_4

Christian Bizer, Kai Eckert, Robert Meusel, Hannes Mühleisen, Michael Schuhmacher, Johanna Völker. Deployment of RDFa, Microdata, and Microformats on the Web: A Quantitative Analysis. International Semantic Web Conference, pp. 17-32 (2013). DOI: 10.1007/978-3-642-41338-4_2

Rajeev Motwani, Terry Winograd, Lawrence Page, Sergey Brin. The PageRank Citation Ranking: Bringing Order to the Web. The Web Conference, vol. 98, pp. 161-172 (1999).

M. Ángeles Serrano, Ana Maguitman, Marián Boguñá, Santo Fortunato, Alessandro Vespignani. Decoding the structure of the WWW. ACM Transactions on the Web, vol. 1, pp. 10- (2007). DOI: 10.1145/1255438.1255442

Oliver Lehmberg, Robert Meusel, Christian Bizer. Graph structure in the web. Proceedings of the 2014 ACM Conference on Web Science (WebSci '14), pp. 119-128 (2014). DOI: 10.1145/2615569.2615674

Dennis Fetterly, Mark Manasse, Marc Najork. Spam, damn spam, and statistics: using statistical analysis to locate spam web pages. International Workshop on the Web and Databases, pp. 1-6 (2004). DOI: 10.1145/1017074.1017077

P. Boldi, S. Vigna. The webgraph framework I. Proceedings of the 13th Conference on World Wide Web (WWW '04), pp. 595-602 (2004). DOI: 10.1145/988672.988752

Aaron Clauset, Cosma Rohilla Shalizi, M. E. J. Newman. Power-Law Distributions in Empirical Data. SIAM Review, vol. 51, pp. 661-703 (2009). DOI: 10.1137/070710111
