
RANDOM GRAPHS AND COMPLEX NETWORKS

Volume 2

Remco van der Hofstad

November 16, 2020


To Mad, Max and Lars, the light in my life

In memory of my parents, who always encouraged me


Contents

List of illustrations

List of tables

Preface

Course outline

Part I Preliminaries

1 Introduction and preliminaries
  1.1 Motivation: Real-world networks
  1.2 Random graphs and real-world networks
  1.3 Random graph models
    1.3.1 Erdős–Rényi random graph
    1.3.2 Inhomogeneous random graphs
    1.3.3 Configuration model
    1.3.4 Switching algorithms for uniform random graphs
    1.3.5 Preferential attachment models
    1.3.6 A Bernoulli preferential attachment model
    1.3.7 Universality of random graphs
  1.4 Power-laws and their properties
  1.5 Notation
  1.6 Notes and discussion
  1.7 Exercises for Chapter 1

2 Local convergence of random graphs
  2.1 The metric space of rooted graphs
  2.2 Local weak convergence of graphs
    2.2.1 Definition of local weak convergence
    2.2.2 Criterion for local weak convergence and tightness
    2.2.3 Local weak convergence and completeness of the limit
    2.2.4 Examples of local weak convergence
  2.3 Local convergence of random graphs
    2.3.1 Definition of local convergence of random graphs
    2.3.2 Criterion for local convergence of random graphs and tightness
    2.3.3 Local convergence and completeness of the limit
    2.3.4 Local convergence of random regular graphs
    2.3.5 Local convergence of Erdős–Rényi random graphs
  2.4 Consequences of local weak convergence: local functionals
    2.4.1 Local weak convergence and convergence of neighborhoods
    2.4.2 Local weak convergence and clustering coefficients
    2.4.3 Neighborhoods of edges and degree-degree dependencies
    2.4.4 PageRank and local weak convergence
  2.5 The giant component is almost local
    2.5.1 Asymptotics of the giant
    2.5.2 Properties of the giant

    2.5.3 The condition (2.5.7) revisited
    2.5.4 The giant in Erdős–Rényi random graphs
    2.5.5 Outline of the remainder of this book
  2.6 Notes and discussion
  2.7 Exercises for Chapter 2

Part II Connected components in random graphs

3 Phase transition in general inhomogeneous random graphs
  3.1 Motivation
  3.2 Definition of the model
    3.2.1 Examples of inhomogeneous random graphs
  3.3 Degree sequence of inhomogeneous random graphs
    3.3.1 Degree sequence of the finite-types case
    3.3.2 Finite-type approximations of bounded kernels
    3.3.3 Degree sequences of general inhomogeneous random graphs
  3.4 Multi-type branching processes
    3.4.1 Multi-type branching processes with finitely many types
    3.4.2 Survival vs. extinction of multi-type branching processes
    3.4.3 Poisson multi-type branching processes
  3.5 Local weak convergence for inhomogeneous random graphs
    3.5.1 Local weak convergence: finitely many types
    3.5.2 Local weak convergence: infinitely many types
    3.5.3 Comparison to branching processes
    3.5.4 Local weak convergence to unimodular trees for GRGn(w)
  3.6 The phase transition for inhomogeneous random graphs
  3.7 Related results for inhomogeneous random graphs
  3.8 Notes and discussion
  3.9 Exercises for Chapter 3

4 The phase transition in the configuration model
  4.1 Local weak convergence to unimodular trees for CMn(d)
  4.2 Phase transition in the configuration model
    4.2.1 The giant component is almost local
    4.2.2 Finding the largest component
    4.2.3 Analysis of the algorithm for CMn(d)
    4.2.4 Proof of Theorem 4.4
    4.2.5 The giant component of related random graphs
  4.3 Connectivity of CMn(d)
  4.4 Related results for the configuration model
  4.5 Notes and discussion
  4.6 Exercises for Chapter 4

5 Connected components in preferential attachment models
  5.1 Exchangeable random variables and Pólya urn schemes
  5.2 Local weak convergence of preferential attachment models
    5.2.1 Local weak convergence of PAMs with a fixed number of edges
  5.3 Proof of local convergence for preferential attachment models

    5.3.1 Local weak convergence of related models
    5.3.2 Local weak convergence of Bernoulli preferential attachment
  5.4 Connectivity of preferential attachment models
    5.4.1 Giant component for Bernoulli preferential attachment
  5.5 Further results for preferential attachment models
  5.6 Notes and discussion
  5.7 Exercises for Chapter 5

Part III Small-world properties of random graphs

6 Small-world phenomena in inhomogeneous random graphs
  6.1 Small-world effect in inhomogeneous random graphs
  6.2 Lower bounds on typical distances in IRGs
    6.2.1 Logarithmic lower bound on distances for finite-variance degrees
    6.2.2 Log log lower bound on distances for infinite-variance degrees
  6.3 The log log upper bound
  6.4 Path counting and log upper bound for finite-variance weights
    6.4.1 Path-counting techniques
    6.4.2 Logarithmic distance bounds for finite-variance weights
    6.4.3 Distances for IRGn(κ): Proofs of Theorems 3.16 and 6.1(ii)
  6.5 Related results on distances for inhomogeneous random graphs
    6.5.1 The diameter in inhomogeneous random graphs
    6.5.2 Distance fluctuations for GRGn(w) with finite-variance degrees
    6.5.3 Distances in GRGn(w) for τ = 3
  6.6 Notes and discussion
  6.7 Exercises for Chapter 6

7 Small-world phenomena in the configuration model
  7.1 The small-world phenomenon in CMn(d)
  7.2 Proofs of small-world results for CMn(d)
    7.2.1 Branching process approximation
    7.2.2 Path-counting techniques
    7.2.3 A log log upper bound on the diameter of the core for τ ∈ (2, 3)
  7.3 Branching processes with infinite mean
  7.4 The diameter of the configuration model
    7.4.1 The diameter of the configuration model: logarithmic case
    7.4.2 Diameter of the configuration model for τ ∈ (2, 3): log log case
  7.5 Related results for the configuration model
    7.5.1 Distances for infinite-mean degrees
    7.5.2 Fluctuation of distances for finite-variance degrees
    7.5.3 Fluctuation of distances for infinite-variance degrees
  7.6 Notes and discussion
  7.7 Exercises for Chapter 7

8 Small-world phenomena in preferential attachment models
  8.1 Logarithmic distances in preferential attachment trees
  8.2 Small-world phenomena in preferential attachment models
  8.3 Path counting in preferential attachment models

  8.4 Small-world effect in PAMs: logarithmic lower bounds for δ ≥ 0
    8.4.1 Logarithmic lower bounds on distances for δ > 0
    8.4.2 Extension to lower bounds on distances for δ = 0 and m ≥ 2
  8.5 Typical distance in PA-models: log log lower bound for δ < 0
  8.6 Small-world PAMs: logarithmic upper bounds for δ ≥ 0
    8.6.1 Logarithmic upper bounds for δ > 0
  8.7 Small-world effect in PAMs: log log upper bounds for δ < 0
    8.7.1 The diameter of the core for δ < 0
    8.7.2 Connecting the periphery to the core for δ < 0
  8.8 Diameters in preferential attachment models
  8.9 Further results on distances in preferential attachment models
    8.9.1 Distance evolution for δ < 0
    8.9.2 Distances in the Bernoulli PAM
  8.10 Notes and discussion
  8.11 Exercises for Chapter 8

Part IV Related models and problems

9 Related Models
  9.1 Two real-world network examples
    9.1.1 Citation networks
    9.1.2 Terrorist networks
    9.1.3 General real-world network models
  9.2 Directed random graphs
    9.2.1 Directed inhomogeneous random graphs
    9.2.2 The directed configuration model
    9.2.3 Directed preferential attachment models
  9.3 Random graphs with community structure: global communities
    9.3.1 Stochastic blockmodel
    9.3.2 Degree-corrected stochastic blockmodel
    9.3.3 Configuration models with global communities
    9.3.4 Preferential attachment models with global communities
  9.4 Random graphs with community structure: local communities
    9.4.1 Inhomogeneous random graphs with communities
    9.4.2 Configuration models with community structure
    9.4.3 Random intersection graphs
    9.4.4 Exponential random graphs and maximal entropy
  9.5 Spatial random graphs
    9.5.1 Small-world model
    9.5.2 Hyperbolic random graphs
    9.5.3 Geometric inhomogeneous random graphs
    9.5.4 Spatial preferential attachment models
    9.5.5 Complex network models on the hypercubic lattice
  9.6 Notes and discussion
  9.7 Exercises for Chapter 9

Appendix Some facts about the metric space structure of rooted graphs

References

Index

Todo list


Illustrations

1.1 Degree sequence of the Autonomous Systems graph in 2014
1.2 Log-log plot of the degree sequence in the Internet Movie Database
1.3 The probability mass functions of the in- and out-degree sequence in the WWW from various data sets
1.4 AS and Internet hopcount
1.5 Typical distances in the Internet Movie Database
1.6 Forward and backward switchings
6.1 A 12-step self-avoiding path connecting vertices i and j
6.2 A 10-step good path connecting i and j and the upper bounds on the weight of its vertices. The height of a vertex is high for vertices with large weights
6.3 An example of a pair of paths (π, ρ) and its corresponding shape
7.1 Number of AS traversed in hopcount data (blue) compared to the model (purple) with τ = 2.25, n = 10,940
7.2 Empirical probability mass function of distCMn(d)(o1, o2) for τ = 1.8 and n = 10^3, 10^4, 10^5
9.1 (a) Number of publications per year (logarithmic y-axis). (b) Log-log plot of the in-degree distribution tail in citation networks
9.2 Degree distribution for papers from 1984 over time
9.3 Time evolution of the number of citations of samples of 20 randomly chosen papers from 1980 for PS and EE, and from 1982 for BT
9.4 Average degree increment over a 20-year time window for papers published in different years. PS presents an aging effect different from EE and BT, showing that papers in PS receive citations longer than papers in EE and BT
9.5 Distribution of the age of cited papers for different citing years
9.6 The WWW according to Broder et al.
9.7 Tail probabilities of the degrees, community sizes s, and the inter-community degrees k in real-world networks. (a) Amazon co-purchasing network, (b) Gowalla social network, (c) English word relations, (d) Google web graph. Pictures taken from Stegehuis et al. (2016b)
9.8 The relation between the denseness of a community 2e_in/(s^2 − s) and the community size s can be approximated by a power law. (a) Amazon co-purchasing network, (b) Gowalla social network, (c) English word relations, (d) Google web graph. Pictures taken from Stegehuis et al. (2016b)
9.9 The degree distribution of a household model follows a power law with a smaller exponent than the community size distribution and outside degree distribution
9.10 A hyperbolic embedding of the Internet at the Autonomous Systems level by Boguñá et al. (2010) (see (Boguñá et al., 2010, Figure 3))


Tables

3.1 The five rows correspond to the following real-life networks: (1) Model of the protein-protein connections in blood of people with Alzheimer's disease. (2) Model of the protein-protein connections in blood of people with Multiple Sclerosis. (3) Network of actors in 2011 in the Internet Movie Database (IMDb), where actors are linked if they appeared in the same movie. (4) Collaboration network based on DBLP. Two scientists are connected if they ever collaborated on a paper. (5) Network where nodes represent zebras. There is a connection between two zebras if they ever interacted with each other during the observation phase.

4.1 The rows in the table represent the following real-world networks: The California road network consists of nodes representing intersections or endpoints of roads. In the Facebook network, nodes represent the users and the edges Facebook friendships. Hyves was a Dutch social media platform; nodes represent the users, and an edge between nodes represents a friendship. The arXiv astro-physics network represents authors of papers within the astro-physics section of arXiv; there is an edge between authors if they have ever been co-authors. The power-grid network models the high-voltage power network in the western USA; the nodes denote transformers, substations and generators, and the edges represent transmission cables. The jazz data set consists of jazz musicians, and there is a connection between two musicians if they ever collaborated.


Preface

In this book, we study connected components and small-world properties of random graph models for complex networks. This is Volume 2 of a series of two books. Volume 1 describes the preliminary topics of random graphs as models for real-world networks. Since 1999, many real-world networks have been investigated. These networks turned out to have rather different properties than classical random graph models, for example in the number of connections that the elements in the network make. As a result, a wealth of new models was invented to capture these properties. Volume 1 studies these models as well as their degree structure. This book summarizes the insights developed in this exciting period that focus on the connected components and small-world properties of the proposed random graph models.

While Volume 1 is intended to be used for a master-level course where students have limited prior knowledge of special topics in probability, Volume 2 describes more involved notions that have been the focus of attention in the past two decades. Volume 2 is intended to be used for a PhD-level course, a reading seminar, or by researchers wishing to obtain a consistent and extended overview of the results and methodology that have been developed in this area. Volume 1 includes many of the preliminaries, such as convergence of random variables, probabilistic bounds, coupling, martingales and branching processes, and we frequently refer to these results. The series of Volumes 1 and 2 aims to be self-contained. In Volume 2 we briefly repeat some of the preliminaries on random graphs, including the introduction of some of the models and the key results on their degree distributions as discussed at great length in Volume 1. In Volume 2, we give the results concerning connected components, connectivity transitions, as well as the small-world nature of the random graph models introduced in Volume 1. We aim to give detailed and complete proofs. When we do not give proofs of our results, we provide extensive pointers to the literature. We further discuss several more recent random graph models that aim to provide more realistic models for real-world networks, as they incorporate their directed nature, their community structure or their spatial embedding.

The field of random graphs was pioneered in 1959–1960 by Erdős and Rényi (1959; 1960; 1961a; 1961b), in the context of the Probabilistic Method. The initial work by Erdős and Rényi on random graphs has incited a great amount of follow-up work in the field, initially mainly in the combinatorics community. See the standard references on the subject by Bollobás (2001) and Janson, Łuczak and Ruciński (2000) for the state of the art. Erdős and Rényi (1960) give a rather complete picture of the various phase transitions that occur in the Erdős–Rényi random graph. This initial work did not aim to realistically model real-world networks. In the period after 1999, due to the fact that data sets of real-world networks became abundantly available, their structure has attracted enormous attention in mathematics as well as various applied domains. This is for example reflected in the fact that one of the first articles in the field by Albert and Barabási

xv


(2002) has attracted over 33,000 citations. One of the main conclusions from this overwhelming body of work is that many real-world networks share two fundamental properties. The first is that they are highly inhomogeneous, in the sense that vertices play rather different roles in the networks. This property is exemplified by the degree structure of the real-world networks obeying power laws: these networks are scale-free. The scale-free nature of real-world networks prompted the community to come up with many novel random graph models that, unlike the Erdős–Rényi random graph, do have power-law degree sequences. This was the key focus in Volume 1, where three models were presented that can have power-law degree sequences.

In this book, we pick up on the trail left in Volume 1, and we now focus on the connectivity structure between vertices. Connectivity can be summarized in two key aspects of real-world networks: the fact that they are highly connected, as exemplified by the fact that they tend to have one giant component containing a large proportion of the vertices (if not all of them), and their small-world nature, quantifying the fact that most pairs of vertices are separated by short paths. We discuss the available methods for these proofs, including path-counting techniques, branching process approximations, exchangeable random variables and De Finetti's theorems. We pay particular attention to a recent technique, called local weak convergence, that makes the statement that random graphs 'locally look like trees' precise. This technique is extremely powerful, and we believe that its full potential has not yet been reached.

This book consists of four parts. In Part I, consisting of Chapters 1–2, we repeat some definitions from Volume 1, including the random graph models studied in this book, which are inhomogeneous random graphs, the configuration model and preferential attachment models. We also discuss general topics that are important in random graph theory, such as power-law distributions and their properties. In Chapter 2, we discuss local weak convergence, a method that plays a central role in the theory of random graphs and in this book. In Part II, consisting of Chapters 3–5, we discuss large connected components in random graph models. In Chapter 3, we further extend the definition of the generalized random graph to general inhomogeneous random graphs. In Chapter 4 we discuss the large connected components in the configuration model, and in Chapter 5, we discuss the connected components in preferential attachment models. In Part III, consisting of Chapters 6–8, we study the small-world nature of random graphs, starting with inhomogeneous random graphs, continuing with the configuration model and ending with the preferential attachment model. In Part IV, consisting of Chapter 9, we study related random graph models and their structure. Along the way, we give many exercises that help the reader obtain a deeper understanding of the material by working on their solutions. These exercises appear in the last section of each of the chapters, and when applicable, we refer to them at the appropriate place in the text.

I have tried to give as many references to the literature as possible. However, the number of papers on random graphs is currently exploding. In MathSciNet (see http://www.ams.org/mathscinet), there were, on December 21, 2006, a total of 1,428 papers that contain the phrase 'random graphs' in the review text; on September 29, 2008, this number had increased to 1,614, to 2,346 on April 9, 2013, to 2,986 on April 21, 2016, and to 12,038 on October 5, 2020. These are merely the papers on the topic in the math community. What is special about random graph theory is that it is extremely multidisciplinary, and many papers using random graphs are currently written in economics, biology, theoretical physics and computer science. For example, in Scopus (see http://www.scopus.com/scopus/home.url), again on December 21, 2006, there were 5,403 papers that contain the phrase 'random graph' in the title, abstract or keywords; on September 29, 2008, this had increased to 7,928, to 13,987 on April 9, 2013, to 19,841 on April 21, 2016, and to 30,251 on October 5, 2020. It can be expected that these numbers will continue to increase, rendering it impossible to review all the literature.

In June 2014, I decided to split the preliminary version of this book into two volumes. This has several reasons and advantages, particularly since Volume 2 is more tuned towards a research audience, while Volume 1 is more tuned towards an audience of master students with varying backgrounds. The pdf versions of both Volumes 1 and 2 can be obtained from

http://www.win.tue.nl/~rhofstad/NotesRGCN.html

For further results on random graphs, or for solutions to some of the exercises in this book, readers are encouraged to look there. Also, for a more playful approach to networks for a broad audience, including articles, videos, and demos of many of the models treated in this book, we refer all readers to the NetworkPages at http://www.networkspages.nl. The NetworkPages are an interactive website developed by and for all those who are interested in networks. One can find demos for some of the models discussed here, as well as of network algorithms and processes on networks.

This book, as well as Volume 1, would not have been possible without the help and encouragement of many people. I thank Gerard Hooghiemstra for the encouragement to write it, and for using it at Delft University of Technology almost simultaneously while I used it at Eindhoven University of Technology in the Spring of 2006 and again in the Fall of 2008. I particularly thank Gerard for many useful comments, solutions to exercises and suggestions for improvements of the presentation throughout the book. Together with Piet Van Mieghem, we entered the world of random graphs in 2001, and I have tremendously enjoyed exploring this field together with you, as well as with Henri van den Esker, Dmitri Znamenski, Mia Deijfen and Shankar Bhamidi, Johan van Leeuwaarden, Julia Komjathy, Nelly Litvak and many others.

I thank Christian Borgs, Jennifer Chayes, Gordon Slade and Joel Spencer for joint work on random graphs that are like the Erdős–Rényi random graph, but do have geometry. Special thanks go to Gordon Slade, who has introduced me to the world of percolation, which is closely linked to the world of random graphs (see also the classic on percolation by Grimmett (1999)). It is peculiar to see that two communities work on two so closely related topics with different methods and even different terminology, and that it has taken such a long time to build bridges between the subjects. I am very happy that these bridges are now rapidly appearing, and the level of communication between different communities has increased significantly. I hope that this book helps to further enhance this communication. Frank den Hollander deserves a special mention. Frank, you have been important as a driving force throughout my career, and I am very happy now to be working with you on fascinating random graph problems!

Further, I thank Marie Albenque, Gianmarco Bet, Shankar Bhamidi, Finbar Bogerd, Marko Boon, Francesco Caravenna, Rui Castro, Kota Chisaki, Mia Deijfen, Michel Dekking, Henri van den Esker, Lorenzo Federico, Lucas Gerin, Jesse Goodman, Rajat Hazra, Markus Heydenreich, Frank den Hollander, Yusuke Ide, Svante Janson, Lancelot James, Martin van Jole, Willemien Kets, Bas Kleijn, Julia Komjathy, John Lapeyre, Lasse Leskela, Nelly Litvak, Norio Konno, Abbas Mehrabian, Mislav Miskovic, Mirko Moscatelli, Jan Nagel, Sidharthan Nair, Alex Olssen, Mariana Olvera-Cravioto, Helena Pena, Rounak Ray, Nathan Ross, Christoph Schumacher, Karoly Simon, Clara Stegehuis, Dominik Tomecki, Nicola Turchi, Viktoria Vadon, Thomas Vallier and Xiaotin Yu for remarks and ideas that have improved the content and presentation of these notes substantially. Wouter Kager has entirely read the February 2007 version of this book, giving many ideas for improvements of the arguments and of the methodology. Artem Sapozhnikov, Maren Eckhoff and Gerard Hooghiemstra read and commented on the October 2011 version.

I especially thank Dennis Timmers, Eefje van den Dungen, Joop van de Pol and Rowel Gundlach, who, as my student assistants, have been a great help in the development of this book, in making figures, providing solutions to the exercises, checking the proofs and keeping the references up to date. Maren Eckhoff and Gerard Hooghiemstra also provided many solutions to the exercises, for which I am grateful! Sandor Kolumban and Robert Fitzner helped me to turn all pictures of real-world networks as well as simulations of network models into a unified style, a feat that is beyond my LaTeX skills. A big thanks for that! Also my thanks for suggestions and help with figures to Marko Boon, Alessandro Garavaglia, Dimitri Krioukov, Vincent Kusters, Lourens Touwen, Clara Stegehuis, Piet Van Mieghem and Yana Volkovich.

This work would not have been possible without the generous support of the Netherlands Organisation for Scientific Research (NWO) through VIDI grant 639.032.304, VICI grant 639.033.806 and the Gravitation Networks grant 024.002.003.


Remco van der Hofstad
Eindhoven


Course outline

The relation between the chapters in Volumes 1 and 2 of this book is as follows:

Random Graphs and Complex Networks:
- Introduction [Volume 1, Chapter 1]
- Probabilistic methods [Volume 1, Chapter 2]
- Branching processes [Volume 1, Chapter 3]
- Erdős–Rényi random graph: Phase transition [Volume 1, Chapter 4]; Revisited [Volume 1, Chapter 5]
- Local weak convergence [Volume 2, Chapter 2]
- Inhomogeneous random graphs: Introduction [Volume 1, Chapter 6]; Connectivity [Volume 2, Chapter 3]; Small world [Volume 2, Chapter 6]
- Configuration model: Introduction [Volume 1, Chapter 7]; Connectivity [Volume 2, Chapter 4]; Small world [Volume 2, Chapter 7]
- Preferential attachment model: Introduction [Volume 1, Chapter 8]; Connectivity [Volume 2, Chapter 5]; Small world [Volume 2, Chapter 8]
- Related models [Volume 2, Chapter 9]


Here is some more explanation, as well as a possible itinerary of a master or PhD course on random graphs, including both Volumes 1 and 2, in a course outline:

▷ Start with the introduction to real-world networks in [Volume 1, Chapter 1], which forms the inspiration for what follows. Continue with [Volume 1, Chapter 2], which gives the necessary probabilistic tools used in all later chapters, and pick those topics that your students are not familiar with and that are used in the later chapters that you wish to treat. [Volume 1, Chapter 3] introduces branching processes, and is used in [Volume 1, Chapters 4, 5], as well as in most of Volume 2.

▷ After these preliminaries, you can start with the classical Erdős–Rényi random graph as covered in [Volume 1, Chapters 4–5]. Here you can choose the level of detail, and decide whether you wish to do the entire phase transition or would rather move on to the random graph models for complex networks. It is possible to omit [Volume 1, Chapter 5] before moving on.

▷ Having discussed the Erdős–Rényi random graph, you can make your own choice of topics from the world of random graphs. There are three classes of models for complex networks that are treated in quite some detail in this book. You can choose how much to treat in each of these models. You can either treat few models and discuss many aspects, or instead discuss many models at a less deep level. The introductory chapters about the three models, [Volume 1, Chapter 6] for inhomogeneous random graphs, [Volume 1, Chapter 7] for the configuration model, and [Volume 1, Chapter 8] for preferential attachment models, provide a basic introduction to them, focussing on their degree structure. These introductory chapters need to be read in order to understand the later chapters about these models (particularly the ones in Volume 2). The parts on the different models can be read independently.

▷ For readers who are interested in more advanced topics, one can either take one of the models and discuss the different chapters in Volume 2 focussing on it. [Volume 2, Chapters 3 and 6] discuss inhomogeneous random graphs, [Volume 2, Chapters 4 and 7] discuss the configuration model, while [Volume 2, Chapters 5 and 8] focus on preferential attachment models. The alternative is to take one of the topics, and work through it in detail. [Volume 2, Part II] discusses the largest connected components or phase transition in our random graph models, while [Volume 2, Part III] treats their small-world nature.

When you have further questions and/or suggestions about course outlines, then feel free to contact me.


Part I

Preliminaries


Chapter 1

Introduction and preliminaries

Abstract

In this chapter, we draw motivation from real-world networks, and formulate random graph models for them. We focus on some of the models that have received the most attention in the literature, namely, the Erdos-Renyi random graph, inhomogeneous random graphs, the configuration model and preferential attachment models. We also discuss some of their extensions that have the potential to yield more realistic models for real-world networks. We follow van der Hofstad (2017), which we refer to as [Volume 1], both for the motivation as well as for the introduction of the random graph models involved.

Looking back, and ahead

In Volume 1 of this pair of books, we have discussed various models having flexible degree sequences. The generalized random graph and the configuration model give us static flexible models for random graphs with various degree sequences. Preferential attachment models give us a convincing explanation of the abundance of power-law degree sequences in various applications. In [Volume 1, Chapters 6–8], we have focussed on the properties of the degrees of such graphs. However, we have noted in [Volume 1, Chapter 1] that many real-world networks not only have degree sequences that are rather different from those of the Erdos-Renyi random graph, but also are small worlds and have a giant connected component.

In Chapters 3–8, we shall return to the models discussed in [Volume 1, Chapters 6–8], and focus on their connected components as well as on the distances in these random graph models. Interestingly, a large chunk of the non-rigorous physics literature suggests that the behavior of various different random graph models can be described by only a few essential parameters. The key parameter of each of these models is the power-law degree exponent, and the physics literature predicts that the behavior in random graph models with similar degree sequences is similar. This is an example of the notion of universality, a notion which is central in statistical physics. Despite its importance, there are only few examples of universality that can be rigorously proved. In Chapters 3–8, we investigate the level of universality present in random graph models.

We will often refer to Volume 1. When we do, we write [Volume 1, Theorem 2.17] to mean that we refer to Theorem 2.17 in van der Hofstad (2017).

Organisation of this chapter

This chapter is organised as follows. In Section 1.1, we discuss real-world networks and the inspiration that they provide. In Section 1.2, we then discuss how graph sequences, where the size of the involved graphs tends to infinity, aim at describing large complex networks. In Section 1.3, we recall the definition of several random


graph models, as introduced in Volume 1. In Section 1.5, we recall some of the standard notation used in this pair of books. We close this chapter with notes and discussion in Section 1.6, and with exercises in Section 1.7.

1.1 Motivation: Real-world networks

In the past two decades, an enormous research effort has been performed on modeling of various real-world phenomena using networks.

Networks arise in various applications, from the connections between friends in friendship networks, the connectivity of neurons in the brain, to the relations between companies and countries in economics and the hyperlinks between web pages in the World-Wide Web. The advent of the computer era has made many network data sets available, and around 1999-2000, various groups started to investigate network data from an empirical perspective. See Barabasi (2002) and Watts (2003) for expository accounts of the discovery of network properties by Barabasi, Watts and co-authors. Newman et al. (2006) bundle some of the original papers detailing the empirical findings of real-world networks and the network models invented for them. The introductory book by Newman (2010) lists many of the empirical properties of, and scientific methods for, networks. See also [Volume 1, Chapter 1] for many examples of real-world networks and the empirical findings for them. Here we just give some basics.

Graphs

A graph G = (V, E) consists of a collection V of vertices, also called the vertex set, and a collection of edges E, often called the edge set. The vertices correspond to the objects that we model, the edges indicate some relation between pairs of these objects. In our settings, graphs are usually undirected. Thus, an edge is an unordered pair {u, v} ∈ E indicating that u and v are directly connected. When G is undirected, if u is directly connected to v, then also v is directly connected to u. Thus, an edge can be seen as a pair of vertices. When dealing with social networks, the vertices represent the individuals in the population, while the edges represent the friendships among them. We mainly deal with finite graphs, and then, for simplicity, we take V = [n] := {1, . . . , n}. The degree du of a vertex u is equal to the number of edges containing u, i.e.,

du = #{v ∈ V : {u, v} ∈ E}. (1.1.1)

Often, we deal with the degree of a random vertex in G. Let o ∈ [n] be a vertex chosen uniformly at random in [n]; then the typical degree is the random variable Dn given by

Dn = do. (1.1.2)

It is not hard to see that the probability mass function of Dn is given by

P(Dn = k) = (1/n) ∑_{i∈[n]} 1{di = k}. (1.1.3)


Exercise 1.1 asks you to prove (1.1.3).
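For the programmatically inclined reader, (1.1.3) is just a relabelled degree count. The following sketch (plain Python; the function name and the tiny example edge list are ours, purely for illustration) computes the probability mass function of Dn from an edge list:

```python
from collections import Counter

def degree_pmf(n, edges):
    """P(D_n = k) = (1/n) * #{i in [n] : d_i = k}, as in (1.1.3)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Isolated vertices are counted with degree 0.
    counts = Counter(degree[i] for i in range(1, n + 1))
    return {k: c / n for k, c in counts.items()}

# Tiny illustration: the path 1-2-3-4 on vertex set [4].
pmf = degree_pmf(4, [(1, 2), (2, 3), (3, 4)])
# Vertices 1 and 4 have degree 1, vertices 2 and 3 have degree 2.
```

By construction, the values of this mass function sum to one, mirroring (1.1.6) below.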

We next discuss some of the common features that many real-world networks turn out to have, starting with the high variability of the degree distribution:

Scale-free phenomenon

The first, maybe quite surprising, fundamental property of many real-world networks is that the number of vertices with degree at least k decays slowly for large k. This implies that degrees are highly variable, and that, even though the average degree is not so large, there exist vertices with extremely high degree. Often, the tail of the empirical degree distribution seems to fall off as an inverse power of k. This is called a 'power-law degree sequence', and the resulting graphs often go under the name 'scale-free graphs'. It is visualized for the AS graph in Figure 1.1, where the degree distribution of the AS graph is plotted on a log-log scale. Thus, we see a plot of log k ↦ log n_k, where n_k is the number of vertices with degree k. When n_k is proportional to an inverse power of k, i.e., when, for some normalizing constant c_n and some exponent τ,

n_k ≈ c_n k^{−τ}, (1.1.4)

then

log n_k ≈ log c_n − τ log k, (1.1.5)

so that the plot of log k ↦ log n_k is close to a straight line. This is the reason why degree sequences in networks are often depicted in a log-log fashion, rather than in the more customary form of k ↦ n_k. Here, and in the remainder of this section, we write ≈ to denote an uncontrolled approximation. The power-law exponent τ can be estimated by the slope of the line in the log-log plot. Naturally, we must have that

∑_k n_k = n < ∞, (1.1.6)

so that it is reasonable to assume that τ > 1.

Let us define the degree distribution by p_k^{(n)} = n_k/n, so that p_k^{(n)} equals the probability that a uniformly chosen vertex has degree k. Then, (1.1.4) can be rephrased as

p_k^{(n)} ≈ c k^{−τ}, (1.1.7)

where again ≈ denotes an uncontrolled approximation.
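The slope estimate for τ described above amounts to a least-squares fit on the log-log scale. A minimal sketch (plain Python; the helper name and the synthetic, exact power-law counts are ours, for illustration only — on real network data this naive estimate is far more delicate):

```python
import math

def loglog_slope(ks, nks):
    """Least-squares slope of log(n_k) versus log(k); for n_k = c * k^(-tau)
    the slope equals -tau."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(v) for v in nks]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Synthetic, exact power-law counts n_k = 1000 * k^(-2.5).
ks = list(range(1, 101))
nks = [1000.0 * k ** (-2.5) for k in ks]
tau_hat = -loglog_slope(ks, nks)
```

On exact power-law input the fit recovers τ; on empirical data one would first truncate the noisy tail of the plot.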

Vertices with extremely high degrees go under various names, indicating their importance in the field. They are often called hubs, as the hubs in airport networks. Another name for them is super-spreaders, indicating the importance of the high-degree vertices in spreading information, or diseases. The hubs quantify the amount of inhomogeneity in the real-world networks, and a large part


of these notes is centered around rigorously establishing the effect that the high-degree vertices have on various properties of the graphs involved, as well as on the behavior of stochastic processes on them.

For the Internet, log-log plots of degree sequences first appeared in a paper by the Faloutsos brothers (1999) (see Figure 1.1 for the degree sequence in the Autonomous Systems graph). Here the power-law exponent is estimated as τ ≈ 2.15–2.20. Figure 1.2 displays the degree distribution in the Internet Movie Database (IMDb), in which the vertices are actors and two actors are connected when they have played together in a movie. Figure 1.3 displays the degree sequence for both the in- as well as the out-degrees in various World-Wide Web databases.

TO DO 1.1: Refer to recent discussion on power-laws in real-world networks.

Figure 1.1 Log-log plot of the probability mass function of the degree sequence of Autonomous Systems (AS) in April 2014, from Krioukov et al. (2012) (data courtesy of Dmitri Krioukov).

After the discussion of degrees in graphs, we continue with graph distances.

Small-world phenomenon

The second fundamental network property observed in many real-world networks is the fact that typical distances between vertices are small. This is called the 'small-world' phenomenon (see e.g. the book by Watts (1999)). In particular, such networks are highly connected: their largest connected component contains a significant proportion of the vertices. Many networks, such as the Internet, even consist of one connected component, since otherwise e-mail messages could not be delivered. For example, in the Internet, IP-packets cannot use more than a threshold of physical links, and if distances in the Internet were larger than this threshold, then e-mail service would simply break down. Thus, the graph of the Internet has evolved in such a way that typical distances are relatively small,


Figure 1.2 Log-log plot of the degree sequence in the Internet Movie Database in 2007.

Figure 1.3 The probability mass function of the in- and out-degree sequences in the Berkeley-Stanford and Google competition graph data sets of the WWW in Leskovec et al. (2009). (a) in-degree; (b) out-degree.

even though the Internet itself is rather large. For example, as seen in Figure 1.4(a), the number of Autonomous Systems (AS) traversed by an e-mail data set, sometimes referred to as the AS-count, is typically at most 7. In Figure 1.4(b),


Figure 1.4 (a) Proportion of AS traversed in hopcount data. (b) Internet hopcount data. Courtesy of Hongsuda Tangmunarunkit.

the proportion of routers traversed by an e-mail message between two uniformly chosen routers, referred to as the hopcount, is shown. It shows that the number of routers traversed is at most 27, while the distribution resembles a Poisson probability mass function. Figure 1.5 shows typical distances in the IMDb, where distances are quite small despite the fact that the network contains more than one million vertices.

We can imagine that the small-world nature of real-world networks is significant. Indeed, in small worlds, news can spread quickly, as relatively few people are needed to spread it between two typical individuals. This is quite helpful in the Internet, where e-mail messages hop along the edges of the network. At the other side of the spectrum, it also implies that infectious diseases can spread quite fast, as few infections carry it to large parts of a population. This implies that diseases have a larger potential of becoming pandemic, and the fact that human society becomes a 'smaller world' due to the more extensive traveling of virtually everyone is a continuous threat to health care workers throughout the population.

Let us continue by formally introducing the graph distances, as displayed in Figures 1.4–1.5. For u, v ∈ [n] and a graph G = ([n], E), we let the graph distance distG(u, v) between u and v be equal to the minimal number of edges in a path linking u and v. When u and v are not in the same connected component, we set distG(u, v) = ∞. We are interested in settings where G has a high amount of connectivity, so that many pairs of vertices are connected to one another by short paths. In order to describe how large distances between vertices typically are, we draw o1 and o2 uniformly at random from [n], and we investigate the random variable

distG(o1, o2). (1.1.8)

The quantity in (1.1.8) is a random variable even for deterministic graphs, due to the two uniformly at random chosen vertices o1, o2 ∈ [n].

Figure 1.5 Typical distances in the Internet Movie Database in 2003.

Figures 1.4–1.5 display the probability mass function of this random variable for some real-world networks. Often, we will consider distG(o1, o2) conditionally on distG(o1, o2) < ∞. This means that we consider the typical number of edges between a uniformly chosen pair of connected vertices. As a result, distG(o1, o2) is sometimes referred to as the typical distance.

The nice property of distG(o1, o2) is that its distribution tells us something about all possible distances in the graph. An alternative and frequently used measure of distances in a graph is the diameter diam(G), defined as

diam(G) = max_{u,v∈[n]} distG(u, v). (1.1.9)

However, the diameter has several disadvantages. First, in many instances, the diameter is algorithmically more difficult to compute than the typical distances (since one has to measure the distances between all pairs of vertices and maximize over them). Second, it is a number instead of the distribution of a random variable, and therefore contains far less information than the distribution of distG(o1, o2). Finally, the diameter is highly sensitive to small changes of the graph. For example, adding a string of connected vertices to a graph may change the diameter dramatically, while it hardly influences the typical distances.
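Both the typical distance distG(o1, o2) and the diameter in (1.1.9) can be computed by breadth-first search. A minimal sketch (plain Python; the adjacency-dict format and the 5-cycle example are our own illustrative choices):

```python
from collections import deque

def bfs_distances(adj, source):
    """Graph distances from `source` in an undirected graph given as an
    adjacency dict; vertices not reached are absent (distance infinity)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Illustration on the 5-cycle with vertex set [5].
adj = {i: [(i % 5) + 1, ((i - 2) % 5) + 1] for i in range(1, 6)}
dist_from_1 = bfs_distances(adj, 1)
# The diameter maximizes over all sources, as in (1.1.9).
diameter = max(max(bfs_distances(adj, s).values()) for s in adj)
```

The nested maximization for the diameter illustrates its algorithmic cost: one BFS per vertex, versus a single BFS for one typical distance.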

1.2 Random graphs and real-world networks

In this section, we discuss how random graph sequences can be used to model real-world networks. We start by discussing graph sequences:

Graph sequences

Motivated by the previous section, in which empirical evidence was discussed that many real-world networks are scale free and small worlds, we set about the question of how to model them. Since many networks are quite large, mathematically, we model real-world networks by graph sequences (Gn)n≥1, where Gn has size n and we take the limit n → ∞. Since most real-world networks are such that the average degree remains bounded, we will focus on the sparse regime. In the sparse regime, it is assumed that

lim sup_{n→∞} E[Dn] = lim sup_{n→∞} (1/n) ∑_{i∈[n]} di < ∞. (1.2.1)

Furthermore, we aim to study graphs that are asymptotically well behaved. For example, we will often either assume, or prove, that the typical degree distribution converges, i.e., there exists a limiting degree random variable D such that

Dn →d D, (1.2.2)

where →d denotes weak convergence of random variables. Also, we will assume that our graphs are small worlds, which is often translated into the asymptotic sense that there exists a constant K such that

lim_{n→∞} P(distG(o1, o2) ≤ K log n) = 1. (1.2.3)

In what follows, we will discuss random graph models that share these two features.

There are many more features that one could take into account when modeling real-world networks. See e.g., [Volume 1, Section 1.4] for a slightly expanded discussion of such features. Other features that many networks share, or rather form a way to distinguish them, are the following:

(a) their degree correlations, measuring the extent to which high-degree vertices tend to be connected to high-degree vertices, or rather to low-degree vertices (and vice versa);

(b) their clustering, measuring the extent to which pairs of neighbors of vertices are neighbors themselves as well;

(c) their community structure, measuring the extent to which the networks have more densely connected subparts.

See e.g., the book by Newman (2010) for an extensive discussion of such features, as well as the algorithmic problems that arise from them.

Random graphs as models for real-world networks

Real-world networks tend to be quite complex and unpredictable. This is quite understandable, since connections often arise rather irregularly. We model such irregular behavior by letting connections arise through a random process, thus leading us to study random graphs. By the previous discussion, our graphs will be large and their size will tend to infinity. In such a setting, we can either model the graphs by fixing their size to be large, or rather by letting the graphs grow to infinite size in a consistent manner. We refer to these two settings as static and


dynamic random graphs. Both are useful viewpoints. Indeed, a static graph is a model for a snapshot of a network at a fixed time, where we do not know how the connections arose in time. Many network data sets are of this form. A dynamic setting, however, may be more appropriate when we know how the network came to be as it is. In the static setting, we can make model assumptions on the degrees so that they are scale free. In the dynamic setting, we can let the evolution of the graph be such that it gives rise to power-law degree sequences, so that these settings may provide explanations for the frequent occurrence of power-laws in real-world networks.

Most of the random graph models that have been investigated in the (extensive) literature are caricatures of reality, in the sense that one cannot with dry eyes argue that they describe any real-world network quantitatively correctly. However, these random graph models do provide insight into how any of the above features can influence the global behavior of networks, and thus provide for possible explanations of the empirical properties of real-world networks that are observed. Also, random graph models can be used as null models, where certain aspects of real-world networks are taken into account, while others are not. This gives a qualitative way of investigating the importance of such empirical features in the real world. Often, real-world networks are compared to uniform random graphs with certain specified properties, such as their number of edges or even their degree sequence. We will come back below to how to generate random graphs uniformly at random from the collection of all graphs with these properties.

In the next section, we describe four models of random graphs, three of which are static and one of which is dynamic.

1.3 Random graph models

We start with the most basic and simple random graph model, which has proved to be a source of tremendous inspiration, both for its mathematical beauty, as well as providing a starting point for the analysis of random graphs.

1.3.1 Erdos-Renyi random graph

The Erdos-Renyi random graph is the simplest possible random graph. In it, we make every possible edge between a collection of n vertices open or closed with equal probability. Thus, the Erdos-Renyi random graph has vertex set [n] = {1, . . . , n}, and, denoting the edge between vertices s, t ∈ [n] by st, the edge st is occupied or present with probability p, and vacant or absent otherwise, independently of all the other edges. The parameter p is called the edge probability. The above random graph is denoted by ERn(p).

The model is named after Erdos and Renyi, since they have made profound contributions to the study of this model. See, in particular, Erdos and Renyi (1959, 1960, 1961a,b), where Erdos and Renyi investigate a related model in which a collection of m edges is chosen uniformly at random from the collection of n(n − 1)/2 possible edges. The model just defined was, though, first introduced by Gilbert (1959), and was already investigated heuristically by Solomonoff and Rapoport (1951). Informally, when m = p n(n − 1)/2, the two models behave very similarly. We remark in more detail on the relation between these two models at the end of this section. Exercise 1.2 investigates the uniform nature of ERn(p) with p = 1/2. Alternatively speaking, the null model where we take no properties of the network into account is ERn(p) with p = 1/2. This model has expected degree (n − 1)/2, which is quite large. As a result, this model is not sparse at all. Thus, we next make this model sparser by making p smaller.

Since each edge is occupied with probability p, we obtain that

P(Dn = k) = \binom{n−1}{k} p^k (1 − p)^{n−1−k} = P(Bin(n − 1, p) = k), (1.3.1)

where Bin(m, p) is a binomial random variable with m trials and success probability p. Since

E[Dn] = (n − 1)p, (1.3.2)

for this model to be sparse, we need that p becomes small with n. Thus, we take

p = λ/n, (1.3.3)

and study the graph as λ is fixed while n→∞. In this regime, we know that

Dn →d D, (1.3.4)

where D ∼ Poi(λ). It turns out that this result can be strengthened to the statement that the proportion of vertices with degree k also converges to the probability mass function of a Poisson random variable (see [Volume 1, Section 5.4]), i.e., for every k ≥ 0,

P_k^{(n)} = (1/n) ∑_{i∈[n]} 1{di = k} →P pk ≡ e^{−λ} λ^k / k!. (1.3.5)
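The Poisson limit in (1.3.5) already shows up clearly in simulation. A sketch (plain Python; the choices of n, λ and the random seed are arbitrary and ours):

```python
import math
import random

def er_degree_pmf(n, lam, rng):
    """Empirical degree pmf of one sample of ER_n(lam/n): each edge is
    present independently with probability p = lam/n."""
    p = lam / n
    deg = [0] * (n + 1)  # deg[0] unused; vertices are 1..n
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if rng.random() < p:
                deg[i] += 1
                deg[j] += 1
    pmf = {}
    for i in range(1, n + 1):
        pmf[deg[i]] = pmf.get(deg[i], 0.0) + 1.0 / n
    return pmf

n, lam = 1000, 2.0
emp_pmf = er_degree_pmf(n, lam, random.Random(0))
poi_pmf = {k: math.exp(-lam) * lam ** k / math.factorial(k) for k in range(10)}
# For each fixed k, emp_pmf[k] approaches poi_pmf[k] as n grows.
```

Comparing emp_pmf and poi_pmf entry by entry gives a numerical impression of the convergence in (1.3.5).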

It is well known that the Poisson distribution has very thin tails, even thinner than any exponential, as you are requested to prove in Exercise 1.3. We conclude that the Erdos-Renyi random graph is not a good model for real-world networks with their highly-variable degree distributions. In the next section, we discuss inhomogeneous extensions of the Erdos-Renyi random graph, which can have highly-variable degrees.

Before doing so, let us make some useful final remarks about the Erdos-Renyi random graph. Firstly, we can also view it as percolation on the complete graph. Percolation is a paradigmatic model in statistical physics describing random failures in networks (see Grimmett (1999) for an extensive overview of percolation theory focussing on Zd). Secondly, the model described here as the Erdos-Renyi random graph was actually not invented by Erdos and Renyi, but


rather by Gilbert (1959). Erdos and Renyi (1959, 1960, 1961b), instead, considered the closely related combinatorial setting where a uniform sample of m edges is added to the empty graph. In the latter case, the proportion of present edges is 2m/(n(n − 1)) ≈ 2m/n², so we should think of m ≈ λn/2 for a fair comparison. Note that when we condition the total number of edges to be equal to m, the law of the Erdos-Renyi random graph is equal to that of the model where a collection of m uniformly chosen edges is added, explaining the close relation between the two models. Due to the concentration of the total number of edges, we can indeed roughly exchange the binomial model with p = λ/n with the combinatorial model with m = λn/2. The combinatorial model has the nice feature that it produces a uniform graph from the collection of all graphs with m edges, and thus could serve as a null model for a real-world network in which only the number of edges is fixed.
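The combinatorial model can be sampled directly: choose m of the n(n − 1)/2 possible edges uniformly at random. A sketch (plain Python; the function name, the parameters and the seed are ours, for illustration):

```python
import random

def sample_gnm(n, m, rng):
    """Uniform sample from all graphs on [n] with exactly m edges:
    choose m distinct pairs uniformly among the n*(n-1)/2 possible edges."""
    all_edges = [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]
    return rng.sample(all_edges, m)

g = sample_gnm(20, 30, random.Random(0))
# Exactly m distinct edges, each an unordered pair of distinct vertices.
```

Every graph with exactly m edges is equally likely, which is what makes this a null model when only the edge count is fixed.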

1.3.2 Inhomogeneous random graphs

In inhomogeneous random graphs, we keep the independence of the edges, but make the edge probabilities different for different edges. A general format for such models is in the seminal work of Bollobas et al. (2007). We will discuss such general inhomogeneous random graphs in Chapter 3 below. We start with one key example, that has attracted the most attention in the literature so far, and is also discussed in great detail in [Volume 1, Chapter 6].

Rank-1 inhomogeneous random graphs

The simplest inhomogeneous random graph models are sometimes referred to as rank-1 models, since the edge probabilities are (close to) products of vertex weights. This means that the expected number of edges between vertices, when viewed as a matrix, is (close to) a rank-1 matrix. We start by discussing one such model, the so-called generalized random graph, which was first introduced by Britton et al. (2006).

In the generalized random graph model, the probability of the edge between vertices i and j, for i ≠ j, is equal to

pij = p^{(GRG)}_{ij} = wi wj / (ℓn + wi wj), (1.3.6)

where w = (wi)i∈[n] are the vertex weights, and ℓn is the total weight of all vertices, given by

ℓn = ∑_{i∈[n]} wi. (1.3.7)

We denote the resulting graph by GRGn(w). In many cases, the vertex weights actually depend on n, and it would be more appropriate, but also more cumbersome, to write the weights as w^{(n)} = (w^{(n)}_i)_{i∈[n]}. To keep notation simple, we refrain from making the dependence on n explicit. A special case of the generalized random graph is when we take wi ≡ nλ/(n − λ), in which case pij = λ/n for all i, j ∈ [n], so that we retrieve the Erdos-Renyi random graph ERn(λ/n).

Page 36: Random graphs and complex networksrhofstad/NotesRGCNII.pdf1.2 Random graphs and real-world networks9 1.3 Random graph models11 1.3.1 Erd}os-R enyi random graph 11 1.3.2 Inhomogeneous

14 Introduction and preliminaries

The generalized random graph GRGn(w) is close to many other inhomogeneous random graph models, such as the random graph with given prescribed degrees or Chung-Lu model, where instead

pij = p^{(CL)}_{ij} = min(wi wj/ℓn, 1), (1.3.8)

and which has been studied intensively by Chung and Lu (2002a,b, 2003, 2006a,b). A further adaptation is the so-called Poissonian random graph or Norros-Reittu model (introduced by Norros and Reittu (2006)), for which

pij = p^{(NR)}_{ij} = 1 − exp(−wi wj/ℓn). (1.3.9)

See Janson (2010a) or [Volume 1, Sections 6.7 and 6.8] for conditions under which these random graphs are asymptotically equivalent, meaning that all events have equal asymptotic probabilities.
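The closeness of (1.3.6), (1.3.8) and (1.3.9) is easy to see numerically: with x = wi wj/ℓn small, all three probabilities equal x + O(x²). A quick comparison (plain Python; the weight values are arbitrary illustrative choices of ours):

```python
import math

def p_grg(wi, wj, ln):
    return wi * wj / (ln + wi * wj)        # (1.3.6), generalized random graph

def p_cl(wi, wj, ln):
    return min(wi * wj / ln, 1.0)          # (1.3.8), Chung-Lu

def p_nr(wi, wj, ln):
    return 1.0 - math.exp(-wi * wj / ln)   # (1.3.9), Norros-Reittu

# With x = wi*wj/ln small, all three probabilities are x + O(x^2).
wi, wj, ln = 2.0, 3.0, 10_000.0
ps = [p_grg(wi, wj, ln), p_cl(wi, wj, ln), p_nr(wi, wj, ln)]
spread = max(ps) - min(ps)  # of order (wi*wj/ln)^2
```

Since x/(1 + x) ≤ 1 − e^{−x} ≤ x, the Chung-Lu probability is the largest of the three, and the spread is of second order in x.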

Naturally, the topology of the generalized random graph depends sensitively upon the choice of the vertex weights w = (wi)i∈[n]. These vertex weights can be rather general, and we investigate both settings where the weights are deterministic, as well as where they are random. In order to describe the empirical proportions of the weights, we define their empirical distribution function to be

Fn(x) = (1/n) ∑_{i∈[n]} 1{wi ≤ x}, x ≥ 0. (1.3.10)

We can interpret Fn as the distribution of the weight of a uniformly chosen vertex in [n] (see Exercise 1.4). We denote the weight of a uniformly chosen vertex o in [n] by Wn = wo, so that, by Exercise 1.4, Wn has distribution function Fn.

The degree distribution can only converge when the vertex weights are sufficiently regular. We often assume that the vertex weights satisfy the following regularity conditions, which turn out to imply convergence of the degree distribution in the generalized random graph:

Condition 1.1 (Regularity conditions for vertex weights) There exists a distribution function F such that, as n → ∞, the following conditions hold:

(a) Weak convergence of vertex weight. As n → ∞,

Wn →d W, (1.3.11)

where Wn and W have distribution functions Fn and F, respectively. Equivalently, for any x for which x ↦ F(x) is continuous,

lim_{n→∞} Fn(x) = F(x). (1.3.12)

(b) Convergence of average vertex weight. As n → ∞,

E[Wn] → E[W], (1.3.13)

where Wn and W have distribution functions Fn and F, respectively. Further, we assume that E[W] > 0.

(c) Convergence of second moment vertex weight. As n → ∞,

E[Wn²] → E[W²]. (1.3.14)

Condition 1.1(a) guarantees that the weight of a 'typical' vertex is close to a random variable W that is independent of n. Condition 1.1(b) implies that the average weight of the vertices in GRGn(w) converges to the expectation of the limiting weight variable. In turn, this implies that the average degree in GRGn(w) converges to the expectation of the limit random variable of the vertex weights. Condition 1.1(c) ensures the convergence of the second moment of the weights to the second moment of the limiting weight variable.

Remark 1.2 (Regularity for random weights) Sometimes we will be interested in cases where the weights of the vertices are random themselves. For example, this arises when the weights w = (wi)i∈[n] are realizations of i.i.d. random variables. When the weights are random variables themselves, the function Fn is also a random distribution function. Indeed, in this case Fn is the empirical distribution function of the random weights (wi)i∈[n]. We stress that E[Wn] is then to be interpreted as (1/n) ∑_{i∈[n]} wi, which is itself random. Therefore, in Condition 1.1, we require random variables to converge, and there are several notions of convergence that may be used. As it turns out, the most convenient notion of convergence is convergence in probability.

Let us now discuss some canonical examples of weight distributions that satisfy the Regularity Condition 1.1.

Weights moderated by a distribution function

Let $F$ be a distribution function for which $F(0) = 0$ and fix
$$w_i = [1 - F]^{-1}(i/n), \qquad (1.3.15)$$
where $[1-F]^{-1}$ is the generalized inverse function of $1-F$ defined, for $u \in (0,1)$, by (recall [Volume 1, (6.2.14)–(6.2.15)])
$$[1 - F]^{-1}(u) = \inf\{x \colon [1-F](x) \le u\}. \qquad (1.3.16)$$
For the choice in (1.3.15), we can explicitly compute $F_n$ as (see [Volume 1, (6.2.17)])
$$F_n(x) = \frac{1}{n}\big(\lfloor n F(x) \rfloor + 1\big) \wedge 1. \qquad (1.3.17)$$

It is not hard to see that Condition 1.1(a) holds for $(w_i)_{i\in[n]}$ as in (1.3.15), while Condition 1.1(b) holds when $\mathbb{E}[W] < \infty$ and Condition 1.1(c) when $\mathbb{E}[W^2] < \infty$, as can be concluded from Exercise 1.6.
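For a concrete instance (an illustration of my own, not from the text), take the Pareto-type choice $F(x) = 1 - x^{-(\tau-1)}$ for $x \ge 1$, whose generalized inverse is $[1-F]^{-1}(u) = u^{-1/(\tau-1)}$, so that (1.3.15) becomes $w_i = (i/n)^{-1/(\tau-1)}$:

```python
# Sketch: weights moderated by a distribution function, as in (1.3.15).
# Assumes the Pareto-type choice F(x) = 1 - x^(-(tau-1)) for x >= 1, whose
# generalized inverse is [1-F]^{-1}(u) = u^(-1/(tau-1)).

def weights_from_distribution(n, tau):
    """Return w_i = [1-F]^{-1}(i/n) for i = 1..n, for the Pareto-type F."""
    return [(i / n) ** (-1.0 / (tau - 1)) for i in range(1, n + 1)]

n, tau = 10_000, 2.5
w = weights_from_distribution(n, tau)

# The empirical distribution F_n should approximate F; check at x = 2:
Fn_at_2 = sum(1 for wi in w if wi <= 2) / n
F_at_2 = 1 - 2 ** (-(tau - 1))
print(abs(Fn_at_2 - F_at_2) < 0.01)  # True for n this large
```

Here $w_n = 1$ and the largest weight is $w_1 = n^{1/(\tau-1)}$, matching the heavy upper tail of $F$.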


Independent and identically distributed weights

The generalized random graph can be studied both with deterministic weights and with independent and identically distributed (i.i.d.) weights. Since we often deal with ratios of the form $w_i w_j / \sum_{k\in[n]} w_k$, we assume that $\mathbb{P}(w = 0) = 0$ to avoid situations where all weights are zero.

Both models, i.e., with weights $(w_i)_{i\in[n]}$ as in (1.3.15) and with i.i.d. weights $(w_i)_{i\in[n]}$, have their own merits. The great advantage of i.i.d. weights is that the vertices in the resulting graph are, in distribution, the same. More precisely, the vertices are completely exchangeable, as in the Erdős–Rényi random graph $\mathrm{ER}_n(p)$. Unfortunately, when we take the weights to be i.i.d., the edges in the resulting graph are no longer independent (despite the fact that they are conditionally independent given the weights). In the sequel, we focus on the setting where the weights are prescribed. When the weights are deterministic, this changes nothing; when the weights are i.i.d., this means that we work conditionally on the weights.

Degrees in generalized random graphs

We write $d_i$ for the degree of vertex $i$ in $\mathrm{GRG}_n(w)$. Thus, $d_i$ is given by
$$d_i = \sum_{j\in[n]} X_{ij}, \qquad (1.3.18)$$
where $X_{ij}$ is the indicator that the edge $ij$ is occupied. By convention, we set $X_{ij} = X_{ji}$. The random variables $(X_{ij})_{1\le i<j\le n}$ are independent Bernoulli variables with $\mathbb{P}(X_{ij} = 1) = p_{ij}$ as defined in (1.3.6).

For $k \ge 0$, we let
$$P^{(n)}_k = \frac{1}{n} \sum_{i\in[n]} \mathbb{1}_{\{d_i = k\}} \qquad (1.3.19)$$
denote the degree sequence of $\mathrm{GRG}_n(w)$. We denote the probability mass function of a mixed Poisson distribution by $p_k$, i.e., for $k \ge 0$,
$$p_k = \mathbb{E}\Big[e^{-W} \frac{W^k}{k!}\Big], \qquad (1.3.20)$$
where $W$ is a random variable having distribution function $F$ from Condition 1.1. The main result concerning the vertex degrees, which is [Volume 1, Theorem 6.10], is as follows:

Theorem 1.3 (Degree sequence of $\mathrm{GRG}_n(w)$) Assume that Conditions 1.1(a)-(b) hold. Then, for every $\varepsilon > 0$,
$$\mathbb{P}\Big(\sum_{k=0}^{\infty} |P^{(n)}_k - p_k| \ge \varepsilon\Big) \to 0, \qquad (1.3.21)$$
where $(p_k)_{k\ge0}$ is given by (1.3.20).


Consequently, with $D_n = d_o$ denoting the degree of a random vertex $o$, we obtain
$$D_n \xrightarrow{d} D, \qquad (1.3.22)$$
where $\mathbb{P}(D = k) = p_k = \mathbb{E}\big[e^{-W} \frac{W^k}{k!}\big]$, as shown in Exercise 1.7.
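As a sanity check on Theorem 1.3 (an illustration of my own, not the book's code), one can simulate $\mathrm{GRG}_n(w)$ with the edge probabilities $p_{ij} = w_iw_j/(\ell_n + w_iw_j)$ from (1.3.6) and compare the empirical degree distribution with the mixed-Poisson limit (1.3.20); for constant weights $w_i \equiv \lambda$, the limit is a plain Poisson($\lambda$):

```python
import math
import random

random.seed(1)

# Sketch: simulate GRG_n(w) with p_ij = w_i w_j / (l_n + w_i w_j) as in
# (1.3.6). With constant weights w_i = lam, the mixed-Poisson limit (1.3.20)
# reduces to an ordinary Poisson(lam) degree distribution.

def grg_degrees(w):
    n, ln = len(w), sum(w)
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if random.random() < w[i] * w[j] / (ln + w[i] * w[j]):
                deg[i] += 1
                deg[j] += 1
    return deg

n, lam = 1000, 2.0
deg = grg_degrees([lam] * n)
p0_emp = deg.count(0) / n      # empirical P(D = 0)
p0_lim = math.exp(-lam)        # limiting p_0 = E[e^{-W}] for W = lam
print(abs(p0_emp - p0_lim) < 0.05)  # True for n this large
```

The average degree of the sample is likewise close to $\mathbb{E}[W] = \lambda$, in line with Condition 1.1(b).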

Recall from Section 9.1.3 that we are interested in scale-free random graphs, i.e., random graphs for which the degree distribution obeys a power law. We see from Theorem 1.3 that this is true precisely when $D$ obeys a power law. This, in turn, occurs precisely when $W$ obeys a power law, i.e., when, for $w$ large,
$$\mathbb{P}(W > w) = \frac{c}{w^{\tau-1}}(1 + o(1)), \qquad (1.3.23)$$
and then also, for $w$ large,
$$\mathbb{P}(D > w) = \frac{c}{w^{\tau-1}}(1 + o(1)). \qquad (1.3.24)$$
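The step from (1.3.23) to (1.3.24) can be illustrated numerically (a sketch of my own, taking $c = 1$): sample $W$ from a pure Pareto tail, draw $D$ as Poisson with random parameter $W$, and compare the two empirical tails.

```python
import random

random.seed(7)

# Sketch: a mixed Poisson with power-law mixing weight W inherits the
# power-law tail of W, cf. (1.3.23)-(1.3.24), here with c = 1.
tau = 2.5
N = 200_000

def sample_w():
    # Inverse-transform sample with P(W > w) = w^{-(tau-1)} for w >= 1
    return random.random() ** (-1.0 / (tau - 1))

def sample_poisson(lam):
    # Poisson sample via unit-rate exponential inter-arrival times on [0, lam]
    s, k = 0.0, 0
    while True:
        s += random.expovariate(1.0)
        if s > lam:
            return k
        k += 1

x = 10
tail_W = tail_D = 0
for _ in range(N):
    w = sample_w()
    tail_W += w > x
    tail_D += sample_poisson(w) > x

# Both empirical tails lie near x^{-(tau-1)} = 10^{-1.5}, up to Poisson
# smoothing of relative order 1/sqrt(x).
print(tail_W / N, tail_D / N)
```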

Generalized random graph conditioned on its degrees

The generalized random graph with its edge probabilities as in (1.3.6) is rather special. Indeed, when we condition on its degree sequence, the graph has a uniform distribution over the set of all graphs with the same degree sequence. For this, note that $\mathrm{GRG}_n(w)$ can be equivalently encoded by $(X_{ij})_{1\le i<j\le n}$, where $X_{ij}$ is the indicator that the edge $ij$ is occupied. Then, $(X_{ij})_{1\le i<j\le n}$ are independent Bernoulli random variables with edge probabilities as in (1.3.6). By convention, let $X_{ii} = 0$ for every $i \in [n]$, and $X_{ji} = X_{ij}$ for $1 \le i < j \le n$. In terms of the variables $X = (X_{ij})_{1\le i<j\le n}$, let $d_i(X) = \sum_{j\in[n]} X_{ij}$ be the degree of vertex $i$. Then, the uniformity is equivalent to the statement that, for each $x = (x_{ij})_{1\le i<j\le n}$ such that $d_i(x) = d_i$ for every $i \in [n]$,
$$\mathbb{P}(X = x \mid d_i(X) = d_i \ \forall i \in [n]) = \frac{1}{\#\{y \colon d_i(y) = d_i \ \forall i \in [n]\}}, \qquad (1.3.25)$$
that is, the distribution is uniform over all graphs with the prescribed degree sequence. This will turn out to be rather convenient, and thus we state it formally here:

Theorem 1.4 (GRG conditioned on degrees has uniform law) The GRG with edge probabilities $(p_{ij})_{1\le i<j\le n}$ given by
$$p_{ij} = \frac{w_i w_j}{\ell_n + w_i w_j}, \qquad (1.3.26)$$
conditioned on $d_i(X) = d_i$ for all $i \in [n]$, is uniform over all graphs with degree sequence $(d_i)_{i\in[n]}$.

Proof This is [Volume 1, Theorem 6.15].

In Chapter 3 below, we discuss a far more general setting of inhomogeneous random graphs. The analysis of general inhomogeneous random graphs is substantially more challenging than the rank-1 case. As explained in more detail in the next chapter, this is because they are no longer locally described by single-type branching processes, but rather by multitype branching processes.


1.3.3 Configuration model

The configuration model is a model in which the degrees of vertices are fixed beforehand. Such a model is more flexible than the generalized random graph. For example, the generalized random graph always has a positive proportion of vertices of degree 0, 1, 2, etc., as easily follows from Theorem 1.3.

Fix an integer $n$ that will denote the number of vertices in the random graph. Consider a sequence of degrees $d = (d_i)_{i\in[n]}$. The aim is to construct an undirected (multi)graph with $n$ vertices, where vertex $j$ has degree $d_j$. Without loss of generality, we assume throughout this chapter that $d_j \ge 1$ for all $j \in [n]$, since when $d_j = 0$, vertex $j$ is isolated and can be removed from the graph. One possible random graph model is then to take the uniform measure over such undirected and simple graphs. Here, we call a multigraph simple when it has no self-loops and no multiple edges between any pair of vertices. However, the set of undirected simple graphs with $n$ vertices where vertex $j$ has degree $d_j$ may be empty. For example, in order for such a graph to exist, we must assume that the total degree
$$\ell_n = \sum_{j\in[n]} d_j \qquad (1.3.27)$$
is even. We wish to construct a simple graph such that $d = (d_i)_{i\in[n]}$ are the degrees of the $n$ vertices. However, even when $\ell_n = \sum_{j\in[n]} d_j$ is even, this is not always possible.

Since it is not always possible to construct a simple graph with a given degree sequence, we instead construct a multigraph, that is, a graph possibly having self-loops and multiple edges between pairs of vertices. One way of obtaining such a multigraph with the given degree sequence is to pair the half-edges attached to the different vertices in a uniform way. Two half-edges together form an edge, thus creating the edges in the graph. Let us explain this in more detail.

To construct the multigraph where vertex $j$ has degree $d_j$ for all $j \in [n]$, we take $n$ separate vertices and, incident to vertex $j$, we place $d_j$ half-edges. Every half-edge needs to be connected to another half-edge to form an edge, and by forming all edges we build the graph. For this, the half-edges are numbered in an arbitrary order from $1$ to $\ell_n$. We start by randomly connecting the first half-edge with one of the $\ell_n - 1$ remaining half-edges. Once paired, two half-edges form a single edge of the multigraph, and the half-edges are removed from the list of half-edges that need to be paired. Hence, a half-edge can be seen as the left or the right half of an edge. We continue the procedure of randomly choosing and pairing the half-edges until all half-edges are connected, and call the resulting graph the configuration model with degree sequence $d$, abbreviated as $\mathrm{CM}_n(d)$.
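The pairing procedure just described can be sketched as follows (a minimal illustration of my own; pairing a uniformly shuffled list of half-edges consecutively produces the same uniform pairing as the sequential procedure in the text):

```python
import random

random.seed(0)

# Sketch of the CM_n(d) pairing: list all half-edges, shuffle them uniformly,
# and pair consecutive positions; a uniform shuffle induces a uniform pairing.

def configuration_model(d):
    assert sum(d) % 2 == 0, "total degree l_n must be even"
    half_edges = [v for v, dv in enumerate(d) for _ in range(dv)]
    random.shuffle(half_edges)
    return [(half_edges[2 * k], half_edges[2 * k + 1])
            for k in range(len(half_edges) // 2)]

d = [3, 2, 2, 1, 2, 2]            # l_n = 12, even
edges = configuration_model(d)    # multigraph: self-loops/multi-edges possible
deg = [0] * len(d)
for u, v in edges:
    deg[u] += 1
    deg[v] += 1                   # a self-loop (u == v) contributes 2
print(deg == d)  # the pairing realizes the prescribed degrees exactly: True
```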

A careful reader may worry about the order in which the half-edges are being paired. In fact, this ordering turns out to be completely irrelevant, since the random pairing of half-edges is completely exchangeable. It can even be done in a random fashion, which will be useful when investigating neighborhoods in the configuration model. See, e.g., [Volume 1, Definition 7.5 and Lemma 7.6] for more details on this exchangeability.

Interestingly, one can compute rather explicitly what the distribution of $\mathrm{CM}_n(d)$ is. To do so, note that $\mathrm{CM}_n(d)$ is characterized by the random vector $(X_{ij})_{1\le i\le j\le n}$, where $X_{ij}$ is the number of edges between vertices $i$ and $j$. Here $X_{ii}$ is the number of self-loops incident to vertex $i$, and
$$d_i = X_{ii} + \sum_{j\in[n]} X_{ij}. \qquad (1.3.28)$$
In terms of this notation, and writing $G = (x_{ij})_{i,j\in[n]}$ to denote a multigraph on the vertices $[n]$,
$$\mathbb{P}(\mathrm{CM}_n(d) = G) = \frac{1}{(\ell_n - 1)!!} \, \frac{\prod_{i\in[n]} d_i!}{\prod_{i\in[n]} 2^{x_{ii}} \prod_{1\le i\le j\le n} x_{ij}!}. \qquad (1.3.29)$$
See, e.g., [Volume 1, Proposition 7.7] for this result. In particular, $\mathbb{P}(\mathrm{CM}_n(d) = G)$ is the same for each simple $G$, where $G$ is simple when $x_{ii} = 0$ for every $i \in [n]$ and $x_{ij} \in \{0,1\}$ for every $1 \le i < j \le n$. Thus, the configuration model conditioned on simplicity is a uniform random graph with the prescribed degree distribution. This is quite relevant, as it gives a convenient way to obtain such a uniform graph, which is a highly non-trivial fact.
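Formula (1.3.29) can be verified by brute-force enumeration on a tiny degree sequence (an illustrative sketch of my own, not from the text): list every pairing of labelled half-edges, tally the resulting multigraphs, and compare frequencies with the formula.

```python
from collections import Counter
from math import factorial, prod

def matchings(items):
    """Yield all perfect matchings of a list of half-edge labels."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for m in matchings(remaining):
            yield [(first, partner)] + m

def double_factorial(n):
    return 1 if n <= 0 else n * double_factorial(n - 2)

d = [2, 2, 1, 1]               # l_n = 6, so there are (l_n - 1)!! = 15 pairings
half = [v for v, dv in enumerate(d) for _ in range(dv)]
counts = Counter(
    tuple(sorted(tuple(sorted(e)) for e in m)) for m in matchings(half)
)
total = double_factorial(sum(d) - 1)

for G, c in counts.items():
    x = Counter(G)             # x[(i, j)] = number of edges between i and j
    formula = prod(factorial(dv) for dv in d) / (
        total
        * prod(2 ** x[(i, i)] for i in range(len(d)))
        * prod(factorial(mult) for mult in x.values())
    )
    assert abs(c / total - formula) < 1e-12

print("formula (1.3.29) matches on all", len(counts), "multigraphs")
```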

Interestingly, the configuration model was invented by Bollobás (1980) to study uniform random regular graphs (see also (Bollobás, 2001, Section 2.4)). Its introduction was inspired by, and generalized the results in, the work of Bender and Canfield (1978). The original work allowed for a careful computation of the number of regular graphs, using a probabilistic argument. This is the probabilistic method at its best, and it also explains the emphasis on the study of the probability for the graph to be simple, as we will see below. The configuration model, as well as uniform random graphs with a prescribed degree sequence, were further studied in greater generality by Molloy and Reed (1995, 1998). This extension is quite relevant to us, as the scale-free nature of many real-world applications encourages us to investigate configuration models with power-law degree sequences.

The uniform nature of the configuration model partly explains its popularity, and it has become one of the most highly studied random graph models. It also implies that, conditioned on simplicity, the configuration model is the null model for a real-world network in which all the degrees are fixed. It thus allows one to distinguish the relevance of the degree inhomogeneity from other features of the network, such as its community structure, clustering, etc.

As for $\mathrm{GRG}_n(w)$, we again impose regularity conditions on the degree sequence $d$. In order to state these assumptions, we introduce some notation. We denote the degree of a uniformly chosen vertex $o$ in $[n]$ by $D_n = d_o$. The random variable $D_n$ has distribution function $F_n$ given by
$$F_n(x) = \frac{1}{n} \sum_{j\in[n]} \mathbb{1}_{\{d_j \le x\}}, \qquad (1.3.30)$$


which is the empirical distribution of the degrees. We assume that the vertex degrees satisfy the following regularity conditions:

Condition 1.5 (Regularity conditions for vertex degrees)

(a) Weak convergence of vertex degrees. There exists a distribution function $F$ such that, as $n \to \infty$,
$$D_n \xrightarrow{d} D, \qquad (1.3.31)$$
where $D_n$ and $D$ have distribution functions $F_n$ and $F$, respectively. Equivalently, for any $x$,
$$\lim_{n\to\infty} F_n(x) = F(x). \qquad (1.3.32)$$
Further, we assume that $F(0) = 0$, i.e., $\mathbb{P}(D \ge 1) = 1$.

(b) Convergence of average vertex degrees. As $n \to \infty$,
$$\mathbb{E}[D_n] \to \mathbb{E}[D], \qquad (1.3.33)$$
where $D_n$ and $D$ have distribution functions $F_n$ and $F$ from part (a), respectively.

(c) Convergence of second moment of vertex degrees. As $n \to \infty$,
$$\mathbb{E}[D_n^2] \to \mathbb{E}[D^2], \qquad (1.3.34)$$
where again $D_n$ and $D$ have distribution functions $F_n$ and $F$ from part (a), respectively.

The possibility of obtaining a non-simple graph is a major disadvantage of the configuration model. There are two ways of dealing with this complication:

(a) Erased configuration model

The first way of dealing with multiple edges is to erase the problems. This means that we replace $\mathrm{CM}_n(d) = (X_{ij})_{1\le i\le j\le n}$ by its erased version $\mathrm{ECM}_n(d) = (X^{(\mathrm{er})}_{ij})_{1\le i\le j\le n}$, where $X^{(\mathrm{er})}_{ii} \equiv 0$, while $X^{(\mathrm{er})}_{ij} = 1$ precisely when $X_{ij} \ge 1$. In words, we remove the self-loops and merge all multiple edges into a single edge. Of course, this changes the precise degree distribution. However, [Volume 1, Theorem 7.10] shows that only a small proportion of the edges is erased, so that the erasing does not change the limiting degree distribution. See [Volume 1, Section 7.3] for more details. Of course, the downside of this approach is that the degrees are changed by the procedure, while we would like to keep the degrees precisely as specified.

Let us describe the degree distribution in the erased configuration model in more detail, to study the effect of the erasure of self-loops and multiple edges. We denote the degrees in the erased configuration model by $D^{(\mathrm{er})} = (D^{(\mathrm{er})}_i)_{i\in[n]}$, so that
$$D^{(\mathrm{er})}_i = d_i - 2s_i - m_i, \qquad (1.3.35)$$
where $(d_i)_{i\in[n]}$ are the degrees in $\mathrm{CM}_n(d)$, $s_i = x_{ii}$ is the number of self-loops of vertex $i$ in $\mathrm{CM}_n(d)$, and
$$m_i = \sum_{j \ne i} (x_{ij} - 1)\mathbb{1}_{\{x_{ij} \ge 2\}} \qquad (1.3.36)$$
is the number of multiple edges removed from $i$. Denote the empirical degree sequence $(p^{(n)}_k)_{k\ge1}$ in $\mathrm{CM}_n(d)$ by
$$p^{(n)}_k = \mathbb{P}(D_n = k) = \frac{1}{n} \sum_{i\in[n]} \mathbb{1}_{\{d_i = k\}}, \qquad (1.3.37)$$
and denote the related degree sequence in the erased configuration model $(P^{(\mathrm{er})}_k)_{k\ge1}$ by
$$P^{(\mathrm{er})}_k = \frac{1}{n} \sum_{i\in[n]} \mathbb{1}_{\{D^{(\mathrm{er})}_i = k\}}. \qquad (1.3.38)$$
From the notation it is clear that $(p^{(n)}_k)_{k\ge1}$ is a deterministic sequence when $d = (d_i)_{i\in[n]}$ is deterministic, while $(P^{(\mathrm{er})}_k)_{k\ge1}$ is a random sequence, since the erased degrees $(D^{(\mathrm{er})}_i)_{i\in[n]}$ form a random vector even when $d = (d_i)_{i\in[n]}$ is deterministic.

Now we are ready to state the main result concerning the degree sequence of the erased configuration model:

Theorem 1.6 (Degree sequence of erased configuration model with fixed degrees) For fixed degrees $d$ satisfying Conditions 1.5(a)-(b), the degree sequence of the erased configuration model $(P^{(\mathrm{er})}_k)_{k\ge1}$ converges in probability to $(p_k)_{k\ge1}$. More precisely, for every $\varepsilon > 0$,
$$\mathbb{P}\Big(\sum_{k=1}^{\infty} |P^{(\mathrm{er})}_k - p_k| \ge \varepsilon\Big) \to 0, \qquad (1.3.39)$$
where $p_k = \mathbb{P}(D = k)$ as in Condition 1.5(a).

Theorem 1.6 indeed shows that most of the edges are kept in the erasure procedure; see Exercise 1.9.
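The erasure procedure can be sketched as follows (an illustrative implementation of my own, with regular degrees chosen so that erasures are rare):

```python
import random

random.seed(3)

# Sketch of the erased configuration model ECM_n(d): pair the half-edges
# uniformly, then remove self-loops and merge multiple edges, as in (1.3.35).

def erased_configuration_model(d):
    half = [v for v, dv in enumerate(d) for _ in range(dv)]
    random.shuffle(half)
    simple_edges = set()
    for k in range(len(half) // 2):
        u, v = half[2 * k], half[2 * k + 1]
        if u != v:                                    # drop self-loops
            simple_edges.add((min(u, v), max(u, v)))  # merge multiple edges
    return simple_edges

n = 2000
d = [2] * n                                  # 2-regular degrees, l_n = 4000
edges = erased_configuration_model(d)
erased = (sum(d) - 2 * len(edges)) / sum(d)  # fraction of half-edges erased
print(erased < 0.02)  # only a vanishing proportion of edges is erased: True
```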

(b) Configuration model conditioned on simplicity

The second solution to the multigraph problem of the configuration model is to throw away the result when it is not simple, and to try again. Therefore, this construction is sometimes called the repeated configuration model (see Britton et al. (2006)). It turns out that, when Conditions 1.5(a)-(c) hold, then (see [Volume 1, Theorem 7.12])
$$\lim_{n\to\infty} \mathbb{P}(\mathrm{CM}_n(d) \text{ is a simple graph}) = e^{-\nu/2 - \nu^2/4}, \qquad (1.3.40)$$
where
$$\nu = \frac{\mathbb{E}[D(D-1)]}{\mathbb{E}[D]} \qquad (1.3.41)$$
is the expected forward degree. Thus, this is a realistic option when $\mathbb{E}[D^2] < \infty$. Unfortunately, it is not an option when the degrees obey an asymptotic power law with $\tau \in (2,3)$, since then $\mathbb{E}[D^2] = \infty$. Note that, by (1.3.29), $\mathrm{CM}_n(d)$ conditioned on simplicity is a uniform random graph with the prescribed degree sequence. We denote this random graph by $\mathrm{UG}_n(d)$. We will return to the difficulty of generating simple graphs with infinite-variance degrees later in this chapter.
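The limit (1.3.40) can be checked by Monte Carlo (a sketch of my own; 3-regular degrees give $\nu = \mathbb{E}[D(D-1)]/\mathbb{E}[D] = 2$):

```python
import math
import random

random.seed(11)

# Monte Carlo sketch of (1.3.40): for 3-regular degrees nu = 2, so
# P(CM_n(d) is simple) -> exp(-nu/2 - nu^2/4) = exp(-2) ≈ 0.135.

def cm_is_simple(d):
    half = [v for v, dv in enumerate(d) for _ in range(dv)]
    random.shuffle(half)
    seen = set()
    for k in range(len(half) // 2):
        u, v = half[2 * k], half[2 * k + 1]
        if u == v or (min(u, v), max(u, v)) in seen:
            return False                  # self-loop or multiple edge
        seen.add((min(u, v), max(u, v)))
    return True

n, runs = 300, 2000
d = [3] * n
est = sum(cm_is_simple(d) for _ in range(runs)) / runs
print(round(est, 3))  # should be close to exp(-2) ≈ 0.135
```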

Relation GRG and CM

Since $\mathrm{CM}_n(d)$ conditioned on simplicity yields a uniform (simple) random graph with these degrees, and, by (1.3.25), $\mathrm{GRG}_n(w)$ conditioned on its degrees is also a uniform (simple) random graph with the given degree distribution, the laws of these random graph models are the same. As a result, one can prove results for $\mathrm{GRG}_n(w)$ by proving them for $\mathrm{CM}_n(d)$ under the appropriate degree conditions, and then proving that $\mathrm{GRG}_n(w)$ satisfies these conditions in probability. See [Volume 1, Section 7.5], where this is worked out in great detail. We summarize the results in Theorem 1.7 below, as it will frequently be convenient to derive results for $\mathrm{GRG}_n(w)$ through those for appropriate $\mathrm{CM}_n(d)$'s.

A further useful result in this direction is that the weight regularity conditions in Conditions 1.1(a)-(c) imply the degree regularity conditions in Conditions 1.5(a)-(c):

Theorem 1.7 (Regularity conditions for weights and degrees) Let $d_i$ be the degree of vertex $i$ in $\mathrm{GRG}_n(w)$, and let $d = (d_i)_{i\in[n]}$. Then, $d$ satisfies Conditions 1.5(a)-(b) in probability when $w$ satisfies Conditions 1.1(a)-(b), where
$$\mathbb{P}(D = k) = \mathbb{E}\Big[\frac{W^k}{k!} e^{-W}\Big] \qquad (1.3.42)$$
denotes the mixed-Poisson distribution with mixing distribution $W$ having distribution function $F$ in Condition 1.1(a). Further, $d$ satisfies Conditions 1.5(a)-(c) in probability when $w$ satisfies Conditions 1.1(a)-(c).

Proof This is [Volume 1, Theorem 7.19]. The weak convergence in Condition 1.5(a) is Theorem 1.3.

Theorem 1.7 allows us to prove many results for the generalized random graph by first proving them for the configuration model, and then extending them to the generalized random graph. See [Volume 1, Sections 6.6 and 7.5] for more details. This proof strategy, deducing results for the generalized random graph from those for the configuration model, will be used frequently in this book.


1.3.4 Switching algorithms for uniform random graphs

So far, we have focused on obtaining a uniform random graph with a prescribed degree sequence by conditioning the configuration model on being simple. As explained above, this does not work well when the degrees have infinite variance. Another setting where this method fails to deliver is when the average degree is large rather than bounded, so that the graph is no longer sparse in the strict sense. An alternative method to produce a sample from the uniform distribution is to use a switching algorithm. A switching algorithm is a Markov chain on the space of simple graphs where, in each step, some edges in the graph are rewired. The uniform distribution is the stationary distribution of this Markov chain, so letting the switching algorithm run infinitely long, we obtain a perfect sample from the uniform distribution. Let us now describe in some more detail how this algorithm works. Switching algorithms can also be used rather effectively to compute probabilities of certain events for uniform random graphs with specified degrees, as we explain afterwards. As such, switching methods form an indispensable tool in studying uniform random graphs with prescribed degrees. We start by explaining the basic switching algorithm and its relation to uniform sampling.

The switch Markov chain

The switch Markov chain is a Markov chain on the space of simple graphs with prescribed degrees given by $d$. Fix a simple graph $G = ([n], E(G))$ for which the degree of vertex $i$ equals $d_i$ for all $i \in [n]$. We assume that such a simple graph exists. In order to describe the dynamics of the switch chain, choose two edges $\{u,v\}$ and $\{x,y\}$ uniformly at random from $E(G)$. The possible switches of these two edges are (1) $\{u,x\}$ and $\{v,y\}$; (2) $\{v,x\}$ and $\{u,y\}$; and (3) $\{u,v\}$ and $\{x,y\}$ (so that no change is made). Choose each of these three options with probability $\tfrac{1}{3}$, and write the chosen edges as $e_1, e_2$. Accept the switch when the resulting graph with edge set $\{e_1, e_2\} \cup \big(E(G) \setminus \{\{u,v\},\{x,y\}\}\big)$ is simple, and reject the switch otherwise (so that the graph remains unchanged under the dynamics).

It is not very hard to see that the resulting Markov chain is aperiodic and irreducible. Further, the switch chain is doubly stochastic, since it is reversible. As a result, its stationary distribution is the uniform distribution on simple graphs with the prescribed degree sequence $d$, i.e., the law of $\mathrm{UG}_n(d)$.

The above method works rather generally and will, in the limit of infinitely many switches, produce a sample from $\mathrm{UG}_n(d)$. As a result, this chain is the method of choice to produce a sample of $\mathrm{UG}_n(d)$ when the probability of simplicity of the configuration model vanishes. However, it is unclear how often one needs to switch in order for the Markov chain to be sufficiently close to the uniform (and thus stationary) distribution. See Section 1.6 for a discussion of the history of the problem, as well as the available results about its convergence to the stationary distribution.
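The accept/reject dynamics above can be sketched as follows (the edge-set encoding and the cycle used as the starting graph are illustrative choices of mine):

```python
import random
from collections import Counter

random.seed(5)

# Sketch of the switch chain: pick two edges uniformly, pick one of the three
# re-pairings with probability 1/3 each, and accept only if the result is a
# simple graph; otherwise the graph is left unchanged.

def switch_step(edges):
    """One step of the switch chain; `edges` is a set of sorted 2-tuples."""
    (u, v), (x, y) = random.sample(sorted(edges), 2)
    r = random.randrange(3)
    if r == 2:
        return edges                     # option (3): no change
    e1, e2 = ((u, x), (v, y)) if r == 0 else ((v, x), (u, y))
    e1 = tuple(sorted(e1))
    e2 = tuple(sorted(e2))
    if e1[0] == e1[1] or e2[0] == e2[1]:
        return edges                     # would create a self-loop: reject
    rest = edges - {(u, v), (x, y)}
    if e1 in rest or e2 in rest or e1 == e2:
        return edges                     # would create a multiple edge: reject
    return rest | {e1, e2}

n = 8
edges = {tuple(sorted((i, (i + 1) % n))) for i in range(n)}   # cycle, degrees 2
for _ in range(1000):
    edges = switch_step(edges)

deg = Counter()
for a, b in edges:
    deg[a] += 1
    deg[b] += 1
print(all(deg[i] == 2 for i in range(n)))  # degrees are invariant: True
```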


Switching methods for random graphs with prescribed degrees

Switching algorithms can also be used to prove properties of uniform random graphs with prescribed degrees. Here, we explain how switching can be used to estimate the connection probability between vertices of specific degrees in a uniform random graph. Recall that $\ell_n = \sum_{i\in[n]} d_i$. We write $u \sim v$ for the event that vertex $u$ is connected to $v$. In $\mathrm{UG}_n(d)$, there can be at most one edge between two vertices. The edge probabilities for $\mathrm{UG}_n(d)$ are given in the following theorem:

Theorem 1.8 (Edge probabilities for uniform random graphs with prescribed degrees) Assume that the empirical distribution $F_n$ of $d$ satisfies
$$[1 - F_n](x) \le C_F x^{-(\tau-1)}, \qquad (1.3.43)$$
for some $C_F > 0$ and $\tau \in (2,3)$. Assume further that $\ell_n/n \not\to 0$. Let $U$ denote a set of unordered pairs of vertices, and let $E_U$ denote the event that $\{x,y\}$ is an edge for every $\{x,y\} \in U$. Then, assuming that $|U| = O(1)$, for every $u, v$ with $\{u,v\} \notin U$,
$$\mathbb{P}(u \sim v \mid E_U) = (1 + o(1)) \frac{(d_u - |U_u|)(d_v - |U_v|)}{\ell_n + (d_u - |U_u|)(d_v - |U_v|)}, \qquad (1.3.44)$$
where $U_x$ denotes the set of pairs in $U$ that contain $x$.

Remark 1.9 (Relation to $\mathrm{ECM}_n(d)$ and $\mathrm{GRG}_n(w)$) Theorem 1.8 shows that when $d_u d_v \gg \ell_n$, then
$$1 - \mathbb{P}(u \sim v) = (1 + o(1)) \frac{\ell_n}{d_u d_v}. \qquad (1.3.45)$$
In the erased configuration model, on the other hand,
$$1 - \mathbb{P}(u \sim v) \le e^{-d_u d_v/\ell_n}. \qquad (1.3.46)$$
Thus, the probability that two high-degree vertices are not connected is much smaller for $\mathrm{ECM}_n(d)$ than for $\mathrm{UG}_n(d)$. Instead, $\mathbb{P}(u \sim v) \approx \frac{d_u d_v}{\ell_n + d_u d_v}$, as it would be for $\mathrm{GRG}_n(w)$ with $w = d$, which indicates once more that $\mathrm{GRG}_n(w)$ and $\mathrm{UG}_n(d)$ are closely related.

We now proceed to prove Theorem 1.8. We first prove a useful lemma about the number of two-paths starting from a specified vertex.

Lemma 1.10 (The number of two-paths) Assume that $d$ satisfies (1.3.43) for some $C_F > 0$ and $\tau \in (2,3)$. For any graph $G$ whose degree sequence is $d$, the number of two-paths starting from any specified vertex is $o(n)$.

Proof Without loss of generality, we may assume that $d_1 \ge d_2 \ge \cdots \ge d_n$. For every $1 \le i \le n$, the number of vertices with degree at least $d_i$ is at least $i$. By (1.3.43), we then have
$$C_F n d_i^{1-\tau} \ge n[1 - F_n](d_i) \ge i \qquad (1.3.47)$$


Figure 1.6 Forward and backward switchings

for every $i \in [n]$. Thus, $d_i \le (C_F n/i)^{1/(\tau-1)}$. The number of two-paths from any vertex is then bounded by $\sum_{i=1}^{d_1} d_i$, which is at most
$$\sum_{i=1}^{d_1} \Big(\frac{C_F n}{i}\Big)^{1/(\tau-1)} = (C_F n)^{1/(\tau-1)} \sum_{i=1}^{d_1} i^{-1/(\tau-1)} = O\big(n^{1/(\tau-1)}\big)\, d_1^{(\tau-2)/(\tau-1)} = O\big(n^{(2\tau-3)/(\tau-1)^2}\big), \qquad (1.3.48)$$
since $d_1 \le (C_F n)^{1/(\tau-1)}$. Since $\tau \in (2,3)$, the above is $o(n)$.

Proof of Theorem 1.8. To estimate $\mathbb{P}(u \sim v \mid E_U)$, we will switch between two classes of graphs $S$ and $\bar S$. Here $S$ consists of the graphs in which all edges in $\{u,v\} \cup U$ are present, whereas $\bar S$ consists of all graphs in which every $\{x,y\} \in U$ is present, but $\{u,v\}$ is not. Note that
$$\mathbb{P}(u \sim v \mid E_U) = \frac{|S|}{|S| + |\bar S|} = \frac{1}{1 + |\bar S|/|S|}. \qquad (1.3.49)$$

In order to estimate the ratio $|\bar S|/|S|$, we define an operation, called a forward switching, which converts a graph $G \in S$ into a graph $G' \in \bar S$. The reverse operation, converting $G'$ into $G$, is called a backward switching. We then estimate $|\bar S|/|S|$ by counting the number of forward switchings that can be applied to a graph $G \in S$, and the number of backward switchings that can be applied to a graph $G' \in \bar S$. Since we wish to have control over whether $\{u,v\}$ is present or not, we change the switching considerably compared to the switching discussed above.

The forward switching is defined by choosing two edges and specifying their ends as $\{x,a\}$ and $\{y,b\}$. The choice must satisfy the following constraints:

(1) None of $\{u,x\}$, $\{v,y\}$, or $\{a,b\}$ is an edge;
(2) $\{x,a\}, \{y,b\} \notin U$;
(3) All of $u$, $v$, $x$, $y$, $a$, and $b$ must be distinct, except that $x = y$ is permitted.

Given a valid choice, the forward switching replaces the three edges $\{u,v\}$, $\{x,a\}$, and $\{y,b\}$ by $\{u,x\}$, $\{v,y\}$, and $\{a,b\}$, while ensuring that the graph after switching is simple. Note that the forward switching preserves the degree sequence, and converts a graph in $S$ into a graph in $\bar S$. The inverse operation of a forward switching is called a backward switching. See Figure 1.6 for an illustration.


Next, we estimate the number of ways to perform a forward switching on a graph $G \in S$, denoted by $f(G)$, and the number of ways to perform a backward switching on a graph $G' \in \bar S$, denoted by $b(G')$. The total number of switchings between $S$ and $\bar S$ is equal to $|S|\,\mathbb{E}[f(G)] = |\bar S|\,\mathbb{E}[b(G')]$, where the expectations are over a uniformly random $G \in S$ and $G' \in \bar S$, respectively. Consequently,
$$\frac{|\bar S|}{|S|} = \frac{\mathbb{E}[f(G)]}{\mathbb{E}[b(G')]}. \qquad (1.3.50)$$

Given an arbitrary graph $G \in S$, the number of ways of carrying out a forward switching is at most $\ell_n^2$, since there are at most $\ell_n$ ways to choose $\{x,a\}$, and at most $\ell_n$ ways to choose $\{y,b\}$. Note that choosing $\{x,a\}$ for the first edge and $\{y,b\}$ for the second edge results in a different switching than vice versa. To find a lower bound on the number of ways of performing a forward switching, we subtract from $\ell_n^2$ an upper bound on the number of invalid choices for $\{x,a\}$ and $\{y,b\}$. These can be summarized as follows:

(a) At least one of $\{u,x\}$, $\{a,b\}$, $\{v,y\}$ is an edge;
(b) At least one of $\{x,a\}$ or $\{y,b\}$ is in $U$;
(c) Any vertex overlap other than $x = y$ (i.e., if one of $a$ or $b$ is equal to one of $x$ or $y$, or if $a = b$, or if one of $u$ or $v$ is one of $a$, $b$, $x$, $y$).

To find an upper bound for (a), note that any choice in case (a) must involve a single edge and a two-path starting from a specified vertex. By Lemma 1.10, the number of choices for (a) is then upper bounded by $3 \cdot o(\ell_n) \cdot \ell_n = o(\ell_n^2)$. The number of choices for case (b) is $O(\ell_n)$, as $|U| = O(1)$ and there are at most $\ell_n$ ways to choose the other edge, which is not restricted to be in $U$. To bound the number of choices for (c), we investigate each case:

(C1) $a$ or $b$ is equal to $x$ or $y$, or $a = b$. In this case, $x$, $y$, $a$, $b$ forms a two-path. Thus, there are at most $5 \cdot n \cdot o(\ell_n) = o(\ell_n^2)$ choices (noting that $n = O(\ell_n)$), where $n$ is the number of ways to choose a vertex, and $o(\ell_n)$ bounds the number of two-paths starting from this specified vertex;
(C2) one of $u$ and $v$ is one of $a$, $b$, $x$, $y$. In this case, there is one two-path starting from $u$ or $v$, and a single edge. Thus, there are at most $8 \cdot \ell_n d_{\max} = o(\ell_n^2)$ choices, where $d_{\max}$ bounds the number of ways to choose a vertex adjacent to $u$ or $v$, and $\ell_n$ bounds the number of ways to choose a single edge.

Thus, the number of invalid choices for $\{x,a\}$ and $\{y,b\}$ is $o(\ell_n^2)$, so that the number of forward switchings that can be applied to any $G \in S$ is $(1 + o(1))\ell_n^2$. Thus,
$$\mathbb{E}[f(G)] = \ell_n^2 (1 + o(1)). \qquad (1.3.51)$$

Given a graph $G' \in \bar S$, consider the backward switchings that can be applied to $G'$. There are at most $\ell_n (d_u - |U_u|)(d_v - |U_v|)$ ways to do the backward switching, since we are choosing an edge which is adjacent to $u$ but not in $U$, an edge which is adjacent to $v$ but not in $U$, and another "oriented" edge $\{a,b\}$ (oriented in the sense that each edge has two ways to specify its end vertices as $a$ and $b$). For a lower bound, we consider the following forbidden choices:

(a′) at least one of $\{x,a\}$ or $\{y,b\}$ is an edge;
(b′) $\{a,b\} \in U$;
(c′) any vertices overlap other than $x = y$ (i.e., when $\{a,b\} \cap \{u,v,x,y\} \ne \emptyset$).

For (a′), suppose that $\{x,a\}$ is present, giving the two-path $\{x,a\}, \{a,b\}$ in $G'$. There are at most $(d_u - |U_u|)(d_v - |U_v|)$ ways to choose $x$ and $y$. Given any choice of $x$ and $y$, there are at most $o(\ell_n)$ ways to choose a two-path starting from $x$, and hence $o(\ell_n)$ ways to choose $\{a,b\}$. Thus, the total number of choices is at most $o((d_u - |U_u|)(d_v - |U_v|)\ell_n)$. The case where $\{y,b\}$ is an edge is symmetric.

For (b′), there are $O(1)$ choices for $\{a,b\}$, since $|U| = O(1)$, and at most $(d_u - |U_u|)(d_v - |U_v|)$ choices for $x$ and $y$. Thus, the number of choices for case (b′) is $O((d_u - |U_u|)(d_v - |U_v|)) = o((d_u - |U_u|)(d_v - |U_v|)\ell_n)$.

For (c′), the case where $a$ or $b$ is equal to $x$ or $y$ corresponds to a two-path starting from $u$ or $v$, together with a single edge from $u$ or $v$. Since $o(\ell_n)$ bounds the number of two-paths starting from $u$ or $v$, and $d_u - |U_u| + d_v - |U_v|$ bounds the number of ways to choose the single edge, there are $o(\ell_n(d_v - |U_v|)) + o(\ell_n(d_u - |U_u|))$ such choices. If $a$ or $b$ is equal to $u$ or $v$, there are $(d_u - |U_u|)(d_v - |U_v|)$ ways to choose $x$ and $y$, and at most $d_u + d_v$ ways to choose the last vertex as a neighbor of $u$ or $v$. Thus, there are $O((d_u - |U_u|)(d_v - |U_v|)d_{\max}) = o((d_u - |U_u|)(d_v - |U_v|)\ell_n)$ such choices, since $d_{\max} = O(n^{1/(\tau-1)}) = o(n) = o(\ell_n)$. We conclude that the number of backward switchings that can be applied to any graph $G' \in \bar S$ is $(d_u - |U_u|)(d_v - |U_v|)\ell_n(1 + o(1))$, so that also
$$\mathbb{E}[b(G')] = (d_u - |U_u|)(d_v - |U_v|)\ell_n (1 + o(1)). \qquad (1.3.52)$$

Combining (1.3.50), (1.3.51) and (1.3.52) results in
$$|\bar S|/|S| = (1 + o(1)) \frac{\ell_n^2}{(d_u - |U_u|)(d_v - |U_v|)\ell_n}, \qquad (1.3.53)$$
and thus (1.3.49) yields
$$\mathbb{P}(u \sim v \mid E_U) = \frac{1}{1 + |\bar S|/|S|} = (1 + o(1)) \frac{(d_u - |U_u|)(d_v - |U_v|)}{\ell_n + (d_u - |U_u|)(d_v - |U_v|)}. \qquad (1.3.54)$$

1.3.5 Preferential attachment models

Most networks grow in time. Preferential attachment models describe growing networks, in which the numbers of edges and vertices grow linearly with time. Preferential attachment models were first introduced by Barabási and Albert (1999), whose model we will generalize. Bollobás, Riordan, Spencer and Tusnády (2001) studied the model by Barabási and Albert (1999), and many other papers on this and related models followed. See [Volume 1, Chapter 8] for details. Here we give a brief introduction.


The model that we investigate produces a graph sequence that we denote by $(\mathrm{PA}^{(m,\delta)}_t)_{t\ge1}$ and which, for every time $t$, yields a graph of $t$ vertices and $mt$ edges for some $m = 1, 2, \dots$. We start by defining the model for $m = 1$, when the graph consists of a collection of trees. In this case, $\mathrm{PA}^{(1,\delta)}_1$ consists of a single vertex with a single self-loop. We denote the vertices of $\mathrm{PA}^{(1,\delta)}_t$ by $v^{(1)}_1, \dots, v^{(1)}_t$. We denote the degree of vertex $v^{(1)}_i$ in $\mathrm{PA}^{(1,\delta)}_t$ by $D_i(t)$, where, by convention, a self-loop increases the degree by 2.

We next describe the evolution of the graph. Conditionally on $\mathrm{PA}^{(1,\delta)}_t$, the growth rule to obtain $\mathrm{PA}^{(1,\delta)}_{t+1}$ is as follows. We add a single vertex $v^{(1)}_{t+1}$ having a single edge. This edge is connected to a second end point, which is equal to $v^{(1)}_{t+1}$ itself with probability $(1+\delta)/(t(2+\delta)+(1+\delta))$, and to vertex $v^{(1)}_i \in \mathrm{PA}^{(1,\delta)}_t$ with probability $(D_i(t)+\delta)/(t(2+\delta)+(1+\delta))$ for each $i \in [t]$, where $\delta \ge -1$ is a parameter of the model. Thus,
\[
\mathbb{P}\bigl(v^{(1)}_{t+1} \to v^{(1)}_i \mid \mathrm{PA}^{(1,\delta)}_t\bigr)
= \begin{cases}
\dfrac{1+\delta}{t(2+\delta)+(1+\delta)} & \text{for } i = t+1,\\[1ex]
\dfrac{D_i(t)+\delta}{t(2+\delta)+(1+\delta)} & \text{for } i \in [t].
\end{cases} \tag{1.3.55}
\]

The above preferential attachment mechanism is called affine, since the attachment probabilities in (1.3.55) depend in an affine way on the degrees of the random graph $\mathrm{PA}^{(1,\delta)}_t$.
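The $m=1$ growth rule in (1.3.55) can be simulated directly. The sketch below is an illustration only (the function name `simulate_pa1` and the linear-scan sampling are our own choices, not part of the formal model); it tracks just the degree sequence, which is all that the attachment probabilities depend on.

```python
import random

def simulate_pa1(T, delta, seed=None):
    """Simulate the degrees of PA^(1,delta)_T via the rule (1.3.55).

    Vertex t+1 attaches its single edge to itself with probability
    (1+delta)/(t(2+delta)+1+delta), and to an old vertex i with
    probability (D_i(t)+delta)/(t(2+delta)+1+delta); a self-loop
    adds 2 to the degree.
    """
    rng = random.Random(seed)
    deg = [2]  # PA_1 is a single vertex with one self-loop, hence degree 2
    for t in range(1, T):
        total = t * (2 + delta) + (1 + delta)  # total attachment weight
        u = rng.uniform(0, total)
        if u < 1 + delta:                      # self-loop at the new vertex
            deg.append(2)
        else:                                  # pick old vertex i ~ D_i(t)+delta
            u -= 1 + delta
            i = 0
            while i < t - 1 and u > deg[i] + delta:
                u -= deg[i] + delta
                i += 1
            deg.append(1)
            deg[i] += 1
    return deg

degrees = simulate_pa1(1000, delta=0.0, seed=42)
assert len(degrees) == 1000 and sum(degrees) == 2 * 1000
```

Note that the normalization $t(2+\delta)+(1+\delta)$ is exactly the sum of all weights: the $t$ old vertices carry weight $2t + t\delta$, and the new vertex carries $1+\delta$, so each step adds 2 to the total degree.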

The model with $m > 1$ is defined in terms of the model for $m = 1$ as follows. Fix $\delta \ge -m$. We start with $\mathrm{PA}^{(1,\delta/m)}_{mt}$, and denote its vertices by $v^{(1)}_1, \dots, v^{(1)}_{mt}$. Then we identify, or collapse, the $m$ vertices $v^{(1)}_1, \dots, v^{(1)}_m$ in $\mathrm{PA}^{(1,\delta/m)}_{mt}$ to become vertex $v^{(m)}_1$ in $\mathrm{PA}^{(m,\delta)}_t$. In doing so, we let all the edges that are incident to any of the vertices $v^{(1)}_1, \dots, v^{(1)}_m$ be incident to the new vertex $v^{(m)}_1$ in $\mathrm{PA}^{(m,\delta)}_t$. Then we collapse the $m$ vertices $v^{(1)}_{m+1}, \dots, v^{(1)}_{2m}$ in $\mathrm{PA}^{(1,\delta/m)}_{mt}$ to become vertex $v^{(m)}_2$ in $\mathrm{PA}^{(m,\delta)}_t$, etc. More generally, we collapse the $m$ vertices $v^{(1)}_{(j-1)m+1}, \dots, v^{(1)}_{jm}$ in $\mathrm{PA}^{(1,\delta/m)}_{mt}$ to become vertex $v^{(m)}_j$ in $\mathrm{PA}^{(m,\delta)}_t$. This defines the model for general $m \ge 1$. The resulting graph $\mathrm{PA}^{(m,\delta)}_t$ is a multigraph with precisely $t$ vertices and $mt$ edges, so that the total degree is equal to $2mt$. The original model by Barabási and Albert (1999) focused on the case $\delta = 0$ only, which is sometimes called the proportional model. The inclusion of the extra parameter $\delta$ is relevant, though, as we will see later.

The preferential attachment model $(\mathrm{PA}^{(m,\delta)}_t)_{t\ge1}$ is increasing in time, in the sense that vertices and edges, once they have appeared, remain there forever. Thus, the degrees are monotonically increasing in time. Moreover, vertices with a high degree have a higher chance of attracting further edges of later vertices. Therefore, the model is sometimes called the rich-get-richer model. It is not hard to see that $D_i(t) \xrightarrow{a.s.} \infty$ (see Exercise 1.10). As a result, one could also call the preferential attachment model the old-get-richer model, which may be more appropriate.

Let us continue to discuss the degree structure of $(\mathrm{PA}^{(m,\delta)}_t)_{t\ge1}$.


Degrees of fixed vertices

We start by investigating the degrees of fixed vertices as $t \to \infty$, i.e., we study $D_i(t)$ for fixed $i$ as $t \to \infty$. To formulate our results, we define the Gamma function $t \mapsto \Gamma(t)$ for $t > 0$ by
\[
\Gamma(t) = \int_0^\infty x^{t-1} \mathrm{e}^{-x}\,dx. \tag{1.3.56}
\]

The following theorem describes the evolution of the degree of fixed vertices (see [Volume 1, Theorem 8.2 and (8.3.11)]):

Theorem 1.11 (Degrees of fixed vertices) Fix $m \ge 1$ and $\delta > -m$. Then, $D_i(t)/t^{1/(2+\delta/m)}$ converges almost surely to a random variable $\xi_i$ as $t \to \infty$.

It turns out that also $t^{-1/(2+\delta/m)} \max_{i\in[t]} D_i(t) \xrightarrow{a.s.} M$ for some limiting positive and finite random variable $M$ (see [Volume 1, Section 8.7]). In analogy to i.i.d. random variables, the fact that $t^{-1/(2+\delta/m)} \max_{i\in[t]} D_i(t) \xrightarrow{a.s.} M$ suggests that the degree of a random vertex satisfies a power law with power-law exponent $\tau = 3 + \delta/m$, and that is our next item on the agenda.

The degree sequence of the preferential attachment model

The main result in this section establishes the scale-free nature of preferential attachment graphs. In order to state it, we need some notation. We write
\[
P_k(t) = \frac{1}{t}\sum_{i=1}^{t} \mathbb{1}_{\{D_i(t)=k\}} \tag{1.3.57}
\]
for the (random) proportion of vertices with degree $k$ at time $t$. For $m \ge 1$ and $\delta > -m$, we define $(p_k)_{k\ge0}$ by $p_k = 0$ for $k = 0, \dots, m-1$ and, for $k \ge m$,
\[
p_k = (2+\delta/m)\,\frac{\Gamma(k+\delta)\,\Gamma(m+2+\delta+\delta/m)}{\Gamma(m+\delta)\,\Gamma(k+3+\delta+\delta/m)}. \tag{1.3.58}
\]
It turns out that $(p_k)_{k\ge0}$ is a probability mass function (see [Volume 1, Section 8.3]). The probability mass function $(p_k)_{k\ge0}$ arises as the limiting degree distribution of $\mathrm{PA}^{(m,\delta)}_t$, as shown in the following theorem:

Theorem 1.12 (Degree sequence in preferential attachment model) Fix $m \ge 1$ and $\delta > -m$. There exists a constant $C = C(m,\delta) > 0$ such that, as $t \to \infty$,
\[
\mathbb{P}\Bigl(\max_k |P_k(t) - p_k| \ge C\sqrt{\frac{\log t}{t}}\Bigr) = o(1). \tag{1.3.59}
\]

We next investigate the scale-free properties of $(p_k)_{k\ge0}$ by investigating the asymptotics of $p_k$ for $k$ large. By (1.3.58) and Stirling's formula, as $k \to \infty$,
\[
p_k = c_{m,\delta}\,k^{-\tau}\,(1+O(1/k)), \tag{1.3.60}
\]


where
\[
\tau = 3+\delta/m > 2, \qquad c_{m,\delta} = (2+\delta/m)\,\frac{\Gamma(m+2+\delta+\delta/m)}{\Gamma(m+\delta)}. \tag{1.3.61}
\]
Therefore, by Theorem 1.12 and (1.3.60), the asymptotic degree sequence of $\mathrm{PA}^{(m,\delta)}_t$ is close to a power law with exponent $\tau = 3+\delta/m$. We note that any exponent $\tau > 2$ is possible by choosing $\delta > -m$ and $m \ge 1$ appropriately.
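The asymptotics (1.3.60)–(1.3.61) follow from the Gamma-ratio estimate $\Gamma(k+a)/\Gamma(k+b) = k^{a-b}(1+O(1/k))$ as $k\to\infty$, a consequence of Stirling's formula; this is the computation behind Exercise 1.14, and a sketch reads:

```latex
\frac{\Gamma(k+\delta)}{\Gamma(k+3+\delta+\delta/m)}
  = k^{-(3+\delta/m)}\bigl(1+O(1/k)\bigr),
\qquad\text{so that, by (1.3.58),}\qquad
p_k = (2+\delta/m)\,\frac{\Gamma(m+2+\delta+\delta/m)}{\Gamma(m+\delta)}\,
      k^{-(3+\delta/m)}\bigl(1+O(1/k)\bigr)
    = c_{m,\delta}\,k^{-\tau}\bigl(1+O(1/k)\bigr).
```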

Extensions to the preferential attachment rule

In this book, we also sometimes investigate the related model $(\mathrm{PA}^{(m,\delta)}_t(b))_{t\ge1}$, in which the self-loops for $m = 1$ in (1.3.55) are not allowed, so that
\[
\mathbb{P}\bigl(v^{(1)}_{t+1} \to v^{(1)}_i \mid \mathrm{PA}^{(1,\delta)}_t(b)\bigr) = \frac{D_i(t)+\delta}{t(2+\delta)} \qquad\text{for } i \in [t]. \tag{1.3.62}
\]

The model for $m \ge 2$ is again defined in terms of the model $(\mathrm{PA}^{(1,\delta)}_t(b))_{t\ge1}$ for $m = 1$ by collapsing blocks of $m$ vertices. The advantage of $(\mathrm{PA}^{(m,\delta)}_t(b))_{t\ge1}$ compared to $(\mathrm{PA}^{(m,\delta)}_t)_{t\ge1}$ is that $(\mathrm{PA}^{(m,\delta)}_t(b))_{t\ge1}$ is naturally connected, while $(\mathrm{PA}^{(m,\delta)}_t)_{t\ge1}$ may not be.

Another adaptation of the preferential attachment rule is when no self-loops are ever allowed, while the degrees are updated as the $m$ edges incident to the new vertex are being attached. We denote this model by $(\mathrm{PA}^{(m,\delta)}_t(c))_{t\ge1}$. In this case, the model for $m = 1$ is the same as $(\mathrm{PA}^{(1,\delta)}_t(b))_{t\ge1}$, while for $m \ge 2$ and $j \in \{0, \dots, m-1\}$, we attach the $(j+1)$st edge of vertex $v^{(m)}_{t+1}$ to vertex $v^{(m)}_i$ for $i \in [t]$ with probability
\[
\mathbb{P}\bigl(v^{(m)}_{t+1,j+1} \to v^{(m)}_i \mid \mathrm{PA}^{(m,\delta)}_{t,j}(c)\bigr) = \frac{D_i(t,j)+\delta}{t(2m+\delta)+j} \qquad\text{for } i \in [t]. \tag{1.3.63}
\]

Here, $D_i(t,j)$ is the degree of vertex $v^{(m)}_i$ after the connection of the edges incident to the first $t$ vertices, as well as the first $j$ edges incident to vertex $v^{(m)}_{t+1}$. Many other adaptations are possible, and have been investigated in the literature, such as settings where the $m$ edges incident to $v^{(m)}_{t+1}$ are connected independently as in (1.3.63) with $j = 0$, but we refrain from discussing these. It is not hard to verify that Theorem 1.12 continues to hold for all these adaptations, which explains why authors have often opted for the version of the model that is most convenient for them. On the other hand, in Theorem 1.11 there will be minor adaptations, particularly since the limiting random variables $(\xi_i)_{i\ge1}$ do depend on the precise model.

1.3.6 A Bernoulli preferential attachment model

In this section, we discuss a model that is quite a bit different from the other preferential attachment models discussed above. The main difference is that in this model, the number of edges is not fixed; instead, there is much more independence in the edge attachments. A preferential attachment model with


conditionally independent edges is investigated by Dereich and Mörters (2009, 2011, 2013). We call this model the Bernoulli preferential attachment model, as the attachment indicators are all conditionally independent Bernoulli variables. Let us now give the details.

Fix a preferential attachment function $f\colon \mathbb{N}_0 \to (0,\infty)$. Then the graph evolves as follows. Start with $\mathrm{BPA}^{(f)}_1$ being a graph containing one vertex $v_1$ and no edges. At each time $t \ge 2$, we add a vertex $v_t$. Conditionally on $\mathrm{BPA}^{(f)}_{t-1}$, and independently for every $i \in [t-1]$, we connect this vertex to $i$ by a directed edge with probability
\[
\frac{f(D_i(t-1))}{t-1}, \tag{1.3.64}
\]
where $D_i(t-1)$ is the in-degree of vertex $i$ at time $t-1$. This creates the random graph $\mathrm{BPA}^{(f)}_t$. Note that the number of edges in the random graph process $(\mathrm{BPA}^{(f)}_t)_{t\ge1}$ is not fixed, but is instead a random variable. In particular, it makes a difference whether we use the in-degree in (1.3.64).

We consider functions $f\colon \mathbb{N}_0 \to (0,\infty)$ that satisfy $f(k+1)-f(k) < 1$ for every $k \ge 0$. Under this assumption, and when $f(0) \le 1$, Dereich and Mörters show that the empirical degree sequence converges as $t \to \infty$, i.e.,
\[
P_k(t) \equiv \frac{1}{t}\sum_{i\in[t]} \mathbb{1}_{\{D_i(t)=k\}} \xrightarrow{\;\mathbb{P}\;} p_k, \qquad\text{where } p_k = \frac{1}{1+f(k)}\prod_{l=0}^{k-1}\frac{f(l)}{1+f(l)}. \tag{1.3.65}
\]
In particular, $\log(1/p_k)/\log k \to 1+1/\gamma$ when $f(k)/k \to \gamma \in (0,1)$ (see Exercise 1.12). Remarkably, when $f(k) = \gamma k + \beta$, the power-law exponent of the degree distribution does not depend on $\beta$. The restriction that $f(k+1)-f(k) < 1$ is needed to prevent the degrees from exploding. Further, $\log(1/p_k) \sim k^{1-\alpha}/(\gamma(1-\alpha))$ when $f(k) \sim \gamma k^\alpha$ for some $\alpha \in (0,1)$ (see Exercise 1.13). Interestingly, Dereich and Mörters also show that when $\sum_{k\ge1} 1/f(k)^2 < \infty$, there exists a persistent hub, i.e., a vertex that has maximal degree for all but finitely many times. When $\sum_{k\ge1} 1/f(k)^2 = \infty$, this does not happen.
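The limit (1.3.65) is easy to probe numerically. The sketch below is an illustration only: the attachment function $f(k) = k/2 + 1/4$ is a hypothetical example choice (it satisfies $f(k+1)-f(k) = 1/2 < 1$ and $f(0) = 1/4 \le 1$), and the function names are ours. It simulates the in-degrees of $\mathrm{BPA}^{(f)}_t$ and compares the empirical proportion $P_0(t)$ with $p_0$.

```python
import random

def f(k, gamma=0.5, beta=0.25):
    # affine attachment function; f(k+1) - f(k) = gamma < 1 and f(0) <= 1
    return gamma * k + beta

def simulate_bpa(T, seed=0):
    """In-degrees of BPA^(f)_T: vertex t connects to each earlier vertex i
    independently with probability f(D_i(t-1))/(t-1), as in (1.3.64)."""
    rng = random.Random(seed)
    indeg = [0]                       # vertex 1 starts with in-degree 0
    for t in range(2, T + 1):
        hit = [i for i in range(t - 1)
               if rng.random() < f(indeg[i]) / (t - 1)]
        for i in hit:                 # update degrees after sampling all indicators
            indeg[i] += 1
        indeg.append(0)
    return indeg

def p(k):
    """Limiting mass function p_k from (1.3.65)."""
    pk = 1.0 / (1.0 + f(k))
    for l in range(k):
        pk *= f(l) / (1.0 + f(l))
    return pk

indeg = simulate_bpa(5000)
P0 = indeg.count(0) / len(indeg)      # empirical P_0(t); p_0 = 1/(1 + f(0)) = 0.8
assert abs(P0 - p(0)) < 0.03
```

The product form of $p_k$ makes it a telescoping sum: $\sum_{k\le K} p_k = 1 - \prod_{l\le K} f(l)/(1+f(l)) \to 1$, so $(p_k)_{k\ge0}$ is indeed a probability mass function.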

1.3.7 Universality of random graphs

There are many other graph topologies for which one can expect results similar to those for the random graphs discussed above. We will discuss a few related models in Chapter 9 below, where we include several aspects that are relevant in practice, such as directed graphs, adding community structure to random graphs, and geometry. The random graph models that we investigate are inhomogeneous, and one can expect that the results depend sensitively on the amount of inhomogeneity present. This is reflected in the results that we will prove, where the precise asymptotics are different when the vertices have heavy-tailed degrees rather than light-tailed degrees. However, interestingly, what is 'heavy tailed' and what is 'light tailed' depends on the precise model at hand. Often, as we will see, the distinction depends on how many moments the degree distribution has.


We have proposed many random graph models for real-world networks. Since these models aim at describing similar real-world networks, one would hope that they also give similar answers. Indeed, for a real-world network with power-law degree sequences, we could model its static structure by the configuration model with the same degree sequence, and its dynamical properties by the preferential attachment model with similar scale-free degrees. How should we interpret the modeling when these attempts give completely different predictions?

Universality is the phrase physicists use when different models display similar behavior. Models that show similar behavior are then said to be in the same universality class. Enormous effort goes into deciding whether various random graph models are in the same universality class, or rather in different ones, and why. We will see that, for some features, the degree distribution decides the universality class for a wide range of models, as one might possibly hope. This also explains why the degree distribution plays such a dominant role in the investigation of random graphs. In Chapter 9, we discuss several related random graph models that have attracted substantial attention in the literature as models for networks.

1.4 Power-laws and their properties

In this text, we frequently deal with random variables having an (asymptotic) power-law distribution. For such random variables, we often need to investigate truncated moments. We study two such truncated-moment bounds here. We start with the tail of the mean:

Lemma 1.13 (Truncated moments) Let $X$ be a non-negative random variable whose distribution function satisfies, for every $x \ge 1$,
\[
1 - F_X(x) \le C_X x^{-(\tau-1)}. \tag{1.4.1}
\]
Then, there exists a constant $C = C_X(a)$ such that, for $a < \tau-1$ and all $\ell \ge 1$,
\[
\mathbb{E}[X^a \mathbb{1}_{\{X>\ell\}}] \le C \ell^{a-(\tau-1)}, \tag{1.4.2}
\]
while, for $a > \tau-1$ and all $\ell \ge 1$,
\[
\mathbb{E}[X^a \mathbb{1}_{\{X\le\ell\}}] \le C \ell^{a-(\tau-1)}. \tag{1.4.3}
\]

Proof We note that for any cumulative distribution function x 7→ F (x) on the


non-negative reals, we have the partial integration identity
\[
\int_u^\infty f(x)\,F(dx) = f(u)[1-F(u)] + \int_u^\infty [f(x)-f(u)]\,F(dx) \tag{1.4.4}
\]
\begin{align*}
&= f(u)[1-F(u)] + \int_u^\infty \int_u^x f'(y)\,dy\,F(dx)\\
&= f(u)[1-F(u)] + \int_u^\infty f'(y) \int_y^\infty F(dx)\,dy\\
&= f(u)[1-F(u)] + \int_u^\infty f'(y)[1-F(y)]\,dy,
\end{align*}
provided that either $y \mapsto f'(y)[1-F(y)]$ is absolutely integrable, or $x \mapsto f(x)$ is either non-decreasing or non-increasing. Here, the interchange of the integration order is allowed by Fubini's Theorem for non-negative functions (see (Halmos, 1950, Section 3.6, Theorem B)) when $x \mapsto f(x)$ is non-decreasing, and by Fubini's Theorem (Halmos, 1950, Section 3.6, Theorem C) when $y \mapsto f'(y)[1-F(y)]$ is absolutely integrable.

Since $X \ge 0$, using (1.4.1) and (1.4.4), for $a < \tau-1$ and $\ell > 0$,
\[
\mathbb{E}\bigl[X^a \mathbb{1}_{\{X>\ell\}}\bigr] = \ell^a\,\mathbb{P}(X>\ell) + \int_\ell^\infty a x^{a-1}\,\mathbb{P}(X>x)\,dx \tag{1.4.5}
\]
\[
\le C_X \ell^{a-(\tau-1)} + a C_X \int_\ell^\infty x^{a-1} x^{-(\tau-1)}\,dx \le C_{a,\tau}\,\ell^{a-(\tau-1)},
\]
as required.
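The display above proves (1.4.2). The complementary bound (1.4.3) is not spelled out in the text; a sketch along the same lines, using $X^a\mathbb{1}_{\{X\le\ell\}} \le (X\wedge\ell)^a$, the trivial bound $\mathbb{P}(X>x)\le 1$ for $x\le 1$, and $a-(\tau-1)>0$ together with $\ell\ge1$, reads:

```latex
\mathbb{E}\bigl[X^a \mathbb{1}_{\{X\le\ell\}}\bigr]
  \le \mathbb{E}\bigl[(X\wedge\ell)^a\bigr]
  = \int_0^{\ell} a x^{a-1}\,\mathbb{P}(X>x)\,dx
  \le 1 + a C_X \int_1^{\ell} x^{a-1}x^{-(\tau-1)}\,dx
  \le \Bigl(1 + \frac{a C_X}{a-(\tau-1)}\Bigr)\ell^{a-(\tau-1)}.
```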

An important notion in many graphs is the size-biased version $X^\star$ of a non-negative random variable $X$, which is given by
\[
\mathbb{P}(X^\star \le x) = \frac{\mathbb{E}[X \mathbb{1}_{\{X\le x\}}]}{\mathbb{E}[X]}. \tag{1.4.6}
\]
Let $F^\star_X$ denote the distribution function of $X^\star$. The following lemma gives bounds on the tail of the distribution function $F^\star_X$:

Lemma 1.14 (Size-biased tail distribution) Let $X$ be a non-negative random variable whose distribution function satisfies, for every $x \ge 1$,
\[
1 - F_X(x) \le C_X x^{-(\tau-1)}. \tag{1.4.7}
\]
Assume that $\tau > 2$, so that $\mathbb{E}[X] < \infty$. Further, assume that $\mathbb{E}[X] > 0$. Then, there exists a constant $C^\star_X$ such that
\[
1 - F^\star_X(x) \le C^\star_X x^{-(\tau-2)}. \tag{1.4.8}
\]

Proof This follows immediately from (1.4.6), by using (1.4.2) with $a = 1$.

1.5 Notation

Let us introduce some standard notation used throughout this book.


Random variables

We use special notation for certain random variables. We write $X \sim \mathrm{Ber}(p)$ when $X$ has a Bernoulli distribution with success probability $p$, i.e., $\mathbb{P}(X=1) = 1-\mathbb{P}(X=0) = p$. We write $X \sim \mathrm{Bin}(n,p)$ when the random variable $X$ has a binomial distribution with parameters $n$ and $p$, and we write $X \sim \mathrm{Poi}(\lambda)$ when $X$ has a Poisson distribution with parameter $\lambda$. We write $X \sim \mathrm{Exp}(\lambda)$ when $X$ has an exponential distribution with mean $1/\lambda$. We write $X \sim \mathrm{Gam}(\lambda,r)$ when $X$ has a Gamma distribution with density $f_X(x) = \lambda^r x^{r-1}\mathrm{e}^{-\lambda x}/\Gamma(r)$, where $r, \lambda > 0$. The random variable $\mathrm{Gam}(\lambda,r)$ has mean $r/\lambda$ and variance $r/\lambda^2$. Finally, we write $X \sim \mathrm{Beta}(a,b)$ when $X$ has a Beta distribution with parameters $a, b > 0$, so that $X$ has density $f_X(x) = x^{a-1}(1-x)^{b-1}\,\Gamma(a+b)/(\Gamma(a)\Gamma(b))$.

We sometimes abuse notation and write, e.g., $\mathbb{P}(\mathrm{Bin}(n,p) = k)$ to denote $\mathbb{P}(X=k)$ when $X \sim \mathrm{Bin}(n,p)$. We call a sequence of random variables $(X_i)_{i\ge1}$ independent and identically distributed (i.i.d.) when they are independent and $X_i$ has the same distribution as $X_1$ for every $i \ge 1$.

Convergence of random variables

We say that a sequence of events $(E_n)_{n\ge1}$ occurs with high probability (whp) when $\lim_{n\to\infty} \mathbb{P}(E_n) = 1$. We further write $f(n) = O(g(n))$ if $|f(n)|/|g(n)|$ is uniformly bounded from above by a positive constant as $n \to \infty$, $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and $g(n) = O(f(n))$, $f(n) = \Omega(g(n))$ if $1/f(n) = O(1/g(n))$, and $f(n) = o(g(n))$ if $f(n)/g(n)$ tends to 0 as $n \to \infty$. We say that $f(n) \gg g(n)$ when $g(n) = o(f(n))$.

For sequences of random variables $(X_n)_{n\ge1}$, we let $X_n \xrightarrow{d} X$ denote that $X_n$ converges in distribution to $X$, while $X_n \xrightarrow{\mathbb{P}} X$ denotes that $X_n$ converges in probability to $X$, and $X_n \xrightarrow{a.s.} X$ denotes that $X_n$ converges almost surely to $X$. We write $X_n = O_{\mathbb{P}}(Y_n)$ when $|X_n|/Y_n$ is a tight sequence of random variables, and $X_n = o_{\mathbb{P}}(Y_n)$ when $X_n/Y_n \xrightarrow{\mathbb{P}} 0$.

Trees

In this book, we will often deal with trees, and then it is important to be clear about what exactly we mean by a tree. Trees are rooted and ordered. It will be convenient to think of a tree $t$ with root $\varnothing$ as being labelled in the Ulam-Harris way, so that a vertex $v$ in generation $k$ has a label $\varnothing v_1 \cdots v_k$, where $v_i \in \mathbb{N}$. Naturally, there are some restrictions, in that if $\varnothing v_1 \cdots v_k \in V(t)$, then also $\varnothing v_1 \cdots v_{k-1} \in V(t)$, and $\varnothing v_1 \cdots (v_k-1) \in V(t)$ when $v_k \ge 2$. We refer to [Volume 1, Chapter 3] for details.

It will sometimes also be useful to explore trees in a breadth-first exploration. This corresponds to the lexicographical ordering in the Ulam-Harris encoding of the tree. Ulam-Harris trees are also sometimes called plane trees (see e.g., (Drmota, 2009, Chapter 1)).

Let us now make the breadth-first ordering of the tree precise. For v ∈ V (t), let


$|v|$ be its height. Thus $|v| = k$ when $v = \varnothing v_1 \cdots v_k$, and $|\varnothing| = 0$. Let $u, v \in V(t)$. Then $u < v$ when either $|u| < |v|$, or when $|u| = |v|$ and $u = \varnothing u_1 \cdots u_k$ and $v = \varnothing v_1 \cdots v_k$ are such that $(u_1, \dots, u_k) < (v_1, \dots, v_k)$ in the lexicographic sense.

We next explain the breadth-first exploration of $t$. For a tree $t$ of size $|V(t)| = t$, we let $(a_k)_{k=0}^{t-1}$ be the elements of $V(t)$ ordered according to the breadth-first ordering of $t$. For $i \ge 1$, let $x_i$ denote the number of children of vertex $a_{i-1}$. Thus, with $d_v$ denoting the degree of $v \in V(t)$ in the tree $t$, we have $x_1 = d_{a_0} = d_\varnothing$ and $x_i = d_{a_{i-1}} - 1$ for $i \ge 2$. The recursion
\[
s_i = s_{i-1} + x_i - 1 \quad\text{for } i \ge 1, \qquad\text{with } s_0 = 1, \tag{1.5.1}
\]
describes the evolution of the number of unexplored vertices in the breadth-first exploration. For a finite tree $t$ of size $|V(t)| = t$, we thus have that $s_i > 0$ for $i \in \{0, \dots, t-1\}$, and $s_t = 0$. The sequence $(x_i)_{i=1}^{t}$ gives an alternative encoding of the tree $t$ that will often be convenient.
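The encoding via (1.5.1) is easy to check on a concrete tree. The sketch below is an illustration only (the helper `bfs_child_counts` and the parent-map representation are ours, not from the text); it computes the child counts $(x_i)$ in breadth-first order and verifies that $s_i$ stays positive until it hits 0 exactly at step $t$.

```python
from collections import deque

def bfs_child_counts(children):
    """Return (x_1, ..., x_t): the numbers of children of the vertices
    a_0, a_1, ... taken in breadth-first (Ulam-Harris) order.

    `children` maps each vertex to the ordered list of its children;
    the root is the vertex () (the empty Ulam-Harris label).
    """
    xs, queue = [], deque([()])
    while queue:
        v = queue.popleft()
        xs.append(len(children[v]))
        queue.extend(children[v])
    return xs

# A 5-vertex tree: root with children 1 and 2; vertex 1 with children 11, 12.
tree = {(): [(1,), (2,)], (1,): [(1, 1), (1, 2)],
        (2,): [], (1, 1): [], (1, 2): []}
xs = bfs_child_counts(tree)        # breadth-first child counts

# The recursion (1.5.1): s_0 = 1, s_i = s_{i-1} + x_i - 1.
s = [1]
for x in xs:
    s.append(s[-1] + x - 1)
assert all(si > 0 for si in s[:-1]) and s[-1] == 0
```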

1.6 Notes and discussion

Notes on Sections 1.1–1.3

These sections are summaries of chapters in Volume 1, to which we refer for notes and discussion. The exception is Section 1.3.4 on switching algorithms for uniform random graphs with prescribed degrees. Switching algorithms have a long history, dating back at least to McKay (1981); see also Gao and Wormald (2016); McKay and Wormald (1990), as well as McKay (2011) and the references therein for an overview. The material in Section 1.3.4 is an adaptation of some of the material in Gao et al. (2018), where Theorem 1.8 was used to compute the number of triangles in uniform random graphs with power-law degree distributions of infinite variance.

Notes on Section 1.4

1.7 Exercises for Chapter 1

Exercise 1.1 (Probability mass function typical degree) Prove (1.1.3).

Exercise 1.2 (Uniform random graph) Consider ERn(p) with p = 1/2. Showthat the result is a uniform graph, i.e., it has the same distribution as a uniformchoice from all the graphs on n vertices.

Exercise 1.3 (Thin tails Poisson) Show that, for every $\alpha > 0$, and with $p_k = \mathrm{e}^{-\lambda}\lambda^k/k!$ the Poisson probability mass function,
\[
\lim_{k\to\infty} \mathrm{e}^{\alpha k} p_k = 0. \tag{1.7.1}
\]

Exercise 1.4 (Weight of uniformly chosen vertex) Let o be a vertex chosenuniformly at random from [n]. Show that the weight wo of o has distributionfunction Fn.


Exercise 1.5 (Maximal weight bound) Assume that Conditions 1.1(a)-(b) hold. Show that $\max_{i\in[n]} w_i = o(n)$. Further, show that $\max_{i\in[n]} w_i = o(\sqrt{n})$ when Conditions 1.1(a)-(c) hold.

Exercise 1.6 (Domination weights) Let Wn have distribution function Fn from(1.3.17). Show that Wn is stochastically dominated by the random variable Whaving distribution function F .

Exercise 1.7 (Degree of uniformly chosen vertex in GRGn(w)) Prove that,under the conditions of Theorem 1.3, (1.3.22) holds.

Exercise 1.8 (Power-law degrees in generalized random graphs) Prove that,under the conditions of Theorem 1.3, (1.3.24) follows from (1.3.23). Does theconverse also hold?

Exercise 1.9 (Number of erased edges) Assume that Conditions 1.5(a)-(b)hold. Show that Theorem 1.6 implies that the number of erased edges is oP(n).

Exercise 1.10 (Degrees grow to infinity a.s.) Fix $m = 1$ and $i \ge 1$. Prove that $D_i(t) \xrightarrow{a.s.} \infty$, by using that $\sum_{s=i}^{t} I_s$ is stochastically dominated by $D_i(t)$, where $(I_t)_{t\ge i}$ is a sequence of independent Bernoulli random variables with $\mathbb{P}(I_t = 1) = (1+\delta)/(t(2+\delta)+1+\delta)$. What does this imply for $m > 1$?

Exercise 1.11 (Degrees of fixed vertices) Prove Theorem 1.11 for $m = 1$ and $\delta > -1$ using the martingale convergence theorem and the fact that
\[
M_i(t) = \frac{D_i(t)+\delta}{1+\delta} \prod_{s=i-1}^{t-1} \frac{(2+\delta)s + 1+\delta}{(2+\delta)(s+1)} \tag{1.7.2}
\]
is a martingale.

Exercise 1.12 (Degree distribution of affine Bernoulli PAM) Show that $p_k \sim c_{\gamma,\beta}\, k^{-(1+1/\gamma)}$ when $f(k) = \gamma k + \beta$. What is $c_{\gamma,\beta}$?

Exercise 1.13 (Degree distribution of sublinear Bernoulli PAM) Show that $\log(1/p_k) \sim k^{1-\alpha}/(\gamma(1-\alpha))$ when $f(k) \sim \gamma k^\alpha$ for some $\alpha \in (0,1)$.

Exercise 1.14 (Power-law degree sequence) Prove (1.3.61) by using Stirling’sformula.


Chapter 2

Local convergence of random graphs

Abstract

In this chapter, we discuss local weak convergence, as introduced by Benjamini and Schramm (2001) and independently by Aldous and Steele (2004). Local weak convergence describes the intuitive notion that a finite graph, seen from the perspective of a typical vertex, looks like a certain limiting graph. This is already useful to make precise the notion that a finite cube in Zd with large side length locally looks much like Zd itself. However, it plays an even more profound role in random graph theory. For example, local convergence to some limiting tree, which often occurs in random graphs as we will see throughout this book, is referred to as locally tree-like behavior. In this chapter, we discuss local weak convergence in general. We give general definitions of local convergence in several probabilistic senses, and show that local convergence is equivalent to the convergence of subgraph counts. Then we discuss several implications of local convergence, concerning local neighborhoods, clustering, assortativity and PageRank. We further investigate the relation between local convergence and the size of the giant, making precise the statement that the giant is almost local.

Organisation of this chapter

This chapter is organised as follows. In Section 2.1, we start by discussing the metric space of rooted graphs, which plays a crucial role in local weak convergence. In Section 2.2, we give the formal definition of local weak convergence. There, we think of our graphs as being deterministic, and we discuss their convergence to some (surprisingly, possibly random) limit. Here, the randomness originates from the fact that we consider the graph rooted at a random vertex, and this randomness may persist in the limit. In Section 2.3, we then extend the notion of local convergence to random graphs, for which there are several notions of convergence, akin to those for real-valued random variables, such as local weak convergence and local convergence in probability. In Section 2.4, we discuss consequences of local convergence for local functionals, such as clustering and assortativity. For the latter, we extend the convergence of neighborhoods of a uniformly chosen vertex to those of a uniformly chosen edge. In Section 2.5, we discuss the consequences of local convergence for the giant component. While the proportion of vertices in the giant is not a continuous functional, one could argue that it is 'almost local', in a way that can be made precise. We close this chapter with notes and discussion in Section 2.6, and with exercises in Section 2.7.


2.1 The metric space of rooted graphs

Local weak convergence is a notion of weak convergence for finite graphs. Ingeneral, weak convergence is equivalent to convergence of expectations of con-tinuous functions. For continuity, one needs a topology. Therefore, we start bydiscussing the topology of rooted graphs that is at the center of local weak con-vergence. We start with some definitions:

Definition 2.1 (Locally finite and rooted graphs) A rooted graph is a pair(G, o), where G = (V (G), E(G)) is a graph with vertex set V (G) and edge setE(G), and o ∈ V (G) is a vertex. Further, a rooted or non-rooted graph is calledlocally finite when each of its vertices has finite degree (though not necessarilyuniformly bounded).

In Definition 2.1, graphs can have finitely or infinitely many vertices, but wewill always have graphs that are locally finite in mind. Also, in the definitionsbelow, the graphs are deterministic and we will clearly indicate when we moveto random graphs instead.

Definition 2.2 (Neighborhoods as rooted graphs) For a rooted graph $(G, o)$, we let $B_r(o) = B_r^{(G)}(o)$ denote the (rooted) subgraph of $(G, o)$ of all vertices at graph distance at most $r$ away from $o$. Formally, this means that $B_r(o) = (V(B_r^{(G)}(o)), E(B_r^{(G)}(o)), o)$, where
\[
V(B_r^{(G)}(o)) = \{u \colon d_G(o,u) \le r\}, \tag{2.1.1}
\]
\[
E(B_r^{(G)}(o)) = \{\{u,v\} \in E(G) \colon d_G(o,u), d_G(o,v) \le r\}.
\]

We continue by introducing the notion of isomorphism between graphs, whichbasically describes that graphs ‘look the same’. Here is the formal definition:

Definition 2.3 (Graph isomorphism) Two (finite or infinite) graphs $G_1 = (V(G_1), E(G_1))$ and $G_2 = (V(G_2), E(G_2))$ are called isomorphic, which we write as $G_1 \simeq G_2$, when there exists a bijection $\phi\colon V(G_1) \to V(G_2)$ such that $\{u,v\} \in E(G_1)$ precisely when $\{\phi(u),\phi(v)\} \in E(G_2)$.

Similarly, two rooted (finite or infinite) graphs $(G_1,o_1)$ and $(G_2,o_2)$, where $G_i = (V(G_i), E(G_i))$ for $i \in \{1,2\}$, are called isomorphic, which we write as $(G_1,o_1) \simeq (G_2,o_2)$, when there exists a bijection $\phi\colon V(G_1) \to V(G_2)$ such that $\phi(o_1) = o_2$ and $\{u,v\} \in E(G_1)$ precisely when $\{\phi(u),\phi(v)\} \in E(G_2)$.

Exercises 2.1 and 2.2 below investigate the notion of graph isomorphism. We let $\mathcal{G}_\star$ denote the space of rooted graphs modulo isomorphisms, as introduced by Aldous and Steele (2004). This means that $\mathcal{G}_\star$ contains the set of equivalence classes of rooted connected graphs. We often omit the equivalence classes and write $(G,o) \in \mathcal{G}_\star$, bearing in mind that all $(G',o')$ such that $(G',o') \simeq (G,o)$ are considered to be the same. Thus, formally, we deal with equivalence classes of rooted graphs. In the literature, the equivalence class containing $(G, o)$ is sometimes denoted as $[G, o]$.


These notions allow us to turn the space of connected rooted graphs into a metric space:

Definition 2.4 (Metric on rooted graphs) Let $(G_1,o_1)$ and $(G_2,o_2)$ be two rooted connected graphs, and write $B_r^{(G_i)}(o_i)$ for the neighborhood of vertex $o_i$ in $G_i$. Let
\[
R_\star = \sup\{r \colon B_r^{(G_1)}(o_1) \simeq B_r^{(G_2)}(o_2)\}. \tag{2.1.2}
\]
Define
\[
d_{\mathcal{G}_\star}\bigl((G_1,o_1),(G_2,o_2)\bigr) = \frac{1}{R_\star+1}. \tag{2.1.3}
\]

The space $\mathcal{G}_\star$ of rooted graphs is a nice metric space under the metric $d_{\mathcal{G}_\star}$ in (2.1.3), in that it is separable and thus Polish. Here we recall that a metric space is called separable when there exists a countable dense subset of elements. We will see later on that such a countable dense set can be created by looking at finite rooted graphs. Since graphs that agree up to distance $r$ are at distance at most $1/(r+1)$ from each other (see Exercise 2.3), this is indeed a dense countable subset. We discuss the metric structure of the space of rooted graphs in more detail in Appendix A.2. See Exercises 2.4 and 2.5 below for two exercises that study such aspects.

The value $R_\star$ is the largest value of $r$ for which $B_r^{(G_1)}(o_1)$ is isomorphic to $B_r^{(G_2)}(o_2)$. When $R_\star = \infty$, $B_r^{(G_1)}(o_1)$ is isomorphic to $B_r^{(G_2)}(o_2)$ for every $r \ge 1$, and then the rooted graphs $(G_1,o_1)$ and $(G_2,o_2)$ are the same apart from an isomorphism; see Lemma A.9 in the appendix, where this is worked out in detail.

2.2 Local weak convergence of graphs

In this section, we discuss local weak convergence of deterministic graphs. This section is organised as follows. In Section 2.2.1, we give the definitions of local weak convergence of (possibly disconnected) finite graphs. In Section 2.2.2, we provide a convenient criterion to prove local weak convergence, and discuss tightness. In Section 2.2.3, we show that when the limit has full support on some subset of rooted graphs, then convergence can be restricted to that set. Finally, in Section 2.2.4, we discuss two examples of graphs that converge locally weakly.

2.2.1 Definition of local weak convergence

Above, we have worked with connected graphs. When dealing with local weak convergence, we often wish to apply it to disconnected graphs. For such examples, we think of the corresponding rooted connected graph $(G_n, o_n)$ as corresponding to the connected component $\mathscr{C}(o_n)$ of $o_n$ in $G_n$. Here, similarly to (2.1.1), we define the rooted graph $\mathscr{C}(o_n) = (V(\mathscr{C}(o_n)), E(\mathscr{C}(o_n)), o_n)$, where
\[
V(\mathscr{C}(o_n)) = \{u \colon d_{G_n}(o_n,u) < \infty\}, \tag{2.2.1}
\]
\[
E(\mathscr{C}(o_n)) = \{\{u,v\} \in E(G_n) \colon d_{G_n}(o_n,u), d_{G_n}(o_n,v) < \infty\}.
\]
Then, for $h\colon \mathcal{G}_\star \to \mathbb{R}$, by convention, we extend the definition to all (not necessarily connected) graphs by letting
\[
h(G_n, o_n) \equiv h(\mathscr{C}(o_n)). \tag{2.2.2}
\]

Using this, we next define local weak convergence of finite graphs:

Definition 2.5 (Local weak convergence) Let $G_n = (V(G_n), E(G_n))$ denote a finite (possibly disconnected) graph. Let $(G_n, o_n)$ be the rooted graph obtained by letting $o_n \in V(G_n)$ be chosen uniformly at random and restricting $G_n$ to the connected component of $o_n$ in $G_n$. We say that $(G_n, o_n)$ converges in the local weak sense to the connected rooted graph $(G, o)$, which is a (possibly random) element of $\mathcal{G}_\star$ having law $\mu$, when, for every bounded and continuous function $h\colon \mathcal{G}_\star \to \mathbb{R}$,
\[
\mathbb{E}\bigl[h(G_n,o_n)\bigr] \to \mathbb{E}_\mu\bigl[h(G,o)\bigr], \tag{2.2.3}
\]
where the expectation on the right-hand side of (2.2.3) is w.r.t. $(G,o)$ having law $\mu$, while the expectation on the left-hand side is w.r.t. the random vertex $o_n$.¹ We denote the above convergence by $(G_n,o_n) \xrightarrow{d} (G,o)$.

Of course, the value h(Gn, on) only gives information about C(on), which may be only a small portion of the graph when Gn is disconnected. However, since we sample on ∈ V(Gn) uniformly at random, we may 'see' every connected component, so in distribution we do observe the graph as a whole.

Since we will later apply local weak convergence to random graphs, we strive to be absolutely clear about which expectation we take. Indeed, the expectation in (2.2.3) equals

E[h(Gn, on)] = (1/|V(Gn)|) Σ_{u∈V(Gn)} h(Gn, u).   (2.2.4)
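In computational terms, (2.2.4) is simply the average of a local functional over the vertex set. A minimal sketch (the adjacency-list graph and the bounded functional h below are illustrative choices, not taken from the text):

```python
def expected_h(adj, h):
    """Average of a local functional h over a uniform root, as in (2.2.4)."""
    return sum(h(adj, u) for u in range(len(adj))) / len(adj)

def h_capped_degree(adj, u, cap=5):
    """A bounded local functional: the root degree, capped at `cap`."""
    return min(len(adj[u]), cap)

# A 4-cycle together with an isolated vertex: degrees (2, 2, 2, 2, 0).
adj = [[1, 3], [0, 2], [1, 3], [0, 2], []]
print(expected_h(adj, h_capped_degree))  # (2+2+2+2+0)/5 = 1.6
```

Note that the isolated vertex contributes through its own (trivial) connected component, exactly as prescribed by the convention (2.2.2).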

The notion of local weak convergence is hard to grasp, and it may also appear rather weak. In the sequel, we discuss examples of graphs that converge locally weakly. Further, in Section 2.4 we discuss how local weak convergence can be used to obtain interesting consequences for graphs, such as their clustering and degree-degree dependences, measured through the assortativity coefficient. The notion of local weak convergence also plays a central role in this book. We continue by discussing a convenient criterion for proving local weak convergence.

1 In the next section, our graphs may be random, and then we need to be very clear as to which expectation we take.


2.2.2 Criterion for local weak convergence and tightness

We next provide a convenient criterion for local weak convergence:

Theorem 2.6 (Criterion for local weak convergence) The sequence of finite rooted graphs ((Gn, on))n≥1 converges in the local weak sense to (G, o) having law µ precisely when the sequence ((Gn, on))n≥1 is tight and, for every rooted graph H⋆ ∈ G⋆ and all integers r ≥ 0,

p^(Gn)(H⋆) = (1/|V(Gn)|) Σ_{u∈V(Gn)} 1{B_r^(Gn)(u) ≃ H⋆} → µ(B_r^(G)(o) ≃ H⋆),   (2.2.5)

where B_r^(Gn)(u) is the rooted r-neighborhood of u in Gn, and B_r^(G)(o) is the rooted r-neighborhood of o in the limiting graph (G, o).

Note that local weak convergence implies that (2.2.5) holds, since we can take h(G, o) = 1{B_r^(G)(o) ≃ H⋆}, and this h : G⋆ → {0, 1} is continuous (see Exercise 2.6).

Proof This is a standard weak convergence argument. By tightness, every subsequence of ((Gn, on))n≥1 has a further subsequence that converges in distribution. We work along that subsequence, and note that the limiting law is that of (G, o), since the laws of B_r^(G)(o) for all r ≥ 0 uniquely identify the law of (G, o) (see Proposition A.13 in Appendix A.2.5). Since this is true for every subsequence, the local weak limit is (G, o).

Theorem 2.6 shows that the proportion of vertices in Gn whose neighborhood looks like H⋆ converges to a (possibly random) limit. In the literature, it is claimed that the tightness condition on ((Gn, on))n≥1 is not necessary, as it would follow from the convergence in (2.2.5). We hope to return to this point in the near future.

See Exercise 2.8, where you are asked to construct an example in which the local weak limit of a sequence of deterministic graphs is actually random. You are asked to prove local weak convergence for some examples in Exercises 2.9 and 2.10.
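The subgraph proportions p^(Gn)(H⋆) in (2.2.5) can be tallied mechanically once each r-neighborhood is given a canonical label. A minimal sketch, assuming the r-balls are trees (for tree neighborhoods the AHU-style encoding below is a complete isomorphism invariant; general balls would require full rooted-graph isomorphism testing):

```python
from collections import Counter

def ball_code(adj, root, r):
    """Canonical code of the rooted r-neighborhood, assuming it is a tree.
    (Exact only for tree balls; a general ball needs isomorphism testing.)"""
    def code(v, parent, depth):
        if depth == r:
            return "()"
        children = sorted(code(w, v, depth + 1) for w in adj[v] if w != parent)
        return "(" + "".join(children) + ")"
    return code(root, None, 0)

def neighborhood_proportions(adj, r):
    """p^(Gn)(H_*) from (2.2.5): proportion of vertices per r-ball type."""
    codes = Counter(ball_code(adj, v, r) for v in range(len(adj)))
    n = len(adj)
    return {c: k / n for c, k in codes.items()}

# Path on 5 vertices: at r = 1, two leaf-type and three interior-type roots.
path = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(neighborhood_proportions(path, 1))
```

Convergence in the sense of Theorem 2.6 then amounts to convergence of each entry of this dictionary along the graph sequence.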

Tightness is discussed in more detail in Appendix A.2.6. We next derive a convenient tightness criterion for local weak convergence. Recall that a sequence (Xn)n≥1 of random variables is called uniformly integrable when

lim_{K→∞} limsup_{n→∞} E[|Xn| 1{|Xn| > K}] = 0.   (2.2.6)

We continue by giving a tightness criterion for local weak convergence:

Theorem 2.7 (Tightness criterion for local weak convergence) Let (Gn)n≥1 be a sequence of graphs with vertex sets V(Gn). Let d_on^(n) denote the degree of on in Gn, where on is chosen uniformly at random from the vertex set V(Gn) of Gn. Then ((Gn, on))n≥1 is tight when (d_on^(n))n≥1 forms a uniformly integrable sequence of random variables.

Proof Let A be a family of finite graphs. For a graph G, let U_G denote a random vertex drawn uniformly at random from V(G), and let U(G) = (G, U_G) be the rooted graph obtained by rooting G at U_G. We need to show that if {deg_G(U_G) : G ∈ A} is a uniformly integrable family of random variables, then the family A is tight. Let

f(d) = sup_{G∈A} E[deg_G(U_G) 1{deg_G(U_G) > d}].   (2.2.7)

By assumption, lim_{d→∞} f(d) = 0. Write m(G) = E[deg_G(U_G)]. Thus, 1 ≤ m(G) ≤ f(0) < ∞. Write D(G) for the degree-biased probability measure on {(G, v) : v ∈ V(G)}, that is,

D(G)[(G, v)] = (deg_G(v)/m(G)) · U(G)[(G, v)],   (2.2.8)

and D_G for the corresponding root. Since U(G) ≤ m(G)D(G) ≤ f(0)D(G), it suffices to show that {D(G) : G ∈ A} is tight. Note that {deg_G(D_G) : G ∈ A} is tight.

For r ∈ N, let F_r^M(v) be the event that there is some vertex at distance at most r from v whose degree is larger than M. Let X be a uniform random neighbor of D_G. Because D(G) is a stationary measure for simple random walk, F_r^M(D_G) and F_r^M(X) have the same probability. Also,

P(F_{r+1}^M(D_G) | deg_G(D_G)) ≤ deg_G(D_G) P(F_r^M(X) | deg_G(D_G)).   (2.2.9)

We claim that, for all r ∈ N and ε > 0, there exists M < ∞ such that

P(F_r^M(X)) ≤ ε   (2.2.10)

for all G ∈ A. This clearly implies that {D(G) : G ∈ A} is tight. We prove the claim by induction on r.

The statement for r = 0 is trivial. Given that the property holds for r, we now show it for r + 1. Given ε > 0, choose d so large that P(deg_G(D_G) > d) ≤ ε/2 for all G ∈ A. Also, choose M so large that P(F_r^M(D_G)) ≤ ε/(2d) for all G ∈ A. Write F for the event that deg_G(D_G) > d. Then, by conditioning on deg_G(D_G), we see that

P(F_{r+1}^M(D_G)) ≤ P(F) + E[1_{F^c} P(F_{r+1}^M(D_G) | deg_G(D_G))]   (2.2.11)
  ≤ ε/2 + E[1_{F^c} deg_G(D_G) P(F_r^M(X) | deg_G(D_G))]
  ≤ ε/2 + E[1_{F^c} d P(F_r^M(X) | deg_G(D_G))]
  ≤ ε/2 + d P(F_r^M(X)) = ε/2 + d P(F_r^M(D_G))
  ≤ ε/2 + d ε/(2d) = ε,

for all G ∈ A, which proves the claim.
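The degree-biased measure D(G) of (2.2.8) has a concrete sampling interpretation: choosing a root with probability proportional to its degree is the same as looking at the endpoint of a uniformly chosen half-edge. A hedged simulation sketch (the star graph is an illustrative example, not from the text):

```python
import random
from collections import Counter

def degree_biased_root(adj, rng):
    """Sample a root from the degree-biased measure D(G) of (2.2.8):
    a uniform half-edge (v, i) lands on v with probability deg(v)/sum(deg)."""
    half_edges = [v for v in range(len(adj)) for _ in adj[v]]
    return rng.choice(half_edges)

rng = random.Random(0)
# Star with centre 0 and three leaves: degrees (3, 1, 1, 1), total 6.
star = [[1, 2, 3], [0], [0], [0]]
counts = Counter(degree_biased_root(star, rng) for _ in range(20000))
print(counts[0] / 20000)  # close to 3/6 = 0.5
```

This is the same size-biasing that makes the degree of a uniform neighbor follow the size-biased degree distribution in models such as the configuration model.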

The uniform integrability needed in Theorem 2.7 is quite suggestive. Indeed, in many random graph models, such as for example the configuration model, the degree of a random neighbor of a vertex has the size-biased degree distribution. Previously, we have often written Dn = d_on^(n) for the degree of a uniform vertex in our graph. When (d_on^(n))n≥1 forms a uniformly integrable sequence of random variables, there exists a subsequence along which D⋆n, the size-biased version of Dn = d_on^(n), converges in distribution (see Exercise 2.11 below).

For the configuration model, Conditions 1.5(a)-(b) imply that (d_on^(n))n≥1 is a tight sequence of random variables (see Exercise 2.12). Further, [Volume 1, Theorem 7.25] discussed how Conditions 1.5(a)-(b) imply the convergence of the degrees of neighbors of the uniform vertex on, a distribution that is given by D⋆n. Of course, for local weak convergence to hold, one needs at least that the degrees of neighbors converge in distribution. Thus, at least for the configuration model, we can fully understand why the uniform integrability of (d_on^(n))n≥1 is needed. In general, however, local weak convergence does not imply that (d_on^(n))n≥1 is uniformly integrable (see Exercises 2.13-2.14).

We continue by discussing settings where the limiting measure has full support on a certain subset of the space of rooted graphs. Then it turns out that it suffices to study convergence on that subset.

2.2.3 Local weak convergence and completeness of the limit

For many settings, the local weak limit is almost surely contained in a smaller set of rooted graphs. In this case, it turns out to be enough to prove the convergence in (2.2.5) only for those elements that the limit (G, o) takes values in. Let us explain this in more detail.

Let T⋆ ⊂ G⋆ be a subset of the space of rooted graphs. Let T⋆(r) ⊆ T⋆ be the subset of T⋆ of graphs for which the distance between any vertex and the root is at most r. Then, we have the following result:

Theorem 2.8 (Local weak convergence and subsets) Let (Gn)n≥1 be a sequence of finite graphs. Let (G, o) be a random variable on G⋆ having law µ. Let T⋆ ⊂ G⋆ be a subset of the space of rooted graphs. Assume that µ((G, o) ∈ T⋆) = 1. Then, (Gn, on) d−→ (G, o) when (2.2.5) holds for all H⋆ ∈ T⋆(r) and all r ≥ 1.

We will apply Theorem 2.8 in particular when the limit is a.s. a tree. Then, Theorem 2.8 implies that we only have to investigate those H⋆ that are trees themselves.

Proof The set T⋆(r) is countable. Therefore, since µ((G, o) ∈ T⋆) = 1, for every ε > 0, there exist m = m(ε) and a subset T⋆(r, m) ⊆ T⋆(r) of size at most m such that µ(B_r^(G)(o) ∈ T⋆(r, m)) ≥ 1 − ε. Fix this set. Then we bound

P(B_r^(Gn)(on) ∉ T⋆(r)) = 1 − P(B_r^(Gn)(on) ∈ T⋆(r))   (2.2.12)
  ≤ 1 − P(B_r^(Gn)(on) ∈ T⋆(r, m)).

Therefore,

limsup_{n→∞} P(B_r^(Gn)(on) ∉ T⋆(r)) ≤ 1 − liminf_{n→∞} P(B_r^(Gn)(on) ∈ T⋆(r, m))   (2.2.13)
  = 1 − µ(B_r^(G)(o) ∈ T⋆(r, m)) ≤ 1 − (1 − ε) = ε.


Since ε > 0 is arbitrary, we conclude that P(B_r^(Gn)(on) ∉ T⋆(r)) → 0. In particular, this means that, for any H⋆ ∉ T⋆(r),

P(B_r^(Gn)(on) ≃ H⋆) → 0 = µ(B_r^(G)(o) ≃ H⋆).   (2.2.14)

Thus, when the required convergence holds for every H⋆ ∈ T⋆(r), it follows for every H⋆ ∉ T⋆(r) with limit zero when µ((G, o) ∈ T⋆) = 1.

2.2.4 Examples of local weak convergence

We close this section by discussing two relevant examples of local weak convergence. We start with uniform points in large boxes in Zd, and then discuss the local weak limit of finite trees.

Local weak convergence of boxes in Zd

Consider the nearest-neighbor box [n]^d, where x = (x1, . . . , xd) ∈ Zd is a neighbor of y ∈ Zd precisely when there is a unique i ∈ [d] such that |xi − yi| = 1, while xj = yj for all j ≠ i. Take a root uniformly at random, and denote the resulting rooted graph by (Gn, on). We claim that (Gn, on) d−→ (Zd, o), which we now prove. We rely on Theorem 2.6, which shows that we need to prove tightness and convergence of subgraph proportions. We use Theorem 2.7 to see that ((Gn, on))n≥1 is tight, since Gn has uniformly bounded degrees.

Let µ be the point measure on (Zd, o), so that µ(B_r^(G)(o) ≃ B_r^(Zd)(o)) = 1. Thus, by Theorem 2.8, it remains to show that p^(Gn)(B_r^(Zd)(o)) → 1 (recall (2.2.5)).

For this, we note that B_r^(Gn)(on) ≃ B_r^(Zd)(o) unless on happens to lie within distance strictly smaller than r from the boundary of [n]^d. This means that one of the coordinates of on lies in [r − 1] or in [n] \ [n − r + 1]. Since these events occur with vanishing probability, the claim follows.

In the above case, we see that the local weak limit is deterministic, as one would expect. One can generalize the above to convergence of tori as well.
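The boundary argument above can be checked numerically: vertices whose coordinates all lie in {r+1, . . . , n−r} have r-balls isomorphic to the ball in Z^d, and their proportion ((n − 2r)/n)^d tends to 1. A hedged sketch, with a brute-force BFS check in dimension d = 2 (the helper functions and parameter choices are illustrative):

```python
from collections import deque

def interior_fraction(n, d, r):
    """Fraction of vertices of [n]^d with all coordinates in {r+1, ..., n-r};
    these have r-balls isomorphic to the r-ball of Z^d."""
    return (max(n - 2 * r, 0) / n) ** d

def ball_size_grid(n, x, y, r):
    """Brute-force BFS ball size around (x, y) in the n x n grid."""
    seen, frontier = {(x, y)}, deque([((x, y), 0)])
    while frontier:
        (u, v), dist = frontier.popleft()
        if dist == r:
            continue
        for a, b in ((u + 1, v), (u - 1, v), (u, v + 1), (u, v - 1)):
            if 1 <= a <= n and 1 <= b <= n and (a, b) not in seen:
                seen.add((a, b))
                frontier.append(((a, b), dist + 1))
    return len(seen)

# In Z^2 the r-ball is a diamond with 2r^2 + 2r + 1 vertices; the grid
# vertices attaining this full size are exactly the (n - 2r)^2 interior ones.
n, r = 7, 2
full = 2 * r * r + 2 * r + 1  # 13
count = sum(ball_size_grid(n, x, y, r) == full
            for x in range(1, n + 1) for y in range(1, n + 1))
print(count, (n - 2 * r) ** 2)  # both 9
print(interior_fraction(1000, 3, 2))  # close to 1
```

For fixed d and r, the interior fraction tends to 1 as n → ∞, which is the quantitative content of the vanishing-boundary argument.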

Local weak convergence of truncated trees

Recall the notion of a tree from Section 1.5. Fix a degree d ≥ 3, and a height n that we will take to infinity. We now define the regular tree Td,n truncated at height n. The graph Td,n has vertex set

V(Td,n) = {∅} ∪ ⋃_{k=1}^{n} {∅} × [d] × [d − 1]^{k−1},   (2.2.15)

and edge set as follows. Let v = ∅v1···vk and u = ∅u1···uℓ be two vertices. We say that u is the parent of v when ℓ = k − 1 and ui = vi for all i ∈ [k − 1]. Then we say that two vertices u and v are neighbors when u is the parent of v or vice versa. This is a graph with

|V(Td,n)| = 1 + d + d(d − 1) + ··· + d(d − 1)^{n−1}   (2.2.16)


many vertices.

Let on denote a vertex chosen uniformly at random from V(Td,n). We consider (Td,n, on) and its local weak limit, which we now describe. We first consider the so-called canopy tree. For this, we take the graph Td,n, root it at a leaf, and take the limit n → ∞. Remove the label of the root leaf, and denote the resulting graph by Td, which we consider to be an unrooted graph. This graph has a unique infinite path from the root leaf. Let oℓ be the ℓth vertex on this infinite path (the root leaf having label 0), and consider (Td, oℓ). Define the limiting measure µ by

µ((Td, oℓ)) ≡ µℓ = (d − 2)(d − 1)^{−(ℓ+1)}, ℓ ≥ 0.   (2.2.17)

We claim that (Gn, on) ≡ (Td,n, on) d−→ (G, o) with law µ. We again rely on Theorem 2.6, which shows that we need to prove tightness and convergence of subgraph proportions. We use Theorem 2.7 to see that ((Td,n, on))n≥1 is tight, since Td,n has uniformly bounded degrees.

Further, by Theorem 2.8, it remains to show that p^(Gn)(B_r^(Td)(oℓ)) → µℓ (recall (2.2.5)). When n is larger than ℓ + r (which we now assume, as we take n → ∞), we have that B_r^(Gn)(on) ≃ B_r^(Td)(oℓ) precisely when on has distance ℓ from the closest leaf. There are

d(d − 1)^{n−ℓ−1}   (2.2.18)

vertices at distance ℓ from the closest leaf, out of a total of |V(Td,n)| = d(d − 1)^n/(d − 2)(1 + o(1)). Thus,

p^(Gn)(B_r^(Td)(oℓ)) = d(d − 1)^{n−ℓ−1}/|V(Td,n)| → (d − 2)(d − 1)^{−(ℓ+1)} = µℓ,   (2.2.19)

as required.

We see that, for the truncated tree, the local weak limit is random, where the randomness clearly originates from the choice of the random root of the graph. In this case, it is more particularly due to how far the chosen root is from the leaves of the finite tree.
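The counting behind (2.2.19) is easy to verify exactly: the distance-to-leaf level ℓ in Td,n holds d(d−1)^{n−ℓ−1} vertices, and the resulting proportions approach the canopy measure µℓ. A minimal numerical sketch (the function names are illustrative):

```python
def canopy_proportions(d, n):
    """Exact distribution of the distance to the closest leaf in T_{d,n}.
    Level l (l = 0 being the leaves) holds d(d-1)^{n-l-1} vertices;
    the root sits alone at distance n."""
    counts = [d * (d - 1) ** (n - l - 1) for l in range(n)] + [1]
    total = sum(counts)
    return [c / total for c in counts]

def mu(d, l):
    """Limiting canopy-tree measure mu_l = (d-2)(d-1)^{-(l+1)}."""
    return (d - 2) * (d - 1) ** (-(l + 1))

props = canopy_proportions(3, 20)
print([round(p, 4) for p in props[:3]])        # ≈ [0.5, 0.25, 0.125]
print([round(mu(3, l), 4) for l in range(3)])  # [0.5, 0.25, 0.125]
```

Note that Σ_{ℓ≥0} (d−2)(d−1)^{−(ℓ+1)} = 1, so µ is indeed a probability measure on the rooted canopy trees.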

Having discussed the notion of local weak convergence for deterministic graphs, we now move on to random graphs. Here the situation becomes even more delicate, as we now have double randomness: the random root as well as the random graph. This gives rise to surprisingly subtle matters.

2.3 Local convergence of random graphs

We next discuss settings of random graphs. This section is organised as follows. In Section 2.3.1, we define what it means for a sequence of random graphs to converge in the local weak sense, as well as which different versions thereof exist. In Section 2.3.2, we give a useful criterion to verify local weak convergence of random graphs, showing that it suffices that the proportions of vertices whose neighborhoods are isomorphic to a given rooted graph converge, for all rooted graphs. In Section 2.3.3, we prove the completeness of the limit, by showing that when the limit is supported on a subset of rooted graphs, then one only needs to verify the convergence for that subset. In many examples that we encounter in this book, this subset is the collection of trees. In Section 2.3.4, we treat a first example: the random regular graph. We close in Section 2.3.5 with the example of local convergence in probability of the Erdos-Renyi random graph ERn(λ/n).

2.3.1 Definition of local convergence of random graphs

Even for random variables, there are different notions of convergence that are relevant, such as convergence in distribution and in probability. Also for local weak convergence, there are several related notions of convergence that we may consider:

Definition 2.9 (Local weak convergence of random graphs) Let Gn = (V(Gn), E(Gn)) denote a finite (possibly disconnected) random graph. Then,

(a) we say that Gn converges locally weakly to (Ḡ, ō) having law µ̄ when

E[h(Gn, on)] → Eµ̄[h(Ḡ, ō)],   (2.3.1)

for every bounded and continuous function h : G⋆ → R, where the expectation E is w.r.t. both the random vertex on and the random graph Gn. This is equivalent to (Gn, on) d−→ (Ḡ, ō);

(b) we say that Gn converges locally in probability to (G, o) having (possibly random) law µ when

E[h(Gn, on) | Gn] P−→ Eµ[h(G, o)],   (2.3.2)

for every bounded and continuous function h : G⋆ → R. We write this as Gn P−loc−−−→ (G, o);

(c) we say that Gn converges locally almost surely to (G, o) having law µ when

E[h(Gn, on) | Gn] a.s.−→ Eµ[h(G, o)],   (2.3.3)

for every bounded and continuous function h : G⋆ → R. We write this as Gn a.s.−loc−−−−→ (G, o).

As usual in convergence of random variables, the difference between these closely related definitions lies in what can be concluded from them, and how they can be proved. We discuss the relation between the limit laws, denoted by µ̄ for convergence in distribution and by µ for convergence in probability and almost surely, in Corollary 2.11 below.

When we have local convergence in probability, then E[h(Gn, on) | Gn], which is a random variable due to its dependence on the random graph Gn, converges in probability to Eµ[h(G, o)], which is possibly also a random variable, in that µ might be a random probability distribution on G⋆. When we have local weak convergence, instead, only expectations of the form E[h(Gn, on)], which average over the random graph as well, converge.

Remark 2.10 (Local convergence in probability and rooted vs. unrooted graphs) Usually, when we have a sequence of objects xn living in some space X, and xn converges to x, then x also lives in the same space. In the above local convergence in probability and almost surely, however, we take a graph sequence (Gn)n≥1 that converges locally in probability to a rooted graph (G, o) ∼ µ. One might have guessed that this is related to (Gn, on) P−→ (G, o), but it is quite different. Indeed, (Gn, on) P−→ (G, o) is a very strong and not so useful statement, since sampling on gives rise to variability in (Gn, on) that is hard to capture by the limit (G, o).

The following observations turn local convergence in probability into convergence of objects living in the same space, namely, the space of measures on rooted graphs. Define the empirical neighborhood measure µn on G⋆ by

µn(H⋆) = (1/|V(Gn)|) Σ_{v∈V(Gn)} 1{(Gn, v) ∈ H⋆},   (2.3.4)

for any measurable subset H⋆ of G⋆. Then, (Gn)n≥1 converges locally in probability to the random rooted graph (G, o) ∼ µ when

µn(H⋆) P−→ µ(H⋆),   (2.3.5)

for every measurable subset H⋆ of G⋆. Thus, local convergence in probability is equivalent to the convergence in probability of the empirical neighborhood measure. This explains why we view local convergence in probability as a property of the graph Gn, rather than of the rooted graph (Gn, on). A similar comment applies to local convergence almost surely.

The limit µ̄ for convergence in distribution and the limit µ for convergence in probability and almost surely are closely related to each other:

Corollary 2.11 (Relation between local weak limits) Suppose that (Gn, on) d−→ (Ḡ, ō) having law µ̄, and that (Gn)n≥1 converges locally in probability to (G, o) having law µ. Then µ̄(·) = E[µ(·)]. In particular,

Eµ̄[h(Ḡ, ō)] = E[Eµ[h(G, o)]].   (2.3.6)

Proof By assumption,

E[h(Gn, on) | Gn] P−→ Eµ[h(G, o)].   (2.3.7)

Further, note that Eµ[h(G, o)] is a bounded random variable, and so is E[h(Gn, on) | Gn]. Therefore, by Dominated Convergence [Volume 1, Theorem A.1], also the expectations converge:

E[h(Gn, on)] = E[E[h(Gn, on) | Gn]] → E[Eµ[h(G, o)]],   (2.3.8)

which, together with the local weak convergence in (2.3.1), completes the proof.

In most of our examples, the law µ of the local limit in probability is actually deterministic, in which case it coincides with the local weak limit law. However, there are some cases where this is not true. A simple example arises as follows. For ERn(λ/n), the local limit in probability will turn out to be a Poi(λ) branching process. Therefore, when considering ERn(X/n), where X is uniform on [0, 2], the local limit in probability will be a Poi(X) branching process. Here, the offspring means are random and related, as they are all equal to X. This is not the same as a mixed-Poisson branching process with offspring distribution Poi(X), since for the local limit in probability of ERn(X/n), we draw X only once. See Section 2.3.5 for more details on local convergence of ERn(λ/n).
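The "X is drawn only once" point can be illustrated by simulation: conditionally on the graph, the average degree of ERn(X/n) concentrates near the realization of X, which varies between graphs. A hedged sketch (the edge-by-edge generator and parameter choices are illustrative, not from the text):

```python
import random

def er_average_degree(n, lam, rng):
    """Average degree of ER_n(lam/n), generated by independent edge flips."""
    edges = sum(1 for i in range(n) for j in range(i + 1, n)
                if rng.random() < lam / n)
    return 2 * edges / n

rng = random.Random(1)
for _ in range(3):
    x = rng.uniform(0, 2)                   # mixing variable: drawn once per graph
    avg = er_average_degree(1000, x, rng)
    print(round(x, 2), round(avg, 2))       # avg tracks this realization's x
```

Across realizations the conditional limit law changes with X, whereas a mixed-Poisson branching process would re-draw the mean for every individual.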

We have added the notion of local convergence in the almost sure sense, even though for random graphs this notion is often not very useful. Indeed, almost sure convergence for random graphs is often tricky: for static models such as the Erdos-Renyi random graph and the configuration model, there is no obvious relation between the graphs of size n and those of size n + 1. This is of course different for the preferential attachment model, which is a (consistent) graph process. However, even for preferential attachment models, local convergence is all about the neighborhood of a uniform vertex, and it is not obvious how to relate the uniform choices for graphs of different sizes.

2.3.2 Criterion for local convergence of random graphs and tightness

We next discuss a convenient criterion for local convergence, inspired by Theorem 2.6:

Theorem 2.12 (Criterion for local convergence of random graphs) Let (Gn)n≥1 be a sequence of finite random graphs. Then,

(a) (Gn, on) d−→ (Ḡ, ō) having law µ̄ when ((Gn, on))n≥1 is tight and when, for every rooted graph H⋆ ∈ G⋆ and all integers r ≥ 0,

E[p^(Gn)(H⋆)] = (1/|V(Gn)|) Σ_{v∈V(Gn)} P(B_r^(Gn)(v) ≃ H⋆) → µ̄(B_r^(Ḡ)(ō) ≃ H⋆).   (2.3.9)

(b) Gn converges locally in probability to (G, o) having law µ when ((Gn, on))n≥1 is tight and when, for every rooted graph H⋆ ∈ G⋆ and all integers r ≥ 0,

p^(Gn)(H⋆) = (1/|V(Gn)|) Σ_{v∈V(Gn)} 1{B_r^(Gn)(v) ≃ H⋆} P−→ µ(B_r^(G)(o) ≃ H⋆).   (2.3.10)

(c) Gn converges locally almost surely to (G, o) having law µ when ((Gn, on))n≥1 is almost surely tight and when, for every rooted graph H⋆ ∈ G⋆ and all integers r ≥ 0,

p^(Gn)(H⋆) = (1/|V(Gn)|) Σ_{v∈V(Gn)} 1{B_r^(Gn)(v) ≃ H⋆} a.s.−→ µ(B_r^(G)(o) ≃ H⋆).   (2.3.11)

Proof This follows directly from Theorem 2.6.

We next investigate tightness:

Theorem 2.13 (Tightness criterion for local convergence of random graphs) Let (Gn)n≥1 be a sequence of random graphs with vertex sets V(Gn). Let d_on^(n) denote the degree of on in Gn, where on is chosen uniformly at random from the vertex set V(Gn) of Gn. Then

(a) ((Gn, on))n≥1 is tight when (d_on^(n))n≥1 forms a uniformly integrable sequence of random variables;

(b) ((Gn, on))n≥1 is tight in probability when (d_on^(n))n≥1 forms a uniformly integrable sequence of random variables;

(c) ((Gn, on))n≥1 is tight almost surely when (d_on^(n))n≥1 almost surely forms a uniformly integrable sequence of random variables.

Proof Part (a) is simply Theorem 2.7. For part (b), we note that Theorem 2.7 implies that tightness in probability follows when, for every ε > 0,

lim_{K→∞} limsup_{n→∞} P(E[d_on^(n) 1{d_on^(n) > K} | Gn] > ε) = 0,   (2.3.12)

which, after applying the Markov inequality, follows from the uniform integrability of (d_on^(n))n≥1. For part (c), we again use Theorem 2.7, but now we need to show that the convergence occurs almost surely. That means that, for every ε > 0, there exists K = K(ε) > 0 such that the event {E[d_on^(n) 1{d_on^(n) > K} | Gn] > ε} occurs only finitely often.

In what follows, we will mainly be interested in local convergence in probability, since this is the notion that is the most powerful in the setting of random graphs.

2.3.3 Local convergence and completeness of the limit

For many random graph models, the local limit is almost surely contained ina smaller set of rooted graphs. The most common example is when the randomgraph converges locally to a tree (recall that trees are rooted, see Section 1.5), butit can apply more generally. In this case, and similarly to Theorem 2.8, it turnsout to be enough to prove the convergence in Definition 2.9 only for elementsthat the limit (G, o) takes values in. Let us explain this in more detail.

Let T? ⊂ G? be a subset of the space of rooted graphs. Recall that T?(r) ⊆ T?

is the subset of T? of graphs for which the distance between any vertex and theroot is at most r. Then, we have the following result:


Theorem 2.14 (Local convergence and subsets) Let (Gn)n≥1 be a sequence of finite random graphs. Let (Ḡ, ō) be a random variable on G⋆ having law µ̄. Let T⋆ ⊂ G⋆ be a subset of the space of rooted graphs. Assume that µ̄((Ḡ, ō) ∈ T⋆) = 1. Then, (Gn, on) d−→ (Ḡ, ō) when (2.3.9) holds for all H⋆ ∈ T⋆(r) and all r ≥ 1. Similar extensions hold for local convergence in probability in (2.3.10) and almost surely in (2.3.11), with µ̄ replaced by µ and (Ḡ, ō) by (G, o).

Proof The proof for local weak convergence is identical to that of Theorem 2.8. The extensions to convergence in probability and almost surely follow similarly.

2.3.4 Local convergence of random regular graphs

In this section, we give the very first example of a random graph that converges locally in probability: the random regular graph, which is obtained by taking the configuration model CMn(d) and letting di = d for all i ∈ [n]. Here, we assume that nd is even. The main result is the following:

Theorem 2.15 (Local weak convergence of random regular graphs) Fix d ≥ 1 and assume that nd is even. The random regular graph of degree d on n vertices converges locally in probability to the rooted d-regular tree.

Looking back at the local weak limit of truncated trees discussed in Section 2.2.4, we see that a random regular graph is a much better approximation to a d-regular tree than a truncation of it. One could see this as another example of the probabilistic method, where a certain amount of randomization is useful to construct deterministic objects.

Proof Let Gn be the random regular graph, and let (Td, o) be the rooted d-regular tree. Since the limit is deterministic, convergence in probability follows from convergence of the expected subgraph proportions, which we now prove. We use the convenient criterion in Theorem 2.12. Tightness follows from Theorem 2.13, since the degrees in Gn are constant, so in particular bounded. We see that the only further requirement is to show that E[p^(Gn)(B_r^(Td)(o))] → 1; indeed, since p^(Gn)(B_r^(Td)(o)) ∈ [0, 1], convergence of its mean to 1 implies convergence in probability.

Write

1 − E[p^(Gn)(B_r^(Td)(o))] = P(B_r^(Gn)(on) is not a tree).   (2.3.13)

For B_r^(Gn)(on) not to be a tree, a cycle needs to occur within distance r. We grow the neighborhood B_r^(Gn)(on) by pairing the half-edges incident to discovered vertices one by one. Since there is just a bounded number of unpaired half-edges incident to the vertices found at any moment in the exploration, and since we need to pair at most

d + d² + ··· + d^r   (2.3.14)

half-edges, the probability that any one of these pairings creates a cycle vanishes. We conclude that 1 − E[p^(Gn)(B_r^(Td)(o))] → 0, which completes the proof.


2.3.5 Local convergence of Erdos-Renyi random graphs

In this section, we work out one example in full, and show that the Erdos-Renyi random graph ERn(λ/n) converges locally in probability to a Poi(λ) branching process:

Theorem 2.16 (Local convergence of the Erdos-Renyi random graph) Fix λ > 0. Then ERn(λ/n) converges locally in probability to a Poisson branching process with mean offspring λ.

Proof We start by reducing the proof to a convergence in probability statement.

Setting the stage of the proof

We start by using the convenient criterion in Theorem 2.12. Tightness follows from Theorem 2.13, since the degrees in Gn = ERn(λ/n) are uniformly integrable, by (1.3.5) (see also Theorem 1.3), and the fact that

E[d_on^(Gn)] = (2/n) E[|E(Gn)|] = (2/n) · (n(n − 1)/2) · (λ/n) → λ = Σ_{k≥0} k pk.   (2.3.15)

Thus, we are left to prove (2.3.10) in Theorem 2.12. We then rely on Theorem 2.14, and prove convergence of the subgraph proportions in (2.3.10) only for trees. Recall the discussion of trees in Section 1.5. There, we considered trees to be ordered, in the sense that every vertex has an ordered set of forward neighbors. Fix a tree t. Then, in order to prove that ERn(λ/n) converges locally in probability to a Poi(λ) branching process, we need to show that, for every tree t and all integers r ≥ 0,

p^(Gn)(t) = (1/n) Σ_{u∈[n]} 1{B_r^(Gn)(u) ≃ t} P−→ µ(B_r^(G)(o) ≃ t),   (2.3.16)

where Gn equals ERn(λ/n), and the law µ of (G, o) is that of a Poi(λ) branching process. We see that in this case, µ is deterministic, as it will be in most examples. In (2.3.16), we may without loss of generality assume that t is a finite tree of depth at most r, since otherwise both sides are zero.

Ordering trees and subgraphs

Of course, for the event {B_r^(G)(o) ≃ t} to occur, the order of the tree t is irrelevant. Recall the breadth-first exploration of the tree t, which is described in terms of (xi)_{i=0}^{t} as in (1.5.1), with corresponding vertices (ai)_{i=0}^{t}. Further, note that (G, o) is, by construction, an ordered tree, and therefore so is B_r^(G)(o). Therefore, we can write B_r^(G)(o) = t to indicate that the two ordered trees B_r^(G)(o) and t agree. In terms of this notation, one can compute

µ(B_r^(G)(o) = t) = Π_{i∈[t]: dist(∅,ai)<r} e^{−λ} λ^{xi}/xi!,   (2.3.17)


where dist(∅, v) denotes the tree distance between v ∈ V(t) and the root ∅ ∈ V(t). We note that {Br(o) ≃ t} says nothing about the degrees of the vertices at distance exactly r from the root ∅, which is why we restrict to vertices ai with dist(∅, ai) < r in (2.3.17). Further, for each ordered tree t′ that is isomorphic to the tree t, we have µ(Br(o) = t′) = µ(Br(o) = t), since the root degree and the degrees of the non-root vertices are the same for all trees that are isomorphic to t, and the right-hand side of (2.3.17) only depends on these degrees. Therefore,

µ(B_r^(G)(o) ≃ t) = #(t) Π_{i∈[t]: dist(∅,ai)<r} e^{−λ} λ^{xi}/xi!,   (2.3.18)

where #(t) is the number of ordered trees that are isomorphic to t. This identifies the right-hand side of (2.3.16). We note further that, by permuting the labels of all the children of any vertex in t, we obtain a rooted tree that is isomorphic to t, and there are Π_{i∈[t]} xi! such permutations. However, not all of them may lead to distinct ordered trees. In our analysis, the precise value of #(t) is not relevant.

It will be convenient to also order the vertices in B_r^(Gn)(on), where Gn = ERn(λ/n). This is achieved by ordering the forward children of a vertex in B_r^(Gn)(on) according to their vertex labels. Then, we can again write B_r^(Gn)(on) = t to indicate that the two ordered graphs agree. This implies that B_r^(Gn)(on) is a tree (so there are no cycles within depth r), and that its ordered version is equal to the ordered tree t. Then, as in (2.3.18),

\[
p^{(G_n)}(t) = \frac{1}{n} \sum_{u \in [n]} \mathbb{1}_{\{B^{(G_n)}_r(u) \simeq t\}} = \#(t)\, \frac{1}{n} \sum_{u \in [n]} \mathbb{1}_{\{B^{(G_n)}_r(u) = t\}}. \qquad (2.3.19)
\]

We will prove that
\[
\frac{1}{n} \sum_{u \in [n]} \mathbb{1}_{\{B^{(G_n)}_r(u) = t\}} \xrightarrow{\;\mathbb{P}\;} \mu\big(B^{(G)}_r(o) = t\big). \qquad (2.3.20)
\]

Second moment method: first moment

To prove (2.3.20), we use a second moment method. Denote
\[
N_n(t) = \sum_{u \in [n]} \mathbb{1}_{\{B^{(G_n)}_r(u) = t\}}. \qquad (2.3.21)
\]
We will then apply a second moment method to show that $N_n(t)$ is highly concentrated around $n\,\mu(B_r(o) = t)$.

We start by investigating the first moment of $N_n(t)$, which equals
\[
\mathbb{E}[N_n(t)] = \sum_{u \in [n]} \mathbb{P}\big(B^{(G_n)}_r(u) = t\big) = n\, \mathbb{P}\big(B^{(G_n)}_r(1) = t\big), \qquad (2.3.22)
\]
where the latter step uses the fact that the distribution of the neighborhoods of all vertices is the same. We recall the breadth-first description of an ordered tree in Section 1.5. Let $v_i \in [n]$ denote the vertex label of the $i$th vertex that is explored


in the breadth-first exploration. Let $X_i$ denote the number of forward neighbors of $v_i$, except when $v_i$ is at graph distance $r$ from vertex 1, in which case $X_i = 0$ by convention. Further, let $Y_i$ denote the number of edges to already found, but not yet explored, vertices. Then, $B^{(G_n)}_r(1) = t$ occurs precisely when $(X_i, Y_i) = (x_i, 0)$ for all $i \in [t]$. Therefore,
\[
\mathbb{P}\big(B^{(G_n)}_r(1) = t\big) = \mathbb{P}\big((X_i, Y_i) = (x_i, 0)\ \forall i \in [t]\big) = \mathbb{P}\big((X_{[t]}, Y_{[t]}) = (x_{[t]}, 0_{[t]})\big), \qquad (2.3.23)
\]
where we abbreviate $x_{[t]} = (x_1, \ldots, x_t)$. We condition to write this as
\[
\mathbb{P}\big(B^{(G_n)}_r(1) = t\big) = \prod_{i=1}^{t} \mathbb{P}\big((X_i, Y_i) = (x_i, 0) \mid (X_{[i-1]}, Y_{[i-1]}) = (x_{[i-1]}, 0_{[i-1]})\big). \qquad (2.3.24)
\]

Conditionally on $(X_{[i-1]}, Y_{[i-1]}) = (x_{[i-1]}, 0_{[i-1]})$, for all $i$ for which $v_i$ is at distance at most $r-1$ from vertex 1,
\[
X_i \sim \mathrm{Bin}(n_i, \lambda/n), \qquad (2.3.25)
\]
where $n_i = n - s_{i-1} - i + 1$, and $X_i = 0$ otherwise. Here, we recall from (1.5.1) that $(s_i)_{i \geq 0}$ satisfies $s_0 = 1$ and $s_i = s_{i-1} + x_i - 1$ for $i \geq 1$. Further, for all $i \in [t]$,
\[
Y_i \sim \mathrm{Bin}(s_{i-1} - 1, \lambda/n), \qquad (2.3.26)
\]
since there are $s_{i-1}$ active vertices, and $Y_i$ counts the number of edges between $v_i$ and the other active vertices. Finally, $X_i$ and $Y_i$ are conditionally independent given $(X_{[i-1]}, Y_{[i-1]}) = (x_{[i-1]}, 0_{[i-1]})$, due to the independence of the edges in $\mathrm{ER}_n(\lambda/n)$. Note that the distance of $v_i$ from vertex 1 is exactly equal to the distance of the corresponding vertex $a_i \in V(t)$ to the root $\varnothing \in V(t)$. Therefore,

\[
\mathbb{P}\big(B^{(G_n)}_r(1) = t\big) = \prod_{i \in [t] \colon \mathrm{dist}(\varnothing, a_i) < r} \mathbb{P}\big(\mathrm{Bin}(n_i, \lambda/n) = x_i\big) \times \prod_{i \in [t]} \Big(1 - \frac{\lambda}{n}\Big)^{s_{i-1}-1} \to \prod_{i \in [t] \colon \mathrm{dist}(\varnothing, a_i) < r} \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} = \mu(B_r(o) = t). \qquad (2.3.27)
\]
We conclude that
\[
\mathbb{P}\big(B^{(G_n)}_r(1) = t\big) \to \mu(B_r(o) = t), \qquad (2.3.28)
\]
so that
\[
\frac{1}{n} \mathbb{E}[N_n(t)] \to \mu(B_r(o) = t). \qquad (2.3.29)
\]
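The binomial-to-Poisson approximation driving (2.3.27) is easy to check numerically. A minimal sketch, where the tree $t$ (encoded by its offspring numbers $x_i$) and the value of $\lambda$ are hypothetical choices, and the $O(1)$ depletion $n_i = n - s_{i-1} - i + 1$ is simplified to $n_i = n$, which only affects lower-order terms:

```python
import math

def binom_pmf(n, p, k):
    # P(Bin(n, p) = k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    # P(Poi(lam) = k) = e^{-lam} lam^k / k!
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.0
# Offspring numbers x_i of a small ordered tree t in breadth-first order
# (a hypothetical example: root with 2 children, one grandchild).
x = [2, 1, 0, 0, 0]

for n in [100, 1000, 10000]:
    # Product over i of P(Bin(n_i, lam/n) = x_i), with n_i simplified to n.
    finite_n = 1.0
    for xi in x:
        finite_n *= binom_pmf(n, lam / n, xi)
    # Poisson limit, as in (2.3.17)/(2.3.27).
    limit = 1.0
    for xi in x:
        limit *= poisson_pmf(lam, xi)
    print(n, finite_n, limit)
```

As $n$ grows, the finite-$n$ product visibly approaches the Poisson product $\prod_i e^{-\lambda}\lambda^{x_i}/x_i!$.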

Second moment method: second moment

For the second moment of Nn(t), we compute

\[
\mathbb{E}[N_n(t)^2] = \sum_{u_1, u_2 \in [n]} \mathbb{P}\big(B^{(G_n)}_r(u_1) = t,\, B^{(G_n)}_r(u_2) = t\big) = n\, \mathbb{P}\big(B^{(G_n)}_r(1) = t\big) + n(n-1)\, \mathbb{P}\big(B^{(G_n)}_r(1) = t,\, B^{(G_n)}_r(2) = t\big). \qquad (2.3.30)
\]


We have already computed the asymptotics of the first term, so we are left with the second term. We claim that
\[
\mathbb{P}\big(B^{(G_n)}_r(1) = t,\, B^{(G_n)}_r(2) = t\big) \to \mu(B_r(o) = t)^2. \qquad (2.3.31)
\]

We are left to show (2.3.31). Let $\mathrm{dist}_{G_n}(1,2)$ denote the graph distance between vertices 1 and 2 in $G_n = \mathrm{ER}_n(\lambda/n)$. We split
\[
\mathbb{P}\big(B^{(G_n)}_r(1) = t,\, B^{(G_n)}_r(2) = t\big) = \mathbb{P}\big(B^{(G_n)}_r(1) = t,\, B^{(G_n)}_r(2) = t,\, \mathrm{dist}_{G_n}(1,2) > 2r\big) + \mathbb{P}\big(B^{(G_n)}_r(1) = t,\, B^{(G_n)}_r(2) = t,\, \mathrm{dist}_{G_n}(1,2) \leq 2r\big). \qquad (2.3.32)
\]

We bound these terms one by one. Using the fact that all vertices are exchangeable, so that vertex 2 has the same distribution as a uniform vertex unequal to vertex 1, the second term in (2.3.32) can be bounded by
\[
\mathbb{P}\big(B^{(G_n)}_r(1) = t,\, B^{(G_n)}_r(2) = t,\, \mathrm{dist}_{G_n}(1,2) \leq 2r\big) = \mathbb{E}\Big[\mathbb{1}_{\{B^{(G_n)}_r(1) = t\}} \frac{|B^{(G_n)}_{2r}(1)| - 1}{n-1}\Big] \leq \frac{1}{n} \mathbb{E}\big[|B^{(G_n)}_{2r}(1)|\big]. \qquad (2.3.33)
\]

Write
\[
\mathbb{E}\big[|B^{(G_n)}_{2r}(1)|\big] = \sum_{k=0}^{2r} \mathbb{E}\big[|\partial B^{(G_n)}_k(1)|\big]. \qquad (2.3.34)
\]
It is not hard to show that (see Exercise 2.18)
\[
\mathbb{E}\big[|\partial B^{(G_n)}_k(1)|\big] \leq \lambda^k. \qquad (2.3.35)
\]
This implies that
\[
\mathbb{P}\big(B^{(G_n)}_r(1) = t,\, B^{(G_n)}_r(2) = t,\, \mathrm{dist}_{G_n}(1,2) \leq 2r\big) = o(1). \qquad (2.3.36)
\]

We rewrite the first term in (2.3.32) as
\[
\mathbb{P}\big(B^{(G_n)}_r(1) = t,\, \mathrm{dist}_{G_n}(1,2) > 2r\big)\, \mathbb{P}\big(B^{(G_n)}_r(2) = t \mid B^{(G_n)}_r(1) = t,\, \mathrm{dist}_{G_n}(1,2) > 2r\big). \qquad (2.3.37)
\]
By (2.3.28) and (2.3.36),
\[
\mathbb{P}\big(B^{(G_n)}_r(1) = t,\, \mathrm{dist}_{G_n}(1,2) > 2r\big) \to \mu(B_r(o) = t). \qquad (2.3.38)
\]

Further, recall that $t = |V(t)|$ denotes the number of vertices in $t$. The event that $\mathrm{dist}_{G_n}(1,2) > 2r$ is the same as the event that $B^{(G_n)}_r(2)$ is disjoint from $B^{(G_n)}_r(1)$. We note that
\[
\mathbb{P}\big(B^{(G_n)}_r(2) = t \mid B^{(G_n)}_r(1) = t,\, \mathrm{dist}_{G_n}(1,2) > 2r\big) = \mathbb{P}\big(B^{(G_n)}_r(2) = t \text{ in } \mathrm{ER}_{n-t}(\lambda/n)\big). \qquad (2.3.39)
\]
Since $t$ is bounded, and as in (2.3.28),
\[
\mathbb{P}\big(B^{(G_n)}_r(2) = t \mid B^{(G_n)}_r(1) = t,\, \mathrm{dist}_{G_n}(1,2) > 2r\big) \to \mu(B_r(o) = t), \qquad (2.3.40)
\]
which completes the proof of (2.3.31).


Completion of the proof

The convergence in (2.3.31) implies that $\mathrm{Var}(N_n(t))/\mathbb{E}[N_n(t)]^2 \to 0$, so that, by the Chebychev inequality [Volume 1, Theorem 2.18],
\[
\frac{N_n(t)}{\mathbb{E}[N_n(t)]} \xrightarrow{\;\mathbb{P}\;} 1. \qquad (2.3.41)
\]
In turn, by (2.3.29), this implies that
\[
\frac{1}{n} N_n(t) \xrightarrow{\;\mathbb{P}\;} \mu(B_r(o) = t), \qquad (2.3.42)
\]
which implies that $p^{(n)}(t) = \#(t) N_n(t)/n \xrightarrow{\;\mathbb{P}\;} \#(t)\, \mu(B_r(o) = t) = \mu(B_r(o) \simeq t)$, as required.

In Chapters 3, 4 and 5 below, we will extend the above analysis to inhomogeneous random graphs, the configuration model and preferential attachment models. In many cases, the steps taken are much like the above. We always combine a second moment method for $N_n(t)$ with explicit computations that allow us to show the appropriate adaptation of (2.3.42).
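The convergence (2.3.42) can also be observed in simulation for $r = 1$, where $B^{(G_n)}_1(o_n)$ is a star and $p^{(G_n)}(t)$ reduces to the empirical degree distribution. A Monte Carlo sketch, with hypothetical choices of $n$, $\lambda$ and seed, and deliberately loose tolerances:

```python
import math
import random

random.seed(1)
n, lam = 2000, 2.0

# Sample ER_n(lam/n) as an adjacency list: each pair is an edge
# independently with probability lam/n.
adj = [[] for _ in range(n)]
for u in range(n):
    for v in range(u + 1, n):
        if random.random() < lam / n:
            adj[u].append(v)
            adj[v].append(u)

# Empirical r = 1 neighborhood census: the proportion of vertices whose
# 1-neighborhood is a star with k leaves is the proportion of degree-k vertices.
emp = [sum(1 for u in range(n) if len(adj[u]) == k) / n for k in range(6)]
poi = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(6)]
for k in range(6):
    print(k, emp[k], poi[k])
```

The empirical proportions should lie close to the Poisson$(\lambda)$ probabilities $e^{-\lambda}\lambda^k/k!$, in line with (2.3.20).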

2.4 Consequences of local weak convergence: local functionals

In this section, we discuss some consequences of local weak convergence that will prove to be useful in the sequel, and describe how network statistics are determined by the local weak limit.

2.4.1 Local weak convergence and convergence of neighborhoods

We start by showing that the number of vertices at distance up to $m$ from a uniform vertex converges weakly to the neighborhood sizes of the limiting rooted graph:

Corollary 2.17 (Weak convergence of neighborhood sizes) Let $(G_n)_{n \geq 1}$ be a sequence of graphs whose sizes $|V(G_n)|$ tend to infinity.

(a) Assume that $G_n$ converges locally weakly to $(G, o) \sim \mu$ on $\mathcal{G}_\star$. Then, for every $m \geq 1$,
\[
\big(|\partial B^{(G_n)}_r(o_n)|\big)_{r=1}^{m} \xrightarrow{\;d\;} \big(|\partial B^{(G)}_r(o)|\big)_{r=1}^{m}. \qquad (2.4.1)
\]
(b) Assume that $G_n$ converges locally in probability to $(G, o) \sim \mu$ on $\mathcal{G}_\star$. Then, for every $m \geq 1$, with $o^{(1)}_n, o^{(2)}_n$ two independent uniformly chosen vertices in $V(G_n)$,
\[
\Big(\big(|\partial B^{(G_n)}_r(o^{(1)}_n)|,\, |\partial B^{(G_n)}_r(o^{(2)}_n)|\big)\Big)_{r=1}^{m} \xrightarrow{\;d\;} \Big(\big(|\partial B^{(G)}_r(o^{(1)})|,\, |\partial B^{(G)}_r(o^{(2)})|\big)\Big)_{r=1}^{m}, \qquad (2.4.2)
\]
where the two limiting neighborhood sizes are independent given $\mu$.

Proof Part (a) follows immediately, since the function
\[
h(G, o) = \mathbb{1}_{\{|\partial B_r(o)| = \ell_r \ \forall r \in [m]\}} \qquad (2.4.3)
\]
is a bounded continuous function for every $m$ and $\ell_1, \ldots, \ell_m$ (see Exercise 2.21). The proof of part (b) in (2.4.2) follows by noting that

\[
\mathbb{P}\big(B^{(G_n)}_m(o^{(1)}_n) \simeq t_1,\, B^{(G_n)}_m(o^{(2)}_n) \simeq t_2 \mid G_n\big) = p^{(G_n)}(t_1)\, p^{(G_n)}(t_2), \qquad (2.4.4)
\]
by the independence of $o^{(1)}_n, o^{(2)}_n$. Therefore,

\[
\mathbb{P}\big(B^{(G_n)}_m(o^{(1)}_n) \simeq t_1,\, B^{(G_n)}_m(o^{(2)}_n) \simeq t_2 \mid G_n\big) \xrightarrow{\;\mathbb{P}\;} \mu\big(B^{(G)}_m(o^{(1)}) \simeq t_1\big)\, \mu\big(B^{(G)}_m(o^{(2)}) \simeq t_2\big). \qquad (2.4.5)
\]
Taking the expectation proves the claim (you are asked to provide the fine details of this argument in Exercise 2.22).

In the above argument, it is crucial to note that, when $\mu$ is a random probability measure on $\mathcal{G}_\star$, the limits in (2.4.2) correspond to two independent copies of $(G, o)$ having law $\mu$, but with the same $\mu$. It is here that the randomness of $\mu$ manifests itself. Recall also the example below Corollary 2.11.

We continue by showing that local weak convergence implies that the graph distance between two uniform vertices tends to infinity:

Corollary 2.18 (Large distances) Let $(G_n)_{n \geq 1}$ be a sequence of graphs whose sizes $|V(G_n)|$ tend to infinity. Let $o^{(1)}_n, o^{(2)}_n$ be two vertices chosen independently and uniformly at random from $[n]$. Assume that $G_n$ converges in distribution in the local weak sense to $(G, o) \sim \mu$. Then,
\[
\mathrm{dist}_{G_n}(o^{(1)}_n, o^{(2)}_n) \xrightarrow{\;\mathbb{P}\;} \infty. \qquad (2.4.6)
\]

Proof It suffices to prove that, for every $r \geq 1$,
\[
\mathbb{P}\big(\mathrm{dist}_{G_n}(o^{(1)}_n, o^{(2)}_n) \leq r\big) = o(1). \qquad (2.4.7)
\]
For this, we use that $o^{(2)}_n$ is chosen uniformly at random from $V(G_n)$, independently of $o^{(1)}_n$, so that
\[
\mathbb{P}\big(\mathrm{dist}_{G_n}(o^{(1)}_n, o^{(2)}_n) \leq r\big) = \mathbb{E}\big[|B^{(G_n)}_r(o^{(1)}_n)|/n\big] = \mathbb{E}\big[|B^{(G_n)}_r(o_n)|/n\big]. \qquad (2.4.8)
\]

By Corollary 2.17, $|B^{(G_n)}_r(o_n)|$ is a tight random variable, so that $|B^{(G_n)}_r(o_n)|/n \xrightarrow{\;\mathbb{P}\;} 0$. Further, $|B^{(G_n)}_r(o_n)|/n \leq 1$ a.s. Thus, by Dominated Convergence ([Volume 1, Theorem A.1]), $\mathbb{E}\big[|B^{(G_n)}_r(o_n)|/n\big] = o(1)$ for every $r \geq 1$, so that the claim follows.

2.4.2 Local weak convergence and clustering coefficients

In this section, we discuss the convergence of various local and global clusteringcoefficients when a random graph converges locally weakly.

We start by recalling what the global clustering coefficient is, following [Volume 1, Section 1.5]. For a graph $G_n = (V(G_n), E(G_n))$, we let
\[
W_{G_n} = \sum_{i,j,k \in V(G_n)} \mathbb{1}_{\{ij,\, jk \in E(G_n)\}} = \sum_{v \in V(G_n)} D_v(D_v - 1) \qquad (2.4.9)
\]

denote two times the number of wedges in the graph $G_n$. The factor of two comes from the fact that the wedge $ij, jk$ is the same as the wedge $kj, ji$, but it is counted twice in (2.4.9). We further let
\[
\Delta_{G_n} = \sum_{i,j,k \in V(G_n)} \mathbb{1}_{\{ij,\, jk,\, ik \in E(G_n)\}} \qquad (2.4.10)
\]

denote six times the number of triangles in $G_n$ (each triangle gives six ordered triples). The (global) clustering coefficient $\mathrm{CC}_{G_n}$ of $G_n$ is defined as
\[
\mathrm{CC}_{G_n} = \frac{\Delta_{G_n}}{W_{G_n}}. \qquad (2.4.11)
\]
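The counts in (2.4.9)–(2.4.11) translate directly into code. A minimal sketch, with hypothetical example graphs; `closed` counts ordered triples with all three edges present and so equals $\Delta_{G_n}$:

```python
def global_clustering(n, edges):
    # Adjacency sets for an undirected graph on vertices 0..n-1.
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # W = sum_v d_v (d_v - 1): twice the number of wedges, as in (2.4.9).
    wedges = sum(len(adj[v]) * (len(adj[v]) - 1) for v in range(n))
    # Ordered pairs (u, w) of distinct neighbors of v with u ~ w:
    # summing over v gives the ordered-triple count Delta in (2.4.10).
    closed = sum(1 for v in range(n)
                 for u in adj[v] for w in adj[v] if u != w and w in adj[u])
    return closed / wedges

# Complete graph K4: every wedge is closed.
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
print(global_clustering(4, k4))   # → 1.0

# Triangle 0-1-2 with a pendant vertex 3 attached to 0.
g = [(0, 1), (0, 2), (1, 2), (0, 3)]
print(global_clustering(4, g))    # → 0.6
```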

Informally, the global clustering coefficient measures the proportion of wedges for which the closing edge is also present. As such, it can be thought of as the probability that, for a randomly drawn individual and two of its friends, the two friends are friends themselves. The following theorem describes when the clustering coefficient converges:

Theorem 2.19 (Convergence of global clustering coefficient) Let $(G_n)_{n \geq 1}$ be a sequence of graphs whose sizes $|V(G_n)|$ tend to infinity. Assume that $G_n$ converges locally in probability to $(G, o) \sim \mu$. Further, assume that $D_n = d^{(G_n)}_{o_n}$ is such that $D_n^2$ is uniformly integrable, and that $\mu(d_o > 1) > 0$. Then
\[
\mathrm{CC}_{G_n} \xrightarrow{\;\mathbb{P}\;} \frac{\mathbb{E}_\mu[\Delta_G(o)]}{\mathbb{E}_\mu[d_o(d_o - 1)]}, \qquad (2.4.12)
\]
where $\Delta_G(o) = \sum_{u,v \in \partial B_1(o)} \mathbb{1}_{\{\{u,v\} \in E(G)\}}$ denotes twice the number of triangles in $G$ that contain $o$ as a vertex.

Proof We write
\[
\mathrm{CC}_{G_n} = \frac{\mathbb{E}[\Delta_{G_n}(o_n) \mid G_n]}{\mathbb{E}\big[d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1) \mid G_n\big]}, \qquad (2.4.13)
\]
where the expectation is with respect to the uniform choice of $o_n \in [n]$, and $d^{(G_n)}_{o_n}$ denotes the degree of $o_n$ in $G_n$, while $\Delta_{G_n}(o_n) = \sum_{u,v \in \partial B^{(G_n)}_1(o_n)} \mathbb{1}_{\{\{u,v\} \in E(G_n)\}}$ denotes twice the number of triangles that $o_n$ is part of.

By local weak convergence, $\Delta_{G_n}(o_n) \xrightarrow{\;d\;} \Delta_G(o)$ and $d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1) \xrightarrow{\;d\;} d_o(d_o - 1)$. However, neither is a bounded functional, so the convergence of their expectations over $o_n$ does not follow immediately. It is here that we need to make use of the uniform integrability of $D_n^2$, where $D_n = d^{(G_n)}_{o_n}$. We split
\[
\mathbb{E}\big[d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1) \mid G_n\big] = \mathbb{E}\big[d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1) \mathbb{1}_{\{(d^{(G_n)}_{o_n})^2 \leq K\}} \mid G_n\big] + \mathbb{E}\big[d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1) \mathbb{1}_{\{(d^{(G_n)}_{o_n})^2 > K\}} \mid G_n\big]. \qquad (2.4.14)
\]

By local convergence in probability (recall Corollary 2.17),
\[
\mathbb{E}\big[d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1) \mathbb{1}_{\{(d^{(G_n)}_{o_n})^2 \leq K\}} \mid G_n\big] \xrightarrow{\;\mathbb{P}\;} \mathbb{E}_\mu\big[d_o(d_o - 1) \mathbb{1}_{\{d_o^2 \leq K\}}\big], \qquad (2.4.15)
\]

since $h(G, o) = d_o(d_o - 1) \mathbb{1}_{\{d_o^2 \leq K\}}$ is a bounded continuous function. Further, by uniform integrability of $(d^{(G_n)}_{o_n})^2$, and with $\mathbb{E}$ denoting the expectation with respect to $o_n$ as well as the random graph, for every $\varepsilon > 0$ there exist $K = K(\varepsilon)$ and $N = N(\varepsilon)$ sufficiently large such that, uniformly in $n \geq N(\varepsilon)$,
\[
\mathbb{E}\big[d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1) \mathbb{1}_{\{(d^{(G_n)}_{o_n})^2 > K\}}\big] \leq \mathbb{E}\big[(d^{(G_n)}_{o_n})^2 \mathbb{1}_{\{(d^{(G_n)}_{o_n})^2 > K\}}\big] \leq \varepsilon^2. \qquad (2.4.16)
\]

Then, by the Markov inequality,
\[
\mathbb{P}\Big(\mathbb{E}\big[d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1) \mathbb{1}_{\{(d^{(G_n)}_{o_n})^2 > K\}} \mid G_n\big] \geq \varepsilon\Big) \leq \frac{1}{\varepsilon} \mathbb{E}\big[(d^{(G_n)}_{o_n})^2 \mathbb{1}_{\{(d^{(G_n)}_{o_n})^2 > K\}}\big] \leq \varepsilon. \qquad (2.4.17)
\]

It follows that $\mathbb{E}\big[d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1) \mid G_n\big] \xrightarrow{\;\mathbb{P}\;} \mathbb{E}_\mu[d_o(d_o - 1)]$, as required. Since $\mu(d_o > 1) > 0$, also $\mathbb{E}_\mu[d_o(d_o - 1)] > 0$, so that also $1/\mathbb{E}\big[d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1) \mid G_n\big] \xrightarrow{\;\mathbb{P}\;} 1/\mathbb{E}_\mu[d_o(d_o - 1)]$.

The proof that $\mathbb{E}[\Delta_{G_n}(o_n) \mid G_n] \xrightarrow{\;\mathbb{P}\;} \mathbb{E}_\mu[\Delta_G(o)]$ is similar, where now we split
\[
\mathbb{E}[\Delta_{G_n}(o_n) \mid G_n] = \mathbb{E}\big[\Delta_{G_n}(o_n) \mathbb{1}_{\{(d^{(G_n)}_{o_n})^2 \leq K\}} \mid G_n\big] + \mathbb{E}\big[\Delta_{G_n}(o_n) \mathbb{1}_{\{(d^{(G_n)}_{o_n})^2 > K\}} \mid G_n\big]. \qquad (2.4.18)
\]
Then, we observe that, since $\Delta_{G_n}(o_n) \leq d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1)$, again the first term is the expectation of a bounded and continuous functional, and therefore converges in probability. The second term, on the other hand, is bounded by $\mathbb{E}\big[(d^{(G_n)}_{o_n})^2 \mathbb{1}_{\{(d^{(G_n)}_{o_n})^2 > K\}}\big]$, which can be treated as above. This completes the proof.

We see that, in order to obtain convergence of the global clustering coefficient, we need an additional uniform integrability condition on the degree distribution. More precisely, we need that $D_n^2$ is uniformly integrable, with $D_n = d^{(G_n)}_{o_n}$ the degree of a uniform vertex. This will also be a recurring theme below. We next discuss a related clustering coefficient, where such additional assumptions are not needed. For this, we define the local clustering coefficient of vertex $v \in [n]$ to be

\[
\mathrm{CC}_{G_n}(v) = \frac{\Delta_{G_n}(v)}{d_v(d_v - 1)}, \qquad (2.4.19)
\]
where $\Delta_{G_n}(v) = \sum_{s,t \in \partial B^{(G_n)}_1(v)} \mathbb{1}_{\{\{s,t\} \in E(G_n)\}}$ is again twice the number of triangles that $v$ is part of. Then, we let
\[
\overline{\mathrm{CC}}_{G_n} = \frac{1}{n} \sum_{v \in [n]} \mathrm{CC}_{G_n}(v) \qquad (2.4.20)
\]
be the local clustering coefficient. Here, we can think of $\Delta_{G_n}(v)/[d_v(d_v - 1)]$ as the proportion of edges present between the neighbors of $v$, and (2.4.20) takes the average of this over $v$. The following theorem implies its convergence without any further uniform integrability conditions:

Theorem 2.20 (Convergence of local clustering coefficient) Let $(G_n)_{n \geq 1}$ be a sequence of graphs whose sizes $|V(G_n)|$ tend to infinity. Assume that $G_n$ converges locally in probability to $(G, o) \sim \mu$. Then
\[
\overline{\mathrm{CC}}_{G_n} \xrightarrow{\;\mathbb{P}\;} \mathbb{E}_\mu\Big[\frac{\Delta_G(o)}{d_o(d_o - 1)}\Big]. \qquad (2.4.21)
\]


Proof We now write
\[
\overline{\mathrm{CC}}_{G_n} = \mathbb{E}\Big[\frac{\Delta_{G_n}(o_n)}{d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1)} \,\Big|\, G_n\Big], \qquad (2.4.22)
\]
and note that $h(G, o) = \Delta_G(o)/[d_o(d_o - 1)]$ is a bounded continuous functional. Therefore,
\[
\mathbb{E}\Big[\frac{\Delta_{G_n}(o_n)}{d^{(G_n)}_{o_n}(d^{(G_n)}_{o_n} - 1)} \,\Big|\, G_n\Big] \xrightarrow{\;\mathbb{P}\;} \mathbb{E}_\mu\Big[\frac{\Delta_G(o)}{d_o(d_o - 1)}\Big], \qquad (2.4.23)
\]
as required.
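The statistic in Theorem 2.20 can be computed directly from (2.4.19)–(2.4.20). A minimal sketch on a hypothetical example graph; vertices with $d_v \leq 1$ are given $\mathrm{CC}_{G_n}(v) = 0$, a convention we assume since the text leaves this case implicit:

```python
def avg_local_clustering(n, edges):
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for v in range(n):
        d = len(adj[v])
        if d < 2:
            continue  # assumed convention: CC(v) = 0 when d_v <= 1
        # Twice the number of triangles containing v, as in (2.4.19).
        tri2 = sum(1 for u in adj[v] for w in adj[v] if u != w and w in adj[u])
        total += tri2 / (d * (d - 1))
    return total / n

# Triangle 0-1-2 with a pendant vertex 3 attached to 0:
# CC(0) = 1/3, CC(1) = CC(2) = 1, CC(3) = 0, average 7/12.
g = [(0, 1), (0, 2), (1, 2), (0, 3)]
print(avg_local_clustering(4, g))   # ≈ 7/12
```

Note that the average local clustering ($7/12$) differs from the global coefficient ($0.6$) on the same graph, which is why the two limits (2.4.12) and (2.4.21) are distinct.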

There are more versions of the clustering coefficient. A discussion of the convergence of the so-called clustering spectrum can be found in Section 2.6.

2.4.3 Neighborhoods of edges and degree-degree dependencies

In this section, we discuss the convergence of degree-degree dependencies when a random graph converges locally weakly. Often, this is described in terms of the so-called assortativity coefficient. For this, it is crucial to also discuss the convergence of the local neighborhood of a uniformly chosen edge. We will again see that an extra uniform integrability condition is needed for the assortativity coefficient to converge.

We start by defining the neighborhood structure of edges. For this, it will be convenient to consider directed edges. We let $e = (u, v)$ be an edge directed from $u$ to $v$, and let $\vec{E}(G_n) = \{(u, v) \colon \{u, v\} \in E(G_n)\}$ denote the collection of directed edges. This will be convenient, as it assigns a (root-)vertex to an edge.

For $H_\star \in \mathcal{G}_\star$, let
\[
p^{(G_n)}_e(H_\star) = \frac{1}{2|E(G_n)|} \sum_{(u,v) \in \vec{E}(G_n)} \mathbb{1}_{\{B^{(G_n)}_r(u) \simeq H_\star\}}. \qquad (2.4.24)
\]

Note that
\[
p^{(G_n)}_e(H_\star) = \mathbb{P}\big(B^{(G_n)}_r(\underline{e}) \simeq H_\star \mid G_n\big), \qquad (2.4.25)
\]
where $e = (\underline{e}, \overline{e})$ is a uniformly chosen directed edge from $\vec{E}(G_n)$. Thus, $p^{(G_n)}_e(H_\star)$ is the edge-equivalent of $p^{(G_n)}(H_\star)$ in (2.2.5). We next study its asymptotics:

Theorem 2.21 (Convergence of neighborhoods of edges) Let $(G_n)_{n \geq 1}$ be a sequence of graphs whose sizes $|V(G_n)|$ tend to infinity. Assume that $G_n$ converges locally in probability to $(G, o) \sim \mu$. Assume that $(d^{(G_n)}_{o_n})_{n \geq 1}$ is a uniformly integrable sequence of random variables, and that $\mu(d_o \geq 1) > 0$. Then, for every $H_\star \in \mathcal{G}_\star$,
\[
p^{(G_n)}_e(H_\star) \xrightarrow{\;\mathbb{P}\;} \frac{\mathbb{E}_\mu\big[d_o \mathbb{1}_{\{B^{(G)}_r(o) \simeq H_\star\}}\big]}{\mathbb{E}_\mu[d_o]}. \qquad (2.4.26)
\]


Proof We recall (2.4.24), and note that
\[
\frac{2|E(G_n)|}{|V(G_n)|} = \frac{1}{|V(G_n)|} \sum_{v \in V(G_n)} d^{(G_n)}_v = \mathbb{E}\big[d^{(G_n)}_{o_n} \mid G_n\big]. \qquad (2.4.27)
\]
Therefore, since $(d^{(G_n)}_{o_n})_{n \geq 1}$ is uniformly integrable, local convergence in probability implies that
\[
\frac{2|E(G_n)|}{|V(G_n)|} \xrightarrow{\;\mathbb{P}\;} \mathbb{E}_\mu[d_o]. \qquad (2.4.28)
\]

Since $\mu(d_o \geq 1) > 0$, it follows that $\mathbb{E}_\mu[d_o] > 0$. Further, we rewrite
\[
\frac{1}{|V(G_n)|} \sum_{(u,v) \in \vec{E}(G_n)} \mathbb{1}_{\{B^{(G_n)}_r(u) \simeq H_\star\}} = \frac{1}{|V(G_n)|} \sum_{u \in V(G_n)} d^{(G_n)}_u \mathbb{1}_{\{B^{(G_n)}_r(u) \simeq H_\star\}} = \mathbb{E}\big[d^{(G_n)}_{o_n} \mathbb{1}_{\{B^{(G_n)}_r(o_n) \simeq H_\star\}} \mid G_n\big], \qquad (2.4.29)
\]
where $o_n$ is uniformly chosen in $V(G_n)$. Again, since $(d^{(G_n)}_{o_n})_{n \geq 1}$ is uniformly integrable and by local convergence in probability,

\[
\mathbb{E}\big[d^{(G_n)}_{o_n} \mathbb{1}_{\{B^{(G_n)}_r(o_n) \simeq H_\star\}} \mid G_n\big] \xrightarrow{\;\mathbb{P}\;} \mathbb{E}_\mu\big[d_o \mathbb{1}_{\{B^{(G)}_r(o) \simeq H_\star\}}\big]. \qquad (2.4.30)
\]
Therefore, by (2.4.24), taking the ratio of the terms in (2.4.30) and (2.4.28) proves the claim.

We continue by considering the degree-degree distribution, again following [Volume 1, Section 1.5]. For this, it is convenient to consider directed edges $e = (u, v)$, and write $\underline{e} = u$ and $\overline{e} = v$ for the start and end point of $e$. The degree-degree distribution is given by
\[
p^{(G_n)}_{k,l} = \frac{1}{2|E(G_n)|} \sum_{e \in \vec{E}(G_n)} \mathbb{1}_{\{d^{(G_n)}_{\underline{e}} = k,\ d^{(G_n)}_{\overline{e}} = l\}}. \qquad (2.4.31)
\]

Thus, $p^{(G_n)}_{k,l}$ is the probability that a random directed edge connects a vertex of degree $k$ with one of degree $l$. By convention, we define $p^{(G_n)}_{k,l} = 0$ when $k = 0$. The following theorem proves that the degree-degree distribution converges when the graph converges locally in probability:

Theorem 2.22 (Degree-degree convergence) Let $(G_n)_{n \geq 1}$ be a sequence of graphs whose sizes $|V(G_n)|$ tend to infinity. Let $G_n$ converge locally in probability to $(G, o) \sim \mu$. Assume that $(d^{(G_n)}_{o_n})_{n \geq 1}$ is a uniformly integrable sequence of random variables, and that $\mu(d_o \geq 1) > 0$. Then, for every $k, l$ with $k \geq 1$,
\[
p^{(G_n)}_{k,l} \xrightarrow{\;\mathbb{P}\;} \frac{k\, \mu(d_o = k,\, d_V = l)}{\mathbb{E}_\mu[d_o]}, \qquad (2.4.32)
\]
where $V$ is a neighbor of $o$ chosen uniformly at random.


Proof Recall (2.4.28). We rewrite
\[
\frac{1}{|V(G_n)|} \sum_{e \in \vec{E}(G_n)} \mathbb{1}_{\{d^{(G_n)}_{\underline{e}} = k,\ d^{(G_n)}_{\overline{e}} = l\}} = k\, \frac{1}{n} \sum_{u \in [n]} \mathbb{1}_{\{d_u = k\}} \Big(\frac{1}{k} \sum_{v \colon v \sim u} \mathbb{1}_{\{d_v = l\}}\Big) = k\, \mathbb{E}\big[\mathbb{1}_{\{d^{(G_n)}_{o_n} = k,\ d^{(G_n)}_V = l\}} \mid G_n\big], \qquad (2.4.33)
\]
where $V$ is a uniformly chosen neighbor of $o_n$, which itself is uniformly chosen in $[n]$. Again, by local convergence in probability and the fact that the distribution of $(d_o, d_V)$ conditionally on $(G, o)$ is a deterministic function of $B^{(G_n)}_2(o_n)$,
\[
\frac{1}{|V(G_n)|} \sum_{e \in \vec{E}(G_n)} \mathbb{1}_{\{d^{(G_n)}_{\underline{e}} = k,\ d^{(G_n)}_{\overline{e}} = l\}} \xrightarrow{\;\mathbb{P}\;} k\, \mu(d_o = k,\, d_V = l). \qquad (2.4.34)
\]
Therefore, by (2.4.31), taking the ratio of the terms in (2.4.34) and (2.4.28) proves the claim.
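Definition (2.4.31) is a mechanical count over directed edges, and is easy to evaluate on small graphs. A sketch on a hypothetical example (each undirected edge contributes both orientations):

```python
from collections import Counter

def degree_degree(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(a) for a in adj]
    # Each undirected edge {u, v} yields two directed edges, as in (2.4.31).
    counts = Counter()
    for u in range(n):
        for v in adj[u]:
            counts[(deg[u], deg[v])] += 1
    m2 = 2 * len(edges)   # 2|E(G_n)| = number of directed edges
    return {kl: c / m2 for kl, c in counts.items()}

# Triangle 0-1-2 with a pendant vertex 3 attached to 0;
# degrees are (3, 2, 2, 1), and there are 8 directed edges.
g = [(0, 1), (0, 2), (1, 2), (0, 3)]
p = degree_degree(4, g)
print(p)   # p[(2,2)] = p[(3,2)] = p[(2,3)] = 2/8, p[(3,1)] = p[(1,3)] = 1/8
```

By construction $\sum_{k,l} p^{(G_n)}_{k,l} = 1$, matching the normalization noted below (2.4.32).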

We next discuss the consequences for the assortativity coefficient (recall [Volume 1, Section 1.5]). We now write the degrees in $G_n$ as $(d_v)_{v \in V(G_n)}$ to avoid some notational clutter. Define the assortativity coefficient as

\[
\rho_{G_n} = \frac{\sum_{i,j \in V(G_n)} \big(\mathbb{1}_{\{(i,j) \in \vec{E}(G_n)\}} - d_i d_j/|\vec{E}(G_n)|\big)\, d_i d_j}{\sum_{i,j \in V(G_n)} \big(d_i \mathbb{1}_{\{i = j\}} - d_i d_j/|\vec{E}(G_n)|\big)\, d_i d_j}, \qquad (2.4.35)
\]

where $\vec{E}(G_n)$ is the collection of directed edges. We can recognize $\rho_{G_n}$ in (2.4.35) as the empirical correlation coefficient of the two-dimensional sequence of variables $(d_{\underline{e}}, d_{\overline{e}})_{e \in \vec{E}(G_n)}$. As a result, it is the correlation between the coordinates of the two-dimensional random variable of which $(p^{(G_n)}_{k,l})_{k,l \geq 1}$ is the joint probability mass function. We can rewrite the assortativity coefficient $\rho_{G_n}$ more conveniently as

\[
\rho_{G_n} = \frac{\sum_{(i,j) \in \vec{E}(G_n)} d_i d_j - \big(\sum_{i \in [n]} d_i^2\big)^2/|\vec{E}(G_n)|}{\sum_{i \in V(G_n)} d_i^3 - \big(\sum_{i \in [n]} d_i^2\big)^2/|\vec{E}(G_n)|}. \qquad (2.4.36)
\]
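Formula (2.4.36) is straightforward to evaluate. A sketch on a hypothetical example graph; the triangle-with-pendant graph gives a negative, i.e. disassortative, value, since its hub attaches to a degree-one vertex:

```python
def assortativity(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(a) for a in adj]
    m = 2 * len(edges)   # |E_vec(G_n)|: number of directed edges
    # Cross term: sum of d_i d_j over directed edges (i, j).
    cross = sum(deg[u] * deg[v] for u in range(n) for v in adj[u])
    s2 = sum(d * d for d in deg)
    s3 = sum(d ** 3 for d in deg)
    # rho_{G_n} as in (2.4.36).
    return (cross - s2 * s2 / m) / (s3 - s2 * s2 / m)

# Triangle 0-1-2 with a pendant vertex 3 attached to 0:
# cross = 38, sum d^2 = 18, sum d^3 = 44, m = 8, so rho = -2.5/3.5 = -5/7.
g = [(0, 1), (0, 2), (1, 2), (0, 3)]
print(assortativity(4, g))   # → -5/7 ≈ -0.714
```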

The following theorem gives conditions for the convergence of $\rho_{G_n}$ when $G_n$ converges in probability in the local weak sense:

Theorem 2.23 (Assortativity convergence) Let $(G_n)_{n \geq 1}$ be a sequence of graphs whose sizes $|V(G_n)|$ tend to infinity. Assume that $G_n$ converges locally in probability to $(G, o) \sim \mu$. Assume that $D_n = d^{(G_n)}_{o_n}$ is such that $D_n^3$ is uniformly integrable, and that $\mu(d_o \geq 1) > 0$. Then,
\[
\rho_{G_n} \xrightarrow{\;\mathbb{P}\;} \frac{\mathbb{E}_\mu[d_o^2 d_V] - \mathbb{E}_\mu[d_o^2]^2/\mathbb{E}_\mu[d_o]}{\mathbb{E}_\mu[d_o^3] - \mathbb{E}_\mu[d_o^2]^2/\mathbb{E}_\mu[d_o]}, \qquad (2.4.37)
\]
where $V$ is a neighbor of $o$ chosen uniformly at random.

Proof We start with (2.4.36), and consider the various terms. We divide all sums by $n$. Then, by local weak convergence in probability and uniform integrability of $(d^{(G_n)}_{o_n})^3$, which implies that also $d^{(G_n)}_{o_n}$ is uniformly integrable,
\[
\frac{1}{n} |\vec{E}(G_n)| = \mathbb{E}\big[d^{(G_n)}_{o_n} \mid G_n\big] \xrightarrow{\;\mathbb{P}\;} \mathbb{E}_\mu[d_o]. \qquad (2.4.38)
\]
Again by local weak convergence and uniform integrability of $(d^{(G_n)}_{o_n})^3$, which implies that also $(d^{(G_n)}_{o_n})^2$ is uniformly integrable,
\[
\frac{1}{|V(G_n)|} \sum_{i \in V(G_n)} d_i^2 = \mathbb{E}\big[(d^{(G_n)}_{o_n})^2 \mid G_n\big] \xrightarrow{\;\mathbb{P}\;} \mathbb{E}_\mu[d_o^2]. \qquad (2.4.39)
\]
Further, again by local weak convergence in probability and uniform integrability of $(d^{(G_n)}_{o_n})^3$,
\[
\frac{1}{|V(G_n)|} \sum_{i \in V(G_n)} d_i^3 = \mathbb{E}\big[(d^{(G_n)}_{o_n})^3 \mid G_n\big] \xrightarrow{\;\mathbb{P}\;} \mathbb{E}_\mu[d_o^3]. \qquad (2.4.40)
\]

This identifies the limits of all but one of the sums appearing in (2.4.36). Details are left to the reader in Exercise 2.19.

We finally consider the last term, involving the cross terms of the degrees across edges, i.e.,
\[
\frac{1}{|V(G_n)|} \sum_{(i,j) \in \vec{E}(G_n)} d_i d_j = \frac{1}{|V(G_n)|} \sum_{u \in V(G_n)} d_u^2 \Big(\frac{1}{d_u} \sum_{v \colon v \sim u} d_v\Big) = \mathbb{E}\big[d_{o_n}^2 d_V \mid G_n\big], \qquad (2.4.41)
\]

where $V$ is a random neighbor of $o_n$. When the degrees are uniformly bounded, the functional $h(G, o) = d_o^2\, \mathbb{E}[d_V \mid G, o]$ is bounded and continuous, so that it would converge. However, the degrees are not necessarily bounded, so a truncation argument is needed.

We split
\[
\frac{1}{n} \sum_{(i,j) \in \vec{E}(G_n)} d_i d_j = \frac{1}{n} \sum_{(i,j) \in \vec{E}(G_n)} d_i d_j \mathbb{1}_{\{d_i \leq K,\ d_j \leq K\}} + \frac{1}{n} \sum_{(i,j) \in \vec{E}(G_n)} d_i d_j \big(1 - \mathbb{1}_{\{d_i \leq K,\ d_j \leq K\}}\big). \qquad (2.4.42)
\]

We now rewrite, similarly as above,
\[
\frac{1}{n} \sum_{(i,j) \in \vec{E}(G_n)} d_i d_j \mathbb{1}_{\{d_i \leq K,\ d_j \leq K\}} = \mathbb{E}\big[d_{o_n}^2 d_V \mathbb{1}_{\{d_{o_n} \leq K,\ d_V \leq K\}} \mid G_n\big]. \qquad (2.4.43)
\]
By local convergence in probability (or by Theorem 2.22), since the functional is now bounded and continuous,
\[
\frac{1}{|V(G_n)|} \sum_{(i,j) \in \vec{E}(G_n)} d_i d_j \mathbb{1}_{\{d_i \leq K,\ d_j \leq K\}} \xrightarrow{\;\mathbb{P}\;} \mathbb{E}_\mu\big[d_o^2 d_V \mathbb{1}_{\{d_o \leq K,\ d_V \leq K\}}\big]. \qquad (2.4.44)
\]


We are left to show that the second contribution in (2.4.42) is small. We bound this contribution as
\[
\frac{1}{|V(G_n)|} \sum_{(i,j) \in \vec{E}(G_n)} d_i d_j \big(\mathbb{1}_{\{d_i > K\}} + \mathbb{1}_{\{d_j > K\}}\big) = \frac{2}{|V(G_n)|} \sum_{(i,j) \in \vec{E}(G_n)} d_i d_j \mathbb{1}_{\{d_i > K\}}. \qquad (2.4.45)
\]

We now use Cauchy-Schwarz to bound this as
\[
\frac{1}{|V(G_n)|} \sum_{(i,j) \in \vec{E}(G_n)} d_i d_j \big(\mathbb{1}_{\{d_i > K\}} + \mathbb{1}_{\{d_j > K\}}\big) \leq \frac{2}{|V(G_n)|} \sqrt{\sum_{(i,j) \in \vec{E}(G_n)} d_i^2 \mathbb{1}_{\{d_i > K\}}}\, \sqrt{\sum_{(i,j) \in \vec{E}(G_n)} d_j^2} = 2\, \mathbb{E}\big[(d^{(G_n)}_{o_n})^3 \mathbb{1}_{\{d^{(G_n)}_{o_n} > K\}} \mid G_n\big]^{1/2}\, \mathbb{E}\big[(d^{(G_n)}_{o_n})^3 \mid G_n\big]^{1/2}. \qquad (2.4.46)
\]

By uniform integrability of $(d^{(G_n)}_{o_n})^3$, there exist $K = K(\varepsilon)$ and $N = N(\varepsilon)$ such that, for all $n \geq N$,
\[
\mathbb{E}\big[(d^{(G_n)}_{o_n})^3 \mathbb{1}_{\{d^{(G_n)}_{o_n} > K\}}\big] \leq \varepsilon^4/4. \qquad (2.4.47)
\]

In turn, by the Markov inequality, this implies that
\[
\mathbb{P}\Big(\mathbb{E}\big[(d^{(G_n)}_{o_n})^3 \mathbb{1}_{\{d^{(G_n)}_{o_n} > K\}} \mid G_n\big] \geq \varepsilon^3/4\Big) \leq \frac{4}{\varepsilon^3} \mathbb{E}\big[(d^{(G_n)}_{o_n})^3 \mathbb{1}_{\{d^{(G_n)}_{o_n} > K\}}\big] \leq \varepsilon. \qquad (2.4.48)
\]

As a result, with probability at least $1 - \varepsilon$, and for $\varepsilon > 0$ sufficiently small to accommodate the factor $\mathbb{E}\big[(d^{(G_n)}_{o_n})^3 \mid G_n\big]^{1/2}$ (which is uniformly bounded by the uniform integrability of $(d^{(G_n)}_{o_n})^3$),
\[
\frac{1}{|V(G_n)|} \sum_{(i,j) \in \vec{E}(G_n)} d_i d_j \big(\mathbb{1}_{\{d_i > K\}} + \mathbb{1}_{\{d_j > K\}}\big) \leq \varepsilon^{3/2}\, \mathbb{E}\big[(d^{(G_n)}_{o_n})^3 \mid G_n\big]^{1/2} \leq \varepsilon. \qquad (2.4.49)
\]

This completes the proof.

2.4.4 PageRank and local weak convergence

Recall the definition of PageRank from [Volume 1, Section 1.5]. Here, we restrict our discussion to undirected graphs; see Section 9.2 for a discussion of the more interesting setting of directed graphs. Let $G_n = (V(G_n), E(G_n))$ denote an undirected graph where every vertex has degree at least one. Then, we let the vector of PageRanks $(R_i)_{i \in [n]}$ be the unique solution to the equation

\[
R_i = \alpha \sum_{j \colon \{i,j\} \in E(G_n)} \frac{R_j}{d_j} + 1 - \alpha, \qquad (2.4.50)
\]
satisfying the normalization
\[
\sum_{i \in V(G_n)} R_i = |V(G_n)|. \qquad (2.4.51)
\]


The parameter $\alpha \in (0,1)$ is called the damping factor, and guarantees that (2.4.50) has a unique solution. This solution can be understood in terms of the stationary distribution of a bored surfer. Indeed, denote $\pi_i = R_i/n$, so that $(\pi_i)_{i \in [n]}$ is a probability distribution that satisfies a similar relation as $(R_i)_{i \in [n]}$ in (2.4.50), namely
\[
\pi_i = \alpha \sum_{j \colon \{i,j\} \in E(G_n)} \frac{\pi_j}{d_j} + \frac{1 - \alpha}{|V(G_n)|}. \qquad (2.4.52)
\]
Therefore, $(\pi_i)_{i \in V(G_n)}$ is the stationary distribution of a random walker that, with probability $\alpha$, jumps according to a simple random walk, i.e., it chooses any of the edges with equal probability, while, with probability $1 - \alpha$, the walker jumps to a uniform location.

The damping factor $\alpha$ is quite crucial. When $\alpha = 0$, the stationary distribution is just $\pi_i = 1/n$ for every $i$, so that all pages have PageRank 1. This is not very informative. On the other hand, PageRank concentrates on dangling ends when $\alpha$ is close to one, and this is also not what we want. Experimentally, $\alpha = 0.85$ seems to work well, and strikes a nice balance between these two extremes.
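The fixed point of (2.4.50) can be computed by simple iteration: each step preserves the normalization (2.4.51) on a graph without isolated vertices, and the iteration contracts at rate $\alpha$. A minimal sketch, with the star graph as a hypothetical example:

```python
def pagerank(n, edges, alpha=0.85, iters=200):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(a) for a in adj]       # every vertex assumed to have degree >= 1
    r = [1.0] * n                     # start normalized: sum_i R_i = n, as in (2.4.51)
    for _ in range(iters):
        # One application of the right-hand side of (2.4.50).
        r = [alpha * sum(r[j] / deg[j] for j in adj[i]) + (1 - alpha)
             for i in range(n)]
    return r

# Star: hub 0 joined to leaves 1..4.
star = [(0, k) for k in range(1, 5)]
r = pagerank(5, star)
print(r)   # the hub receives the largest PageRank
```

For the star, the fixed point can be solved in closed form: $R_0 = (0.6\alpha + 0.15)/(1 - \alpha^2)$ for $\alpha = 0.85$, against which the iteration can be checked.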

We next investigate the convergence of the PageRank distribution on an undirected graph sequence $(G_n)_{n \geq 1}$ that converges locally weakly:

Theorem 2.24 (Existence of asymptotic PageRank distribution) Consider a sequence of undirected random graphs $(G_n)_{n \in \mathbb{N}}$ whose size tends to infinity.

(i) If $G_n$ converges locally weakly, then there exists a limiting distribution $R_\varnothing$, with $\mathbb{E}_\mu[R_\varnothing] \leq 1$, such that
\[
R^{(G_n)}_{o_n} \xrightarrow{\;d\;} R_\varnothing. \qquad (2.4.53)
\]
(ii) If $G_n$ converges locally in probability, then there exists a limiting distribution $R_\varnothing$, with $\mathbb{E}_\mu[R_\varnothing] \leq 1$, such that, for every $r > 0$,
\[
\frac{1}{|V(G_n)|} \sum_{v \in V(G_n)} \mathbb{1}_{\{R^{(G_n)}_v > r\}} \xrightarrow{\;\mathbb{P}\;} \mu(R_\varnothing > r). \qquad (2.4.54)
\]

See Section 9.2 for a discussion of the more interesting setting of directed graphs. There, we also extend the discussion of local weak convergence to directed graphs, which is not as trivial as one might expect.

2.5 The giant component is almost local

We continue by investigating the size of the giant component when the graph converges in the local weak sense. Here, we simplify notation by assuming that $G_n = (V(G_n), E(G_n))$ is such that $|V(G_n)| = n$.


2.5.1 Asymptotics of the giant

Clearly, the proportion of vertices in the largest connected component, $|\mathcal{C}_{\max}|/n$, is not continuous in the local convergence topology (see Exercise 2.23), as it is a global object. In fact, also $|\mathcal{C}(o_n)|/n$ does not converge in distribution when $(G_n, o_n) \xrightarrow{\;d\;} (G, o)$. However, local convergence still tells us a useful story about the existence of a giant, as well as its size:

Corollary 2.25 (Upper bound on the giant) Let $(G_n)_{n \geq 1}$ be a sequence of graphs whose sizes $|V(G_n)| = n$ tend to infinity. Assume that $G_n$ converges locally in probability to $(G, o) \sim \mu$. Write $\zeta = \mu(|\mathcal{C}(o)| = \infty)$ for the survival probability of the limiting graph $(G, o)$. Then, for every $\varepsilon > 0$ fixed,
\[
\mathbb{P}\big(|\mathcal{C}_{\max}| \leq n(\zeta + \varepsilon)\big) \to 1. \qquad (2.5.1)
\]
In particular, Corollary 2.25 implies that $|\mathcal{C}_{\max}|/n \xrightarrow{\;\mathbb{P}\;} 0$ when $\zeta = 0$ (see Exercise 2.24).

Proof Define
\[
Z_{\geq k} = \sum_{v \in V(G_n)} \mathbb{1}_{\{|\mathcal{C}(v)| \geq k\}}. \qquad (2.5.2)
\]
Assume that $G_n$ converges locally in probability to $(G, o)$. Then, we conclude that, with $\zeta_{\geq k} = \mu(|\mathcal{C}(o)| \geq k)$ (see Exercise 2.25),
\[
\frac{Z_{\geq k}}{n} \xrightarrow{\;\mathbb{P}\;} \zeta_{\geq k}. \qquad (2.5.3)
\]
For every $k \geq 1$,
\[
\{|\mathcal{C}_{\max}| \geq k\} = \{Z_{\geq k} \geq k\}, \qquad (2.5.4)
\]
and, on the event that $Z_{\geq k} \geq 1$, also $|\mathcal{C}_{\max}| \leq Z_{\geq k}$. Thus, for every $k \geq 1$ and every $\varepsilon > 0$, and all $n$ large enough,
\[
\mathbb{P}\big(|\mathcal{C}_{\max}| \geq n(\zeta_{\geq k} + \varepsilon/2)\big) = o(1). \qquad (2.5.5)
\]
Therefore, with $\zeta = \lim_{k \to \infty} \zeta_{\geq k} = \mu(|\mathcal{C}(o)| = \infty)$, and for every $\varepsilon > 0$,
\[
\mathbb{P}\big(|\mathcal{C}_{\max}| \geq n(\zeta + \varepsilon)\big) = o(1). \qquad (2.5.6)
\]
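The quantity $Z_{\geq k}$ and the identity (2.5.4) can be checked mechanically on any finite graph. A sketch on a hypothetical example; union-find computes the ordered cluster sizes:

```python
from collections import Counter

def component_sizes(n, edges):
    # Union-find over vertices 0..n-1, with path halving.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    # Cluster sizes ordered from large to small.
    return sorted(Counter(find(v) for v in range(n)).values(), reverse=True)

# Two triangles plus two isolated vertices (n = 8).
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
sizes = component_sizes(8, edges)
print(sizes)   # → [3, 3, 1, 1]

# Z_{>=k}: number of vertices whose cluster has size >= k, as in (2.5.2);
# verify the identity {|C_max| >= k} = {Z_{>=k} >= k} from (2.5.4).
for k in range(1, 9):
    z = sum(s for s in sizes if s >= k)
    assert (max(sizes) >= k) == (z >= k)
```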

We conclude that, while local convergence cannot determine the size of the largest connected component, it can prove an upper bound on $|\mathcal{C}_{\max}|$. In this book, we will often extend this to $|\mathcal{C}_{\max}|/n \xrightarrow{\;\mathbb{P}\;} \zeta$, but this is no longer a consequence of local convergence alone. In Exercise 2.26, you are asked to give an example where $|\mathcal{C}_{\max}|/n \xrightarrow{\;\mathbb{P}\;} a < \zeta$. Therefore, in general, more involved arguments must be used. We next prove that one, relatively simple, condition suffices:


Theorem 2.26 (The giant is almost local) Let $G_n = ([n], E(G_n))$ denote a finite (possibly disconnected) random graph. Assume that $G_n$ converges locally in probability to $(G, o) \sim \mu$. Assume that
\[
\lim_{k \to \infty} \limsup_{n \to \infty} \frac{1}{n^2} \mathbb{E}\big[\#\big\{(x, y) \in V(G_n)^2 \colon |\mathcal{C}(x)|, |\mathcal{C}(y)| \geq k,\ x \not\leftrightarrow y\big\}\big] = 0. \qquad (2.5.7)
\]
Then, with $\mathcal{C}_{\max}$ and $\mathcal{C}_{(2)}$ denoting the largest and second largest connected components (with ties broken arbitrarily),
\[
\frac{|\mathcal{C}_{\max}|}{n} \xrightarrow{\;\mathbb{P}\;} \zeta = \mu(|\mathcal{C}(o)| = \infty), \qquad \frac{|\mathcal{C}_{(2)}|}{n} \xrightarrow{\;\mathbb{P}\;} 0. \qquad (2.5.8)
\]

Theorem 2.26 shows that a relatively mild condition such as (2.5.7) suffices for the giant to have the expected limit. In fact, the condition is necessary and sufficient, as you can see in Exercise 2.28. Theorem 2.26 will be most useful when we can easily show that vertices with large clusters are likely to be connected.

We now start with the proof of Theorem 2.26. We first note that, by Corollary 2.25, it suffices to prove Theorem 2.26 for $\zeta > 0$, which we assume from now on.

We recall that the vector $(|\mathcal{C}_{(i)}|)_{i \geq 1}$ denotes the cluster sizes ordered from large to small, with ties broken arbitrarily, so that $\mathcal{C}_{(1)} = \mathcal{C}_{\max}$. The following lemma gives a useful estimate on the sum of squares of these ordered cluster sizes. In its statement, we write $X_{n,k} = o_{k,\mathbb{P}}(1)$ when, for every $\varepsilon > 0$,
\[
\lim_{k \to \infty} \limsup_{n \to \infty} \mathbb{P}(|X_{n,k}| > \varepsilon) = 0. \qquad (2.5.9)
\]

Lemma 2.27 (Convergence of sum of squares of cluster sizes) Under the conditions of Theorem 2.26,
\[
\frac{1}{n^2} \sum_{i \geq 1} |\mathcal{C}_{(i)}|^2 \xrightarrow{\;\mathbb{P}\;} \zeta^2, \qquad (2.5.10)
\]
and
\[
\frac{1}{n^2} \sum_{i \geq 1} |\mathcal{C}_{(i)}|^2 \mathbb{1}_{\{|\mathcal{C}_{(i)}| \geq k\}} = \zeta^2 + o_{k,\mathbb{P}}(1). \qquad (2.5.11)
\]

Proof We use that, by local convergence in probability and for any $k \geq 1$ fixed (recall (2.5.3)),
\[
\frac{1}{n} \sum_{i \geq 1} |\mathcal{C}_{(i)}| \mathbb{1}_{\{|\mathcal{C}_{(i)}| \geq k\}} = \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{|\mathcal{C}(v)| \geq k\}} = \frac{1}{n} Z_{\geq k} \xrightarrow{\;\mathbb{P}\;} \zeta_{\geq k}, \qquad (2.5.12)
\]
where we recall that $\zeta_{\geq k} = \mu(|\mathcal{C}(o)| \geq k)$. Then we obtain that
\[
\frac{1}{n} \sum_{i \geq 1} |\mathcal{C}_{(i)}| \mathbb{1}_{\{|\mathcal{C}_{(i)}| \geq k\}} = \zeta + o_{k,\mathbb{P}}(1). \qquad (2.5.13)
\]


Further,
\[
\frac{1}{n^2} \#\big\{(x, y) \in V(G_n)^2 \colon |\mathcal{C}(x)|, |\mathcal{C}(y)| \geq k,\ x \not\leftrightarrow y\big\} = \frac{1}{n^2} \sum_{\substack{i,j \geq 1 \\ i \neq j}} |\mathcal{C}_{(i)}||\mathcal{C}_{(j)}| \mathbb{1}_{\{|\mathcal{C}_{(i)}|, |\mathcal{C}_{(j)}| \geq k\}}. \qquad (2.5.14)
\]

By the Markov inequality,
\[
\lim_{k \to \infty} \limsup_{n \to \infty} \mathbb{P}\Big(\frac{1}{n^2} \#\big\{(x, y) \colon |\mathcal{C}(x)|, |\mathcal{C}(y)| \geq k,\ x \not\leftrightarrow y\big\} \geq \varepsilon\Big) \leq \lim_{k \to \infty} \limsup_{n \to \infty} \frac{1}{\varepsilon n^2} \mathbb{E}\big[\#\big\{(x, y) \colon |\mathcal{C}(x)|, |\mathcal{C}(y)| \geq k,\ x \not\leftrightarrow y\big\}\big] = 0, \qquad (2.5.15)
\]
by our main assumption in (2.5.7). As a result,
\[
\frac{1}{n^2} \#\big\{(x, y) \colon |\mathcal{C}(x)|, |\mathcal{C}(y)| \geq k,\ x \not\leftrightarrow y\big\} = o_{k,\mathbb{P}}(1). \qquad (2.5.16)
\]

We conclude that
\[
\frac{1}{n^2} \sum_{i\geq 1} |\mathscr{C}_{(i)}|^2 \mathbf{1}_{\{|\mathscr{C}_{(i)}|\geq k\}}
= \Big(\frac{1}{n} \sum_{i\geq 1} |\mathscr{C}_{(i)}| \mathbf{1}_{\{|\mathscr{C}_{(i)}|\geq k\}}\Big)^2 + o_{k,\mathbb{P}}(1)
= \zeta^2 + o_{k,\mathbb{P}}(1), \tag{2.5.17}
\]
by (2.5.12). This proves (2.5.11). Finally,

\[
\frac{1}{n^2} \sum_{i\geq 1} |\mathscr{C}_{(i)}|^2 \mathbf{1}_{\{|\mathscr{C}_{(i)}| < k\}}
\leq k\, \frac{1}{n^2} \sum_{i\geq 1} |\mathscr{C}_{(i)}| \mathbf{1}_{\{|\mathscr{C}_{(i)}| < k\}} \leq \frac{k}{n}, \tag{2.5.18}
\]
so that
\[
\frac{1}{n^2} \sum_{i\geq 1} |\mathscr{C}_{(i)}|^2 = \frac{1}{n^2} \sum_{i\geq 1} |\mathscr{C}_{(i)}|^2 \mathbf{1}_{\{|\mathscr{C}_{(i)}|\geq k\}} + O\Big(\frac{k}{n}\Big), \tag{2.5.19}
\]
which, together with (2.5.11), completes the proof of (2.5.10).
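The limit (2.5.10) is easy to explore numerically. The sketch below is our own illustration, not part of the text: it samples $\mathrm{ER}_n(\lambda/n)$ with a pure-Python union-find (the helper names and the parameters $n = 3000$, $\lambda = 2$ are our choices), computes $\sum_i |\mathscr{C}_{(i)}|^2/n^2$, and compares it with $\zeta_\lambda^2$, where $\zeta_\lambda$ solves $\zeta = 1 - \mathrm{e}^{-\lambda\zeta}$.

```python
import math
import random

def er_cluster_sizes(n, lam, rng):
    """Sample ER_n(lam/n) and return connected-component sizes, largest first,
    using union-find with path compression."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    p = lam / n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv

    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return sorted(sizes.values(), reverse=True)

def survival_probability(lam, iterations=200):
    """Fixed-point iteration for zeta = 1 - exp(-lam * zeta)."""
    z = 0.5
    for _ in range(iterations):
        z = 1.0 - math.exp(-lam * z)
    return z

n, lam = 3000, 2.0
sizes = er_cluster_sizes(n, lam, random.Random(42))
zeta = survival_probability(lam)          # about 0.797 for lam = 2
s2 = sum(s * s for s in sizes) / n**2     # should be close to zeta**2
```

At this size, $|\mathscr{C}_{\max}|/n$ and $s_2$ typically land within a few percent of $\zeta_\lambda$ and $\zeta_\lambda^2$, while the second-largest cluster is tiny, in line with Theorem 2.26.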

We are now ready to complete the proof of Theorem 2.26, and we start by explaining the ideas behind it. Define the (possibly random) probability measure $(q_{i,n})_{i\geq 1}$ by
\[
q_{i,n} = \frac{|\mathscr{C}_{(i)}| \mathbf{1}_{\{|\mathscr{C}_{(i)}|\geq k\}}}{\sum_{j\geq 1} |\mathscr{C}_{(j)}| \mathbf{1}_{\{|\mathscr{C}_{(j)}|\geq k\}}}. \tag{2.5.20}
\]
By (2.5.12), $\sum_{j\geq 1} |\mathscr{C}_{(j)}| \mathbf{1}_{\{|\mathscr{C}_{(j)}|\geq k\}}/n = \zeta + o_{k,\mathbb{P}}(1)$, and thus, by (2.5.11), $\sum_{i\geq 1} q_{i,n}^2 = 1 + o_{k,\mathbb{P}}(1)$. We conclude that we have a probability mass function whose sum of squares is close to 1. Since $\sum_{i\geq 1} q_{i,n}^2 \leq \max_{i\geq 1} q_{i,n} \sum_{i\geq 1} q_{i,n} = \max_{i\geq 1} q_{i,n}$, this is only possible when $\max_{i\geq 1} q_{i,n} = 1 + o_{k,\mathbb{P}}(1)$. Since $i \mapsto q_{i,n}$ is non-increasing, this means that
\[
q_{1,n} = \frac{|\mathscr{C}_{\max}| \mathbf{1}_{\{|\mathscr{C}_{\max}|\geq k\}}}{\sum_{j\geq 1} |\mathscr{C}_{(j)}| \mathbf{1}_{\{|\mathscr{C}_{(j)}|\geq k\}}} = 1 + o_{k,\mathbb{P}}(1),
\]
which, together with $\sum_{j\geq 1} |\mathscr{C}_{(j)}| \mathbf{1}_{\{|\mathscr{C}_{(j)}|\geq k\}} = n\zeta(1 + o_{k,\mathbb{P}}(1))$, implies that $|\mathscr{C}_{\max}|/n \xrightarrow{\;\mathbb{P}\;} \zeta$. Further, $q_{2,n} = o_{k,\mathbb{P}}(1)$, which implies that $|\mathscr{C}_{(2)}|/n \xrightarrow{\;\mathbb{P}\;} 0$. This is what we aim to prove. We now fill in the details:

Proof of Theorem 2.26. By Lemma 2.27,
\[
\frac{1}{n^2} \sum_{i\geq 1} |\mathscr{C}_{(i)}|^2 \mathbf{1}_{\{|\mathscr{C}_{(i)}|\geq k\}} = \zeta^2 + o_{k,\mathbb{P}}(1). \tag{2.5.21}
\]
Denote $x_{i,n} = |\mathscr{C}_{(i)}| \mathbf{1}_{\{|\mathscr{C}_{(i)}|\geq k\}}/n$. We wish to show that $x_{1,n} = \zeta + o_{k,\mathbb{P}}(1)$. Clearly $x_{1,n} \leq \zeta + o_{k,\mathbb{P}}(1)$. Since $x_{1,n}$ is bounded, it has a subsequence that converges in probability. Suppose, by way of contradiction, that along such a subsequence $x_{1,n} = |\mathscr{C}_{(1)}| \mathbf{1}_{\{|\mathscr{C}_{(1)}|\geq k\}}/n \xrightarrow{\;\mathbb{P}\;} a_k$, where $a_k \to a = \zeta - \delta$ for some $\delta > 0$, so that $x_{1,n} = \zeta - \delta + o_{k,\mathbb{P}}(1)$. Let us work along this subsequence. Then, also using (2.5.21),
\[
\sum_{i\geq 2} x_{i,n}^2 = \sum_{i\geq 1} x_{i,n}^2 - x_{1,n}^2 = \zeta^2 - (\zeta-\delta)^2 + o_{k,\mathbb{P}}(1) = \delta(2\zeta-\delta) + o_{k,\mathbb{P}}(1). \tag{2.5.22}
\]

Also, $\sum_{i\geq 2} x_{i,n} = \delta + o_{k,\mathbb{P}}(1)$. As a result,
\[
\delta^2 + o_{k,\mathbb{P}}(1) = \Big(\sum_{i\geq 2} x_{i,n}\Big)^2 = \sum_{i\geq 2} x_{i,n}^2 + \sum_{i,j\geq 2 \colon i\neq j} x_{i,n} x_{j,n}. \tag{2.5.23}
\]
Clearly, by our main assumption (2.5.7),
\[
\sum_{i,j\geq 2 \colon i\neq j} x_{i,n} x_{j,n} \leq \Big(\sum_{i\geq 1} x_{i,n}\Big)^2 - \sum_{i\geq 1} x_{i,n}^2 = o_{k,\mathbb{P}}(1), \tag{2.5.24}
\]
so that combining (2.5.22)–(2.5.24) leads to $\delta^2 = \delta(2\zeta - \delta)$, which gives $\delta = \zeta$, which in turn implies $x_{1,n} = o_{k,\mathbb{P}}(1)$. By (2.5.11) and $x_{i,n} \leq x_{1,n}$, we get $\zeta = 0$, contradicting the assumption $\zeta > 0$ in the statement of the theorem.

2.5.2 Properties of the giant

We next extend Theorem 2.26 somewhat, and investigate the structure of the giant. For this, we first let $v_\ell(\mathscr{C}_{\max})$ denote the number of vertices with degree $\ell$ in the giant component, and we recall that $E(\mathscr{C}_{\max})$ denotes the set of edges in the giant component:

Theorem 2.28 (Properties of the giant) Under the assumptions of Theorem 2.26, when $\zeta = \mu(|\mathscr{C}(o)| = \infty) > 0$,
\[
\frac{v_\ell(\mathscr{C}_{\max})}{n} \xrightarrow{\;\mathbb{P}\;} \mu(|\mathscr{C}(o)| = \infty,\, d_o = \ell). \tag{2.5.25}
\]
Further, assume that $D_n = d^{(n)}_{o_n}$ is uniformly integrable. Then,
\[
\frac{|E(\mathscr{C}_{\max})|}{n} \xrightarrow{\;\mathbb{P}\;} \tfrac{1}{2} \mathbb{E}_\mu\big[d_o \mathbf{1}_{\{|\mathscr{C}(o)|=\infty\}}\big]. \tag{2.5.26}
\]


Proof We now define, for $k \geq 1$ and $A \subseteq \mathbb{N}$, and with $d_v$ the degree of $v$ in $G_n$,
\[
Z_{A,\geq k} = \sum_{v\in[n]} \mathbf{1}_{\{|\mathscr{C}(v)|\geq k,\, d_v\in A\}}. \tag{2.5.27}
\]
Recall that, by assumption, $G_n$ converges in probability in the local weak sense to $(G, o)$. Then, we conclude that, with $\zeta_{A,\geq k} = \mu(|\mathscr{C}(o)| \geq k, d_o \in A)$,
\[
\frac{Z_{A,\geq k}}{n} \xrightarrow{\;\mathbb{P}\;} \zeta_{A,\geq k}. \tag{2.5.28}
\]
Since $|\mathscr{C}_{\max}| \geq k$ whp, we thus obtain, for every $A \subseteq \mathbb{N}$,
\[
\frac{1}{n} \sum_{a\in A} v_a(\mathscr{C}_{\max}) \leq \frac{Z_{A,\geq k}}{n} \xrightarrow{\;\mathbb{P}\;} \mu(|\mathscr{C}(o)| \geq k, d_o \in A). \tag{2.5.29}
\]

Applying this to $A = \{\ell\}^c$, we obtain that, for all $\varepsilon > 0$,
\[
\lim_{n\to\infty} \mathbb{P}\Big(\frac{1}{n}\big[|\mathscr{C}_{\max}| - v_\ell(\mathscr{C}_{\max})\big] \leq \mu(|\mathscr{C}(o)| \geq k, d_o \neq \ell) + \varepsilon/2\Big) = 1. \tag{2.5.30}
\]
We argue by contradiction. Suppose that, for some $\ell$,
\[
\liminf_{n\to\infty} \mathbb{P}\Big(\frac{v_\ell(\mathscr{C}_{\max})}{n} \leq \mu(|\mathscr{C}(o)| = \infty, d_o = \ell) - \varepsilon\Big) = \kappa > 0. \tag{2.5.31}
\]
Then, along the subsequence $(n_l)_{l\geq 1}$ that attains the liminf in (2.5.31), with strictly positive probability, and using (2.5.30),
\[
\frac{|\mathscr{C}_{\max}|}{n} = \frac{1}{n}\big[|\mathscr{C}_{\max}| - v_\ell(\mathscr{C}_{\max})\big] + \frac{v_\ell(\mathscr{C}_{\max})}{n} \leq \mu(|\mathscr{C}(o)| \geq k) - \varepsilon/2, \tag{2.5.32}
\]
which contradicts Theorem 2.26. We conclude that (2.5.31) cannot hold, so that (2.5.25) follows.

For (2.5.26), we note that
\[
|E(\mathscr{C}_{\max})| = \frac{1}{2} \sum_{\ell\geq 1} \ell\, v_\ell(\mathscr{C}_{\max}). \tag{2.5.33}
\]
We divide by $n$ and split the sum over $\ell$ into small and large $\ell$ as
\[
\frac{|E(\mathscr{C}_{\max})|}{n} = \frac{1}{2n} \sum_{\ell\in[K]} \ell\, v_\ell(\mathscr{C}_{\max}) + \frac{1}{2n} \sum_{\ell > K} \ell\, v_\ell(\mathscr{C}_{\max}). \tag{2.5.34}
\]

For the first term in (2.5.34), by (2.5.25),
\[
\frac{1}{2n} \sum_{\ell\in[K]} \ell\, v_\ell(\mathscr{C}_{\max}) \xrightarrow{\;\mathbb{P}\;} \frac{1}{2} \sum_{\ell\in[K]} \ell\, \mu(|\mathscr{C}(o)| = \infty, d_o = \ell)
= \tfrac{1}{2} \mathbb{E}_\mu\big[d_o \mathbf{1}_{\{|\mathscr{C}(o)|=\infty,\, d_o\in[K]\}}\big]. \tag{2.5.35}
\]

For the second term in (2.5.34), we bound, with $n_\ell$ the number of vertices in $G_n$ of degree $\ell$,
\[
\frac{1}{2n} \sum_{\ell > K} \ell\, v_\ell(\mathscr{C}_{\max}) \leq \frac{1}{2} \sum_{\ell > K} \ell\, \frac{n_\ell}{n} = \frac{1}{2} \mathbb{E}\big[d^{(n)}_{o_n} \mathbf{1}_{\{d^{(n)}_{o_n} > K\}} \,\big|\, G_n\big]. \tag{2.5.36}
\]


By uniform integrability,
\[
\lim_{K\to\infty} \limsup_{n\to\infty} \mathbb{E}\big[d^{(n)}_{o_n} \mathbf{1}_{\{d^{(n)}_{o_n} > K\}}\big] = 0. \tag{2.5.37}
\]
As a result, for every $\varepsilon > 0$, there exists a $K = K(\varepsilon) < \infty$ such that
\[
\mathbb{P}\big(\mathbb{E}_n\big[d^{(n)}_{o_n} \mathbf{1}_{\{d^{(n)}_{o_n} > K\}}\big] > \varepsilon\big) \to 0, \tag{2.5.38}
\]
where $\mathbb{E}_n$ denotes conditional expectation given $G_n$. This completes the proof of (2.5.26).

It is not hard to extend the above analysis to the local convergence in probability of the giant, as well as of its complement, as formulated in the following theorem:

Theorem 2.29 (Local limit of the giant) Under the assumptions of Theorem 2.26, when $\zeta = \mu(|\mathscr{C}(o)| = \infty) > 0$,
\[
\frac{1}{n} \sum_{v\in\mathscr{C}_{\max}} \mathbf{1}_{\{B_r^{(G_n)}(v) \simeq H_\star\}} \xrightarrow{\;\mathbb{P}\;} \mu(|\mathscr{C}(o)| = \infty,\, B_r^{(G)}(o) \simeq H_\star), \tag{2.5.39}
\]
and
\[
\frac{1}{n} \sum_{v\notin\mathscr{C}_{\max}} \mathbf{1}_{\{B_r^{(G_n)}(v) \simeq H_\star\}} \xrightarrow{\;\mathbb{P}\;} \mu(|\mathscr{C}(o)| < \infty,\, B_r^{(G)}(o) \simeq H_\star). \tag{2.5.40}
\]

Proof The convergence in (2.5.40) follows from that in (2.5.39), combined with the fact that, by assumption,
\[
\frac{1}{n} \sum_{v\in V(G_n)} \mathbf{1}_{\{B_r^{(G_n)}(v) \simeq H_\star\}} \xrightarrow{\;\mathbb{P}\;} \mu(B_r^{(G)}(o) \simeq H_\star). \tag{2.5.41}
\]
The convergence in (2.5.39) can be proved as for Theorem 2.28, now using that
\[
\frac{1}{n} Z_{H_\star,\geq k} = \frac{1}{n} \sum_{v\in[n]} \mathbf{1}_{\{|\mathscr{C}(v)|\geq k,\, B_r^{(G_n)}(v) \simeq H_\star\}} \xrightarrow{\;\mathbb{P}\;} \mu(|\mathscr{C}(o)| \geq k,\, B_r^{(G)}(o) \simeq H_\star), \tag{2.5.42}
\]
and, since $|\mathscr{C}_{\max}|/n \xrightarrow{\;\mathbb{P}\;} \zeta > 0$, whp
\[
\frac{1}{n} \sum_{v\in\mathscr{C}_{\max}} \mathbf{1}_{\{B_r^{(G_n)}(v) \simeq H_\star\}} \leq \frac{1}{n} Z_{H_\star,\geq k}. \tag{2.5.43}
\]
We leave the details to the reader.

2.5.3 The condition (2.5.7) revisited

The condition (2.5.7) is sometimes not so easy to verify, and we now give an alternative form that is often easier to check. That is the content of the following lemma:


Lemma 2.30 (Condition (2.5.7) revisited) Under the assumptions of Theorem 2.26, the condition in (2.5.7) holds when
\[
\lim_{r\to\infty} \limsup_{n\to\infty} \frac{1}{n^2} \mathbb{E}\big[\#\big\{(x,y) \in V(G_n)^2 : |\partial B_r(x)|, |\partial B_r(y)| \geq r,\; x \not\longleftrightarrow y\big\}\big] = 0, \tag{2.5.44}
\]
and when there exists $r = r_k \to \infty$ such that, as $k \to \infty$,
\[
\mu(|\mathscr{C}(o)| \geq k,\, |\partial B_{r_k}(o)| < r_k) \to 0, \qquad \mu(|\mathscr{C}(o)| < k,\, |\partial B_{r_k}(o)| \geq r_k) \to 0. \tag{2.5.45}
\]

Proof Denote
\[
P_k = \#\big\{(x,y) \in V(G_n)^2 : |\mathscr{C}(x)|, |\mathscr{C}(y)| \geq k,\; x \not\longleftrightarrow y\big\}, \tag{2.5.46}
\]
\[
P_r^{(2)} = \#\big\{(x,y) \in V(G_n)^2 : |\partial B_r(x)|, |\partial B_r(y)| \geq r,\; x \not\longleftrightarrow y\big\}. \tag{2.5.47}
\]
Then,
\[
|P_k - P_r^{(2)}| \leq 2n\big[Z_{<r,\geq k} + Z_{\geq r,<k}\big], \tag{2.5.48}
\]
where
\[
Z_{<r,\geq k} = \sum_{v\in V(G_n)} \mathbf{1}_{\{|\partial B_r(v)| < r,\, |\mathscr{C}(v)|\geq k\}}, \qquad
Z_{\geq r,<k} = \sum_{v\in V(G_n)} \mathbf{1}_{\{|\partial B_r(v)| \geq r,\, |\mathscr{C}(v)| < k\}}. \tag{2.5.49}
\]
Therefore, by local convergence in probability,
\[
\frac{1}{n^2} |P_k - P_r^{(2)}| \leq \frac{2}{n}\big[Z_{<r,\geq k} + Z_{\geq r,<k}\big]
\xrightarrow{\;\mathbb{P}\;} 2\mu(|\mathscr{C}(o)| \geq k,\, |\partial B_r(o)| < r) + 2\mu(|\mathscr{C}(o)| < k,\, |\partial B_r(o)| \geq r). \tag{2.5.50}
\]
Take $r = r_k$ as in (2.5.45). Then, the right-hand side vanishes, so that, by Dominated Convergence [Volume 1, Theorem A.1], also
\[
\lim_{k\to\infty} \limsup_{n\to\infty} \frac{1}{n^2} \mathbb{E}\big[|P_k - P_{r_k}^{(2)}|\big] = 0. \tag{2.5.51}
\]

We arrive at
\[
\lim_{k\to\infty} \limsup_{n\to\infty} \frac{1}{n^2} \mathbb{E}\big[P_k\big] \leq \lim_{k\to\infty} \limsup_{n\to\infty} \frac{1}{n^2} \mathbb{E}\big[P_{r_k}^{(2)}\big] = 0, \tag{2.5.52}
\]
by (2.5.44) and since $r_k \to \infty$ when $k \to \infty$.

The assumption in (2.5.45) is often easily verified. For example, for the Erdős–Rényi random graph $\mathrm{ER}_n(\lambda/n)$ with $\lambda > 1$, to which we will apply it below, we can take $r = k$ and use that, on the event of survival (recall [Volume 1, Theorem 3.9]),
\[
\lambda^{-r} |\partial B_r^{(G)}(o)| \xrightarrow{\;a.s.\;} M. \tag{2.5.53}
\]
Here $(G, o)$ denotes a Poisson branching process with mean offspring $\lambda$, and $M > 0$ on the event of survival by [Volume 1, Theorem 3.10]. Therefore, $\mu(|\mathscr{C}(o)| \geq k,\, |\partial B_k^{(G)}(o)| < k) \to 0$ as $k \to \infty$. Further, $\mu(|\mathscr{C}(o)| < k,\, |\partial B_k^{(G)}(o)| \geq k) = 0$ trivially. However, there are examples where (2.5.45) fails, and then the equivalence of (2.5.7) and (2.5.44) may also fail.


2.5.4 The giant in Erdős–Rényi random graphs

In this section, we use the local convergence in probability of $\mathrm{ER}_n(\lambda/n)$ in Theorem 2.16, combined with the fact that the ‘giant is almost local’ in Theorem 2.26, to identify the phase transition and size of the giant in the Erdős–Rényi random graph $\mathrm{ER}_n(\lambda/n)$:

Theorem 2.31 (Phase transition in the Erdős–Rényi random graph) Fix $\lambda > 0$, and let $\mathscr{C}_{\max}$ be the largest connected component of the Erdős–Rényi random graph $\mathrm{ER}_n(\lambda/n)$, and $\mathscr{C}_{(2)}$ the second largest connected component (breaking ties arbitrarily). Then,
\[
\frac{|\mathscr{C}_{\max}|}{n} \xrightarrow{\;\mathbb{P}\;} \zeta_\lambda, \qquad \frac{|\mathscr{C}_{(2)}|}{n} \xrightarrow{\;\mathbb{P}\;} 0, \tag{2.5.54}
\]
where $\zeta_\lambda$ is the survival probability of a Poisson branching process with mean offspring $\lambda$. In particular, $\zeta_\lambda > 0$ precisely when $\lambda > 1$.

Further, for $\lambda > 0$, with $\eta_\lambda = 1 - \zeta_\lambda$ and for all $\ell \geq 0$,
\[
\frac{v_\ell(\mathscr{C}_{\max})}{n} \xrightarrow{\;\mathbb{P}\;} \mathrm{e}^{-\lambda} \frac{\lambda^\ell}{\ell!}\big[1 - \eta_\lambda^\ell\big], \tag{2.5.55}
\]
and
\[
\frac{|E(\mathscr{C}_{\max})|}{n} \xrightarrow{\;\mathbb{P}\;} \tfrac{1}{2}\lambda\big[1 - \eta_\lambda^2\big]. \tag{2.5.56}
\]

The law of large numbers for the giant in Theorem 2.31 for $\lambda > 1$ has also been proved in [Volume 1, Theorem 4.8], where a more precise bound was given on the convergence rate. There, the proof was given using explicit computations; here, we show that it follows rather directly from local weak convergence considerations. We refer to [Volume 1, Section 4.6] for a discussion of the history of the phase transition for $\mathrm{ER}_n(\lambda/n)$.
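The limits in (2.5.54)–(2.5.56) can be sanity-checked numerically from the fixed-point characterization $\eta_\lambda = \mathrm{e}^{-\lambda(1-\eta_\lambda)}$ alone, without any graph sampling. The sketch below is our own (the choice $\lambda = 2$ and the truncation at degree 200 are arbitrary): it iterates the fixed point and verifies that the degree masses of (2.5.55) sum to $\zeta_\lambda$ and that twice the edge limit of (2.5.56) equals $\lambda[1 - \eta_\lambda^2]$.

```python
import math

lam = 2.0

# Extinction probability: eta solves eta = exp(-lam * (1 - eta)).
eta = 0.5
for _ in range(200):
    eta = math.exp(-lam * (1.0 - eta))
zeta = 1.0 - eta  # survival probability zeta_lambda

# v_ell(Cmax)/n limits from (2.5.55); Poisson weights computed recursively
# to avoid overflow in lam**l / l!.
v = []
poisson_mass = math.exp(-lam)  # P(Poisson(lam) = 0)
for l in range(200):
    if l > 0:
        poisson_mass *= lam / l
    v.append(poisson_mass * (1.0 - eta**l))

total_mass = sum(v)                                      # should equal zeta
edge_limit = 0.5 * sum(l * t for l, t in enumerate(v))   # limit in (2.5.56)
```

The checks reproduce the computation in (2.5.58) below: summing $\ell$ against the Poisson weights collapses, via $\eta_\lambda = \mathrm{e}^{-\lambda(1-\eta_\lambda)}$, to $\lambda[1-\eta_\lambda^2]$.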

Proof The main work resides in showing that the condition (2.5.7) in Theorem 2.26 holds. In turn, (2.5.7) in Theorem 2.26 can be replaced by (2.5.44) in Lemma 2.30, which turns out to be more convenient as stated there.

Indeed, local convergence in probability follows from Theorem 2.16. Then, the claim in (2.5.54) follows directly from Theorem 2.26, while (2.5.55)–(2.5.56) follow from Theorem 2.28 and the observations that, for a Poisson branching process with mean offspring $\lambda > 1$,
\[
\mu(|\mathscr{C}(o)| = \infty,\, d_o = \ell) = \mathrm{e}^{-\lambda} \frac{\lambda^\ell}{\ell!}\big[1 - \eta_\lambda^\ell\big], \tag{2.5.57}
\]
and thus
\[
\mathbb{E}_\mu\big[d_o \mathbf{1}_{\{|\mathscr{C}(o)|=\infty\}}\big]
= \sum_\ell \ell\, \mu(|\mathscr{C}(o)| = \infty,\, d_o = \ell)
= \sum_\ell \ell\, \mathrm{e}^{-\lambda} \frac{\lambda^\ell}{\ell!}\big[1 - \eta_\lambda^\ell\big]
= \lambda\big[1 - \eta_\lambda \mathrm{e}^{-\lambda(1-\eta_\lambda)}\big]
= \lambda\big[1 - \eta_\lambda^2\big], \tag{2.5.58}
\]


since $\eta_\lambda$ satisfies $\eta_\lambda = \mathrm{e}^{-\lambda(1-\eta_\lambda)}$. Therefore, we are left to show that (2.5.44) holds. The proof proceeds in several steps.

Step 1: Concentration of binomials

Note that, with $o_1, o_2 \in [n]$ chosen independently and uniformly at random,
\[
\frac{1}{n^2} \mathbb{E}\big[\#\big\{(x,y) \in V(G_n)^2 : |\partial B_r(x)|, |\partial B_r(y)| \geq r,\; x \not\longleftrightarrow y\big\}\big]
= \mathbb{P}(|\partial B_r(o_1)|, |\partial B_r(o_2)| \geq r,\; o_1 \not\longleftrightarrow o_2). \tag{2.5.59}
\]

We let $\mathbb{P}_r$ denote the conditional distribution of $\mathrm{ER}_n(\lambda/n)$ given $|\partial B_r(o_i)| = b_0^{(i)}$ with $b_0^{(i)} \geq r$, and $|B_{r-1}(o_i)| = s_0^{(i)}$, for $i \in \{1,2\}$, so that
\[
\mathbb{P}(|\partial B_r(o_1)|, |\partial B_r(o_2)| \geq r,\; o_1 \not\longleftrightarrow o_2)
= \sum_{b_0^{(1)}, b_0^{(2)}, s_0^{(1)}, s_0^{(2)}} \mathbb{P}_r\big(o_1 \not\longleftrightarrow o_2\big)\,
\mathbb{P}\big(|\partial B_r(o_i)| = b_0^{(i)},\, |B_{r-1}(o_i)| = s_0^{(i)},\, i \in \{1,2\}\big). \tag{2.5.60}
\]

Our aim is to show that, for every $\varepsilon > 0$, we can find $r = r_\varepsilon$ such that, for every $b_0^{(1)}, b_0^{(2)} \geq r$ and $s_0^{(1)}, s_0^{(2)}$ fixed,
\[
\limsup_{n\to\infty} \mathbb{P}_r\big(o_1 \not\longleftrightarrow o_2\big) \leq \varepsilon. \tag{2.5.61}
\]
Under $\mathbb{P}_r$,
\[
|\partial B_{r+1}(o_1)| \sim \mathrm{Bin}(n_1^{(1)}, p_1^{(1)}), \tag{2.5.62}
\]
where
\[
n_1^{(1)} = n - b_0^{(1)} - s_0^{(1)} - s_0^{(2)}, \qquad p_1^{(1)} = 1 - \Big(1 - \frac{\lambda}{n}\Big)^{b_0^{(1)}}. \tag{2.5.63}
\]
Here, we note that the vertices in $\partial B_r(o_2)$ play a different role than those in $\partial B_r(o_1)$, as they can be in $\partial B_{r+1}(o_1)$, but those in $\partial B_r(o_1)$ cannot. This explains the slightly asymmetric form with respect to vertices 1 and 2 in (2.5.63). We are led to studying concentration properties of binomial random variables. For this, we will rely on the following lemma, which follows from [Volume 1, Theorem 2.21]:

Lemma 2.32 (Concentration of binomials) Let $X \sim \mathrm{Bin}(m, p)$. Then, for every $\delta > 0$,
\[
\mathbb{P}\big(\big|X - \mathbb{E}[X]\big| \geq \delta \mathbb{E}[X]\big) \leq 2 \exp\Big(-\frac{\delta^2 \mathbb{E}[X]}{2(1 + \delta/3)}\Big). \tag{2.5.64}
\]

Proof This is a direct consequence of [Volume 1, Theorem 2.21].
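The bound (2.5.64) is easy to check by simulation. The sketch below is our own illustration (the values of $m$, $p$, $\delta$ and the number of trials are arbitrary choices): it estimates the two-sided deviation probability of a $\mathrm{Bin}(m, p)$ variable by Monte Carlo and compares it with the right-hand side of (2.5.64).

```python
import math
import random

rng = random.Random(7)
m, p, delta = 200, 0.3, 0.25
mean = m * p
trials = 5000

# Monte Carlo estimate of P(|X - E[X]| >= delta * E[X]) for X ~ Bin(m, p),
# sampling X as a sum of Bernoulli(p) indicators.
hits = 0
for _ in range(trials):
    x = sum(1 for _ in range(m) if rng.random() < p)
    if abs(x - mean) >= delta * mean:
        hits += 1
empirical = hits / trials

# Right-hand side of (2.5.64).
bound = 2.0 * math.exp(-delta**2 * mean / (2.0 * (1.0 + delta / 3.0)))
```

As expected from a Chernoff-type bound, the empirical tail probability sits well below `bound`; the bound is far from tight but decays exponentially in the mean, which is all the proof below needs.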

Lemma 2.32 ensures that whp $|\partial B_{r+1}(o_1)|$ is close to $\lambda|\partial B_r(o_1)|$, so that the boundary grows by a factor $\lambda > 1$. Further applications lead to the statement that $|\partial B_{r+k}(o_1)| \approx \lambda^k |\partial B_r(o_1)|$. Thus, in roughly $a \log_\lambda n$ steps, the boundary will have expanded to $n^a$ vertices. However, in order to make this precise, we need that (1) the sum of complementary probabilities in Lemma 2.32 is still quite small; and (2) we have good control over the number of vertices in the boundaries, not just in terms of lower bounds, but also in terms of upper bounds, as that will give control over the number of vertices that have not yet been used. For the latter, we will also need to deal with the $\varepsilon$-dependence in (2.5.64).

We will prove (2.5.61) by first growing $|\partial B_{r+k}(o_1)|$ for $k \geq 1$ until $|\partial B_{r+k}(o_1)|$ is very large (much larger than $\sqrt{n}$ will do), and then, outside of $B_{r+k}(o_1)$ for the appropriate $k$, growing $|\partial B_{r+k}(o_2)|$ for $k \geq 1$ until also $|\partial B_{r+k}(o_2)|$ is very large (now $\sqrt{n}$ will do). Then, it is very likely that there is a direct edge between the resulting boundaries. We next provide the details.
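The factor-$\lambda$ growth per step is exactly the local branching-process picture. The sketch below is our own illustration (parameter choices arbitrary): it simulates generation sizes $Z_k$ of a Galton–Watson tree with Poisson($\lambda$) offspring, whose means satisfy $\mathbb{E}[Z_k] = \lambda^k$, mirroring $|\partial B_{r+k}(o_1)| \approx \lambda^k |\partial B_r(o_1)|$ in the regime where depletion is negligible.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplicative Poisson sampler (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def generation_sizes(lam, depth, rng):
    """Sizes Z_0, ..., Z_depth of a Galton-Watson tree, Poisson(lam) offspring."""
    sizes = [1]
    for _ in range(depth):
        sizes.append(sum(poisson(lam, rng) for _ in range(sizes[-1])))
    return sizes

rng = random.Random(1)
lam, depth, runs = 2.0, 5, 10000
totals = [0] * (depth + 1)
for _ in range(runs):
    for k, z in enumerate(generation_sizes(lam, depth, rng)):
        totals[k] += z
means = [t / runs for t in totals]  # means[k] should be close to lam**k
```

Averaged over many runs, `means[k]` tracks $\lambda^k$, and the ratio of consecutive means is close to $\lambda$, exactly the growth factor exploited in Steps 2 and 3 below.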

Step 2: Erdős–Rényi neighborhood growth of the first vertex

To make the above analysis precise, we start by introducing some notation. We let $\bar{b}_k^{(1)} = b_0^{(1)} [\lambda(1+\varepsilon)]^k$ and $\underline{b}_k^{(1)} = b_0^{(1)} [\lambda(1-\varepsilon)]^k$ denote the upper and lower bounds on $|\partial B_{r+k}(o_1)|$, and let
\[
\bar{s}_{k-1}^{(1)} = s_0^{(1)} + \sum_{l=0}^{k-1} \bar{b}_l^{(1)} \tag{2.5.65}
\]
denote the resulting upper bound on $|B_{r+k-1}(o_1)|$. We let $k \leq k_\varepsilon = k_\varepsilon(n) = a \log_{\lambda(1-\varepsilon)} n$, and note that
\[
\bar{s}_{k-1}^{(i)} \leq s_0^{(i)} + \frac{b_0^{(i)}}{\lambda(1+\varepsilon) - 1}\, n^{a(1+\varepsilon)/(1-\varepsilon)} \tag{2.5.66}
\]
uniformly in $k \leq k_\varepsilon$. We will choose $a \in (\tfrac12, 1)$ such that $a(1+\varepsilon)/(1-\varepsilon) \in (\tfrac12, 1)$.

Define the good event by
\[
\mathcal{E}_{r,[k]}^{(1)} = \bigcap_{l\in[k]} \mathcal{E}_{r,l}^{(1)}, \qquad \text{where } \mathcal{E}_{r,k}^{(1)} = \big\{\underline{b}_k^{(1)} \leq |\partial B_{r+k}(o_1)| \leq \bar{b}_k^{(1)}\big\}. \tag{2.5.67}
\]

We then bound, for $k = k_\varepsilon$,
\[
\mathbb{P}_r\big(\mathcal{E}_{r,[k]}^{(1)}\big) = \prod_{l\in[k]} \mathbb{P}_r\big(\mathcal{E}_{r,l}^{(1)} \mid \mathcal{E}_{r,[l-1]}^{(1)}\big), \tag{2.5.68}
\]
so that
\[
\mathbb{P}_r\big(\mathcal{E}_{r,[k]}^{(1)}\big) \geq 1 - \sum_{l\in[k]} \mathbb{P}_r\big((\mathcal{E}_{r,l}^{(1)})^c \mid \mathcal{E}_{r,[l-1]}^{(1)}\big). \tag{2.5.69}
\]

With the above choices, we note that, conditionally on $|\partial B_{r+l-1}(o_1)| = b_{l-1}^{(1)} \in [\underline{b}_{l-1}^{(1)}, \bar{b}_{l-1}^{(1)}]$ and $|B_{r+l-1}(o_1)| = s_{l-1}^{(1)} \leq \bar{s}_{l-1}^{(1)}$,
\[
|\partial B_{r+l}(o_1)| \sim \mathrm{Bin}(n_l^{(1)}, p_l^{(1)}), \tag{2.5.70}
\]
where
\[
n_l^{(1)} = n - s_{l-1}^{(1)} - s_0^{(2)}, \qquad p_l^{(1)} = 1 - \Big(1 - \frac{\lambda}{n}\Big)^{b_{l-1}^{(1)}}. \tag{2.5.71}
\]


We aim to apply Lemma 2.32 to $|\partial B_{r+l}(o_1)|$ with $\delta = \varepsilon/2$, for which it suffices to prove bounds on the (conditional) expectation $n_l^{(1)} p_l^{(1)}$. We use that
\[
\frac{b_{l-1}^{(1)} \lambda}{n} - \frac{(b_{l-1}^{(1)} \lambda)^2}{2n^2} \leq p_l^{(1)} \leq \frac{b_{l-1}^{(1)} \lambda}{n}. \tag{2.5.72}
\]
Therefore,
\[
n_l^{(1)} p_l^{(1)} \leq n\, \frac{b_{l-1}^{(1)} \lambda}{n} = \lambda b_{l-1}^{(1)}, \tag{2.5.73}
\]
which provides the upper bound on $n_l^{(1)} p_l^{(1)}$. For the lower bound, we use the lower bound in (2.5.72) to note that $p_l^{(1)} \geq (1 - \varepsilon/4)\lambda b_{l-1}^{(1)}/n$, since we are on $\mathcal{E}_{r,[l-1]}^{(1)}$. Further, $n_l^{(1)} \geq (1 - \varepsilon/4)n$ on $\mathcal{E}_{r,[l-1]}^{(1)}$, uniformly in $l \leq k_\varepsilon$. We conclude that
\[
n_l^{(1)} p_l^{(1)} \geq (1 - \varepsilon/4)^2\, n\, \frac{b_{l-1}^{(1)} \lambda}{n} \geq (1 - \varepsilon/2)\lambda b_{l-1}^{(1)}. \tag{2.5.74}
\]

By Lemma 2.32 with $\delta = \varepsilon/2$, therefore,
\[
\mathbb{P}_r\big((\mathcal{E}_{r,l}^{(1)})^c \mid \mathcal{E}_{r,[l-1]}^{(1)}\big)
\leq \mathbb{P}_r\Big(\big||\partial B_{r+l}(o_1)| - n_l^{(1)} p_l^{(1)}\big| \geq (\varepsilon/2)\, n_l^{(1)} p_l^{(1)} \,\Big|\, \mathcal{E}_{r,[l-1]}^{(1)}\Big) \tag{2.5.75}
\]
\[
\leq 2 \exp\Big(-\frac{\varepsilon^2 (1-\varepsilon/2) \lambda \underline{b}_{l-1}^{(1)}}{8(1+\varepsilon/6)}\Big) = 2 \exp\big(-q \lambda \underline{b}_{l-1}^{(1)}\big),
\]
where $q = \varepsilon^2(1-\varepsilon/2)/[8(1+\varepsilon/6)] > 0$. We conclude that
\[
\mathbb{P}_r\big(\mathcal{E}_{r,[k]}^{(1)}\big) \geq 1 - 2\sum_{l\in[k]} \mathrm{e}^{-q\lambda \underline{b}_{l-1}^{(1)}}. \tag{2.5.76}
\]

Step 3: Erdős–Rényi neighborhood growth of the second vertex

We next grow the neighborhoods from vertex $o_2$ in a similar way, and we focus on the differences only. In the whole argument below, we condition on $|\partial B_{r+l}(o_1)| = b_l^{(1)} \in [\underline{b}_l^{(1)}, \bar{b}_l^{(1)}]$ and $|B_{r+l}(o_1)| = s_l^{(1)} \leq \bar{s}_l^{(1)}$ for all $l \leq k_\varepsilon$. Further, rather than exploring $(\partial B_{r+k}(o_2))_{k\geq 0}$, we explore these neighborhoods outside of $B_{r+k_\varepsilon}(o_1)$, and denote the corresponding boundaries by $(\partial B'_{r+k}(o_2))_{k\geq 0}$.

We define
\[
\mathcal{E}_{r,[k]}^{(2)} = \bigcap_{l\in[k]} \mathcal{E}_{r,l}^{(2)}, \qquad \text{where } \mathcal{E}_{r,k}^{(2)} = \big\{\underline{b}_k^{(2)} \leq |\partial B'_{r+k}(o_2)| \leq \bar{b}_k^{(2)}\big\}. \tag{2.5.77}
\]

We then note that, conditionally on the above, as well as on $|\partial B'_{r+k-1}(o_2)| = b_{k-1}^{(2)} \in [\underline{b}_{k-1}^{(2)}, \bar{b}_{k-1}^{(2)}]$ and $|B'_{r+k-1}(o_2)| = s_{k-1}^{(2)} \leq \bar{s}_{k-1}^{(2)}$,
\[
|\partial B'_{r+k}(o_2)| \sim \mathrm{Bin}(n_k^{(2)}, p_k^{(2)}), \tag{2.5.78}
\]
where now
\[
n_k^{(2)} = n - s_{k_\varepsilon}^{(1)} - s_{k-1}^{(2)}, \qquad p_k^{(2)} = 1 - \Big(1 - \frac{\lambda}{n}\Big)^{b_{k-1}^{(2)}}. \tag{2.5.79}
\]


Let $\mathbb{P}_{r,k}$ denote the conditional probability given $|\partial B_r(o_i)| = b_0^{(i)}$ with $b_0^{(i)} \geq r$, $|B_{r-1}(o_i)| = s_0^{(i)}$, and $\mathcal{E}_{r,[k]}^{(1)}$. Following the argument in the previous step, we are led to the conclusion that
\[
\liminf_{n\to\infty} \mathbb{P}_{r,k}\big(\mathcal{E}_{r,[k]}^{(2)}\big) \geq 1 - 2\sum_{l\in[k]} \mathrm{e}^{-q\lambda \underline{b}_{l-1}^{(2)}}, \tag{2.5.80}
\]
where again $q = \varepsilon^2(1-\varepsilon/2)/[8(1+\varepsilon/6)] > 0$.

Completion of the proof of Theorem 2.31

We use Lemma 2.30, and conclude that we need to show (2.5.61). Recall (2.5.80), and that $\underline{b}_k^{(i)} = b_0^{(i)} [\lambda(1-\varepsilon)]^k$, where $b_0^{(i)} \geq r = r_\varepsilon$. Fix $k = k_\varepsilon = k_\varepsilon(n) = a \log_{\lambda(1-\varepsilon)} n$ as above. For this $k_\varepsilon$, take $r = r_\varepsilon$ so large that
\[
\sum_{l\in[k_\varepsilon]} \big[\mathrm{e}^{-q\lambda \underline{b}_{l-1}^{(1)}} + \mathrm{e}^{-q\lambda \underline{b}_{l-1}^{(2)}}\big] \leq \varepsilon/2. \tag{2.5.81}
\]

Denote the good event by
\[
\mathcal{E}_{r_\varepsilon,[k_\varepsilon]} = \mathcal{E}_{r_\varepsilon,[k_\varepsilon]}^{(1)} \cap \mathcal{E}_{r_\varepsilon,[k_\varepsilon]}^{(2)}, \tag{2.5.82}
\]
where we recall (2.5.67) and (2.5.77). Then,
\[
\liminf_{n\to\infty} \mathbb{P}_r\big(\mathcal{E}_{r_\varepsilon,[k_\varepsilon]}\big) \geq 1 - \varepsilon. \tag{2.5.83}
\]

On $\mathcal{E}_{r_\varepsilon,[k_\varepsilon]}$, we have that
\[
|\partial B_{r_\varepsilon+k_\varepsilon}(o_1)| \geq \underline{b}_{k_\varepsilon}^{(1)} = b_0^{(1)} [\lambda(1-\varepsilon)]^{k_\varepsilon} \geq r_\varepsilon [\lambda(1-\varepsilon)]^{k_\varepsilon} = r_\varepsilon n^a, \tag{2.5.84}
\]
where $a \in (\tfrac12, 1)$ is as chosen above, and the same bound holds for $|\partial B'_{r_\varepsilon+k_\varepsilon}(o_2)|$. Therefore, the number of pairs of vertices, one in $\partial B_{r_\varepsilon+k_\varepsilon}(o_1)$ and one in $\partial B'_{r_\varepsilon+k_\varepsilon}(o_2)$, each of which is a potential direct edge, is at least $(r_\varepsilon n^a)^2 \gg n$ when $a > \tfrac12$.

Therefore,
\[
\mathbb{P}_r\big(\mathrm{dist}_{\mathrm{ER}_n(\lambda/n)}(o_1, o_2) > 2(k_\varepsilon + r_\varepsilon) + 1 \,\big|\, \mathcal{E}_{r_\varepsilon,[k_\varepsilon]}\big) \leq \Big(1 - \frac{\lambda}{n}\Big)^{(r_\varepsilon n^a)^2} = o(1). \tag{2.5.85}
\]
We conclude that
\[
\mathbb{P}_r\big(\mathrm{dist}_{\mathrm{ER}_n(\lambda/n)}(o_1, o_2) \leq 2(k_\varepsilon + r_\varepsilon) + 1 \,\big|\, \mathcal{E}_{r_\varepsilon,[k_\varepsilon]}\big) = 1 - o(1). \tag{2.5.86}
\]

Thus, by (2.5.83),
\[
\mathbb{P}_r(o_1 \not\longleftrightarrow o_2) \leq \mathbb{P}_r\big(\mathcal{E}_{r_\varepsilon,[k_\varepsilon]}^c\big) + \mathbb{P}_r\big(o_1 \not\longleftrightarrow o_2 \mid \mathcal{E}_{r_\varepsilon,[k_\varepsilon]}\big) \leq \varepsilon/2 + \varepsilon/2 \leq \varepsilon. \tag{2.5.87}
\]
Since $\varepsilon > 0$ is arbitrary, the claim in (2.5.61) follows.


The small-world nature of ERn(λ/n)

In the above proof, we have also identified an upper bound on the typical distance in $\mathrm{ER}_n(\lambda/n)$, that is, the graph distance in $\mathrm{ER}_n(\lambda/n)$ between $o_1$ and $o_2$, as formulated in the following theorem:

Theorem 2.33 (Small-world nature of the Erdős–Rényi random graph) Consider $\mathrm{ER}_n(\lambda/n)$ with $\lambda > 1$. Then, conditionally on $o_1 \longleftrightarrow o_2$,
\[
\frac{\mathrm{dist}_{\mathrm{ER}_n(\lambda/n)}(o_1, o_2)}{\log n} \xrightarrow{\;\mathbb{P}\;} \frac{1}{\log \lambda}. \tag{2.5.88}
\]

Proof The lower bound follows directly from (2.3.35), which implies that
\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{ER}_n(\lambda/n)}(o_1, o_2) \leq k\big) = \mathbb{E}\big[|B_k(o_1)|/n\big] \leq \sum_{l=0}^{k} \frac{\lambda^l}{n} = \frac{\lambda^{k+1} - 1}{n(\lambda - 1)}. \tag{2.5.89}
\]
Applying this to $k = (1-\varepsilon) \log_\lambda n$ shows that
\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{ER}_n(\lambda/n)}(o_1, o_2) \leq (1-\varepsilon) \log_\lambda n\big) \to 0. \tag{2.5.90}
\]

For the upper bound, we start by noting that
\[
\mathbb{P}(o_1 \longleftrightarrow o_2 \mid \mathrm{ER}_n(\lambda/n)) = \frac{1}{n^2} \sum_{i\geq 1} |\mathscr{C}_{(i)}|^2 \xrightarrow{\;\mathbb{P}\;} \zeta_\lambda^2, \tag{2.5.91}
\]
by (2.5.10) in Lemma 2.27. Thus, conditioning on $o_1 \longleftrightarrow o_2$ is asymptotically the same as conditioning on $o_1, o_2 \in \mathscr{C}_{\max}$. The upper bound then follows from the fact that the event $\mathcal{E}_{r_\varepsilon,[k_\varepsilon]}$ in (2.5.82) holds with probability at least $1 - \varepsilon$, and, on the event $\mathcal{E}_{r_\varepsilon,[k_\varepsilon]}$,
\[
\mathrm{dist}_{\mathrm{ER}_n(\lambda/n)}(o_1, o_2) \leq 2(k_\varepsilon + r_\varepsilon) + 1, \tag{2.5.92}
\]
whp by (2.5.86).
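Theorem 2.33 is easy to probe by a small experiment. The sketch below is our own (the choices $n = 2000$, $\lambda = 2$, and the seed are arbitrary, and at this size the $O(1)$ corrections to $\log_\lambda n$ are still visible): it samples $\mathrm{ER}_n(\lambda/n)$, runs a BFS from a vertex of the giant component, and compares the mean distance within the giant with $\log n/\log\lambda$.

```python
import math
import random
from collections import deque

def er_adjacency(n, lam, rng):
    """Adjacency lists of ER_n(lam/n)."""
    adj = [[] for _ in range(n)]
    p = lam / n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def bfs_distances(adj, source):
    """Graph distances from source to all vertices in its component."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

n, lam = 2000, 2.0
adj = er_adjacency(n, lam, random.Random(3))

# A vertex whose component has more than n/2 vertices lies in the giant.
source = next(v for v in range(n) if len(bfs_distances(adj, v)) > n / 2)
dist = bfs_distances(adj, source)
mean_dist = sum(d for d in dist.values() if d > 0) / (len(dist) - 1)
typical = math.log(n) / math.log(lam)  # about 11 here
```

At $n = 2000$ the mean distance is of the same order as `typical`, illustrating the logarithmic scaling, though the ratio only tends to 1 slowly as $n$ grows.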

2.5.5 Outline of the remainder of this book

The results proved for the Erdős–Rényi random graph complete the preliminaries for this book in Part I. They further allow us to provide a brief glimpse into the content of the remainder of this book. There, we will prove results about local convergence, the existence of the giant component, and the small-world nature of various random graph models. We focus on inhomogeneous random graphs such as the generalized random graph, on the configuration model and its related uniform random graph with prescribed degrees, and on the preferential attachment model.

In Part II, consisting of Chapters 3, 4 and 5, we focus on local convergence as in Theorem 2.16, but then applied to these models. Further, we investigate the size of the giant component as in Theorems 2.26 and 2.31. The proofs are often more involved than the corresponding results for the Erdős–Rényi random graph, since these random graph models are inhomogeneous and often also lack the independence of the edge statuses. Therefore, the proofs will require detailed knowledge of the models in question.

In Part III, consisting of Chapters 6, 7 and 8, we focus on the small-world nature of these random graph models, as in Theorem 2.33. It turns out that the exact scaling of the graph distances depends on the level of inhomogeneity present in the random graph model. In particular, we will see that when the second moment of the degrees remains bounded, graph distances grow logarithmically, as in Theorem 2.33. If, on the other hand, the second moment blows up with the graph size, then distances are smaller. In particular, these typical distances are doubly logarithmic when the degrees obey a power law with exponent τ ∈ (2, 3), so that even a moment of order 2 − ε is infinite for some ε > 0. Anyone who has done some numerical work will realize that, in practice, there is little difference between log log n and a constant, even when n is quite large.

One of the main conclusions of the local convergence results in Part II is that the most popular random graph models for inhomogeneous real-world networks are locally tree-like, in that the majority of neighborhoods of vertices have no cycles. In many settings this is not realistic. Certainly in social networks, many triangles, and even cliques of larger sizes, exist. Therefore, in Part IV, consisting of Chapter 9, we investigate some adaptations of the models discussed in Parts II and III. These models may incorporate clustering or community structure, may be directed, or may live in a geometric space. All these aspects have received tremendous attention in the literature. Therefore, with Part IV in hand, the reader will be able to access the literature more easily.

2.6 Notes and discussion

Notes on Section 2.1

The discussion in this and the following section follows Aldous and Steele (2004) and Benjamini and Schramm (2001). We refer to Appendix A.2 for proofs of various properties of the metric $d_{\mathcal{G}_\star}$ on rooted graphs, including the fact that it turns $\mathcal{G}_\star$ into a Polish space.

on rooted graphs, including the fact that itturns G? into a Polish space.

Notes on Section 2.2

Various generalizations of local weak convergence are possible. For example, Aldous and Steele (2004) introduce the notion of geometric rooted graphs, which are rooted graphs where each edge $e$ receives a weight $\ell(e)$, turning the rooted graph into a metric space itself. Benjamini, Lyons and Schramm (2015) allow for more general marks. These marks are associated to the vertices as well as the edges, and can take values in a general complete separable metric space; such graphs are called marked graphs. Such more general set-ups are highly relevant in many applications, for example when dealing with general inhomogeneous graphs. Let us explain this in some more detail. A marked graph is a (multi-)graph $G = (V(G), E(G))$ together with a complete separable metric space $\Xi$, called the mark space, and maps from $V(G)$ and $E(G)$ to $\Xi$. Images in $\Xi$ are called marks. Each edge is given two marks, one associated to (‘at’) each of its endpoints. The only assumption on degrees is that they are locally finite. We omit the mark maps from our notation for marked graphs. We next extend the metric to the above setting of marked graphs. Let the distance between $(G_1, o_1)$ and $(G_2, o_2)$ be $1/(1+\alpha)$, where $\alpha$ is the supremum of those $r > 0$ such that there is some rooted isomorphism of the balls of (graph-distance) radius $\lfloor r \rfloor$ around the roots of $G_1$ and $G_2$, such that each pair of corresponding marks has distance less than $1/r$. For probability measures $\mu_n, \mu$ on $\mathcal{G}_\star$, we write $\mu_n \xrightarrow{\;d\;} \mu$ as $n \to \infty$ when $\mu_n$ converges weakly to $\mu$ with respect to this metric. See Exercise 2.17 for an application of marked graphs to directed graphs.

Aldous and Lyons (2007) also study the implications for stochastic processes, such as percolation and random walks, on unimodular graphs.

The tightness statement in Theorem 2.7 is (Benjamini et al., 2015, Theorem 3.1). Benjamini et al. (2015) use the term network instead of marked graph. We avoid the term network here, as it may cause confusion with the real-world complex networks that form the inspiration for this book. A related proof can be found in Angel and Schramm (2003).

Aldous (1991) investigates local weak convergence in the context of finite trees, and calls the trees rooted at a uniform vertex fringe trees. For fringe trees, the uniform integrability of the degree of a random vertex is equivalent to tightness of the resulting tree in the local weak sense (see (Aldous, 1991, Lemma 4(ii))).

Dembo and Montanari (2010b) define a version of local weak convergence in terms of convergence of subgraph counts (see (Dembo and Montanari, 2010b, Definition 2.1)), and also state that this holds for several models, including ERn(λ/n) and CMn(d) under appropriate conditions, while Dembo and Montanari (2010a) provides more details. See, e.g., (Dembo and Montanari, 2010a, Lemma 2.4) for a proof for the configuration model, and (Dembo and Montanari, 2010a, Proposition 2.6) for a proof for the Erdős–Rényi random graph.

Notes on Section 2.3

We thank Shankar Bhamidi for useful discussions about possibly random limits in Definition 2.9. As far as we could tell, a precise definition of local convergence in its various senses has not appeared explicitly in the literature.

An intuitive analysis of $\mathbb{P}(B_r^{(G_n)}(1) = t)$ in (2.3.28) has already appeared in [Volume 1, Section 4.1.2]; see in particular [Volume 1, (4.1.12)].

Notes on Section 2.4

While many of the results in this section are ‘folklore’, finding appropriate references for the results stated is not obvious.

Theorems 2.19 and 2.20 discuss the convergence of two clustering coefficients. In the literature, the clustering spectrum has also attracted attention. For this, we recall that $n_k$ denotes the number of vertices with degree $k$ in $G_n$, and define the clustering coefficient of vertices of degree $k$ to be
\[
c_{G_n}(k) = \frac{1}{n_k} \sum_{v\in[n] \colon d_v^{(n)} = k} \frac{\Delta_{G_n}(v)}{k(k-1)}. \tag{2.6.1}
\]
It is not hard to adapt the proof of Theorem 2.20 to show that, under its assumptions, $c_{G_n}(k) \xrightarrow{\;\mathbb{P}\;} c_G(k)$, where
\[
c_G(k) = \mathbb{E}_\mu\Big[\frac{\Delta_G(o)}{k(k-1)} \,\Big|\, d_o = k\Big]. \tag{2.6.2}
\]
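As a concrete illustration of (2.6.1), here is our own toy computation (not from the text; we take $\Delta_{G_n}(v)$ to count ordered pairs of neighbors of $v$ that are themselves connected, which matches the $k(k-1)$ normalization): the clustering spectrum of a $K_4$ with one pendant vertex attached.

```python
from collections import defaultdict

def clustering_spectrum(adj):
    """c(k) in the spirit of (2.6.1): average, over vertices of degree k >= 2,
    of the fraction of ordered neighbor pairs (u, w) joined by an edge."""
    by_degree = defaultdict(list)
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # k(k-1) = 0: degree-0/1 vertices carry no wedges
        closed = sum(1 for u in nbrs for w in nbrs if u != w and w in adj[u])
        by_degree[k].append(closed / (k * (k - 1)))
    return {k: sum(vals) / len(vals) for k, vals in by_degree.items()}

# K4 on {0,1,2,3} plus a pendant vertex 4 attached to 0.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

spectrum = clustering_spectrum(adj)  # {3: 1.0, 4: 0.5}
```

Vertices of degree 3 sit inside the clique, so $c(3) = 1$; the hub has degree 4 but only the 6 ordered pairs inside $K_4$ are closed, giving $c(4) = 6/12 = 1/2$.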

See Exercise 2.32.

The convergence of the assortativity coefficient in Theorem 2.23 is restricted to degree distributions that have uniformly integrable third moments. In general, an empirical correlation coefficient needs a finite variance of the random variables to converge to the correlation coefficient. Litvak and the author (see van der Hofstad and Litvak (2013, 2014)) proved that when the random variables do not have finite variance, such convergence (even for an i.i.d. sample) can be to a proper random variable whose support contains a subinterval of [−1, 0] and a subinterval of [0, 1], causing problems for the interpretation.

For networks, ρG in (2.4.36) is always well defined, and takes a value in [−1, 1]. However, also for networks there is a problem with this definition. Indeed, van der Hofstad and Litvak (2013, 2014) prove that if a limiting value of ρGn exists for a sequence of networks and the third moment of the degree of a random vertex is not uniformly integrable, then lim inf n→∞ ρGn ≥ 0, so no asymptotically disassortative graph sequences exist. Naturally, other ways of classifying the degree-degree dependence can be proposed, such as the correlation of the ranks of the degrees. Here, for a sequence of numbers x1, . . . , xn with ranks r1, . . . , rn, xi is the ri-th largest of x1, . . . , xn. Ties tend to be broken by assigning random ranks to equal values. For practical purposes, a scatter plot of the values might be the most useful way to gain insight into degree-degree dependencies.
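A rank-based degree–degree correlation along the lines sketched above can be computed as follows. This is our own minimal implementation (not from the text); for determinism we resolve ties by average ranks rather than random ones. On a star graph, the degrees at the two ends of every edge are perfectly anti-ordered, so the rank correlation equals −1.

```python
def average_ranks(xs):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def degree_rank_correlation(edges):
    """Spearman-style correlation of the degrees at the two ends of an edge,
    symmetrized over both edge orientations."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    rx, ry = average_ranks(xs), average_ranks(ys)
    m = len(rx)
    mx, my = sum(rx) / m, sum(ry) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

star = [(0, i) for i in range(1, 6)]  # K_{1,5}: hub degree 5, leaf degree 1
rho = degree_rank_correlation(star)   # -1.0: perfectly disassortative
```

Unlike the Pearson-based ρG, this rank version is insensitive to heavy degree tails, which is precisely the motivation given in the paragraph above.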

Notes on Section 2.5

The results in this section are new.

2.7 Exercises for Chapter 2

Exercise 2.1 (Graph isomorphisms fix vertex and edge numbers) Assume that $G_1 \simeq G_2$. Show that $G_1$ and $G_2$ have the same number of vertices and edges.

Exercise 2.2 (Graph isomorphisms fix degree sequence) Let $G_1$ and $G_2$ be two finite graphs. Assume that $G_1 \simeq G_2$. Show that $G_1$ and $G_2$ have the same degree sequences. Here, for a graph $G$, we let the degree sequence be $(p_k(G))_{k\geq 0}$, where
\[
p_k(G) = \frac{1}{|V(G)|} \sum_{v\in V(G)} \mathbf{1}_{\{d_v(G) = k\}}, \tag{2.7.1}
\]


where dv(G) is the degree of v in G.

Exercise 2.3 (Distance to rooted graph ball) Let the ball $B_r(o)$ around $o$ in the graph $G$ be defined as in (2.1.1). Show that $d_{\mathcal{G}_\star}\big(B_r(o), (G, o)\big) \leq 1/(r+1)$. When does equality hold?

Exercise 2.4 (Countable number of graphs with bounded radius) Fix $r$. Show that there is a countable number of isomorphism classes of rooted graphs $(G, o)$ with radius at most $r$. Here, we let the radius $\mathrm{rad}(G, o)$ of a rooted graph $(G, o)$ be equal to $\mathrm{rad}(G, o) = \max_{v\in V(G)} d_G(o, v)$.

Exercise 2.5 ($\mathcal{G}_\star$ is separable) Use Exercise 2.4 above to show that the set of rooted graphs $\mathcal{G}_\star$ has a countable dense set, and is thus separable. [See also Proposition A.10 in Appendix A.2.2.]

Exercise 2.6 (Continuity of local neighborhood functions) Fix $H_\star \in \mathcal{G}_\star$. Show that $h \colon \mathcal{G}_\star \to \{0,1\}$ given by $h(G, o) = \mathbf{1}_{\{B_r(o) \simeq H_\star\}}$ is continuous.

Exercise 2.7 (Bounded number of graphs with bounded radius and degrees) Show that there are only finitely many isomorphism classes of rooted graphs $(G, o)$ with radius at most $r$ for which the degree of every vertex is at most $k$.

Exercise 2.8 (Random local weak limit) Construct the simplest (in your opinion) possible example where the local weak limit of a sequence of deterministic graphs is random.

Exercise 2.9 (Local weak limit of line and cycle) Let $G_n$ be the line, given by $V(G_n) = [n]$ and $E(G_n) = \{\{i, i+1\} : i \in [n-1]\}$. Show that $G_n$ converges to $\mathbb{Z}$. Show that the same is true for the cycle, for which $E(G_n) = \{\{i, i+1\} : i \in [n-1]\} \cup \{\{1, n\}\}$.

Exercise 2.10 (Local weak limit of finite tree) Let $G_n$ be the tree of depth $k$, in which every vertex except the $3 \times 2^{k-1}$ leaves has degree 3. Here $n = 3(2^k - 1)$. What is the local weak limit of $G_n$? Show that it is random, despite the fact that the graphs $G_n$ are deterministic.

Exercise 2.11 (Uniform integrability and convergence of size-biased degrees) Show that when (d_{o_n}^{(n)})_{n≥1} forms a uniformly integrable sequence of random variables, there exists a subsequence along which D⋆_n, the size-biased version of D_n = d_{o_n}^{(n)}, converges in distribution.

Exercise 2.12 (Uniform integrability and degree regularity condition) For CMn(d), show that Conditions 1.5(a)-(b) imply that (d_{o_n})_{n≥1} is a uniformly integrable sequence of random variables.

Exercise 2.13 (Adding a small disjoint graph does not change the local weak limit) Let Gn be a graph that converges in the local weak sense. Let a_n ∈ N be such that a_n = o(n), and add a disjoint copy of an arbitrary graph of size a_n to Gn. Denote the resulting graph by G′n. Show that G′n has the same local weak limit as Gn.


Local convergence of random graphs

Exercise 2.14 (Local weak convergence does not imply uniform integrability of the degree of a random vertex) In the setting of Exercise 2.13, add a complete graph of size a_n to Gn. Let a_n^2 ≫ n. Show that the degree of a vertex chosen uniformly at random in G′n is not uniformly integrable.

Exercise 2.15 (Local weak limit of random 2-regular graph) Show that the configuration model CMn(d) with d_i = 2 for all i ∈ [n] converges in the local weak sense in probability to Z. Conclude that the same applies to the random 2-regular graph.

Exercise 2.16 (Independent neighborhoods of different vertices) Let Gn converge in probability in the local weak sense to (G, o). Let (o_n^{(1)}, o_n^{(2)}) be two independent uniformly chosen vertices in [n]. Show that (Gn, o_n^{(1)}) and (Gn, o_n^{(2)}) jointly converge to two independent copies of (G, o).

Exercise 2.17 (Directed graphs as marked graphs) There are several ways to describe directed graphs as marked graphs. Give one.

Exercise 2.18 (Expected boundary of balls in Erdos-Renyi random graphs) Prove that, for ERn(λ/n) and every k ≥ 0,

E[|∂B_k^{(Gn)}(1)|] ≤ λ^k. (2.7.2)

This can be done, for example, by showing that, for every k ≥ 1,

E[|∂B_k^{(Gn)}(1)| | B_{k−1}^{(Gn)}(1)] ≤ λ |∂B_{k−1}^{(Gn)}(1)|, (2.7.3)

and using induction.

Exercise 2.19 (Uniform integrability and moment convergence) Assume that D_n = d_{o_n}^{(Gn)} is such that D_n^3 is uniformly integrable. Prove that E_n[(d_{o_n}^{(Gn)})^3] P−→ E_µ[d_o^3]. Conclude that (2.4.39) and (2.4.40) hold. [Hint: You need to be very careful, as E_µ[d_o^3] may be a random variable when µ is a random measure.]

Exercise 2.20 (Example of weak convergence where convergence in probability fails) Construct an example where Gn converges in distribution in the local weak sense to (G, o), but not in probability.

Exercise 2.21 (Continuity of neighborhood functions) Fix m ≥ 1 and ℓ_1, . . . , ℓ_m. Show that

h(G, o) = 1_{|∂B_k(o)| = ℓ_k ∀k ≤ m} (2.7.4)

is a bounded continuous function.

Exercise 2.22 (Proof of (2.4.2)) Let Gn converge in probability in the local weak sense to (G, o). Prove (2.4.2) using Exercise 2.16.

Exercise 2.23 (|Cmax| is not a continuous functional) Show that |Cmax|/n is not a continuous functional on G⋆.

Exercise 2.24 (Upper bound on |Cmax| using LWC) Let Gn = ([n], E(Gn)) denote a finite (possibly disconnected) random graph. Let (G, o) be a random variable on G⋆ having law µ. Assume that Gn converges in probability in the local weak sense to (G, o), and assume that the survival probability of the limiting graph (G, o) satisfies ζ = P(|C(o)| = ∞) = 0. Show that |Cmax|/n P−→ 0.

Exercise 2.25 (Convergence of the proportion of vertices in clusters of size at least k) Let Gn converge in probability in the local weak sense to (G, o). Show that Z_{≥k} in (2.5.2) satisfies Z_{≥k}/n P−→ ζ_{≥k} = P(|C(o)| ≥ k) for every k ≥ 1.

Exercise 2.26 (Example where proportion in giant is smaller than survival probability) Construct an example where Gn converges in probability in the local weak sense to (G, o), while |Cmax|/n P−→ η < ζ = P(|C(o)| = ∞).

Exercise 2.27 (Local weak convergence with a subcritical limit) Let Gn converge in probability in the local weak sense to (G, o). Assume that ζ = P(|C(o)| = ∞) = 0. Use (2.5.6) to prove that |Cmax|/n P−→ 0.

Exercise 2.28 (Sufficiency of (2.5.7) for almost locality of the giant) Let Gn = ([n], E(Gn)) denote a finite (possibly disconnected) random graph. Let (G, o) be a random variable on G⋆ having law µ. Assume that Gn converges in probability in the local weak sense to (G, o), and write ζ = P(|C(o)| = ∞) for the survival probability of the limiting graph (G, o). Assume that

lim sup_{k→∞} lim sup_{n→∞} (1/n²) E[#{(x, y) ∈ V(Gn)² : |C(x)|, |C(y)| ≥ k, x ←→/ y}] > 0. (2.7.5)

Then, prove that for some ε > 0,

lim sup_{n→∞} P(|Cmax| ≤ n(ζ − ε)) > 0. (2.7.6)

Exercise 2.29 (Sufficiency of (2.5.7) for uniqueness of the giant) Under the conditions in Exercise 2.28, show that

lim sup_{n→∞} P(|C_(2)| > εn) > 0. (2.7.7)

Exercise 2.30 (Lower bound on graph distances in Erdos-Renyi random graphs) Use Exercise 2.18 to show that, for every ε > 0,

lim_{n→∞} P(dist_{ERn(λ/n)}(o_1, o_2) ≤ (1 − ε) log n / log λ) = 0. (2.7.8)

Exercise 2.31 (Lower bound on graph distances in Erdos-Renyi random graphs) Use Exercise 2.18 to show that

lim_{K→∞} lim sup_{n→∞} P(dist_{ERn(λ/n)}(o_1, o_2) ≤ log n / log λ − K) = 0, (2.7.9)

which is a significant extension of Exercise 2.30.

Exercise 2.32 (Convergence of the clustering spectrum) Prove that, under the conditions of Theorem 2.20, the convergence of the clustering spectrum in (2.6.2) holds.


Part II

Connected components in random graphs




Overview of Part II.

In this part, we study connected components in random graphs. In more detail, we investigate the connected components of uniform vertices, thus describing the local weak limits of these random graphs. Further, we study the structure of the largest connected component, sometimes also called the giant component, which contains a positive proportion of the graph. In many random graphs, such a giant component exists when there are sufficiently many connections, while the largest connected component is much smaller than the number of vertices when there are few connections. Thus, these random graphs exhibit a phase transition. We identify the size of the giant component, as well as its structure in terms of the degrees of its vertices. We also investigate when the graph is fully connected, thus establishing the connectivity threshold for the random graphs in question.

In more detail, this part is organised as follows. We study general inhomogeneous random graphs in Chapter 3, and the configuration model, as well as the closely related uniform random graph with prescribed degrees, in Chapter 4. In the last chapter of this part, Chapter 5, we study the connected components and local weak convergence of the preferential attachment model.


Chapter 3

Phase transition in general inhomogeneous random graphs

Abstract

In this chapter, we introduce the general setting of inhomogeneous random graphs. The inhomogeneous random graph is a generalization of the Erdos-Renyi random graph ERn(p), as well as of the inhomogeneous random graphs such as GRGn(w) studied in [Volume 1, Chapter 6]. In inhomogeneous random graphs, the statuses of edges are independent, with unequal edge occupation probabilities. While these edge probabilities are moderated by vertex weights in GRGn(w), in the general setting they are described in terms of a kernel. We start by motivating its choice, which is inspired by [Volume 1, Example 6.1]. The main results in this chapter concern the degree structure of such inhomogeneous random graphs, multi-type branching process approximations to neighborhoods, and the phase transition in these random graphs. We also discuss various examples of such inhomogeneous random graphs, and indicate that they can have rather different behavior.

3.1 Motivation

In this chapter, we discuss general inhomogeneous random graphs, where the edge statuses are independent. We investigate their connectivity structure, and particularly whether they have a giant component. This is inspired by the fact that many real-world networks are highly connected, in the sense that their largest connected component contains a large proportion of the total vertices of the graph. See Table 3.1 for many examples.

Table 3.1 raises the question of how one can view these settings where giant components exist. We know that there is a phase transition in the size of the giant

Subject    % in Giant   Total     Original Source            Data
AD-blood   0.8542       96        Goni et al. (2008)         Goni et al. (2008)
MS-blood   0.8780       205       Goni et al. (2008)         Goni et al. (2008)
actors     0.8791       2180759   Boldi and Vigna (2004)     Boldi et al. (2011)
DBLP       0.8162       986324    Boldi and Vigna (2004)     Boldi et al. (2011)
Zebras     0.8214       28        Sundaresan et al. (2007)   Sundaresan et al. (2007)

Table 3.1 The five rows correspond to the following real-life networks:
(1) Model of the protein-protein connections in the blood of people with Alzheimer's disease.
(2) Model of the protein-protein connections in the blood of people with Multiple Sclerosis.
(3) Network of actors in 2011 in the Internet Movie Database (IMDb), where actors are linked if they appeared in the same movie.
(4) Collaboration network based on DBLP. Two scientists are connected if they ever collaborated on a paper.
(5) Network where nodes represent zebras. There is a connection between two zebras if they ever interacted with each other during the observation phase.



component in ERn(λ/n); recall [Volume 1, Chapter 4]. A main ingredient will be to investigate when a giant component is present in general inhomogeneous random graphs. This will occur precisely when the local weak limit has a positive probability of surviving (recall Section 2.5). Therefore, we also investigate local weak convergence in this chapter.

We will study much more general models in which edges are present independently than we have done so far (recall the generalized random graph in [Volume 1, Chapter 6]; see also Section 1.3.2). In the generalized random graph, vertices have weights associated to them, and the edge occupation probabilities are proportional to the product of the weights of the vertices that the edge connects. This means that vertices with high weights have relatively large occupation probabilities to all other vertices, a property that may not always be appropriate. Let us illustrate this by an example, which is a continuation of [Volume 1, Example 6.1]:

Example 3.1 (Population of two types: general setting) Suppose that we have a complex network in which there are n_1 vertices of type 1 and n_2 of type 2. Type 1 individuals have on average m_1 neighbors, type 2 individuals m_2, where m_1 ≠ m_2. Further, suppose that the probability that a type 1 individual is a friend of a type 2 individual is quite different from the probability that a type 1 individual is a friend of a type 1 individual.

In the model proposed in [Volume 1, Example 6.3], the probability that a type i individual is a friend of a type j individual (where i, j ∈ {1, 2}) is equal to m_i m_j/(ℓ_n + m_i m_j), where ℓ_n = n_1 m_1 + n_2 m_2. Approximating this probability by m_i m_j/ℓ_n, we see that the probability that a type 1 individual is a friend of a type 2 individual is highly related to the probability that a type 1 individual is a friend of a type 1 individual. Indeed, take two type 1 and two type 2 individuals. Then, the probability that the type 1 individuals are friends and the type 2 individuals are friends is almost the same as the probability that the first type 1 individual is a friend of the first type 2 individual, and the second type 1 individual is a friend of the second type 2 individual. Thus, there is some, possibly unwanted and artificial, symmetry in the model. How can one create instances where the edge probabilities between vertices of the same type are much larger, or alternatively much smaller, than they would be for the generalized random graph? In the two extremes, we either have a bipartite graph, where vertices are only connected to vertices of the other type, or a disjoint union of two Erdos-Renyi random graphs, consisting of the vertices of the two types and no edges between them. We aim to be able to obtain anything in between. In particular, the problem with the generalized random graph originates in the product structure of the edge probabilities. In this chapter, we will deviate from such a product structure.

As explained above, we wish to be quite flexible in our choices of edge probabilities. However, we also wish for settings where the random graph is sufficiently 'regular', as for example exemplified by its degree sequences converging to some deterministic distribution. In particular, we aim for settings where the random


graphs are sparse. As a result, we need to build this regularity into the precise structure of the edge probabilities. This will be achieved by introducing an appropriate kernel that moderates the edge probabilities, and that is sufficiently regular. Let us introduce the model in detail in the next section.

Organization of this chapter

This chapter is organised as follows. In Section 3.2, we introduce general inhomogeneous random graphs. In Section 3.3, we study the degree distribution in general inhomogeneous random graphs. In Section 3.4, we treat multi-type branching processes, the natural generalization of branching processes for inhomogeneous random graphs. In Section 3.5, we use multi-type branching processes to identify the local weak limit of inhomogeneous random graphs. In Section 3.6, we study the phase transition of inhomogeneous random graphs. In Section 3.7, we state some recent related results. We close this chapter with notes and discussion in Section 3.8, and exercises in Section 3.9.

3.2 Definition of the model

We assume that our individuals have types, which lie in a certain type space S. When there are individuals of just two types, as in Example 3.1, it suffices to take S = {1, 2}. However, the model allows for rather general sets of types of the individuals, both finite as well as (countably or uncountably) infinite. An example of an uncountably infinite type space could be types related to the ages of the individuals in the population. Also the setting of the generalized random graph with w_i satisfying (1.3.15) corresponds to the uncountable type-space setting when the distribution function F is that of a continuous random variable W. We therefore also need to know how many individuals there are of a given type. This is described in terms of a measure µ_n, where, for A ⊆ S, µ_n(A) denotes the proportion of individuals having a type in A.

In our general model, instead of vertex weights, the edge probabilities are moderated by a kernel κ : S² → [0, ∞). The probability that two vertices of types x_1 and x_2 are connected is approximately κ(x_1, x_2)/n, and different edges are present independently. Since there are many choices for κ, we arrive at a rather flexible model, where vertices have types and connection probabilities are related to the types of the vertices involved.

We start by making the above definitions formal, by defining what our ground space is and what a kernel is:

Definition 3.2 (Setting: ground space and kernel) (i) A ground space is a pair (S, µ), where S is a separable metric space and µ is a Borel probability measure on S.

(ii) A vertex space V is a triple (S, µ, (x_n)_{n≥1}), where (S, µ) is a ground space and, for each n ≥ 1, x_n is a random sequence (x_1, x_2, . . . , x_n) of n points of S, such that

µ_n(A) = #{i : x_i ∈ A}/n → µ(A) (3.2.1)

for every µ-continuity set A ⊆ S. The convergence in (3.2.1) is denoted by µ_n p−→ µ.

(iii) A kernel κ is a symmetric non-negative (Borel) measurable function on S². By a kernel on a vertex space (S, µ, (x_n)_{n≥1}) we mean a kernel on (S, µ).

Before defining the precise random graph model, we state the necessary conditions on our kernels. We write E(G) for the number of edges in a graph G. Note that for an inhomogeneous random graph with edge probabilities p = (p_ij)_{1≤i<j≤n},

E[E(IRGn(p))] = ∑_{i<j} p_ij, (3.2.2)

so that our model has bounded degree in expectation precisely when (1/n) ∑_{i<j} p_ij remains bounded. In our applications, we wish that the average degree per vertex in fact converges, as this corresponds to the random graph model being sparse (recall [Volume 1, Definition 1.3]). Further, we do not wish the vertex set to be subdivided into two sets of vertices where the probability to have an edge between them is zero, as this implies that the graph can be divided into two entirely disjoint graphs. This explains the main conditions we pose on the kernel κ in the following definition:

Definition 3.3 (Setting: graphical and irreducible kernels) (i) A kernel κ is graphical if the following conditions hold:

(a) κ is continuous a.e. on S²;
(b) ∫∫_{S²} κ(x, y) µ(dx)µ(dy) < ∞; (3.2.3)
(c) (1/n) E[E(IRGn(κ))] → ½ ∫∫_{S²} κ(x, y) µ(dx)µ(dy). (3.2.4)

Similarly, a sequence (κ_n)_{n≥1} of kernels is called graphical with limit κ when

y_n → y and z_n → z imply that κ_n(y_n, z_n) → κ(y, z), (3.2.5)

where κ satisfies conditions (a) and (b) above, and

(1/n) E[E(IRGn(κ_n))] → ½ ∫∫_{S²} κ(x, y) µ(dx)µ(dy). (3.2.6)

(ii) A kernel κ is called reducible if

∃A ⊆ S with 0 < µ(A) < 1 such that κ = 0 a.e. on A × (S \ A);

otherwise κ is irreducible.
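For a kernel with finitely many types, κ is simply an r × r matrix, and irreducibility amounts to the graph on the r types, with an edge whenever κ(i, j) > 0, being connected. The following is a minimal sketch of this check (the function name and example matrices are ours, not from the text):

```python
def is_irreducible(kappa):
    """Decide irreducibility of a finite-type kernel, given as a symmetric
    r x r matrix with non-negative entries.  The kernel is reducible when
    the types split into two non-empty groups A and S \\ A with kappa = 0
    between them, i.e., when the support graph on the r types (an edge
    whenever kappa[i][j] > 0) is disconnected."""
    r = len(kappa)
    seen = {0}
    stack = [0]
    # Depth-first search on the support graph, started from type 0.
    while stack:
        i = stack.pop()
        for j in range(r):
            if kappa[i][j] > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == r

# A bipartite-type kernel (edges only between the two types) is irreducible,
# while a block-diagonal kernel (two disjoint Erdos-Renyi graphs) is not.
bipartite = [[0.0, 3.0], [3.0, 0.0]]
disjoint = [[3.0, 0.0], [0.0, 3.0]]
```

The reducible example corresponds exactly to the situation described below Definition 3.3: the random graph splits into two independent random graphs.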


We now discuss the above definitions. The assumptions in (3.2.3), (3.2.4) and (3.2.6) imply that the expected number of edges is proportional to n, and that the proportionality constant is precisely equal to ½ ∫∫_{S²} κ(x, y) µ(dx)µ(dy). Thus, in the terminology of [Volume 1, Chapter 1], IRGn(κ) is sparse. This sparsity also allows us to approximate graphical kernels by bounded ones in such a way that the number of removed edges is oP(n), a fact that will be crucially used in the sequel. Indeed, bounded graphical kernels can be well approximated by step functions, in a similar way as continuous functions on R can be well approximated by step functions. In turn, such step functions on S × S correspond to random graphs with finitely many different types.

We extend the setting to n-dependent sequences (κ_n)_{n≥1} of kernels in (3.2.5), as in many natural cases the kernels do indeed depend on n. In particular, this allows us to deal with several closely related and natural notions of the edge probabilities at the same time (see e.g., (3.2.7), (3.2.8) and (3.2.9) below), showing that identical results hold in each of these cases.

Roughly speaking, κ is reducible if the vertex set of IRGn(κ) can be split in two parts so that the probability of an edge from one part to the other is zero, and irreducible otherwise. When κ is reducible, the random graph splits into two independent random graphs on the two disjoint subsets A and S \ A. Therefore, we could have equally well started with each of them separately, explaining why the notion of irreducibility is quite natural.

In many cases, we shall take S = [0, 1], x_i = i/n and µ the Lebesgue measure on [0, 1]. Then, clearly, (3.2.1) is satisfied. In fact, Janson (2009b) shows that we can always restrict to S = [0, 1] by suitably adapting the other choices in our model. However, for notational purposes, it is more convenient to work with general S. For example, when S = {1} there is just a single type and the model reduces to the Erdos-Renyi random graph, while in the setting where S = [0, 1], this is slightly more cumbersome, as worked out in detail in Exercise 3.1.

Now we come to the definition of our random graph. Given a kernel κ, for n ∈ N, we let IRGn(κ) be the random graph on [n] in which each possible edge ij, where i, j ∈ [n], is present with probability

p_ij(κ) = p_ij = (1/n)[κ(x_i, x_j) ∧ n], (3.2.7)

and the events that different edges are present are independent. Similarly, IRGn(κ_n) is defined with κ_n replacing κ in (3.2.7). Exercise 3.2 shows that the lower bound in (3.2.4) always holds for IRGn(κ) when κ is continuous. Further, Exercise 3.3 shows that (3.2.4) holds for IRGn(κ) when κ is bounded and continuous.
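To make the edge probabilities in (3.2.7) concrete, here is a minimal O(n²) sampler; the function name, the choice S = [0, 1] with x_i = i/n, and the constant kernel are illustrative assumptions, not part of the text. With the constant kernel κ ≡ λ, this recovers ERn(λ/n):

```python
import random

def sample_irg(n, kernel, types, seed=None):
    """Sample IRG_n(kappa): each edge {i, j} is present independently
    with probability (kernel(x_i, x_j) ^ n) / n, as in (3.2.7)."""
    rng = random.Random(seed)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(kernel(types[i], types[j]), n) / n:
                edges.append((i, j))
    return edges

n = 400
types = [i / n for i in range(n)]       # S = [0, 1], x_i = i / n
edges = sample_irg(n, lambda x, y: 2.0, types, seed=1)
avg_degree = 2 * len(edges) / n         # concentrates around lambda = 2
```

The quadratic loop is fine for illustration; for large sparse graphs one would instead use a faster sampler.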

Bollobas et al. (2007) also allow for the choices

p_ij^{(NR)}(κ_n) = 1 − e^{−κ_n(x_i, x_j)/n}, (3.2.8)

and

p_ij^{(GRG)}(κ_n) = κ_n(x_i, x_j) / (n + κ_n(x_i, x_j)). (3.2.9)


All results in Bollobas et al. (2007) remain valid for the choices in (3.2.8) and (3.2.9). When

∑_{i,j∈[n]} κ_n(x_i, x_j)³ = o(n^{3/2}), (3.2.10)

this follows immediately from [Volume 1, Theorem 6.18]; see Exercise 3.6.

For CLn(w) with w = (w_i)_{i∈[n]} as in (1.3.15), we take S = [0, 1], x_i = i/n and, with ψ(x) = [1 − F]^{−1}(x),

κ_n(x, y) = ψ(x)ψ(y)n/ℓ_n. (3.2.11)

For CLn(w) with w = (w_i)_{i∈[n]} satisfying Condition 1.1, instead, we take S = [0, 1], x_i = i/n and

κ_n(i/n, j/n) = w_i w_j / E[W_n]. (3.2.12)

Exercises 3.4–3.5 study the Chung-Lu random graph in the present framework.

In the next section, we discuss some examples of inhomogeneous random graphs.

3.2.1 Examples of inhomogeneous random graphs

The Erdos-Renyi random graph

If S is general and κ(x, y) = λ for every x, y ∈ S, then the edge probabilities p_ij given by (3.2.7) are all equal to λ/n (for n > λ). Then IRGn(κ) = ERn(λ/n). The simplest choice here is to take S = {1}.

The homogeneous bipartite random graph

Let n be even, let S = {0, 1}, let x_i = 0 for i ∈ [n/2] and x_i = 1 for i ∈ [n] \ [n/2]. Further, let κ be defined by κ(x, y) = λ when x ≠ y and κ(x, y) = 0 when x = y. Then IRGn(κ) is the random bipartite graph with n/2 vertices in each class, where each possible edge between classes is present with probability λ/n, independently of the other edges. Exercise 3.7 investigates the validity of Definitions 3.2-3.3 for homogeneous bipartite graphs.

The stochastic block model

The stochastic block model generalizes the above setting. Let n be even, let S = {0, 1}, let x_i = 0 for i ∈ [n/2] and x_i = 1 for i ∈ [n] \ [n/2]. Further, let κ be defined by κ(x, y) = a when x = y and κ(x, y) = b when x ≠ y. This means that vertices of the same type are connected with probability a/n, while vertices of different types are connected with probability b/n. A major research effort has been devoted to studying when it can be statistically detected that a ≠ b. Below, we also investigate more general stochastic block models.
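As a sketch of how the block structure enters through the edge probabilities of (3.2.7), the following encodes the two-type kernel; the values a = 5, b = 1 and n = 100 are illustrative, not from the text:

```python
def edge_prob(kappa, x, y, n):
    """Edge probability from (3.2.7): p_ij = (kappa(x_i, x_j) ^ n) / n."""
    return min(kappa(x, y), n) / n

def sbm_kernel(a, b):
    """Two-type stochastic block model kernel: kappa(x, y) = a for
    same-type pairs and b for different-type pairs."""
    return lambda x, y: a if x == y else b

n = 100
kappa = sbm_kernel(a=5.0, b=1.0)
p_within = edge_prob(kappa, 0, 0, n)    # a / n = 0.05
p_between = edge_prob(kappa, 0, 1, n)   # b / n = 0.01
```

Taking a = b recovers the Erdos-Renyi case, and a = 0 the homogeneous bipartite case above.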

Homogeneous random graphs

We call an inhomogeneous random graph homogeneous when λ(x) = ∫_S κ(x, y) µ(dy) ≡ λ. Thus, despite the inhomogeneity that is present, every vertex in the graph has (asymptotically) the same expected number of offspring. Exercise 3.8 shows that the Erdos-Renyi random graph, the homogeneous bipartite random graph and the stochastic block model are all homogeneous random graphs. In such settings, however, the level of inhomogeneity is limited.

Inhomogeneous random graphs with finitely many types

Fix r ≥ 2 and suppose we have a graph with r different types of vertices. Let S = {1, . . . , r}. Let n_i denote the number of vertices of type i, and let µ_n({i}) = n_i/n. Let IRGn(κ) be the random graph where two vertices of types i and j, respectively, are joined by an edge with probability n^{−1}κ(i, j) (for n ≥ max κ). Then κ is equivalent to an r × r matrix, and the random graph IRGn(κ) has vertices of r different types (or colors). The finite-types case has been studied by Söderberg (2002, 2003a,b,c). We conclude that our general inhomogeneous random graph covers the cases of a finite (or even countably infinite) number of types. Exercises 3.9–3.11 study the setting of inhomogeneous random graphs with finitely many types.

It will turn out that this case is particularly important, as many of the other settings can be arbitrarily well approximated by inhomogeneous random graphs with finitely many types. As such, this model will be the building block upon which most of the results are built.

Uniformly grown random graph

The uniformly grown random graph was proposed by Callaway, Hopcroft, Kleinberg, Newman, and Strogatz (2001). In their model, the graph grows dynamically as follows. At each time step, a new vertex is added. Further, with probability δ, two vertices are chosen uniformly at random and joined by an undirected edge. This process is repeated for n time steps, where n describes the number of vertices in the graph. Callaway et al. predict, based on physical reasoning, that in the limit of large n, the resulting graph has a giant component precisely when δ > 1/8, and the proportion of vertices in the giant component is of the order e^{−Θ(1/√(8δ−1))}. Such behavior is sometimes called an infinite-order phase transition. Durrett (2003) discusses this model. We will discuss a variant of this model proposed and analyzed by Bollobas, Janson and Riordan (2005). Their model is an example of the general inhomogeneous random graphs discussed in the previous section. We take as vertex space [n], and the edge ij is present with probability

p_ij = c / max{i, j}, (3.2.13)

all edge statuses being independent random variables. Equivalently, we can view this random graph as arising dynamically, where vertex t connects to each vertex s < t with probability c/t, independently for all s ∈ [t − 1].
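The dynamic description can be sketched directly; the labelling 1, …, n and the parameter values below are illustrative assumptions, not from the text:

```python
import random

def uniformly_grown_graph(n, c, seed=None):
    """Grow the Bollobas-Janson-Riordan variant dynamically: when vertex t
    arrives, it connects to each earlier vertex s < t independently with
    probability c / t, which matches p_ij = c / max{i, j} in (3.2.13)
    since max{s, t} = t.  Vertices are labelled 1, ..., n."""
    rng = random.Random(seed)
    edges = []
    for t in range(2, n + 1):
        for s in range(1, t):
            if rng.random() < c / t:
                edges.append((s, t))
    return edges

edges = uniformly_grown_graph(n=100, c=0.5, seed=0)
```

Note that c ≤ 2 keeps c/t a genuine probability for every t ≥ 2.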

Sum kernels

We have already seen that product kernels are special, as they give rise to the Chung-Lu model or its close brothers, the generalized random graph and the Norros-Reittu model. For sum kernels, instead, we take κ(x, y) = ψ(x) + ψ(y), so that p_ij = min{(ψ(i/n) + ψ(j/n))/n, 1}.

We see that there are many examples of random graphs with independent edges that fall into the general class of inhomogeneous random graphs. In the sequel, we will investigate them in general. We start by investigating their degree structure.

3.3 Degree sequence of inhomogeneous random graphs

We now start by investigating the degrees of the vertices of IRGn(κ_n). As we shall see, the degree of a vertex of a given type x is asymptotically Poisson with a mean

λ(x) = ∫_S κ(x, y) µ(dy) (3.3.1)

that depends on x. This leads to a mixed Poisson distribution for the degree D of a (uniformly chosen) random vertex of IRGn(κ). We recall that N_k(n) denotes the number of vertices of IRGn(κ) with degree k. Our main result is as follows:

Theorem 3.4 (The degree sequence of IRGn(κ)) Let (κ_n) be a graphical sequence of kernels with limit κ as described in Definition 3.3(i). For any fixed k ≥ 0,

N_k(n)/n P−→ ∫_S (λ(x)^k / k!) e^{−λ(x)} µ(dx), (3.3.2)

where x ↦ λ(x) is defined by

λ(x) = ∫_S κ(x, y) µ(dy). (3.3.3)

Equivalently,

N_k(n)/n P−→ P(D = k), (3.3.4)

where D has the mixed Poisson distribution with mixing random variable W_λ given by

P(W_λ ≤ x) = µ({y ∈ S : λ(y) ≤ x}). (3.3.5)

In the remainder of this section, we prove Theorem 3.4. This proof is also a good example of how such proofs will be carried out in the sequel. Indeed, we start by proving Theorem 3.4 for the finite-types case, which is substantially easier. After this, we give a proof in the general case, for which we shall need to prove results on approximations of sequences of graphical kernels. These approximations will apply to bounded kernels, and thus we will also need to show that unbounded kernels can be well approximated by bounded kernels. It is here that the assumption (3.2.4) will be crucially used.


3.3.1 Degree sequence of the finite-types case

We start by proving Theorem 3.4 in the finite-types case. Take a vertex v of type i, let D_v be its degree, and let D_{v,j} be the number of edges from v to vertices of type j ∈ [r]. Then, clearly, D_v = ∑_j D_{v,j}.

Recall that, in the finite-types case, the edge probability between vertices of types i and j is denoted by (κ_n(i, j) ∧ n)/n. Further, (3.2.5) implies that κ_n(i, j) → κ(i, j) for every i, j ∈ [r], while (3.2.1) implies that the number n_i of vertices of type i satisfies n_i/n → µ_i for some probability distribution (µ_i)_{i∈[r]}.

Assume that n ≥ max κ. The random variables (D_{v,j})_{j∈[r]} are independent, and D_{v,j} ∼ Bin(n_j − δ_{ij}, κ(i, j)/n) d−→ Poi(µ_j κ(i, j)), where n_j is the number of vertices of type j and µ_j = lim_{n→∞} n_j/n. Hence,

D_v d−→ Poi(∑_j µ_j κ(i, j)) = Poi(λ(i)), (3.3.6)

where λ(i) = ∫ κ(i, j) dµ(j) = ∑_j κ(i, j) µ_j. Consequently,

P(D_v = k) → P(Poi(λ(i)) = k) = (λ(i)^k / k!) e^{−λ(i)}. (3.3.7)

Let N_{k,i} be the number of vertices in IRGn(κ_n) of type i with degree k. Then, for fixed n_1, . . . , n_r,

(1/n) E[N_{k,i}] = (n_i/n) P(D_v = k) → µ_i P(Poi(λ(i)) = k). (3.3.8)

It is easily checked that Var(N_{k,i}) = O(n) (see Exercise 3.12). Hence,

(1/n) N_{k,i} P−→ µ_i P(Poi(λ(i)) = k), (3.3.9)

and thus, summing over i,

(1/n) N_k = ∑_i (1/n) N_{k,i} P−→ ∑_i µ_i P(Poi(λ(i)) = k) = P(D = k). (3.3.10)

This proves Theorem 3.4 in the finite-type case.
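The mixed Poisson limit in (3.3.10) is easy to evaluate numerically for a finite-type kernel. A minimal sketch with an illustrative two-type kernel (the matrix and µ are our choices, not from the text); the pmf should sum to 1 and have mean ∑_i µ_i λ(i):

```python
from math import exp, factorial

def limit_degree_pmf(kappa, mu, k):
    """P(D = k) from (3.3.10): the mixed Poisson probability
    sum_i mu_i * P(Poi(lambda(i)) = k), where
    lambda(i) = sum_j kappa[i][j] * mu[j]."""
    r = len(mu)
    total = 0.0
    for i in range(r):
        lam = sum(kappa[i][j] * mu[j] for j in range(r))
        total += mu[i] * exp(-lam) * lam ** k / factorial(k)
    return total

kappa = [[4.0, 1.0], [1.0, 2.0]]   # illustrative two-type kernel
mu = [0.5, 0.5]                    # equal proportions of the two types
pmf = [limit_degree_pmf(kappa, mu, k) for k in range(60)]
# lambda(1) = 2.5, lambda(2) = 1.5, so the mean degree is 2.0.
mean = sum(k * p for k, p in enumerate(pmf))
```

Truncating the sum at k = 60 loses only a negligible Poisson tail for these parameters.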

In order to prove Theorem 3.4 in the general case, we shall be approximating a sequence of graphical kernels (κ_n) by appropriate regular finite kernels.

3.3.2 Finite-type approximations of bounded kernels

Recall that S is a separable metric space, and that µ is a Borel measure on S with µ(S) = 1. Here, the metric and topological structure of S will be important. We refer to Appendix A for more details on metric spaces; this material will be assumed throughout this section.

In this section, we assume that (κn) is a graphical sequence of bounded kernels with limit κ, as described in Definition 3.3(i). Thus, we assume that

sup_{n≥1} sup_{x,y∈S} κn(x, y) < ∞.    (3.3.11)


Our aim is to find finite-type approximations of κn that bound κn from above and below. It is here that the metric structure of S, as well as the continuity of (x, y) ↦ κn(x, y), is crucially used. This is the content of the next lemma:

Lemma 3.5 (Finite-type approximations of general kernels) If (κn)_{n≥1} is a graphical sequence of kernels on a vertex space V with limit κ, then there exists a sequence (κ−m)_{m≥1} of finite-type kernels on V with the following properties:

(i) if κ is irreducible, then so is κ−m for all large enough m;
(ii) κ−m(x, y) ↑ κ(x, y) for every x, y ∈ S.

Let us now give some details. We find these finite-type approximations by constructing a partition Pm of S on which κn(x, y) is almost constant when x and y lie inside cells of the partition. Given a sequence of finite partitions Pm = {Am1, . . . , AmMm}, m ≥ 1, of S and an x ∈ S, we define the function x ↦ im(x) by requiring that

x ∈ Am,im(x).    (3.3.12)

Thus, im(x) indicates the cell in Pm that x is in. For A ⊂ S, we write diam(A) for sup{|x − y| : x, y ∈ A}. By taking Pm as the dyadic partition into intervals of length 2^−m in S, we easily see the following:

Lemma 3.6 (Approximating partition) There exists a sequence of finite partitions Pm = {Am1, . . . , AmMm}, m ≥ 1, of S such that

(i) each Ami is measurable and µ(∂Ami) = 0;
(ii) for each m, Pm+1 refines Pm, i.e., each Ami is a union ⋃_{j∈Jmi} Am+1,j for some set Jmi;
(iii) for a.e. x ∈ S, diam(Am,im(x)) → 0 as m → ∞, where im(x) is defined by (3.3.12).

Recall that a kernel κ is a symmetric measurable function on S × S. Fixing a sequence of partitions with the properties described in Lemma 3.6, we can define sequences of lower and upper approximations to κ by

κ−m(x, y) = inf{κ(x′, y′) : x′ ∈ Am,im(x), y′ ∈ Am,im(y)},    (3.3.13)

κ+m(x, y) = sup{κ(x′, y′) : x′ ∈ Am,im(x), y′ ∈ Am,im(y)}.    (3.3.14)

We thus replace κ by its infimum or supremum on each Ami × Amj. As κ+m might be +∞, we only use it for bounded κn as in (3.3.11). Obviously, κ−m and κ+m are constant on Am,i × Am,j for every i, j, so that κ−m and κ+m correspond to finite-type kernels (see Exercise 3.14). By Lemma 3.6(ii),

κ−m ≤ κ−m+1 and κ+m ≥ κ+m+1.    (3.3.15)

Furthermore, if κ is continuous a.e. then, by Lemma 3.6(iii),

κ−m(x, y) → κ(x, y) and κ+m(x, y) → κ(x, y) for a.e. (x, y) ∈ S².    (3.3.16)
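The lower approximations (3.3.13) can be illustrated numerically. The sketch below assumes S = [0, 1] with the dyadic partition into intervals of length 2^−m, and an illustrative continuous kernel that is not taken from the text; the infimum over each cell product is approximated on a grid, exhibiting the monotone convergence of (3.3.15)-(3.3.16).

```python
# Illustrative continuous kernel on [0, 1]^2 (an assumption for this sketch).
kappa = lambda x, y: 1.0 + x * y

def cell_index(m, x):
    """i_m(x): index of the dyadic cell of P_m containing x."""
    return min(int(x * 2 ** m), 2 ** m - 1)

def kappa_minus(m, x, y, grid=16):
    """inf of kappa over A_{m,i_m(x)} x A_{m,i_m(y)}, approximated on a grid."""
    h = 2.0 ** (-m)
    xs = [cell_index(m, x) * h + h * k / grid for k in range(grid + 1)]
    ys = [cell_index(m, y) * h + h * k / grid for k in range(grid + 1)]
    return min(kappa(u, v) for u in xs for v in ys)

x, y = 0.3, 0.7
vals = [kappa_minus(m, x, y) for m in range(1, 8)]
print(vals, kappa(x, y))  # non-decreasing in m, approaching kappa(0.3, 0.7)
```

Since the cells of Pm+1 are contained in those of Pm, the values increase in m, and by continuity of κ they converge to κ(x, y) as the cell diameters shrink.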


If (κn) is a graphical sequence of kernels with limit κ, we define instead

κ−m(x, y) := inf{(κ ∧ κn)(x′, y′) : x′ ∈ Am,im(x), y′ ∈ Am,im(y), n ≥ m},    (3.3.17)

κ+m(x, y) := sup{(κ ∨ κn)(x′, y′) : x′ ∈ Am,im(x), y′ ∈ Am,im(y), n ≥ m}.    (3.3.18)

Again, by Lemma 3.6, we have κ−m ≤ κ−m+1, and, by Lemma 3.6(iii) and Definition 3.3(ii),

κ−m(x, y) ↑ κ(x, y) as m → ∞, for a.e. (x, y) ∈ S².    (3.3.19)

Since κ−m ≤ κ, we can obviously construct our random graph so that IRGn(κ−m) ⊆ IRGn(κn), and in the sequel we assume this; see also Exercise 3.15. Similarly, we shall assume that IRGn(κ+m) ⊇ IRGn(κn) when κn is bounded as in (3.3.11). Moreover, when n ≥ m,

κn ≥ κ−m,    (3.3.20)

and we may assume that IRGn(κ−m) ⊆ IRGn(κn). By the convergence of the sequence of kernels (κn), the number of edges converges as well. Thus, in bounding κn, we do not create or destroy too many edges. This provides the starting point of our analysis, which we carry out in the following section.

3.3.3 Degree sequences of general inhomogeneous random graphs

Now we are ready to complete the proof of Theorem 3.4 for general sequences of graphical kernels (κn). Define κ−m by (3.3.17). Since we only use the lower bounding kernel κ−m (which always exists), we need not assume that κn is bounded.

Let ε > 0 be given. From (3.2.6) and monotone convergence, there is an m such that

∫∫S² κ−m(x, y)µ(dx)µ(dy) > ∫∫S² κ(x, y)µ(dx)µ(dy) − ε.    (3.3.21)

For n ≥ m, we have κ−m ≤ κn by (3.3.20), so we may assume that IRGn(κ−m) ⊆ IRGn(κn). Then, by (3.2.6) and (3.3.21),

(1/n) E(IRGn(κn) \ IRGn(κ−m)) = (1/n) E(IRGn(κn)) − (1/n) E(IRGn(κ−m))    (3.3.22)
    →P (1/2) ∫∫S² κ(x, y)µ(dx)µ(dy) − (1/2) ∫∫S² κ−m(x, y)µ(dx)µ(dy) < ε/2,

so that, whp, E(IRGn(κn) \ IRGn(κ−m)) < εn. Let us write N(m)k for the number of vertices of degree k in IRGn(κ−m). Since each edge in the difference affects the degrees of at most two vertices, it follows that, whp,

|N(m)k − Nk| < 2εn.    (3.3.23)

Writing D(m) for the equivalent of D defined using κ−m in place of κ, by the proof for the finite-types case, N(m)k/n →P P(D(m) = k). Thus, whp,

|N(m)k/n − P(D(m) = k)| < ε.    (3.3.24)


Finally, we have E[D] = ∫S λ(x)µ(dx) = ∫∫S² κ(x, y)µ(dx)µ(dy). Since λ(m)(x) ≤ λ(x), we can couple the mixed Poisson random variables such that D(m) ≤ D, and thus

P(D ≠ D(m)) = P(D − D(m) ≥ 1) ≤ E[D − D(m)]    (3.3.25)
    = ∫∫S² κ(x, y)µ(dx)µ(dy) − ∫∫S² κ−m(x, y)µ(dx)µ(dy) < ε.

Combining (3.3.23), (3.3.24) and (3.3.25), we see that |Nk/n − P(D = k)| < 4ε whp. Since ε > 0 was arbitrary, this completes the proof of Theorem 3.4.

Now that we have identified the limit of the degree distribution, let us discuss its proof, as well as some of the properties of the limiting degree distribution.

Bounded kernels

First of all, the proof above is exemplary of several proofs that we will use in this chapter, as well as in Chapter 6. The current proof is particularly simple, as it only makes use of the lower bounding finite-type inhomogeneous random graph, while in many settings we also need the upper bound. This upper bound can only apply to bounded kernels κn as in (3.3.11). As a result, we will need to study the effect of bounding κn, for example by approximating it by κn(x, y) ∧ K.

Tail properties of the degree distribution

Let W = Wλ be the random variable λ(U), where U is a random variable on S having distribution µ. Then we can also describe the mixed Poisson distribution of D as Poi(W). Under mild conditions, the tail probabilities P(D > t) and P(W > t) agree for large t. We state this for the case of power-law tails; many of these results generalize to regularly-varying tails. Let N≥k be the number of vertices with degree at least k.

Corollary 3.7 (Power-law tails for the degree sequence) Let (κn) be a graphical sequence of kernels with limit κ. Suppose that P(W > t) = µ({x : λ(x) > t}) = at^{−(τ−1)}(1 + o(1)) as t → ∞, for some a > 0 and τ > 2. Then

N≥k/n →P P(D ≥ k) ∼ ak^{−(τ−1)},    (3.3.26)

where the first limit is for k fixed and n → ∞, and the second is for k → ∞.

Proof It suffices to show that P(D ≥ k) ∼ ak^{−(τ−1)}; the remaining conclusions then follow from Theorem 3.4. For any ε > 0, as t → ∞,

P(Poi(W) > t | W > (1 + ε)t) → 1,    (3.3.27)

while P(Poi(W) > t | W < (1 − ε)t) = o(t^{−(τ−1)}). It follows that P(D > t) = P(Poi(W) > t) = at^{−(τ−1)}(1 + o(1)) as t → ∞. Exercise 3.16 asks you to fill in the details of this argument.

This result shows that the general inhomogeneous random graph does include natural cases with power-law degree distributions. Recall that we have already observed in [Volume 1, Theorem 6.7] that this is the case for the GRGn(w) when the weight sequence w is chosen appropriately.

3.4 Multi-type branching processes

In order to study further properties of IRGn(κn), we need to understand the neighborhood structure of vertices. This will be crucially used in the next section, where we study the local weak convergence properties of IRGn(κn). For simplicity, let us restrict ourselves to the finite-types case. As we have seen, nice kernels can be arbitrarily well approximated by finite-type kernels, so this is a good start. Then, for a vertex of type i, the number of neighbors of type j is close to Poisson distributed with approximate mean κ(i, j)µ(j). Even when we assume independence of the neighborhood structures of different vertices, we still do not arrive at a classical branching process as discussed in [Volume 1, Chapter 3]. Instead, we can describe the neighborhood structure with a branching process in which we keep track of the type of each of the vertices. For general κ and µ, we can even have a continuum of types. Such branching processes are called multi-type branching processes. In this section, we discuss some of the basics, and we shall quickly move on to the special case of multi-type branching processes where every offspring has a Poisson distribution.

3.4.1 Multi-type branching processes with finitely many types

Multi-type branching processes can be analyzed using linear algebra. In order to do so, we first introduce some notation. We first assume that we are in the finite-types case, and denote the number of types by r. We let j = (j1, . . . , jr) ∈ N₀^r be a vector of non-negative integers, and denote by p(i)_j the probability that an individual of type i gives rise to an offspring j, i.e., j1 children of type 1, j2 children of type 2, etc. The offsprings of the different individuals are all mutually independent. Denote by Z(i)_{n,j} the number of individuals of type j in generation n when starting from a single particle of type i, and write Z(i)_n = (Z(i)_{n,1}, . . . , Z(i)_{n,r}). We are interested in the survival or extinction of multi-type branching processes, and in the growth of the generation sizes. In the multi-type case, we are naturally led to a matrix setup. We now discuss the survival versus extinction of multi-type branching processes. We denote the survival probability of the multi-type branching process, when starting from a single individual of type i, by

ζ(i) = P(Z(i)_n ≠ 0 for all n),    (3.4.1)

and we let ζ = (ζ(1), . . . , ζ(r)). Our first aim is to investigate when ζ = 0.

Multi-type branching processes and generating functions

We write pj = (p(1)_j, . . . , p(r)_j), and we let

G(i)(s) = ∑j p(i)_j ∏_{a=1}^r s_a^{j_a}    (3.4.2)


be the joint probability generating function of the offspring of an individual of type i. We write G(s) = (G(1)(s), . . . , G(r)(s)) for the vector of generating functions. We now generalize Theorem 3.1 to the multi-type case. Let ζ satisfy ζ = 1 − G(1 − ζ). By convexity of s ↦ G(s), this equation has at most one non-zero solution. Define

G(i)_n(s) = E[∏_{a=1}^r s_a^{Z(i)_{n,a}}],    (3.4.3)

and Gn(s) = (G(1)_n(s), . . . , G(r)_n(s)). Then we have that Gn+1(s) = Gn(G(s)) = G(Gn(s)) and ζ = 1 − lim_{n→∞} Gn(0). Naturally, the extinction probability depends sensitively on the type of the ancestor of the branching process. On the other hand, under reasonable assumptions, the positivity of the survival probability is independent of the initial type. A necessary and sufficient condition for this property is that, with positive probability, an individual of type i arises as a descendant of an individual of type j, for all types i and j. See Exercise 3.18. Exercise 3.19 relates this to the lth power of the mean offspring matrix Tκ = (κ(i, j)µ(j))_{i,j∈[r]}.

We note that when G(s) = Ms for some matrix M, each individual has precisely one offspring, and we call this case singular (see Exercise 3.20). When each particle has precisely one offspring, the multi-type branching process is equivalent to a Markov chain on the type space, and the process a.s. survives. Thus, in this case, there is no survival vs. extinction phase transition. We shall assume throughout the remainder that the multi-type branching process is non-singular.

3.4.2 Survival vs. extinction of multi-type branching processes

We continue to describe the survival versus extinction of multi-type branching processes in terms of the mean offspring. Let κ(i, j)µ(j) denote the expected number of offspring of type j of a single individual of type i, and let Tκ = (κ(i, j)µ(j))_{i,j∈[r]} be the matrix of expected offspring. We shall assume that there exists an l such that the matrix T^l_κ has only strictly positive entries. This is sometimes called irreducibility, as it implies that the Markov chain of the numbers of individuals of the various types is an irreducible Markov chain. By the Perron-Frobenius theorem, the matrix Tκ has a unique largest eigenvalue ‖Tκ‖ with non-negative left-eigenvector xκ, and the eigenvalue ‖Tκ‖ can be computed as

‖Tκ‖ = sup_{x : ‖x‖2 ≤ 1} ‖Tκx‖2, where ‖x‖2 = √(∑_{i=1}^r x_i²).    (3.4.4)

We note that

E[Z(i)_{n+1} | Z(i)_n = z] = Tκz,    (3.4.5)

so that

E[Z(i)_n] = T^n_κ e(i),    (3.4.6)

where T^n_κ denotes the n-fold application of the matrix Tκ, and e(i) is the vector that has a 1 in the ith position and zeroes elsewhere. The identities in (3.4.5) and (3.4.6) have several important consequences concerning the phase transition of multi-type branching processes, as we now discuss in more detail.

First, when ‖Tκ‖ < 1,

‖E[Z(i)_n]‖2 ≤ ‖Tκ‖^n ‖e(i)‖2,    (3.4.7)

which converges to 0 exponentially fast. Therefore, by the Markov inequality ([Volume 1, Theorem 2.17]), the multi-type branching process dies out a.s. When ‖Tκ‖ > 1, on the other hand, the sequence

Mn = xκ · Z(i)_n ‖Tκ‖^{−n}    (3.4.8)

is a non-negative martingale, by (3.4.5) and the fact that xκ is a left-eigenvector with eigenvalue ‖Tκ‖, since xκTκ = ‖Tκ‖xκ. By the martingale convergence theorem (Theorem 2.24), the martingale Mn converges a.s. When we impose further restrictions on Mn, for example that Mn has finite second moment, we obtain that Mn →a.s. M∞ and E[Mn] → E[M∞]. More precisely, there is a multi-type analog of the Kesten-Stigum theorem ([Volume 1, Theorem 3.10]). Since E[Mn] = E[M0] = xκ · e(i) > 0, we thus have that Z(i)_n grows exponentially with strictly positive probability, which implies that the survival probability is positive. [Volume 1, Theorem 3.1] can be adapted to show that Z(i)_n →P 0 when ‖Tκ‖ = 1; see, e.g., (Harris, 1963, Sections II.6-II.7). We conclude that, for non-singular and irreducible multi-type branching processes, ζ > 0 precisely when ‖Tκ‖ > 1. This is the content of the following theorem:

Theorem 3.8 (Survival vs. extinction of finite-type branching processes) Let (Z(i)_n)_{n≥0} be a multi-type branching process with mean offspring matrix Tκ on the type space [r]. Assume that there exists an l such that the matrix T^l_κ has only strictly positive entries, and that ∑j κ(i, j)µ(j) ≠ 1 for every i ∈ [r]. Then the following hold:

(a) The survival probability ζ is the largest solution to ζ = 1 − G(1 − ζ), and ζ > 0 precisely when ‖Tκ‖ > 1.
(b) Assume that ‖Tκ‖ > 1, and let xκ be the unique positive left-eigenvector of Tκ. Then the martingale Mn = xκ · Z(i)_n ‖Tκ‖^{−n} converges a.s. to a limit that is positive on the event of survival precisely when E[Z(i)_1 log(Z(i)_1)] < ∞ for all i ∈ [r], where Z(i)_1 = ‖Z(i)_1‖1 is the total offspring of a type-i individual.
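The mean-growth identity (3.4.6) can be checked by simulation. The sketch below (plain Python; the two-type kernel is an illustrative assumption, not from the text) runs two generations of a Poisson multi-type branching process and compares the empirical generation means with the corresponding row of T²κ; since the chosen Tκ is symmetric, the row/column orientation in (3.4.5)-(3.4.6) is immaterial.

```python
import random, math

random.seed(3)

# Illustrative symmetric 2-type kernel (an assumption for this sketch).
kappa = [[2.0, 1.0],
         [1.0, 2.0]]
mu = [0.5, 0.5]
T = [[kappa[i][j] * mu[j] for j in range(2)] for i in range(2)]  # mean offspring matrix

def poisson(lam):
    """Knuth's method, fine for small lambda."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

def generation(z):
    """One step: each type-i individual has Poi(T[i][j]) children of type j."""
    new = [0, 0]
    for i in range(2):
        for _ in range(z[i]):
            for j in range(2):
                new[j] += poisson(T[i][j])
    return new

runs, tot = 3000, [0.0, 0.0]
for _ in range(runs):
    z = [1, 0]                 # start from a single type-0 individual
    for _ in range(2):
        z = generation(z)
    tot = [tot[j] + z[j] for j in range(2)]
emp = [t / runs for t in tot]

# Predicted means for generation 2: row 0 of T^2.
T2 = [[sum(T[i][k] * T[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(emp, T2[0])  # empirical means should be close to [1.25, 1.0]
```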

3.4.3 Poisson multi-type branching processes

We now specialize to Poisson multi-type branching processes, as these turn out to be the most relevant in the random graph setting. We call a multi-type branching process Poisson when the numbers of children of each type are independent Poisson random variables. Thus, Z(i)_1 = (Z(i)_{1,1}, . . . , Z(i)_{1,r}) is a vector of independent Poisson random variables with means (κ(i, 1)µ(1), . . . , κ(i, r)µ(r)). As we will see later, Poisson multi-type branching processes arise naturally when exploring a component of IRGn(κ) starting at a vertex of type x. This is directly analogous to the use of the single-type Poisson branching process in the analysis of the Erdős-Rényi random graph ERn(λ/n), as discussed in detail in [Volume 1, Chapters 4 and 5].

Poisson multi-type branching processes with finitely many types

For Poisson multi-type branching processes with finitely many types, we obtain that

G(i)(s) = E[∏_{a=1}^r s_a^{Z(i)_{1,a}}] = e^{∑_{a=1}^r κ(i,a)µ(a)(s_a − 1)} = e^{(Tκ(s−1))_i}.    (3.4.9)

Thus, the vector of survival probabilities ζ satisfies

ζ = 1 − e^{−Tκζ},    (3.4.10)

where the exponential of a vector is taken componentwise, i.e., (e^{−Tκζ})_i = e^{−(Tκζ)_i}. This leads us to the investigation of eigenfunctions of non-linear operators of the form f ↦ 1 − e^{−Tκf}.
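The fixed-point equation (3.4.10) lends itself to numerical solution by iteration from ζ = 1. The sketch below (an illustrative supercritical two-type example; the kernel values are assumptions) computes the survival probabilities this way; here Tκ has top eigenvalue 3/2 > 1 (with eigenvector (1, 1)), so the limit is strictly positive.

```python
import math

# Illustrative supercritical 2-type Poisson branching process.
kappa = [[2.0, 1.0],
         [1.0, 2.0]]
mu = [0.5, 0.5]
T = [[kappa[i][j] * mu[j] for j in range(2)] for i in range(2)]  # T = [[1, .5], [.5, 1]]

def T_apply(v):
    return [sum(T[i][j] * v[j] for j in range(2)) for i in range(2)]

# Iterate zeta <- 1 - exp(-T zeta) from zeta = (1, 1), cf. (3.4.10).
zeta = [1.0, 1.0]
for _ in range(500):
    zeta = [1.0 - math.exp(-t) for t in T_apply(zeta)]

print(zeta)  # by symmetry both components solve z = 1 - exp(-1.5 z)
```

Starting from ζ = 1 produces a decreasing sequence of iterates, so the limit is the maximal fixed point, matching the characterization of the survival probability below (3.4.16).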

There is a beautiful property of Poisson random variables that allows us to construct a Poisson multi-type branching process in a particularly convenient way. It rests on the following Poisson thinning property:

Lemma 3.9 (A Poisson number of multinomial trials) Let X have a Poisson distribution with parameter λ. Perform X multinomial trials, where the ith outcome appears with probability pi, for some probabilities (pi)_{i=1}^k, and let Xi denote the total number of outcomes i. Then (Xi)_{i=1}^k is a sequence of independent Poisson random variables with parameters (λpi)_{i=1}^k.

Proof Let (xi)_{i=1}^k denote a sequence of non-negative integers, write x = ∑_{i=1}^k xi, and compute

P((Xi)_{i=1}^k = (xi)_{i=1}^k) = P(X = x) P((Xi)_{i=1}^k = (xi)_{i=1}^k | X = x)    (3.4.11)
    = e^{−λ} (λ^x/x!) (x!/(x1! · · · xk!)) p1^{x1} · · · pk^{xk} = ∏_{i=1}^k e^{−λpi} (λpi)^{xi}/xi!.

By Lemma 3.9, we can alternatively construct a Poisson multi-type branching process as follows. For an individual of type i, let its total number of offspring Zi have a Poisson distribution with parameter λ(i) = ∑_{j∈[r]} κ(i, j)µ(j). Then give each of the children a type j with probability κ(i, j)µ(j)/λ(i), independently. Let Zij denote the total number of individuals of type j thus obtained. Then the offspring distribution Z(i)_1 has the same distribution as (Zij)_{j∈[r]}.
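Lemma 3.9 and the two-stage construction above are easy to test numerically. The sketch below (the Poisson parameter and multinomial probabilities are illustrative assumptions) draws a Poisson number of multinomial trials and checks that the empirical means of the counts match λpi; the full independence claim of Lemma 3.9 is not tested here.

```python
import random, math

random.seed(7)

def poisson(lam):
    """Knuth's method, fine for small lambda."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

lam = 3.0
probs = [0.5, 0.3, 0.2]   # illustrative multinomial probabilities (p_i)
trials = 20000
counts = [0.0, 0.0, 0.0]
for _ in range(trials):
    for _ in range(poisson(lam)):          # X ~ Poi(lambda) trials
        u, acc = random.random(), 0.0
        for i, pi in enumerate(probs):     # pick outcome i with probability p_i
            acc += pi
            if u < acc:
                counts[i] += 1
                break
means = [c / trials for c in counts]
print(means, [lam * pi for pi in probs])  # means approach (lambda * p_i)
```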

We now extend the above setting of finite-type Poisson multi-type branching processes to the infinite-type case. Again, we prove results in the infinite-types case by reducing to the finite-types case.


Poisson multi-type branching processes with infinitely many types

Let κ be a kernel. We define the Poisson multi-type branching process with kernel κ as follows. Each individual of type x ∈ S is replaced in the next generation by a set of individuals distributed as a Poisson process on S with intensity κ(x, y)µ(dy). Thus, the number of children with types in a subset A ⊆ S has a Poisson distribution with mean ∫A κ(x, y)µ(dy), and these numbers are independent for disjoint sets A and for different particles; see, e.g., Kallenberg (2002).

Let ζκ(x) be the survival probability of the Poisson multi-type branching process with kernel κ, starting from an ancestor of type x ∈ S. Set

ζκ = ∫S ζκ(x)µ(dx).    (3.4.12)

Again, it can be seen in a similar way as above that ζκ > 0 if and only if ‖Tκ‖ > 1, where now the linear operator Tκ is defined, for f : S → R, by

(Tκf)(x) = ∫S κ(x, y)f(y)µ(dy),    (3.4.13)

for any (measurable) function f such that this integral is defined (finite or +∞) for a.e. x ∈ S.

Note that Tκf is defined for every f ≥ 0, with 0 ≤ Tκf ≤ ∞. If κ ∈ L1(S × S), as we shall assume throughout, then Tκf is also defined for every bounded f. In this case, Tκf ∈ L1(S), and thus Tκf is finite a.e.

As we shall see, the analysis of multi-type branching processes with a possibly uncountable number of types is somewhat more functional-analytic. Similarly to the finite-types case in (3.4.4), we define

‖Tκ‖ = sup{‖Tκf‖2 : f ≥ 0, ‖f‖2 ≤ 1} ≤ ∞.    (3.4.14)

When finite, ‖Tκ‖ is the norm of Tκ as an operator on L2(S); it is infinite if Tκ does not define a bounded operator on L2. The norm ‖Tκ‖ is at most the Hilbert-Schmidt norm of Tκ:

‖Tκ‖ ≤ ‖Tκ‖HS = ‖κ‖L2(S×S) = (∫∫S² κ(x, y)²µ(dx)µ(dy))^{1/2}.    (3.4.15)

We also define the non-linear operator Φκ by

(Φκf)(x) = 1 − e^{−(Tκf)(x)}, x ∈ S,    (3.4.16)

for f ≥ 0. Note that for such f we have 0 ≤ Tκf ≤ ∞, and thus 0 ≤ Φκf ≤ 1. We shall characterize the survival probability ζκ(x), and thus ζκ, in terms of the non-linear operator Φκ, showing essentially that the function x ↦ ζκ(x) is the maximal fixed point of Φκ (recall (3.4.10)). Again, the survival probability satisfies ζκ > 0 precisely when ‖Tκ‖ > 1; recall the finite-types case discussed in detail above.

We call a multi-type branching process supercritical when ‖Tκ‖ > 1, critical when ‖Tκ‖ = 1, and subcritical when ‖Tκ‖ < 1. The above discussion can then be summarized by saying that a multi-type branching process survives with positive probability precisely when it is supercritical.

Poisson branching processes and product kernels: the rank-1 case

We continue by studying the rank-1 case, in which the kernel has product structure. Let κ(x, y) = ψ(x)ψ(y), so that Tκ(x, y) = ψ(x)ψ(y)µ(y). In this case, we see that ψ is an eigenvector with eigenvalue

‖Tκ‖ = ∫S ψ(y)²µ(dy) = ‖ψ‖²L2(µ).    (3.4.17)

Further, the left-eigenvector equals µ(x)ψ(x). Thus, the rank-1 multi-type branching process is supercritical when ‖ψ‖²L2(µ) > 1, critical when ‖ψ‖²L2(µ) = 1, and subcritical when ‖ψ‖²L2(µ) < 1.

The rank-1 case is rather special, and not only because we can explicitly compute the eigenvectors of the operator Tκ. It also turns out that the rank-1 multi-type case reduces to a single-type branching process with mixed Poisson offspring distribution. For this, we recall the construction right below Lemma 3.9. We compute that

λ(x) = ∫S ψ(x)ψ(y)µ(dy) = ψ(x) ∫S ψ(y)µ(dy),    (3.4.18)

so that an offspring of an individual of type x receives mark y with probability

p(x, y) = Tκ(x, y)/λ(x) = ψ(x)ψ(y)µ(y) / (ψ(x) ∫S ψ(z)µ(dz)) = ψ(y)µ(y) / ∫S ψ(z)µ(dz).    (3.4.19)

We conclude that every individual chooses its type independently of the type of its parent. This means that this multi-type branching process reduces to a single-type branching process with offspring distribution Poi(Wλ), where

P(Wλ ∈ A) = ∫A ψ(y)µ(dy) / ∫S ψ(z)µ(dz).    (3.4.20)

This makes the rank-1 setting particularly appealing.
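In the finite-types case, the rank-1 eigenvalue relation (3.4.17) is a one-line check. The sketch below (with illustrative ψ and µ on a three-point type space; these values are assumptions) verifies that ψ is an eigenvector of Tκ with eigenvalue ‖ψ‖²L2(µ).

```python
# Discrete rank-1 kernel kappa(x, y) = psi(x)psi(y) on a 3-point type space.
psi = [0.5, 1.0, 2.0]
mu = [0.5, 0.3, 0.2]
T = [[psi[i] * psi[j] * mu[j] for j in range(3)] for i in range(3)]

# psi should be an eigenvector of T with eigenvalue ||psi||^2_{L2(mu)}, cf. (3.4.17).
norm2 = sum(psi[i] ** 2 * mu[i] for i in range(3))
T_psi = [sum(T[i][j] * psi[j] for j in range(3)) for i in range(3)]
print(T_psi, [norm2 * p for p in psi])
```

Here (Tψ)(i) = ψ(i) ∑j ψ(j)²µ(j), so the two printed vectors agree up to floating-point rounding.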

Poisson branching processes and sum kernels

For the sum kernel, the analysis becomes slightly more involved, but it can still be carried out explicitly. Recall that κ(x, y) = α(x) + α(y) for the sum kernel. Anticipating a nice shape of the eigenvalues and eigenvectors, we let φ(x) = aα(x) + b, and verify the eigenvalue relation. This leads to

(Tκφ)(x) = ∫S κ(x, y)φ(y)µ(dy) = ∫S [α(x) + α(y)](aα(y) + b)µ(dy)    (3.4.21)
    = α(x)(a‖α‖L1(µ) + b) + (a‖α‖²L2(µ) + b‖α‖L1(µ)) = λ(aα(x) + b).


Solving for a, b, λ leads to a‖α‖L1(µ) + b = aλ and a‖α‖²L2(µ) + b‖α‖L1(µ) = λb, so that the vector (a, b)^T is an eigenvector with eigenvalue λ of the matrix

[ ‖α‖L1(µ)     1
  ‖α‖²L2(µ)    ‖α‖L1(µ) ].    (3.4.22)

Solving this eigenvalue problem leads to the eigenvalues

λ = ‖α‖L1(µ) ± ‖α‖L2(µ),    (3.4.23)

with corresponding eigenvectors (1, ±‖α‖L2(µ))^T. Clearly, the maximal eigenvalue equals λ = ‖α‖L1(µ) + ‖α‖L2(µ), with corresponding L2(µ)-normalized eigenvector

φ(x) = (α(x) + ‖α‖L2(µ)) / √(2(‖α‖²L2(µ) + ‖α‖L1(µ)‖α‖L2(µ))).    (3.4.24)

All other eigenvectors can be chosen to be orthogonal to α and 1, so that this corresponds to a rank-2 setting.
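The sum-kernel eigenvalue (3.4.23) can likewise be verified on a finite type space. The sketch below (illustrative α and µ on a three-point type space; these values are assumptions) checks that φ(x) = α(x) + ‖α‖L2(µ) satisfies Tκφ = (‖α‖L1(µ) + ‖α‖L2(µ))φ.

```python
import math

# Discrete sum kernel kappa(x, y) = alpha(x) + alpha(y).
alpha = [0.2, 0.7, 1.3]
mu = [0.3, 0.4, 0.3]
T = [[(alpha[i] + alpha[j]) * mu[j] for j in range(3)] for i in range(3)]

l1 = sum(a * m for a, m in zip(alpha, mu))                  # ||alpha||_{L1(mu)}
l2 = math.sqrt(sum(a * a * m for a, m in zip(alpha, mu)))   # ||alpha||_{L2(mu)}
lam_max = l1 + l2                                           # predicted top eigenvalue, cf. (3.4.23)

# phi(x) = alpha(x) + ||alpha||_{L2} should satisfy T phi = lam_max * phi.
phi = [a + l2 for a in alpha]
T_phi = [sum(T[i][j] * phi[j] for j in range(3)) for i in range(3)]
print(T_phi, [lam_max * p for p in phi])
```

Expanding (Tφ)(i) = α(i)(‖α‖L1 + ‖α‖L2) + (‖α‖²L2 + ‖α‖L1‖α‖L2) shows the identity algebraically; the numerics confirm it to floating-point precision.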

Unimodular Poisson branching processes

In our results, we will be interested in multi-type Poisson branching processes that start from the type distribution µ. Thus, we fix the root ∅ in the branching-process tree, and give it a random type Q satisfying P(Q ∈ A) = µ(A) for any measurable A ⊆ S. This corresponds to the unimodular setting that is important in random graph settings. The idea is that the total number of vertices with types in A is close to nµ(A), so that if we pick a vertex uniformly at random, it will have a type in A with asymptotic probability µ(A).

Branching process notation

We now introduce some notation that will be helpful. We let BP≤k denote the branching process up to and including generation k, where for each individual v in the kth generation we record its type Q(v). It will be convenient to think of the branching-process tree as being labelled in the Ulam-Harris way (recall Section 1.5), so that a vertex v in generation k has a label ∅a1 · · · ak, where ai ∈ N. When applied to BP, we will denote this process by (BP(t))t≥1, where BP(t) consists of precisely t vertices (with BP(1) equal to the root ∅, together with its type Q(∅)).

Monotone approximations of kernels

In what follows, we will often approximate general kernels by kernels with finitely many types, as described in Lemma 3.5. For monotone sequences, we can prove the following convergence result:

Theorem 3.10 (Monotone approximations of multi-type Poisson branching processes) Let (κn) be a sequence of kernels such that κn(x, y) ↑ κ(x, y). Let BP(n)≤k denote the first k generations of the Poisson multi-type branching process with kernel κn, and BP≤k that of the Poisson multi-type branching process with kernel κ. Then BP(n)≤k →d BP≤k. Further, let ζ(n)≥k(x) denote the probability that an individual of type x has at least k descendants. Then ζ(n)≥k(x) ↑ ζ≥k(x).


Proof Since κn(x, y) ↑ κ(x, y), we can write

κ(x, y) = ∑_{n≥1} ∆κn(x, y), where ∆κn(x, y) = κn(x, y) − κn−1(x, y),    (3.4.25)

with the convention κ0 ≡ 0. We can represent this by a sum of independent Poisson multi-type processes with intensities ∆κn(x, y), and associate the label n to each particle that arises from ∆κn(x, y). Then the branching process BP(n)≤k is obtained by keeping all vertices with labels at most n, while BP≤k is obtained by keeping all vertices. Consequently, BP(n)≤k →d BP≤k follows. Further, 1 − ζ(n)≥k(x) = ζ(n)<k(x) is a continuous function of BP(n)≤k, and thus it converges as well.

3.5 Local weak convergence for inhomogeneous random graphs

In this section, we prove the local weak convergence of IRGn(κn) in general. Our main result is as follows:

Theorem 3.11 (Locally tree-like nature of IRGn(κn)) Assume that (κn) is a graphical sequence of kernels converging to some irreducible limiting kernel κ. Then IRGn(κn) converges in probability in the local weak sense to the unimodular multi-type marked Galton-Watson tree, where

(i) the root has offspring distribution Poi(Wλ) and type Q with distribution

P(Q ∈ A) = µ(A);    (3.5.1)

(ii) a vertex of type x has offspring distribution Poi(λ(x)), with

λ(x) = ∫S κ(x, y)µ(dy),    (3.5.2)

and each of its offspring receives an independent type with distribution Q(x) given by

P(Q(x) ∈ A) = ∫A κ(x, y)µ(dy) / ∫S κ(x, y)µ(dy).    (3.5.3)

The proof of Theorem 3.11 follows a usual pattern. We start by proving Theorem 3.11 in the finite-types case, and then use finite-type approximations to extend the proof to the infinite-types case.

3.5.1 Local weak convergence: finitely many types

In order to get started on the proof of (2.3.10) for Theorem 3.11 in the finite-types case, we introduce some notation. Fix a rooted tree (t, q) of k generations, where the vertex v ∈ V(t) has type q(v) for all v ∈ V(t). It will again be convenient to think of t as being labelled in the Ulam-Harris way, so that a vertex v in generation k has a label ∅a1 · · · ak, where ai ∈ N.


Let

Nn(t, q) = ∑_{v∈[n]} 1{B(n;Q)_k(v) ≃ (t, q)}    (3.5.4)

denote the number of vertices whose local neighborhood up to generation k, including the types, equals (t, q). Here, in B(n;Q)_k(v), we record the types of the vertices in B(Gn)_k(v). Theorem 2.13 implies that, in order to prove Theorem 3.11, we need to show that

Nn(t, q)/n →P P(BP≤k ≃ (t, q)).    (3.5.5)

Indeed, since there are only a finite number of types, (3.5.5) also implies that Nn(t)/n →P ρ(t), where

ρ(t) = ∑q P(BP≤k ≃ (t, q))    (3.5.6)

is the probability that the branching process produces a certain tree (ignoring the types). To this, we can then apply Theorem 2.13. Alternatively, we can consider (3.5.5) as convergence of marked graphs as discussed in Section 1.6, which is stronger, but we do not pursue this direction further here.

To prove (3.5.5), we use a second moment method. We first prove that E[Nn(t, q)]/n → P(BP≤k ≃ (t, q)), after which we prove that Var(Nn(t, q)) = o(n²). Then, by the Chebychev inequality ([Volume 1, Theorem 2.18]), (3.5.5) follows.

Local weak convergence: first moment

We start by noting that

(1/n) E[Nn(t, q)] = P(B(n;Q)_k(o) ≃ (t, q)),    (3.5.7)

where o ∈ [n] is a vertex chosen uniformly at random. Our aim is to prove that P(B(n;Q)_k(o) ≃ (t, q)) → P(BP≤k ≃ (t, q)).

Let us start with the branching process and analyze P(BP≤k ≃ (t, q)). Fix a vertex v ∈ t of type q(v). The probability of seeing a sequence of dv children of (ordered) types (q(v1), . . . , q(vdv)) equals

e^{−λ(q(v))} (λ(q(v))^{dv}/dv!) ∏_{j=1}^{dv} κ(q(v), q(vj))µ(q(vj))/λ(q(v)) = e^{−λ(q(v))} (1/dv!) ∏_{j=1}^{dv} κ(q(v), q(vj))µ(q(vj)),    (3.5.8)

since we first draw a Poisson λ(q(v)) number of children, assign an order to them, and then assign a type q to each of them with probability κ(q(v), q)µ(q)/λ(q(v)). This holds independently for all v ∈ V(t) with |v| ≤ k − 1, so that

P(BP≤k ≃ (t, q)) = ∏_{v∈t : |v|≤k−1} e^{−λ(q(v))} (1/dv!) ∏_{j=1}^{dv} κ(q(v), q(vj))µ(q(vj)).    (3.5.9)


For a comparison to the graph exploration, it will be convenient to rewrite this probability slightly. Let t≤k−1 = {v : |v| ≤ k − 1} denote the vertices in the first k − 1 generations of t, and let |t≤k−1| denote its size. We can order the elements of t≤k−1 in their lexicographic ordering as (vi)_{i=1}^{|t≤k−1|}. The lexicographic ordering is the ordering in which the vertices are explored in the breadth-first exploration. Then we can rewrite

P(BP≤k ≃ (t, q)) = ∏_{i=1}^{|t≤k−1|} e^{−λ(q(vi))} (1/dvi!) ∏_{j=1}^{dvi} κ(q(vi), q(vij))µ(q(vij)).    (3.5.10)

Let us now turn to IRGn(κn). Fix a vertex v ∈ [n] of type q(v). The probability of seeing a sequence of dv neighbors of (ordered) types (q(v1), . . . , q(vdv)) equals

(1/dv!) ∏_{q∈S} (1 − κn(q(v), q)/n)^{nq − mq} ∏_{j=1}^{dv} (κn(q(v), q(vj))/n) [n_{q(vj)} − m_{q(vj)}(j − 1)],    (3.5.11)

where mq(j) = #{i ≤ j : q(vi) = q} is the number of type-q vertices in (q(v1), . . . , q(vj)) and mq = mq(dv) is the number of type-q vertices in (q(v1), . . . , q(vdv)). Here, the first factor arises since we assign an ordering uniformly at random, the second since all other edges (except for the specified ones) need to be absent, and the third specifies that the edges to vertices of the (ordered) sequence of types are present. When n → ∞, since nq/n → µ(q) and κn(q(v), q) → κ(q(v), q) for every q ∈ S,

(1/dv!) ∏_{q∈S} (1 − κn(q(v), q)/n)^{nq − mq} ∏_{j=1}^{dv} (κn(q(v), q(vj))/n) [n_{q(vj)} − m_{q(vj)}(j − 1)] → e^{−λ(q(v))} (1/dv!) ∏_{j=1}^{dv} κ(q(v), q(vj))µ(q(vj)),    (3.5.12)

as required. The above computation, however, ignores the depletion-of-points effect: fewer and fewer vertices participate as the exploration proceeds. To describe this, recall the lexicographic ordering of the elements of t_{≤k−1} as (v_i)_{i=1}^{|t_{≤k−1}|}, and, for a type q, let m_q(i) = #{j ∈ [i] : q(v_j) = q} denote the number of type-q individuals in (t,q) encountered up to and including the ith exploration. Then,

    P(B_k^{(n;Q)}(o) ≃ (t,q)) = ∏_{i=1}^{|t_{≤k−1}|} (1/d_{v_i}!) ∏_{q∈S} (1 − κ_n(q(v_i), q)/n)^{n_q − m_q(i−1)}        (3.5.13)
        × ∏_{j=1}^{d_{v_i}} (κ_n(q(v_i), q(v_{ij}))/n) [n_{q(v_{ij})} − m_{q(v_{ij})}(i + j − 1)].

As n → ∞, this converges to the right-hand side of (3.5.10), as required. This completes the proof of the convergence of the first moment in (3.5.7).


3.5 Local weak convergence for inhomogeneous random graphs 111

Local weak convergence: second moment

Here, we study the second moment of N_n(t,q) and show that it is close to the square of the first moment:

Lemma 3.12 (Concentration of the number of trees) As n → ∞,

    Var(N_n(t,q))/n² → 0.        (3.5.14)

Consequently, N_n(t,q)/n →P P(BP_{≤k} ≃ (t,q)).

Proof  We start by computing

    E[N_n(t,q)²]/n² = P(B_k^{(n;Q)}(o1) ≃ (t,q), B_k^{(n;Q)}(o2) ≃ (t,q)),        (3.5.15)

where o1, o2 ∈ [n] are two vertices chosen uniformly at random from [n], independently.

We first claim that, with high probability and for any fixed k, dist_{IRG_n(κ_n)}(o1, o2) > 2k. For this, we take the expectation with respect to o2 to obtain

    P(dist_{IRG_n(κ_n)}(o1, o2) ≤ 2k) = (1/n) E[|B_{2k}^{(n;Q)}(o1)|].        (3.5.16)

By (3.5.13),

    P(B_{2k}^{(n;Q)}(o1) ≃ (t,q)) ≤ ∏_{i=1}^{|t_{≤2k−1}|} (1/d_{v_i}!) ∏_{j=1}^{d_{v_i}} κ_n(q(v_i), q(v_{ij})) n_{q(v_{ij})}/n.        (3.5.17)

Define the mean-offspring matrix M^{(n)} by M^{(n)}_{ij} = κ_n(i,j) n_j/n for i, j ∈ [m], and write µ^{(n)}_i = n_i/n. Then,

    P(dist_{IRG_n(κ_n)}(o1, o2) ≤ 2k) ≤ (1/n) ∑_{ℓ≤2k} (µ^{(n)})^T (M^{(n)})^ℓ 1.        (3.5.18)

Let λ_n denote the largest eigenvalue of M^{(n)}, so that

    P(dist_{IRG_n(κ_n)}(o1, o2) ≤ 2k) ≤ (1/n) ∑_{ℓ≤2k} ‖µ^{(n)}‖₂ λ_n^ℓ ‖1‖₂        (3.5.19)
        = (‖µ^{(n)}‖₂ ‖1‖₂/n) ∑_{ℓ≤2k} λ_n^ℓ
        = (‖µ^{(n)}‖₂ ‖1‖₂/n) (λ_n^{2k+1} − 1)/(λ_n − 1).

As n → ∞, M^{(n)}_{ij} → (T_κ)_{ij} for every i, j ∈ [m], so that also λ_n → λ = ‖T_κ‖. We conclude that, for every fixed k ∈ N, P(dist_{IRG_n(κ_n)}(o1, o2) ≤ 2k) = o(1). See Exercises 3.23–3.24 for extensions of this result to distances in inhomogeneous random graphs with finitely many types.
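The path-counting bound (3.5.18)–(3.5.19) can be evaluated directly. A sketch with a hypothetical two-type mean-offspring matrix (all numbers illustrative): for fixed k the bound is a constant divided by n, so two independent uniform vertices are at distance greater than 2k whp.

```python
# Illustrative two-type data: M[i][j] stands in for kappa_n(i,j) * n_j / n,
# and mu[i] for n_i / n (neither is taken from the text).
M = [[1.2, 0.8], [0.8, 1.5]]
mu = [0.4, 0.6]

def mat_vec(A, v):
    return [sum(A[r][c] * v[c] for c in range(len(v))) for r in range(len(A))]

def distance_bound(n, k):
    # (1/n) * sum_{l <= 2k} mu^T (M^l) 1, as in (3.5.18).
    v = [1.0] * len(mu)          # start from the all-ones vector
    total = 0.0
    for _ in range(2 * k + 1):   # l = 0, 1, ..., 2k
        total += sum(m * x for m, x in zip(mu, v))
        v = mat_vec(M, v)
    return total / n

# For fixed k, the bound vanishes as n grows: the two vertices are far apart whp.
assert distance_bound(10**7, k=3) < distance_bound(10**6, k=3) < 1e-3
```

The geometric growth of the terms is governed by the largest eigenvalue of M, exactly as in (3.5.19).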


We conclude that

    E[N_n(t,q)²]/n² = P(B_k^{(n;Q)}(o1) ≃ (t,q), B_k^{(n;Q)}(o2) ≃ (t,q), o2 ∉ B_{2k}^{(n)}(o1)) + o(1).        (3.5.20)

We now condition on B_k^{(n;Q)}(o1) ≃ (t,q), and write

    P(B_k^{(n;Q)}(o1) ≃ (t,q), B_k^{(n;Q)}(o2) ≃ (t,q), o2 ∉ B_{2k}^{(n)}(o1))        (3.5.21)
        = P(B_k^{(n;Q)}(o2) ≃ (t,q) | B_k^{(n;Q)}(o1) ≃ (t,q), o2 ∉ B_{2k}^{(n)}(o1))
        × P(B_k^{(n;Q)}(o1) ≃ (t,q), o2 ∉ B_{2k}^{(n)}(o1)).

We already know that P(B_k^{(n;Q)}(o1) ≃ (t,q)) → P(BP_{≤k} ≃ (t,q)), so that also

    P(B_k^{(n;Q)}(o1) ≃ (t,q), o2 ∉ B_{2k}^{(n)}(o1)) → P(BP_{≤k} ≃ (t,q)).        (3.5.22)

In Exercise 3.25, you are asked to prove that (3.5.22) indeed holds. We next investigate the conditional probability. Conditionally on B_k^{(n;Q)}(o1) ≃ (t,q) and o2 ∉ B_{2k}^{(n)}(o1), the probability that B_k^{(n;Q)}(o2) ≃ (t,q) is the same as the probability that B_k^{(n;Q)}(o2) ≃ (t,q) in the graph IRG_{n′}(κ_{n′}) obtained by removing the vertices in B_k^{(n;Q)}(o1) as well as the edges incident to them. The resulting random graph has n′ = n − |t| vertices, and n′_q = n_q − m_q vertices of type q ∈ [m], where m_q is the number of type-q vertices in (t,q). Further, κ_{n′}(i,j) = κ_n(i,j) n′/n. The whole point is that κ_{n′}(i,j) → κ(i,j) and n′_q/n′ → µ(q) still hold. Therefore, also

    P(B_k^{(n;Q)}(o2) ≃ (t,q) | B_k^{(n;Q)}(o1) ≃ (t,q), o2 ∉ B_{2k}^{(n)}(o1)) → P(BP_{≤k} ≃ (t,q)),        (3.5.23)

and we have proved that E[N_n(t,q)²]/n² → P(BP_{≤k} ≃ (t,q))². From this, (3.5.14) follows directly, since E[N_n(t,q)]/n → P(BP_{≤k} ≃ (t,q)). As a result, N_n(t,q)/n is concentrated, and thus N_n(t,q)/n →P P(BP_{≤k} ≃ (t,q)), as required.

Lemma 3.12 completes the proof of Theorem 3.11 in the finite-types case.

3.5.2 Local weak convergence: infinitely many types

We next extend the proof of Theorem 3.11 to the infinite-types case. We follow the strategy in Section 3.3.3.

Fix a general sequence of graphical kernels (κ_n). Again define κ⁻_m by (3.3.17), so that κ_n ≥ κ⁻_m. Couple IRG_n(κ⁻_m) and IRG_n(κ_n) such that E(IRG_n(κ⁻_m)) ⊆ E(IRG_n(κ_n)). Let ε′ > 0 be given. Recall (3.3.22), which shows that we can take m so large that the bound

    ∑_{u∈[n]} (D_u − D_u^{(m)}) ≤ ε′n        (3.5.24)

holds whp, where D_u and D_u^{(m)} denote the degrees of u in IRG_n(κ_n) and IRG_n(κ⁻_m), respectively. We let K denote the maximal degree in t. Let N_n^{(m)}(t,q) denote N_n(t,q) in (3.5.4) for the kernel κ⁻_m (and keep N_n(t,q) for the kernel κ_n). If a vertex v is such that B_k^{(n;Q)}(v) ≃ (t,q) in IRG_n(κ⁻_m) but not in IRG_n(κ_n), or vice versa, then one of the vertices in B_{k−1}^{(n;Q)}(v) must have a different degree in IRG_n(κ⁻_m) than in IRG_n(κ_n). Thus,

    |N_n^{(m)}(t,q) − N_n(t,q)|        (3.5.25)
        ≤ ∑_{u,v} 1{u ∈ B_{k−1}^{(n;Q)}(v), B_k^{(n;Q)}(v) ≃ (t,q) in IRG_n(κ⁻_m)} 1{D_u ≠ D_u^{(m)}}
        + ∑_{u,v} 1{u ∈ B_{k−1}^{(n;Q)}(v), B_k^{(n;Q)}(v) ≃ (t,q) in IRG_n(κ_n)} 1{D_u^{(m)} ≠ D_u}.

Recall that the maximal degree of any vertex in t is K. Further, if B_k^{(n;Q)}(v) ≃ (t,q) and u ∈ B_{k−1}^{(n;Q)}(v), then all vertices on the path between u and v have degree at most K. Therefore,

    ∑_v 1{u ∈ B_{k−1}^{(n;Q)}(v), B_k^{(n;Q)}(v) ≃ (t,q) in IRG_n(κ⁻_m)} ≤ ∑_{ℓ≤k−1} K^ℓ ≤ (K^k − 1)/(K − 1),        (3.5.26)

and, in the same way,

    ∑_v 1{u ∈ B_{k−1}^{(n;Q)}(v), B_k^{(n;Q)}(v) ≃ (t,q) in IRG_n(κ_n)} ≤ (K^k − 1)/(K − 1).        (3.5.27)

We thus conclude that, whp,

    |N_n^{(m)}(t,q) − N_n(t,q)| ≤ 2 (K^k − 1)/(K − 1) ∑_{u∈[n]} 1{D_u^{(m)} ≠ D_u}        (3.5.28)
        ≤ 2 (K^k − 1)/(K − 1) ∑_{u∈[n]} (D_u − D_u^{(m)}) ≤ 2 (K^k − 1)/(K − 1) ε′n.

Taking ε′ = ε(K − 1)/[2(K^k − 1)], we thus obtain that

    |N_n^{(m)}(t,q) − N_n(t,q)| ≤ εn,        (3.5.29)

as required. For N_n^{(m)}(t,q), we can use Theorem 3.11 in the finite-types case to obtain that

    (1/n) N_n^{(m)}(t,q) →P P(BP_{≤k}^{(m)} ≃ (t,q)).        (3.5.30)

The fact that m can be taken so large that |P(BP_{≤k}^{(m)} ≃ (t,q)) − P(BP_{≤k} ≃ (t,q))| ≤ ε follows from Theorem 3.10.

3.5.3 Comparison to branching processes

In this section, we describe a beautiful comparison between the neighborhoods of a uniformly chosen vertex in rank-1 inhomogeneous random graphs, such as the generalized random graph, the Chung-Lu model and the Norros-Reittu model, and a marked branching process. This comparison is particularly pretty for the Norros-Reittu model, where there is an explicit stochastic-domination result: the neighborhoods are bounded by a unimodular branching process with a mixed-Poisson offspring distribution.¹ We start by describing the result in the rank-1 setting, after which we extend it to kernels with finitely many types.

Stochastic domination of clusters by a branching process

We shall dominate the cluster of a vertex in the Norros-Reittu model by the total progeny of a two-stage branching process with mixed-Poisson offspring. This domination is such that we also control the difference, and it makes the heuristic argument below Theorem 3.17 precise.

We now describe the cluster exploration of a uniformly chosen vertex o ∈ [n]. Define the mark distribution to be the random variable M with distribution

    P(M = m) = w_m/ℓ_n,  m ∈ [n].        (3.5.31)

Let (X_w)_w be a collection of independent random variables, where

(a) the number of children of the root, X_∅, has a mixed Poisson distribution with random parameter w_{M_∅}, where M_∅ is chosen uniformly in [n];
(b) X_w has a mixed Poisson distribution with random parameter w_{M_w}, where (M_w)_{w≠∅} are i.i.d. random marks with distribution (3.5.31), independently of M_∅.

We call (X_w, M_w)_w a marked mixed-Poisson branching process (MMPBP). Clearly, w_{M_∅} has distribution W_n defined in (1.3.10), while the random variables w_{M_w}, for w with |w| ≥ 1, are i.i.d. with distribution w_M given by

    P(w_M ≤ x) = ∑_{m=1}^n 1{w_m ≤ x} P(M = m) = (1/ℓ_n) ∑_{m=1}^n w_m 1{w_m ≤ x} = P(W*_n ≤ x) = F*_n(x),        (3.5.32)

where W*_n has the size-biased distribution of W_n and F*_n is given by

    F*_n(x) = (1/ℓ_n) ∑_{m=1}^n w_m 1{w_m ≤ x}.        (3.5.33)

When we are only interested in the numbers of individuals, we obtain a unimodular branching process, since the random variables (X_w)_w are independent and the random variables (X_w)_{w≠∅} are i.i.d. In the sequel, however, we make explicit use of the marks (M_w)_{w≠∅}, as the complete information (X_w, M_w)_w gives us a way to retrieve the cluster of the vertex M_∅, something that would not be possible on the basis of (X_w)_w alone.

In order to define the cluster exploration in NR_n(w), we introduce a thinning that guarantees that we inspect every vertex at most once. We think of M_w as the vertex label in NR_n(w) of the tree vertex w, and of X_w ~ Poi(w_{M_w}) as its potential number of children. These potential children effectively become children when their marks correspond to vertices in NR_n(w) that have not yet appeared; the thinning ensures this. To describe the thinning, we declare ∅ unthinned, and, for w ≠ ∅, we thin w when either (i) one of the tree vertices on the (unique) path between the root ∅ and w has been thinned, or (ii) M_w = M_{w′} for some unthinned vertex w′ < w. We now make the connection between the thinned marked mixed-Poisson branching process and the cluster exploration precise:

¹ In van den Esker et al. (2008), the unimodular branching process is called a delayed branching process, and the term two-stage branching process has also been used.

Proposition 3.13 (Clusters as thinned marked branching processes) The cluster C(o) of a uniformly chosen vertex o is equal in distribution to {M_w : w unthinned}, the set of marks of unthinned vertices encountered in the marked mixed-Poisson branching process up to the end of the exploration. Similarly, the sets of vertices at graph distance k from o, for k ≥ 0, have the same distribution as

    ({M_w : w unthinned, |w| = k})_{k≥0}.        (3.5.34)

Proof  We prove the two statements simultaneously. By construction, the distribution of o is the same as that of M_∅, the mark of the root of the marked mixed-Poisson branching process. We continue by proving that the direct neighbors of the root ∅ agree in both constructions. In NR_n(w), the direct neighbors of a vertex l are equal to {j ∈ [n] \ {l} : I_{lj} = 1}, where (I_{lj})_{j∈[n]\{l}} are independent Be(p_{lj}) random variables with p_{lj} = 1 − e^{−w_l w_j/ℓ_n}.

We now prove that the same is true for the marked mixed-Poisson branching process. Conditionally on M_∅ = l, the root has a Poi(w_l) number of children, and these children receive i.i.d. marks. We make use of the fundamental 'thinning' property of the Poisson distribution in Lemma 3.9. By Lemma 3.9, the random vector (X_{∅,j})_{j∈[n]}, where X_{∅,j} is the number of children of the root that receive mark j, is a vector of independent Poisson random variables with parameters w_l w_j/ℓ_n. Due to the thinning, the mark j occurs precisely when X_{∅,j} ≥ 1. Therefore, the mark j occurs, independently for all j ∈ [n], with probability 1 − e^{−w_l w_j/ℓ_n} = p^{(NR)}_{lj}. This proves that the set of marks of children of the root in the MMPBP has the same distribution as the set of neighbors of the chosen vertex in NR_n(w).

Next, we look at the number of new elements of C(o) neighboring the vertex that has received word w. First, condition on M_w = l, and assume that w is not thinned. Conditionally on M_w = l, the number of children of w in the MMPBP has distribution Poi(w_l), and each of these children receives an i.i.d. mark. Let X_{w,j} denote the number of children of w that receive mark j.

By Lemma 3.9, (X_{w,j})_{j∈[n]} is again a vector of independent Poisson random variables with parameters w_l w_j/ℓ_n. Due to the thinning, a mark j appears within the offspring of the individual w precisely when X_{w,j} ≥ 1, and these events are independent. In particular, for each j that has not yet appeared as the mark of an unthinned vertex, the probability that it occurs equals 1 − e^{−w_l w_j/ℓ_n} = p^{(NR)}_{lj}, as required.
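The thinned exploration in the proof translates directly into an algorithm. A minimal sketch (the weights, the seed and the graph size are arbitrary illustrative choices): each unthinned tree vertex with mark m spawns Poi(w_m) children carrying i.i.d. marks with distribution (3.5.31), and a child whose mark has already been seen is thinned, together with its entire subtree.

```python
import random
from math import exp

def poisson(lam, rng):
    # Inverse-transform sampling of Poi(lam); adequate for moderate lam.
    x, prob, u = 0, exp(-lam), rng.random()
    while u > prob:
        u -= prob
        x += 1
        prob *= lam / x
    return x

def explore_cluster_nr(w, root, rng):
    """Marks of unthinned vertices in the MMPBP started from `root`, which by
    Proposition 3.13 is distributed as the cluster C(root) in NR_n(w)."""
    n = len(w)
    seen = {root}          # marks of unthinned vertices found so far
    queue = [root]         # unthinned vertices whose children are still unexplored
    while queue:
        m = queue.pop(0)
        for _ in range(poisson(w[m], rng)):
            mark = rng.choices(range(n), weights=w)[0]   # P(M = j) = w_j / l_n
            if mark not in seen:                         # repeated mark => child thinned
                seen.add(mark)
                queue.append(mark)
    return seen

rng = random.Random(1)
w = [1.5] * 200            # constant weights: NR_n(w) is then close to ER_200(1.5/200)
cluster = explore_cluster_nr(w, root=0, rng=rng)
assert 0 in cluster and len(cluster) <= 200
```

Because thinned children are never enqueued, rule (i) is automatic, and the set `seen` enforces rule (ii).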


Stochastic domination by branching processes: finite-type case

The rank-1 setting described above is special in that the marks of the vertices in the tree are independent random variables: they do not depend on the type of their parent. In general, this is not true. We next describe how the result can be generalized, restricting to the finite-type case.

Let us introduce some notation. Recall that n_i denotes the number of vertices of type i ∈ [m], and write n_{≤i} = ∑_{j≤i} n_j. Define the intervals I_i = [n_{≤i}] \ [n_{≤i−1}] (where, by convention, I_1 = [n_1]). We note that all vertices in the interval I_i play the same role, which will be used crucially in our coupling.

We now describe the cluster exploration of a uniformly chosen vertex o ∈ [n] of type i. Define the mark distribution of type j to be the random variable M(j) with distribution

    P(M(j) = ℓ) = 1/n_j,  ℓ ∈ I_j.        (3.5.35)

Let (X_w)_w be a collection of independent random variables, where

(a) the number of children of the root, X_∅, has a Poisson distribution with parameter λ_n(i) = ∑_j κ_n(i,j), and each of the children of ∅ independently receives a type T_w, where T_w = j with probability κ_n(i,j)/λ_n(i);
(b) given that a vertex w has type j, it receives a mark M_w(j) with distribution as in (3.5.35);
(c) given that a vertex w has type j, X_w has a Poisson distribution with parameter λ_n(j), and each of the children of w independently receives a type equal to j′ with probability κ_n(j,j′)/λ_n(j).

We call (X_w, T_w, M_w)_w a marked multi-type Poisson branching process. Then, the following extension of Proposition 3.13 holds:

Proposition 3.14 (Clusters as thinned marked multi-type branching processes) The cluster C(v) of a vertex v of type i is equal in distribution to {M_w : w unthinned}, the set of marks of unthinned vertices encountered in the marked multi-type Poisson branching process up to the end of the exploration. Similarly, the sets of vertices at graph distance k from v, for k ≥ 0, have the same distribution as

    ({M_w : w unthinned, |w| = k})_{k≥0}.        (3.5.36)

You are asked to prove Proposition 3.14 in Exercise 3.28.

3.5.4 Local weak convergence to unimodular trees for GRGn(w)

We close this section by investigating the locally tree-like nature of the generalized random graph. Our main result is as follows:

Theorem 3.15 (Locally tree-like nature of GRG_n(w)) Assume that Condition 1.1(a)-(b) holds. Then GRG_n(w) converges locally weakly in probability to the unimodular Galton-Watson tree with offspring distribution (p_k)_{k≥0} given by

    p_k = P(D = k) = E[e^{−W} W^k/k!].        (3.5.37)

This result also applies to NR_n(w) and CL_n(w) under the same conditions.
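The mixed-Poisson probabilities in (3.5.37) are easy to evaluate for a concrete mixing distribution. A sketch with an illustrative two-point W (not taken from the text):

```python
from math import exp, factorial

# Illustrative mixing distribution: W = 1 or W = 3, each with probability 1/2.
w_vals, w_probs = [1.0, 3.0], [0.5, 0.5]

def p_k(k):
    # p_k = P(D = k) = E[ e^{-W} W^k / k! ], as in (3.5.37)
    return sum(pr * exp(-w) * w**k / factorial(k) for w, pr in zip(w_vals, w_probs))

# (p_k)_{k >= 0} is a probability distribution whose mean is E[W].
assert abs(sum(p_k(k) for k in range(80)) - 1.0) < 1e-12
assert abs(sum(k * p_k(k) for k in range(80)) - 2.0) < 1e-10   # E[D] = E[W] = 2
```

The same two lines of arithmetic verify, for any mixing law, that the offspring distribution of the limiting tree has mean E[W].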

Theorem 3.15 follows directly from Theorem 3.11. However, we will also give an alternative proof along a different route, relying on the local tree-like nature of CM_n(d) proved in Theorem 4.1, together with the relation between GRG_n(w) and CM_n(d) discussed in Section 1.3 and Theorem 1.7. This approach is interesting in itself, since it allows for general proofs for GRG_n(w) by first proving the result for CM_n(d) and then extending it to GRG_n(w). We will frequently rely on such proofs.

3.6 The phase transition for inhomogeneous random graphs

In this section, we discuss the phase transition in IRG_n(κ). The main result shows that there is a giant component when the associated multi-type branching process is supercritical, while otherwise there is not:

Theorem 3.16 (Giant component of IRG) Let (κ_n) be a sequence of irreducible graphical kernels with limit κ, and let C_max denote the largest connected component of IRG_n(κ_n). Then,

    |C_max|/n →P ζ_κ.        (3.6.1)

In all cases ζ_κ < 1, while ζ_κ > 0 precisely when ‖T_κ‖ > 1.

Theorem 3.16 generalizes the law of large numbers for the largest connected component in [Volume 1, Theorem 4.8] for ER_n(λ/n) (see Exercise 3.31).

We do not give a complete proof of Theorem 3.16 in this chapter. The upper bound follows directly from Theorem 3.11, together with the realization in (2.5.6). For the lower bound, it suffices, by Lemma 3.5, to prove it for kernels with finitely many types; this will be done in Chapter 6. We close this section by discussing a few examples of Theorem 3.16:

The bipartite random graph

We let n be even and take S = {1, 2} and

    κ_n(x,y) = κ(x,y) = λ 1{x ≠ y}.        (3.6.2)

Thus, for i < j, the edge probabilities p_ij given by (3.2.7) are equal to λ/n (for n > λ) when i ∈ [n/2] and j ∈ [n] \ [n/2], and are zero otherwise.

In this case, ‖T_κ‖ = λ/2 with corresponding eigenfunction f(x) = 1 for all x ∈ S. Thus, Theorem 3.16 proves that there is a phase transition at λ = 2. Furthermore, the function ζ_λ(x) reduces to the single value ζ_{λ/2}, which is the survival probability of a Poisson branching process with mean offspring λ/2. This is not surprising: the degree of each vertex is Bin(n/2, λ/n), so the bipartite random graph of size n is quite closely related to the Erdős-Rényi random graph of size n/2.

The finite-type case

The bipartite random graph can also be viewed as a random graph with two types of vertices (namely, the vertices in [n/2] and those in [n] \ [n/2]). We now generalize the results to the finite-type case, in which, as we have seen, κ_n is equivalent to an r × r matrix (κ_n(i,j))_{i,j∈[r]}, where r denotes the number of types. In this case, IRG_n(κ) has vertices of r different types (or colors), say n_i vertices of type i, with two vertices of types i and j joined by an edge with probability κ_n(i,j)/n (for n ≥ max κ_n). This case has been studied by Söderberg (2002, 2003a,b,c), who noted Theorem 3.16 in this setting. Exercises 3.29–3.30 investigate the phase transition in the finite-type case.

The random graph with prescribed expected degrees

We next consider the Chung-Lu model or expected-degree random graph, where κ_n is given by (3.2.12), i.e., κ_n(i/n, j/n) = w_i w_j/E[W_n].

We first assume that Condition 1.1(a)-(c) holds, so that in particular E[W²] < ∞, where W has distribution function F. A particular instance of this case is the choice w_i = [1 − F]^{-1}(i/n) in (1.3.15). In this case, the sequence (κ_n) converges to κ, where the limit κ is given by (recall (3.9.2))

    κ(x,y) = ψ(x)ψ(y)/E[W],        (3.6.3)

where ψ(x) = [1 − F]^{-1}(x). Then, we note that for each f ≥ 0 with ‖f‖₂ = 1,

    (T_κ f)(x) = ψ(x) ∫_S ψ(y)f(y)µ(dy) / ∫_S ψ(y)µ(dy),        (3.6.4)

so that ‖T_κ f‖₂ = ‖ψ‖₂ ∫_S ψ(y)f(y)µ(dy) / ∫_S ψ(y)µ(dy), which is maximal when f(x) = ψ(x)/‖ψ‖₂. We conclude that ‖T_κ‖ = ‖ψ‖₂² / ∫_S ψ(y)µ(dy) = E[W²]/E[W]. Thus,

    ‖T_κ‖ = E[W²]/E[W],        (3.6.5)

and we recover the results of Chung and Lu (2004, 2006b) in the case where E[W²] < ∞. In the case where E[W²] = ∞, on the other hand, we see that ‖T_κ f‖₂ = ∞ for every f with ‖f‖₂ = 1 such that ∫_S ψ(y)f(y)µ(dy) ≠ 0, so that ‖T_κ‖ = ∞ and CL_n(w) is always supercritical in this regime.

While Theorem 3.16 identifies the phase transition in IRG_n(κ_n), for the rank-1 setting we will prove considerably stronger results, as we discuss now. In their statement, we define the complexity of a component C to be |E(C)| − |V(C)| + 1, which equals the number of edges that need to be removed to turn C into a tree. The main result is as follows:

Theorem 3.17 (Phase transition in generalized random graphs) Suppose that Condition 1.1(a)-(b) holds, and consider the random graphs GRG_n(w), CL_n(w) or NR_n(w), letting n → ∞. Denote p_k = P(Poi(W) = k), as defined below (1.3.22). Let C_max and C_(2) be the largest and second-largest components.

(a) If ν = E[W²]/E[W] > 1, then there exist ξ ∈ (0,1), ζ ∈ (0,1) such that

    |C_max|/n →P ζ,
    v_k(C_max)/n →P p_k(1 − ξ^k), for every k ≥ 0,
    |E(C_max)|/n →P ½ E[W](1 − ξ²),

while |C_(2)|/n →P 0 and |E(C_(2))|/n →P 0. Further, ½ E[W](1 − ξ²) > ζ, so that the complexity of the giant is linear.

(b) If ν = E[W²]/E[W] ≤ 1, then |C_max|/n →P 0 and |E(C_max)|/n →P 0.

The proof of Theorem 3.17 is deferred to Section 4.2.3 in Chapter 4, where a similar result is proved for the configuration model. By the strong relation between the configuration model and the generalized random graph, that result can be seen to imply Theorem 3.17.

Let us discuss some implications of Theorem 3.17, focussing on the supercritical case where ν = E[W²]/E[W] > 1. In this case, the parameter ξ is the extinction probability of a branching process with offspring distribution p*_k = P(Poi(W*) = k), where W* is the size-biased version of W. Thus,

    ξ = G_{Poi(W*)}(ξ) = E[e^{W*(ξ−1)}],        (3.6.6)

where G_{Poi(W*)}(s) = E[s^{Poi(W*)}] is the probability generating function of a mixed-Poisson random variable with mixing distribution W*.

Further, since v_k(C_max)/n →P p_k(1 − ξ^k) and |C_max|/n →P ζ, it must be that

    ζ = ∑_{k≥0} p_k(1 − ξ^k) = 1 − G_D(ξ),        (3.6.7)

where G_D(s) = E[s^D] is the probability generating function of D = Poi(W). We also note that |E(C_max)|/n →P η with η = ½ E[W](1 − ξ²), so that

    η = ½ ∑_{k≥0} k p_k (1 − ξ^k) = ½ E[W] ∑_{k≥0} (k p_k/E[W]) (1 − ξ^k)        (3.6.8)
        = ½ E[W] (1 − ξ G_{Poi(W*)}(ξ)) = ½ E[W] (1 − ξ²),

as required.

We now compare the limiting total number of edges to the limiting total size of C_max. We note that, since f(k) = k and g(k) = 1 − ξ^k are both increasing, Chebyshev's association inequality gives

    ∑_{k≥0} k p_k (1 − ξ^k) > ∑_{k≥0} k p_k · ∑_{k≥0} (1 − ξ^k) p_k = E[W] ζ.        (3.6.9)

As a result,

    η > ½ E[W] ζ.        (3.6.10)


Thus, the average degree 2η/ζ in the giant component is strictly larger than the average degree E[W] in the entire graph.

We finally show that η > ζ, so that the giant has linear complexity. By convexity of i ↦ ξ^i and the fact that ξ < 1, for k ≥ 1,

    ∑_{i=0}^{k−1} ξ^i ≤ k(1 + ξ^{k−1})/2,        (3.6.11)

with strict inequality for k ≥ 3. Multiplying by 1 − ξ, we obtain

    1 − ξ^k ≤ k(1 − ξ)(1 + ξ^{k−1})/2,        (3.6.12)

again for every k ≥ 1, and again with strict inequality for k ≥ 3. Now multiply by p_k and sum over k to get

    ∑_k p_k(1 − ξ^k) ≤ (1 − ξ) ∑_k k p_k (1 + ξ^{k−1})/2.        (3.6.13)

The left-hand side of (3.6.13) equals ζ by (3.6.7). We next investigate the right-hand side. Recall that

    ∑_k k p_k = E[W],        (3.6.14)

and, by (3.6.6),

    ∑_k (k p_k/E[W]) ξ^{k−1} = ξ.        (3.6.15)

Hence, the right-hand side of (3.6.13) equals

    (1 − ξ)(E[W] + E[W]ξ)/2 = E[W](1 − ξ²)/2,        (3.6.16)

which is the limit in probability of |E(C_max)|/n, i.e., η. Since the inequality in (3.6.12) is strict for k ≥ 3, we conclude that ζ < η, as claimed.
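The relations ξ = G_{Poi(W*)}(ξ), ζ = 1 − G_D(ξ) and η = ½E[W](1 − ξ²), and the inequalities η > ζ and 2η/ζ > E[W], can all be checked numerically. A sketch for the simplest case W ≡ λ (a constant weight, so D = Poi(λ), W* = λ, and the model degenerates to the Erdős–Rényi setting; λ = 2 is an arbitrary supercritical choice):

```python
from math import exp

lam = 2.0                                  # W == lam, so nu = E[W^2]/E[W] = lam > 1

# Solve xi = G_{Poi(W*)}(xi) = exp(lam * (xi - 1)) by fixed-point iteration, (3.6.6).
xi = 0.0
for _ in range(200):
    xi = exp(lam * (xi - 1.0))

zeta = 1.0 - exp(lam * (xi - 1.0))         # zeta = 1 - G_D(xi), (3.6.7)
eta = 0.5 * lam * (1.0 - xi * xi)          # eta = E[W] (1 - xi^2) / 2

assert abs(zeta - (1.0 - xi)) < 1e-9       # Erdos-Renyi duality: zeta = 1 - xi
assert eta > zeta                          # linear complexity of the giant
assert 2.0 * eta / zeta > lam              # giant is denser than the whole graph
assert abs(zeta - 0.79681) < 1e-4          # known survival value for lam = 2
```

The fixed-point iteration converges because the map s ↦ exp(λ(s − 1)) is a contraction near its smallest fixed point when λ > 1.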

We close this section by discussing the consequences of the phase transition for the attack vulnerability of CL_n(w):

Attack vulnerability of CL_n(w)

Suppose an adversary attacks a network by removing some of its vertices. A clever adversary removes the vertices in a targeted way; this is often referred to as a deliberate attack. On the other hand, the vertices might also be exposed to random failures, which is often referred to as a random attack. The results stated above do not specifically apply to these settings, but they do have intuitive consequences. We model a deliberate attack as the removal of a proportion of the vertices with the highest weights, whereas a random attack is modeled by the removal of vertices independently with a given probability. One of the aims is to quantify the effect of such attacks, and in particular the difference between random and deliberate attacks. We denote the proportion of removed vertices by p. We always assume that ν > 1, so that a giant component exists, and we investigate under which conditions on p and the graph CL_n(w) the giant component persists.


We start by addressing the case of a random attack on the CL_n(w) model under Condition 1.1(a)-(c), where E[W²] < ∞. One of the difficulties of the above set-up is that we remove vertices rather than edges, so that the resulting graph is no longer an IRG. In percolation jargon, we deal with site percolation rather than with edge percolation. We start by relating the resulting graph to an inhomogeneous random graph.

Note that when we explore the cluster of a vertex after an attack, the vertex itself may be unaffected by the attack, which happens with probability 1 − p. After this, in the exploration, we always inspect an edge between a vertex that is unaffected by the attack and a vertex of which we do not yet know whether it has been attacked or not. For random attacks, the probability that the latter vertex is affected is precisely equal to p. Therefore, the situation is similar to the random graph where p_ij is replaced by (1 − p) × p_ij. For a branching process, this identification is exact, and we have ζ_{κ,p} = (1 − p)ζ_{(1−p)κ}, where ζ_{κ,p} denotes the survival probability of the branching process in which each individual is killed with probability p, independently of all other randomness. For CL_n(w), this equality holds only asymptotically. In the case where E[W²] < ∞, so that ν < ∞, this means that there exists a critical value p_c = 1 − 1/ν such that, when vertices are removed with probability p, the giant component persists if p < p_c, while if p > p_c, the giant component is destroyed. Thus, when E[W²] < ∞, CL_n(w) is sensitive to random attacks. When E[W²] = ∞, on the other hand, ν = ∞, so that the giant component persists for every p ∈ [0, 1), and the graph is called robust to random attacks. Here we must note that the size of the giant component does decrease, since ζ_{κ,p} < (1 − p)ζ_κ!
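The threshold p_c = 1 − 1/ν, and robustness when E[W²] = ∞, can be illustrated numerically with the standard power-law choice w_i = [1 − F]^{-1}(i/n) = (n/i)^{1/(τ−1)}; the values of τ and n below are illustrative:

```python
def nu_n(n, tau):
    # nu_n = sum_i w_i^2 / sum_i w_i for w_i = (n / i)^{1/(tau - 1)}
    w = [(n / i) ** (1.0 / (tau - 1.0)) for i in range(1, n + 1)]
    return sum(wi * wi for wi in w) / sum(w)

def p_c(n, tau):
    # Critical proportion of randomly removed vertices: giant survives iff p < p_c.
    return 1.0 - 1.0 / nu_n(n, tau)

# tau > 3 (finite E[W^2]): nu_n converges, so p_c stays bounded away from 1.
assert nu_n(10**5, 3.5) / nu_n(10**4, 3.5) < 1.1
# tau < 3 (infinite E[W^2]): nu_n diverges with n, so p_c -> 1: robustness.
assert nu_n(10**5, 2.5) / nu_n(10**4, 2.5) > 1.5
assert p_c(10**5, 2.5) > 0.9
```

For τ < 3 the empirical ν_n grows like a power of n, so no fixed removal probability p < 1 destroys the giant in the limit.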

For a deliberate attack, we remove the proportion p of vertices with the highest weights. This means that w is replaced by w(p), given by w_i(p) = w_i 1{i > np}, and we denote the resulting edge probabilities by

    p_ij(p) = min{1, w_i(p) w_j(p)/ℓ_n}.        (3.6.17)

In this case, the resulting graph on [n] \ [np] is again a Chung-Lu model, for which ν is replaced by ν(p), given by

    ν(p) = E[ψ(U)² 1{U > p}]/E[W],        (3.6.18)

where U is uniform on [0, 1] and we recall that ψ(u) = [1 − F]^{-1}(u). Now, for any distribution function F, E[[1 − F]^{-1}(U)² 1{U > p}] < ∞, so that, for p sufficiently close to 1, ν(p) < 1 (see Exercise 3.38). Thus, the CL_n(w) model is always sensitive to deliberate attacks.

3.7 Related results for inhomogeneous random graphs

In this section, we discuss some related results for inhomogeneous random graphs. While we give intuition about their proofs, we do not include them in full detail.


The largest subcritical cluster

For the classical random graph ER_n(λ/n), it is well known that, in the subcritical case λ < 1, the stronger bound |C_max| = Θ(log n) holds (see [Volume 1, Theorems 4.4–4.5]), and that, in the supercritical case λ > 1, |C_(2)| = Θ(log n). These bounds do not always hold in the general framework considered here, but if we add some conditions, then we can improve the estimates in Theorem 3.16 in the subcritical case to O(log n):

Theorem 3.18 (Subcritical phase and duality principle of inhomogeneous random graphs) Consider the inhomogeneous random graph IRG_n(κ_n), where (κ_n) is a graphical sequence of kernels with limit κ. Then,

(i) if κ is subcritical and sup_{x,y,n} κ_n(x,y) < ∞, then |C_max| = O_P(log n);
(ii) if κ is supercritical and irreducible, and either inf_{x,y,n} κ_n(x,y) > 0 or sup_{x,y,n} κ_n(x,y) < ∞, then |C_(2)| = O_P(log n).

When lim_{n→∞} sup_{x,y} κ_n(x,y) = ∞, the largest subcritical clusters can have rather different behavior, as we now show for the rank-1 case. Note that, by Theorem 3.16 and the fact that ‖T_κ‖ = ν = E[W²]/E[W], a rank-1 model can only be subcritical when E[W²] < ∞, i.e., in the case of finite-variance degrees. However, when W has a power-law tail, i.e., when P(W ≥ w) ∼ w^{−(τ−1)}, the highest weight can be much larger than log n. When this is the case, the largest subcritical cluster is also much larger than log n, as proved in the following theorem:

Theorem 3.19 (Subcritical phase for rank-1 inhomogeneous random graphs) Let w satisfy Condition 1.1(a)-(c) with ν = E[W²]/E[W] < 1, and assume further that there exist τ > 3 and c₂ > 0 such that

    [1 − F_n](x) ≤ c₂ x^{−(τ−1)}.        (3.7.1)

Then, for NR_n(w) with ∆ = max_{j∈[n]} w_j,

    |C_max| = ∆/(1 − ν) + o_P(n^{1/(τ−1)}).        (3.7.2)

Theorem 3.19 is most interesting in the case where the limiting distribution function F in Condition 1.1 has a power-law tail. For example, for w as in (1.3.15), let F satisfy

    [1 − F](x) = c x^{−(τ−1)}(1 + o(1)).        (3.7.3)

Then, ∆ = w₁ = [1 − F]^{-1}(1/n) = (cn)^{1/(τ−1)}(1 + o(1)). Therefore,

    |C_max| = (cn)^{1/(τ−1)}/(1 − ν) + o_P(n^{1/(τ−1)}).        (3.7.4)

Thus, the largest connected component is much larger than for ER_n(λ/n) with λ < 1.
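To get a feel for the scale of (3.7.4) relative to the logarithmic clusters of ER_n(λ/n), here is a quick numeric comparison; the parameter values are illustrative:

```python
from math import log

tau, c, nu = 3.5, 1.0, 0.5               # illustrative subcritical parameters (nu < 1)
n = 10**6

delta = (c * n) ** (1.0 / (tau - 1.0))   # Delta = w_1 = (c n)^{1/(tau-1)} ~ n^{0.4}
cmax_prediction = delta / (1.0 - nu)     # leading term of (3.7.4)

assert abs(delta - n ** 0.4) < 1e-6
assert cmax_prediction > 10 * log(n)     # a polynomial in n beats Theta(log n)
```

Already at n = 10⁶ the predicted largest subcritical cluster is in the hundreds, while log n is below 14.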

Theorem 3.19 can be intuitively understood as follows. The connected com-ponent of a typical vertex is close to a branching process, so that it is withhigh probability bounded since the expected value of its cluster will be close to

Page 145: Random graphs and complex networksrhofstad/NotesRGCNII.pdf1.2 Random graphs and real-world networks9 1.3 Random graph models11 1.3.1 Erd}os-R enyi random graph 11 1.3.2 Inhomogeneous

3.7 Related results for inhomogeneous random graphs 123

1/(1− ν). Thus, the best way to obtain a large connected component is to startwith a vertex with high weight wi, and let all of its roughly wi children be in-dependent branching processes. Therefore, in expectation, each of these childrenis connected to another 1/(1 − ν) different vertices, leading to a cluster size ofroughly wi/(1− ν). This is clearly largest when wi = maxj∈[n]wj = ∆, leading toan intuitive explanation of Theorem 3.19.

Theorems 3.18 and 3.19 raise the question what the precise conditions for |C_max| to be of order log n are. Intuitively, when ∆ ≫ log n, then |C_max| = (∆/(1 − ν))(1 + o_P(1)), whereas if ∆ = Θ(log n), then |C_max| = Θ_P(log n) as well. In Turova (2011), it was proved that |C_max|/log n converges in probability to a finite constant when ν < 1 and the weights are i.i.d. with distribution function F satisfying E[e^{αW}] < ∞ for some α > 0, i.e., exponential tails are sufficient.

The critical behavior of rank-1 random graphs

We next discuss the effect of inhomogeneity on the size of the largest connected components in the critical case. As it turns out, the behavior is rather different depending on whether E[W^3] < ∞ or not.

Theorem 3.20 (The critical behavior with finite third moments) Fix the Norros-Reittu random graph with weights w(t) = (1 + tn^{−1/3})w. Assume that ν = 1, that the weight sequence w satisfies Condition 1.1(a)-(c), and further assume that

E[W_n] = E[W] + o(n^{−1/3}), E[W_n^2] = E[W^2] + o(n^{−1/3}), E[W_n^3] = E[W^3] + o(1). (3.7.5)

Let (|C_(i)(t)|)_{i≥1} denote the clusters of NR_n(w(t)), ordered in size. Then, as n → ∞, for all t ∈ ℝ,

(n^{−2/3}|C_(i)(t)|)_{i≥1} →d (γ*_i(t))_{i≥1}, (3.7.6)

in the product topology, for some limiting random variables (γ*_i(t))_{i≥1}.

The limiting random variables (γ*_i(t))_{i≥1} are, apart from a multiplication by a constant and a time-rescaling, equal to those for ER_n(λ/n) in the scaling window (see Theorem 5.7).

When E[W^{3−ε}] = ∞ for some ε > 0, it turns out that the scaling of the largest critical cluster is rather different:

Theorem 3.21 (Weak convergence of the ordered critical clusters for τ ∈ (3, 4)) Fix the Norros-Reittu random graph with weights w(t) = (1 + tn^{−(τ−3)/(τ−1)})w, where w is defined in (1.3.15). Assume that ν = 1 and that there exist τ ∈ (3, 4) and 0 < c_F < ∞ such that

lim_{x→∞} x^{τ−1}[1 − F(x)] = c_F. (3.7.7)

Let (|C_(i)(t)|)_{i≥1} denote the clusters of NR_n(w(t)), ordered in size. Then, as n → ∞, for all t ∈ ℝ,

(n^{−(τ−2)/(τ−1)}|C_(i)(t)|)_{i≥1} →d (γ_i(t))_{i≥1}, (3.7.8)

in the product topology, for some non-degenerate limit (γ_i(t))_{i≥1}.

In this chapter, we have already seen that distances depend sensitively on the finiteness of E[W^2]. Now we see that the critical behavior is also rather different depending on whether E[W^3] < ∞ or E[W^3] = ∞. Interestingly, in the power-law case described in (3.7.7), the size of the largest clusters grows like n^{(τ−2)/(τ−1)}, which is much smaller than the n^{2/3} scaling when E[W^3] < ∞. The proof of Theorems 3.20 and 3.21 also reveals that the structure of large critical clusters is quite different. When E[W^3] < ∞, the vertex with largest weight is in the largest connected component with vanishing probability. Therefore, the largest connected component arises through many attempts to create a large cluster, each trial having roughly the same probability. This can be formulated as power to the masses. On the other hand, for weights w as in (1.3.15) for which (3.7.7) holds, the vertices with largest weight are in the largest cluster with probability bounded away from 0 and 1, while a vertex with small weight is in the largest cluster with vanishing probability. Thus, to find the largest clusters, it suffices to explore the clusters of the high-weight vertices: power to the wealthy!

3.8 Notes and discussion

Notes on Section 3.1.

Notes on Section 3.2.

The seminal paper Bollobas et al. (2007) studies inhomogeneous random graphs in an even more general setting, where the number of vertices in the graph need not be equal to n. In this case, the vertex space is called a generalized vertex space. We simplify the discussion here by assuming that the number of vertices is always equal to n. An example where the extension to a random number of vertices is crucially used is Turova and Vallier (2010), which studies an interpolation between percolation and ER_n(p).

Notes on Section 3.3.

Theorem 3.4 is a special case of (Bollobas et al., 2007, Theorem 3.13).

Notes on Section 3.4.

See (Athreya and Ney, 1972, Chapter V) or (Harris, 1963, Chapter III) for more background on multi-type branching processes.

Notes on Section 3.5.

Theorem 3.11 is novel to the best of our knowledge, even though Bollobas et al. (2007) prove various relations between inhomogeneous random graphs and branching processes. Proposition 3.13 first appeared as (Norros and Reittu, 2006, Proposition 3.1), where the connection between NR_n(w) and Poisson branching processes was first exploited to prove versions of Theorem 6.3.


3.9 Exercises for Chapter 3 125

Notes on Section 3.6.

Theorem 3.17 is taken from Janson and Luczak (2009), where the giant component is investigated for the configuration model. We explain its proof in detail in Section 4.2, where we also show how the result for the configuration model in Theorem 4.4 can be used to prove Theorem 3.17. Theorem 3.16 is a special case of (Bollobas et al., 2007, Theorem 3.1). Earlier versions for the random graph with given expected degrees or Chung-Lu model appeared in Chung and Lu (2002b, 2006b) (see also the monograph Chung and Lu (2006a)). I have learned the proof of the linear complexity of the giant in Theorem 3.17 from Svante Janson.

Bollobas et al. (2007) prove various other results concerning the giant component of IRG_n(κ). For example, (Bollobas et al., 2007, Theorem 3.9) proves that the giant component of IRG_n(κ) is stable, in the sense that its size does not change much if we add or delete a few edges. Note that the edges added or deleted do not have to be random or independent of the existing graph; rather, they can be chosen by an adversary after inspecting the whole of IRG_n(κ). More precisely, (Bollobas et al., 2007, Theorem 3.9) shows that, for small enough δ > 0, the giant component of IRG_n(κ) in the supercritical regime does not change by more than εn vertices if we remove any collection of δn edges.

Notes on Section 3.7.

Theorem 3.19 is (Janson, 2008, Corollary 4.4). Theorem 3.20 is proved in Bhamidi et al. (2010b); a related version with a different proof can be found in Turova (2013). Theorem 3.21 is proved in Bhamidi et al. (2012).

3.9 Exercises for Chapter 3

Exercise 3.1 (Erdos-Renyi random graph) Show that when S = [0, 1] and p_ij = κ(i/n, j/n)/n with κ: [0, 1]^2 → [0,∞) continuous, the model is the Erdos-Renyi random graph with edge probability λ/n precisely when κ(x, y) = λ. Is this also true when κ: [0, 1]^2 → [0,∞) is not continuous?

Exercise 3.2 (Lower bound on expected number of edges) Show that when κ: S × S → [0,∞) is continuous, then

lim inf_{n→∞} (1/n) E[E(IRG_n(κ))] ≥ (1/2) ∫∫_{S^2} κ(x, y) µ(dx)µ(dy), (3.9.1)

so that the lower bound in (3.2.4) generally holds.

Exercise 3.3 (Expected number of edges) Show that when κ: S × S → [0,∞) is bounded and continuous, then (3.2.4) holds.

Exercise 3.4 (The Chung-Lu model) Prove that when κ is given by

κ(x, y) = [1 − F]^{−1}(x)[1 − F]^{−1}(y)/E[W], (3.9.2)

then κ is graphical precisely when E[W] < ∞, where W has distribution function F. Further, κ is always irreducible.


Exercise 3.5 (The Chung-Lu model repeated) Let w_i = [1 − F]^{−1}(i/n)√(nE[W]/ℓ_n) and w̃_i = [1 − F]^{−1}(i/n) as in [Volume 1, (6.2.14)]. Then CL_n(w) and CL_n(w̃) are asymptotically equivalent whenever (nE[W]/ℓ_n − 1)^2 = o(1/n).

Exercise 3.6 (Asymptotic equivalence for general IRGs) Prove that the random graph IRG_n(p) with p_ij as in (3.2.7) is asymptotically equivalent to IRG_n(p) with p_ij = p^{(NR)}_{ij}(κ_n) and to IRG_n(p) with p_ij = p^{(GRG)}_{ij}(κ_n) when (3.2.10) holds.

Exercise 3.7 (Definitions 3.2-3.3 for homogeneous bipartite graph) Prove that Definitions 3.2-3.3 hold for the homogeneous bipartite graph.

Exercise 3.8 (Examples of homogeneous random graphs) Show that the Erdos-Renyi random graph, the homogeneous bipartite random graph, and the stochastic block model are all homogeneous random graphs.

Exercise 3.9 (Homogeneous bipartite graph) Prove that the homogeneous bi-partite random graph is a special case of the finite-types case.

Exercise 3.10 (Irreducibility for the finite-types case) Prove that, in the finite-types case, irreducibility follows when there exists an m such that the mth power of the matrix (κ(i, j)µ(j))_{i,j∈[r]} contains no zeros.
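The criterion in Exercise 3.10 can be checked mechanically. In the sketch below (plain-Python matrix arithmetic; the example kernels and the Wielandt-style cap on m are assumptions for illustration), a strictly positive 2×2 kernel passes at m = 1, while a bipartite-type kernel never yields an all-positive power, so this sufficient condition does not apply to it even though the kernel itself is irreducible:

```python
def mat_mul(A, B):
    r = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(r)]
            for i in range(r)]

def power_with_no_zeros(kappa, mu, max_m=None):
    """Return the smallest m <= max_m such that (kappa(i,j) mu(j))^m is
    entrywise positive, or None if no such m is found."""
    r = len(mu)
    M = [[kappa[i][j] * mu[j] for j in range(r)] for i in range(r)]
    if max_m is None:
        max_m = (r - 1) ** 2 + 1  # Wielandt's bound for primitive matrices
    P = M
    for m in range(1, max_m + 1):
        if all(P[i][j] > 0 for i in range(r) for j in range(r)):
            return m
        P = mat_mul(P, M)
    return None

print(power_with_no_zeros([[1.0, 2.0], [2.0, 1.0]], [0.5, 0.5]))  # 1
print(power_with_no_zeros([[0.0, 2.0], [2.0, 0.0]], [0.5, 0.5]))  # None
```

The bipartite-type kernel illustrates that the condition is sufficient rather than necessary: its powers alternate between the two off-diagonal and diagonal patterns, so no single power is entrywise positive.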

Exercise 3.11 (Graphical limit in the finite-types case) Prove that, in the finite-types case, (3.2.1) holds precisely when

lim_{n→∞} n_i/n = µ_i. (3.9.3)

Exercise 3.12 (Variance of number of vertices of type i and degree k) Let IRG_n(κ_n) be a finite-type inhomogeneous random graph with a graphical sequence of kernels κ_n. Let N_{i,k} be the number of vertices of type i and degree k. Show that Var(N_{i,k}) = O(n).

Exercise 3.13 (Proportion of isolated vertices in inhomogeneous random graphs) Let IRG_n(κ_n) be an inhomogeneous random graph with a graphical sequence of kernels κ_n that converge to κ. Show that the proportion of isolated vertices converges as

(1/n) N_0(n) →P p_0 = ∫ e^{−λ(x)} µ(dx). (3.9.4)

Conclude that p_0 > 0 when ∫ λ(x)µ(dx) < ∞.

Exercise 3.14 (Upper and lower bounding finite-type kernels) Prove that the kernels κ^−_m and κ^+_m in (3.3.13) and (3.3.14) are of finite type.

Exercise 3.15 (Inclusion of graphs for larger κ) Let κ′ ≤ κ hold a.e. Show that we can couple IRG_n(κ′) and IRG_n(κ) such that IRG_n(κ′) ⊆ IRG_n(κ).

Exercise 3.16 (Tails of Poisson variables) Use stochastic domination of Poisson random variables with different parameters, as well as concentration properties of Poisson variables, to complete the proof of (3.3.27).


Exercise 3.17 (Power laws for sum kernels) Let κ(x, y) = α(x) + α(y) for a continuous function α: [0, 1] → [0,∞). Use Corollary 3.7 to identify when the degree distribution satisfies a power law. How is the tail behavior of D related to that of α?

Exercise 3.18 (Irreducibility of multi-type branching process) Show that the positivity of the survival probability ζ(i) of an individual of type i is independent of the type i when the probability that an individual of type j has a type-i descendant is strictly positive.

Exercise 3.19 (Irreducibility of multi-type branching process (Cont.)) Prove that the probability that an individual of type j has a type-i descendant is strictly positive precisely when there exists an l such that T^l_κ(i, j) > 0, where T_κ(i, j) = κ_{ij}µ_j is the mean offspring matrix.

Exercise 3.20 (Singularity of multi-type branching process) Prove that G(s) = Ms for some matrix M precisely when each individual in the Markov chain has exactly one offspring.

Exercise 3.21 (Erdos-Renyi random graph) Prove that NR_n(w) = ER_n(λ/n) when w is constant with w_i = −n log(1 − λ/n).
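A quick numerical sanity check of the identity behind Exercise 3.21 (the values n = 100, λ = 2.5 are arbitrary assumptions): with the constant weight w_i = −n log(1 − λ/n), we have w_i w_j/ℓ_n = w_i/n, so the Norros-Reittu edge probability 1 − e^{−w_i w_j/ℓ_n} collapses to λ/n.

```python
import math

n, lam = 100, 2.5
w = -n * math.log(1 - lam / n)     # constant weight from Exercise 3.21
ln = n * w                          # l_n is the sum of the n equal weights
p_nr = 1 - math.exp(-w * w / ln)    # NR_n(w) edge probability
print(p_nr, lam / n)                # both equal 0.025 up to rounding
```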

Exercise 3.22 (Homogeneous Poisson multi-type branching processes) Consider a homogeneous Poisson multi-type branching process with parameter λ. Show that the function φ(x) = 1 is an eigenvector of T_κ with eigenvalue λ. Conclude that (λ^{−j}Z_j)_{j≥0} is a martingale, where Z_j denotes the number of individuals in the jth generation.

Exercise 3.23 (Bounds on distances in the case where ‖T_κ‖ > 1) Fix ε > 0. Use (3.5.19) to show that dist_{IRG_n(κ_n)}(o_1, o_2) ≥ (1 − ε) log_{‖T_κ‖} n whp when ‖T_κ‖ > 1.

Exercise 3.24 (No large component when ‖T_κ‖ < 1) Use (3.5.19) to show that dist_{IRG_n(κ_n)}(o_1, o_2) = ∞ whp when ‖T_κ‖ < 1.

Exercise 3.25 (Proof of no-overlap property in (3.5.22)) Prove that P(B^{(n;Q)}_k(o_1) = (t, q), o_2 ∈ B^{(n)}_{2k}(o_1)) → 0, and conclude that (3.5.22) holds.

Exercise 3.26 (Branching process domination of Erdos-Renyi random graph) Show that Exercise 3.21 together with Proposition 3.13 imply that |C(o)| ⪯ T*, i.e., |C(o)| is stochastically dominated by T*, where T* is the total progeny of a Poisson branching process with mean −n log(1 − λ/n) offspring.

Exercise 3.27 (Local weak convergence of ER_n(λ/n)) Use Theorem 3.15 to show that also ER_n(λ/n) converges locally weakly in probability to the Galton-Watson tree with Poisson offspring distribution with parameter λ.

Exercise 3.28 (Coupling to a multi-type Poisson branching process) Prove Proposition 3.14 by adapting the proof of Proposition 3.13.


Exercise 3.29 (Phase transition for r = 2) Let ζ_κ denote the survival probability of a two-type multi-type branching process. Compute ζ_κ and give necessary and sufficient conditions for ζ_κ > 0 to hold.

Exercise 3.30 (The size of small components in the finite-types case) Prove that, in the finite-types case, when (κ_n) converges, then sup_{x,y,n} κ_n(x, y) < ∞ holds, so that the results of Theorem 3.18 apply in the sub- and supercritical cases.

Exercise 3.31 (Law of large numbers for C_max for ER_n(λ/n)) Prove that, for the Erdos-Renyi random graph, Theorem 3.16 implies that |C_max|/n →P ζ_λ, where ζ_λ is the survival probability of a Poisson branching process with mean λ offspring.

Exercise 3.32 (Connectivity of uniformly chosen vertices) Suppose we draw two vertices uniformly at random from [n] in IRG_n(κ_n). Prove that Theorem 3.17 implies that the probability that these vertices are connected converges to ζ^2.

Exercise 3.33 (The size of small components for CL_n(w)) Use Theorem 3.18 to prove that, for CL_n(w) with weights given by (1.3.15) and with 1 < ν < ∞, the second largest cluster has size |C_(2)| = O_P(log n) when W has bounded support or is a.s. bounded below by ε > 0, while if ν < 1, then |C_max| = O_P(log n) when W has bounded support. Here W is a random variable with distribution function F.

Exercise 3.34 (Average degree in two populations) Show that the averagedegree is close to pm1 + (1− p)m2 in the setting of Example 3.1.

Exercise 3.35 (The phase transition for two populations) Show that ζ > 0 precisely when [pm_1^2 + (1 − p)m_2^2]/[pm_1 + (1 − p)m_2] > 1 in the setting of Example 3.1. Find an example of p, m_1, m_2 where the average degree is less than one, yet there exists a giant component.
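Candidates for the last part of Exercise 3.35 are easy to screen numerically. The sketch below evaluates both sides of the criterion; the values p = 0.01, m1 = 50, m2 = 0.5 are one illustrative choice (an assumption, not taken from the text), capturing a small "rich" minority.

```python
def avg_degree(p, m1, m2):
    return p * m1 + (1 - p) * m2

def criterion(p, m1, m2):
    # zeta > 0 precisely when this ratio exceeds 1 (Exercise 3.35)
    return (p * m1 ** 2 + (1 - p) * m2 ** 2) / avg_degree(p, m1, m2)

p, m1, m2 = 0.01, 50.0, 0.5
print(avg_degree(p, m1, m2))   # 0.995 < 1: subcritical average degree
print(criterion(p, m1, m2))    # about 25.4 > 1: a giant component exists
```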

Exercise 3.36 (Degree sequence of giant component) Consider GRG_n(w) as in Theorem 3.17. Show that the proportion of vertices of the giant component C_max having degree k is close to p_k(1 − ξ^k)/ζ.

Exercise 3.37 (Degree sequence of complement of giant component) Consider GRG_n(w) as in Theorem 3.17. Show that when ξ < 1, the proportion of vertices outside the giant component C_max having degree k is close to p_k ξ^k/(1 − ζ). Conclude that the degree sequence of the complement of the giant component never satisfies a power law. Can you give an intuitive explanation for this?

Exercise 3.38 (Finiteness of ν(p)) Prove that ν(p) in (3.6.18) satisfies ν(p) < ∞ for every p ∈ (0, 1] and any distribution function F.


Chapter 4

The phase transition inthe configuration model

Abstract

In this chapter, we investigate the local weak limit of the configuration model. After this, we give a detailed proof of the phase transition in the configuration model by investigating its largest connected component. We identify when there is a giant component and find its size and degree structure. Further results include the connectivity transition of the configuration model as well as the critical behavior in the configuration model.

In this chapter, we study the connectivity structure of the configuration model. We focus on the local connectivity by investigating the local weak limit of the configuration model, and on the global connectivity by identifying its giant component and determining when the graph is completely connected. In inhomogeneous random graphs, there always is a positive proportion of vertices that are isolated (recall Exercise 3.13). In many real-world examples, we observe the presence of a giant component (recall Table 3.1). However, in many of these examples, the giant is almost the whole graph, and sometimes it is, by definition, the whole graph. For example, the Internet needs to be connected to allow e-mail messages to be sent between any pair of vertices. In many other real-world examples, though, it is not at all obvious why the network needs to be connected.

Subject            % in Giant   Size        Source                      Data

California roads   0.9958       1,965,206   Leskovec et al. (2009)      Leskovec and Krevl (2014)
Facebook           0.9991       721m        Ugander et al. (2011)       Ugander et al. (2011)
Hyves              0.996        8,047,530   Corten (2012)               Corten (2012)
arXiv astro-ph     0.9538       18,771      Leskovec et al. (2007)      Kunegis (2017)
US power grid      1            4,941       Watts and Strogatz (1998)   Kunegis (2017)
Jazz-musicians     1            198         Gleiser and Danon (2003)    Kunegis (2017)

Table 4.1 The rows in the above table represent the following real-world networks:
The California road network consists of nodes representing intersections or end points of roads.
In the Facebook network, nodes represent the users and the edges Facebook friendships.
Hyves was a Dutch social media platform. Nodes represent the users, and an edge between nodes represents a friendship.
The arXiv astro-physics network represents authors of papers within the astro-physics section of arXiv. There is an edge between authors if they have ever been co-authors.
The US power grid models the high-voltage power network in the western USA. The nodes denote transformers, substations, and generators, and the edges represent transmission cables.
The jazz-musicians data set consists of jazz musicians, with an edge between two musicians if they ever collaborated.


Table 4.1 invites us to think about what makes networks close to fully connected. We investigate this question here in the context of the configuration model. The advantage of the configuration model is that it is highly flexible in its degree structure, making it possible for all degrees to be at least a certain value. We will see that this allows the random graph to be connected while at the same time remaining sparse, as is the case in many real-world networks.

Organization of this chapter

This chapter is organized as follows. In Section 4.1, we study the local weak limit of the configuration model. In Section 4.2, we state and prove the law of large numbers for the giant component in the configuration model, thus establishing the phase transition. In Section 4.3, we study when the configuration model is connected. In Section 4.4, we state further results for the configuration model. We close this chapter in Section 4.5 with notes and discussion, and with exercises in Section 4.6.

4.1 Local weak convergence to unimodular trees for CMn(d)

We start by investigating the locally tree-like nature of the configuration model. Our main result is as follows:

Theorem 4.1 (Locally tree-like nature of the configuration model) Assume that Conditions 1.5(a)-(b) hold. Then CM_n(d) converges locally weakly in probability to the unimodular Galton-Watson tree with root offspring distribution (p_k)_{k≥0} given by p_k = P(D = k).

Before starting with the proof of Theorem 4.1, let us explain the above connection between local neighborhoods and branching processes. We note that the asymptotic offspring distribution at the root is equal to (p_k)_{k≥0}, where p_k = P(D = k) is the asymptotic degree distribution, since the probability that a random vertex has degree k is equal to p_k^{(n)} = P(D_n = k) = n_k/n, where n_k denotes the number of vertices with degree k, which, by Condition 1.5(a), converges to p_k = P(D = k) for every k ≥ 1. This explains the offspring of the root of our branching process approximation.

The offspring distribution of the individuals in the first and later generations is given by

p*_k = (k + 1)p_{k+1}/E[D]. (4.1.1)

We now heuristically explain this relation to branching processes by intuitively describing the exploration of a vertex chosen uniformly from the vertex set [n]. To describe the offspring of the direct neighbors of the root, we examine the degree of the vertex to which the first half-edge incident to the root is paired. By the uniform matching of half-edges, the probability that a vertex of degree k is chosen is proportional to k. Ignoring the fact that the root and one half-edge have already been chosen (which does have an effect on the number of available or free half-edges), the degree of the vertex incident to the chosen half-edge equals k with probability kp_k^{(n)}/E[D_n], where again p_k^{(n)} = n_k/n denotes the proportion of vertices with degree k, and

E[D_n] = (1/n) Σ_{i∈[n]} d_i = (1/n) Σ_{i∈[n]} Σ_{k=0}^∞ k 1{d_i = k} = Σ_{k=0}^∞ k p_k^{(n)} (4.1.2)

is the average degree in CM_n(d). Thus, (kp_k^{(n)}/E[D_n])_{k≥0} is a probability mass function. However, one of the half-edges is used up to connect to the root, so that, for a vertex incident to the root to have k offspring, it needs to connect its half-edge to a vertex having degree k + 1. Therefore, the probability that the offspring of any of the direct neighbors of the root equals k is

p*_k^{(n)} = (k + 1)p_{k+1}^{(n)}/E[D_n]. (4.1.3)

Thus, (p*_k^{(n)})_{k≥0} can be interpreted as the forward degree distribution of vertices in the cluster exploration. When Conditions 1.5(a)-(b) hold, also p*_k^{(n)} → p*_k, where (p*_k)_{k≥0} is defined in (4.1.1). As a result, we often refer to (p*_k)_{k≥0} as the asymptotic forward degree distribution.

The above heuristically argues that the number of vertices unequal to the root connected to any direct neighbor of the root has asymptotic law (p*_k)_{k≥0}. However, every time we pair two half-edges, the number of free or available half-edges decreases by 2. Similarly to the depletion-of-points effect in the exploration of clusters for the Erdos-Renyi random graph ER_n(λ/n), the configuration model CM_n(d) suffers from a depletion-of-points-and-half-edges effect. Thus, by iteratively connecting half-edges in a breadth-first way, the offspring distribution changes along the way, which gives potential trouble. Luckily, the number of available half-edges that we start with equals ℓ_n − 1, which is very large when Conditions 1.5(a)-(b) hold, since then ℓ_n/n = E[D_n] → E[D] > 0. Thus, we can pair many half-edges before we start noticing that their number decreases. As a result, the degrees of different vertices in the exploration process are close to being i.i.d., leading to a branching process approximation. In order to prove Theorem 4.1, we will only need to pair a bounded number of edges.

In order to get started on the proof of (2.3.10) for Theorem 4.1, we introduce some notation. Fix a rooted tree t with k generations, and let

N_n(t) = Σ_{v∈[n]} 1{B_k^{(n)}(v) ≃ t} (4.1.4)

denote the number of vertices whose local neighborhood up to generation k equals t. By Theorem 2.13, in order to prove Theorem 4.1, we need to show that

N_n(t)/n →P P(BP_{≤k} ≃ t). (4.1.5)

Here, we also rely on Theorem 2.8 to see that it suffices to prove (4.1.5) for trees, since the unimodular Galton-Watson tree is a tree with probability 1. For this, we will use a second moment method. We first prove that the first moment satisfies E[N_n(t)]/n → P(BP_{≤k} ≃ t), after which we prove that Var(N_n(t)) = o(n^2). Then, by the Chebyshev inequality [Volume 1, Theorem 2.18], (4.1.5) follows.

Local weak convergence for the configuration model: first moment

We next relate the neighborhood in a random graph to a branching process where the root has offspring distribution D_n, while all other individuals have offspring distribution D*_n − 1, where

P(D*_n = k) = k P(D_n = k)/E[D_n], k ∈ ℕ, (4.1.6)

so that D*_n has the size-biased distribution of D_n. Denote this branching process by (BP_n(t))_{t∈ℕ₀}.

Here, BP_n(t) denotes the branching process when it contains precisely t vertices, and we explore it in the breadth-first order. Clearly, by Conditions 1.5(a)-(b), D_n →d D and D*_n →d D*, which implies that BP_n(t) →d BP(t) for every finite t, where BP(t) is the restriction of the unimodular branching process BP with offspring distribution (p_k)_{k≥0}, for which p_k = P(D = k), to its first t individuals (see Exercise 4.1). For future reference, denote the offspring distribution of the above branching process by

p*_k = P(D* − 1 = k). (4.1.7)

Note that, when t is a fixed rooted tree of k generations, then BP_{≤k} ≃ t precisely when BP(t_k) ≃ t, where t_k denotes the number of vertices in t.

We let (G_n(t))_{t∈ℕ₀} denote the graph exploration process from a uniformly chosen vertex o ∈ [n]. Here G_n(t) is the exploration up to t vertices, in the breadth-first manner. In particular, from (G_n(t))_{t∈ℕ₀} we can retrieve (B_o(t))_{t∈ℕ₀} for every t ≥ 0. The following lemma proves that we can couple the graph exploration to the branching process in such a way that (G_n(t))_{0≤t≤m_n} is equal to (BP_n(t))_{0≤t≤m_n} whenever m_n → ∞ arbitrarily slowly. In the statement, (G_n(t), BP_n(t))_{t∈ℕ₀} denotes the coupling of (G_n(t))_{0≤t≤m_n} and (BP_n(t))_{0≤t≤m_n}:

Lemma 4.2 (Coupling graph exploration and branching process) Subject to Conditions 1.5(a)-(b), there exists a coupling (G_n(t), BP_n(t))_{t∈ℕ₀} of (G_n(t))_{0≤t≤m_n} and (BP_n(t))_{0≤t≤m_n} such that

P((G_n(t))_{0≤t≤m_n} ≠ (BP_n(t))_{0≤t≤m_n}) = o(1), (4.1.8)

whenever m_n → ∞ arbitrarily slowly. Consequently, E[N_n(t)]/n → P(BP_{≤k} ≃ t).

In the proof, we will see that any m_n = o(√(n/d_max)) is allowed. Here d_max = max_{i∈[n]} d_i is the maximal vertex degree in the graph, which is o(n) when Conditions 1.5(a)-(b) hold.


Proof We let the offspring of the root of the branching process, D_n, be equal to d_o, which is the number of neighbors of the vertex o ∈ [n] that is chosen uniformly at random. By construction, D_n = d_o, so that also G_n(1) = BP_n(1). We next explain how to jointly construct (G_n(t), BP_n(t))_{0≤t≤m} given that we have already constructed (G_n(t), BP_n(t))_{0≤t≤m−1}.

To obtain (G_n(t))_{0≤t≤m}, we take the first unpaired half-edge x_m. This half-edge needs to be paired to a uniform half-edge that has not been paired so far. We draw a uniform half-edge y_m from the collection of all half-edges, independently of the past, and we let the (m−1)st individual in (BP_n(t))_{0≤t≤m−1} have precisely d_{U_m} − 1 children, where U_m denotes the vertex incident to y_m. Note that d_{U_m} − 1 has the same distribution as D*_n − 1 and, by construction, the collection (d_{U_t} − 1)_{t≥0} is i.i.d. When y_m is still free, i.e., has not yet been paired in (G_n(t))_{0≤t≤m−1}, then we also let x_m be paired to y_m, and we have constructed (G_n(t))_{0≤t≤m}. However, a problem arises when y_m has already been paired in (G_n(t))_{0≤t≤m−1}, in which case we draw a uniform unpaired half-edge y′_m and pair x_m to y′_m instead. Clearly, this might give rise to a difference between (G_n(t))_{0≤t≤m} and (BP_n(t))_{0≤t≤m}. We now provide bounds on the probability that an error occurs before time m_n.

There are two sources of differences between (G_n(t))_{t≥0} and (BP_n(t))_{t≥0}:

Half-edge re-use In the above coupling, y_m had already been paired and is being re-used in the branching process, and we need to redraw y′_m;

Vertex re-use In the above coupling, this means that y_m is a half-edge that has not yet been paired in (G_n(t))_{0≤t≤m−1}, but it is incident to a vertex owning a half-edge that has already been paired in (G_n(t))_{0≤t≤m−1}. In particular, the vertex to which it is incident has already appeared in (G_n(t))_{0≤t≤m−1} and it is being re-used in the branching process. In this case, a copy of the vertex appears in (BP_n(t))_{0≤t≤m}, while a cycle appears in (G_n(t))_{0≤t≤m}.

We now provide a bound on both contributions:

Half-edge re-use

Up to time m − 1, at most 2m − 1 half-edges have been paired, and these are forbidden to be used by (G_n(t))_{t≤m}. The probability that the half-edge y_m equals one of these half-edges is at most

(2m − 1)/ℓ_n. (4.1.9)

Hence the probability that a half-edge is re-used before time m_n is at most

Σ_{m=1}^{m_n} (2m − 1)/ℓ_n = m_n^2/ℓ_n = o(1), (4.1.10)

when m_n = o(√n).
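The equality in (4.1.10) uses the classical identity Σ_{m=1}^{M}(2m − 1) = M² (the sum of the first M odd numbers), which a one-line check confirms:

```python
M = 1000
assert sum(2 * m - 1 for m in range(1, M + 1)) == M * M  # sum of first M odd numbers
print("identity holds for M =", M)
```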


Vertex re-use

The probability that vertex i is chosen in the mth draw is equal to d_i/ℓ_n. The probability that vertex i is drawn twice before time m_n is therefore at most

(m_n(m_n − 1)/2) d_i^2/ℓ_n^2. (4.1.11)

By the union bound, the probability that there exists a vertex that is chosen twice up to time m_n is at most

(m_n(m_n − 1)/(2ℓ_n)) Σ_{i∈[n]} d_i^2/ℓ_n ≤ m_n^2 d_max/ℓ_n = o(1), (4.1.12)

by Conditions 1.5(a)-(b) when m_n = o(√(n/d_max)).
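The inequality in (4.1.12) rests on the bound Σ_i d_i² ≤ d_max ℓ_n. A tiny numerical check (the degree sequence d and the value m_n = 4 are arbitrary assumptions):

```python
d = [1, 2, 2, 3, 5, 7]
ln, dmax, mn = sum(d), max(d), 4
lhs = mn * (mn - 1) / 2 * sum(x * x for x in d) / ln ** 2   # left side of (4.1.12)
rhs = mn ** 2 * dmax / ln                                   # right side of (4.1.12)
print(lhs, rhs)  # the left side is indeed the smaller of the two
```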

Completion of the proof

In order to show that E[N_n(t)]/n → P(BP_{≤k} ≃ t), we let t_k denote the number of individuals in the first k − 1 generations of t, and let (t(t))_{t∈[t_k]} be its breadth-first exploration. Then,

E[N_n(t)]/n = P((G_n(t))_{t∈[t_k]} = (t(t))_{t∈[t_k]}), (4.1.13)

so that

P((G_n(t))_{t∈[t_k]} = (t(t))_{t∈[t_k]}) = P((BP_n(t))_{t∈[t_k]} = (t(t))_{t∈[t_k]}) + o(1) (4.1.14)
= P((BP(t))_{t∈[t_k]} = (t(t))_{t∈[t_k]}) + o(1)
= P(BP_{≤k} ≃ t) + o(1),

where the first equality is (4.1.8), while the second is the statement that BP_n(t) →d BP(t) for every finite t from Exercise 4.1. This proves the claim.

Local weak convergence for the configuration model: second moment

Here, we study the second moment of N_n(t), and show that it is close to the first moment squared:

Lemma 4.3 (Concentration of the number of trees) Assume that Conditions 1.5(a)-(b) hold. Then,

E[N_n(t)^2]/n^2 → P(BP_{≤k} ≃ t)^2. (4.1.15)

Consequently, N_n(t)/n →P P(BP_{≤k} ≃ t).

Proof We start by computing

E[Nn(t)2]

n2= P(B(n)

k (o1), B(n)

k (o2) ' t), (4.1.16)

Page 157: Random graphs and complex networksrhofstad/NotesRGCNII.pdf1.2 Random graphs and real-world networks9 1.3 Random graph models11 1.3.1 Erd}os-R enyi random graph 11 1.3.2 Inhomogeneous

4.2 Phase transition in the configuration model 135

where o1, o2 ∈ [n] are two vertices chosen uniformly at random from [n], indepen-

dently. Since |B(n)

k (o1)| d−→ |BP≤k|, which is a tight random variable, o2 /∈ B(n)

2k (o1)with high probability, so that also

E[Nn(t)2]

n2= P(B(n)

k (o1), B(n)

k (o2) ' t, o2 /∈ B(n)

2k (o1)) + o(1). (4.1.17)

We now condition on B(n)

k (o1) = t, and write

P(B(n)

k (o1), B(n)

k (o2) ' t, o2 /∈ B(n)

2k (o1)) (4.1.18)

= P(B(n)

k (o2) ' t | B(n)

k (o1) ' t, o2 /∈ B(n)

2k (o1))

× P(B(n)

k (o1) ' t, o2 /∈ B(n)

2k (o1)).

We already know that P(B(n)

k (o1) ' t)→ P(BP≤k ' t), so that also

P(B(n)

k (o1) ' t, o2 /∈ B(n)

2k (o1))→ P(BP≤k ' t). (4.1.19)

In Exercise 4.2, you prove that indeed (4.1.19) holds.

We note that, conditionally on B(n)k(o1) ≃ t and o2 ∉ B(n)2k(o1), the probability that B(n)k(o2) ≃ t is the same as the probability that B(n)k(o2) ≃ t in the configuration model CMn′(d′) obtained by removing all vertices in B(n)k(o1). Thus, since B(n)k(o1) ≃ t, we have that n′ = n − |V(t)| and that d′ is the corresponding degree sequence. The whole point is that the degree sequence d′ still satisfies Conditions 1.5(a)-(b). Therefore, also

P(B(n)k(o2) ≃ t | B(n)k(o1) ≃ t, o2 ∉ B(n)2k(o1)) → P(BP≤k ≃ t), (4.1.20)

and we have proved (4.1.15). Since E[Nn(t)]/n → P(BP≤k ≃ t) and E[Nn(t)²]/n² → P(BP≤k ≃ t)², it follows that Var(Nn(t)/n) → 0, so that Nn(t)/n concentrates. Combined with E[Nn(t)]/n → P(BP≤k ≃ t), this yields Nn(t)/n P→ P(BP≤k ≃ t), as required.

Lemma 4.3 completes the proof of Theorem 4.1.
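The concentration in Lemma 4.3 is easy to observe numerically for a very simple tree t, namely a single edge: a degree-1 vertex whose unique neighbour also has degree 1 (so k = 2). The local limit predicts the limiting fraction p1 · p1/E[D]. The following sketch is our own illustration, not part of the text; the mixed degree distribution p1 = p3 = 1/2 is an assumption made purely for concreteness.

```python
import random
from collections import defaultdict

rng = random.Random(7)
n = 20_000
# i.i.d. degrees with p_1 = p_3 = 1/2, so E[D] = 2 (illustrative choice)
degrees = [1 if rng.random() < 0.5 else 3 for _ in range(n)]
if sum(degrees) % 2:                      # total degree must be even
    degrees[0] += 1

# uniform pairing of the half-edges: a sample of CM_n(d)
half = [v for v, d in enumerate(degrees) for _ in range(d)]
rng.shuffle(half)
adj = defaultdict(list)
for i in range(0, len(half) - 1, 2):
    u, w = half[i], half[i + 1]
    adj[u].append(w)
    adj[w].append(u)

# N_n(t)/n for t = a single edge: degree-1 vertices whose unique
# neighbour also has degree 1; the local limit predicts
# p_1 * p_1 / E[D] = 0.5 * 0.5 / 2 = 0.125
frac = sum(1 for v in range(n)
           if degrees[v] == 1 and degrees[adj[v][0]] == 1) / n
print(frac)  # close to 0.125 for large n
```

Rerunning with different seeds shows fluctuations of order n^{-1/2} around the predicted constant, as the variance bound in the proof suggests.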

4.2 Phase transition in the configuration model

In this section, we investigate the connected components in the configuration model. As for the Erdős-Rényi random graph, we identify when the configuration model whp has a giant component. Again, this condition has the interpretation that an underlying branching process, describing the exploration of a cluster, has a strictly positive survival probability.

We start by recalling some notation from [Volume 1, Chapter 7]. We investigate the configuration model CMn(d), where in most cases the degrees d = (di)i∈[n] are assumed to satisfy Conditions 1.5(a)-(b), and sometimes also Condition 1.5(c). We recall that Dn is the degree of a uniformly chosen vertex in [n], i.e., Dn = do,


where o is uniformly chosen from [n]. Equivalently,

P(Dn = k) = nk/n, (4.2.1)

where nk denotes the number of vertices of degree k. For a graph G, we write vk(G) for the number of vertices of degree k in G, and |E(G)| for the number of edges. The main result concerning the size and structure of the largest connected components of CMn(d) is the following:

Theorem 4.4 (Phase transition in CMn(d)) Suppose that Conditions 1.5(a)-(b) hold and consider the random graph CMn(d), letting n → ∞. Assume that p2 = P(D = 2) < 1. Let Cmax and C(2) be the largest and second-largest components of CMn(d) (breaking ties arbitrarily).

(a) If ν = E[D(D − 1)]/E[D] > 1, then there exist ξ ∈ [0, 1) and ζ ∈ (0, 1] such that

|Cmax|/n P→ ζ,
vk(Cmax)/n P→ pk(1 − ξ^k) for every k ≥ 0,
|E(Cmax)|/n P→ ½E[D](1 − ξ²),

while |C(2)|/n P→ 0 and |E(C(2))|/n P→ 0.

(b) If ν = E[D(D − 1)]/E[D] ≤ 1, then |Cmax|/n P→ 0 and |E(Cmax)|/n P→ 0.

Consequently, the same result holds for the uniform random graph with degree sequence d satisfying Conditions 1.5(a)-(b), under the extra assumption that ∑_{i∈[n]} di² = O(n).

Reformulation in terms of branching processes

We start by interpreting the results in Theorem 4.4 in terms of branching processes, as also arising in Section 4.1. As it turns out, we can interpret ξ as the extinction probability of a branching process, and ζ as the survival probability of the unimodular Galton-Watson tree that appears in Theorem 4.1. Thus, ζ satisfies

ζ = ∑_{k≥1} pk(1 − ξ^k), (4.2.2)

where ξ is the extinction probability of the branching process with offspring distribution (p⋆k)k≥0, which satisfies

ξ = ∑_{k≥0} p⋆k ξ^k. (4.2.3)

Clearly, ξ = 1 precisely when

ν = ∑_{k≥0} k p⋆k ≤ 1. (4.2.4)


By (4.1.1), we can rewrite

ν = (1/E[D]) ∑_{k≥0} k(k + 1) pk+1 = E[D(D − 1)]/E[D], (4.2.5)

which explains the condition on ν in Theorem 4.4(a). Further, to understand the asymptotics of vk(Cmax), we note that there are nk = n p(n)k ≈ n pk vertices of degree k. Each of the k direct neighbors of a vertex of degree k survives with probability close to 1 − ξ, so that the probability that at least one of them survives is close to 1 − ξ^k. When one of the neighbors of the vertex of degree k survives, the vertex itself is part of the giant component, which explains why vk(Cmax)/n P→ pk(1 − ξ^k). Finally, an edge consists of two half-edges, and an edge is part of the giant component precisely when one of the vertices incident to it is, which occurs with probability 1 − ξ². There are in total ℓn/2 = nE[Dn]/2 ≈ nE[D]/2 edges, which explains why |E(Cmax)|/n P→ ½E[D](1 − ξ²). Therefore, all results in Theorem 4.4 have a simple explanation in terms of the branching-process approximation of the connected component of a uniform vertex in [n] in CMn(d).
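The fixed-point equations (4.2.2)-(4.2.3) can be solved numerically. The sketch below is our own illustration, not from the text: it assumes a Poisson(2) limiting degree distribution, forms the offspring distribution p⋆k = (k + 1)pk+1/E[D], and iterates (4.2.3) from ξ = 0. For Poisson degrees one has the well-known identity ζ = 1 − ξ (the Erdős-Rényi case), which serves as a consistency check.

```python
import math

# Poisson(2) limiting degree distribution, truncated at K (our choice)
lam, K = 2.0, 60
p = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(K)]
mean_d = sum(k * pk for k, pk in enumerate(p))

# offspring distribution of the unimodular Galton-Watson tree:
# p*_k = (k+1) p_{k+1} / E[D]
p_star = [(k + 1) * p[k + 1] / mean_d for k in range(K - 1)]
nu = sum(k * pk for k, pk in enumerate(p_star))   # = E[D(D-1)]/E[D]

# iterate (4.2.3): xi <- sum_k p*_k xi^k, starting from 0;
# this converges monotonically to the extinction probability
xi = 0.0
for _ in range(10_000):
    new = sum(pk * xi ** k for k, pk in enumerate(p_star))
    if abs(new - xi) < 1e-12:
        break
    xi = new

# survival probability of the giant, (4.2.2)
zeta = sum(pk * (1 - xi ** k) for k, pk in enumerate(p))
print(nu, xi, zeta)
```

Here ν = λ = 2 > 1, so the model is supercritical; the iteration converges to ξ ≈ 0.203 and ζ = 1 − ξ.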

Reformulation in terms of generating functions

We next reformulate the results in terms of generating functions, which play a crucial role throughout our proof. Let

GD(x) = ∑_{k≥0} pk x^k = E[x^D] (4.2.6)

be the probability generating function of the probability distribution (pk)k≥1. Recall that, for a non-negative random variable D, the random variable D⋆ denotes its size-biased version. Define further

G⋆D(x) = E[x^{D⋆−1}] = ∑_{k≥0} p⋆k x^k = G′D(x)/G′D(1), (4.2.7)

H(x) = E[D] x (x − G⋆D(x)). (4.2.8)

Note that G⋆D(1) = 1, and thus H(0) = H(1) = 0. Note also that

H′(1) = 2E[D] − ∑_k k² pk = E[2D − D²] = −E[D(D − 2)]. (4.2.9)

For further properties of x ↦ H(x), see Lemma 4.9 below. We conclude that if E[D(D − 2)] = ∑_k k(k − 2) pk > 0 and if p1 > 0, then there is a unique ξ ∈ (0, 1) such that H(ξ) = 0, or equivalently G⋆D(ξ) = ξ, so that indeed ξ is the extinction probability of the branching process with offspring distribution (p⋆k)k≥0. When p1 = 0, instead, ξ = 0 is the unique solution in [0, 1) of H(ξ) = 0. The functions x ↦ H(x) and x ↦ G⋆D(x) play a central role in our analysis of the problem.

We prove Theorem 4.4 in Sections 4.2.2-4.2.4 below. We first remark upon the result and on the conditions arising in it.


The condition P(D = 2) = p2 < 1

Because isolated vertices do not matter, we may assume without loss of generality that p0 = 0. The case p2 = 1, for which ν = 1, is quite exceptional. In this case, H(x) = 0 for all x. We give three examples showing that quite different behaviors are then possible.

Our first example is when di = 2 for all i ∈ [n], so that we are studying a random 2-regular graph. In this case, the components are cycles, and the distribution of cycle lengths in CMn(d) is given by the Ewens sampling formula ESF(1/2), see e.g. Arratia et al. (2003). This implies that |Cmax|/n converges in distribution to a non-degenerate distribution on [0, 1], and not to any constant (Arratia et al., 2003, Lemma 5.7). Moreover, the same is true for |C(2)|/n (and for |C(3)|/n, ...), so in this case there are several large components. To see this result intuitively, note that in the exploration of a cluster we start with one vertex with two half-edges. When pairing a half-edge, it connects to a vertex that again has two half-edges. Therefore, the number of half-edges to be paired always equals 2, up to the moment when the cycle is closed and the cluster is completed. When there are m = αn free half-edges left, the probability of closing up the cycle equals 1/m = 1/(αn), and thus the time this takes is of order n. A slight extension of this reasoning shows that the time it takes to close a cycle is nTn, where Tn converges to a limiting non-degenerate random variable (see Exercise 4.3).

Our second example with p2 = 1 is obtained by adding a small number of vertices of degree 1. More precisely, we let n1 → ∞ be such that n1/n → 0, and n2 = n − n1. In this case, components can either be cycles, or strings of vertices of degree 2 terminated by two vertices of degree 1. When n1 → ∞, it is more likely to terminate a long string of vertices of degree 2 by a vertex of degree 1 than by closing the cycle: for the latter we need to pair to one specific half-edge, while for the former we have n1 choices. Therefore, it is easy to see that this implies that |Cmax| = oP(n) (see Exercise 4.4 for details).

Our third example with p2 = 1 is obtained by instead adding a small number of vertices of degree 4 (i.e., n4 → ∞ such that n4/n → 0, and n2 = n − n4). We can regard each vertex of degree 4 as two vertices of degree 2 that have been identified. Therefore, to obtain CMn(d) with this degree distribution, we can start from a configuration model having N = n + n4 vertices of degree 2 and uniformly identify n4 pairs of vertices of degree 2. Since the configuration model with N = n + n4 vertices of degree 2 has many components of size of order n, most of these will merge into one giant component. As a result, |Cmax| = n − oP(n), so there is a giant component containing almost everything, as you will prove yourself in Exercise 4.5.

We conclude that the case p2 = P(D = 2) = 1 is quite sensitive to precise properties of the degree structure that are not captured by the limiting distribution (pk)k≥1 alone. In the sequel, we ignore the case where p2 = 1.


Reduction to the case where P(D = 1) = p1 > 0

In our proof, it is convenient to assume that p1 = P(D = 1) > 0. When p1 = 0, the extinction probability equals ξ = 0 and the survival probability equals ζ = 1, which causes technical difficulties in the proof. We now explain how we can reduce the case where p1 = 0 to the case where p1 > 0.

Let dmin = min{k : pk > 0} be the minimum of the support of the asymptotic degree distribution D. Fix ε > 0, and assume that ε < pdmin. Consider the configuration model with n̄ = n + 2dminεn vertices and degree sequence d̄ = (d̄i)i∈[n̄] with n̄k = nk for all k > dmin, n̄dmin = ndmin − εn, and n̄1 = 2dminεn. This configuration model can be obtained from CMn(d) by replacing εn vertices of degree dmin by dmin vertices of degree 1, as if we have 'forgotten' that these vertices are actually equal.

Clearly, CMn(d) can be retrieved by identifying εn collections of dmin vertices of degree 1 to a single vertex of degree dmin. When d satisfies Conditions 1.5(a)-(b), then so does d̄, with limiting degree distribution p̄1 = 2dminε/(1 + 2dminε), p̄dmin = (pdmin − ε)/(1 + 2dminε), and p̄k = pk/(1 + 2dminε) for all k > dmin. The above procedure clearly makes |Cmax| smaller. Further, with ζε denoting the limit of |C̄max|/n̄ for d̄, we have that ζε → 1 as ε ↓ 0. As a result, Theorem 4.4 with ζ = 1 and ξ = 0 follows from Theorem 4.4 with p1 > 0, for which ζ < 1 and ξ > 0. In the remainder of the proof, we therefore assume without loss of generality that ξ > 0 and ζ < 1.

Organization of the proof of Theorem 4.4

Theorem 4.4 is proved using a clever randomization scheme that explores the connected components one by one. This construction is explained in terms of a simple continuous-time algorithm in Section 4.2.2 below. The algorithm describes the number of vertices of given degrees that have been found, as well as the total number of unpaired half-edges, at time t > 0. It is proved that, as n → ∞, these quantities all converge in probability to deterministic functions described in terms of the functions x ↦ H(x) and x ↦ G⋆D(x) above. In particular, the number of unpaired half-edges is given in terms of x ↦ H(x), so that the first zero of this function gives the size of the giant component. In Section 4.2.3, the algorithm is analyzed by showing that, when ζ > 0, after a short initial period of exploring small clusters, the giant component is found, and the exploration explores it completely, after which no large component is left. When ζ = 0, instead, only small clusters are found. A crucial aspect of the proof resides in how to deal with the depletion-of-points-and-half-edges effect.

4.2.1 The giant component is almost local

A useful truncation argument

To be added!


4.2.2 Finding the largest component

The components of an arbitrary finite graph or multigraph can be found by the following standard procedure. Pick an arbitrary vertex v and determine the component of v as follows: include all the neighbors of v in an arbitrary order; then add the neighbors of the neighbors, and so on, until no more vertices can be added. The vertices included up to this moment form the component of v. If there are still vertices left in the graph, then pick any such vertex w, and repeat the above to determine the second component (the component of vertex w). Carry on in this manner until all components have been found.

The same result can be obtained in the following way, which turns out to be more convenient for the exploration of the giant component in the configuration model. Regard each edge as consisting of two half-edges, each half-edge having one endpoint. We will label the vertices as sleeping or awake (= used), and the half-edges as sleeping, active or dead; the sleeping and active half-edges are also called living. We start with all vertices and half-edges sleeping. Pick a vertex and label its half-edges as active. Then take any active half-edge, say x, and find its partner y in the graph; label these two half-edges as dead. Further, if the endpoint of y is sleeping, label it as awake and all other half-edges of the vertex incident to y as active. Repeat as long as there are active half-edges. When there is no active half-edge left, we have obtained the first component. Then start again with another vertex, until all components are found.

We apply this algorithm to CMn(d) with a given degree sequence, revealing its edges during the process. We thus initially only observe the vertex degrees and the half-edges, but not how they are joined to form edges. Hence, each time we need a partner of a half-edge, it is uniformly distributed over all other living half-edges, with the understanding that the dead half-edges are the ones that are already paired into edges. It is here that we use the specific structure of the configuration model, which simplifies the analysis substantially.

We make the random choices of pairing half-edges by associating i.i.d. random maximal lifetimes τx with the half-edges x, where τx has an Exp(1) distribution. We interpret these lifetimes as clocks, and changes in our exploration process only occur when the clock of a half-edge rings. In other words, each half-edge dies spontaneously at rate 1 (unless killed earlier). Each time we need to find the partner of a half-edge x, we wait until the next living half-edge ≠ x dies, and take that one. This process in continuous time can be formulated as an algorithm, constructing CMn(d) and exploring its components simultaneously, as follows. Recall that we start with all vertices and half-edges sleeping. The exploration is then formalized in the following three steps:

Step 1 When there is no active half-edge (as in the beginning), select a sleeping vertex and declare it awake and all its half-edges active. For definiteness, we choose the vertex by choosing a half-edge uniformly at random among all sleeping half-edges. When there is no sleeping half-edge left, the process stops; the remaining sleeping vertices are all isolated, and we have explored all other components.


Step 2 Pick an active half-edge (which one does not matter) and kill it, i.e., change its status to dead.

Step 3 Wait until the next half-edge dies (spontaneously, as a result of its clock ringing). This half-edge is joined to the one killed in Step 2 to form an edge of the graph. When the vertex incident to it is sleeping, we change this vertex to awake and all its other half-edges to active. Repeat from Step 1.

The above randomized algorithm is such that components are created between the successive times at which Step 1 is performed, where we say that Step 1 is performed when there is no active half-edge and, as a result, a new vertex is chosen.

The vertices in the component created during one of these intervals between the successive times Step 1 is performed are the vertices that are awakened during that interval. Note also that a component is completed, and Step 1 is performed, exactly when the number of active half-edges is 0 and a half-edge dies at a vertex where all other half-edges (if any) are dead. In the next section, we investigate the behavior of the key characteristics of the algorithm, such as the number of sleeping half-edges and the number of sleeping vertices of a given degree.
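Stripped of its continuous-time clocks, the exploration above amounts to pairing half-edges uniformly at random while sweeping out one component at a time. The following sketch is our own illustration, not from the text; the degree distribution p1 = 0.4, p3 = 0.6 is an assumption chosen so that ν = 3.6/2.2 ≈ 1.64 > 1, with ξ = 2/9 solving (4.2.3) and ζ ≈ 0.905 from (4.2.2).

```python
import random
from collections import defaultdict

def component_sizes(degrees, rng):
    # Pair all half-edges uniformly at random (this samples CM_n(d)),
    # then sweep out components, awakening one sleeping vertex at a time.
    half = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(half)
    adj = defaultdict(list)
    for i in range(0, len(half) - 1, 2):
        u, w = half[i], half[i + 1]
        adj[u].append(w)
        adj[w].append(u)
    seen, sizes = set(), []
    for v in range(len(degrees)):      # "Step 1": pick a sleeping vertex
        if v in seen:
            continue
        stack, size = [v], 0
        seen.add(v)
        while stack:                   # "Steps 2-3": pair off active half-edges
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)

rng = random.Random(1)
n = 2000
# degrees i.i.d. with p_1 = 0.4, p_3 = 0.6: supercritical, nu ≈ 1.64 > 1
degrees = [1 if rng.random() < 0.4 else 3 for _ in range(n)]
if sum(degrees) % 2:                   # total degree must be even
    degrees[0] += 1
sizes = component_sizes(degrees, rng)
giant_frac = sizes[0] / n
print(giant_frac)                      # close to zeta ≈ 0.905
```

The largest component contains roughly a ζ-fraction of the vertices, while the second-largest is of much smaller order, as Theorem 4.4(a) predicts.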

4.2.3 Analysis of the algorithm for CMn(d)

We start by introducing the key characteristics of the exploration algorithm. Let S(t) and A(t) be the numbers of sleeping and active half-edges, respectively, at time t, and let

L(t) = S(t) +A(t) (4.2.10)

be the number of living half-edges. For definiteness, we define these random functions to be right-continuous.

Let us first look at L(t). We start with ℓn = ∑_{i∈[n]} di half-edges, all sleeping and thus living, but we immediately perform Step 1 and Step 2 and kill one of them. Thus, L(0) = ℓn − 1. In the sequel, as soon as a living half-edge dies, we perform Step 3 and then (instantly) either Step 2, or both Step 1 and Step 2. Since Step 1 does not change the number of living half-edges, while Step 2 and Step 3 each decrease it by 1, the total result is that L(t) decreases by 2 each time one of the living half-edges dies, except when the last living one dies and the process terminates. Because of these simple dynamics of t ↦ L(t), we can give sharp asymptotics of L(t) as n → ∞:

Proposition 4.5 (Number of living half-edges) As n → ∞, for any fixed t0 ≥ 0,

sup_{0≤t≤t0} |n^{-1}L(t) − E[Dn] e^{-2t}| P→ 0. (4.2.11)

Proof The process t ↦ L(t) satisfies L(0) = ℓn − 1, and it decreases by 2 at rate L(t). As a result, it is closely related to a death process. We study such processes in the following lemma:

Lemma 4.6 (Asymptotics of death processes) Let d, γ > 0 be given, and let (N^{(x)}(t))t≥0 be a Markov process such that N^{(x)}(0) = x a.s., and such that the dynamics of t ↦ N^{(x)}(t) are that, when the process is in position y, it jumps down by d at rate γy. In other words, the waiting time until the next event is exponentially distributed with mean 1/(γy), and each jump is of size d downwards. Then, for every t0 ≥ 0,

E[sup_{t≤t0} |N^{(x)}(t) − e^{-γdt}x|²] ≤ 8d(e^{γdt0} − 1)x + 8d². (4.2.12)

Proof The proof follows by distinguishing several cases. First assume that d = 1 and that x is an integer. In this case, the process is a standard pure death process taking the values x, x − 1, x − 2, . . . , 0, describing the number of particles alive when the particles die independently at rate γ > 0. As is well known, and easily seen by regarding N^{(x)}(t) as the sum of x independent copies of the process N^{(1)}(t), the process (e^{γt}N^{(x)}(t))t≥0 is a martingale starting in x. Furthermore, for every t ≥ 0, the random variable N^{(x)}(t) has a Bin(x, e^{-γt}) distribution, since each of the x particles is still alive at time t with probability e^{-γt}, and the different particles die independently.

We rely on Doob's L² martingale inequality, which states that, for a martingale (Mn)n≥0,

E[sup_{m≤n} |Mm − E[Mm]|²] ≤ 4 Var(Mn). (4.2.13)

Application of Doob’s martingale inequality yields

E[

supt≤t0

∣∣N (x)(t)− e−γtx∣∣2] ≤ E

[supt≤t0

∣∣eγtN (x)(t)− x∣∣2] ≤ 4E

[(eγtN (x)(t0)− x

)2]= 4e2γtVar(N (x)(t0)) ≤ 4(eγt0 − 1)x. (4.2.14)

This proves the claim for integer x. Next, we still assume d = 1, but let x > 0 be arbitrary. We can couple the two processes (N^{(x)}(t))t≥0 and (N^{(⌊x⌋)}(t))t≥0 with different initial values such that, whenever the smaller one jumps by 1, so does the other. This coupling keeps

|N^{(x)}(t) − N^{(⌊x⌋)}(t)| < 1 (4.2.15)

for all t ≥ 0, and thus,

sup_{t≤t0} |N^{(x)}(t) − e^{-γt}x| ≤ sup_{t≤t0} |N^{(⌊x⌋)}(t) − e^{-γt}⌊x⌋| + 2, (4.2.16)

so that by (4.2.14), in turn,

E[sup_{t≤t0} |N^{(x)}(t) − e^{-γt}x|²] ≤ 8(e^{γt0} − 1)x + 8. (4.2.17)

Finally, for general d > 0, we observe that N^{(x)}(t)/d is a process of the same type with the parameters (γ, d, x) replaced by (γd, 1, x/d), and the general result follows from (4.2.17) and (4.2.14).

The proof of Proposition 4.5 follows from Lemma 4.6 with d = 2, x = ℓn − 1 = nE[Dn] − 1 and γ = 1.
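The d = 1 case of Lemma 4.6 is easy to visualize: x particles dying independently at rate γ give N^{(x)}(t) ≈ e^{-γt}x uniformly on compact time intervals. The sketch below, our own illustration rather than part of the text, simulates one path on a time grid and compares the supremum deviation with the deterministic limit.

```python
import math
import random

def death_process_path(x, gamma, t_grid, rng):
    # d = 1 case of Lemma 4.6: x particles, each dying independently at
    # rate gamma; N(t) counts the particles still alive at time t.
    deaths = [rng.expovariate(gamma) for _ in range(x)]
    return [sum(1 for d in deaths if d > t) for t in t_grid]

rng = random.Random(3)
x, gamma, t0 = 10_000, 1.0, 2.0
grid = [i * t0 / 100 for i in range(101)]
path = death_process_path(x, gamma, grid, rng)

# supremum deviation from e^{-gamma t} x over the grid: Lemma 4.6
# promises this is of order sqrt(x), i.e. o(x)
sup_dev = max(abs(N - math.exp(-gamma * t) * x)
              for N, t in zip(path, grid))
print(sup_dev / x)
```

The relative deviation sup_dev/x is of order x^{-1/2}, consistent with the second-moment bound (4.2.12) after dividing by x².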


We continue by considering the sleeping half-edges S(t). Let Vk(t) be the number of sleeping vertices of degree k at time t, so that

S(t) = ∑_{k≥1} k Vk(t). (4.2.18)

Note that Step 2 does not affect sleeping half-edges, and that Step 3 implies that each sleeping vertex of degree k is eliminated (i.e., awakened) with intensity k, independently of all other vertices. Some sleeping vertices are also eliminated by Step 1, though, which complicates the dynamics of t ↦ Vk(t). It is here that the depletion-of-points-and-half-edges effect enters the analysis of the component structure of CMn(d). This effect is complicated, but we will see that it is quite harmless. This can be understood by noting that we only apply Step 1 when we have completed exploring an entire component. Since we are mainly interested in settings where the giant component is large, we will see that we do not use Step 1 very often before having completely explored the giant component. After having completed the exploration of the giant component, we start using Step 1 again quite frequently, but by then it is very unlikely that we are exploring any particularly large connected component. Thus, we can keep in mind a setting where the number of applications of Step 1 is quite small.

With this intuition in mind, we first ignore the effect of Step 1 by letting Ṽk(t) be the number of vertices of degree k all of whose half-edges have maximal lifetimes τx > t. Thus, none of its k half-edges would have died spontaneously up to time t, assuming that they all escaped Step 1. Let us now explain in some more detail why it is reasonable to ignore the effect of Step 1 to leading order when there exists a giant component. Indeed, we perform Step 1 until we hit the giant component, and then it takes a long time to find the entire giant component. When ζ > 0, the number of times we perform Step 1 until we find the giant component is small (probably a tight random variable), as each time we have a strictly positive probability of choosing a vertex in the giant component. Thus, intuitively, we expect the difference between Vk(t) and Ṽk(t) to be insignificant. We therefore start by focussing on the dynamics of (Ṽk(t))t≥0, ignoring the effect of Step 1, and later adapt for this omission.

For a given half-edge, we call the half-edges incident to the same vertex its sibling half-edges. Let further

S̃(t) = ∑_k k Ṽk(t) (4.2.19)

denote the number of half-edges whose sibling half-edges have all escaped spontaneous death up to time t. Comparing with (4.2.18), we see that the process S̃(t) ignores the effect of Step 1 in an identical way as Ṽk(t).

Recall the functions GD and G⋆D from (4.2.6)-(4.2.7), and define

h(x) = x E[D] G⋆D(x). (4.2.20)


Then, we can identify the asymptotics of (Ṽk(t))t≥0 in a similar way as in Proposition 4.5:

Lemma 4.7 (Number of living vertices of degree k) Assume that Conditions 1.5(a)-(b) hold. Then, as n → ∞, for any fixed t0 ≥ 0,

sup_{t≤t0} |n^{-1}Ṽk(t) − pk e^{-kt}| P→ 0 (4.2.21)

for every k ≥ 0, and

sup_{t≤t0} |n^{-1} ∑_{k≥0} Ṽk(t) − GD(e^{-t})| P→ 0, (4.2.22)

sup_{t≤t0} |n^{-1}S̃(t) − h(e^{-t})| P→ 0. (4.2.23)

Proof The statement (4.2.21) again follows from Lemma 4.6, now with γ = k, x = nk and d = 1. The case k = 0 is trivial, since Ṽ0(t) = n0 for all t. We can replace p(n)k = nk/n by pk by Condition 1.5(a).

By Condition 1.5(b), the sequence of random variables (Dn)n≥1 is uniformly integrable, which means that for every ε > 0 there exists K < ∞ such that, for all n, ∑_{k>K} k nk/n = E[Dn 1{Dn > K}] < ε. We may further assume (or deduce from Fatou's lemma) that ∑_{k>K} k pk < ε, and obtain by (4.2.21) that, whp,

sup_{t≤t0} |n^{-1}S̃(t) − h(e^{-t})| = sup_{t≤t0} |∑_{k≥1} k (n^{-1}Ṽk(t) − pk e^{-kt})|
≤ ∑_{k=1}^{K} k sup_{t≤t0} |n^{-1}Ṽk(t) − pk e^{-kt}| + ∑_{k>K} k (nk/n + pk)
≤ ε + ε + ε,

proving (4.2.23). An almost identical argument yields (4.2.22).

Remarkably, the difference between S(t) and S̃(t) is easily estimated. The following result can be viewed as the key to why this approach works. Indeed, it gives a uniform upper bound on the difference due to the application of Step 1:

Lemma 4.8 (Effect of Step 1) Let dmax := max_{i∈[n]} di be the maximum degree of CMn(d). Then

0 ≤ S̃(t) − S(t) < sup_{0≤s≤t} (S̃(s) − L(s)) + dmax. (4.2.24)

The process (S̃(t))t≥0 runs on scale n (see, e.g., the related statement for (L(t))t≥0 in Proposition 4.5). Further, dmax = o(n) when Conditions 1.5(a)-(b) hold. Finally, one can expect that S̃(s) ≤ L(s) holds, since the difference is related to the number of active half-edges. Thus, intuitively, sup_{0≤s≤t}(S̃(s) − L(s)) = S̃(0) − L(0) = 0. We will make that argument precise after we have proved Lemma 4.8. We then conclude that S̃(t) − S(t) = oP(n), so that the two processes have the same limit after rescaling by n. Let us now prove Lemma 4.8:

Proof Clearly, Vk(t) ≤ Ṽk(t), and thus S(t) ≤ S̃(t). Furthermore, S̃(t) − S(t) increases only as a result of Step 1. Indeed, Step 1 acts to guarantee that A(t) = L(t) − S(t) ≥ 0, and it is only performed when A(t) = 0.

If Step 1 is performed at time t and a vertex of degree j > 0 is awakened, then Step 2 applies instantly, and we have A(t) = j − 1 < dmax, and consequently

S̃(t) − S(t) = S̃(t) − L(t) + A(t) < S̃(t) − L(t) + dmax. (4.2.25)

Furthermore, S̃(t) − S(t) is never changed by Step 2, and it is either unchanged or decreased by Step 3. Hence, S̃(t) − S(t) does not increase until the next time Step 1 is performed. Consequently, for any time t, if s was the last time before (or equal to) t that Step 1 was performed, then S̃(t) − S(t) ≤ S̃(s) − S(s), and the result follows from (4.2.25).

Let us now set the stage for taking the limit n → ∞. Recall that A(t) = L(t) − S(t) denotes the number of active half-edges, and let

Ã(t) = L(t) − S̃(t) = A(t) − (S̃(t) − S(t)) (4.2.26)

denote the number of active half-edges ignoring the effect of Step 1. Thus, Ã(t) ≤ A(t), since S(t) ≤ S̃(t). We will use Ã(t) as a proxy for A(t) in a similar way as S̃(t) is used as a proxy for S(t).

Recall the definition of H(x) in (4.2.8). By Proposition 4.5, Lemma 4.7 and the definition Ã(t) = L(t) − S̃(t), for any t0 ≥ 0,

sup_{t≤t0} |n^{-1}Ã(t) − H(e^{-t})| P→ 0. (4.2.27)

Lemma 4.8 can be rewritten as

0 ≤ S̃(t) − S(t) < −inf_{s≤t} Ã(s) + dmax. (4.2.28)

By (4.2.26) and (4.2.28),

Ã(t) ≤ A(t) < Ã(t) − inf_{s≤t} Ã(s) + dmax, (4.2.29)

which, perhaps, illuminates the relation between A(t) and Ã(t). Recall that connected components are explored between subsequent zeros of the process t ↦ A(t). The function t ↦ H(e^{-t}), which acts as the limit of n^{-1}Ã(t) (and thus hopefully also of n^{-1}A(t)), is strictly positive on (0, −log ξ), with H(1) = H(ξ) = 0. Therefore, we expect Ã(t) to be positive for t ∈ (0, −log ξ), and, if so, inf_{s≤t} Ã(s) = 0 for such t. This would prove that A(t) and Ã(t) are indeed close on this entire interval, and the exploration on the interval t ∈ (0, −log ξ) will turn out to correspond to the exploration of the giant component.

The idea is to continue our algorithm in Steps 1-3 until the giant component has been found, which implies that A(t) > 0 during the exploration of the giant component, and that A(t) = 0 for the first time when we have completed the exploration of the giant component, which happens at t = −log ξ. Thus, the term inf_{s≤t} Ã(s) in (4.2.29) ought to be negligible. When Conditions 1.5(a)-(b) hold, we further have that dmax = o(n), so that one can expect Ã(t) to be a good approximation of A(t). The remainder of the proof makes this intuition precise. We start by summarizing some useful analytical properties of x ↦ H(x) that we rely upon in the sequel:

Lemma 4.9 (Properties of x ↦ H(x)) Suppose that Conditions 1.5(a)-(b) hold, and let H(x) be given by (4.2.8). Suppose also that p2 < 1.

(i) If ν = E[D(D − 1)]/E[D] > 1 and p1 > 0, then there is a unique ξ ∈ (0, 1) such that H(ξ) = 0. Moreover, H(x) < 0 for all x ∈ (0, ξ), and H(x) > 0 for all x ∈ (ξ, 1).
(ii) If ν = E[D(D − 1)]/E[D] ≤ 1, then H(x) < 0 for all x ∈ (0, 1).

Proof As remarked earlier, H(0) = H(1) = 0 and H′(1) = −E[D(D − 2)]. Furthermore, if we define φ(x) := H(x)/x, then φ(x) = E[D](x − G⋆D(x)) is a concave function on (0, 1], and it is strictly concave unless pk = 0 for all k ≥ 3, in which case H′(1) = −E[D(D − 2)] = p1 > 0. Indeed, p1 + p2 = 1 when pk = 0 for all k ≥ 3, and since we assume that p2 < 1, we obtain that p1 > 0 in this case.

In case (ii), we thus have that φ is concave and φ′(1) = H′(1) − H(1) ≥ 0, with either the concavity or the inequality strict, and thus φ′(x) > 0 for all x ∈ (0, 1), whence φ(x) < φ(1) = 0 for x ∈ (0, 1).

In case (i), H′(1) < 0, and thus H(x) > 0 for x close to 1. Further, when p1 > 0, H′(0) = −h′(0) = −p1 < 0, and thus H(x) < 0 for x > 0 close to 0. Hence, there is at least one ξ ∈ (0, 1) with H(ξ) = 0, and since H(x)/x is strictly concave and also H(1) = 0, there is at most one such ξ, and the result follows.

We are now in a position to complete the proof of Theorem 4.4 in the following section.

4.2.4 Proof of Theorem 4.4

We start with the proof of Theorem 4.4(a). Let ξ be the zero of H given by Lemma 4.9(i), and let θ = −log ξ. Then, by Lemma 4.9, H(e^{-t}) > 0 for 0 < t < θ, and thus inf_{t≤θ} H(e^{-t}) = 0. Consequently, (4.2.27) implies

n^{-1} inf_{t≤θ} Ã(t) = inf_{t≤θ} n^{-1}Ã(t) − inf_{t≤θ} H(e^{-t}) P→ 0. (4.2.30)

Further, by Condition 1.5(b), dmax = o(n), and thus n^{-1}dmax → 0. Consequently, (4.2.28) and (4.2.30) yield

sup_{t≤θ} n^{-1}|A(t) − Ã(t)| = sup_{t≤θ} n^{-1}|S̃(t) − S(t)| P→ 0. (4.2.31)

Thus, by (4.2.27),

sup_{t≤θ} |n^{-1}A(t) − H(e^{-t})| P→ 0. (4.2.32)

Page 169: Random graphs and complex networksrhofstad/NotesRGCNII.pdf1.2 Random graphs and real-world networks9 1.3 Random graph models11 1.3.1 Erd}os-R enyi random graph 11 1.3.2 Inhomogeneous

4.2 Phase transition in the configuration model 147

This will be the workhorse of our argument. By Lemma 4.9, we know that t ↦ H(e^{-t}) is positive on (0, −log ξ) when ν > 1. Thus, the exploration in the interval (0, −log ξ) will find the giant component. We now make this intuition precise. In particular, we need to show that no large connected component is found before or after this interval (showing that the giant is unique), and we need to investigate the properties of the giant, in terms of its number of edges, vertices of degree k, etc. We now provide these details.

Let 0 < ε < θ/2. Since H(e−t) > 0 on the compact interval [ε, θ − ε], (4.2.32)implies that A(t) remains whp positive on [ε, θ− ε], and thus no new componentis started during this interval.

On the other hand, again by Lemma 4.9(i), H(e^{−(θ+ε)}) < 0, and (4.2.27) implies that n^{−1}A(θ + ε) →P H(e^{−(θ+ε)}), while Ā(θ + ε) ≥ 0. Thus, with ∆ = |H(e^{−(θ+ε)})|/2 > 0, whp

    S̄(θ + ε) − S(θ + ε) = Ā(θ + ε) − A(θ + ε) ≥ −A(θ + ε) > n∆,    (4.2.33)

while (4.2.31) yields that S̄(θ) − S(θ) < n∆ whp. Consequently, whp S̄(θ + ε) − S(θ + ε) > S̄(θ) − S(θ), so whp Step 1 is performed between the times θ and θ + ε.

Let T1 be the last time Step 1 was performed before time θ/2, and let T2 be the next time Step 1 is performed (by convention, T2 = ∞ if such a time does not exist). We have shown that, for any ε > 0, whp 0 ≤ T1 ≤ ε and θ − ε ≤ T2 ≤ θ + ε. In other words, T1 →P 0 and T2 →P θ. We conclude that we have found one component that is explored between times T1 and T2. This is our candidate for the giant component, and we continue to study its properties, i.e., its size, its number of edges and its number of vertices of degree k. These properties are stated separately in the next proposition, so that we are able to reuse them later on:

Proposition 4.10 (Connected component properties) Let T⋆_1 and T⋆_2 be two random times at which Step 1 is performed, with T⋆_1 ≤ T⋆_2, and assume that T⋆_1 →P t_1 and T⋆_2 →P t_2, where 0 ≤ t_1 ≤ t_2 ≤ θ < ∞. If C⋆ is the union of all components explored between T⋆_1 and T⋆_2, then

    v_k(C⋆)/n →P p_k (e^{−k t_1} − e^{−k t_2}),  k ≥ 0,    (4.2.34)
    |C⋆|/n →P G_D(e^{−t_1}) − G_D(e^{−t_2}),    (4.2.35)
    |E(C⋆)|/n →P (1/2) h(e^{−t_1}) − (1/2) h(e^{−t_2}).    (4.2.36)

In particular, if t_1 = t_2, then |C⋆|/n →P 0 and |E(C⋆)|/n →P 0.

Below, we apply Proposition 4.10 to T1 = oP(1) and T2 = θ + oP(1). For t_1 = 0 and t_2 = θ, we can identify the values of the above constants as e^{−k t_1} = 1, e^{−k t_2} = ξ^k, G_D(e^{−t_1}) = 1, G_D(e^{−t_2}) = 1 − ζ, h(e^{−t_1}) = 2E[D] and h(e^{−t_2}) = 2E[D]ξ² (see Exercise 4.6).
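As a quick numerical illustration of these limiting values (a sketch under an assumed degree distribution p1 = p3 = 1/2, which is not an example from the text), ξ can be computed as the smallest fixed point of the size-biased generating function s ↦ G_{D⋆−1}(s), after which ζ = 1 − G_D(ξ) and the edge constant E[D](1 − ξ²)/2 follow:

```python
# Assumed example: p_1 = p_3 = 1/2, so E[D] = 2 and nu = 3/2 > 1.
p = {1: 0.5, 3: 0.5}
ED = sum(k * pk for k, pk in p.items())            # E[D] = 2
pstar = {k: k * pk / ED for k, pk in p.items()}    # size-biased law of D*

# xi is the smallest solution of xi = G_{D*-1}(xi) = sum_k P(D*=k) xi^{k-1};
# iterating from 0 converges to the smallest fixed point.
xi = 0.0
for _ in range(200):
    xi = sum(pk * xi ** (k - 1) for k, pk in pstar.items())

zeta = 1 - sum(pk * xi ** k for k, pk in p.items())  # zeta = 1 - G_D(xi)
edge_const = ED / 2 * (1 - xi ** 2)                  # limit of |E(C_max)|/n
print(xi, zeta, edge_const)                          # 1/3, 22/27, 8/9
```

Here ξ = 1/3, ζ = 22/27 ≈ 0.815 and E[D](1 − ξ²)/2 = 8/9, matching (4.2.35)-(4.2.36) with t_1 = 0 and t_2 = θ.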


By Proposition 4.10 and Exercise 4.6, Theorem 4.4(i) follows once we prove that the connected component found between times T1 and T2 is indeed the giant component. This is proved after we complete the proof of Proposition 4.10:

Proof The set of vertices C⋆ contains all vertices awakened in the interval [T⋆_1, T⋆_2) and no others, and thus (writing V̄_k(t−) = lim_{s↑t} V̄_k(s))

    v_k(C⋆) = V̄_k(T⋆_1−) − V̄_k(T⋆_2−),  k ≥ 1.    (4.2.37)

Since T⋆_2 →P t_2 ≤ θ and H is continuous, we obtain that inf_{t≤T⋆_2} H(e^{−t}) →P inf_{t≤t_2} H(e^{−t}) = 0, where the latter equality follows since H(1) = 0. Now, (4.2.27) and (4.2.28) imply, in analogy with (4.2.30) and (4.2.31), that n^{−1} inf_{t≤T⋆_2} A(t) →P 0, and thus also

    sup_{t≤T⋆_2} n^{−1}|S̄(t) − S(t)| →P 0.    (4.2.38)

Since V̄_j(t) ≥ V_j(t) for every j and t ≥ 0,

    V̄_k(t) − V_k(t) ≤ k^{−1} Σ_{j≥1} j(V̄_j(t) − V_j(t)) = k^{−1}(S̄(t) − S(t)),  k ≥ 1.    (4.2.39)

Hence, (4.2.38) implies that, for every k ≥ 1, sup_{t≤T⋆_2} |V̄_k(t) − V_k(t)| = oP(n). Consequently, using Lemma 4.7, for j = 1, 2,

    V̄_k(T⋆_j−) = V_k(T⋆_j−) + oP(n) = n p_k e^{−k T⋆_j} + oP(n) = n p_k e^{−k t_j} + oP(n),    (4.2.40)

and (4.2.34) follows by (4.2.37). Similarly, using Σ_{k≥0}(V̄_k(t) − V_k(t)) ≤ S̄(t) − S(t),

    |C⋆| = Σ_{k≥1} (V̄_k(T⋆_1−) − V̄_k(T⋆_2−)) = Σ_{k≥1} (V_k(T⋆_1−) − V_k(T⋆_2−)) + oP(n)    (4.2.41)
         = n G_D(e^{−T⋆_1}) − n G_D(e^{−T⋆_2}) + oP(n),

and

    2|E(C⋆)| = Σ_{k≥1} k(V̄_k(T⋆_1−) − V̄_k(T⋆_2−)) = Σ_{k≥1} k(V_k(T⋆_1−) − V_k(T⋆_2−)) + oP(n)    (4.2.42)
             = n h(e^{−T⋆_1}) − n h(e^{−T⋆_2}) + oP(n),

and (4.2.35) and (4.2.36) follow from the convergence T⋆_i →P t_i and the continuity of t ↦ G_D(e^{−t}) and t ↦ h(e^{−t}).

We are now ready to complete the proof of Theorem 4.4:

Proof of Theorem 4.4. Let C′max be the component created at time T1 and explored until time T2, where we recall that T1 is the last time Step 1 was performed before time θ/2, and T2 is the next time it is performed if this occurs, with T2 = ∞ otherwise. Then, T1 →P 0 and T2 →P θ. The cluster C′max is our candidate for the giant component Cmax, and we next prove that indeed it is, whp, the largest connected component.

By Proposition 4.10, with t_1 = 0 and t_2 = θ,

    v_k(C′max)/n →P p_k(1 − e^{−kθ}) = p_k(1 − ξ^k),    (4.2.43)
    |C′max|/n →P G_D(1) − G_D(e^{−θ}) = 1 − G_D(ξ),    (4.2.44)
    |E(C′max)|/n →P (1/2)(h(1) − h(e^{−θ})) = (1/2)(h(1) − h(ξ)) = (E[D]/2)(1 − ξ²),    (4.2.45)

using Exercise 4.6. We have thus found one large component C′max with the claimed numbers of vertices and edges. It remains to show that whp there is no other large component. The basic idea is that if there existed another component containing at least ηℓn half-edges, then it would have a reasonable chance of actually being found quickly. Since we can show that the probability of finding a large component before T1 or after T2 is small, there cannot be any other large connected component. Let us now make this intuition precise.

No early large component

Here we first show that it is unlikely that a large component different from C′max is found before time T1. For this, let η > 0, and apply Proposition 4.10 with T⋆_1 = 0 and T⋆_2 = T1, where T1 was defined to be the last time Step 1 was performed before time θ/2. Then, since T1 →P 0, the total number of vertices and edges in all components found before C′max, i.e., before time T1, is oP(n). Hence, recalling that ℓn = Θ(n) by Condition 1.5(b),

    P(a component C with |E(C)| ≥ ηℓn is found before C′max) → 0.    (4.2.46)

We conclude that whp no component containing at least ηℓn edges is found before C′max is found.

No late large component

In order to study the probability of finding a component containing at least ηℓn edges after C′max is found, we start by letting T3 be the first time after time T2 at which Step 1 is performed. Since S̄(t) − S(t) increases by at most dmax = o(n) each time Step 1 is performed, we obtain from (4.2.38) that

    sup_{t≤T3} (S̄(t) − S(t)) ≤ sup_{t≤T2} (S̄(t) − S(t)) + dmax = oP(n).    (4.2.47)

Comparing this to (4.2.33), for every ε > 0 we have θ + ε > T3 whp. Since also T3 > T2 →P θ, it follows that T3 →P θ. If C′ is the component created between T2 and T3, then Proposition 4.10 applied to T2 and T3 yields |C′|/n →P 0 and |E(C′)|/n →P 0.

On the other hand, if there existed a component C ≠ C′max in CMn(d) with at least ηℓn edges that has not been found before C′max, then with probability at least η, the vertex chosen at random by Step 1 at time T2, which starts the component C′, would belong to C. When this occurs, we clearly have that C = C′.


Consequently,

    P(a component C with |E(C)| ≥ ηℓn is found after C′max) ≤ η^{−1} P(|E(C′)| ≥ ηℓn) → 0,    (4.2.48)

since |E(C′)|/n →P 0.

Completion of the proof of Theorem 4.4(i)

Combining (4.2.46) and (4.2.48), we see that whp there is no component except C′max that has at least ηℓn edges. As a result, we must have that C′max = Cmax, where Cmax is the largest component. Further, again whp, |E(C(2))| < ηℓn. Consequently, the results for Cmax follow from (4.2.43)-(4.2.45). We have further shown that |E(C(2))|/ℓn →P 0, which implies that |E(C(2))|/n →P 0 and |C(2)|/n →P 0, because ℓn = Θ(n) and |C(2)| ≤ |E(C(2))| + 1. This completes the proof of Theorem 4.4(i).

Proof of Theorem 4.4(ii)

The proof of Theorem 4.4(ii) is similar to the last step in the proof of Theorem 4.4(i). Indeed, let T1 = 0 and let T2 be the next time Step 1 is performed, or T2 = ∞ when this does not occur. Then,

    sup_{t≤T2} |Ā(t) − A(t)| = sup_{t≤T2} |S̄(t) − S(t)| ≤ 2dmax = o(n).    (4.2.49)

For every ε > 0, n^{−1}A(ε) →P H(e^{−ε}) < 0 by (4.2.27) and Lemma 4.9(ii), while Ā(ε) ≥ 0, and it follows from (4.2.49) that whp T2 < ε. Hence, T2 →P 0. We apply Proposition 4.10 (which holds in this case too, with θ = 0) and find that, if C is the first component found, then |E(C)|/n →P 0.

Let η > 0. If |E(Cmax)| ≥ ηℓn, then the probability that the first half-edge chosen by Step 1 belongs to Cmax, and thus that C = Cmax, is 2|E(Cmax)|/(2ℓn) ≥ η, and hence

    P(|E(Cmax)| ≥ ηℓn) ≤ η^{−1} P(|E(C)| ≥ ηℓn) → 0.    (4.2.50)

The result follows since ℓn = Θ(n) by Condition 1.5(b) and |Cmax| ≤ |E(Cmax)| + 1. This completes the proof of Theorem 4.4(ii), and thus that of Theorem 4.4.

4.2.5 The giant component of related random graphs

In this section, we extend the results of Theorem 4.4 to some related models, such as uniform simple random graphs with a given degree sequence, as well as generalized random graphs.

Recall that UGn(d) denotes a uniform simple random graph with degrees d (see [Volume 1, Section 7.5]). The results in Theorem 4.4 also hold for UGn(d) when we assume that Conditions 1.5(a)-(c) hold:


Theorem 4.11 (Phase transition in UGn(d)) Let d satisfy Conditions 1.5(a)-(c). Then, the results in Theorem 4.4 also hold for a uniform simple graph with degree sequence d.

Proof By [Volume 1, Corollary 7.17], and since d = (d_i)_{i∈[n]} satisfies Conditions 1.5(a)-(c), any event En that occurs whp for CMn(d) also occurs whp for UGn(d). By Theorem 4.4, the event En that ||Cmax|/n − ζ| ≤ ε occurs whp for CMn(d), so it also holds whp for UGn(d). The proof for the other limits is identical.

Note that it is not obvious how to extend Theorem 4.11 to the case where ν = ∞, which we discuss now:

Theorem 4.12 (Phase transition in UGn(d) for ν = ∞) Let d satisfy Conditions 1.5(a)-(b), and assume that there exists p > 1 such that

    E[Dn^p] → E[D^p].    (4.2.51)

Then, the results in Theorem 4.4 also hold for a uniform simple graph with degree sequence d.

Proof We do not present the entire proof, but rather sketch a route towards it, following Bollobás and Riordan (2015), who show that for every ε > 0, there exists δ = δ(ε) > 0 such that

    P(||Cmax| − ζn| > εn) ≤ e^{−δn},    (4.2.52)

and

    P(|v_k(Cmax) − p_k(1 − ξ^k)n| > εn) ≤ e^{−δn}.    (4.2.53)

This exponential concentration is quite convenient, as it allows us to extend the result to the setting of uniform random graphs by conditioning CMn(d) on being simple. Indeed, since Conditions 1.5(a)-(b) imply that

    P(CMn(d) simple) = e^{−o(n)},    (4.2.54)

it follows that the results also hold for the uniform simple random graph UGn(d) when Conditions 1.5(a)-(b) hold. See Exercise 4.9 below.

We next prove Theorem 3.16 for rank-1 inhomogeneous random graphs, as already stated in Theorem 3.17 and restated here for convenience:

Theorem 4.13 (Phase transition in GRGn(w)) Let w satisfy Conditions 1.1(a)-(c). Then, the results in Theorem 4.4 also hold for GRGn(w), CLn(w) and NRn(w).

Proof Let d_i be the degree of vertex i in GRGn(w), defined in [Volume 1, (1.3.18)], where we use a lower-case letter to avoid confusion with Dn, which is the degree of a uniform vertex in [n]. By [Volume 1, Theorem 7.18], the law of GRGn(w) conditioned on the degrees d and the law of CMn(d) conditioned on being simple agree. The fact that Conditions 1.1(a)-(c) imply that Conditions 1.5(a)-(c) hold in probability for these degrees is stated in Theorem 1.7. Then, by [Volume 1, Theorem 7.18] and Theorem 4.4, the results in Theorem 4.4 also hold for GRGn(w). By [Volume 1, Theorem 6.20], the same result applies to CLn(w), and by [Volume 1, Exercise 6.39], also to NRn(w).

Unfortunately, when ν = ∞, we can no longer rely on the fact that, by [Volume 1, Theorem 7.18], the law of GRGn(w) conditioned on the degrees d and the law of CMn(d) conditioned on being simple agree. Indeed, when ν = ∞, the probability that CMn(d) is simple vanishes. Therefore, we instead rely on a truncation argument to extend Theorem 4.13 to the case where ν = ∞. It is here that the monotonicity of GRGn(w) in terms of its edge probabilities can be used rather conveniently:

Theorem 4.14 (Phase transition in GRGn(w) for ν = ∞) Let w satisfy Conditions 1.1(a)-(b). Then, the results in Theorem 4.4 also hold for GRGn(w), CLn(w) and NRn(w).

Proof We only prove that |Cmax|/n →P ζ; the other statements in Theorem 4.4 can be proved in a similar fashion (see Exercise 4.7 below). We prove Theorem 4.14 only for NRn(w), the proofs for GRGn(w) and CLn(w) being similar. The required upper bound |Cmax|/n ≤ ζ + oP(1) follows from the local convergence in probability in Theorem 3.11 and (2.5.6).

For the lower bound, we bound NRn(w) from below by a random graph with edge probabilities

    p^{(K)}_{ij} = 1 − e^{−(w_i∧K)(w_j∧K)/ℓn}.    (4.2.55)

Therefore, also |Cmax| ≥ |C^{(K)}_max|, where C^{(K)}_max is the largest connected component in the inhomogeneous random graph with edge probabilities (p^{(K)}_{ij})_{i,j∈[n]}. Let

    w^{(K)}_i = (w_i ∧ K) (1/ℓn) Σ_{j∈[n]} (w_j ∧ K),    (4.2.56)

so that the edge probabilities in (4.2.55) correspond to the Norros-Reittu model with weights (w^{(K)}_i)_{i∈[n]}. It is not hard to see that, when Condition 1.1(a) holds for (w_i)_{i∈[n]}, then Conditions 1.1(a)-(c) hold for (w^{(K)}_i)_{i∈[n]}, where the limiting random variable equals W ∧ K. Therefore, Theorem 4.13 applies to (w^{(K)}_i)_{i∈[n]}, and we deduce that |C^{(K)}_max|/n →P ζ^{(K)}, the survival probability of the two-stage mixed-Poisson branching process with mixing variable W ∧ K. Since ζ^{(K)} → ζ when K → ∞, we conclude that |Cmax|/n →P ζ.

4.3 Connectivity of CMn(d)

Assume that P(D = 2) < 1. By Theorem 4.4, we see that |Cmax|/n →P 1 when P(D ≥ 2) = 1, as in this case the survival probability equals 1. In this section, we investigate conditions under which CMn(d) is whp connected, i.e., Cmax = [n] and |Cmax| = n.

Our main result shows that CMn(d) is whp connected when dmin = min_{i∈[n]} d_i ≥ 3. Interestingly enough, we need very few conditions for this result, and do not even need Condition 1.5(c):

Theorem 4.15 (Connectivity of CMn(d)) Assume that Conditions 1.5(a)-(b) hold, and that d_i ≥ 3 for every i ∈ [n]. Then CMn(d) is connected whp. More precisely,

    P(CMn(d) disconnected) = o(1).    (4.3.1)

When Condition 1.5(a) holds with p1 = p2 = 0, then ν ≥ 2 > 1 is immediate, so we are always in the supercritical regime. Also, ζ = 1 when p1 = p2 = 0, since the unimodular Galton-Watson tree then survives with probability 1. Therefore, Theorem 4.4 implies that the largest connected component has size n(1 + oP(1)) when Conditions 1.5(a)-(b) hold. Theorem 4.15 extends this to the statement that CMn(d) is with high probability connected. Note that we do not assume that Condition 1.5(c) holds here.

We note that Theorem 4.15 reveals an important difference between the generalized random graph and the configuration model, also from a practical point of view. Indeed, for the generalized random graph to be whp connected, the degrees must tend to infinity. This has already been observed for ERn(p) in [Volume 1, Theorem 5.8]. For the configuration model, it is possible that the graph is connected while the average degree is bounded. Many real-world networks are connected, which often makes the configuration model more suitable than inhomogeneous random graphs.

Proof The proof is based on a simple counting argument. Recall that a configuration denotes a pairing of all the half-edges, and note that each configuration has probability 1/(ℓn − 1)!!. On the event that CMn(d) is disconnected, there exists a set of vertices I ⊂ [n] with |I| ≤ ⌊n/2⌋ such that all half-edges incident to vertices in I are paired only to other half-edges incident to vertices in I. For I ⊆ [n], we let

    ℓn(I) = Σ_{i∈I} d_i.    (4.3.2)

Since d_i ≥ 3, we can use Theorem 4.4 to conclude that most edges are in Cmax and that I ≠ Cmax. Therefore, ℓn(I) = o(ℓn) = o(n), and we may, without loss of generality, assume that ℓn(I) ≤ ℓn/4. We denote by E the event that there exists a connected component I with |I| ≤ ⌊n/2⌋ vertices whose total degree satisfies ℓn(I) ≤ ℓn/4.

Clearly, in order for the half-edges incident to vertices in I to be paired only to other half-edges incident to vertices in I, ℓn(I) needs to be even. The number of configurations for which this happens is bounded above by

    (ℓn(I) − 1)!! (ℓn(I^c) − 1)!!.    (4.3.3)


As a result,

    P(CMn(d) disconnected; E) ≤ Σ_{I⊂[n]} (ℓn(I) − 1)!!(ℓn(I^c) − 1)!! / (ℓn − 1)!!    (4.3.4)
                              = Σ_{I⊂[n]} ∏_{j=1}^{ℓn(I)/2} (ℓn(I) − 2j + 1)/(ℓn − 2j + 1),

where the sum over I ⊂ [n] is restricted to I for which |I| ≤ ⌊n/2⌋ and ℓn(I) ≤ ℓn/4. In Exercise 4.12, you will use (4.3.4) to give a bound on the probability of the existence of an isolated vertex (i.e., a vertex with only self-loops).

Define

    f(x) = ∏_{j=1}^{x} (2x − 2j + 1)/(ℓn − 2j + 1).    (4.3.5)

We can rewrite

    f(x) = ∏_{j=1}^{x}(2x − 2j + 1) / ∏_{j=1}^{x}(ℓn − 2j + 1) = ∏_{i=0}^{x−1}(2i + 1) / ∏_{k=0}^{x−1}(ℓn − 2k − 1) = ∏_{i=0}^{x−1} (2i + 1)/(ℓn − 2i − 1),    (4.3.6)

where we substituted i = x − j in the numerator and k = j − 1 in the denominator. Thus, for x ≤ ℓn/4, x ↦ f(x) is decreasing, since

    f(x + 1)/f(x) = (2x + 1)/(ℓn − 2x − 1) ≤ 1.    (4.3.7)

Now, for every I, since d_i ≥ 3 for every i ∈ [n] and since ℓn(I) ≤ ℓn/4 is even,

    ℓn(I) ≥ 2⌈3|I|/2⌉,    (4.3.8)

and the right-hand side only depends on the number of vertices in I. Since there are precisely \binom{n}{m} ways of choosing m vertices out of [n], and since f is decreasing, we conclude from (4.3.4) that

    P(CMn(d) disconnected; E) ≤ Σ_{I⊂[n]} f(⌈3|I|/2⌉) = Σ_{m=1}^{⌊n/2⌋} \binom{n}{m} f(⌈3m/2⌉),    (4.3.9)

with m = |I|.

Note that, for m odd, ⌈3m/2⌉ = (3m + 1)/2 and ⌈3(m + 1)/2⌉ = (3m + 1)/2 + 1, so that

    f(⌈3(m + 1)/2⌉)/f(⌈3m/2⌉) = f((3m + 1)/2 + 1)/f((3m + 1)/2) = (3m + 2)/(ℓn − 3m − 2),    (4.3.10)

while, for m even, ⌈3m/2⌉ = 3m/2 and ⌈3(m + 1)/2⌉ = 3m/2 + 2, so that

    f(⌈3(m + 1)/2⌉)/f(⌈3m/2⌉) = f(3m/2 + 2)/f(3m/2) = (3m + 1)/(ℓn − 3m − 1) · (3m + 3)/(ℓn − 3m − 3).    (4.3.11)

Define

    h_n(m) = \binom{n}{m} f(⌈3m/2⌉),    (4.3.12)

so that

    P(CMn(d) disconnected; E) ≤ Σ_{m=1}^{⌊n/2⌋} h_n(m).    (4.3.13)

Then,

    h_n(m + 1)/h_n(m) = (n − m)/(m + 1) · f(⌈3(m + 1)/2⌉)/f(⌈3m/2⌉),    (4.3.14)

so that, for m odd, by (4.3.10) and using ℓn ≥ 3n,

    h_n(m + 1)/h_n(m) = (n − m)/(m + 1) · (3m + 2)/(ℓn − 3m − 2) ≤ 3(n − m)/(ℓn − 3m − 2) ≤ (n − m)/(n − m − 1),    (4.3.15)

while, for m even, by (4.3.11) and using ℓn ≥ 3n,

    h_n(m + 1)/h_n(m) = 3(n − m)/(ℓn − 3m − 3) · (3m + 1)/(ℓn − 3m − 1) ≤ (n − m)/(n − m − 1) · (m + 2)/(n − m − 2).    (4.3.16)

Thus, we obtain that, for m ≤ n/2, there exists a c > 0 such that

    h_n(m + 1)/h_n(m) ≤ 1 + c/n.    (4.3.17)

We conclude that, for 3 ≤ m ≤ ⌊n/2⌋,

    h_n(m) = h_n(3) ∏_{j=3}^{m−1} h_n(j + 1)/h_n(j) ≤ h_n(3) ∏_{j=3}^{⌊n/2⌋} (1 + c/n)    (4.3.18)
           ≤ h_n(3)(1 + c/n)^{⌊n/2⌋} ≤ h_n(3) e^{c/2},

so that

    P(CMn(d) disconnected; E) ≤ Σ_{m=1}^{⌊n/2⌋} h_n(m) ≤ h_n(1) + h_n(2) + Σ_{m=3}^{⌊n/2⌋} h_n(m)    (4.3.19)
                              ≤ h_n(1) + h_n(2) + n h_n(3) e^{c/2}/2.

By Exercises 4.12 and 4.14, h_n(1), h_n(2) = O(1/n), so we are left to compute h_n(3). For this, we note that ⌈3m/2⌉ = 5 when m = 3, so that

    h_n(3) = \binom{n}{3} f(5) = 9!! n(n − 1)(n − 2) / (6(ℓn − 1)(ℓn − 3)(ℓn − 5)(ℓn − 7)(ℓn − 9)) = O(1/n²),    (4.3.20)

so that n h_n(3) = O(1/n). We conclude that

    P(CMn(d) disconnected; E) = O(1/n).    (4.3.21)

This is even stronger than required, and one would be tempted to believe that also P(CMn(d) disconnected) = O(1/n). However, this proof does not show this, since we started from the assumption that the non-giant (collection of) component(s) I satisfies ℓn(I) ≤ ℓn/4.
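The quantities in the proof are easy to evaluate numerically. The sketch below (with the assumed choice of a 3-regular degree sequence and n = 1000, so that ℓn = 3n) computes the bound Σ_m h_n(m) from (4.3.13) in logarithmic form and confirms that it is of order 1/n:

```python
import math

n = 1000       # assumed size (illustration only)
ln = 3 * n     # 3-regular degrees, so ell_n = 3n

def log_f(x):
    # log f(x), with f as in (4.3.5)
    return sum(math.log(2 * x - 2 * j + 1) - math.log(ln - 2 * j + 1)
               for j in range(1, x + 1))

def log_binom(a, b):
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

# h_n(m) = binom(n, m) f(ceil(3m/2)), cf. (4.3.12)
total = sum(math.exp(log_binom(n, m) + log_f(math.ceil(3 * m / 2)))
            for m in range(1, n // 2 + 1))
print(total)   # bound on P(CM_n(d) disconnected; E), of order 1/n
```

In this instance the sum is dominated by h_n(1) and h_n(2), in line with the proof.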


The above proof is remarkably simple and requires very little of the precise degree distribution beyond dmin ≥ 3. In the sequel, we investigate what happens when this condition fails.
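A quick Monte Carlo check of Theorem 4.15 (an illustrative sketch with assumed parameters, not part of the proof): even at moderate n, the 3-regular configuration model is rarely disconnected:

```python
import random

def cm_is_connected(degrees, rng):
    """Sample CM_n(d) by uniform pairing and test connectivity by search."""
    n = len(degrees)
    half = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(half)
    adj = [[] for _ in range(n)]
    for i in range(0, len(half), 2):
        u, v = half[i], half[i + 1]
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    seen[0], stack, cnt = True, [0], 1
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if not seen[w]:
                seen[w] = True
                cnt += 1
                stack.append(w)
    return cnt == n

rng = random.Random(0)
n, trials = 200, 100          # assumed parameters; ell_n = 3n is even
degrees = [3] * n
frac = sum(cm_is_connected(degrees, rng) for _ in range(trials)) / trials
print(frac)                   # close to 1: P(disconnected) = O(1/n)
```

With these parameters, disconnection would require a small set whose half-edges pair internally, which (4.3.20)-type counting shows is already very unlikely.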

We continue by showing that CMn(d) is whp disconnected when n1 ≫ n^{1/2}:

Proposition 4.16 (Disconnectivity of CMn(d) when n1 ≫ n^{1/2}) Let Conditions 1.5(a)-(b) hold, and assume that n1 ≫ n^{1/2}. Then,

    lim_{n→∞} P(CMn(d) connected) = 0.    (4.3.22)

Proof We note that CMn(d) is disconnected when there are two vertices of degree 1 whose half-edges are paired to each other; when this happens, we say that a 2-pair is created. Pair the half-edges incident to the degree-1 vertices one by one. When no 2-pair has been created so far, at the i-th such pairing there are ℓn − n1 − i + 1 unpaired half-edges incident to higher-degree vertices, out of a total of ℓn − 2i + 1 unpaired half-edges available, so that

    P(CMn(d) contains no 2-pair) = ∏_{i=1}^{n1} (ℓn − n1 − i + 1)/(ℓn − 2i + 1)    (4.3.23)
                                 = ∏_{i=1}^{n1} (1 − (n1 − i)/(ℓn − 2i + 1)).

Since, for each i ≥ 1,

    1 − (n1 − i)/(ℓn − 2i + 1) ≤ 1 − (n1 − i)/ℓn ≤ e^{−(n1−i)/ℓn},    (4.3.24)

we arrive at

    P(CMn(d) contains no 2-pair) ≤ ∏_{i=1}^{n1} e^{−(n1−i)/ℓn} = e^{−n1(n1−1)/(2ℓn)} = o(1),    (4.3.25)

since ℓn = Θ(n) and n1 ≫ n^{1/2}.
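The product in (4.3.23) and the bound in (4.3.25) can be evaluated directly. A sketch with the assumed values ℓn = 10000 and n1 = 400 (so that n1 is of much larger order than √ℓn):

```python
import math

ln, n1 = 10_000, 400   # assumed values, with n1 >> sqrt(ln)

# P(no 2-pair) = prod_{i=1}^{n1} (ln - n1 - i + 1)/(ln - 2i + 1), cf. (4.3.23)
prob = 1.0
for i in range(1, n1 + 1):
    prob *= (ln - n1 - i + 1) / (ln - 2 * i + 1)

bound = math.exp(-n1 * (n1 - 1) / (2 * ln))   # right-hand side of (4.3.25)
print(prob, bound)   # prob <= bound, and both are already tiny
```

Here the exponential bound is about exp(−7.98) ≈ 3.4·10⁻⁴, so a 2-pair (and hence disconnection) occurs with probability close to 1.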

Proposition 4.17 (Disconnectivity of CMn(d) when p2 > 0) Let Conditions 1.5(a)-(b) hold, and assume that p2 > 0. Then,

    limsup_{n→∞} P(CMn(d) connected) < 1.    (4.3.26)

Proof We perform a second-moment method on the number P(2) of connected components consisting of two vertices of degree 2. The expected number of such components equals

    E[P(2)] = n2(n2 − 1)/((ℓn − 1)(ℓn − 3)),    (4.3.27)

since there are n2(n2 − 1)/2 pairs of vertices of degree 2, and the probability that a fixed pair forms a connected component equals 2/((ℓn − 1)(ℓn − 3)). By Conditions 1.5(a)-(b), which imply that n2/n → p2 and ℓn/n → E[D],

    E[P(2)] → p2²/E[D]² = λ2²/4,  where λ2 = 2p2/E[D].    (4.3.28)

By assumption, p2 > 0, so that also λ2 > 0. We can use Theorem 2.6 to show that P(2) →d Poi(λ2²/4), so that

    P(CMn(d) disconnected) ≥ P(P(2) > 0) → 1 − e^{−λ2²/4} > 0,    (4.3.29)

as required. (The notation λ2 = 2p2/E[D] matches Theorem 4.18 below, where λ2²/4 arises as the k = 2 case of the parameters λ2^k/(2k).) The proof that Theorem 2.6 can be applied is left as Exercise 4.11 below.
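The convergence in (4.3.28) can be checked by direct evaluation of (4.3.27). A sketch with the assumed degree distribution p2 = p3 = 1/2, so that E[D] = 5/2 and λ2 = 2p2/E[D] = 2/5:

```python
import math

p2, ED = 0.5, 2.5              # assumed: degrees 2 and 3, each w.p. 1/2
lam2 = 2 * p2 / ED             # lambda_2 = 0.4
limit = lam2 ** 2 / 4          # = p2^2 / E[D]^2 = 0.04

vals = []
for n in (100, 1000, 10_000):
    n2, n3 = n // 2, n - n // 2
    ln = 2 * n2 + 3 * n3       # ell_n = sum of the degrees
    vals.append(n2 * (n2 - 1) / ((ln - 1) * (ln - 3)))  # E[P(2)], cf. (4.3.27)

print(vals)                    # approaches lambda_2^2 / 4 = 0.04
print(1 - math.exp(-limit))    # asymptotic lower bound on P(disconnected)
```

Even at n = 100, the exact expectation is within a percent of the limit 0.04, giving a disconnection probability of at least roughly 1 − e^{−0.04} ≈ 0.039 in the limit.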

We close this section with a detailed result on the size of the giant component when dmin ≥ 2:

Theorem 4.18 (Connectivity of CMn(d) when p1 = 0) Let Conditions 1.5(a)-(b) hold, and assume that p2 ∈ (0, 1) and that d_i ≥ 2 for every i ∈ [n]. Then,

    n − |Cmax| →d Σ_{k≥2} k X_k,    (4.3.30)

where (X_k)_{k≥2} are independent Poisson random variables with parameters λ2^k/(2k), where λ2 = 2p2/E[D]. Consequently,

    P(CMn(d) connected) → e^{−Σ_{k≥2} λ2^k/(2k)} ∈ (0, 1).    (4.3.31)

Rather than giving the complete proof of Theorem 4.18, we sketch it:

Sketch of proof of Theorem 4.18. Let P(k) denote the number of k-cycles consisting of degree-2 vertices, for k ≥ 2. Obviously, every vertex in such a cycle is not part of the giant component, so that

    n − |Cmax| ≥ Σ_{k≥2} k P(k).    (4.3.32)

A multivariate moment method allows one to prove that (P(k))_{k≥2} →d (X_k)_{k≥2}, where (X_k)_{k≥2} are independent Poisson random variables with parameters λ2^k/(2k) = lim_n E[P(k)]. See also Exercise 4.15, where you are asked to prove this.

In order to complete the argument, two approaches are possible (and have been used in the literature). First, Federico and van der Hofstad (2017) use counting arguments to show that, as soon as a connected component contains at least one vertex v of degree d_v ≥ 3, it is whp part of the giant component Cmax. This proves that (4.3.32) is whp an equality. See also Exercise 4.16.

Alternatively, and more in the style of Luczak (1992), one can pair up all the half-edges incident to vertices of degree 2, and realize that the graph obtained after pairing off all the degree-2 vertices is again a configuration model, with a changed degree distribution. The cycles consisting only of degree-2 vertices are removed, so that we only need to consider the contribution of pairing strings of degree-2 vertices to vertices of degree at least 3. If the two ends of such a string are connected to two distinct vertices of degrees ds and dt, both at least 3, then we can imagine the string to correspond to a single vertex of degree ds + dt − 2 ≥ 4, which is sufficiently large. Unfortunately, it is also possible that both ends of the string are connected to the same vertex u of degree du ≥ 3. When du ≥ 5, there will still be at least 3 remaining half-edges of u, which is fine. Thus, we only need to worry about the case where we create a cycle of degree-2 vertices containing one vertex u of degree du = 3 or du = 4, which then corresponds to a vertex of degree 1 or 2, respectively. In Exercise 4.17, you are asked to prove that there is a bounded number of such cycles. Thus, it suffices to extend the proof of Theorem 4.15 to the setting where there is a bounded number of vertices of degrees 1 and 2. The degree-2 vertices can be dealt with in the same way as above, while pairing the degree-1 vertices again leads to vertices of remaining degree at least 3 − 1 = 2; these are fine when the remaining degree is at least 3, and can otherwise be dealt with as the other degree-2 vertices. We refrain from giving more details.

4.4 Related results for the configuration model

In this section, we discuss related results on connected components for the configuration model. We start by discussing the subcritical behavior of the configuration model.

The largest subcritical cluster

When ν < 1, so that in particular E[D²] < ∞, the largest connected component of CMn(d) is closely related to the largest degree:

Theorem 4.19 (Subcritical phase for CMn(d)) Let d satisfy Conditions 1.5(a)-(c) with ν = E[D(D − 1)]/E[D] < 1. Suppose further that there exist τ > 3 and c2 > 0 such that

    [1 − Fn](x) ≤ c2 x^{−(τ−1)}.    (4.4.1)

Then, for CMn(d) with dmax = max_{j∈[n]} d_j,

    |Cmax| = dmax/(1 − ν) + oP(n^{1/(τ−1)}).    (4.4.2)

Theorem 4.19 is closely related to Theorem 3.19. In fact, we can use Theorem 4.19 to prove Theorem 3.19; see Exercise 4.18. Note that the result in (4.4.2) is most interesting when dmax = Θ(n^{1/(τ−1)}), as it would be in the case where the degrees obey a power law with exponent τ (for example, when the degrees are i.i.d.). When dmax = o(n^{1/(τ−1)}), instead, Theorem 4.19 only implies that |Cmax| = oP(n^{1/(τ−1)}), a less precise result. Bear in mind that Theorem 4.19 thus only gives sharp asymptotics of |Cmax| when dmax = Θ(n^{1/(τ−1)}) (see Exercise 4.19).

The intuition behind Theorem 4.19 is that, from the vertex of maximal degree, there are dmax half-edges from which further vertices can be reached. Since the random graph is subcritical, one can use Theorem 4.1 to prove that the tree rooted at any half-edge incident to the vertex of maximal degree converges in distribution to a subcritical branching process. Further, the trees rooted at different half-edges are close to being independent. Thus, by the law of large numbers, one can expect the total number of vertices in these dmax trees to be close to dmax times the expected size of a single tree, which is 1/(1 − ν). This explains the result in Theorem 4.19. Part of this intuition is made precise in Exercises 4.21 and 4.22. Exercises 4.23 and 4.24 investigate conditions under which |Cmax| = dmax/(1 − ν)(1 + oP(1)) might or might not hold.
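The expected tree size 1/(1 − ν) is the classical formula for the expected total progeny of a subcritical branching process, and is easy to verify by simulation. A sketch with assumed Poisson offspring of mean ν = 1/2, so that the expected total progeny equals 1/(1 − ν) = 2:

```python
import math
import random

def total_progeny(rng, mean=0.5, cap=100_000):
    """Total progeny (including the root) of a branching process with
    Poisson(mean) offspring; a.s. finite since mean < 1 (cap is a safeguard)."""
    alive, total = 1, 1
    while alive > 0 and total < cap:
        # Sample one Poisson(mean) offspring count by inversion.
        u, k, p = rng.random(), 0, math.exp(-mean)
        s = p
        while u > s:
            k += 1
            p *= mean / k
            s += p
        alive += k - 1
        total += k
    return total

rng = random.Random(7)
runs = 20_000
est = sum(total_progeny(rng) for _ in range(runs)) / runs
print(est)   # close to 1/(1 - 1/2) = 2
```

The empirical mean over many trees matches 1/(1 − ν) up to Monte Carlo error, as in the heuristic above.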

The near-critical supercritical behavior in the configuration model

Janson and Luczak (2009) also prove partial results on the near-critical behavior of CMn(d), and these are further extended in van der Hofstad et al. (2016). We distinguish between the case where the degrees have a finite third moment and the case where the degrees obey a power law with power-law exponent τ ∈ (3, 4):

Theorem 4.20 (Near-critical behavior of CMn(d) with finite third moment) Let d satisfy Conditions 1.5(a)-(c) with ν = E[D(D − 1)]/E[D] = 1. Assume further that αn = νn − 1 = E[Dn(Dn − 2)]/E[Dn] > 0 is such that αn ≫ n^{−1/3}, and that

    E[Dn^{3+ε}] = O(1)    (4.4.3)

for some ε > 0. Let β = E[D(D − 1)(D − 2)]/E[D] > 0. Then, CMn(d) satisfies

    |Cmax| = (2E[D]/β) nαn + oP(nαn),
    |v_k(Cmax)| = (2kp_k/β) nαn + oP(nαn),  for every k ≥ 0,
    |E(Cmax)| = (2E[D]/β) nαn + oP(nαn),

while |C(2)| = oP(nαn) and |E(C(2))| = oP(nαn).

We next investigate the setting where the degrees do not have a finite third moment, which turns out to be quite different:

Theorem 4.21 (Near-critical behavior of CMn(d) with infinite third moment) Let d satisfy Conditions 1.5(a)-(c) with ν = E[D(D − 1)]/E[D] = 1. Let ζ⋆_n denote the survival probability of a branching process with offspring distribution D⋆_n − 1. Assume further that αn = νn − 1 = E[Dn(Dn − 2)]/E[Dn] > 0 is such that αn ≫ n^{−1/3}(E[Dn³])^{2/3}. Then, CMn(d) satisfies

    |Cmax| = E[D] ζ⋆_n n (1 + oP(1)),
    |v_k(Cmax)| = k p_k ζ⋆_n n (1 + oP(1)),  for every k ≥ 0,
    |E(Cmax)| = E[D] ζ⋆_n n (1 + oP(1)),

while |C(2)| = oP(n ζ⋆_n) and |E(C(2))| = oP(n ζ⋆_n).

The asymptotics of |Cmax| in Theorem 4.20 can be understood from the fact that, for a branching process with offspring distribution X having mean E[X] = 1 + ε, where ε is small, the survival probability ζ satisfies ζ = (2ε/Var(X))(1 + o(1)) (see Exercise 4.25). Therefore, the survival probability ζ⋆ of the branching process with offspring distribution D⋆ − 1 is close to 2ε/β, where we note that β = Var(D⋆ − 1) = Var(D⋆). Since the limit ζ of |Cmax|/n satisfies

    ζ = Σ_{k≥1} p_k (1 − (1 − ζ⋆)^k),    (4.4.4)

we further obtain that

    ζ = ζ⋆ E[D] (1 + o(1)).    (4.4.5)

The results on |v_k(Cmax)| and |E(Cmax)| can be understood in a similar way. Exercises 4.25-4.26 investigate the asymptotics of survival probabilities under various tail and moment assumptions. The results in Theorem 4.21 are far more general, and Theorem 4.21 implies Theorem 4.20 (see Exercise 4.27).

In the case where the degrees obey a power law, for example when 1 − Fn(x) = Θ(x^{−(τ−1)}) for all x ≤ n^{1/(τ−1)}, it can be seen that ζ⋆_n = Θ(αn^{1/(τ−3)}) (recall also Exercise 4.26), and the restriction on αn becomes αn ≫ n^{−(τ−3)/(τ−1)} (see Exercise 4.28).

The critical behavior in the configuration model

In this section, we study the critical behavior of the configuration model. Thecritical phase can be characterized by the largest connected components beingrandom, and often scaling limits are taken to identify their limiting distribution.We focus on the critical case of CMn(d) for i.i.d. degrees:

Theorem 4.22 (Weak convergence of the ordered critical clusters: finite thirdmoments) Let d = (di)i∈[n] be a sequence of i.i.d. random variables having thesame distribution as D satisfying ν = E[D(D−1)]/E[D] = 1. Let (|C(i)|)i≥1 denotethe clusters of CMn(d), ordered in size. Let E[D3] <∞. Then, as n→∞,(

n−2/3|C(i)|)i≥1

d−→ (γi)i≥1, (4.4.6)

in the product topology, for some non-degenerate limit (γi)i≥1.

Theorem 4.22 is reminiscent of [Volume 1, Theorem 5.7] for the Erdős-Rényi random graph. In fact, it even turns out that the scaling limits (γi)_{i≥1} are closely related to the scaling limits of critical connected components in the Erdős-Rényi random graph. Thus, one can say that Theorem 4.22 describes the setting of weak inhomogeneity. We next study the case of strong inhomogeneity, which corresponds to degrees having infinite third moment:

Theorem 4.23 (Weak convergence of the ordered critical clusters) Let d = (di)_{i∈[n]} be a sequence of i.i.d. random variables having the same distribution as D satisfying ν = E[D(D − 1)]/E[D] = 1. Let (|C_{(i)}|)_{i≥1} denote the clusters of CMn(d), ordered in size. Let the distribution function F of D satisfy that there exist τ ∈ (3, 4) and 0 < cF < ∞ such that

lim_{x→∞} x^{τ−1}[1 − F(x)] = cF. (4.4.7)

Then, as n → ∞,

(n^{−(τ−2)/(τ−1)}|C_{(i)}|)_{i≥1} d−→ (γi)_{i≥1}, (4.4.8)

in the product topology, for some non-degenerate limit (γi)_{i≥1}.

As mentioned before, Theorem 4.22 describes the setting where the effect of large degrees is negligible. When E[D^3] = ∞, on the other hand, the critical scaling changes rather dramatically, and the largest critical cluster has size n^ρ, where ρ = (τ − 2)/(τ − 1) ∈ (1/2, 2/3). Thus, in the presence of high-degree vertices, critical connected components become smaller. This is similar to the fact that the near-critical branching process survival probability becomes smaller for heavy-tailed offspring distributions compared to light-tailed offspring distributions (compare Exercises 4.25 and 4.26).

There are many results extending Theorems 4.22–4.23 to fixed degrees, under assumptions similar to, but stronger than, those in Condition 1.5(a)-(c). We refer to Section 4.5 for an extensive overview of the literature.

4.5 Notes and discussion

Notes on Section 4.1

Theorem 4.1 is a classical result for the configuration model, and has appeared in various guises throughout the literature. For example, Dembo and Montanari (2010b) crucially rely on it in order to identify the limiting pressure for the Ising model on the configuration model.

Notes on Section 4.2

This section is adapted from Janson and Luczak (2009), which, in turn, generalizes the results by Molloy and Reed (1995, 1998). The results by Molloy and Reed (1995, 1998) are not phrased in terms of branching processes, which makes them a bit more difficult to grasp. We have chosen to reformulate the results using branching process terminology. We also refer to Bollobas and Riordan (2015), who give an alternative proof using branching process approximations on the exploration of the giant component. They also provide the extension of Theorem 4.11 that we explain there, by showing that the probability of a deviation of order εn of vk(Cmax) is exponentially small for the configuration model. Since the probability of simplicity in CMn(d) for power-law degrees with infinite variance is not exponentially small, this implies the stated extension in Theorem 4.12.


Notes on Section 4.3

These results are folklore. A version of Theorem 4.15 can be found in (Chatterjee and Durrett, 2009, Lemma 1.2). We could not find the precise version stated in Theorem 4.15. Theorem 4.18 is proved by Federico and van der Hofstad (2017). This paper also allows for a number n1 of vertices of degree 1 satisfying n1 = ρ√n. Earlier versions include the results by Luczak (1992) for dmin ≥ 2 and Wormald (1981), who proved r-connectivity when dmin = r.

Notes on Section 4.4

Theorem 4.19 is proved by Janson (2008). (Janson, 2008, Theorem 1.1) shows that |Cmax| ≤ An^{1/(τ−1)} when the power-law upper bound in (4.4.1) holds, while (Janson, 2008, Theorem 1.1) gives the asymptotic statement. Further, Janson (2008) remarks that the jth largest cluster has size d(j)/(1 − ν) + oP(n^{1/(τ−1)}).

Theorem 4.20 was first proved under a finite (4 + ε)-moment condition in Janson and Luczak (2009). It is improved to the case where E[D_n^3] → E[D^3] by van der Hofstad et al. (2016). Interestingly, the behavior is different when Conditions 1.5(a)-(b) hold, but E[D_n^3] → ∞ sufficiently fast. Also this case is studied by van der Hofstad et al. (2016).

Theorems 4.22–4.23 are proved by Joseph (2014), who focuses on configuration models with i.i.d. degrees. For related (weaker but more robust) results in the case of fixed degrees satisfying an assumption as in Condition 1.5, see Hatami and Molloy (2010). There is a lot of work on scaling limits for critical configuration models (or the related problem of critical percolation on configuration models, which can be related to critical configuration models, see Janson (2009a)), and we now give some links to the literature. Nachmias and Peres (2010) study critical percolation on random regular graphs. Riordan (2012) studies the critical behavior of configuration models with bounded degrees, while Dhara et al. (2017) extend these results to the (necessary) finite third-moment assumption. Dhara et al. (2016) show that a different scaling arises when the degrees have power-law tails with infinite third moment. Interestingly, these results are quite different from those in the setting of i.i.d. degrees studied by Joseph (2014). For an extensive overview of the literature, we refer to (van der Hofstad, 2018+, Chapter 4).

4.6 Exercises for Chapter 4

Exercise 4.1 (Convergence of n-dependent branching process) Assume that Conditions 1.5(a)-(b) hold. Prove that D⋆_n d−→ D⋆, and conclude that BPn(t) d−→ BP(t) for every finite t.

Exercise 4.2 (Proof of no-overlap property in (4.1.19)) Assume that the conditions in Theorem 4.1 hold. Prove that P(B^{(n)}_k(o1) ≃ t, o2 ∈ B^{(n)}_{2k}(o1)) → 0, and conclude that (4.1.19) holds.

Exercise 4.3 (Cluster size of vertex 1 in a 2-regular graph) Let n2 = n, and let C(1) denote the cluster of vertex 1. Show that

|C(1)|/n d−→ T, (4.6.1)

where P(T ≤ x) = 1 − √(1 − x).
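A Monte Carlo sketch of Exercise 4.3 (the sequential-pairing simulation is ours, not from the text): in a configuration model where every degree equals 2, the cluster of vertex 1 is the cycle containing it, and the uniform pairing can be generated one half-edge at a time. Since P(T ≤ x) = 1 − √(1 − x), we have E[T] = ∫_0^1 √(1 − x) dx = 2/3:

```python
# Monte Carlo check of Exercise 4.3 (a sketch, not code from the text).
# In a 2-regular configuration model, explore the cycle through vertex 1 one
# half-edge at a time: with `remaining` unused vertices, the active half-edge
# pairs with the single other open half-edge (closing the cycle) with
# probability 1/(2*remaining + 1), and otherwise absorbs a fresh vertex.
import random

def cluster_size(n: int, rng: random.Random) -> int:
    size, remaining = 1, n - 1
    while True:
        if rng.random() < 1.0 / (2 * remaining + 1):
            return size  # the two open half-edges pair up, closing the cycle
        size += 1
        remaining -= 1

rng = random.Random(0)
n, reps = 2000, 600
mean = sum(cluster_size(n, rng) for _ in range(reps)) / (reps * n)
print(f"mean of |C(1)|/n: {mean:.3f} (limit E[T] = 2/3)")
```

The empirical mean of |C(1)|/n should be close to E[T] = 2/3.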

Exercise 4.4 (Cluster size in a 2-regular graph with some degree-1 vertices) Let n1 → ∞ with n1/n → 0, and n2 = n − n1. Let C(1) denote the cluster of vertex 1. Show that

|C(1)|/n P−→ 0. (4.6.2)

Exercise 4.5 (Cluster size in a 2-regular graph with some degree-4 vertices) Let n4 → ∞ with n4/n → 0, and n2 = n − n4. Let C(1) denote the cluster of vertex 1. Show that

|C(1)|/n P−→ 1. (4.6.3)

Exercise 4.6 (Limiting constants) Prove that for t1 = 0 and t2 = θ, e^{−kt1} = 1, e^{−kt2} = ξ, GD(e^{−t1}) = 1, GD(e^{−t2}) = 1 − ζ, h(e^{−t1}) = 2E[D], h(e^{−t2}) = 2E[D]ξ^2.

Exercise 4.7 (Theorem 4.4 for GRGn(w) with ν = ∞) Prove that all statements in Theorem 4.4 hold for the GRGn(w) in Theorem 4.14 when ν = ∞.

Exercise 4.8 (Number of vertices with degree k) Let w satisfy Conditions 1.1(a)-(b). Adapt the proof of |Cmax|/n P−→ ζ to show that also vk(Cmax)/n P−→ pk(1 − ξ^k) for NRn(w).

Exercise 4.9 (Phase transition in UGn(d) for ν = ∞) Combine (4.2.54) and (4.2.52)–(4.2.53) to complete the proof of Theorem 4.12.

Exercise 4.10 (A lower bound on the probability of simplicity) Prove (4.2.54) under the assumption that Conditions 1.5(a)-(b) hold, or look the proof up in (Bollobas and Riordan, 2015, Lemma 21).

Exercise 4.11 (Factorial moments of P(2)) Let Conditions 1.5(a)-(b) hold, and assume that p2 > 0. Prove that, for every k ≥ 1 and with λ2 = 2p2^2/E[D]^2,

E[(P(2))_k] → λ2^k. (4.6.4)

Conclude that P(2) d−→ Poi(λ2).

Exercise 4.12 (Isolated vertex) Use (4.3.4) to show that, when di ≥ 3 for all i ∈ [n],

P(there exists an isolated vertex) ≤ 3n/((2ℓn − 1)(2ℓn − 3)). (4.6.5)

Exercise 4.13 (Isolated vertex (Cont.)) Use (4.3.9) to reprove Exercise 4.12. Hence, the above bound is quite sharp.


Exercise 4.14 (A cluster of size two) Use (4.3.9) to prove that, when di ≥ 3 for all i ∈ [n],

P(there exists a cluster of size 2) ≤ 15n(n − 1)/((2ℓn − 1)(2ℓn − 3)(2ℓn − 5)). (4.6.6)

Exercise 4.15 (Cycles in CMn(d)) Let P(k) denote the number of k-cycles consisting of degree-2 vertices, for k ≥ 2. Let λ2 = 2p2/E[D]. Use the multivariate moment method to prove that (P(k))_{k≥2} d−→ (Xk)_{k≥2}, where (Xk)_{k≥2} are independent Poisson random variables with parameters λ2^k/(2k) = lim_n E[P(k)].

Exercise 4.16 (Cmax when dmin = 2) Consider CMn(d) with dmin = 2 and assume that P(D ≥ 3) > 0. Show that Theorem 4.18 holds if P(∃v : dv ≥ 3 and v ∉ Cmax) = o(1).

Exercise 4.17 (Cycles of degree-2 vertices with one other vertex) Let Conditions 1.5(a)-(b) hold, and suppose that dmin ≥ 2. Show that the expected number of cycles consisting of vertices of degree 2 except for one vertex of degree k converges to (k(k − 1)/(2E[D]^2)) ∑_{ℓ≥1} (2p2/E[D])^ℓ.

Exercise 4.18 (Proof of Theorem 3.19) Use Theorem 4.19 and Theorem 1.7 to prove Theorem 3.19.

Exercise 4.19 (Sharp asymptotics in Theorem 4.19) Prove that |Cmax| = (dmax/(1 − ν))(1 + oP(1)) precisely when dmax = Θ(n^{1/(τ−1)}).

Exercise 4.20 (Sub-polynomial subcritical clusters) Use Theorem 4.19 to prove that |Cmax| = oP(n^ε) for every ε > 0 when (4.4.1) holds for every τ > 1 (where the constant c2 is allowed to depend on τ). Thus, when the maximal degree is sub-polynomial, then so is the maximal connected component.

Exercise 4.21 (Single tree asymptotics in Theorem 4.19) Assume that the conditions in Theorem 4.19 hold. Use Theorem 4.1 to prove that the tree rooted at any half-edge incident to the vertex of maximal degree converges in distribution to a subcritical branching process with expected total progeny 1/(1 − ν).

Exercise 4.22 (Two-tree asymptotics in Theorem 4.19) Assume that the conditions in Theorem 4.19 hold. Use the local weak convergence in Theorem 4.1 to prove that the two trees rooted at any pair of half-edges incident to the vertex of maximal degree jointly converge in distribution to two independent subcritical branching processes with expected total progeny 1/(1 − ν).

Exercise 4.23 (Theorem 4.19 when dmax = o(log n)) Assume that the conditions in Theorem 4.19 hold, so that ν < 1. Suppose that dmax = o(log n). Do you expect |Cmax| = (dmax/(1 − ν))(1 + oP(1)) to hold?

Exercise 4.24 (Theorem 4.19 when dmax ≫ log n) Assume that the conditions in Theorem 4.19 hold, so that ν < 1. Suppose that dmax ≫ log n. Do you expect |Cmax| = (dmax/(1 − ν))(1 + oP(1)) to hold?


Exercise 4.25 (Survival probability of finite-variance branching process) Let X be the offspring distribution of a branching process with mean E[X] = 1 + ε and finite variance. Show that its survival probability ζ = ζ(ε) satisfies ζ(ε) = (2ε/Var(X))(1 + o(1)). What does this imply for a unimodular branching process with finite third-moment root offspring distribution?

Exercise 4.26 (Survival probability of infinite-variance branching process) Let X be the offspring distribution of a branching process with mean E[X] = 1 + ε, infinite variance, and tail asymptotics P(X > x) = cx^{−(τ−2)}(1 + o(1)) for some τ ∈ (3, 4). Show that its survival probability ζ = ζ(ε) satisfies ζ(ε) = Θ(ε^{1/(τ−3)}). What does this imply for a unimodular branching process with root offspring distribution X that satisfies P(X > x) = cx^{−(τ−1)}(1 + o(1)) for some τ ∈ (3, 4)?

Exercise 4.27 (Relation between Theorems 4.20 and 4.21) Show that Theorem 4.21 implies Theorem 4.20 when E[D_n^3] → E[D^3].

Exercise 4.28 (Near-critical configuration model with infinite third-moment power-law degrees) Let the conditions in Theorem 4.21 hold. Assume that 1 − Fn(x) = Θ(x^{−(τ−1)}) for all x ≤ n^{1/(τ−1)}. Recall that ζ⋆_n is the survival probability of the branching process with offspring distribution D⋆_n − 1. Show that ζ⋆_n = Θ(α_n^{1/(τ−3)}), and thus that |Cmax| = Θ(α_n^{1/(τ−3)} n). Show further that αn ≫ n^{−1/3}(E[D_n^3])^{2/3} is equivalent to αn ≫ n^{−(τ−3)/(τ−1)}.


Chapter 5

Connected components in preferential attachment models

Abstract

In this chapter, we further investigate preferential attachment models. In Section 5.1 we start by discussing an important tool in this chapter: exchangeable random variables and their distribution described in De Finetti's Theorem. We apply these results to Polya urn schemes, which, in turn, we use to describe the distribution of the degrees in preferential attachment models. It turns out that Polya urn schemes can also be used to describe the local limit of preferential attachment models.

Organization of this chapter

We start in Section 5.1 by discussing exchangeable random variables and their fascinating properties. We continue in Section 5.2 with local weak convergence for preferential attachment models. In Section 5.4, we investigate the connectivity of PA^{(m,δ)}_t. Section 5.5 highlights some further results for preferential attachment models. We close this chapter in Section 5.6 with notes and discussion, and in Section 5.7 with exercises.

Throughout this chapter, we work with the preferential attachment model defined in Section 1.3.5 (and discussed in detail in [Volume 1, Chapter 8]) and denoted by (PA^{(m,δ)}_n)_{n≥1}, unless stated otherwise. We recall that, for m = 1, this model starts with a single vertex with one self-loop at time t = 1, and at each time a vertex is added with one edge that is attached to the vertices in the graph with probabilities given in (1.3.55) for m = 1. The model with m ≥ 2 is obtained by identifying blocks of m vertices in (PA^{(1,δ/m)}_n)_{n≥1}. We sometimes also discuss other variants of the model, such as (PA^{(m,δ)}_n(b))_{n≥1}, in which the m = 1 model does not have any self-loops (recall (1.3.62)), so that the model is by default connected.

5.1 Exchangeable random variables and Polya urn schemes

In this section, we discuss the distribution of infinite sequences of exchangeable random variables and their applications to Polya urn schemes. We start by discussing De Finetti's Theorem.

De Finetti’s Theorem for exchangeable random variables

We start by defining when sequences of random variables are exchangeable:

Definition 5.1 (Exchangeable random variables) A finite sequence of random variables (Xi)_{i=1}^n is called exchangeable when the distribution of (Xi)_{i=1}^n is the same as that of (X_{σ(i)})_{i=1}^n for any permutation σ : [n] → [n]. An infinite sequence (Xi)_{i≥1} is called exchangeable when (Xi)_{i=1}^n is exchangeable for every n ≥ 1.

The notion of exchangeability is rather strong, and implies for example that the distribution of Xi is the same for every i (see Exercise 5.1), as well as that (Xi, Xj) has the same distribution for every i ≠ j.

Clearly, when a sequence of random variables is i.i.d., then it is also exchangeable (see Exercise 5.2). A second example arises when we take a sequence of random variables that are i.i.d. conditionally on some random variables. An example could be a sequence of Bernoulli random variables that are i.i.d. conditionally on their success probability U, where U itself is random. This is called a mixture of i.i.d. random variables. Remarkably, however, the distribution of an infinite sequence of exchangeable random variables is always such a mixture of i.i.d. random variables. This is the content of De Finetti's Theorem, which we state and prove here in the case where (Xi)_{i≥1} are indicator variables, which is the most relevant setting for our purposes:

Theorem 5.2 (De Finetti's Theorem) Let (Xi)_{i≥1} be an infinite sequence of exchangeable random variables, and assume that Xi ∈ {0, 1}. Then there exists a random variable U with P(U ∈ [0, 1]) = 1 such that, for all n ≥ 1 and 1 ≤ k ≤ n,

P(X1 = · · · = Xk = 1, X_{k+1} = · · · = Xn = 0) = E[U^k(1 − U)^{n−k}]. (5.1.1)

De Finetti's Theorem (Theorem 5.2) states that an infinite exchangeable sequence of indicators has the same distribution as an independent Bernoulli sequence with a random success probability U. Thus, the different elements of the sequence are not independent, but their dependence enters only through the random success probability U. In particular, an infinite exchangeable sequence of indicators is a mixture of i.i.d. Bernoulli variables.

The proof of Theorem 5.2 can relatively easily be extended to more general settings, for example when Xi takes on at most a finite number of values. Since we only rely on Theorem 5.2 for indicator variables, we refrain from stating this more general version.

Define Sn to be the number of ones in (Xi)_{i=1}^n, i.e.,

Sn = ∑_{k=1}^n Xk. (5.1.2)

Then Theorem 5.2 is equivalent to the statement that

P(Sn = k) = E[P(Bin(n, U) = k)]. (5.1.3)
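The mixture representation (5.1.3) can be sanity-checked in exact arithmetic in a special case (the uniform mixing law is our illustrative choice, not from the text): if U is uniform on [0, 1], the Beta integral gives E[P(Bin(n, U) = k)] = \binom{n}{k} k!(n − k)!/(n + 1)! = 1/(n + 1), so Sn is uniform on {0, . . . , n}:

```python
# Exact check that a uniform mixing law U makes S_n uniform on {0, ..., n},
# i.e. E[P(Bin(n, U) = k)] = 1/(n + 1), using the Beta integral
# \int_0^1 u^k (1 - u)^{n-k} du = k! (n - k)! / (n + 1)!.
from fractions import Fraction
from math import comb, factorial

def mixed_binomial_pmf(n: int, k: int) -> Fraction:
    return Fraction(comb(n, k) * factorial(k) * factorial(n - k), factorial(n + 1))

n = 10
pmf = [mixed_binomial_pmf(n, k) for k in range(n + 1)]
print(pmf[0], pmf[5])  # both equal 1/11
assert all(p == Fraction(1, n + 1) for p in pmf)
```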

You are asked to prove (5.1.3) in Exercise 5.4. Equation (5.1.3) also allows us to compute the distribution of U. Indeed, if we had that

lim_{n→∞} P(Sn ∈ (an, bn)) = ∫_a^b f(u) du, (5.1.4)

where f is a density, then (5.1.3) implies that f is in fact the density of the random variable U. This is useful in applications of De Finetti's Theorem (Theorem 5.2). Equation (5.1.4) follows by noting that Sn/n a.s.−→ U by the strong law of large numbers applied to the conditional law given U. In Exercise 5.3, you are asked to fill in the details.

Proof of Theorem 5.2. The proof makes use of Helly's Theorem, which states that any sequence of bounded random variables has a weakly converging subsequence. We fix m ≥ n and condition on Sm to write

P(X1 = · · · = Xk = 1, X_{k+1} = · · · = Xn = 0) (5.1.5)
= ∑_{j=k}^m P(X1 = · · · = Xk = 1, X_{k+1} = · · · = Xn = 0 | Sm = j) P(Sm = j).

By exchangeability and conditionally on Sm = j, each sequence (Xi)_{i=1}^m containing precisely j ones is equally likely. Since there are precisely \binom{m}{j} such sequences, and precisely \binom{m−n}{j−k} of them start with k ones and n − k zeros, we obtain

P(X1 = · · · = Xk = 1, X_{k+1} = · · · = Xn = 0 | Sm = j) = \binom{m−n}{j−k} / \binom{m}{j}. (5.1.6)

Writing (m)_k = m(m − 1) · · · (m − k + 1) for the kth falling factorial of m, we therefore arrive at

P(X1 = · · · = Xk = 1, X_{k+1} = · · · = Xn = 0) = ∑_{j=k}^m [(j)_k (m − j)_{n−k} / (m)_n] P(Sm = j). (5.1.7)

When m → ∞ and for k and n with k ≤ n fixed,

(j)_k (m − j)_{n−k} / (m)_n = (j/m)^k (1 − j/m)^{n−k} + o(1), (5.1.8)

uniformly in j, which can be seen by splitting between j > εm and j ≤ εm for ε > 0 arbitrarily small. On the former, (j)_k = j^k(1 + o(1)), while on the latter, (j)_k ≤ (εm)^k and (m − j)_{n−k}/(m)_n ≤ m^{−k}.

Substituting (5.1.8) into (5.1.7), and recalling that j is summed against the law of Sm, we obtain, with Ym = Sm/m,

P(X1 = · · · = Xk = 1, X_{k+1} = · · · = Xn = 0) = lim_{m→∞} E[Ym^k (1 − Ym)^{n−k}]. (5.1.9)

Note that it is here that we make use of the fact that (Xi)_{i≥1} is an infinite exchangeable sequence of random variables. Equation (5.1.9) is the point of departure for the completion of the proof.

We have that 0 ≤ Ym ≤ 1 since 0 ≤ Sm ≤ m, so that the sequence of random variables (Ym)_{m≥1} is a bounded sequence. By Helly's Theorem, it contains a weakly converging subsequence, i.e., there exists a subsequence (Y_{m_l})_{l≥1} with lim_{l→∞} m_l = ∞ and a random variable U such that Y_{m_l} d−→ U. Since the random variable Ym^k(1 − Ym)^{n−k} is uniformly bounded for each k, n, Lebesgue's Dominated Convergence Theorem ([Volume 1, Theorem A.1]) gives that

lim_{m→∞} E[Ym^k (1 − Ym)^{n−k}] = lim_{l→∞} E[Y_{m_l}^k (1 − Y_{m_l})^{n−k}] = E[U^k(1 − U)^{n−k}]. (5.1.10)

This completes the proof.

A careful reader may wonder whether the above proof on the basis of subsequences is indeed enough. Indeed, it is possible that another subsequence (Y_{m′_l})_{l≥1} with lim_{l→∞} m′_l = ∞ has another limiting random variable V such that Y_{m′_l} d−→ V. However, from (5.1.9) we then conclude that E[V^k(1 − V)^{n−k}] = E[U^k(1 − U)^{n−k}] for every k, n. In particular, E[V^k] = E[U^k] for every k ≥ 0. Since the random variables U, V are a.s. bounded by 1 and have the same moments, they also have the same distribution. We conclude that Y_{m_l} d−→ U along every subsequence along which (Y_{m_l}) converges, and this is equivalent to Ym d−→ U.

De Finetti's Theorem implies that when Xk and Xn are coordinates of an infinite exchangeable sequence of indicators, then they are positively correlated; see Exercise 5.5. Thus, it is impossible for infinite exchangeable sequences of indicator variables to be negatively correlated, which is somewhat surprising.

In the proof of De Finetti's Theorem, it is imperative that the sequence (Xi)_{i≥1} is infinite. This is not merely a technicality of the proof. Rather, there are finite exchangeable sequences of random variables for which the equality (5.1.1) does not hold. Indeed, take an urn filled with b blue and r red balls and draw balls successively without replacement. Thus, the urn is sequentially being depleted, and it will be empty after the (b + r)th ball is drawn. Let Xi denote the indicator that the ith ball drawn is blue. Then, clearly, the sequence (Xi)_{i=1}^{b+r} is exchangeable. However,

P(X1 = X2 = 1) = b(b − 1)/((b + r)(b + r − 1)) < (b/(b + r))^2 = P(X1 = 1)P(X2 = 1), (5.1.11)

so that X1 and X2 are negatively correlated. By Theorem 5.2, on the other hand, P(X1 = X2 = 1) = E[U^2] ≥ E[U]^2 = P(X1 = 1)P(X2 = 1), so in infinite exchangeable sequences the random variables are positively correlated when U is not constant. As a result, (5.1.1) fails.
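The inequality (5.1.11) is easy to verify in exact arithmetic, e.g. for b = 3 and r = 2 (an illustrative choice of values, not from the text):

```python
# Exact check of (5.1.11) for b = 3, r = 2 (illustrative values): sampling
# without replacement makes the first two draws negatively correlated.
from fractions import Fraction

b, r = Fraction(3), Fraction(2)
p_both_blue = b * (b - 1) / ((b + r) * (b + r - 1))   # P(X1 = X2 = 1)
p_product   = (b / (b + r)) ** 2                      # P(X1 = 1) * P(X2 = 1)
print(p_both_blue, "<", p_product)  # 3/10 < 9/25
assert p_both_blue < p_product
```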

Polya urn schemes

An important application of De Finetti's Theorem (Theorem 5.2) arises in so-called Polya urn schemes. An urn consists of a number of balls, and we successively draw balls and replace them in the urn. We start with B0 = b0 blue balls and R0 = r0 red balls at time n = 0. Let Wb, Wr : ℕ → (0, ∞) be two weight functions. Then, at time n + 1, the probability of drawing a blue ball, conditionally on the number Bn of blue balls at time n, is proportional to the weight of the blue balls at time n, i.e., the conditional probability of drawing a blue ball equals

Wb(Bn) / (Wb(Bn) + Wr(Rn)). (5.1.12)

After drawing a ball, it is replaced together with a second ball of the same color. We denote this Polya urn scheme by ((Bn, Rn))_{n=1}^∞. Naturally, since we always replace one ball by two balls, the total number of balls is deterministic and satisfies Bn + Rn = b0 + r0 + n.

In this section, we restrict to the case where there exist ar, ab > 0 such that

Wb(k) = ab + k, Wr(k) = ar + k, (5.1.13)

i.e., both weight functions are linear with the same slope, but possibly a different intercept. Our main result concerning Polya urn schemes is the following theorem:

Theorem 5.3 (Limit theorem for linear Polya urn schemes) Let ((Bn, Rn))_{n=1}^∞ be a Polya urn scheme with linear weight functions Wb and Wr as in (5.1.13) for some ar, ab > 0. Then, as n → ∞,

Bn/(Bn + Rn) a.s.−→ U, (5.1.14)

where U has a Beta distribution with parameters a = b0 + ab and b = r0 + ar, and

P(Bn = B0 + k) = E[P(Bin(n, U) = k)]. (5.1.15)

Before proving Theorem 5.3, let us comment on its remarkable content. Clearly, the number of blue balls Bn is not a binomial random variable, as early draws of blue balls reinforce the proportion of blue balls in the end. However, (5.1.15) states that we can first draw a random variable, and then, conditionally on that random variable, the number of blue balls is binomial. Thus, we can instead think of the Polya urn as first drawing the limiting random variable U and, conditionally on its value, running a binomial experiment. This is an extremely useful perspective, as we will see later on. The urn conditioned on the limiting variable U is sometimes called a Polya urn with strength U, and Theorem 5.3 implies that this is a mere binomial experiment.
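Theorem 5.3 is also easy to check by simulation. The sketch below (the parameter values b0 = 2, r0 = 1, ab = ar = 1 are an illustrative choice, not from the text) compares the empirical mean and variance of Bn/(Bn + Rn) with those of the Beta(3, 2) limit, whose mean is 3/5 and variance 3 · 2/((3 + 2)^2 (3 + 2 + 1)) = 0.04:

```python
# Monte Carlo check of Theorem 5.3 (illustrative parameters, not from the
# text): a Polya urn with weights W_b(k) = k + 1, W_r(k) = k + 1 (a_b = a_r = 1),
# started from b0 = 2 blue and r0 = 1 red balls. The limiting blue fraction
# should be Beta(a, b) with a = b0 + a_b = 3 and b = r0 + a_r = 2.
import random

def run_urn(b0: int, r0: int, ab: float, ar: float, steps: int,
            rng: random.Random) -> float:
    blue, red = b0, r0
    for _ in range(steps):
        # draw blue with probability proportional to its weight (5.1.12)
        if rng.random() < (blue + ab) / (blue + ab + red + ar):
            blue += 1
        else:
            red += 1
    return blue / (blue + red)

rng = random.Random(42)
samples = [run_urn(2, 1, 1.0, 1.0, steps=500, rng=rng) for _ in range(4000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"empirical mean {mean:.3f} (Beta(3,2) mean 0.600)")
print(f"empirical var  {var:.4f} (Beta(3,2) var  0.0400)")
```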

Proof of Theorem 5.3. Let Xn denote the indicator that the nth ball drawn is blue. We first show that (Xn)_{n≥1} is an infinite exchangeable sequence. For this, we note that

Bn = b0 + ∑_{j=1}^n Xj, Rn = r0 + ∑_{j=1}^n (1 − Xj) = b0 + r0 + n − Bn. (5.1.16)

Now, for any sequence (xt)_{t=1}^n,

P((Xt)_{t=1}^n = (xt)_{t=1}^n) = ∏_{t=1}^n [Wb(b_{t−1})^{x_t} Wr(r_{t−1})^{1−x_t} / (Wb(b_{t−1}) + Wr(r_{t−1}))], (5.1.17)


where bt = b0 + ∑_{j=1}^t xj and rt = b0 + r0 + t − bt. Denote k = ∑_{t=1}^n xt. Then, by (5.1.13) and (5.1.16),

∏_{t=1}^n (Wb(b_{t−1}) + Wr(r_{t−1})) = ∏_{t=1}^n (b0 + r0 + ab + ar + t − 1), (5.1.18)

while

∏_{t=1}^n Wb(b_{t−1})^{x_t} = ∏_{m=0}^{k−1} (b0 + ab + m), ∏_{t=1}^n Wr(r_{t−1})^{1−x_t} = ∏_{j=0}^{n−k−1} (r0 + ar + j). (5.1.19)

Thus, we arrive at

P((Xt)_{t=1}^n = (xt)_{t=1}^n) = [∏_{m=0}^{k−1} (b + m) ∏_{j=0}^{n−k−1} (r + j)] / [∏_{t=0}^{n−1} (b + r + t)], (5.1.20)

where b = b0 + ab and r = r0 + ar. In particular, (5.1.20) does not depend on the order in which the elements of (xt)_{t=1}^n appear, so that the sequence (Xn)_{n≥1} is an infinite exchangeable sequence. Thus, by De Finetti's Theorem (Theorem 5.2), the sequence (Xn)_{n≥1} is a mixture of Bernoulli random variables with a random success probability U, and we are left to compute the distribution of U. We also observe that the distribution of (Xn)_{n≥1} depends on b0, r0, ab, ar only through b = b0 + ab and r = r0 + ar.

We next verify (5.1.4). For fixed 0 ≤ k ≤ n, there are \binom{n}{k} sequences of k ones and n − k zeros. Each sequence has the same probability, given by (5.1.20). Thus,

P(Sn = k) = \binom{n}{k} [∏_{m=0}^{k−1} (b + m) ∏_{j=0}^{n−k−1} (r + j)] / [∏_{t=0}^{n−1} (b + r + t)]
= [Γ(n + 1)/(Γ(k + 1)Γ(n − k + 1))] × [Γ(k + b)/Γ(b)] × [Γ(n − k + r)/Γ(r)] × [Γ(b + r)/Γ(n + b + r)]
= [Γ(b + r)/(Γ(r)Γ(b))] × [Γ(k + b)/Γ(k + 1)] × [Γ(n − k + r)/Γ(n − k + 1)] × [Γ(n + 1)/Γ(n + b + r)]. (5.1.21)

For k and n − k large, by [Volume 1, (8.3.9)],

P(Sn = k) = [Γ(b + r)/(Γ(r)Γ(b))] [k^{b−1}(n − k)^{r−1}/n^{b+r−1}] (1 + o(1)). (5.1.22)

Taking k = ⌈un⌉ (recall (5.1.4)),

lim_{n→∞} n P(Sn = ⌈un⌉) = [Γ(b + r)/(Γ(r)Γ(b))] u^{b−1}(1 − u)^{r−1}, (5.1.23)

which is the density of a Beta distribution with parameters b and r. It is not hard to show from (5.1.23) that (5.1.4) holds with f(u) the right-hand side of (5.1.23) (see Exercise 5.6).


Multiple urn extensions

We next explain how Theorem 5.3 can be inductively extended to urns with several colors of balls. This will be essential in the analysis of preferential attachment models, where we will need a large number of urns. Assume that we have an urn with several colors (Ci(n))_{i∈[ℓ],n≥0}. Again, we restrict to the setting where the weight functions in the urn are affine, i.e., there exist (ai)_{i∈[ℓ]} such that

Wi(k) = ai + k. (5.1.24)

We assume that the Polya urn starts with ki balls of color i, and that a ball is drawn according to the weights Wi(Ci(n)) for i ∈ [ℓ], and then replaced by two balls of the same color. For j ∈ [ℓ], we let a_{[j,ℓ]} = ∑_{i=j}^ℓ ai and C_{[j,ℓ]}(n) = ∑_{i=j}^ℓ Ci(n). We first view the balls of color 1 and the other colors as a two-type urn. Thus,

C1(n)/n a.s.−→ U1, (5.1.25)

where U1 has a Beta distribution with parameters a = k1 + a1 and b = k_{[2,ℓ]} + a_{[2,ℓ]}. This identifies the limiting proportion of balls of color 1, but it groups all other balls together as one 'combined' color. This combined color occupies a proportion 1 − U1 of the balls. Now, the times at which a 'combined' color ball is drawn again form a (multitype) Polya urn scheme, now with the colors 2, . . . , ℓ. This implies that

C2(n)/n a.s.−→ U2(1 − U1), (5.1.26)

where U2 is independent of U1 and has a Beta distribution with parameters a = k2 + a2 and b = k_{[3,ℓ]} + a_{[3,ℓ]}. Repeating this argument gives

Ci(n)/n a.s.−→ Ui ∏_{j=1}^{i−1} (1 − Uj), (5.1.27)

where Ui is independent of (U1, . . . , U_{i−1}) and has a Beta distribution with parameters a = ki + ai and b = k_{[i+1,ℓ]} + a_{[i+1,ℓ]}. This not only gives an extension of Theorem 5.3 to urns with multiple colors, but also gives an appealing independence structure of the various limits.
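The independence structure in (5.1.27) makes expected limiting proportions easy to compute in closed form, since E[Ui ∏_{j<i}(1 − Uj)] = E[Ui] ∏_{j<i}(1 − E[Uj]) and a Beta(a, b) variable has mean a/(a + b). A minimal sketch in exact arithmetic, with the Beta parameters of Ui read off as in the i = 1, 2 cases above (the parameter values in the example are an illustrative choice, not from the text):

```python
# Expected limiting proportions of an l-colour linear Polya urn, computed
# exactly from the sequential Beta construction (5.1.25)-(5.1.27), using
# E[Beta(a, b)] = a/(a + b) and the independence of the U_i.
from fractions import Fraction

def limit_proportions(k, a):
    props, remaining = [], Fraction(1)
    for i in range(len(k)):
        ai = Fraction(k[i] + a[i])                    # first Beta parameter
        bi = Fraction(sum(k[i+1:]) + sum(a[i+1:]))    # second Beta parameter
        mean_ui = ai / (ai + bi) if bi > 0 else Fraction(1)
        props.append(mean_ui * remaining)
        remaining *= 1 - mean_ui
    return props

props = limit_proportions([2, 1, 1], [1, 1, 1])
print(props)  # [Fraction(3, 7), Fraction(2, 7), Fraction(2, 7)]
assert sum(props) == 1
```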

Applications to relative sizes in scale-free trees

We close this section by discussing applications of Polya urn schemes to scale-free trees. We start at time n = 2 with an initial graph consisting of two vertices, of which vertex 1 has degree d1 and vertex 2 has degree d2. Needless to say, in order for the initial graph to be possible, we need d1 + d2 to be even, and the graph may contain self-loops and multiple edges. After this, we successively attach vertices to older vertices with probability proportional to the degree plus δ > −1. We do not allow for self-loops in the growth of the trees, so that the structures connected to vertices 1 and 2 are trees (but the entire structure is not when d1 + d2 > 2). This is a generalization of (PA^{(1,δ)}_n(b))_{n≥2}, in which we are more flexible in choosing the initial graph. The model for (PA^{(1,δ)}_n(b))_{n≥1} arises when d1 = d2 = 2 (see Exercise 5.8).

We decompose the growing tree into two trees. For i = 1, 2, we let Ti(n) be the tree of vertices that are closer to i than to 3 − i. Thus, the tree T2(n) consists of those vertices for which the path in the tree from the vertex to the root passes through vertex 2, and T1(n) consists of the remainder of the scale-free tree. Let Si(n) = |Ti(n)| denote the number of vertices in Ti(n). Clearly, S1(n) + S2(n) = n, which is the total number of vertices of the tree at time n. We can apply Theorem 5.3 to describe the relative sizes of T1(n) and T2(n):

Theorem 5.4 (Tree decomposition for scale-free trees) For scale-free trees with initial degrees d1, d2 ≥ 1, as n → ∞,

S1(n)/n a.s.−→ U, (5.1.28)

where U has a Beta distribution with parameters a = (d1 + δ)/(2 + δ) and b = (d2 + δ)/(2 + δ), and

P(S1(n) = k) = E[P(Bin(n − 1, U) = k − 1)]. (5.1.29)

By Theorem 5.4, we can decompose a scale-free tree into two disjoint scale-free trees, each of which contains a positive proportion of the vertices that converges almost surely to a Beta-distributed limit with parameters a = (d1 + δ)/(2 + δ) and b = (d2 + δ)/(2 + δ).

Proof of Theorem 5.4. The evolution of $(S_1(n))_{n\geq 2}$ can be viewed as a Polya urn scheme. Indeed, when $S_1(n)=s_1(n)$, the probability of attaching the $(n+1)$st vertex to $T_1(n)$ is equal to

$$\frac{(2s_1(n)+d_1-2)+\delta s_1(n)}{(2s_1(n)+d_1-2)+\delta s_1(n)+(2s_2(n)+d_2-2)+\delta s_2(n)}, \qquad (5.1.30)$$

since the number of vertices in $T_i(n)$ equals $S_i(n)$, while the total degree of $T_i(n)$ equals $2S_i(n)+d_i-2$. We can rewrite this as

$$\frac{s_1(n)+(d_1-2)/(2+\delta)}{s_1(n)+s_2(n)+(d_1+d_2-4)/(2+\delta)}, \qquad (5.1.31)$$

which is equal to (5.1.12) in the case (5.1.13) when $r_0=b_0=1$ and $a_b=(d_1-2)/(2+\delta)$, $a_r=(d_2-2)/(2+\delta)$. Therefore, the proof of Theorem 5.4 follows directly from Theorem 5.3.
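The urn dynamics in the proof are easy to simulate. The sketch below, under the illustrative assumptions $d_1=d_2=2$ and $\delta=0$ (so that $U\sim\mathrm{Beta}(1,1)$ is uniform and $\mathbb{E}[S_1(n)/n]=1/2$), grows the two subtrees with the attachment probability of (5.1.30) and checks that the Monte Carlo average of $S_1(n)/n$ is near $1/2$; the function name and parameters are ours, not the book's.

```python
import random

def simulate_tree_split(n_steps, d1=2, d2=2, delta=0.0, rng=random):
    """Grow the subtrees T1, T2 of a scale-free tree, attaching each new
    vertex to T_i with the probability in (5.1.30); return S1(n)/n."""
    s1, s2 = 1, 1
    for _ in range(n_steps):
        w1 = (2 * s1 + d1 - 2) + delta * s1   # total degree of T1 plus delta-weight
        w2 = (2 * s2 + d2 - 2) + delta * s2   # total degree of T2 plus delta-weight
        if rng.random() < w1 / (w1 + w2):
            s1 += 1
        else:
            s2 += 1
    return s1 / (s1 + s2)

random.seed(1)
# For d1 = d2 = 2 and delta = 0, Theorem 5.4 gives S1(n)/n -> U uniform on
# [0,1]; the proportion is a martingale, so its mean stays 1/2 at every n.
samples = [simulate_tree_split(300) for _ in range(2000)]
mean = sum(samples) / len(samples)
print(round(mean, 3))
```

Individual runs spread out over $(0,1)$ rather than concentrating: the limit is a random variable, not a constant, exactly as the theorem states.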

We continue by adapting the above argument to the size of the connected component of, or subtree containing, vertex 1 in $\mathrm{PA}^{(1,\delta)}_n$, which we denote by $S_1'(n)$.

Theorem 5.5 (Tree decomposition for preferential attachment trees) For $(\mathrm{PA}^{(1,\delta)}_n)_{n\geq 1}$, as $n\to\infty$,

$$\frac{S_1'(n)}{n} \stackrel{a.s.}{\longrightarrow} U', \qquad (5.1.32)$$

where $U'$ has a mixed Beta distribution with parameters $a=I+1$ and $b=1+(1+\delta)/(2+\delta)$, where, for $k\geq 2$,

$$\mathbb{P}(I=k) = \mathbb{P}(\text{first vertex that is not connected to vertex 1 is vertex } k). \qquad (5.1.33)$$

Consequently,

$$\mathbb{P}(S_1'(n)=k) = \mathbb{E}\big[\mathbb{P}\big(\mathrm{Bin}(n-1,U')=k-1\big)\big]. \qquad (5.1.34)$$

Proof of Theorem 5.5. We note that $S_1'(n)=n$ for all $n<I$, and $S_1'(I)=I-1$, $S_2'(I)=1$. For $n\geq I+1$, the evolution of $(S_1'(n))_{n\geq 2}$ can be viewed as a Polya urn scheme. Indeed, when $S_1'(n)=s_1'(n)$, the probability of attaching the $(n+1)$st vertex to the tree rooted at vertex 1 is equal to

$$\frac{(2+\delta)s_1'(n)}{(2+\delta)n+1+\delta}. \qquad (5.1.35)$$

We can rewrite this as

$$\frac{s_1'(n)}{n+(1+\delta)/(2+\delta)} = \frac{(s_1'(n)-I)+I}{(n-I)+I+(1+\delta)/(2+\delta)}, \qquad (5.1.36)$$

which is equal to (5.1.12) in the case (5.1.13) when $b_0=r_0=1$ and $a_b=I$, $a_r=(1+\delta)/(2+\delta)$. Therefore, the proof of Theorem 5.5 follows directly from Theorem 5.3.

Applications to relative degrees in scale-free trees

We continue by discussing an application of Polya urn schemes to the relative sizes of the initial degrees. For this, we fix an integer $k\geq 2$, and only regard times $n\geq k$ at which an edge is attached to one of the $k$ initial vertices. We work with $(\mathrm{PA}^{(1,\delta)}_n)_{n\geq 1}$, so that we start at time $n=1$ with one vertex with one self-loop, after which we successively attach vertices to older vertices with probability proportional to the degree plus $\delta>-1$, allowing for self-loops. The main result is as follows:

Theorem 5.6 (Relative degrees in scale-free trees) For $(\mathrm{PA}^{(1,\delta)}_n)_{n\geq 1}$, as $n\to\infty$,

$$\frac{D_k(n)}{D_{[k]}(n)} \stackrel{a.s.}{\longrightarrow} B_k, \qquad (5.1.37)$$

where $D_{[k]}(n)=D_1(n)+\cdots+D_k(n)$ and $B_k$ has a Beta distribution with parameters $a=1+\delta$ and $b=(k-1)(2+\delta)$.

By Theorem 1.11, $n^{-1/(2+\delta)}D_k(n) \stackrel{a.s.}{\longrightarrow} \xi_k$, where $\xi_k$ is positive almost surely by the argument in the proof of [Volume 1, Theorem 8.14]. It thus follows from Theorem 5.6 that $B_k=\xi_k/(\xi_1+\cdots+\xi_k)$. We conclude that Theorem 5.6 allows us to identify properties of the law of the limiting degrees.


Proof of Theorem 5.6. Define the sequence of stopping times $(\tau_k(n))_{n\geq 2k-1}$ by $\tau_k(2k-1)=k-1$ and

$$\tau_k(n) = \inf\{t\colon D_{[k]}(t)=n\}, \qquad (5.1.38)$$

i.e., $\tau_k(n)$ is the time where the total degree of vertices $1,\ldots,k$ equals $n$. The initial condition $\tau_k(2k-1)=k-1$ is chosen such that the half-edge incident to vertex $k$ is already considered to be present at time $k-1$, but the receiving end of that edge is not. This guarantees that the attachment of the edge of vertex $k$ is also properly taken into account.

Since $D_j(n)\stackrel{a.s.}{\longrightarrow}\infty$ as $n\to\infty$ for every $j$, $\tau_k(n)<\infty$ for every $n$. Moreover, since $\tau_k(n)\stackrel{a.s.}{\longrightarrow}\infty$,

$$\lim_{n\to\infty} \frac{D_k(n)}{D_{[k]}(n)} = \lim_{n\to\infty} \frac{D_k(\tau_k(n))}{D_{[k]}(\tau_k(n))} = \lim_{n\to\infty} \frac{D_k(\tau_k(n))}{n}. \qquad (5.1.39)$$

Now, the random variables $\big((D_k(\tau_k(n)), D_{[k-1]}(\tau_k(n)))\big)_{n\geq 2k-1}$ form a Polya urn scheme, with $D_k(\tau_k(2k-1))=1$ and $D_{[k-1]}(\tau_k(2k-1))=2k-2$. The edge at time $\tau_k(n)$ is attached to vertex $k$ with probability

$$\frac{D_k(\tau_k(n))+\delta}{n+k\delta}, \qquad (5.1.40)$$

which are the probabilities of a Polya urn scheme in the linear weight case in (5.1.13) when $a_b=\delta$, $a_r=(k-1)\delta$, $b_0=1$, $r_0=2(k-1)$. Thus, the statement follows from Theorem 5.3.
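Theorem 5.6 can also be checked by direct simulation of the tree. The sketch below uses the standard half-edge-list trick for degree-proportional attachment and takes $\delta=0$, $k=2$ as an illustrative special case, where $B_2\sim\mathrm{Beta}(1,2)$ has mean $1/3$; the implementation details are ours.

```python
import random

def pa_tree_degrees(n, rng):
    """Simulate PA^{(1,0)}_n (one edge per new vertex, self-loops allowed,
    attachment proportional to degree) via the half-edge list trick, and
    return the degree sequence deg[1..n]."""
    endpoints = [1, 1]                            # vertex 1 starts with a self-loop
    for v in range(2, n + 1):
        endpoints.append(v)                       # new half-edge at v carries weight 1
        endpoints.append(rng.choice(endpoints))   # uniform endpoint = degree-biased vertex
    deg = [0] * (n + 1)
    for x in endpoints:
        deg[x] += 1
    return deg

rng = random.Random(42)
# Theorem 5.6 with delta = 0, k = 2: D_2(n)/D_{[2]}(n) -> B_2 ~ Beta(1, 2),
# mean 1/3; the embedded urn proportion is a martingale started at 1/3.
ratios = []
for _ in range(2000):
    deg = pa_tree_degrees(200, rng)
    ratios.append(deg[2] / (deg[1] + deg[2]))
mean = sum(ratios) / len(ratios)
print(round(mean, 3))
```

The half-edge list makes each draw uniform over current degree mass, so the self-loop probability $1/(2v-1)$ of the model with $\delta=0$ comes out automatically.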

Theorem 5.6 is easily extended to $(\mathrm{PA}^{(1,\delta)}_n(b))_{n\geq 1}$:

Theorem 5.7 (Relative degrees in scale-free trees) For $(\mathrm{PA}^{(1,\delta)}_n(b))_{n\geq 1}$, as $n\to\infty$,

$$\frac{D_k(n)}{D_{[k]}(n)} \stackrel{a.s.}{\longrightarrow} B_k, \qquad (5.1.41)$$

where $B_k$ has a Beta distribution with parameters $a=1+\delta$ and $b=(2k-1)+(k-1)\delta$.

Of course, the dynamics for $(\mathrm{PA}^{(1,\delta)}_n(b))_{n\geq 1}$ are slightly different. Indeed, now the random variables $\big((D_k(\tau_k(n)), D_{[k-1]}(\tau_k(n)))\big)_{n\geq 2k}$ form a Polya urn scheme, with $D_k(\tau_k(2k))=1$ and $D_{[k-1]}(\tau_k(2k))=2k-1$. The edge at time $\tau_k(n)$ is attached to vertex $k$ with probability

$$\frac{D_k(\tau_k(n))+\delta}{n+k\delta}, \qquad (5.1.42)$$

which are the probabilities of a Polya urn scheme in the linear weight case in (5.1.13) when $a_b=\delta$, $a_r=(k-1)\delta$, $b_0=1$, $r_0=2k-1$. Thus, again the statement follows from Theorem 5.3. See Exercise 5.10 for the complete proof. Remarkably, even though $(\mathrm{PA}^{(1,\delta)}_n)_{n\geq 1}$ and $(\mathrm{PA}^{(1,\delta)}_n(b))_{n\geq 1}$ have the same asymptotic degree distribution, the limiting degrees in Theorems 5.6 and 5.7 are different.

In the following section, we bring the above discussion substantially forward and study the local weak limit of preferential attachment models.

5.2 Local weak convergence of preferential attachment models

In this section, we study the local weak convergence of preferential attachment models, which is a much more difficult subject. Indeed, it turns out that the local weak limit is not described by a homogeneous Galton-Watson tree, but rather by an inhomogeneous one, leading to multitype branching processes.

5.2.1 Local weak convergence of PAMs with fixed number of edges

In this section, we study local weak limits of preferential attachment models, using Polya urn schemes. We start with $(\mathrm{PA}^{(m,\delta)}_t(d))_{t\geq 1}$, as this model turns out to be the simplest for local weak limits due to its close relation to Polya urn schemes.

Recall that the graph starts with two vertices with $m$ edges between them. Let $\tau_k$ be the $k$th time that an edge is added to either vertex 1 or 2. The relation to Polya urn schemes can be informally explained by noting that $k\mapsto D_1(\tau_k)/(D_1(\tau_k)+D_2(\tau_k))$ can be viewed as the proportion of type-1 balls in a Polya urn starting with $m$ balls of type 1 and type 2, respectively. An application of De Finetti's Theorem, which relies on exchangeability, shows that $D_1(\tau_k)/(D_1(\tau_k)+D_2(\tau_k))$ converges almost surely to a certain Beta-distributed random variable, which we denote by $\beta$. What is particularly nice about this description is that the random variable $D_1(\tau_k)$ has exactly the same distribution as $m$ plus a $\mathrm{Bin}(k,\beta)$ random variable, i.e., conditionally on $\beta$, $D_1(\tau_k)$ is a sum of i.i.d. random variables.
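The mixed-binomial description can be verified exactly in a small computation. The sketch below assumes (our choice for this illustration) that the two-urn scheme has initial counts $(m,m)$ and linear weights count${}+\delta$, in which case the number of type-1 draws after $k$ steps is Beta-binomial with parameters $(m+\delta,m+\delta)$; we compare an exact dynamic-programming recursion with that closed form.

```python
from math import comb, gamma

def urn_pmf(k, m, delta):
    """Exact pmf of the number of type-1 draws in k steps of a two-urn scheme
    with initial counts (m, m) and draw probability proportional to
    count + delta, computed by dynamic programming over the draw count."""
    pmf = {0: 1.0}
    for step in range(k):
        new = {}
        for j, p in pmf.items():
            w1 = m + j + delta                 # current weight of urn 1
            w2 = m + (step - j) + delta        # current weight of urn 2
            q = w1 / (w1 + w2)
            new[j + 1] = new.get(j + 1, 0.0) + p * q
            new[j] = new.get(j, 0.0) + p * (1 - q)
        pmf = new
    return pmf

def beta_binomial_pmf(j, k, a, b):
    """Pmf of Bin(k, beta) with beta ~ Beta(a, b) mixed out (De Finetti)."""
    B = lambda x, y: gamma(x) * gamma(y) / gamma(x + y)
    return comb(k, j) * B(a + j, b + k - j) / B(a, b)

m, delta, k = 2, 0.5, 6
pmf = urn_pmf(k, m, delta)
exact = [beta_binomial_pmf(j, k, m + delta, m + delta) for j in range(k + 1)]
print([round(pmf[j], 6) for j in range(k + 1)])
print([round(x, 6) for x in exact])
```

The two rows agree to rounding error: the path probabilities of the urn depend only on the number of type-1 draws, which is exactly the exchangeability that De Finetti's Theorem exploits.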

In the graph context, the Polya urn description becomes more daunting. However, the description is again in terms of Beta random variables, and the exchangeable version of the model $\mathrm{PA}^{(m,\delta)}_n(d)$ can again be given rather explicitly in terms of the arising random variables. Let us now give some details.

Definition of the Polya point graph

We continue by introducing the limiting graph. Let

$$u = \frac{\delta}{2m}, \qquad (5.2.1)$$

and write

$$\chi = \frac{1+2u}{2+2u} = \frac{m+\delta}{2m+\delta}, \qquad \psi = \frac{1-\chi}{\chi} = \frac{1}{1+2u} = \frac{m}{m+\delta}. \qquad (5.2.2)$$

The exponent $\psi$ equals $\psi = 1/(\tau-2)$, where $\tau = 3+\delta/m$ equals the power-law exponent of the graph. We will show that asymptotically, the branching process obtained by exploring the neighborhood of a random vertex $o_n$ in $\mathrm{PA}^{(m,\delta)}_t(d)$ is given by a multitype branching process, in which every vertex has a type that is closely related to the age of the vertex. To state our main theorem, we introduce this tree.

As for inhomogeneous random graphs and the configuration model, we will see that the local weak limit of $\mathrm{PA}^{(m,\delta)}_n(d)$ is again a tree, but it is not homogeneous. Further, each vertex has two different types of children, labelled L and R. The children labeled L are the older neighbors to which one of the $m$ edges that the vertex entered with is connected, while the children labeled R are younger vertices that used one of their $m$ edges to connect to the vertex. The distinction between the different kinds of vertices can be made by giving each vertex an age variable. Since we are interested in the asymptotic neighborhood of a uniform vertex, the age of the root, which corresponds to the limit of $o_n/n$, is a uniform random variable on $[0,1]$. In order to describe its immediate neighbors, we have to describe how many older vertices of type L the root $o_n$ is connected to, as well as the number of younger vertices of type R. After this, we again have to describe the number of R- and L-type children its children have, etc.

Let us now describe these constructs in detail. Let the random rooted tree $T$ be defined by its vertices labeled by finite words, as in the Ulam-Harris labelling of trees,

$$w = w_1 w_2 \cdots w_l, \qquad (5.2.3)$$

each carrying a label R or L, which are defined recursively as follows: The root $\emptyset$ has an age $U_\emptyset$, where $U_\emptyset$ is chosen uniformly at random in $[0,1]$.

In the recursion step, we assume that $w$ and the corresponding variable $x_w\in[0,1]$ have been chosen in a previous step. For $j\geq 1$, let $wj$ be the $j$th child of $w$, i.e., $wj = \emptyset w_1 w_2\cdots w_l j$, and set

$$m_-(w) = \begin{cases} m & \text{if } w \text{ is the root or of type L},\\ m-1 & \text{if } w \text{ is of type R}.\end{cases} \qquad (5.2.4)$$

Recall that a Gamma random variable $Y$ with parameters $r$ and $\lambda$ has density

$$f_Y(y) = y^{r-1}\lambda^r \mathrm{e}^{-\lambda y}/\Gamma(r) \quad \text{for } y\geq 0, \qquad (5.2.5)$$

and $0$ otherwise. Let $Y$ have a Gamma distribution with parameters $r=m+\delta$ and $\lambda=1$, and let $Y^\star$ be the size-biased version of $Y$, which has a Gamma distribution with parameters $r=m+\delta+1$ and $\lambda=1$ (see Exercise 5.11). We then take

$$\Gamma_w \sim \begin{cases} Y & \text{if } w \text{ is the root or of type R},\\ Y^\star & \text{if } w \text{ is of type L},\end{cases} \qquad (5.2.6)$$

independently of everything else.


Let $w1,\ldots,wm_-(w)$ be the children of $w$ of type L, and let their ages $A_{w1},\ldots,A_{wm_-(w)}$ be given by $A_{wj} = U_{wj}^{(\tau-1)/(\tau-2)} A_w$, where $(U_{wj})_{j=1}^{m_-(w)}$ are i.i.d. uniform random variables on $[0,1]$, independent of everything else. Further, let $(A_{w(m_-(w)+j)})_{j\geq 1}$ be the (ordered) points of a Poisson point process with intensity

$$\rho_w(x) = \frac{\Gamma_w}{\tau-1}\, \frac{x^{1/(\tau-1)-1}}{A_w^{1/(\tau-1)}} \qquad (5.2.7)$$

on $[A_w,1]$, and let the vertices $(w(m_-(w)+j))_{j\geq 1}$ have type R. The children of $w$ are the vertices $wj$ of type L and R.

Let us discuss the degree structure of the above process. Obviously, there are finitely many children of type L. Further, note that $1/(\tau-1) = m/(2m+\delta) > 0$, so the intensity $\rho_w$ in (5.2.7) of the Poisson process is integrable. Thus every vertex in the random tree almost surely has finitely many children. The above random tree is coined the Polya-point tree, and the point process $(A_w)_w$ the Polya-point process. The Polya-point tree is a multitype discrete-time branching process, where the type of a vertex $w$ is equal to the pair $(A_w, t_w)$, where $A_w\in[0,1]$ corresponds to the age of the vertex, and $t_w\in\{L,R\}$ is its type. Thus, the type space of the multitype branching process is continuous.
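A quick consistency check on the intensity: the root has $m$ type-L children plus a Poisson number of type-R children with conditional mean $\Gamma_\emptyset\kappa(A_\emptyset)$, where $\kappa(a)=(1-a^{1/(\tau-1)})/a^{1/(\tau-1)}$ comes from integrating (5.2.7), so the mean root degree should be $2m$, the average degree of a graph with $mn$ edges. The sketch below checks this by Monte Carlo for the (illustrative, our choice) parameters $m=2$, $\delta=1$.

```python
import random

# Parameters of the Polya-point tree: here m = 2, delta = 1, so
# tau = 3 + delta/m = 3.5 and the age exponent is 1/(tau - 1) = 0.4.
m, delta = 2, 1.0
tau = 3 + delta / m
e = 1 / (tau - 1)

def kappa(a):
    """Integral of the intensity (5.2.7) over [a, 1], per unit strength."""
    return (1 - a ** e) / a ** e

rng = random.Random(7)
# Conditionally on (A, Gamma), the root degree is m plus Poisson(Gamma*kappa(A)),
# so E[D] = m + E[Gamma] * E[kappa(A)] = m + (m+delta) * m/(m+delta) = 2m.
total = 0.0
reps = 40000
for _ in range(reps):
    a = 1.0 - rng.random()                  # root age, uniform on (0, 1]
    g = rng.gammavariate(m + delta, 1.0)    # root strength ~ Gamma(m + delta, 1)
    total += m + g * kappa(a)
print(round(total / reps, 2))
```

Here $\mathbb{E}[A^{-1/(\tau-1)}] = (2m+\delta)/(m+\delta)$, so the exact value is $2m=4$; the Monte Carlo average should land close to it.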

With the above description in hand, we are ready to state our main result concerning local weak convergence of $\mathrm{PA}^{(m,\delta)}_n(d)$:

Theorem 5.8 (Local weak convergence of preferential attachment models: fixed edges) Fix $m\geq 1$ and $\delta>-m$. The preferential attachment model $\mathrm{PA}^{(m,\delta)}_n(d)$ converges locally in probability to the Polya-point tree.

Theorem 5.8 does not include the case where self-loops are allowed. Given the robustness of the theorem (which applies to three quite related settings), we strongly believe that a similar result also applies to $\mathrm{PA}^{(m,\delta)}_n$. We refer to the discussion in Section 5.6 for more details, also on the history of Theorem 5.8.

Degree structure of $\mathrm{PA}^{(m,\delta)}_n(d)$

Before turning to the proof of Theorem 5.8, we use it to describe some properties of the degrees of vertices in $\mathrm{PA}^{(m,\delta)}_n(d)$. This is done in the following lemma:

Lemma 5.9 (Degree sequence of $\mathrm{PA}^{(m,\delta)}_n(d)$) Let $D_o(n)$ be the degree at time $n$ of a vertex chosen uniformly at random from $[n]$ in $\mathrm{PA}^{(m,\delta)}_n(d)$. Then,

$$\mathbb{P}(D_o(n)=k) \to p_k = \frac{2m+\delta}{m}\, \frac{\Gamma(m+2+\delta+\delta/m)}{\Gamma(m+\delta)}\, \frac{\Gamma(k+\delta)}{\Gamma(k+3+\delta+\delta/m)}, \qquad (5.2.8)$$

and, with $D_o'(n)$ the degree at time $n$ of one of the $m$ older neighbors of $o$,

$$\mathbb{P}(D_o'(n)=k) \to p_k' = \frac{2m+\delta}{m^2}\, \frac{\Gamma(m+2+\delta+\delta/m)}{\Gamma(m+\delta)}\, \frac{(k-m+1)\Gamma(k+1+\delta)}{\Gamma(k+4+\delta+\delta/m)}. \qquad (5.2.9)$$

Further, let $P_k(n)$ denote the proportion of vertices of degree $k$ in $\mathrm{PA}^{(m,\delta)}_n(d)$, and $P_k'(n)$ the proportion of older neighbors of vertices of degree $k$. Then $P_k(n)\stackrel{\mathbb{P}}{\longrightarrow} p_k$ and $P_k'(n)\stackrel{\mathbb{P}}{\longrightarrow} p_k'$.

Note that the limiting degree distribution in (5.2.8) is equal to that for $\mathrm{PA}^{(m,\delta)}_n$ in (1.3.58), again exemplifying that the details of the model have little influence on the limiting degree sequence. It is not hard to see from Lemma 5.9 that

$$p_k = c_{m,\delta}\, k^{-\tau}(1+O(1/k)), \qquad p_k' = c_{m,\delta}'\, k^{-(\tau-1)}(1+O(1/k)), \qquad (5.2.10)$$

for some constants $c_{m,\delta}$ and $c_{m,\delta}'$ and with $\tau=3+\delta/m$ (see Exercise 5.12). We conclude that there is a form of size-biasing, in that older neighbors of a uniform vertex have a limiting degree distribution that again satisfies a power law (like the degree of the random vertex itself), but with an exponent that is one lower than that of the vertex itself. Exercises 5.13–5.15 study the joint distribution $(D,D')$ and various conditional power laws.
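Both the normalization and the power-law decay of (5.2.8) can be checked numerically. The sketch below evaluates $p_k$ via log-Gamma functions (for numerical stability); the parameter pairs $(m,\delta)=(1,0)$ and $(2,1)$ are our illustrative choices.

```python
from math import lgamma, exp

def pk(k, m, delta):
    """Limiting degree probability from (5.2.8), via log-Gamma for stability."""
    return ((2 * m + delta) / m) * exp(
        lgamma(m + 2 + delta + delta / m) - lgamma(m + delta)
        + lgamma(k + delta) - lgamma(k + 3 + delta + delta / m))

# Partial sums over k = m, m+1, ... approach 1, confirming that (5.2.8) is a
# probability distribution; the tail decays like k^{-tau}, tau = 3 + delta/m.
totals = {}
for m, delta in [(1, 0.0), (2, 1.0)]:
    totals[(m, delta)] = sum(pk(k, m, delta) for k in range(m, 200000))
ratio = pk(100000, 2, 1.0) / pk(50000, 2, 1.0)   # doubling k multiplies p_k by ~2^{-3.5}
print(totals)
print(round(ratio, 4), round(0.5 ** 3.5, 4))
```

For $(m,\delta)=(1,0)$ one even has the closed form $p_k=4/(k(k+1)(k+2))$, whose sum is exactly 1, so the partial sum gives a direct sanity check on the reconstruction of the Gamma ratios.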

Proof of Lemma 5.9. We note that local weak convergence implies the convergence of the degree distribution. It thus suffices to study the distribution of the degree of the root in the Polya-point tree. We first condition on the age $A_\emptyset = U_\emptyset$ of the root, where $U_\emptyset$ is standard uniform. Let $D$ be the degree of the root. Conditionally on $A_\emptyset = a$, the degree $D$ is $m$ plus a Poisson random variable with parameter

$$\frac{\Gamma_\emptyset}{a^{1/(\tau-1)}} \int_a^1 \frac{1}{\tau-1}\, x^{1/(\tau-1)-1}\,\mathrm{d}x = \Gamma_\emptyset\, \frac{1-a^{1/(\tau-1)}}{a^{1/(\tau-1)}} \equiv \Gamma_\emptyset\, \kappa(a), \qquad (5.2.11)$$

where $\Gamma_\emptyset$ is a Gamma random variable with parameters $r=m+\delta$ and $\lambda=1$. Thus, taking expectations with respect to $\Gamma_\emptyset$, we obtain

$$\mathbb{P}(D=k \mid A_\emptyset=a) = \int_0^\infty \mathbb{P}(D=k \mid A_\emptyset=a, \Gamma_\emptyset=y)\, \frac{y^{m+\delta-1}}{\Gamma(m+\delta)}\mathrm{e}^{-y}\,\mathrm{d}y$$
$$= \int_0^\infty \mathrm{e}^{-y\kappa(a)}\,\frac{(y\kappa(a))^{k-m}}{(k-m)!}\, \frac{y^{m+\delta-1}}{\Gamma(m+\delta)}\mathrm{e}^{-y}\,\mathrm{d}y$$
$$= \frac{\kappa(a)^{k-m}}{(1+\kappa(a))^{k-m+m+\delta}}\, \frac{\Gamma(k+\delta)}{(k-m)!\,\Gamma(m+\delta)}$$
$$= (1-a^{1/(\tau-1)})^{k-m}\, a^{m(m+\delta)/(2m+\delta)}\, \frac{\Gamma(k+\delta)}{(k-m)!\,\Gamma(m+\delta)}, \qquad (5.2.12)$$

where we use that $\kappa(a)/(1+\kappa(a)) = 1-a^{1/(\tau-1)}$ and $1/(1+\kappa(a)) = a^{1/(\tau-1)}$. We thus conclude that

$$\mathbb{P}(D=k) = \int_0^1 \mathbb{P}(D=k \mid A_\emptyset=a)\,\mathrm{d}a \qquad (5.2.13)$$
$$= \int_0^1 (1-a^{1/(\tau-1)})^{k-m}\, a^{m(m+\delta)/(2m+\delta)}\, \frac{\Gamma(k+\delta)}{(k-m)!\,\Gamma(m+\delta)}\,\mathrm{d}a.$$

We now use the integral transform $u = a^{1/(\tau-1)}$, for which $\mathrm{d}a = (\tau-1)u^{\tau-2}\,\mathrm{d}u$,

to arrive at

$$\mathbb{P}(D=k) = (\tau-1)\, \frac{\Gamma(k+\delta)}{(k-m)!\,\Gamma(m+\delta)} \int_0^1 (1-u)^{k-m}\, u^{m+\delta+1+\delta/m}\,\mathrm{d}u$$
$$= (\tau-1)\, \frac{\Gamma(k+\delta)}{(k-m)!\,\Gamma(m+\delta)}\, \frac{\Gamma(k-m+1)\Gamma(m+2+\delta+\delta/m)}{\Gamma(k+3+\delta+\delta/m)}$$
$$= (\tau-1)\, \frac{\Gamma(k+\delta)\Gamma(m+2+\delta+\delta/m)}{\Gamma(m+\delta)\Gamma(k+3+\delta+\delta/m)}, \qquad (5.2.14)$$

as required.

We next extend this to convergence in distribution of $D_o'(n)$, for which we again note that local weak convergence implies the convergence of the degree distribution of neighbors at distance 1 from the root, so in particular of $D_o'(n)$. It thus suffices to study the distribution of the degree of a uniform older neighbor of the root in the Polya-point tree. We first condition on the age $A_\emptyset = U_\emptyset$ of the root, where $U_\emptyset$ is standard uniform, and recall that the age $A_{\emptyset 1}$ of one of the $m$ older vertices to which $\emptyset$ is connected has distribution $A_{\emptyset 1} = U_{\emptyset 1}^{(\tau-1)/(\tau-2)} A_\emptyset$, where $U_{\emptyset 1}$ is uniform on $[0,1]$. Let $D'$ be the degree of vertex $\emptyset 1$. Conditionally on $A_{\emptyset 1}=b$, the degree $D'$ is $m$ plus a Poisson random variable with parameter

$$\frac{\Gamma_{\emptyset 1}}{b^{1/(\tau-1)}} \int_b^1 \frac{1}{\tau-1}\, x^{1/(\tau-1)-1}\,\mathrm{d}x = \Gamma_{\emptyset 1}\, \frac{1-b^{1/(\tau-1)}}{b^{1/(\tau-1)}} \equiv \Gamma_{\emptyset 1}\,\kappa(b), \qquad (5.2.15)$$

where $\Gamma_{\emptyset 1}$ is a Gamma random variable with parameters $r=m+1+\delta$ and $\lambda=1$.

Thus, taking expectations with respect to $\Gamma_{\emptyset 1}$, we obtain as before

$$\mathbb{P}(D'=k \mid A_{\emptyset 1}=b) = \int_0^\infty \mathbb{P}(D'=k \mid A_{\emptyset 1}=b, \Gamma_{\emptyset 1}=y)\, \frac{y^{m+\delta}}{\Gamma(m+1+\delta)}\mathrm{e}^{-y}\,\mathrm{d}y$$
$$= \int_0^\infty \mathrm{e}^{-y\kappa(b)}\,\frac{(y\kappa(b))^{k-m}}{(k-m)!}\, \frac{y^{m+\delta}}{\Gamma(m+1+\delta)}\mathrm{e}^{-y}\,\mathrm{d}y$$
$$= \frac{\kappa(b)^{k-m}}{(1+\kappa(b))^{k-m+m+1+\delta}}\, \frac{\Gamma(k+1+\delta)}{(k-m)!\,\Gamma(m+1+\delta)}$$
$$= \frac{\Gamma(k+1+\delta)}{(k-m)!\,\Gamma(m+1+\delta)}\, (1-b^{1/(\tau-1)})^{k-m}\, b^{m(m+1+\delta)/(2m+\delta)},$$

where we again use that $\kappa(b)/(1+\kappa(b)) = 1-b^{1/(\tau-1)}$. We next use that $A_{\emptyset 1} = U_{\emptyset 1}^{(\tau-1)/(\tau-2)} A_\emptyset$, where $U_{\emptyset 1}$ and $A_\emptyset$ are independent uniform random variables on $[0,1]$. Recall that the vector $(A_\emptyset, U_{\emptyset 1})$ has density 1 on $[0,1]^2$. Let $(A_\emptyset, A_{\emptyset 1}) = (A_\emptyset, U_{\emptyset 1}^{(\tau-1)/(\tau-2)} A_\emptyset)$; then $(A_\emptyset, A_{\emptyset 1})$ has joint density on $\{(a,b)\colon b\leq a\}$ given by

$$f_{(A_\emptyset, A_{\emptyset 1})}(a,b) = \frac{\tau-2}{\tau-1}\, a^{-(\tau-2)/(\tau-1)}\, b^{-1/(\tau-1)}. \qquad (5.2.16)$$


We thus conclude that

$$\mathbb{P}(D'=k) = \frac{\tau-2}{\tau-1} \int_0^1 a^{-(\tau-2)/(\tau-1)} \int_0^a b^{-1/(\tau-1)}\, \mathbb{P}(D'=k \mid A_{\emptyset 1}=b)\,\mathrm{d}b\,\mathrm{d}a \qquad (5.2.17)$$
$$= \frac{\tau-2}{\tau-1}\, \frac{\Gamma(k+1+\delta)}{(k-m)!\,\Gamma(m+1+\delta)} \int_0^1 a^{-(\tau-2)/(\tau-1)} \int_0^a (1-b^{1/(\tau-1)})^{k-m}\, b^{m(m+1+\delta)/(2m+\delta)-1/(\tau-1)}\,\mathrm{d}b\,\mathrm{d}a$$
$$= \frac{\tau-2}{\tau-1}\, \frac{\Gamma(k+1+\delta)}{(k-m)!\,\Gamma(m+1+\delta)} \int_0^1 a^{-(\tau-2)/(\tau-1)} \int_0^a (1-b^{1/(\tau-1)})^{k-m}\, b^{m(m+\delta)/(2m+\delta)}\,\mathrm{d}b\,\mathrm{d}a$$
$$= (\tau-2)(\tau-1)\, \frac{\Gamma(k+1+\delta)}{(k-m)!\,\Gamma(m+1+\delta)} \int_0^1 \int_0^u (1-v)^{k-m}\, v^{m+1+\delta+\delta/m}\,\mathrm{d}v\,\mathrm{d}u,$$

where we have now used the integral transforms $u = a^{1/(\tau-1)}$ and $v = b^{1/(\tau-1)}$. Interchanging the integrals over $u$ and $v$ leads to

$$\mathbb{P}(D'=k) = (\tau-2)(\tau-1)\, \frac{\Gamma(k+1+\delta)}{(k-m)!\,\Gamma(m+1+\delta)} \int_0^1 (1-v)^{k-m+1}\, v^{m+1+\delta+\delta/m}\,\mathrm{d}v \qquad (5.2.18)$$
$$= \frac{2m+\delta}{m^2}\, \frac{\Gamma(m+2+\delta+\delta/m)}{\Gamma(m+\delta)}\, \frac{(k-m+1)\Gamma(k+1+\delta)}{\Gamma(k+4+\delta+\delta/m)},$$

as required.

The statements that $P_k(n)\stackrel{\mathbb{P}}{\longrightarrow} p_k$ and $P_k'(n)\stackrel{\mathbb{P}}{\longrightarrow} p_k'$ follow from the fact that Theorem 5.8 states local weak convergence in probability.

The proof of Theorem 5.8 relies crucially on exchangeability and applications of De Finetti's Theorem. The crucial observation is that De Finetti's Theorem can be used to give an equivalent formulation of $\mathrm{PA}^{(m,\delta)}_n(d)$ that relies on independent random variables. We explain this now.

Finite-graph Polya version of $\mathrm{PA}^{(m,\delta)}_t(d)$

We now explain the finite-graph Polya version of $\mathrm{PA}^{(m,\delta)}_t(d)$. We start by introducing the necessary notation. Let $(\psi_j)_{j\geq 1}$ be independent random variables, where $\psi_j$ has a Beta distribution with parameters $m+\delta$ and $(2j-3)m+\delta(j-1)$, i.e.,

$$\psi_j \sim \mathrm{Beta}\big(m+\delta, (2j-3)m+\delta(j-1)\big). \qquad (5.2.19)$$

Here we recall that $Y$ has a Beta distribution with parameters $(\alpha,\beta)$ when

$$f_Y(y) = y^{\alpha-1}(1-y)^{\beta-1}/B(\alpha,\beta) \quad \text{for } y\in[0,1], \qquad (5.2.20)$$

where $B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$ is the Beta function. We denote this by $Y\sim\mathrm{Beta}(\alpha,\beta)$. By convention, we set $\psi_1=1$. Define

$$\varphi^{(n)}_j = \psi_j \prod_{i=j+1}^n (1-\psi_i), \qquad S^{(n)}_k = \sum_{j=1}^k \varphi^{(n)}_j = \prod_{i=k+1}^n (1-\psi_i). \qquad (5.2.21)$$

Here the latter equality follows simply by induction on $k\geq 1$ (see Exercise 5.16). Finally, let $I^{(n)}_k = [S^{(n)}_{k-1}, S^{(n)}_k)$. We now construct a graph as follows:

• Conditionally on $\psi_1,\ldots,\psi_n$, choose $(U_{k,i})_{k\in[n],i\in[m]}$ as a sequence of independent random variables, with $U_{k,i}$ chosen uniformly at random from the (random) interval $[0, S^{(n)}_{k-1}]$;
• join two vertices $j$ and $k$ if $j<k$ and $U_{k,i}\in I^{(n)}_j$ for some $i\in[m]$ (with multiple edges between $j$ and $k$ if there are several such $i$);
• call the resulting random multigraph the finite-size Polya graph of size $n$.
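The recipe above translates directly into code. The sketch below samples the finite-size Polya graph (with the convention $\psi_1=1$, so that the intervals $I^{(n)}_1,\ldots,I^{(n)}_{k-1}$ cover $[0,S^{(n)}_{k-1})$), and checks two structural facts: the identity of Exercise 5.16, and that every vertex $k\geq 2$ sends exactly $m$ edges to strictly older vertices. Function and variable names are ours.

```python
import random

def finite_polya_graph(n, m, delta, rng):
    """Sample the finite-size Polya graph of size n: psi_1 = 1 by convention,
    psi_j ~ Beta(m+delta, (2j-3)m+delta(j-1)) for j >= 2,
    S_k = prod_{i>k}(1 - psi_i), I_j = [S_{j-1}, S_j), and vertex k sends its
    m edges into the intervals hit by uniforms on [0, S_{k-1}]."""
    psi = [0.0, 1.0] + [rng.betavariate(m + delta, (2 * j - 3) * m + delta * (j - 1))
                        for j in range(2, n + 1)]
    S = [0.0] * (n + 1)
    S[n] = 1.0
    for k in range(n - 1, -1, -1):           # S_k = (1 - psi_{k+1}) S_{k+1}
        S[k] = (1.0 - psi[k + 1]) * S[k + 1]
    edges = []
    for k in range(2, n + 1):
        for _ in range(m):
            u = rng.uniform(0.0, S[k - 1])
            # first interval I_j (S increasing) with u < S_j; fall back to k-1
            j = next((i for i in range(1, k) if u < S[i]), k - 1)
            edges.append((k, j))
    return psi, S, edges

rng = random.Random(3)
n, m, delta = 30, 2, 1.0
psi, S, edges = finite_polya_graph(n, m, delta, rng)
# Exercise 5.16's identity: sum_{j <= k} phi_j^{(n)} = S_k^{(n)}, phi_j = psi_j * S_j.
for k in range(1, n + 1):
    assert abs(sum(psi[j] * S[j] for j in range(1, k + 1)) - S[k]) < 1e-9
assert len(edges) == m * (n - 1) and all(j < k for (k, j) in edges)
print("ok")
```

Note that $S^{(n)}_0=0$ because $\psi_1=1$, which forces all $m$ edges of vertex 2 to land in $I^{(n)}_1$, matching the initial graph of two vertices joined by $m$ edges.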

The main result for $\mathrm{PA}^{(m,\delta)}_n(d)$ is as follows:

Theorem 5.10 (Finite-graph Polya version of $\mathrm{PA}^{(m,\delta)}_n(d)$) Fix $\delta>-m$ and $m\geq 1$. Then, the distribution of $\mathrm{PA}^{(m,\delta)}_n(d)$ is the same as that of the finite-size Polya graph of size $n$.

The nice thing about Theorems 5.8–5.10 is that they also allow for an investigation of the degree distribution and various other quantities of interest in the preferential attachment model. Exercises 5.17–5.19 use Theorem 5.10 to derive properties of the number of multiple edges in $\mathrm{PA}^{(m,\delta)}_n(d)$ for $m=2$.

In terms of the above Polya-point tree, the proof will show that the Gamma variables $Y$ that define the "strength" $\Gamma_w$ are inherited from the Beta random variables $(\psi_k)_{k\in[n]}$, while the "position" variables $x_w$ are inherited from the random variables $(S^{(n)}_k)_{k\in[n]}$ (see Lemmas 5.17 and 5.18 below).

Before moving to the proof of Theorem 5.10, we use it to identify the almost sure limit of the degrees of fixed vertices. For this, we recall that $\chi = (m+\delta)/(2m+\delta)$, and we define the following random variable:

$$F_k = \lim_{n\to\infty} \Big(\frac{n}{k}\Big)^{\chi} \prod_{j=k+1}^n (1-\psi_j). \qquad (5.2.22)$$

In Exercise 5.20 you are asked to prove that the almost sure limit in (5.2.22) exists. Then, the limiting degree distribution of fixed vertices is as follows:

Lemma 5.11 (Degree structure of $\mathrm{PA}^{(m,\delta)}_t(d)$: fixed vertices) Consider $\mathrm{PA}^{(m,\delta)}_t(d)$ with $m\geq 1$ and $\delta>-m$. Then,

$$n^{-(1-\chi)} D_k(n) \stackrel{d}{\longrightarrow} (2m+\delta)\, k^{\chi}\, \psi_k F_k. \qquad (5.2.23)$$

We prove Lemma 5.11 below. We also note that $1-\chi = m/(2m+\delta) = 1/(2+\delta/m) = 1/(\tau-1)$, where $\tau = 3+\delta/m$ denotes the degree power-law exponent of the preferential attachment model. Thus, Lemma 5.11 confirms the scaling in [Volume 1, Theorem 8.2], but gives a different description of the limiting variables. Note further that these limiting random variables are slightly different in $\mathrm{PA}^{(m,\delta)}_n(d)$ compared to $\mathrm{PA}^{(m,\delta)}_n$.

Let us give some insight into the proof of Theorem 5.10, after which we give two full proofs. The first proof relies on Polya urn methods, the second on a direct computation. For the Polya urn proof, we rely on the fact that there is a close connection between the preferential attachment model and the Polya urn model in the following sense: every new connection that a vertex gains can be represented by a new ball added to the urn corresponding to that vertex. The initial number of balls in each urn is equal to $m$. Indeed, recall the discussion above about $D_1(\tau_k)/(D_1(\tau_k)+D_2(\tau_k))$ converging almost surely to a Beta-distributed limit $\beta$. What is particularly nice about this description is that the random variable $D_1(\tau_k)$ has exactly the same distribution as $m$ plus a $\mathrm{Bin}(k,\beta)$ random variable, i.e., conditionally on $\beta$, $D_1(\tau_k)$ is a sum of i.i.d. random variables, this equality in distribution being valid for all $k$. This observation can be extended to give a probabilistic description of $(D_1(\tau_k'(n)), D_2(\tau_k'(n)), \ldots, D_k(\tau_k'(n)))$ that is valid for all $k,n\geq 1$ fixed. Here $\tau_k'(n) = \inf\{t\colon D_{[k]}(t)=n\}$ is the first time where the total degree of the vertices in $[k]$ is equal to $n$. Note that $\tau_k'(2mk) = k$. Further, the random variables $(D_1(\tau_k'(s)), D_2(\tau_k'(s)), \ldots, D_k(\tau_k'(s)))_{s=1}^{2mn}$ determine the law of $(\mathrm{PA}^{(m,\delta)}_t(b))_{t\in[n]}$ uniquely. This explains why $\mathrm{PA}^{(m,\delta)}_n(b)$ can be described in terms of independent random variables. The precise description in terms of the $\psi_j$ in (5.2.19) follows from tracing back the graph construction obtained in this way. Let us now make this intuition precise:

Polya urn proof of Theorem 5.10. Let us first consider a two-urn model, with the number of balls in one urn representing the degree of a particular vertex $k$, and the number of balls in the other representing the sum of the degrees of the vertices $1,\ldots,k-1$. We start this process at the point when $n=k$ and $k$ has connected to precisely $m$ vertices in $[k-1]$. Note that at this point, the urn representing the degree of $k$ has $m$ balls, while the other one has $(2k-3)m$ balls. Consider a time in the evolution of the preferential attachment model when we have $n-1\geq k$ old vertices, and $i-1$ edges between the new vertex $n$ and $[k-1]$. Assume that at this point the degree of $k$ is $d_k$, and the sum of the degrees of the vertices in $[k-1]$ is $d_{[k-1]}$. At this point, the probability that the $i$th edge from $n$ to $[n-1]$ is attached to $k$ is

$$\frac{d_k+\delta}{2m(n-2)+(1+\delta)(i-1)}, \qquad (5.2.24)$$

while the probability that it is connected to a vertex in $[k-1]$ is equal to

$$\frac{d_{[k-1]}+\delta(k-1)}{2m(n-2)+(1+\delta)(i-1)}. \qquad (5.2.25)$$

Thus, conditionally on connecting to $[k]$, the probability that the $i$th edge from $n$ to $[n-1]$ is attached to $k$ is $(d_k+\delta)/Z$, while the probability that it is attached to $[k-1]$ is $(d_{[k-1]}+\delta(k-1))/Z$, where $Z = k\delta + d_{[k]}$ is the normalization constant. Taking into account that the two urns start with $m$ and $(2k-3)m$ balls, respectively, we see that the evolution of the two urns is a Polya urn with strengths $\psi_k$ and $1-\psi_k$, where $\psi_k$ has the $\mathrm{Beta}(m+\delta, (2k-3)m+\delta(k-1))$ distribution. We next use this to complete the proof of Theorem 5.10, using induction. Indeed, using the two-urn process as an inductive input, we can construct the Polya graph defined in Theorem 5.10 in a similar way as was done for Polya urns with multiple colors in (5.1.27).

Let $X_t \in [\lceil t/m\rceil]$ be the vertex receiving the $t$th edge in the sequential model (the other endpoint of this edge being the vertex $\lceil t/m\rceil + 1$). For $t\leq m$, $X_t$ is deterministic (and equal to 1), but starting at $t=m+1$, we have a two-urn model, starting with $m$ balls in each urn. As shown above, the two urns can be described as Polya urns with strengths $1-\psi_2$ and $\psi_2$. Once $t>2m$, $X_t$ can take three values, but conditionally on $X_t\leq 2$, the process continues to be a two-urn model with strengths $1-\psi_2$ and $\psi_2$. To determine the probability of the event that $X_t\leq 2$, we use the above two-urn model with $k=3$, which gives that the probability of the event $X_t\leq 2$ is $1-\psi_3$, at least as long as $t\leq 3m$. Combining these two-urn models, we get a three-urn model with strengths $(1-\psi_2)(1-\psi_3)$, $\psi_2(1-\psi_3)$ and $\psi_3$. Again, this model remains valid for $t>3m$, as long as we condition on $X_t\leq 3$. Continuing inductively, we see that the sequence $X_t$ evolves in stages:

• For $t\in[m]$, the variable $X_t$ is deterministic: $X_t=1$.
• For $t=m+1,\ldots,2m$, the distribution of $X_t\in\{1,2\}$ is described by a two-urn model with strengths $1-\psi_2$ and $\psi_2$, where $\psi_2$ has the Beta distribution in (5.2.19) with $j=2$.
• In general, for $t=m(k-1)+1,\ldots,km$, the distribution of $X_t\in[k]$ is described by a $k$-urn model with strengths

$$\varphi^{(k)}_j = \psi_j \prod_{i=j+1}^k (1-\psi_i), \qquad j=1,\ldots,k. \qquad (5.2.26)$$

Here $\psi_k$ is chosen at the beginning of the $k$th stage, independently of the previously chosen strengths $\psi_1,\ldots,\psi_{k-1}$ (for convenience, we set $\psi_1=1$).

Note that the random variables $\varphi^{(k)}_j$ can be expressed in terms of the random variables introduced in Theorem 5.10 as follows. By (5.2.21), $S^{(n)}_k = \prod_{i=k+1}^n (1-\psi_i)$. This implies that $\varphi^{(k)}_j = \varphi^{(n)}_j/S^{(n)}_k$, which relates the stage strengths $\varphi^{(k)}_j$ to the random variables defined right before Theorem 5.10, and shows that the process derived above is indeed the process given in the theorem.

We next give a direct proof of Theorem 5.10, which is of independent interest:

Direct proof of Theorem 5.10. We continue with the direct proof of Theorem 5.10. In what follows, we let $\mathrm{PA}_n'$ denote the law of the finite-size Polya graph of size $n$. Our aim is to show that $\mathbb{P}(\mathrm{PA}_n' = G) = \mathbb{P}(\mathrm{PA}^{(m,\delta)}_n(d) = G)$. Here, we think of $G$ as being a directed graph, where every vertex has out-degree $m$ and the out-edges are labeled by $[m]$. Thus, the out-edges correspond to the edges from young to old. For this, on the one hand, we compute directly that

$$\mathbb{P}(\mathrm{PA}^{(m,\delta)}_n(d) = G) = \prod_{u\in[n], j\in[m]} \frac{d^{(G)}_{v_j(u)}(u)+\delta}{2m(u-2)+j-1+\delta(u-1)} \qquad (5.2.27)$$
$$= \prod_{s\in[n]} \prod_{i=0}^{d^{(G)}_s-m-1} (i+m+\delta) \prod_{u\in[n], j\in[m]} \frac{1}{2m(u-2)+j-1+\delta(u-1)},$$

where $d^{(G)}_{v_j(u)}(u)$ denotes the degree of $v_j(u)$ at the moment that the $j$th edge of $u$ is attached.

To identify $\mathbb{P}(\mathrm{PA}_n' = G)$, it will be convenient to condition on the Beta variables $(\psi_j)_{j\in[n]}$. We denote the conditional measure by $\mathbb{P}_n$, i.e., for every event $\mathcal{E}$,

$$\mathbb{P}_n(\mathcal{E}) = \mathbb{P}(\mathcal{E} \mid (\psi_j)_{j\in[n]}). \qquad (5.2.28)$$

The advantage of this measure is that now the edges are conditionally independent, which allows us to give exact formulas for the probability of a certain graph occurring. We start by computing the edge probabilities under $\mathbb{P}_n$:

Lemma 5.12 (Edge probabilities in $\mathrm{PA}_n'$ conditionally on Betas) Fix $m\geq 2$ and $\delta>-m$, and consider $\mathrm{PA}_n'$. For any $u>v$ and $j\in[m]$,

$$\mathbb{P}_n(u \stackrel{j}{\longrightarrow} v) = \psi_v\, (1-\psi)_{(v,u)}, \qquad (5.2.29)$$

where $u \stackrel{j}{\longrightarrow} v$ denotes the event that the $j$th edge of $u$ is attached to $v$, and where, for $A\subseteq[n]$,

$$(1-\psi)_A = \prod_{i\in A} (1-\psi_i). \qquad (5.2.30)$$

Proof Recall the construction between (5.2.19) and Theorem 5.10. When we condition on $(\psi_j)_{j\in[n]}$, the only randomness left is that in the uniform random variables $(U_{k,i})_{k\in[n],i\in[m]}$, where $U_{k,i}$ is uniform on $[0, S^{(n)}_{k-1}]$. Then, $u \stackrel{j}{\longrightarrow} v$ occurs precisely when $U_{u,j}\in I^{(n)}_v$, which occurs with conditional $\mathbb{P}_n$-probability

$$\frac{|I^{(n)}_v|}{S^{(n)}_{u-1}}. \qquad (5.2.31)$$

Now we note that

$$|I^{(n)}_v| = S^{(n)}_v - S^{(n)}_{v-1} = \psi_v \prod_{i=v+1}^n (1-\psi_i) = \psi_v\, (1-\psi)_{(v,n]}, \qquad (5.2.32)$$

while, from (5.2.21),

$$S^{(n)}_{u-1} = \prod_{i=u}^n (1-\psi_i) = (1-\psi)_{[u,n]}. \qquad (5.2.33)$$

Taking the ratio yields, for $u>v$,

$$\mathbb{P}_n(u \stackrel{j}{\longrightarrow} v) = \psi_v\, \frac{(1-\psi)_{(v,n]}}{(1-\psi)_{[u,n]}} = \psi_v\, (1-\psi)_{(v,u)}, \qquad (5.2.34)$$

as required.
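A useful sanity check on (5.2.29) is that, for fixed $u$, the probabilities over $v=1,\ldots,u-1$ telescope to 1 (using the convention $\psi_1=1$). The sketch below verifies this for arbitrary hand-picked $\psi$ values; the numbers are ours.

```python
# Check of (5.2.29): conditionally on the psi's, the probabilities that a
# given out-edge of vertex u attaches to v = 1, ..., u-1 sum to one.
# The sum telescopes because psi_1 = 1 by convention.
psi = {1: 1.0, 2: 0.35, 3: 0.2, 4: 0.55, 5: 0.1, 6: 0.48}  # arbitrary values

def edge_prob(u, v):
    """P_n(u -j-> v) = psi_v * prod_{v < i < u} (1 - psi_i), as in (5.2.29)."""
    p = psi[v]
    for i in range(v + 1, u):
        p *= 1.0 - psi[i]
    return p

for u in range(2, 7):
    total = sum(edge_prob(u, v) for v in range(1, u))
    assert abs(total - 1.0) < 1e-12
print("ok")
```

Algebraically, $\sum_{v<u}\psi_v\prod_{v<i<u}(1-\psi_i) = 1-\prod_{i<u}(1-\psi_i)$, which equals 1 exactly because the factor $1-\psi_1$ vanishes.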

In the finite-size Polya graph $\mathrm{PA}_n'$, different edges are conditionally independent, so that we obtain the following corollary:

Corollary 5.13 (Graph probabilities in $\mathrm{PA}_n'$ conditionally on Betas) Fix $m\geq 2$ and $\delta>-m$, and consider $\mathrm{PA}_n'$. For any $G$,

$$\mathbb{P}_n(\mathrm{PA}_n' = G) = \prod_{s=1}^n \psi_s^{a_s} \prod_{s=1}^n (1-\psi_s)^{b_s}, \qquad (5.2.35)$$

where $a_s = a^{(G)}_s$ and $b_s = b^{(G)}_s$ are given by

$$a_s = d^{(G)}_s - m, \qquad b_s = \sum_{u,j} \mathbb{1}_{\{s\in(v_j(u),u)\}}, \qquad (5.2.36)$$

where $E(G) = \{(v_j(u), u)\colon u\in[n], j\in[m]\}$, so that $v_j(u)<u$ is the vertex to which the $j$th edge of $u$ is attached in $G$.

Proof Multiply the factors $\mathbb{P}_n(u \stackrel{j}{\longrightarrow} v_j(u))$ over every edge $(u, v_j(u))\in E(G)$, and collect the powers of $\psi_s$ and $1-\psi_s$.

We note that $a_s$ equals the number of edges in the graph $G$ that point towards $s$. This is relevant, since in (5.2.29) in Lemma 5.12, every older vertex $v$ in an edge receives a factor $\psi_v$. Further, again by (5.2.29) in Lemma 5.12, there is a factor $1-\psi_s$ for every $s\in(v,u)$ and each edge $(u,v)$, so that $b_s$ counts how many factors $1-\psi_s$ occur.

We see from Corollary 5.13 that we obtain expectations of the form $\mathbb{E}[\psi^a(1-\psi)^b]$, where $\psi\sim\mathrm{Beta}(\alpha,\beta)$ and $a,b\geq 0$. These are computed in the following lemma:

Lemma 5.14 (Expectations of powers of Beta random variables) For all $a,b\in\mathbb{N}$ and $\psi\sim\mathrm{Beta}(\alpha,\beta)$,

$$\mathbb{E}[\psi^a(1-\psi)^b] = \frac{(\alpha+a-1)_a\, (\beta+b-1)_b}{(\alpha+\beta+a+b-1)_{a+b}}, \qquad (5.2.37)$$

where we recall that $(x)_m = x(x-1)\cdots(x-m+1)$ denotes the $m$th falling factorial of $x$.

Proof A direct computation based on the density of a Beta random variable in(5.2.20) yields that

E[ψa(1− ψ)b] =B(α+ a, β + b)

B(α, β)=

Γ(α+ β)

Γ(α)Γ(β)

Γ(α+ a)Γ(β + b)

Γ(α+ β + a+ b)(5.2.38)

=(α+ a− 1)a(β + b− 1)b(α+ β + a+ b− 1)a+b

.
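Lemma 5.14 is easy to sanity-check numerically: the Beta-function ratio in (5.2.38) and the falling-factorial ratio in (5.2.37) should agree for any admissible parameters. A minimal sketch (the helper names `falling` and `beta_moment` are ours, not the book's):

```python
import math

def falling(x, m):
    """(x)_m = x (x-1) ... (x-m+1), the m-th falling factorial."""
    out = 1.0
    for i in range(m):
        out *= x - i
    return out

def beta_moment(alpha, beta, a, b):
    """E[psi^a (1-psi)^b] for psi ~ Beta(alpha, beta), via (5.2.38)."""
    B = lambda x, y: math.gamma(x) * math.gamma(y) / math.gamma(x + y)
    return B(alpha + a, beta + b) / B(alpha, beta)

alpha, beta = 2.5, 7.0
for a, b in [(0, 0), (1, 0), (0, 1), (3, 2), (2, 5)]:
    lhs = beta_moment(alpha, beta, a, b)
    rhs = (falling(alpha + a - 1, a) * falling(beta + b - 1, b)
           / falling(alpha + beta + a + b - 1, a + b))
    assert math.isclose(lhs, rhs)
print("Lemma 5.14 identity verified numerically")
```

For instance, $a=1$, $b=0$ recovers the familiar mean $\mathbb{E}[\psi]=\alpha/(\alpha+\beta)$.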

The above computation, when applied to Corollary 5.13, leads to the followingexpression for the probability of observing a particular graph G:


188 Connected components in preferential attachment models

Corollary 5.15 (Graph probabilities in $\mathrm{PA}_n'$) Fix $m\ge 2$ and $\delta>-m$, and consider $\mathrm{PA}_n'$. For any $G$,
$$\mathbb{P}(\mathrm{PA}_n' = G) = \prod_{s=1}^{n-1}\frac{(\alpha+a_s-1)_{a_s}(\beta_s+b_s-1)_{b_s}}{(\alpha+\beta_s+a_s+b_s-1)_{a_s+b_s}},\tag{5.2.39}$$
where $\alpha = m+\delta$, $\beta_s = (2s-3)m+\delta(s-1)$, and $a_s = a_s^{(G)}$ and $b_s = b_s^{(G)}$ are defined in (5.2.36).

Note that the contribution for $s=n$ equals $1$, since $a^{(G)}_n = d^{(G)}_n-m = 0$ almost surely in $\mathrm{PA}_n'$. Corollary 5.15 allows us to complete the direct proof of Theorem 5.10:

Corollary 5.16 (Graph probabilities in $\mathrm{PA}_n'$ and $\mathrm{PA}^{(m,\delta)}_n(d)$) Fix $m\ge 2$ and $\delta>-m$, and consider $\mathrm{PA}_n'$ and $\mathrm{PA}^{(m,\delta)}_n(d)$. For any $G$,
$$\mathbb{P}(\mathrm{PA}_n' = G) = \mathbb{P}\big(\mathrm{PA}^{(m,\delta)}_n(d) = G\big),\tag{5.2.40}$$
where $\alpha = m+\delta$, $\beta_s = (2s-3)m+\delta(s-1)$, and $a_s = a_s^{(G)}$ and $b_s = b_s^{(G)}$ are defined in (5.2.36). Consequently, Corollaries 5.13 and 5.15 also hold for $\mathrm{PA}^{(m,\delta)}_n(d)$.

Proof We evaluate the right-hand side of (5.2.39) in Corollary 5.15 explicitly. Since $\alpha = m+\delta$ and $a_s = d_s^{(G)}-m$,
$$(\alpha+a_s-1)_{a_s} = \prod_{i=0}^{d_s^{(G)}-m-1}(i+m+\delta),\tag{5.2.41}$$
which is promising. We next identify the other terms, for which we start by analyzing $b_s^{(G)}$ as
$$b_s^{(G)} = \sum_{u,j}\mathbb{1}_{\{s\in(v_j(u),u)\}} = \sum_{u,j}\big[\mathbb{1}_{\{s\in(v_j(u),n]\}}-\mathbb{1}_{\{s\in(u,n]\}}\big]\tag{5.2.42}$$
$$= \sum_{u,j}\big[\mathbb{1}_{\{v_j(u)\in[s-1]\}}-\mathbb{1}_{\{u\in[s-1]\}}\big] = d^{(G)}_{[s-1]}-2m(s-1),$$
since $\sum_{u,j}\mathbb{1}_{\{v_j(u)\in[s-1]\}} = d^{(G)}_{[s-1]}-m(s-1)$, while $\sum_{u,j}\mathbb{1}_{\{u\in[s-1]\}} = m(s-1)$. Thus,
$$(\beta_s+b_s-1)_{b_s} = \big((2s-3)m+\delta(s-1)+d^{(G)}_{[s-1]}-2m(s-1)-1\big)_{d^{(G)}_{[s-1]}-2m(s-1)}$$
$$= \big(\delta(s-1)+d^{(G)}_{[s-1]}-m-1\big)_{d^{(G)}_{[s-1]}-2m(s-1)},\tag{5.2.43}$$
while, since $a_s+b_s = d^{(G)}_s-m+d^{(G)}_{[s-1]}-2m(s-1) = d^{(G)}_{[s]}-2ms+m$ and $\alpha = m+\delta$,
$$(\alpha+\beta_s+a_s+b_s-1)_{a_s+b_s}\tag{5.2.44}$$
$$= \big(m+\delta+(2s-3)m+\delta(s-1)+d^{(G)}_{[s]}-2ms+m-1\big)_{d^{(G)}_{[s]}-2ms+m}$$
$$= \big(\delta s+d^{(G)}_{[s]}-m-1\big)_{d^{(G)}_{[s]}-2ms+m}$$
$$= \big(\delta s+2ms-m-1\big)_m\big(\delta s+d^{(G)}_{[s]}-m-1\big)_{d^{(G)}_{[s]}-2ms}.$$

Therefore,
$$\prod_{s=1}^{n-1}\frac{(\beta_s+b_s-1)_{b_s}}{(\alpha+\beta_s+a_s+b_s-1)_{a_s+b_s}}\tag{5.2.45}$$
$$= \prod_{s=1}^{n-1}\frac{1}{(\delta s+2ms-m-1)_m}\,\prod_{s=1}^{n-1}\frac{\big(\delta(s-1)+d^{(G)}_{[s-1]}-m-1\big)_{d^{(G)}_{[s-1]}-2m(s-1)}}{\big(\delta s+d^{(G)}_{[s]}-m-1\big)_{d^{(G)}_{[s]}-2ms}} = \prod_{s=1}^{n-1}\frac{1}{(\delta s+2ms-m-1)_m},$$
since the second product telescopes, with starting and ending value equal to $1$. The proof follows, since, with $s = u-1$,
$$\prod_{s=1}^{n-1}(\delta s+2ms-m-1)_m = \prod_{u=2}^{n}\prod_{i\in[m]}\big(\delta(u-1)+2m(u-1)-m-i\big)$$
$$= \prod_{u\in[n]\setminus\{1\},\,j\in[m]}\big(2m(u-2)+j-1+\delta(u-1)\big),\tag{5.2.46}$$
where $j = m-i+1$, as required.
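The change of variables $s = u-1$ in (5.2.46) can be checked numerically; the sketch below (our own helper names, small made-up values of $n$, $m$, $\delta$; the product over $u$ starts at $u=2$ since $s$ ranges over $[n-1]$) verifies that the falling-factorial product and the reindexed product agree:

```python
import math

def falling(x, m):
    """(x)_m = x (x-1) ... (x-m+1)."""
    out = 1.0
    for i in range(m):
        out *= x - i
    return out

def lhs(n, m, delta):
    """prod_{s=1}^{n-1} (delta*s + 2*m*s - m - 1)_m."""
    out = 1.0
    for s in range(1, n):
        out *= falling(delta * s + 2 * m * s - m - 1, m)
    return out

def rhs(n, m, delta):
    """prod over u = 2..n, j = 1..m of (2m(u-2) + j - 1 + delta(u-1))."""
    out = 1.0
    for u in range(2, n + 1):
        for j in range(1, m + 1):
            out *= 2 * m * (u - 2) + j - 1 + delta * (u - 1)
    return out

for n, m, delta in [(5, 2, 0.5), (6, 3, -0.5), (4, 2, 1.7)]:
    assert math.isclose(lhs(n, m, delta), rhs(n, m, delta))
print("reindexing identity confirmed")
```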

5.3 Proof local convergence for preferential attachment models

Asymptotics of $(\psi_k)_{k\in[n]}$ and $(S^{(n)}_k)_{k\in[n]}$

We next continue to analyse the random variables in Theorem 5.10, to prepare for the proof of local convergence in Theorem 5.8. We start by analyzing $\psi_k$ for large $k$:

Lemma 5.17 (Gamma asymptotics of Beta variables) As $k\to\infty$, $k\psi_k\stackrel{d}{\longrightarrow}Y$, where $Y$ has a Gamma distribution with parameters $r = m+\delta$ and $\lambda = 2m+\delta$.
More precisely, take $f_k(x)$ such that $\mathbb{P}(\psi_k\le f_k(x)) = \mathbb{P}(\chi_k\le x)$, where $\chi_k$ has a Gamma distribution with $r = m+\delta$ and $\lambda = 1$ (so that $Y\stackrel{d}{=}\chi_k/(2m+\delta)$). For every $\varepsilon>0$, there exists $K = K_\varepsilon\ge 1$ such that, for all $k\ge K$ and $x\le(\log k)^2$,
$$\frac{1-\varepsilon}{k(2m+\delta)}\,x\le f_k(x)\le\frac{1+\varepsilon}{k(2m+\delta)}\,x.\tag{5.3.1}$$
Further, $\chi_k\le(\log k)^2$ for all $k\ge K$ with probability at least $1-\varepsilon$.


Proof Fix $x\ge 0$. We compute that
$$\mathbb{P}(k\psi_k\le x) = \frac{\Gamma\big(m+\delta+(2k-3)m+\delta(k-1)\big)}{\Gamma(m+\delta)\,\Gamma\big((2k-3)m+\delta(k-1)\big)}\int_0^{x/k}u^{m+\delta-1}(1-u)^{(2k-3)m+\delta(k-1)-1}\,\mathrm{d}u\tag{5.3.2}$$
$$= (1+o(1))\,\frac{[(2m+\delta)k]^{m+\delta}}{\Gamma(m+\delta)}\,k^{-(m+\delta)}\int_0^{x}u^{m+\delta-1}(1-u/k)^{(2k-3)m+\delta(k-1)-1}\,\mathrm{d}u.$$
For every $u>0$, $(1-u/k)^{(2k-3)m+\delta(k-1)-1}\to\mathrm{e}^{-u(2m+\delta)}$, so that dominated convergence implies that
$$\mathbb{P}(k\psi_k\le x)\to\int_0^x\frac{(2m+\delta)^{m+\delta}u^{m+\delta-1}}{\Gamma(m+\delta)}\,\mathrm{e}^{-u(2m+\delta)}\,\mathrm{d}u,\tag{5.3.3}$$
as required. We refrain from proving the other statements, which are more technical versions of the above argument.

By Lemma 5.17, we see that the Beta random variables $(\psi_k)_{k\in[n]}$ indeed give rise to the Gamma random variables in (5.2.6). This explains the relevance of the Gamma random variables in the Pólya-point graph.
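Lemma 5.17 can be illustrated by simulation, using the standard library's `random.betavariate`; this is an illustrative sketch, not part of the proof. For large $k$, the empirical mean of $k\psi_k$ should be close to the Gamma mean $r/\lambda = (m+\delta)/(2m+\delta)$:

```python
import random

random.seed(42)
m, delta = 2, 1.0
k = 50_000          # large "old" vertex index, so the limit kicks in
trials = 20_000

alpha = m + delta                          # Beta shape = Gamma shape r
beta_k = (2 * k - 3) * m + delta * (k - 1)

# sample k * psi_k with psi_k ~ Beta(alpha, beta_k)
samples = [k * random.betavariate(alpha, beta_k) for _ in range(trials)]
mean = sum(samples) / trials

# limiting law: Gamma with shape r = m + delta and rate lambda = 2m + delta,
# whose mean is r / lambda
target = (m + delta) / (2 * m + delta)
assert abs(mean - target) < 0.02
print(f"empirical mean {mean:.3f} vs limiting mean {target:.3f}")
```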

We continue by analyzing the asymptotics of the random variables $(S^{(n)}_k)_{k\in[n]}$:

Lemma 5.18 (Asymptotics of $S^{(n)}_k$) Recall that $\chi = (m+\delta)/(2m+\delta)$. For every $\varepsilon>0$, there exist $\eta>0$ and $K<\infty$ such that for all $n\ge K$, with probability at least $1-\varepsilon$,
$$\max_{k\in[n]}\Big|S^{(n)}_k-\Big(\frac{k}{n}\Big)^{\chi}\Big|\le\eta,\tag{5.3.4}$$
and
$$\max_{k\in[n]\setminus[K]}\Big|S^{(n)}_k-\Big(\frac{k}{n}\Big)^{\chi}\Big|\le\varepsilon\Big(\frac{k}{n}\Big)^{\chi}.\tag{5.3.5}$$

Proof We give the intuition behind Lemma 5.18. We recall from (5.2.21) that $S^{(n)}_k = \prod_{i=k+1}^{n}(1-\psi_i)$, where $(\psi_k)_{k\in[n]}$ are independent random variables. We start by investigating $S_k' = \prod_{i=1}^{k}(1-\psi_i)$, so that $S^{(n)}_k = S_n'/S_k'$. We write
$$\log S_n' = \sum_{i=1}^{n}\log(1-\psi_i).\tag{5.3.6}$$
Thus, using Lemma 5.17, with $(\chi_i)_{i\ge 1}$ an i.i.d. sequence of Gamma random variables with parameters $r = m+\delta$ and $\lambda = 1$, and using that $k\psi_k\stackrel{d}{\longrightarrow}\chi_k/(2m+\delta)$,
$$\log S_n'\approx-\sum_{i=1}^{n}\psi_i\approx-\frac{1}{2m+\delta}\sum_{i=1}^{n}\chi_i/i\approx-\frac{1}{2m+\delta}\sum_{i=1}^{n}\frac{\mathbb{E}[\chi_i]}{i} = -\frac{m+\delta}{2m+\delta}\log n = -\chi\log n.\tag{5.3.7}$$
Thus, $S^{(n)}_k = S_n'/S_k'\approx(k/n)^{\chi}$, as required. The proof can be completed using martingale techniques and is omitted here.
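The approximation $S^{(n)}_k\approx(k/n)^{\chi}$ is likewise easy to see in simulation. The sketch below (again an illustrative sketch with stdlib Betas; the parameters $(2i-3)m+\delta(i-1)$ are positive for $i\ge 2$, and we only need $i>k$ here) compares one draw of $S^{(n)}_k$ with its deterministic limit:

```python
import random

random.seed(0)
m, delta = 2, 1.0
chi = (m + delta) / (2 * m + delta)
n = 200_000

def sample_S(k, n):
    """One draw of S^(n)_k = prod_{i=k+1}^n (1 - psi_i) with independent Betas."""
    out = 1.0
    for i in range(k + 1, n + 1):
        psi_i = random.betavariate(m + delta, (2 * i - 3) * m + delta * (i - 1))
        out *= 1.0 - psi_i
    return out

for k in [1000, 10_000, 100_000]:
    s = sample_S(k, n)
    target = (k / n) ** chi
    assert abs(s - target) < 0.25 * target   # crude check of the a.s. limit
    print(f"k={k}: S_k={s:.4f}, (k/n)^chi={target:.4f}")
```

The fluctuations of $\log S^{(n)}_k$ around $\chi\log(k/n)$ have variance of order $1/k$, which is why the match is already tight for moderate $k$.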

We use the above analysis to complete the proof of Lemma 5.11:

Proof of Lemma 5.11. We use a conditional second moment method on $D_k(n)$, conditionally on $(\psi_k)_{k\in[n]}$. Denote
$$F^{(n)}_k = \Big(\frac{n}{k}\Big)^{\chi}\prod_{j=k+1}^{n}(1-\psi_j).\tag{5.3.8}$$
Then,
$$\mathbb{E}\big[D_k(n)\mid(\psi_k)_{k\in[n]}\big] = m\sum_{\ell=k+1}^{n}\varphi^{(\ell)}_k,\tag{5.3.9}$$
and
$$\mathrm{Var}\big(D_k(n)\mid(\psi_k)_{k\in[n]}\big)\le m\sum_{\ell=k+1}^{n}\varphi^{(\ell)}_k.\tag{5.3.10}$$
It thus suffices to show that $n^{\chi-1}m\sum_{\ell=k+1}^{n}\varphi^{(\ell)}_k\stackrel{a.s.}{\longrightarrow}\frac{m}{1-\chi}k^{\chi}\psi_kF_k$. For this, we note that since $\varphi^{(\ell)}_k = \psi_k\prod_{j=k+1}^{\ell}(1-\psi_j)$, we arrive at
$$m\sum_{\ell=k+1}^{n}\varphi^{(\ell)}_k = m\psi_k\sum_{\ell=k+1}^{n}\Big(\frac{k}{\ell}\Big)^{\chi}F^{(\ell)}_k = mk^{\chi}\psi_k\sum_{\ell=k+1}^{n}\ell^{-\chi}F^{(\ell)}_k.\tag{5.3.11}$$
Since $\chi\in(0,1)$ and $F^{(\ell)}_k\stackrel{a.s.}{\longrightarrow}F_k$ as $\ell\to\infty$, we obtain that
$$n^{\chi-1}\sum_{\ell=k+1}^{n}\ell^{-\chi}F^{(\ell)}_k\stackrel{a.s.}{\longrightarrow}F_k/(1-\chi).\tag{5.3.12}$$
This completes the proof.

Regularity properties of the Pólya-point tree and overview

Due to Lemma 5.18, in the proof of Theorem 5.8 it will be convenient to deal with 'locations' rather than with ages of vertices to indicate their type. For a vertex $w$, let $X_w = A_w^{\chi}$ denote its location. Lemma 5.18 then shows that $S^{(n)}_k\approx X_w$ for a vertex $w$ of age $A_w\approx k/n$. This will imply that the location of the vertex $wj$, for $j\in[m_-(w)]$, is uniform on $[0,X_w]$, which is a nice property that makes the analysis a little easier at times. Since the type space in terms of ages can be recovered from that in terms of locations, the two multi-type branching processes can be retrieved from one another.

We next discuss some properties of the Pólya-point tree, showing that the rooted tree $\mathcal{T}$ is well defined and that, within the $r$-neighborhood of the root, all random variables used in its description are uniformly bounded:

Lemma 5.19 (Regularity of the Pólya-point tree) Fix $r\ge 1$ and $\varepsilon>0$. Then there exist constants $\eta>0$ and $K<\infty$ such that, with probability at least $1-\varepsilon$, $|B_r(\varnothing)|\le K$, $X_w\ge\eta$ for all $w\in B_r(\varnothing)$, and further $\Gamma_w\le K$ and $\rho_w(\cdot)\le K$ for all $w\in B_r(\varnothing)$, while finally $\min_{w\ne w'\in B_r(\varnothing)}|X_w-X_{w'}|\ge\eta$.

Proof The proof of this lemma is standard, and can, for example, be given by induction on $r$. The last bound follows from the continuous nature of the random variables $X_w$, which implies that $X_w\ne X_{w'}$ for all distinct pairs $w,w'$, so that any finite number of them are, with probability at least $1-\varepsilon$, separated by at least $\eta$.

Let us describe how the proof of Theorem 5.8 is organised. Similarly to the proofs of Theorems 3.11 and 4.1, we investigate the number of $r$-neighborhoods of a specific shape $t$, using a second moment method. Let
$$N_{n,r}(t) = \sum_{v\in[n]}\mathbb{1}_{\{B^{(n)}_r(v)\simeq t\}},\tag{5.3.13}$$
where $B^{(n)}_r(v)$ is the $r$-neighborhood of $v$ in $\mathrm{PA}^{(m,\delta)}_n(d)$. With $B_r(\varnothing)$ the $r$-neighborhood of $\varnothing$ in the Pólya-point graph $\mathcal{T}$, we will show that
$$\frac{N_{n,r}(t)}{n}\stackrel{\mathbb{P}}{\longrightarrow}\mathbb{P}(B_r(\varnothing)\simeq t).\tag{5.3.14}$$
Proving this convergence is much harder than for Theorems 3.11 and 4.1, as the type of a vertex is crucial in determining the number and types of its children, and the type space is continuous. Thus, we need to take the types of vertices into account, so as to make an inductive analysis in $r$ possible in which we couple both the neighborhoods and their types. We start with the first moment, for which we note that
$$\mathbb{E}[N_{n,r}(t)/n] = \mathbb{P}\big(B^{(n)}_r(o_n)\simeq t\big).\tag{5.3.15}$$
We proceed by a coupling approach, showing that we can couple $B^{(n)}_r(o_n)$, together with the types of the vertices involved, with $B_r(\varnothing)$ in such a way that the differences are quite small. After that, we use this as the starting point of a recursive analysis in $r$, which is the second step in the analysis. We complete the proof by studying the second moment of the number of $r$-neighborhoods of a specific shape $t$. Let us now give the details of the convergence of the expected number of one-neighborhoods of a uniform point.

In what follows, it will be useful to regard $B_r(o_i)$ as a rooted graph, in which the vertices receive labels in $[n]$, and the edges receive labels in $[m]$ corresponding to the label of the directed edge that gives rise to the edge (in either possible direction). Thus, the edge $\{u,v\}$ receives label $j$ when $u\stackrel{j}{\rightsquigarrow}v$ or when $v\stackrel{j}{\rightsquigarrow}u$.

Let $t$ be a rooted vertex- and edge-labelled tree of height exactly $r$, where the vertex labels are in $[n]$ and the edge labels in $[m]$. We write $B_r(o) = t$ to denote that the vertices, edges and edge labels in $B_r(o)$ are given by those in $t$. Note


that this is rather different from $B_r(o)\simeq t$ as defined in Definition 2.3, where $t$ is unlabeled and we investigate the isomorphism of $B_r(o)$ and $t$.

Let $V(t)$ denote the vertex labels in $t$. Also, let $\partial V(t)$ denote the vertices at distance exactly $r$ from the root of $t$, and let $\bar V(t) = V(t)\setminus\partial V(t)$ denote the vertex set of the restriction of $t$ to all vertices at distance at most $r-1$ from its root. With this notation in hand, we have the following characterization of the conditional law of $B_r(o)$:

Proposition 5.20 (Law of neighborhood in $\mathrm{PA}^{(m,\delta)}_n(d)$) Fix $m\ge 2$ and $\delta>-m$, and consider $\mathrm{PA}^{(m,\delta)}_n(d)$. Let $t$ be a rooted vertex- and edge-labelled tree with root $o$. Then, as $n\to\infty$,
$$\mathbb{P}\big(B_r(o) = t\mid(\psi_v)_{v\in\bar V(t)}\big) = (1+o_{\mathbb{P}}(1))\prod_{v\in\bar V(t)}\psi_v^{a_v'}\,\mathrm{e}^{-(2m+\delta)v\psi_v((n/v)^{1-\chi}-1)}(1-\psi_v)^{b_v'}$$
$$\times\prod_{s\in[n]\setminus\bar V(t)}\frac{(\alpha+a_s'-1)_{a_s'}(\beta_s+b_s'-1)_{b_s'}}{(\alpha+\beta_s+a_s'+b_s'-1)_{a_s'+b_s'}},\tag{5.3.16}$$
where
$$a_s' = \mathbb{1}_{\{s\in V(t)\}}\sum_{u\in V(t)}\mathbb{1}_{\{u\sim s,\,u>s\}},\tag{5.3.17}$$
$$b_s' = \sum_{u,v\in V(t)}\sum_{j\in[m]}\mathbb{1}_{\{u\stackrel{j}{\rightsquigarrow}v\}}\mathbb{1}_{\{s\in(v,u)\}}.\tag{5.3.18}$$

Proof We start by analyzing the conditional law of $B_r(o)$ given all $(\psi_v)_{v\in[n]}$. After this, we take an expectation w.r.t. $\psi_v$ for $v\notin B_{r-1}(o)$ to arrive at the claim.

Computing the conditional law of $B_r(o)$ given $(\psi_v)_{v\in[n]}$

We start by introducing some useful notation. Define, for $u>v$, the edge probability
$$p(u,v) = \psi_v\prod_{s\in(v,u)}(1-\psi_s).\tag{5.3.19}$$
We first condition on all $(\psi_v)_{v\in[n]}$ to obtain, for a tree $t$,
$$\mathbb{P}_n(B_r(o) = t) = \prod_{v\in V(t)}\psi_v^{a_v'}\prod_{s=1}^{n}(1-\psi_s)^{b_s'}\times\prod_{v\in\bar V(t)}\prod_{u,j\,:\,u\stackrel{j}{\not\rightsquigarrow}v}[1-p(u,v)],\tag{5.3.20}$$
where the first two factors are due to the edges required in $B_r(o) = t$, while the last factor is due to the edges that are not allowed to be present.


The no-further-edge probability

We continue by analyzing the last factor in (5.3.20), which, for simplicity, we call the no-further-edge probability. First of all, since we are exploring the $r$-neighborhood of $o$, the only edges that we need to require not to be present are of the form $u\stackrel{j}{\rightsquigarrow}v$, where $v\in\bar V(t)$ and $u>v$, i.e., edges from vertices younger than those in $\bar V(t)$ that are not edges in $t$. Since $p(u,v)$ is small for all $v\in\bar V(t)$ and $u>v$, and there are only finitely many elements in $\bar V(t)$, we can approximate
$$\prod_{v\in\bar V(t)}\prod_{u,j\,:\,u\stackrel{j}{\not\rightsquigarrow}v}[1-p(u,v)] = (1+o_{\mathbb{P}}(1))\prod_{v\in\bar V(t)}\prod_{u,j\,:\,u>v}[1-p(u,v)].\tag{5.3.21}$$
Recall (5.3.61). We can approximate
$$\prod_{u,j\,:\,u>v}[1-p(u,v)] = \mathrm{e}^{\Theta(1)\sum_{u,j\,:\,u\in(v,n]}p(u,v)^2}\exp\Big(-\sum_{u,j\,:\,u>v}p(u,v)\Big).\tag{5.3.22}$$
We compute
$$\sum_{u,j\,:\,u\in(v,n]}p(u,v) = m\psi_v\sum_{u\in(v,n]}\prod_{s\in(v,u)}(1-\psi_s) = m\psi_v\sum_{u\in(v,n]}\frac{S^{(n)}_v}{S^{(n)}_{u-1}}.\tag{5.3.23}$$
We recall from Lemma 5.18 that $S^{(n)}_{\lceil tn\rceil}\stackrel{a.s.}{\longrightarrow}t^{\chi}$. Further, $u\mapsto S^{(n)}_u$ is increasing and $t\mapsto t^{\chi}$ is continuous, so that, for $v = sn$,
$$\frac{1}{n}\sum_{u\in(sn,n]}\frac{S^{(n)}_{sn}}{S^{(n)}_{u-1}}\stackrel{a.s.}{\longrightarrow}\int_s^1 s^{\chi}t^{-\chi}\,\mathrm{d}t.\tag{5.3.24}$$
We conclude that
$$\frac{1}{n\psi_{sn}}\sum_{u,j\,:\,u\in(sn,n]}p(u,sn)\stackrel{a.s.}{\longrightarrow}m\int_s^1 s^{\chi}t^{-\chi}\,\mathrm{d}t = \frac{m}{1-\chi}\,s^{\chi}\big[1-s^{1-\chi}\big]\tag{5.3.25}$$
$$= (2m+\delta)\,s\big[s^{\chi-1}-1\big],$$
so that, for $v = sn$, $\sum_{u,j\,:\,u>v}p(u,v)\approx(2m+\delta)v\psi_v\big((n/v)^{1-\chi}-1\big)$. Since $(v\psi_v)_{v\in\bar V(t)}$ converges in distribution to a sequence of independent Gamma random variables,
$$\sum_{u,j\,:\,u\in(v,n]}p(u,v)^2\stackrel{\mathbb{P}}{\longrightarrow}0.\tag{5.3.26}$$
Therefore,
$$\prod_{v\in\bar V(t)}\prod_{u,j\,:\,u\stackrel{j}{\not\rightsquigarrow}v}[1-p(u,v)] = (1+o_{\mathbb{P}}(1))\prod_{v\in\bar V(t)}\mathrm{e}^{-(2m+\delta)v\psi_v((n/v)^{1-\chi}-1)}.\tag{5.3.27}$$
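The approximation step (5.3.22) rests on $\prod(1-p)\approx\exp(-\sum p)$ with an error governed by $\sum p^2$. A small numeric illustration (made-up probabilities with roughly the right $u^{-1}$-type decay; the constants are arbitrary):

```python
import math

# made-up small probabilities mimicking p(u, v) ~ c (v/u)^chi / u for u = v+1..n
v, n, c, chi = 50, 100_000, 0.7, 0.6
p = [c * (v / u) ** chi / u for u in range(v + 1, n + 1)]

prod = 1.0
for x in p:
    prod *= 1.0 - x
approx = math.exp(-sum(p))

# the error is governed by sum p^2, which is small when all p are small
assert sum(x * x for x in p) < 1e-2
assert abs(prod - approx) / approx < 1e-2
print(f"product {prod:.5f} vs exp(-sum p) {approx:.5f}")
```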


Conclusion of the proof

We next take the expectation w.r.t. $\psi_s$ for all $s\notin\bar V(t)$ to obtain
$$\mathbb{P}\big(B_r(o) = t\mid(\psi_v)_{v\in\bar V(t)}\big)\tag{5.3.28}$$
$$= (1+o_{\mathbb{P}}(1))\prod_{v\in\bar V(t)}\psi_v^{a_v'}\,\mathrm{e}^{-(2m+\delta)v\psi_v((n/v)^{1-\chi}-1)}(1-\psi_v)^{b_v'}\times\prod_{s\in[n]\setminus\bar V(t)}\frac{(\alpha+a_s'-1)_{a_s'}(\beta_s+b_s'-1)_{b_s'}}{(\alpha+\beta_s+a_s'+b_s'-1)_{a_s'+b_s'}},$$
as required. We further note that $a_s'\in\{0,1\}$ for all $s\in[n]$ and that $a_s' = 0$ unless $s\in V(t)$, so that $a_s' = 1$ can only occur for $s\in\partial V(t)$.

Exploration of $\mathrm{PA}^{(m,\delta)}_n(d)$: one-neighborhood

In this step, we discuss how we can couple the one-neighborhood of a uniformly chosen vertex to that in $\mathcal{T}$, so as to prove Theorem 5.8 for $r = 1$. This will serve as the starting point for an inductive analysis, and it gives insight into the method of proof. The main result states that we can couple $B^{(n)}_1(o_n)$ in $\mathrm{PA}^{(m,\delta)}_n(d)$ and $B_1(\varnothing)$ in $\mathcal{T}$ in such a way that all the variables involved are close:

Lemma 5.21 (Coupling the one-neighborhood of a uniform vertex in $\mathrm{PA}^{(m,\delta)}_n(d)$ and the Pólya-point tree) Fix $\varepsilon>0$. Let $o_n$ be chosen uniformly at random from $[n]$, and let
$$v_{\varnothing 1},\ldots,v_{\varnothing(m+q^{(n)}_\varnothing)}$$
be the neighbors of $o_n$, ordered such that the edge $\{o_n,v_{\varnothing i}\}$ was created before the edge $\{o_n,v_{\varnothing(i+1)}\}$. Then, there exist $\eta>0$ and $K = K(\varepsilon)<\infty$ such that, with probability at least $1-\varepsilon$, $B^{(n)}_1(o_n)$, where $o_n = v_\varnothing$, together with the variables $(S^{(n)}_{v_{\varnothing i}})_{i\ge 1}$, and $B_1(\varnothing)$, together with its positions $(X_{\varnothing i})_{i\ge 1}$, can be coupled such that

(i) $B^{(n)}_1(o_n)\simeq B_1(\varnothing)$ and $|B_1(\varnothing)|\le K$;
(ii) $|X_{\varnothing k}-S^{(n)}_{v_{\varnothing k}}|\le\eta$ for all $k$ for which $\varnothing k\in B_1(\varnothing)$;
(iii) $v_{\varnothing 1},\ldots,v_{\varnothing(m+q_\varnothing)}$ are all distinct and $v_{\varnothing k}\ge\eta n$ for all $v_{\varnothing k}\in B^{(n)}_1(o_n)$;
(iv) $\Gamma_{\varnothing k}\le K$ for all $k$ such that $\varnothing k\in B_1(\varnothing)$.

Consequently, $\mathbb{P}(B^{(n)}_1(o_n)\simeq t)\to\mathbb{P}(B_1(\varnothing)\simeq t)$.

We start with Lemma 5.21, as its proof shares many ingredients with the proof for general $r\ge 1$, while its notation is relatively simple. Below, we extend it to general $r\ge 1$.

Proof We start by proving (i)–(ii). Choose $U_\varnothing$ uniform on $[0,1]$, let $X_\varnothing = U_\varnothing^{\chi}$, and let $X_{\varnothing 1},\ldots,X_{\varnothing(m+q_\varnothing)}$ be the positions of the children of $\varnothing$ in $(\mathcal{T},\varnothing)$. Define $o_n = \lceil nU_\varnothing\rceil$, and define $v_{\varnothing i}$ for $i\in[m]$ by
$$S^{(n)}_{v_{\varnothing i}-1}\le\frac{X_{\varnothing i}}{X_\varnothing}\,S^{(n)}_{v_\varnothing}\le S^{(n)}_{v_{\varnothing i}}.\tag{5.3.29}$$
By Theorem 5.10 and the observation that $(X_{\varnothing i}/X_\varnothing)_{i\in[m]}$ is a collection of i.i.d. uniform random variables on $[0,1]$, it follows that whp $S^{(n)}_{v_{\varnothing i}}$ is close to $X_{\varnothing i}$, as required. In more detail, given $\varepsilon>0$, choose $\eta>0$ and $K>0$ such that the regularity properties in Lemma 5.19 hold for $r = 1$. By Lemma 5.18, for $n$ large enough,
$$|S^{(n)}_{v_\varnothing}-X_\varnothing|\le\eta,\qquad\text{and}\qquad|S^{(n)}_{v_{\varnothing i}}-X_{\varnothing i}|\le\eta\quad\forall i\in[m],\tag{5.3.30}$$
with probability at least $1-2\varepsilon$.

We continue by investigating the limiting distribution of the remaining neighbors $v_{\varnothing(m+1)},\ldots,v_{\varnothing(m+q^{(n)}_\varnothing)}$. Along the way, we will prove that $q^{(n)}_\varnothing = q_\varnothing$ whp. Note that, by Theorem 5.10 and conditionally on $(\psi_k)_{k\in[n]}$, each vertex $v>v_\varnothing$ has $m$ independent chances of being connected to $v_\varnothing$. Let $X_{v,i}$ be the vertex to which the $i$th edge of $v$ is attached. Then the events $\{X_{v,i} = v_\varnothing\}$ for $i\in[m]$ are, conditionally on $(\psi_k)_{k\in[n]}$, independent events with success probability
$$P_{v\to v_\varnothing} = \frac{\varphi^{(v-1)}_{v_\varnothing}}{S^{(v-1)}_{v-1}} = \frac{S^{(n)}_{v_\varnothing}}{S^{(n)}_{v-1}}\,\psi_{v_\varnothing},\tag{5.3.31}$$
where the latter equality follows from (5.2.21).

Denote
$$N_{v_\varnothing}(y) = \sum_{v=v_\varnothing+1}^{\lceil ny\rceil}\sum_{i=1}^{m}\mathbb{1}_{\{X_{v,i}=v_\varnothing\}}.\tag{5.3.32}$$
Our aim is to show that, conditionally on $U_\varnothing$, $\big(N_{v_\varnothing}(y)\big)_{y\in[U_\varnothing,1]}$ converges to a Poisson process on $[U_\varnothing,1]$. For this, we first analyze the asymptotics of the success probabilities $P_{v\to v_\varnothing}$. By Lemma 5.19, $v_\varnothing\ge\eta n$ with probability at least $1-\varepsilon$. Recall that $\psi_k = f_k(\chi_k)$, where $\chi_k$ has a Gamma distribution with parameters $\lambda = 1$ and $r = m+\delta$. Thus, we may apply Lemmas 5.17 and 5.18, which yield
$$(1-\eta)\bar P_{v\to v_\varnothing}\le P_{v\to v_\varnothing}\le(1+\eta)\bar P_{v\to v_\varnothing},\tag{5.3.33}$$
where
$$\bar P_{v\to v_\varnothing} = \frac{\chi_{v_\varnothing}}{(2m+\delta)n}\,\frac{n}{v_\varnothing}\Big(\frac{v_\varnothing}{v}\Big)^{\chi}.\tag{5.3.34}$$
Since the random variables $(X_{v,i})_{v,i}$ are conditionally independent given $(\psi_k)_{k\ge 1}$ (or, equivalently, given $(\chi_k)_{k\ge 1}$), we can couple $N_{v_\varnothing}(y)$ to a Poisson random variable with parameter
$$\frac{\chi_{v_\varnothing}}{(2+\delta/m)U_\varnothing}\int_{U_\varnothing}^{y}\Big(\frac{U_\varnothing}{s}\Big)^{\chi}\,\mathrm{d}s\tag{5.3.35}$$
such that the two random variables differ with vanishing probability. Using that $\Gamma_\varnothing = \chi_{v_\varnothing}$, this proves that $N_{v_\varnothing}(x^{\chi})$ converges to a Poisson random variable with parameter
$$\frac{\chi_{v_\varnothing}}{U_\varnothing^{\psi}}\int_0^{x}\psi s^{\psi-1}\,\mathrm{d}s = \frac{\Gamma_\varnothing}{U_\varnothing^{\psi}}\int_0^{x}\psi s^{\psi-1}\,\mathrm{d}s,\tag{5.3.36}$$
on $[U_\varnothing,1]$. It is not hard to extend this result to the statement that $\big(N_{v_\varnothing}(x^{\chi})\big)_{x\in[U_\varnothing,1]}$ converges in distribution to an inhomogeneous Poisson process with intensity $\Gamma_\varnothing\psi x^{\psi-1}/U_\varnothing^{\psi} = \rho_\varnothing(x)$, as required. In particular, this shows that $q^{(n)}_\varnothing = q_\varnothing$ with probability at least $1-\varepsilon$.
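The defining relation (5.3.29) is an inverse-interval lookup: given a target position, find the vertex whose interval $[S^{(n)}_{v-1}, S^{(n)}_v]$ contains it. Since $k\mapsto S^{(n)}_k$ is increasing, binary search applies. A toy sketch (our own helper `locate`; $\psi_1$ is set to $0$ here, since its Beta parameter would be degenerate in this toy version):

```python
import bisect
import random

random.seed(7)
m, delta, n = 2, 1.0, 1000

# psi[i] for i = 1..n, with psi[1] = 0 as a toy stand-in for the first vertex
psi = [0.0, 0.0] + [random.betavariate(m + delta, (2 * i - 3) * m + delta * (i - 1))
                    for i in range(2, n + 1)]

# S[k] = S^(n)_k = prod_{i=k+1}^n (1 - psi_i), built backwards
S = [0.0] * (n + 1)
S[n] = 1.0
for k in range(n - 1, -1, -1):
    S[k] = S[k + 1] * (1.0 - psi[k + 1])

def locate(x):
    """Smallest v with S_{v-1} <= x <= S_v, i.e. the vertex owning position x."""
    v = bisect.bisect_left(S, x)   # valid because S is increasing in k
    return max(v, 1)

x = 0.4 * S[n]
v = locate(x)
assert S[v - 1] <= x <= S[v]
print(f"position {x:.3f} belongs to vertex {v}")
```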

We conclude the proof of (i)–(ii) by investigating the differences between $X_{\varnothing i}$ and $S^{(n)}_{v_{\varnothing i}}$, using Lemma 5.18 to bound the difference between $S^{(n)}_{v_{\varnothing i}}$ and $(v_{\varnothing i}/n)^{\chi}$. This yields that, for $n$ large enough and with probability at least $1-3\varepsilon$, $q^{(n)}_\varnothing = q_\varnothing$ and the bounds $q_\varnothing+m\le K$, $\Gamma_\varnothing = \chi_{v_\varnothing}\le K$ and
$$|X_{\varnothing k}-S^{(n)}_{v_{\varnothing k}}|\le\eta\qquad\text{for all }k\in[m+q_\varnothing]\setminus[m]\tag{5.3.37}$$
hold, as required. This completes the proof of parts (i)–(ii).

For (iii), we note that by (5.3.30) and (5.3.37), and another application of Lemma 5.18, there exists an $\eta>0$ such that, with probability at least $1-4\varepsilon$, $v_{\varnothing k}\ge\eta n$ and $|v_{\varnothing k}-v_{\varnothing l}|\ge\eta n$ for all distinct $k,l\le m+q_\varnothing$. As a result, $(v_{\varnothing k})_{k\in[m+q_\varnothing]}$ are all distinct. This completes the proof of (iii).

For (iv), we note that the claimed bounds would obviously be true if the sequence $(\chi_{v_{\varnothing k}})_{k\in[m+q_\varnothing]}$ were independent. However, this is not the case, since the fact that $v_{\varnothing k}$ is a neighbor of $v_\varnothing$ 'tilts' the distribution of these random variables. We will therefore provide a coupling between $(\chi_{v_{\varnothing k}})_{k\in[m+q_\varnothing]}$ and a collection of independent random variables. Let us now provide the details.

We assume that $\Gamma_\varnothing = \chi_{v_\varnothing}\le K$, that $q^{(n)}_\varnothing = q_\varnothing$, and that $(v_{\varnothing k})_{k\in[m+q_\varnothing]}$ are all distinct with $\min_{k\in[m+q_\varnothing]}v_{\varnothing k}\ge\eta n$. Let $\mathcal{E}$ denote the event that $v_\varnothing$ is the uniform random vertex and that the neighbors of $v_\varnothing$ are $(v_{\varnothing k})_{k\in[m+q_\varnothing]}$. Let $\chi^{\mathcal{E},\gamma_\varnothing}$ denote the conditional distribution of $(\chi_k)_{k\in[n]\setminus\{v_\varnothing\}}$ conditioned on $\Gamma_\varnothing = \gamma_\varnothing$ and $\mathcal{E}$. Further, let $(\bar\chi_k)_{k\in[n]}$ denote an independent collection of random variables with distribution
$$\bar\chi_{v_{\varnothing k}}\sim Y^{\star}_{v_{\varnothing k}}\ \ \forall k\in[m],\qquad\bar\chi_k\sim Y_k\ \ \forall k\in[n]\setminus\{v_{\varnothing 1},\ldots,v_{\varnothing m}\}.\tag{5.3.38}$$
Let $\rho(\cdot\mid\mathcal{E},\chi_{v_\varnothing} = \gamma_\varnothing)$ denote the density of the (multi-dimensional) random variable $(\chi_k)_{k\in[n]\setminus\{v_\varnothing\}}$, and let $\mathbb{P}$ be the joint distribution of $\mathrm{PA}^{(m,\delta)}_n(d)$ and the random variables $(\chi_k)_{k\in[n]}$. By Bayes' theorem,
$$\rho(\cdot\mid\mathcal{E},\chi_{v_\varnothing} = \gamma_\varnothing) = \frac{\mathbb{P}(\mathcal{E}\mid\cdot,\chi_{v_\varnothing} = \gamma_\varnothing)}{\mathbb{P}(\mathcal{E}\mid\chi_{v_\varnothing} = \gamma_\varnothing)}\,\rho_\varnothing(\cdot),\tag{5.3.39}$$
where $\rho_\varnothing(\cdot)$ is the original density of the random variables $(\chi_k)_{k\in[n]\setminus\{v_\varnothing\}}$. We denote the corresponding probability distribution and expectation by $\mathbb{P}_\varnothing$ and $\mathbb{E}_\varnothing$, respectively. We conclude that we have to compare $\mathbb{P}(\mathcal{E}\mid\cdot,\chi_{v_\varnothing})$ to $\mathbb{P}(\mathcal{E}\mid\chi_{v_\varnothing})$. Since we are interested in the distribution of $(\chi_k)_{k\in[n]}$, this amounts to comparing $\mathbb{P}(\mathcal{E}\mid(\chi_k)_{k\in[n]})$ to $\mathbb{P}(\mathcal{E}\mid\chi_{v_\varnothing})$. For this, we need to compute $\mathbb{P}(\mathcal{E}\mid(\chi_k)_{k\in[n]})$ (and note that $\mathbb{P}(\mathcal{E}\mid\chi_{v_\varnothing}) = \mathbb{E}_\varnothing[\mathbb{P}(\mathcal{E}\mid(\chi_k)_{k\in[n]})]$). This is our next step.


By Theorem 5.10, we can explicitly compute
$$\mathbb{P}(\mathcal{E}\mid(\chi_k)_{k\in[n]}) = m!\prod_{i=1}^{m}P_{v_\varnothing\to v_{\varnothing i}}\prod_{j=1}^{q_\varnothing}mP_{v_{\varnothing(m+j)}\to v_\varnothing}\big(1-P_{v_{\varnothing(m+j)}\to v_\varnothing}\big)^{m-1}\tag{5.3.40}$$
$$\times\prod_{v>v_\varnothing\,:\,v\notin\{v_{\varnothing(m+1)},\ldots,v_{\varnothing(m+q_\varnothing)}\}}\big(1-P_{v\to v_\varnothing}\big)^{m},$$
by the conditional independence given $(\psi_k)_{k\in[n]}$, which is equivalent to conditional independence given $(\chi_k)_{k\in[n]}$. We rewrite
$$\mathbb{P}(\mathcal{E}\mid(\chi_k)_{k\in[n]}) = m!\prod_{i=1}^{m}P_{v_\varnothing\to v_{\varnothing i}}\prod_{j=1}^{q_\varnothing}\frac{mP_{v_{\varnothing(m+j)}\to v_\varnothing}}{1-P_{v_{\varnothing(m+j)}\to v_\varnothing}}\prod_{v>v_\varnothing}\big(1-P_{v\to v_\varnothing}\big)^{m},\tag{5.3.41}$$
where $P_{v\to v'}$ is given, using (5.2.21), by
$$P_{v\to v'} = \frac{\varphi^{(v-1)}_{v'}}{S^{(v-1)}_{v-1}} = \frac{S^{(n)}_{v'}}{S^{(n)}_{v-1}}\,\psi_{v'}.\tag{5.3.42}$$

We now study the asymptotics of this formula. Using Lemma 5.18, for every $\eta>0$ there exists $n_0$ sufficiently large such that, for $n\ge n_0$ and with probability at least $1-\varepsilon$,
$$(1-\eta)\,\mathbb{P}(\mathcal{E}\mid(\chi_k)_{k\in[n]})\tag{5.3.43}$$
$$\le m!\prod_{i=1}^{m}\psi_{v_{\varnothing i}}\Big(\frac{v_{\varnothing i}}{v_\varnothing}\Big)^{\chi}\prod_{j=1}^{q_\varnothing}m\psi_{v_\varnothing}\Big(\frac{v_\varnothing}{v_{\varnothing(m+j)}}\Big)^{\chi}\,\mathrm{e}^{-m\psi_{v_\varnothing}\sum_{v>v_\varnothing}(v_\varnothing/v)^{\chi}}\le(1+\eta)\,\mathbb{P}(\mathcal{E}\mid(\chi_k)_{k\in[n]}).$$
To bound $\mathbb{P}(\mathcal{E}\mid\chi_{v_\varnothing}) = \mathbb{E}_\varnothing[\mathbb{P}(\mathcal{E}\mid(\chi_k)_{k\in[n]})]$, we also need a good bound on the "bad" event, for which we use the deterministic upper bound
$$\mathbb{P}(\mathcal{E}\mid(\chi_k)_{k\in[n]})\le m!\prod_{i=1}^{m}P_{v_\varnothing\to v_{\varnothing i}}\prod_{j=1}^{q_\varnothing}mP_{v_{\varnothing(m+j)}\to v_\varnothing}\le m!\,(m\psi_{v_\varnothing})^{q_\varnothing}\prod_{i=1}^{m}\psi_{v_{\varnothing i}}\tag{5.3.44}$$
$$\le C'\,m!\prod_{i=1}^{m}\psi_{v_{\varnothing i}}\Big(\frac{v_{\varnothing i}}{v_\varnothing}\Big)^{\chi}\prod_{j=1}^{q_\varnothing}m\psi_{v_\varnothing}\Big(\frac{v_\varnothing}{v_{\varnothing(m+j)}}\Big)^{\chi}\,\mathrm{e}^{-m\psi_{v_\varnothing}\sum_{v>v_\varnothing}(v_\varnothing/v)^{\chi}},$$
where $C' = \eta^{-(m+K)}\sup_{n\ge 1}\mathrm{e}^{mnf_{\eta n}(K)}<\infty$.

We conclude that, with probability at least $1-\varepsilon/2$ with respect to $\mathbb{P}_\varnothing$,
$$\sqrt{1-\eta}\,\prod_{i=1}^{m}\frac{\psi_{v_{\varnothing i}}}{\mathbb{E}_\varnothing[\psi_{v_{\varnothing i}}]}\le\frac{\mathbb{P}(\mathcal{E}\mid(\chi_k)_{k\in[n]})}{\mathbb{P}(\mathcal{E}\mid\chi_{v_\varnothing})}\le\sqrt{1+\eta}\,\prod_{i=1}^{m}\frac{\psi_{v_{\varnothing i}}}{\mathbb{E}_\varnothing[\psi_{v_{\varnothing i}}]}.\tag{5.3.45}$$
With the help of Lemma 5.17, with probability at least $1-\varepsilon$,
$$(1-\eta)\prod_{i=1}^{m}\frac{\chi_{v_{\varnothing i}}}{\mathbb{E}_\varnothing[\chi_{v_{\varnothing i}}]}\le\frac{\mathbb{P}(\mathcal{E}\mid(\chi_k)_{k\in[n]})}{\mathbb{P}(\mathcal{E}\mid\chi_{v_\varnothing})}\le(1+\eta)\prod_{i=1}^{m}\frac{\chi_{v_{\varnothing i}}}{\mathbb{E}_\varnothing[\chi_{v_{\varnothing i}}]}.\tag{5.3.46}$$

The factors $\chi_{v_{\varnothing i}}/\mathbb{E}_\varnothing[\chi_{v_{\varnothing i}}]$ correspond to the size-biasing of $\chi_{v_{\varnothing k}}$ for $k\in[m]$ indicated in (5.3.38). Since this is of product structure, the asymptotic independence of $(\chi_k)_{k\in[n]\setminus\{v_\varnothing\}}$ remains.

We conclude that, with probability at least $1-\varepsilon$ with respect to $\mathbb{P}_\varnothing$,
$$(1-\eta)\,\bar\rho(\cdot)\le\rho(\cdot\mid\mathcal{E},\chi_{v_\varnothing} = \gamma_\varnothing)\le(1+\eta)\,\bar\rho(\cdot),\tag{5.3.47}$$
where $\bar\rho$ is the density of $(\bar\chi_k)_{k\in[n]\setminus\{v_\varnothing\}}$, whose law we denote by $\bar{\mathbb{P}}$.

To complete the proof, we use [Volume 1, (2.2.8)] to compute that
$$d_{\mathrm{TV}}\big(\rho(\cdot\mid\mathcal{E},\chi_{v_\varnothing}),\bar\rho(\cdot)\big) = 1-\int\rho(\vec\chi\mid\mathcal{E},\chi_{v_\varnothing})\wedge\bar\rho(\vec\chi)\,\mathrm{d}\vec\chi.\tag{5.3.48}$$
Let $\Omega_\varnothing$ denote the event where (5.3.47) holds, so that $\mathbb{P}_\varnothing(\Omega_\varnothing\mid\mathcal{E},\chi_{v_\varnothing})\ge 1-\eta$. Then, we bound
$$d_{\mathrm{TV}}\big(\rho(\cdot\mid\mathcal{E},\chi_{v_\varnothing}),\bar\rho(\cdot)\big)\le 1-\int_{\Omega_\varnothing}\rho(\vec\chi\mid\mathcal{E},\chi_{v_\varnothing})\wedge\bar\rho(\vec\chi)\,\mathrm{d}\vec\chi.\tag{5.3.49}$$
On $\Omega_\varnothing$, (5.3.47) holds, so that $\bar\rho(\vec\chi)\ge\rho(\vec\chi\mid\mathcal{E},\chi_{v_\varnothing})/(1+\eta)$. Thus,
$$d_{\mathrm{TV}}\big(\rho(\cdot\mid\mathcal{E},\chi_{v_\varnothing}),\bar\rho(\cdot)\big)\le 1-\frac{1}{1+\eta}\int_{\Omega_\varnothing}\rho(\vec\chi\mid\mathcal{E},\chi_{v_\varnothing})\,\mathrm{d}\vec\chi\tag{5.3.50}$$
$$= 1-\frac{1}{1+\eta}\,\mathbb{P}_\varnothing(\Omega_\varnothing\mid\mathcal{E},\chi_{v_\varnothing})\le 1-\frac{1-\eta}{1+\eta} = \frac{2\eta}{1+\eta}\le 2\eta.$$
We conclude that
$$d_{\mathrm{TV}}\big(\rho(\cdot\mid\mathcal{E},\chi_{v_\varnothing}),\bar\rho(\cdot)\big)\le 2\eta.\tag{5.3.51}$$
Choosing $\eta$ so small that $2\eta\le\varepsilon$, this shows that we can couple $(\chi^{\mathcal{E},\gamma_\varnothing}_k)_{k\in[n]\setminus\{v_\varnothing\}}$ and $(\bar\chi_k)_{k\in[n]\setminus\{v_\varnothing\}}$ such that they agree with probability at least $1-\varepsilon$, as required. It is straightforward to show that (iv) holds for the random variables $(\bar\chi_k)_{k\in[n]\setminus\{v_\varnothing\}}$, as these are independent Gamma random variables. This completes the proof of (iv).

The fact that $\mathbb{P}(B^{(n)}_1(o_n)\simeq t)\to\mathbb{P}(B_1(\varnothing)\simeq t)$ follows directly from the fact that we have coupled such that $q^{(n)}_\varnothing = q_\varnothing$, and this occurs with probability at least $1-\varepsilon$. Since $\varepsilon>0$ is arbitrary, the statement follows.
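The total-variation bound (5.3.47)–(5.3.51) only uses that the two densities have ratio within $(1\pm\eta)$ on most of the mass. A discrete toy example (made-up densities, not from the book) shows the mechanics of the computation in (5.3.48):

```python
# two made-up discrete densities whose ratio is within (1 +/- eta)
eta = 0.05
rho = [0.2, 0.3, 0.25, 0.15, 0.1]        # reference density rho-bar
tilt = [1.04, 0.97, 1.02, 0.96, 1.01]    # pointwise ratios, all in (1-eta, 1+eta)

unnorm = [r * t for r, t in zip(rho, tilt)]
Z = sum(unnorm)
rho_cond = [x / Z for x in unnorm]       # renormalized "conditional" density

# d_TV = 1 - sum of pointwise minima, the discrete analogue of (5.3.48)
dtv = 1.0 - sum(min(a, b) for a, b in zip(rho, rho_cond))
assert dtv <= 2 * eta
print(f"d_TV = {dtv:.4f} <= 2*eta = {2 * eta}")
```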

Exploration of $\mathrm{PA}^{(m,\delta)}_n(d)$: general neighborhood

In this step, we discuss how we can explore the $r$-neighborhood of a uniformly chosen vertex, so as to prove Theorem 5.8.

Proposition 5.22 (Coupling a general neighborhood of a uniform vertex in $\mathrm{PA}^{(m,\delta)}_n(d)$ and the Pólya-point tree) Fix $\varepsilon>0$ and $r\ge 1$. Let $o_n$ be chosen uniformly at random from $[n]$. Let $(v_w)_w$ be the vertices that are within distance $r$ from $o_n$, ordered such that the edge $\{v_w,v_{wi}\}$ was created before the edge $\{v_w,v_{w(i+1)}\}$. Then, there exist $\eta>0$ and $K = K(\varepsilon)<\infty$ such that, with probability at least $1-\varepsilon$, $B^{(n)}_r(o_n)$, where $o_n = v_\varnothing$, together with the variables $(S^{(n)}_{v_w})_{|w|\le r}$, and $B_r(\varnothing)$, together with its positions $(X_w)_w$, can be coupled such that

(1) $B^{(n)}_r(o_n)\simeq B_r(\varnothing)$ and $|B_r(\varnothing)|\le K$;
(2) $|X_w-S^{(n)}_{v_w}|\le\eta$ for all $w\in B_r(\varnothing)$;
(3) $(v_w)_{w\in B_r(\varnothing)}$ are all distinct and $v_w\ge\eta n$ for all $v_w\in B^{(n)}_r(o_n)$;
(4) $\Gamma_w\le K$ for all $w\in B_r(\varnothing)$.

Consequently, $\mathbb{P}(B^{(n)}_r(o_n)\simeq t)\to\mathbb{P}(B_r(\varnothing)\simeq t)$.

Proposition 5.22 is the natural extension of Lemma 5.21 to $r\ge 1$, and we will see that its proof runs accordingly.

Proof For $r = 1$, this follows from Lemma 5.21, which initializes the induction argument.

Assume by induction that the statement holds for $r-1$, and fix $B_{r-1}(\varnothing)$, $(X_w)_{w\in B_{r-1}(\varnothing)}$, $(\Gamma_w)_{w\in B_{r-1}(\varnothing)}$, $(\chi_w)_{w\in B_{r-1}(\varnothing)}$ and $(v_w)_{w\in B_{r-1}(\varnothing)}$ such that Proposition 5.22 holds for them (a statement that holds with probability at least $1-\varepsilon$ according to the induction hypothesis). We now wish to extend these statements to $r$.

For this, fix a vertex $w\in\partial B_{r-1}(\varnothing) = B_{r-1}(\varnothing)\setminus B_{r-2}(\varnothing)$. We explore the neighborhood of $v_w$ in much the same way as we explored the neighborhood of $v_\varnothing$ in the proof of Lemma 5.21, but we now have much more information that we need to keep in check. To this end, we note that for all $w'\in B_{r-1}(\varnothing)$, the neighborhood of $v_{w'}$ is already determined by our conditioning on $(v_w)_{w\in B_{r-1}(\varnothing)}$. This implies in particular that none of the edges sent out of $v_{w'}$ can be attached to a vertex $v\in B^{(n)}_{r-1}(o_n)$, except when $w'$ is of type R and $v_{w'}$ happens to be the parent of $v_w$, in which case the edge between $v_{w'}$ and $v_w$ is already present. To determine the children of type L of $v_w$, we therefore have to condition on not hitting the set $B^{(n)}_{r-1}(o_n)$. Apart from this, the process of determining the children of $v_w$ is exactly the same as that for $v_\varnothing$ in the proof of Lemma 5.21. Since $|B^{(n)}_{r-1}(o_n)|\le K$, $v_{w'}\ge\eta n$ and $\chi_{v_{w'}}\le K$ for all $w'\in B^{(n)}_{r-1}(o_n)$, it follows that
$$\sum_{v_{w'}\in B^{(n)}_{r-1}(o_n)}\varphi^{(v_w)}_{v_{w'}}\le\frac{C}{n}\tag{5.3.52}$$
for some $C>0$. This implies that conditioning on not hitting $B^{(n)}_{r-1}(o_n)$ has only a negligible effect on the distribution of the children of $v_w$. We thus proceed as in the proof of Lemma 5.21 to obtain a coupling between a sequence of i.i.d. values $(X_{wi})_{i\in[m]}$ of type L and the children $(v_{wi})_{i\in[m]}$. As before, we have that $|S^{(n)}_{v_{wi}}-X_{wi}|\le\eta$, for $n$ large enough, with probability at least $1-\varepsilon$.

We can continue this process for all $v_w$ with $w\in\partial B_{r-1}(\varnothing)$, thus producing the set $\mathcal{L}_r$ consisting of all the type-L children of the vertices $v_w$ with $w\in\partial B_{r-1}(\varnothing)$. It is not hard to see that, with probability tending to $1$ as $n\to\infty$, the set $\mathcal{L}_r$ has no intersection with $B^{(n)}_{r-1}(o_n)$, so we will assume this in the remainder of the proof.

Next, we continue with the children of type R of vertices in $\partial B_{r-1}(\varnothing)$. Assume that we have already determined the children of type R of the vertices in $\mathcal{U}_{r-1}\subseteq B^{(n)}_{r-1}(o_n)\setminus B^{(n)}_{r-2}(o_n)$. Denote the set of children obtained so far by $\mathcal{R}(\mathcal{U}_{r-1})$. We decompose this set as $\mathcal{R}(\mathcal{U}_{r-1}) = \bigcup_{i=1}^{m}\mathcal{R}^{(i)}(\mathcal{U}_{r-1})$, where $\mathcal{R}^{(i)}(\mathcal{U}_{r-1}) = \{v\,:\,X_{v,i}\in\mathcal{U}_{r-1}\}$. Now consider a vertex $v_w$ with $w\in\partial B_{r-1}(\varnothing)$ such that $v_w\notin\mathcal{U}_{r-1}$. Conditioning on the graph obtained so far is again not difficult, and now amounts to two conditions:

(1) $X_{v,i}\ne v_w$ if $v\in B^{(n)}_{r-1}(o_n)\cup\mathcal{U}_{r-1}$, since all the edges coming out of this set have already been determined;
(2) for $v\notin B^{(n)}_{r-1}(o_n)\cup\mathcal{U}_{r-1}$, the probability that $v_w$ receives the $i$th edge from $v$ is different from the probability given in (5.3.42), since the random variable $X_{v,i}$ has been probed before. Indeed, we know that $X_{v,i}\notin B^{(n)}_{r-2}(o_n)$, since otherwise $v$ would have sent out an edge to a vertex in $B^{(n)}_{r-2}(o_n)$, which would mean that $v$ is a child of type R in $B^{(n)}_{r-1}(o_n)$, a contradiction. We also know that $X_{v,i}\notin\mathcal{U}_{r-1}$, since otherwise $v\in\mathcal{R}^{(i)}(\mathcal{U}_{r-1})$.

By (2), instead of (5.3.42), we have to use the modified probability
$$\tilde P_{v\to v'} = \frac{\varphi^{(v-1)}_{v'}}{\tilde S^{(v-1)}_{v-1}},\tag{5.3.53}$$
where
$$\tilde S^{(v-1)}_{v-1} = \sum_{u\in[v-1]\,:\,u\notin B^{(n)}_{r-1}(o_n)\cup\,\mathcal{U}_{r-1}}\varphi^{(v-1)}_u.\tag{5.3.54}$$
Since, by the induction hypothesis, $\tilde S^{(v-1)}_{v-1}\le S^{(v-1)}_{v-1}\le\tilde S^{(v-1)}_{v-1}+C/n$ for some $C>0$, we can still invoke Lemma 5.18 to approximate $\tilde P_{v\to v'}$ by $\bar P_{v\to v'}$, which equals
$$\bar P_{v\to v'} = \frac{1}{n}\,\frac{\chi_{v'}}{2m+\delta}\,\frac{n}{v'}\Big(\frac{v'}{v}\Big)^{\chi}.\tag{5.3.55}$$
From here onwards, the proof is a straightforward adaptation of the proof of Lemma 5.21, and we omit further details.

The fact that $\mathbb{P}(B^{(n)}_r(o_n)\simeq t)\to\mathbb{P}(B_r(\varnothing)\simeq t)$ follows since we have coupled the neighborhoods such that, with probability at least $1-\varepsilon$, $q^{(n)}_w = q_w$ for all $|w|\le r$. Since $\varepsilon>0$ is arbitrary, the statement follows.

Exploration of $\mathrm{PA}^{(m,\delta)}_n(d)$: two neighborhoods

In this part, we argue that
$$\mathbb{P}\big(B^{(n)}_r(o^{(1)}_n)\simeq t,\,B^{(n)}_r(o^{(2)}_n)\simeq t\big)\to\mathbb{P}(B_r(\varnothing)\simeq t)^2,\tag{5.3.56}$$
where $o^{(1)}_n, o^{(2)}_n$ are two vertices chosen uniformly at random and independently from $[n]$.

A second moment method then immediately shows that, for every rooted tree $t$,
$$\frac{1}{n}\sum_{v\in[n]}\mathbb{1}_{\{B^{(n)}_r(v)\simeq t\}}\stackrel{\mathbb{P}}{\longrightarrow}\mathbb{P}(B_r(\varnothing)\simeq t),\tag{5.3.57}$$
which completes the proof of the local weak convergence in probability in Theorem 5.8.


Again, in what follows, we regard $B_r(o_i)$ as a rooted graph, where the vertices receive labels in $[n]$, and the edges receive labels in $[m]$ corresponding to the label of the directed edge that gives rise to the edge (in either possible direction). Thus, the edge $\{u,v\}$ receives label $j$ when $u\stackrel{j}{\rightsquigarrow} v$ or when $v\stackrel{j}{\rightsquigarrow} u$.

Let $t_1$ and $t_2$ be two rooted vertex- and edge-labelled trees of height exactly $r$, where the vertex labels are in $[n]$ and the edge labels in $[m]$. We write $B_r(o_i)=t_i$ to denote that the vertices, edges and edge labels in $B_r(o_i)$ are given by those in $t_i$. Note that this is rather different from $B_r(o_i)\simeq t_i$ as defined in Definition 2.3, where the tree is unlabeled and we investigate the isomorphism of $B_r(o_i)$ and $t_i$.

Let $V(t_i)$ denote the vertex labels in $t_i$. Also, let $\partial V(t_i)$ denote the vertices at distance exactly $r$ from the root of $t_i$, and let $V^\circ(t_i)=V(t_i)\setminus\partial V(t_i)$ denote the restriction of $t_i$ to all vertices at distance at most $r-1$ from its root. With this notation in hand, we have the following characterization of the conditional law of $B_r(o_1)$ and $B_r(o_2)$, which is a generalization of Proposition 5.20 to two neighborhoods:

Proposition 5.23 (Law of neighborhoods in $\mathrm{PA}^{(m,\delta)}_n(d)$)    Fix $m\ge 2$ and $\delta>-m$, and consider $\mathrm{PA}^{(m,\delta)}_n(d)$. Let $t_1$ and $t_2$ be two rooted vertex- and edge-labelled trees with disjoint vertex sets and roots $o_1$ and $o_2$, respectively. Then, as $n\to\infty$,

$$\mathbb P\big(B_r(o_1)=t_1, B_r(o_2)=t_2 \mid (\psi_v)_{v\in V(t_1)\cup V(t_2)}\big) \qquad (5.3.58)$$
$$= (1+o_{\mathbb P}(1)) \prod_{v\in V(t_1)\cup V(t_2)} \psi_v^{a'_v}\, \mathrm e^{-(2m+\delta)v\psi_v(1-(v/n)^{1-\chi})}\,(1-\psi_v)^{b'_v}$$
$$\times \prod_{s\in[n]\setminus(V(t_1)\cup V(t_2))} \frac{(\alpha+a'_s-1)_{a'_s}(\beta_s+b'_s-1)_{b'_s}}{(\alpha+\beta_s+a'_s+b'_s-1)_{a'_s+b'_s}},$$

where

$$a'_s = \mathbb 1_{\{s\in V(t_1)\cup V(t_2)\}} \sum_{u\in V(t_1)\cup V(t_2)} \mathbb 1_{\{u\sim s,\, u>s\}}, \qquad (5.3.59)$$

$$b'_s = \sum_{u,v\in V(t_1)\cup V(t_2)} \sum_{j\in[m]} \mathbb 1_{\{u\stackrel{j}{\rightsquigarrow} v\}}\, \mathbb 1_{\{s\in(v,u]\}}. \qquad (5.3.60)$$

Proof    The proof follows the steps in the proof of Proposition 5.20. We start by analyzing the conditional law of $B_r(o_1)$ and $B_r(o_2)$ given all $(\psi_v)_{v\in[n]}$. After this, we take an expectation w.r.t. $\psi_v$ for $v\notin B_{r-1}(o_1)\cup B_{r-1}(o_2)$ to get to the claim.


5.3 Proof local convergence for preferential attachment models 203

Computing the conditional law of $B_r(o_1)$ and $B_r(o_2)$ given $(\psi_v)_{v\in[n]}$

We start by introducing some useful notation. Define, for $u>v$, the edge probability

$$p(u,v) = \psi_v \prod_{s\in(v,u]} (1-\psi_s). \qquad (5.3.61)$$

We first condition on all $(\psi_v)_{v\in[n]}$ to obtain, for two trees $t_1$ and $t_2$,

$$\mathbb P_n\big(B_r(o_1)=t_1, B_r(o_2)=t_2\big) = \prod_{v\in V(t_1)\cup V(t_2)} \psi_v^{a'_v} \prod_{s=1}^n (1-\psi_s)^{b'_s} \qquad (5.3.62)$$
$$\times \prod_{v\in V(t_1)\cup V(t_2)}\ \prod_{u,j\colon\, u\,\stackrel{j}{\not\rightsquigarrow}\, v} [1-p(u,v)],$$

where the first line is due to all the required edges in $B_r(o_1)=t_1, B_r(o_2)=t_2$, while the second line is due to all other edges that are not allowed to be there.

The no-further-edge probability

We continue by analyzing the second line in (5.3.62), which, for simplicity, we call the no-further-edge probability. First of all, since we are exploring the $r$-neighborhoods of $o_1$ and $o_2$, the only edges that we need to require not to be there are of the form $u\stackrel{j}{\rightsquigarrow} v$ where $v\in V(t_1)\cup V(t_2)$ and $u>v$, i.e., edges from vertices younger than those in $V(t_1)\cup V(t_2)$ that do not form edges in $t_1$ and $t_2$. Since $p(u,v)$ is small for all $v\in V(t_1)\cup V(t_2)$ and $u>v$, and there are only finitely many elements in $V(t_1)\cup V(t_2)$, we can approximate

$$\prod_{v\in V(t_1)\cup V(t_2)}\ \prod_{u,j\colon\, u\,\stackrel{j}{\not\rightsquigarrow}\, v} [1-p(u,v)] = (1+o_{\mathbb P}(1)) \prod_{v\in V(t_1)\cup V(t_2)}\ \prod_{u,j\colon\, u>v} [1-p(u,v)]. \qquad (5.3.63)$$

Recall (5.3.61). We can approximate

$$\prod_{u,j\colon\, u>v} [1-p(u,v)] = \mathrm e^{\Theta(1)\sum_{u,j\colon u\in(v,n]} p(u,v)^2}\, \exp\Big(-\sum_{u,j\colon\, u>v} p(u,v)\Big). \qquad (5.3.64)$$

We compute

$$\sum_{u,j\colon\, u\in(v,n]} p(u,v) = m\psi_v \sum_{u\in(v,n]}\ \prod_{s\in(v,u]} (1-\psi_s) = m\psi_v \sum_{u\in(v,n]} \frac{S^{(n)}_u}{S^{(n)}_v}. \qquad (5.3.65)$$

We recall that $n^{\chi}S^{(n)}_{ns}\overset{\mathrm{a.s.}}{\longrightarrow} Ms^{-\chi}$. Further, $u\mapsto S^{(n)}_u$ is decreasing, and $s\mapsto Ms^{-\chi}$ is continuous, so that also

$$\frac{1}{n}\sum_{u\in(sn,n]} \frac{S^{(n)}_u}{S^{(n)}_{sn}} \overset{\mathrm{a.s.}}{\longrightarrow} \int_s^1 t^{-\chi}s^{\chi}\,\mathrm dt. \qquad (5.3.66)$$


We conclude that

$$\frac{m}{sn\,\psi_{sn}} \sum_{u\in(sn,n]} p(u,v) \overset{\mathrm{a.s.}}{\longrightarrow} \frac{m}{s}\int_s^1 t^{-\chi}s^{\chi}\,\mathrm dt = \frac{m}{1-\chi}\,[1-s^{1-\chi}] = (2m+\delta)[1-s^{1-\chi}]. \qquad (5.3.67)$$

Since $(v\psi_v)_{v\in V(t_1)\cup V(t_2)}$ converges in distribution to a sequence of independent Gamma random variables,

$$\sum_{u,j\colon\, u\in(v,n]} p(u,v)^2 \overset{\mathbb P}{\longrightarrow} 0. \qquad (5.3.68)$$

Therefore,

$$\prod_{v\in V(t_1)\cup V(t_2)}\ \prod_{u,j\colon\, u\,\stackrel{j}{\not\rightsquigarrow}\, v} [1-p(u,v)] = (1+o_{\mathbb P}(1)) \prod_{v\in V(t_1)\cup V(t_2)} \mathrm e^{-(2m+\delta)v\psi_v(1-(v/n)^{1-\chi})}. \qquad (5.3.69)$$

Conclusion of the proof

We next take the expectation w.r.t. $\psi_s$ for all $s\notin V(t_1)\cup V(t_2)$ to obtain

$$\mathbb P\big(B_r(o_1)=t_1, B_r(o_2)=t_2 \mid (\psi_v)_{v\in V(t_1)\cup V(t_2)}\big) \qquad (5.3.70)$$
$$= (1+o_{\mathbb P}(1)) \prod_{v\in V(t_1)\cup V(t_2)} \psi_v^{a'_v}\, \mathrm e^{-(2m+\delta)v\psi_v(1-(v/n)^{1-\chi})}\,(1-\psi_v)^{b'_v}$$
$$\times \prod_{s\in[n]\setminus(V(t_1)\cup V(t_2))} \frac{(\alpha+a'_s-1)_{a'_s}(\beta_s+b'_s-1)_{b'_s}}{(\alpha+\beta_s+a'_s+b'_s-1)_{a'_s+b'_s}},$$

as required. We further note that $a'_s\in\{0,1\}$ for all $s\in[n]$ and that $a'_s=0$ unless $s\in V(t_1)\cup V(t_2)$, so that $a'_s=1$ can only occur for $s\in \partial V(t_1)\cup\partial V(t_2)$.
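The combinatorial quantities $a'_s$ and $b'_s$ are easy to compute directly from the labelled trees. A small self-contained sketch, with hypothetical toy trees (not data from the model), where each directed edge $(u,j,v)$ records that vertex $u$ attached its $j$-th edge to the older vertex $v$:

```python
# Hypothetical edge-labelled trees: each directed edge (u, j, v) means vertex u
# attached its j-th edge to the older vertex v (so u > v).
t1 = [(10, 1, 3), (10, 2, 7), (15, 1, 10)]
t2 = [(20, 1, 4), (22, 2, 20)]
edges = t1 + t2
tree_vertices = {u for (u, j, v) in edges} | {v for (u, j, v) in edges}

def a_prime(s):
    # a'_s as in (5.3.59): number of tree edges pointing from a younger tree vertex to s.
    if s not in tree_vertices:
        return 0
    return sum(1 for (u, j, v) in edges if v == s and u > s)

def b_prime(s):
    # b'_s as in (5.3.60): number of labelled tree edges u -> v whose interval (v, u]
    # contains s; note that b'_s can be nonzero even for s outside the trees.
    return sum(1 for (u, j, v) in edges if v < s <= u)

print([a_prime(s) for s in [3, 10, 20, 5]])   # a'_s nonzero only on tree vertices
print([b_prime(s) for s in [5, 11, 21, 30]])  # b'_s can be nonzero off the trees
```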

Completion of proof

We note that

$$\mathbb P\big(B^{(n)}_r(o^{(1)}_n)\cap B^{(n)}_r(o^{(2)}_n) = \varnothing\big) = \mathbb P\big(o^{(2)}_n\notin B^{(n)}_{2r}(o^{(1)}_n)\big) = 1-o(1), \qquad (5.3.71)$$

since $|B^{(n)}_{2r}(o^{(1)}_n)|$ is a tight sequence of random variables. Thus, it suffices to show that

$$\mathbb P\big(B^{(n)}_r(o^{(1)}_n)\simeq t,\ B^{(n)}_r(o^{(2)}_n)\simeq t,\ B^{(n)}_r(o^{(1)}_n)\cap B^{(n)}_r(o^{(2)}_n)=\varnothing\big) \to \mathbb P(B_r(\varnothing)\simeq t)^2. \qquad (5.3.72)$$

To see (5.3.56), we condition on $B^{(n)}_r(o^{(1)}_n)$ being such that $B_r(\varnothing^{(1)})$, $(X_w)_{w\in B_r(\varnothing^{(1)})}$, $(\Gamma_w)_{w\in B_r(\varnothing)}$, $(\chi_w)_{w\in B_r(\varnothing)}$ and $(v_w)_{w\in B_r(\varnothing)}$ are such that Proposition 5.22 holds. This occurs whp, by Proposition 5.22. Consider

$$\mathbb P\big(B^{(n)}_r(o^{(2)}_n)\simeq t \mid B^{(n)}_r(o^{(1)}_n),\ o^{(2)}_n\notin B^{(n)}_{2r}(o^{(1)}_n)\big). \qquad (5.3.73)$$


We adapt the proof of Proposition 5.22. Conditioning on $B^{(n)}_r(o^{(1)}_n)$ and $o^{(2)}_n\notin B^{(n)}_{2r}(o^{(1)}_n)$ has the effect that (5.3.53) is now modified to

$$\mathbb P_{v\to v'} = \frac{\varphi^{(v-1)}_{v'}}{S^{(v-1)}_{v-1,2}}, \qquad (5.3.74)$$

where now

$$S^{(v-1)}_{v-1,2} = \sum_{u>v-1\colon\, u\notin B^{(n)}_{r-1}(o_n)\cup U_r\cup B^{(n)}_{2r}(o^{(1)}_n)} \varphi^{(v-1)}_u. \qquad (5.3.75)$$

This again makes very little difference, so that indeed, for every $B^{(n)}_r(o^{(1)}_n)$ such that $B_r(\varnothing^{(1)})$, $(X_w)_{w\in B_r(\varnothing^{(1)})}$, $(\Gamma_w)_{w\in B_r(\varnothing)}$, $(\chi_w)_{w\in B_r(\varnothing)}$ and $(v_w)_{w\in B_r(\varnothing)}$ are such that Proposition 5.22 holds,

$$\mathbb P\big(B^{(n)}_r(o^{(2)}_n)\simeq t \mid B^{(n)}_r(o^{(1)}_n),\ o^{(2)}_n\notin B^{(n)}_{2r}(o^{(1)}_n)\big) \overset{\mathbb P}{\longrightarrow} \mathbb P(B_r(\varnothing)\simeq t), \qquad (5.3.76)$$

as required.

5.3.1 Local weak convergence of related models

In this section, we discuss the local weak convergence of two related models. The main result is the following theorem:

Theorem 5.24 (Local weak convergence of preferential attachment models: fixed edges)    Fix $m\ge 1$ and $\delta>-m$. The preferential attachment models $\mathrm{PA}^{(m,\delta)}_n$ and $\mathrm{PA}^{(m,\delta)}_n(d)$ converge locally weakly in probability to the Pólya-point tree.

5.3.2 Local weak convergence of Bernoulli preferential attachment

Recall the Bernoulli preferential attachment model $(\mathrm{BPA}^{(f)}_t)_{t\ge1}$ from Section 1.3.6. A special case is $(\mathrm{BPA}^{(f)}_t)_{t\ge1}$ with an affine attachment function $f$, i.e., the setting where there exist $\gamma,\beta>0$ such that

$$f(k) = \gamma k + \beta. \qquad (5.3.77)$$

Due to the attachment rules in (1.3.64), the model does not satisfy the rescaling property that the model with $cf$ has the same law as the model with $f$, for any $c>0$. In fact, it turns out that the parameter $\gamma>0$ (which is, by convention, always taken to be 1 for $(\mathrm{PA}^{(m,\delta)}_t)_{t\ge1}$) is now the parameter that determines the tail behavior of the degree distribution (recall Exercise 1.12). In Exercises 5.21-5.22, you are asked to compute the average degree of this affine model, as well as the number of edges added at time $t$ for large $t$.

In this section, we investigate the local weak limit of Bernoulli preferential attachment models, as introduced in Section 1.3.6. The main result is as follows:

Theorem 5.25 (Local weak convergence of preferential attachment models: conditionally independent edges)    Fix $\delta\ge 0$. The preferential attachment model with conditionally independent edges converges locally weakly in probability to the Pólya-point graph with mixed Poisson distributions.

TO DO 5.1: Add local weak convergence for BPAM

5.4 Connectivity of preferential attachment models

In this section, we investigate the connectivity of $(\mathrm{PA}^{(m,\delta)}_t)_{t\ge1}$. We start by describing the connectivity when $m=1$, which is special. For $m=1$, the number of connected components $N_t$ of $\mathrm{PA}^{(1,\delta)}_t$ has distribution given by

$$N_t = I_1 + I_2 + \cdots + I_t, \qquad (5.4.1)$$

where $I_i$ is the indicator that the $i$th edge connects to itself, so that $(I_i)_{i\ge1}$ are independent indicator variables with

$$\mathbb P(I_i=1) = \frac{1+\delta}{(2+\delta)(i-1)+1+\delta}. \qquad (5.4.2)$$

It is not hard to see that this implies that $N_t/\log t$ converges in probability to $(1+\delta)/(2+\delta)<1$, so that whp the largest connected component has size at least $t/\log t$. As a result, whp $\mathrm{PA}^{(1,\delta)}_t$ is not connected, but has few connected components which are almost all quite large. We do not elaborate more on the connectivity properties for $m=1$ and instead leave the asymptotics of the number of connected components as Exercise 5.23.
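Since $N_t$ is a sum of independent indicators, the convergence $N_t/\log t\to(1+\delta)/(2+\delta)$ is easy to try out by simulation. A sketch (note that convergence is slow, because $\mathbb E[N_t]$ grows only logarithmically in $t$):

```python
import math
import random

random.seed(0)
delta, t = 1.0, 200_000
# N_t = I_1 + ... + I_t with independent indicators as in (5.4.1)-(5.4.2).
N = sum(random.random() < (1 + delta) / ((2 + delta) * (i - 1) + 1 + delta)
        for i in range(1, t + 1))
print(N, N / math.log(t), (1 + delta) / (2 + delta))
```

For $\delta=1$ the target value $(1+\delta)/(2+\delta)$ equals $2/3$; the simulated ratio fluctuates around it, with relative fluctuations of order $1/\sqrt{\log t}$.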

For $m\ge2$, the situation is entirely different, since then $\mathrm{PA}^{(m,\delta)}_t$ is connected whp at sufficiently large times:

Theorem 5.26 (Connectivity of $\mathrm{PA}^{(m,\delta)}_t$ for $m\ge2$)    Fix $m\ge2$. Then, with high probability for $T$ large, $\mathrm{PA}^{(m,\delta)}_t$ is connected for all $t\ge T$.

Proof of Theorem 5.26.    Again, we let $N_t$ denote the number of connected components of $\mathrm{PA}^{(m,\delta)}_t$. We note that $I_t = N_t - N_{t-1} = 1$ precisely when all $m$ edges of vertex $t$ are attached to vertex $t$. Thus

$$\mathbb P(I_t=1) = \prod_{e=1}^m \frac{2e-1+\delta}{(2m+\delta)t+(2e-1+\delta)}. \qquad (5.4.3)$$

For $m\ge2$,

$$\sum_{t=2}^{\infty} \mathbb P(I_t=1) < \infty, \qquad (5.4.4)$$

so that, almost surely, $I_t=1$ only occurs finitely often. As a result, $\lim_{t\to\infty} N_t<\infty$ almost surely, since $N_t \le 1+\sum_{s=2}^{\infty} \mathbb 1_{\{I_s=1\}}$. This implies that, for $m\ge2$, $\mathrm{PA}^{(m,\delta)}_t$ almost surely contains only finitely many connected components. Note that $\mathrm{PA}^{(m,\delta)}_t$ has a positive probability of being disconnected at a certain time $t\ge2$ (see Exercise 5.27 below). However, for $m\ge2$, $I_t = N_t - N_{t-1}$ can also be negative, since the edges of the newly added vertex $t$ can be attached to two distinct connected components. We will see that this happens with high probability, which explains why $N_t=1$ whp for $t$ large, as we next show.

We first fix $K\ge1$ large. Then, with probability converging to 1 as $K\to\infty$, $\sum_{t=K}^{\infty} \mathbb 1_{\{N_t>N_{t-1}\}} = 0$. We condition on $\sum_{t=K}^{\infty}\mathbb 1_{\{N_t>N_{t-1}\}}=0$, so that no new connected components are formed after time $K$, and the number of connected components can only decrease in time. Let $\mathcal F_s$ denote the $\sigma$-algebra generated by $(\mathrm{PA}^{(m,\delta)}_t)_{t=1}^s$. We are left to prove that, for $t$ sufficiently large, the vertices $1,\dots,K$ are whp all connected in $\mathrm{PA}^{(m,\delta)}_t$. This proof proceeds in two steps. We show that, if $N_t\ge2$ and $t$ is large, then $\mathbb P(N_{2t}-N_t\le-1\mid \mathcal F_K, N_t\ge2)$ is uniformly bounded from below. Indeed, conditionally on $\mathcal F_K$ and $N_t\ge2$, and using $N_t\le K$, $\mathrm{PA}^{(m,\delta)}_t$ must have one connected component of size at least $t/K$. Every other component has at least one vertex in it, and its degree is at least $m$. Fix $s\in[2t]\setminus[t]$. The probability that the first edge of vertex $s$ connects to the connected component of size at least $t/K$, while the second connects to the connected component of size at least 1, conditionally on $\mathcal F_t$, is at least

$$\frac{m+\delta}{2(2m+\delta)t}\cdot\frac{(m+\delta)t/K}{2(2m+\delta)t} \ge \frac{\varepsilon}{t}, \qquad (5.4.5)$$

for some $\varepsilon>0$. Thus, the probability that this happens for at least one $s\in[2t]\setminus[t]$ is at least

$$1-\Big(1-\frac{\varepsilon}{t}\Big)^t, \qquad (5.4.6)$$

which is uniformly positive for every $t$. Thus, $\mathbb P(N_{2t}-N_t\le-1\mid\mathcal F_K, N_t\ge2)$ is uniformly bounded from below. As a result, $N_t\overset{\mathrm{a.s.}}{\longrightarrow}1$, so that $N_T=1$ for some $T<\infty$ a.s. Without loss of generality, we can take $T\ge K$. When $\sum_{t=K}^{\infty}\mathbb 1_{\{N_t>N_{t-1}\}}=0$, if $N_T=1$ for some $T$, then $N_t=1$ for all $t\ge T$. This proves that $\mathrm{PA}^{(m,\delta)}_t$ is whp connected for all $t\ge T$, where $T$ is large, which implies Theorem 5.26.
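The summability in (5.4.4), which drives the proof, can be seen numerically from (5.4.3): for $m=1$ the self-loop probabilities are of order $1/t$ and the partial sums keep growing, while for $m\ge2$ they are $O(t^{-2})$ and the partial sums stabilise. A quick sketch:

```python
def p_self(t, m, delta):
    # P(I_t = 1) as in (5.4.3): all m edges of vertex t attach to vertex t itself.
    prob = 1.0
    for e in range(1, m + 1):
        prob *= (2 * e - 1 + delta) / ((2 * m + delta) * t + (2 * e - 1 + delta))
    return prob

checkpoints = (10**3, 10**4, 10**5)
partials = {m: [sum(p_self(t, m, 0.0) for t in range(2, T)) for T in checkpoints]
            for m in (1, 2)}
print(partials)  # m = 1: still growing; m = 2: essentially converged
```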

5.4.1 Giant component for Bernoulli preferential attachment

The preferential attachment model $\mathrm{PA}^{(m,\delta)}_n$ turns out to be connected whp, as discussed above. This, however, is not true for the preferential attachment model with conditionally independent edges as defined in Section 1.3.6. Here, we describe the existence of the giant component in this model:

Theorem 5.27 (Existence of a giant component: linear case)    If $f(k)=\gamma k+\beta$ for some $0\le\gamma<1$ and $0<\beta\le1$, then there exists a giant component if and only if

$$\gamma\ge\tfrac12 \qquad\text{or}\qquad \beta>\frac{(1/2-\gamma)^2}{1-\gamma}. \qquad (5.4.7)$$

The notation used by Dereich and Mörters (2013) is slightly different from ours.


Dereich and Mörters (2009) prove that their model obeys an asymptotic power law with exponent $\tau=1+1/\gamma=3+\delta/m$, so that $\gamma$ intuitively corresponds to $\gamma=m/(2m+\delta)$. As a result, $\gamma\ge\tfrac12$ corresponds to $\delta\le0$, which is also precisely the setting where the configuration model always has a giant component (recall Theorem 4.4).
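Condition (5.4.7) and the heuristic correspondence $\gamma=m/(2m+\delta)$ are easy to encode; the sketch below simply restates them as code (the function names are ours, not from the literature):

```python
def giant_exists(gamma, beta):
    # Condition (5.4.7) for f(k) = gamma * k + beta with 0 <= gamma < 1, 0 < beta <= 1.
    return gamma >= 0.5 or beta > (0.5 - gamma) ** 2 / (1 - gamma)

def gamma_from(m, delta):
    # Heuristic correspondence gamma = m / (2m + delta), i.e. tau = 1 + 1/gamma = 3 + delta/m.
    return m / (2 * m + delta)

print(giant_exists(0.6, 0.01))  # gamma >= 1/2: giant component regardless of beta
print(giant_exists(0.1, 0.1))   # here beta must exceed (0.4)^2 / 0.9, about 0.178
print(gamma_from(2, 1.0))       # 0.4, corresponding to tau = 3.5
```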

The more involved case of more general attachment functions $k\mapsto f(k)$ is more delicate to describe. We start by introducing some notation, following Dereich and Mörters (2013). We call a preferential attachment function $f\colon\{0,1,2,\dots\}\to(0,\infty)$ concave when

$$f(0)\le1 \qquad\text{and}\qquad \Delta f(k) := f(k+1)-f(k) < 1 \quad\text{for all } k\ge0. \qquad (5.4.8)$$

By concavity, the limit

$$\gamma := \lim_{k\to\infty} \frac{f(k)}{k} = \inf_{k\ge0} \Delta f(k) \qquad (5.4.9)$$

exists. The parameter $\gamma$ plays a crucial role in the analysis, as can already be observed in Theorem 5.27.
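For a concrete concave $f$, both characterizations of $\gamma$ in (5.4.9) can be checked numerically; the particular $f$ below is our own illustrative choice, chosen so that $f(0)\le1$ and $\Delta f(k)<1$:

```python
def f(k):
    # A hypothetical concave attachment function with f(0) <= 1 and Delta f(k) < 1.
    return 0.3 * k + 1.0 - 1.0 / (k + 1)

df = [f(k + 1) - f(k) for k in range(10**5)]
gamma_limit = f(10**6) / 10**6   # lim_{k -> infinity} f(k) / k
gamma_inf = min(df)              # inf_{k >= 0} Delta f(k)
print(gamma_limit, gamma_inf)    # both are close to 0.3
```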

Let $(Z_t)_{t\ge0}$ be a pure birth Markov process, started at 0, with birth rate $f(k)$ when it is in state $k$. Let $\mathcal S:=\{\ell\}\cup[0,\infty]$ be a type space, where $\ell$ is a (non-numerical) symbol. Given $\alpha\in(0,1)$, define the increasing functions $M$, respectively, $M^{\tau}$, by

$$M(t) = \int_0^t \mathbb E[f(Z_s)]\,\mathrm ds = \mathbb E[Z_t], \qquad M^{\tau}(t) = \mathbb E[Z_t\mid \Delta Z_{\tau}=1]-\mathbb 1_{[\tau,\infty)}(t). \qquad (5.4.10)$$

Next, define a linear operator $\mathcal A_{\alpha}$ on the Banach space $C(\mathcal S)$ of continuous, bounded functions on $\mathcal S$ by

$$(\mathcal A_{\alpha}g)(\tau) = \int_0^{\infty} g(t)\,\mathrm e^{\alpha t}\,\mathrm dM(t) + \int_0^{\infty} g(t)\,\mathrm e^{-\alpha t}\,\mathrm dM^{\tau}(t). \qquad (5.4.11)$$

The operator $\mathcal A_{\alpha}$ should be thought of as describing the expected offspring of vertices of different types, as explained in more detail below. The main result on the existence of a giant component in the preferential attachment model with conditionally independent edges is the following theorem:

Theorem 5.28 (Existence of a giant component)    No giant component exists if and only if there exists $0<\alpha<1$ such that $\mathcal A_{\alpha}$ is a compact operator with spectral radius $\rho(\mathcal A_{\alpha})\le1$.

It turns out that $\mathcal A_{\alpha}$ is a well-defined compact operator (Dereich and Mörters, 2013, Lemma 3.1) if and only if $(\mathcal A_{\alpha}\mathbb 1)(0)<\infty$. When thinking of $\mathcal A_{\alpha}$ as the reproduction operator, the spectral radius $\rho(\mathcal A_{\alpha})$ describes whether the multitype branching process has a positive survival probability. Thus, the condition $\rho(\mathcal A_{\alpha})\le1$ should be thought of as the equivalent of the usual condition $\mathbb E[X]\le1$ for extinction of a discrete single-type branching process.

TO DO 5.2: Extend the discussion of the phase transition for BPAM
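The pure birth process $(Z_t)_{t\ge0}$ is straightforward to simulate, which gives a Monte Carlo handle on $M(t)=\mathbb E[Z_t]$ in (5.4.10). The sketch below starts the process at 0 and uses an affine rate as an illustrative choice:

```python
import random

random.seed(42)

def f(k):
    # Illustrative affine attachment rule used as birth rate.
    return 0.5 * k + 0.7

def sample_Z(t_end):
    # Pure birth chain: in state k, wait an Exp(f(k)) time, then jump to k + 1.
    k, t = 0, 0.0
    while True:
        t += random.expovariate(f(k))
        if t > t_end:
            return k
        k += 1

t_end = 2.0
samples = [sample_Z(t_end) for _ in range(20_000)]
estimate = sum(samples) / len(samples)  # Monte Carlo estimate of M(t) = E[Z_t]
print(estimate)
```

For this affine rate one can solve $\frac{\mathrm d}{\mathrm dt}\mathbb E[Z_t]=\gamma\,\mathbb E[Z_t]+\beta$ to get $\mathbb E[Z_t]=(\beta/\gamma)(\mathrm e^{\gamma t}-1)\approx2.41$ at $t=2$, which the estimate should approach.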


5.5 Further results for preferential attachment models

TO DO 5.3: Add further results!

5.6 Notes and discussion

Notes on Section 5.1

The proof of Theorem 5.2 is adapted from Ross (1996). More recent discussions on exchangeable random variables and their properties can be found in Aldous (1985) and Pemantle (2007), the latter focussing on random walks with self-interaction, where exchangeability is a crucial tool. There is a lot of work on urn schemes, also in cases where the weight functions are not linear with equal slope, in which case the limits can be seen to obey rather different characteristics. See, e.g., (Athreya and Ney, 1972, Chapter 9).

Notes on Section 5.2

The multitype branching process local weak limit in Theorem 5.8 has been established by Berger, Borgs, Chayes and Saberi (2014) for preferential attachment models with a fixed number of outgoing edges per vertex. Berger et al. (2014) only treat the case where $\delta\ge0$; the more recent extension to $\delta>-m$ is novel. This is due to the fact that Berger et al. (2014) view the attachment probabilities as a mixture between attaching uniformly and according to the degree. We, instead, rely on the Pólya urn description that works for all $\delta>-m$. In this case, Theorem 5.8 is (Berger et al., 2014, Theorem 2.2). Theorem 5.8 states local weak convergence in probability to the Pólya-point tree, while (Berger et al., 2014, Theorem 2.2) states local weak convergence in distribution. Local weak convergence in probability can be deduced from the convergence in probability of subgraph counts in (Berger et al., 2014, Lemma 2.4). We refrain from discussing this issue further.

The proof of Theorem 5.10 follows (Berger et al., 2014, Section 3.1) closely, apart from the fact that we do not rely on the relation to a mixture of choosing a vertex uniformly and according to degree.

Berger et al. (2014) also study two related settings, one where the edges are attached independently (i.e., without the intermediate update of the degrees while attaching the $m$ edges incident to the newest vertex), and the conditional model in which the edges are attached to distinct vertices. This shows that the result is quite robust, and hopefully also applies to the related settings of $\mathrm{PA}^{(m,\delta)}_n$ and $\mathrm{PA}^{(m,\delta)}_n(b)$.

A related version of Theorem 5.10 for $\delta=0$ was proved by Bollobás and Riordan (2004a) in terms of a pairing representation. This applies to the model $\mathrm{PA}^{(m,\delta)}_n$ with $\delta=0$. Another related version of Theorem 5.10 is proved by Rudas et al. (2007), which applies to general preferential attachment functions with $m=1$ and relies on a continuous-time embedding in terms of continuous-time branching processes. We refer to Section 5.5 for more details.


Theorem 5.25 is proved by Dereich and Mörters (2009, 2011, 2013) for the Bernoulli preferential attachment model.

Notes on Section 5.4

The result on the giant component for preferential attachment models with conditionally independent edges in Theorem 5.28 is proved by Dereich and Mörters (2009, 2011, 2013).

Notes on Section 5.5

The embedding results in terms of continuous-time branching processes can be found in Rudas et al. (2007), as well as in Athreya (2007) and Athreya et al. (2008).

5.7 Exercises for Chapter 5

Exercise 5.1 (Stationarity of exchangeable sequences)    Show that when $(X_i)_{i=1}^n$ are exchangeable, the marginal distribution of $X_i$ is the same as that of $X_1$. Show also that the distribution of $(X_i,X_j)$, for $j\ne i$, is the same as that of $(X_1,X_2)$.

Exercise 5.2 (I.i.d. sequences are exchangeable)    Show that when $(X_i)_{i\ge1}$ are i.i.d., they form an infinite sequence of exchangeable random variables.

Exercise 5.3 (Limiting density in De Finetti's Theorem (Theorem 5.2))    Use De Finetti's Theorem (Theorem 5.2) to prove that $S_n/n \overset{\mathrm{a.s.}}{\longrightarrow} U$. Use this to prove (5.1.4).

Exercise 5.4 (The number of ones in $(X_i)_{i=1}^n$)    Prove (5.1.3).

Exercise 5.5 (Positive correlation of exchangeable random variables)    Let $(X_i)_{i\ge1}$ be an infinite sequence of exchangeable random variables. Prove that

$$\mathbb P(X_k=X_n=1) \ge \mathbb P(X_k=1)\,\mathbb P(X_n=1). \qquad (5.7.1)$$

Prove that equality holds if and only if there exists a $p$ such that $\mathbb P(U=p)=1$.

Exercise 5.6 (Limiting density of mixing distribution for Pólya urn schemes)    Prove that (5.1.23) implies (5.1.4).

Exercise 5.7 (Uniform recursive trees)    A uniform recursive tree is obtained by starting with a single vertex, and successively attaching the $(n+1)$st vertex to a uniformly chosen vertex in $[n]$. Prove that for uniform recursive trees the tree decomposition in Theorem 5.4 is such that

$$\frac{S_1(n)}{S_1(n)+S_2(n)} \overset{\mathrm{a.s.}}{\longrightarrow} U, \qquad (5.7.2)$$

where $U$ is uniform on $[0,1]$. Use this to prove that $\mathbb P(S_1(n)=k)=1/n$ for each $k\in[n]$.


Exercise 5.8 (Scale-free trees)    Recall the model studied in Theorem 5.4, where at time $t=2$, we start with two vertices of which vertex 1 has degree $d_1$ and vertex 2 has degree $d_2$. After this, we successively attach vertices to older vertices with probability proportional to the degree plus $\delta>-1$ as in (1.3.62). Show that the model for $(\mathrm{PA}^{(1,\delta)}_n(b))_{n\ge1}$, for which the graph at time $n=2$ consists of two vertices joined by two edges, arises when $d_1=d_2=2$. What does Theorem 5.4 imply for $(\mathrm{PA}^{(1,\delta)}_n(b))_{n\ge1}$?

Exercise 5.9 (Relative degrees of vertices 1 and 2)    Use Theorem 5.6 to compute $\lim_{n\to\infty}\mathbb P(D_2(n)\ge xD_1(n))$ for $(\mathrm{PA}^{(1,\delta)}_t)_{t\ge1}$.

Exercise 5.10 (Proof of Theorem 5.7) Complete the proof of Theorem 5.7.

Exercise 5.11 (Size-biased version of Gamma)    Let $X$ have a Gamma distribution with parameter $r$. Show that its size-biased version $X^{\star}$ has a Gamma distribution with parameter $r+1$.

Exercise 5.12 (Power-law exponents in $\mathrm{PA}^{(m,\delta)}_t(d)$)    Prove the power-law relations in (5.2.10) and identify $c_{m,\delta}$ and $c'_{m,\delta}$.

Exercise 5.13 (Joint law $(D,D')$ for $\mathrm{PA}^{(m,\delta)}_t(d)$ (Berger et al., 2014, Lemma 5.3))    Adapt the proof of Lemma 5.9 to show that, for $j\ge m$ and $k\ge m+1$,

$$\mathbb P(D=j, D'=k) = \frac{2m+\delta}{m^2}\, \frac{\Gamma(k+1+\delta)}{k!\,\Gamma(m+1+\delta)}\, \frac{\Gamma(j+\delta)}{j!\,\Gamma(m+\delta)} \qquad (5.7.3)$$
$$\times \int_0^1\int_v^1 (1-v)^k v^{m+1+\delta+\delta/m} (1-u)^j u^{m+\delta}\,\mathrm du\,\mathrm dv.$$

Exercise 5.14 (Joint law $(D,D')$ for $\mathrm{PA}^{(m,\delta)}_t(d)$ (Berger et al., 2014, Lemma 5.3))    Use Lemma 5.9 and Exercise 5.13 to show that, for fixed $j\ge m$ and as $k\to\infty$,

$$\mathbb P(D'=k\mid D=j) = C_j k^{-(2+\delta/m)}(1+O(1/k)). \qquad (5.7.4)$$

Exercise 5.15 (Joint law $(D,D')$ for $\mathrm{PA}^{(m,\delta)}_t(d)$ (Berger et al., 2014, Lemma 5.3))    Use Lemma 5.9 and Exercise 5.13 to show that, for fixed $k\ge m+1$ and as $j\to\infty$,

$$\mathbb P(D=j\mid D'=k) = C_k j^{-(m+4+\delta+\delta/m)}(1+O(1/j)). \qquad (5.7.5)$$

Exercise 5.16 (Simple form of $S_k$ in (5.2.21))    Prove, using induction on $k$, that $S^{(n)}_k = \prod_{i=k+1}^n (1-\psi_i)$ in (5.2.21) holds.

Exercise 5.17 (Multiple edges and Theorem 5.10)    Fix $m=2$. Let $M_t$ denote the number of multiple edges in $\mathrm{PA}^{(m,\delta)}_t(d)$. Use Theorem 5.10 to show that

$$\mathbb E[M_{t+1}-M_t] = \sum_{k=1}^t \mathbb E\Big[\Big(\frac{\varphi_k}{S^{(n)}_{t-1}}\Big)^2\Big]. \qquad (5.7.6)$$

Exercise 5.18 (Multiple edges and Theorem 5.10 (cont.))    Fix $m=2$. Let $M_t$ denote the number of multiple edges in $\mathrm{PA}^{(m,\delta)}_t(d)$. Compute $\mathbb E\big[\big(\varphi_k/S^{(n)}_{t-1}\big)^2\big]$ and use this to show that $\mathbb E[M_t]/\log t\to c$, and identify $c>0$.

Exercise 5.19 (Multiple edges and Theorem 5.10 (cont.))    Fix $m=2$. Let $M_t$ denote the number of multiple edges in $\mathrm{PA}^{(m,\delta)}_t(d)$. Use Theorem 5.10 to show that, conditionally on $(\psi_k)_{k\ge1}$, the sequence $(M_{t+1}-M_t)_{t\ge2}$ is an independent sequence with

$$\mathbb P\big(M_{t+1}-M_t=1\mid(\psi_k)_{k\ge1}\big) = \sum_{k=1}^{t-1} \Big(\frac{\varphi_k}{S^{(n)}_{t-1}}\Big)^2. \qquad (5.7.7)$$

Exercise 5.20 (Almost sure limit of $F_k$ in (5.2.22))    Fix $k\ge1$. Prove that $(M_n(k))_{n\ge k+1}$, where

$$M_n(k) = \prod_{j=k+1}^n \frac{1-\psi_j}{\mathbb E[1-\psi_j]}, \qquad (5.7.8)$$

is a positive multiplicative martingale, which, by the Martingale Convergence Theorem ([Volume 1, Theorem 2.24]), thus converges. Conclude that (5.2.22) holds by showing that $\prod_{j=k+1}^n \mathbb E[1-\psi_j] = (k/n)^{\chi}(1+o(1))$.

Exercise 5.21 (Recursion formula for total edges in affine $\mathrm{BPA}^{(f)}_t$)    Consider the affine $\mathrm{BPA}^{(f)}_t$ with $f(k)=\gamma k+\beta$. Derive a recursion formula for $\mathbb E[|E(\mathrm{BPA}^{(f)}_t)|]$, where we recall that $|E(\mathrm{BPA}^{(f)}_t)|$ is the total number of edges in $\mathrm{BPA}^{(f)}_t$. Identify $\mu$ such that $\mathbb E[|E(\mathrm{BPA}^{(f)}_t)|]/t\to\mu$.

Exercise 5.22 (Number of edges per vertex in affine $\mathrm{BPA}^{(f)}_t$)    Consider the affine $\mathrm{BPA}^{(f)}_t$ with $f(k)=\gamma k+\beta$. Assume that $|E(\mathrm{BPA}^{(f)}_t)|/t\overset{\mathbb P}{\longrightarrow}\mu$. Show that $D_t(n)\overset{d}{\longrightarrow}\mathrm{Poi}(\mu)$.

Exercise 5.23 (CLT for number of connected components for $m=1$)    Show that the number of connected components $N_t$ in $\mathrm{PA}^{(1,\delta)}_t$ satisfies a central limit theorem with equal asymptotic mean and variance given by

$$\mathbb E[N_t] = \frac{1+\delta}{2+\delta}\log t\,(1+o(1)), \qquad \mathrm{Var}(N_t) = \frac{1+\delta}{2+\delta}\log t\,(1+o(1)). \qquad (5.7.9)$$

Exercise 5.24 (Number of connected components for $m=1$)    Use Exercise 5.23 to show that the number of connected components $N_t$ in $\mathrm{PA}^{(1,\delta)}_t$ satisfies $N_t/\log t\overset{\mathbb P}{\longrightarrow}(1+\delta)/(2+\delta)$.

Exercise 5.25 (Number of self-loops in $\mathrm{PA}^{(m,\delta)}_t$)    Fix $m\ge1$ and $\delta>-m$. Use a similar analysis as in Exercise 5.23 to show that the number of self-loops $S_t$ in $\mathrm{PA}^{(m,\delta)}_t$ satisfies $S_t/\log t\overset{\mathbb P}{\longrightarrow} m(m+1+2\delta)/[2(2m+\delta)]$.

Exercise 5.26 (Number of self-loops in $\mathrm{PA}^{(m,\delta)}_t(b)$)    Fix $m\ge1$ and $\delta>-m$. Use a similar analysis as in Exercise 5.25 to show that the number of self-loops $S_t$ in $\mathrm{PA}^{(m,\delta)}_t(b)$ satisfies $S_t/\log t\overset{\mathbb P}{\longrightarrow}(m-1)(m+2\delta)/[2(2m+\delta)]$.


Exercise 5.27 (All-time connectivity for $(\mathrm{PA}^{(m,\delta)}_t)_{t\ge1}$)    Fix $m\ge2$. Compute the probability that $(\mathrm{PA}^{(m,\delta)}_t)_{t\ge1}$ is connected for all times $t\ge1$, and show that this probability is in $(0,1)$.


Part III

Small-world properties of random graphs



Summary of Part II.

So far, we have considered the simplest connectivity properties possible. We have focused on the degrees in Volume 1, and on the existence and uniqueness of a macroscopic connected component in Part II of this book. We can summarize the results obtained in the following meta theorem:

Meta Theorem A. (Existence and uniqueness of giant component)    In a random graph model with power-law degrees having power-law exponent $\tau$, there is always a unique giant component when $\tau\in(2,3)$, while, when $\tau>3$, there is a unique giant component precisely when the average degree exceeds a certain precise threshold.

The above means, informally, that the giant component is quite robust to random removal of edges when $\tau\in(2,3)$, while it is not when $\tau>3$. These results make the general philosophy that `random graphs with similar degree characteristics should behave alike' precise, at least at the level of the existence of a giant component.

Overview of Part III.

In this part, we aim to extend the discussion of similarity of random graphs to their small-world characteristics, by investigating distances within the giant component. We focus both on their typical distances, i.e., the graph distances between most pairs of vertices, as characterized by the graph distance between two uniformly chosen vertices conditioned on being connected, as well as on their maximal distances, as characterized by their diameters.

In more detail, this part is organised as follows. We study distances in general inhomogeneous random graphs in Chapter 6, and those in the configuration model, as well as in the closely related uniform random graph with prescribed degrees, in Chapter 7. In the last chapter of this part, Chapter 8, we study distances in the preferential attachment model.


Chapter 6

Small-world phenomena in inhomogeneous random graphs

Abstract

In this chapter, we investigate the small-world structure in rank-1 inhomogeneous random graphs. For this, we develop path-counting techniques that are interesting in their own right.

TO DO 6.1: Add case of distances in real-world network...

In this chapter, we investigate the small-world properties of inhomogeneous random graphs.

Organization of this chapter

We start in Section 6.1 by discussing results on the small-world phenomenon in inhomogeneous random graphs. We state results for general inhomogeneous random graphs, and then specialize to rank-1 inhomogeneous random graphs. Only for the rank-1 case do we provide complete proofs. For the general case, we instead explain the intuition and give proofs in special cases. In Section 6.2, we prove lower bounds on typical distances. In Section 6.3, we prove the corresponding upper bounds in the log log regime, and in Section 6.4, we discuss path-counting techniques to show the log upper bound for $\tau>3$. In Section 6.5.1, we discuss the diameter of inhomogeneous random graphs. In Section 6.5, we discuss related results for distances in inhomogeneous random graphs. We close this chapter with notes and discussion in Section 6.6, and with exercises in Section 6.7.

6.1 Small-world effect in inhomogeneous random graphs

In this section, we consider the distances between vertices of $\mathrm{IRG}_n(\kappa_n)$ where, as usual, $(\kappa_n)$ is a graphical sequence of kernels with limit $\kappa$.

Recall that we write $\mathrm{dist}_G(i,j)$ for the graph distance between the vertices $i,j\in[n]$ in a graph $G$, where the graph distance is the minimum number of edges in the graph $G$ that form a path from $i$ to $j$, and where, by convention, we let $\mathrm{dist}_G(i,j)=\infty$ when $i,j$ are in different connected components. We define the typical graph distance to be $\mathrm{dist}_G(o_1,o_2)$, where $o_1,o_2$ are two vertices that are chosen uniformly at random from the vertex set $[n]$.

It is possible that no path connecting $o_1$ and $o_2$ exists, in which case we define $\mathrm{dist}_{\mathrm{IRG}_n(\kappa_n)}(o_1,o_2)=\infty$. By Theorem 3.16, $\mathbb P(\mathrm{dist}_{\mathrm{IRG}_n(\kappa_n)}(o_1,o_2)=\infty)\to 1-\zeta^2>0$, since $\zeta<1$ (see Exercise 3.32). In particular, when $\zeta=0$, which is equivalent to $\nu\le1$, $\mathbb P(\mathrm{dist}_{\mathrm{IRG}_n(\kappa_n)}(o_1,o_2)=\infty)\to1$. Therefore, in our main results, we shall condition on $o_1$ and $o_2$ being connected, and only consider cases where $\zeta>0$.

219


Logarithmic asymptotics of the typical graph distance in $\mathrm{IRG}_n(\kappa_n)$

We start by discussing logarithmic asymptotics of the typical graph distance in the case where $\nu=\|\mathbf T_{\kappa}\|\in(1,\infty)$. When $\|\mathbf T_{\kappa}\|=\infty$ instead, our results still prove that $\mathrm{dist}_{\mathrm{IRG}_n(\kappa_n)}(o_1,o_2)=o_{\mathbb P}(\log n)$, but they do not tell us much about the exact asymptotics.

The main result on typical graph distances in $\mathrm{IRG}_n(\kappa_n)$ is as follows:

Theorem 6.1 (Typical distances in $\mathrm{IRG}_n(\kappa_n)$)    Let $(\kappa_n)$ be a graphical sequence of kernels with limit $\kappa$, and with $\nu=\|\mathbf T_{\kappa}\|\in(1,\infty)$. Let $\varepsilon>0$ be fixed. Then, for $\mathrm{IRG}_n(\kappa_n)$:

(i) If $\sup_{x,y,n}\kappa_n(x,y)<\infty$, then

$$\mathbb P\big(\mathrm{dist}_{\mathrm{IRG}_n(\kappa_n)}(o_1,o_2)\le(1-\varepsilon)\log_{\nu}n\big) = o(1). \qquad (6.1.1)$$

(ii) If $\kappa$ is irreducible, then

$$\mathbb P\big(\mathrm{dist}_{\mathrm{IRG}_n(\kappa_n)}(o_1,o_2)\le(1+\varepsilon)\log_{\nu}n\big) = \zeta_{\kappa}^2+o(1). \qquad (6.1.2)$$

In the terminology of [Volume 1, Section 1.4], Theorem 6.1(ii) implies that IRGn(κ) is a small world when κ is irreducible and ν = ‖Tκ‖ < ∞. Theorem 6.1(i) shows that the graph distances are of order Θ(log n) when supx,y,n κn(x, y) < ∞, so that IRGn(κn) is not an ultra-small world.

The intuition behind Theorem 6.1 is that, by (3.4.6) and (??), a Poisson multitype branching process with kernel κ has neighborhoods that grow exponentially, i.e., the number of vertices at distance k grows like ‖Tκ‖^k. Thus, if we examine the distance between two vertices o1 and o2 chosen uniformly at random from [n], then we need to explore the neighborhood of vertex o1 up to the moment that it ‘catches’ vertex o2. In this case, the neighborhood must be of size of order n, so that we need ‖Tκ‖^k = ν^k ∼ n, i.e., k = kn ∼ log_ν n. However, proving such a fact is quite tricky, since there are far fewer possible further vertices to explore when the neighborhood has size proportional to n. The proof overcomes this fact by exploring from the two vertices o1 and o2 simultaneously, up to the first moment that these neighborhoods share a common vertex. At this moment, we have found the shortest path.
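This exponential-growth heuristic is easy to observe numerically. The sketch below is an illustration under simplifying assumptions (constant weights wi ≡ 3, so that νn = 3, and Norros–Reittu edge probabilities 1 − e^{−wiwj/ℓn}); it is not the construction used in the proofs. It samples such a graph and records the generation sizes Zk seen by breadth-first search, which grow by a factor of roughly ν per step until a positive fraction of [n] is reached after about log_ν n generations.

```python
import math
import random

def sample_nr_graph(w, seed=0):
    # Norros-Reittu-type graph: edge {i,j} occupied w.p. 1 - exp(-w_i w_j / l_n)
    rng = random.Random(seed)
    n, ln = len(w), sum(w)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < 1.0 - math.exp(-w[i] * w[j] / ln):
                adj[i].append(j)
                adj[j].append(i)
    return adj

def generation_sizes(adj, source):
    # Z_k = number of vertices at graph distance k from `source`
    seen = {source}
    frontier, sizes = [source], []
    while frontier:
        nxt = [u for v in frontier for u in adj[v] if u not in seen]
        nxt = list(dict.fromkeys(nxt))  # deduplicate while keeping order
        seen.update(nxt)
        frontier = nxt
        if nxt:
            sizes.append(len(nxt))
    return sizes
```

For n = 1000 and ν = 3, the early sizes typically grow roughly geometrically, and BFS from a vertex in the giant component exhausts it after a number of generations of order log₃ 1000 ≈ 6.3.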

We next specialize to rank-1 inhomogeneous random graphs, where we also investigate in more detail what happens when ν = ∞ in the case where the degree power-law exponent τ satisfies τ ∈ (2, 3).

Distances in rank-1 IRGs with finite-variance weights

We continue by investigating the behavior of distNRn(w)(o1, o2) for NRn(w) in the case where the weights have finite variance:

Theorem 6.2 (Typical distances in NRn(w) for finite-variance weights) In the Norros-Reittu model NRn(w), where the weights w = (wi)i∈[n] satisfy Condition 1.1(a)-(c) and where ν > 1, conditionally on distNRn(w)(o1, o2) < ∞,

distNRn(w)(o1, o2)/log n P−→ 1/log ν. (6.1.3)


The same result applies, under the same conditions, to GRGn(w) and CLn(w).

Theorem 6.2 can be seen as a special case of Theorem 6.1. However, in Theorem 6.1(i), we require that κn is a bounded kernel. In the setting of Theorem 6.2, this would imply that maxi wi is uniformly bounded in n. Theorem 6.2 does not require this.

We give a complete proof of Theorem 6.2 in Sections 6.2 and 6.4 below. There, we also use the ideas in the proof of Theorem 6.2 to give a proof of Theorem 6.1.

The intuition behind Theorem 6.2 is as follows. In Section 3.5, we have argued that the neighborhood of a uniform vertex in NRn(w) is well approximated by a two-stage branching process, where the second and all later generations have offspring distribution (p*k)k≥0, which is the probability mass function of the size-biased version minus one of the degree distribution D, where D has probability mass function (pk)k≥0 as in (3.5.37). When ν = ∑_{k≥0} k p*k < ∞, the number of vertices at distance k is close to M ν^k, where M is the martingale limit of Zk/ν^k.

To know what distNRn(w)(o1, o2) is, we need to grow the neighborhoods from the first uniform vertex until we find the second uniform vertex. The latter happens with reasonable probability when Zk ≈ n, which suggests that the relevant k is such that ν^k ≈ n, so that k ≈ log_ν n.

While the above heuristic is quite convincing, the argument is fatally flawed. Indeed, as already argued in Section 3.5, the neighborhoods of a uniform vertex are well approximated by a branching process only as long as the number of vertices found is much smaller than n. When the number of vertices found becomes of order n, the depletion-of-points effect has already started to kick in. Therefore, the above approach is doomed to fail. Our proof is instead divided into a lower and an upper bound on the typical distance distNRn(w)(o1, o2). For the proof of the lower bound in Section 6.2.1, we show that the expected number of paths of k edges between two uniform vertices is approximately ν^k/ℓn, so that such a path whp does not exist when k ≤ (1 − ε) log_ν n. For the proof of the upper bound in Section 6.4, we use a second-moment method to show that, conditionally on the two uniformly chosen vertices being in the giant component, whp there exists a path of (1 + ε) log_ν n edges. This requires novel path-counting techniques in random graphs that are interesting in their own right. Exercise 6.1 investigates the typical distance for the Erdos-Renyi random graph, as well as in the case where ν = ∞.

Theorem 6.2 leaves open what happens when ν = ∞. We can use Theorem 6.2 to show that distNRn(w)(o1, o2) = oP(log n), see Exercise 6.2 below. We next study the case where the weights have an asymptotic power-law distribution with τ ∈ (2, 3).

Distances in rank-1 IRGs with infinite-variance weights

We continue to study typical distances in the Norros-Reittu random graph NRn(w), in the case where the degrees obey a power law with degree exponent τ satisfying


that τ ∈ (2, 3). In this case, ν = ∞, so that distNRn(w)(o1, o2) = oP(log n) (recall Exercise 6.2). It turns out that the precise scaling of distNRn(w)(o1, o2) depends sensitively on the precise way in which νn → ∞. Below, we assume that the weights obey a power law with exponent τ satisfying τ ∈ (2, 3). We later discuss what happens in related settings, for example when τ = 3.

Many of our arguments also apply to the generalized random graph GRGn(w) and the Chung-Lu model CLn(w). In this section, we discuss the setting where the weights w are heavy tailed. Recall that Fn(x) denotes the proportion of vertices i for which wi ≤ x. Then, we assume that there exists a τ ∈ (2, 3) such that, for all δ > 0, there exist c1 = c1(δ) and c2 = c2(δ) such that, uniformly in n,

c1 x^{−(τ−1+δ)} ≤ [1 − Fn](x) ≤ c2 x^{−(τ−1−δ)}, (6.1.4)

where the upper bound is expected to hold for every x ≥ 1, while the lower bound is only required to hold for 1 ≤ x ≤ n^α for some α > 1/2.

The assumption in (6.1.4) is precisely what we need, and it states that [1 − Fn](x) obeys power-law bounds for appropriate values of x. Note that the lower bound in (6.1.4) cannot be valid for all x, since Fn(x) > 0 implies that Fn(x) ≥ 1/n, so that the lower and upper bounds in (6.1.4) are contradictory when x ≫ n^{1/(τ−1)}. Thus, the lower bound can hold only for x = O(n^{1/(τ−1)}). When τ ∈ (2, 3), we have that 1/(τ − 1) ∈ (1/2, 1), and we only need the lower bound to hold for x ≤ n^α for some α ∈ (1/2, 1). Exercises 6.3 and 6.4 give simpler conditions for (6.1.4) in special cases, such as i.i.d. weights.
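For concreteness, bounds of the form (6.1.4) can be checked numerically for one standard deterministic choice of weights, wi = (n/i)^{1/(τ−1)} (an illustration, not part of the assumptions): its empirical tail [1 − Fn](x) equals x^{−(τ−1)} up to rounding, and is zero beyond the maximal weight n^{1/(τ−1)}.

```python
def pareto_weights(n, tau):
    # w_i = (n/i)^(1/(tau-1)) for i = 1..n; the largest weight is w_1 = n^(1/(tau-1))
    return [(n / i) ** (1.0 / (tau - 1)) for i in range(1, n + 1)]

def empirical_tail(w, x):
    # [1 - F_n](x): fraction of vertices whose weight exceeds x
    return sum(1 for wi in w if wi > x) / len(w)
```

For n = 10⁴ and τ = 2.5, the ratio [1 − Fn](x)/x^{−(τ−1)} stays close to 1 for moderate x, while the tail vanishes above n^{1/(τ−1)}, in line with the discussion above.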

The main result on graph distances in the case of infinite-variance weights isas follows:

Theorem 6.3 (Typical distances in NRn(w) for τ ∈ (2, 3)) Fix the Norros-Reittu model NRn(w), where the weights w = (wi)i∈[n] satisfy Conditions 1.1(a)-(b) and (6.1.4). Then, conditionally on distNRn(w)(o1, o2) < ∞,

distNRn(w)(o1, o2)/log log n P−→ 2/|log (τ − 2)|. (6.1.5)

The same results apply, under the same conditions, to CLn(w) and GRGn(w).

Theorem 6.3 implies that NRn(w) with w as in (1.3.15), for τ ∈ (2, 3), is an ultra-small world when (6.2.23) is satisfied.

The main tool to study distances in NRn(w) is a comparison to branching processes, which is particularly pretty for NRn(w). In the next two sections, we prove Theorems 6.2–6.3. When τ > 3, the branching-process approximation has finite mean, and we can make use of the martingale limit results for the number of individuals in generation k as k → ∞. When τ ∈ (2, 3), on the other hand, the branching process has infinite mean. In this case, the number of individuals in generation k, conditionally on survival, grows super-exponentially, which explains why distances grow doubly logarithmically. See Section 7.3, where this is explained in more detail in the context of the configuration model.


The super-exponential growth implies that a path between two vertices typically passes through vertices with larger and larger weights as we move away from the two vertices. Thus, starting from the first vertex o1 ∈ [n], the path connecting o1 to o2 uses vertices whose weights first grow until the midpoint of the path is reached, and then decrease again to reach o2. This can be understood by noting that the probability that a vertex with weight w is not connected to any vertex with weight larger than y > w in NRn(w) is

e^{−∑_{i: wi>y} w wi/ℓn} = e^{−w[1−F*n](y)}, (6.1.6)

where F*n(y) = ∑_{i: wi≤y} wi/ℓn is the distribution function of W*n introduced in (6.2.33). When (6.1.4) holds, it follows that [1 − F*n](y) is close to y^{−(τ−2)}, the size-biasing increasing the power by one. Therefore, the probability that a vertex with weight w is not connected to any vertex with weight larger than y > w in NRn(w) is approximately e^{−w y^{−(τ−2)}}. Take w large; then this probability is small when y ≪ w^{1/(τ−2)}. Thus, a vertex of weight w is whp connected to a vertex of weight w^{1/(τ−2)}. Since 1/(τ − 2) > 1 when τ ∈ (2, 3), we obtain that vertices with large weights w are whp connected to vertices with weight at least w^{1/(τ−2)}.
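The escalation w → w^{1/(τ−2)} can be iterated explicitly. The sketch below, with hypothetical parameter choices for illustration, counts how many such steps a starting weight needs before it reaches the maximal-weight scale n^{1/(τ−1)}; the count is of order log log n, matching the scaling in (6.1.5).

```python
import math

def escalation_steps(w0, target, tau):
    # number of iterations of w -> w^(1/(tau-2)) needed to reach `target`
    y, steps = w0, 0
    while y < target:
        y = y ** (1.0 / (tau - 2))
        steps += 1
    return steps

def predicted_steps(w0, target, tau):
    # smallest integer k with (1/(tau-2))^k * log(w0) >= log(target)
    return math.ceil(
        (math.log(math.log(target)) - math.log(math.log(w0)))
        / abs(math.log(tau - 2))
    )
```

For τ = 2.5 the map squares the weight at every step, so reaching weight n^{1/(τ−1)} from a constant starting weight takes roughly log₂ log n steps; a path to the hubs and back therefore has about 2 log log n/|log(τ − 2)| edges.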

The proofs of Theorems 6.2–6.3 are organized as follows. In Section 6.2, we prove the lower bounds on the typical distance in NRn(w), both when τ > 3 and when τ ∈ (2, 3). In Section 6.3, we prove the log log n upper bound for τ ∈ (2, 3). In Section 6.4, we investigate the number of paths between sets of vertices in NRn(w), and use this to prove the log n upper bound when τ > 3. In each of our proofs, we formulate the precise results as separate theorems, and prove them under conditions that are slightly weaker than those in Theorems 6.2–6.3. Further, in Section 6.2 we also give the proof of Theorem 6.1(i), and in Section 6.4 we also give the proof of Theorem 6.1(ii).

6.2 Lower bounds on typical distances in IRGs

In this section, we prove lower bounds on typical graph distances. In Section 6.2.1, we prove the lower bound in Theorem 6.1(i) in the setting of Theorem 6.2.

6.2.1 Logarithmic lower bound distances for finite-variance degrees

In this section, we prove a logarithmic lower bound on the graph distance in NRn(w). The main result is as follows:

Theorem 6.4 (Logarithmic lower bound graph distances NRn(w)) Assume that

lim sup_{n→∞} νn = ν, (6.2.1)

where ν ∈ (1,∞) and

νn = E[Wn²]/E[Wn] = ∑_{i∈[n]} wi² / ∑_{i∈[n]} wi. (6.2.2)


Figure 6.1 A 12-step self-avoiding path connecting vertices i and j.

Then, for any ε > 0,

P(distNRn(w)(o1, o2) ≤ (1− ε) logν n) = o(1). (6.2.3)

The same results hold for CLn(w) and GRGn(w) under the same conditions.

Proof The idea behind the proof of Theorem 6.4 is that it is quite unlikely that a path exists that is much shorter than log_ν n edges. In order to show this, we use a first-moment bound and show that the expected number of occupied paths connecting the two vertices chosen uniformly at random from [n] having length at most k is o(1). We now fill in the details.

We abbreviate kn = ⌈(1 − ε) log_ν n⌉. Then, conditioning on the uniform vertices chosen gives

P(distNRn(w)(o1, o2) ≤ kn) = (1/n²) ∑_{i,j∈[n]} P(distNRn(w)(i, j) ≤ kn)
= (1/n²) ∑_{i,j∈[n]} ∑_{k=0}^{kn} P(distNRn(w)(i, j) = k). (6.2.4)

In this section and in Section 6.4, we make use of path-counting techniques (see in particular Section 6.4.1). Here, we show that short paths are unlikely by giving upper bounds on the expected number of paths of various types. In Section 6.4.1, we give bounds on the variance of the number of paths of various types, so as to show that long paths are quite likely to exist. Such bounds on the variance of the number of paths are quite challenging, and here we give some basics to highlight the main ideas in a much simpler setting.

A path π = (π0, . . . , πk) of length k between vertices i and j is a sequence of vertices connecting π0 = i to πk = j. We call a path π self-avoiding when it visits every vertex at most once, i.e., πi ≠ πj for every i ≠ j. Let Pk(i, j) denote the set of k-step self-avoiding paths between vertices i and j. See Figure 6.1 for an example of a 12-step self-avoiding path between i and j.

When distNRn(w)(i, j) = k, there must be a path of length k such that all edges (πl, πl+1) are occupied in NRn(w), for l = 0, . . . , k − 1. The probability in NRn(w) that the edge (πl, πl+1) is occupied is equal to

1 − e^{−wπl wπl+1/ℓn} ≤ wπl wπl+1/ℓn. (6.2.5)


For CLn(w) and GRGn(w), an identical upper bound holds, which explains why the proof of Theorem 6.4 for NRn(w) applies verbatim to those models.

We say that π is occupied when all edges in π are occupied in NRn(w). Then, by the union bound or Boole’s inequality,

P(distNRn(w)(i, j) = k) ≤ P(∃π ∈ Pk(i, j) : π occupied) ≤ ∑_{π∈Pk(i,j)} P(π occupied). (6.2.6)

For any path π ∈ Pk(i, j),

P(π occupied) = ∏_{l=0}^{k−1} P((πl, πl+1) occupied) ≤ ∏_{l=0}^{k−1} wπl wπl+1/ℓn
= (wπ0 wπk/ℓn) ∏_{l=1}^{k−1} wπl²/ℓn = (wi wj/ℓn) ∏_{l=1}^{k−1} wπl²/ℓn. (6.2.7)

Therefore,

P(distNRn(w)(i, j) = k) ≤ (wi wj/ℓn) ∑_{π∈Pk(i,j)} ∏_{l=1}^{k−1} wπl²/ℓn
≤ (wi wj/ℓn) ∏_{l=1}^{k−1} ( ∑_{πl∈[n]} wπl²/ℓn ) = (wi wj/ℓn) νn^{k−1}, (6.2.8)

where νn is defined in (6.2.2). We conclude that

P(distNRn(w)(o1, o2) ≤ kn) ≤ (1/n²) ∑_{i,j∈[n]} ∑_{k=0}^{kn} (wi wj/ℓn) νn^{k−1} = (ℓn/n²) ∑_{k=0}^{kn} νn^{k−1} ≤ (ℓn/n²) (νn^{kn+1} − 1)/(νn − 1). (6.2.9)

By (6.2.1), lim sup_{n→∞} νn = ν ∈ (1,∞), so that, for n large enough, νn ≤ ν + δ, while ℓn/n = E[Wn] → E[W ] < ∞. Thus, since ν ↦ (ν^{k+1} − 1)/(ν − 1) is increasing for every integer k ≥ 0,

P(distNRn(w)(o1, o2) ≤ kn) ≤ O((ν + δ)^{kn}/n) = o(1), (6.2.10)

when δ = δ(ε) > 0 is chosen such that (1 − ε) log(ν + δ)/log ν < 1, and since kn = ⌈(1 − ε) log_ν n⌉. This completes the proof of Theorem 6.4.
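The final bound is easy to evaluate numerically. The helper below is an illustration with constant weights wi ≡ ν (so that νn = ν and ℓn = νn; not part of the proof): it computes (ℓn/n²)(νn^{kn+1} − 1)/(νn − 1) at kn = ⌈(1 − ε) log_ν n⌉ and shows the decay as n grows.

```python
import math

def first_moment_bound(n, nu, eps):
    # (l_n/n^2) * (nu^(k_n+1) - 1)/(nu - 1) with w_i = nu, hence l_n = nu * n,
    # evaluated at k_n = ceil((1 - eps) * log_nu(n))
    kn = math.ceil((1.0 - eps) * math.log(n) / math.log(nu))
    return (nu / n) * (nu ** (kn + 1) - 1.0) / (nu - 1.0)
```

For ν = 3 and ε = 0.2 the bound decreases along n = 10³, 10⁶, 10⁹, 10¹², in line with the o(1) claim; up to the rounding in kn, the decay is polynomial, of order n^{−ε}.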

The condition (6.2.1) is slightly weaker than Condition 1.1(c), which is assumed in Theorem 6.2, as shown in Exercise 6.6. Exercise 6.7 extends the proof of Theorem 6.4 to show that (distNRn(w)(o1, o2) − log n/log νn)− is tight.

We close this section by extending the above result to settings where νn is not necessarily bounded, the most interesting case being τ = 3:


Corollary 6.5 (Lower bound graph distances NRn(w) for τ = 3) Let νn be given in (6.2.2). Then, for any ε > 0,

P(distNRn(w)(o1, o2) ≤ (1 − ε) log_{νn} n) = o(1). (6.2.11)

The same results hold for CLn(w) and GRGn(w) under the same conditions.

The proof of Corollary 6.5 is left as Exercise 6.8. In the case where τ = 3 and [1 − Fn](x) is, for a large range of x values, of the order x⁻² (which is stronger than τ = 3), it can be expected that νn = Θ(log n), so that in that case

P(distNRn(w)(o1, o2) ≤ (1 − ε) log n/log log n) = o(1). (6.2.12)

Exercise 6.9 investigates the situation where τ = 3. Exercise 6.10 investigates the case where τ ∈ (2, 3), where Corollary 6.5 unfortunately does not give highly interesting results.

Proof of Theorem 6.1(i)

The proof of Theorem 6.1(i) is closely related to that of Theorem 6.4. Note that

P(distIRGn(κn)(i, j) = k) ≤ ∑_{i1,...,ik−1∈[n]} ∏_{l=0}^{k−1} κn(x_{il}, x_{il+1})/n, (6.2.13)

where i0 = i, ik = j, and we can restrict the vertices to be distinct. Averaging over the two uniformly chosen endpoints, we obtain

P(distIRGn(κn)(o1, o2) = k) ≤ (1/n^{k+2}) ∑_{i0,i1,...,ik−1,ik∈[n]} ∏_{l=0}^{k−1} κn(x_{il}, x_{il+1}). (6.2.14)

If the above (k + 1)-dimensional discrete sum could be replaced by the corresponding continuous integral, then we would arrive at

(1/n) ∫_S · · · ∫_S ∏_{l=0}^{k−1} κ(x_l, x_{l+1}) ∏_{l=0}^{k} µ(dx_l) = (1/n) ‖T_κ^k 1‖₁, (6.2.15)

which is bounded from above by (1/n)‖Tκ‖^k. Repeating the bound in (6.2.10) would then prove that, when ν = ‖Tκ‖ > 1,

P(distIRGn(κn)(o1, o2) ≤ (1 − ε) log_ν n) = o(1). (6.2.16)

However, in the general case, it is not so easy to replace the (k + 1)-fold discrete sum in (6.2.14) by a (k + 1)-fold integral. We next explain how this can be done, starting with the finite-types case.


In the finite-type case, (6.2.13) turns into

P(distIRGn(κn)(o1, o2) = k) ≤ (1/n²) ∑_{v0,...,vk∈[n]} ∏_{l=0}^{k−1} κn(x_{vl}, x_{vl+1})/n = (1/n) ∑_{i0,...,ik∈[r]} (n_{i0}/n) ∏_{l=0}^{k−1} κ(n)(il, il+1) n_{il+1}/n, (6.2.17)

where the number of vertices of type i is denoted by ni, and where the probability that there exists an edge between vertices of types i and j equals κ(n)(i, j)/n. Under the conditions in Theorem 6.1(i), we have that ni/n → pi and that κ(n)(i, j) → κij. This also implies that ‖Tκn‖ → ν, where ν is the largest eigenvalue of the matrix (mij)i,j∈[r] with mij = κij pj. Denoting m(n)ij = κ(n)(i, j) nj/n → mij, we obtain

P(distIRGn(κn)(o1, o2) = k) ≤ (1/n) ⟨µᵀ, [M(n)]^k 1⟩, (6.2.18)

where 1 is the all-one vector, µi = ni/n → pi, and M(n)ij = m(n)ij. Obviously, when there are N < ∞ types,

⟨µᵀ, [M(n)]^k 1⟩ ≤ ‖M(n)‖^k ‖µ‖ ‖1‖ ≤ ‖M(n)‖^k √N. (6.2.19)

Thus,

P(distIRGn(κn)(o1, o2) = k) ≤ (√N/n) ‖M(n)‖^k. (6.2.20)

We conclude that

P(distIRGn(κn)(o1, o2) ≤ (1 − ε) log_{νn} n) = o(1), (6.2.21)

where νn = ‖M(n)‖ → ν. This proves the claim of Theorem 6.1(i) in the finite-type setting.

We next extend the proof of Theorem 6.1(i) to the infinite-type setting. Assume that the conditions in Theorem 6.1(i) hold. Recall the bound in (3.3.18), which bounds κn from above by κ+m, which is of finite type. Then, use the fact that ‖Tκ+m‖ ↓ ‖Tκ‖ = ν > 1 to conclude that P(distIRGn(κn)(o1, o2) ≤ (1 − ε) log_ν n) = o(1) holds under the conditions of Theorem 6.1(i). This completes the proof of Theorem 6.1(i).

Theorem 6.1 leaves open the case where ‖Tκ‖ = ∞, which, for example for CLn(w), is the case when F has infinite second moment. (Bollobas et al., 2007, Theorem 3.14(iv)) states that when ‖Tκ‖ = ∞, the typical graph distance is smaller than log n. More precisely, (Bollobas et al., 2007, Theorem 3.14(iv)) states that if κ is irreducible and ‖Tκ‖ = ∞, then there is a function f(n) = o(log n) such that

P(distIRGn(κ)(o1, o2) ≤ f(n)) = ζκ² + o(1). (6.2.22)

Exercise 6.12 shows that one can take f(n) = o(log n).

6.2.2 log log lower bound on distances for infinite-variance degrees

In this section, we prove a log log lower bound on the typical distances of NRn(w) for τ ∈ (2, 3). The main result we prove is the following theorem:

Theorem 6.6 (Log log lower bound on typical distances in NRn(w)) Suppose that the weights w = (wi)i∈[n] satisfy Condition 1.1(a) and that there exist τ ∈ (2, 3) and c2 such that, for all x ≥ 1,

[1 − Fn](x) ≤ c2 x^{−(τ−1)}. (6.2.23)

Then, for every ε > 0,

P(distNRn(w)(o1, o2) ≤ (1 − ε) 2 log log n/|log (τ − 2)|) = o(1). (6.2.24)

The same results hold for CLn(w) and GRGn(w) under the same conditions.

We follow the proof of Theorem 6.4 as closely as possible. The problem with that proof is that, under the condition in (6.2.23), νn is too large. Indeed, Exercise 6.10 shows that the lower bound obtained in Corollary 6.5 is a constant, which is not very useful. What goes wrong in that argument is that there are too many vertices with too high weight, and they provide the main contribution to νn and hence to the upper bound as in (6.2.9). However, this argument completely ignores the fact that it is quite unlikely that a vertex with a high weight is chosen.

Indeed, as argued in (6.1.6), when starting from a vertex with weight w, say, the probability that it is directly connected to a vertex having weight at least y is at most

∑_{j: wj≥y} w wj/ℓn = w[1 − F*n](y), (6.2.25)

which is small when y is too large. On the other hand, the main contribution to νn comes from vertices having maximal weight of the order n^{1/(τ−1)}. This problem is resolved by a suitable truncation argument on the weights of the vertices in the occupied paths, which effectively removes these high-weight vertices. Therefore, instead of obtaining νn = ∑_{s∈[n]} ws²/ℓn, we obtain a partial sum of this restricted to vertices having a relatively small weight. Effectively, this means that we split the space of all paths into good paths, i.e., paths that avoid vertices with too large weight, and bad paths, which are paths that jump to vertices with too high weight.


Figure 6.2 A 10-step good path connecting i and j and the upper bounds on the weight of its vertices. The height of a vertex is high for vertices with large weights.

We now present the details for this argument. We again start from

P(distNRn(w)(o1, o2) ≤ kn) = (1/n²) ∑_{i,j∈[n]} P(distNRn(w)(i, j) ≤ kn). (6.2.26)

When distNRn(w)(i, j) ≤ kn, there exists an occupied path π ∈ Pk(i, j) for some k ≤ kn.

We fix an increasing sequence of numbers (bl)l≥0 that serve as truncation values for the weights of vertices along our occupied path. We determine the precise values of (bl)l≥0, which is quite delicate, below. We say that a path π ∈ Pk(i, j) is good when wπl ≤ bl ∧ bk−l for every l = 0, . . . , k, and bad otherwise. The condition wπl ≤ bl ∧ bk−l for every l = 0, . . . , k is equivalent to the statement that wπl ≤ bl for l ≤ ⌈k/2⌉, while wπl ≤ bk−l for ⌈k/2⌉ < l ≤ k. Thus, bl provides an upper bound on the weight of the lth vertex and the (k − l)th vertex of the occupied path, ensuring that the weights occurring in the occupied path cannot be too large. See Figure 6.2 for a description of a good path and the bounds on the weight of its vertices.
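The good-path condition is a one-line predicate. The helper below is a hypothetical illustration of the definition, with the truncation sequence (bl) passed in as a list:

```python
def is_good(path_weights, b):
    # A path pi is good when w_{pi_l} <= b_l ∧ b_{k-l} for every l = 0, ..., k,
    # where k = len(path_weights) - 1; it is bad otherwise.
    k = len(path_weights) - 1
    return all(w <= min(b[l], b[k - l]) for l, w in enumerate(path_weights))
```

The symmetric minimum bl ∧ bk−l encodes that the admissible weights rise towards the middle of the path and fall again towards its end, as in Figure 6.2.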

Let GPk(i, j) be the set of good paths in Pk(i, j). Let

Ek(i, j) = {∃π ∈ GPk(i, j) : π occupied} (6.2.27)

denote the event that there exists an occupied good path of length k. When distNRn(w)(i, j) ≤ kn, but there does not exist a k ≤ kn and a good occupied path π ∈ GPk(i, j), then either there exists an l ≤ ⌈k/2⌉ such that wπs ≤ bs for every s < l, while wπl > bl, or there exists an l ≤ ⌈k/2⌉ such that wπk−s ≤ bk−s for every s < l, while wπk−l > bk−l. Let Pk(i) = ∪_{l∈[n]} Pk(i, l) denote


the set of all paths of length k from i, and let

BPk(i) = {π ∈ Pk(i) : wπk > bk, wπs ≤ bs ∀s < k} (6.2.28)

denote the set of bad paths of length k, i.e., those π ∈ Pk(i) that are not in GPk(i, πk). Let Fl(i) be the event that there exists an occupied bad path of length l starting from i, i.e.,

Fl(i) = {∃π ∈ BPl(i) : π occupied}. (6.2.29)

Then, since distNRn(w)(i, j) ≤ kn implies that there is either a good path or a bad path,

{distNRn(w)(i, j) ≤ kn} ⊆ ∪_{k≤kn} (Fk(i) ∪ Fk(j) ∪ Ek(i, j)), (6.2.30)

so that, by Boole’s inequality,

P(distNRn(w)(i, j) ≤ kn) ≤ ∑_{k=0}^{kn} [P(Fk(i)) + P(Fk(j)) + P(Ek(i, j))]. (6.2.31)

In order to estimate the probabilities P(Fk(i)) and P(Ek(i, j)), we introduce some notation. For b ≥ 0, let

νn(b) = (1/ℓn) ∑_{i∈[n]} wi² 1{wi≤b} (6.2.32)

be the restriction of νn to vertices with weights at most b, and let

F*n(x) = (1/ℓn) ∑_{i∈[n]} wi 1{wi≤x} (6.2.33)

be the distribution function of W*n, the size-biased version of Wn. The following lemma gives bounds on P(Fk(i)) and P(Ek(i, j)) in terms of the tail distribution function 1 − F*n and the truncated second moment νn(b), which we bound using Lemma 6.8 below:

Lemma 6.7 (Truncated path probabilities) For every k ≥ 1 and every (bl)l≥0 with bl ≥ 0 and l ↦ bl non-decreasing,

P(Fk(i)) ≤ wi [1 − F*n](bk) ∏_{l=1}^{k−1} νn(bl), (6.2.34)

and

P(Ek(i, j)) ≤ (wi wj/ℓn) ∏_{l=1}^{k−1} νn(bl ∧ bk−l), (6.2.35)

where

νn(b) ≤ cν b^{3−τ}. (6.2.36)

When bl = ∞ for each l, the bound in (6.2.35) equals that obtained in (6.2.8).


Proof We start by proving (6.2.34). By Boole’s inequality,

P(Fk(i)) = P(∃π ∈ BPk(i) : π occupied) ≤ ∑_{π∈BPk(i)} P(π occupied). (6.2.37)

By (6.2.7), (6.2.32) and (6.2.33),

P(Fk(i)) ≤ ∑_{π∈BPk(i)} (wi wπk/ℓn) ∏_{l=1}^{k−1} wπl²/ℓn
≤ wi ∑_{πk: wπk≥bk} (wπk/ℓn) × ∏_{l=1}^{k−1} ∑_{πl: wπl≤bl} wπl²/ℓn = wi [1 − F*n](bk) ∏_{l=1}^{k−1} νn(bl). (6.2.38)

The same bound applies to CLn(w) and GRGn(w).

The proof of (6.2.35) is similar. Indeed, by (6.2.7),

P(Ek(i, j)) ≤ ∑_{π∈GPk(i,j)} (wi wj/ℓn) ∏_{l=1}^{k−1} wπl²/ℓn ≤ (wi wj/ℓn) ∏_{l=1}^{k−1} νn(bl ∧ bk−l). (6.2.39)

Now follow the steps in the proof of (6.2.34). Finally, (6.2.36) follows from (1.4.2) in Lemma 1.13, combined with ℓn = Θ(n) by Conditions 1.1(a)–(b). See also Exercise 6.5 below. Again, the same bounds apply to CLn(w) and GRGn(w).

In order to effectively apply Lemma 6.7, we use Lemmas 1.13 and 1.14 to derive bounds on [1 − F*n](x) and νn(b):

Lemma 6.8 (Bounds on sums) Suppose that the weights w = (wi)i∈[n] satisfy Conditions 1.1(a)–(b) and that there exist τ ∈ (2, 3) and c2 such that, for all x ≥ 1,

[1 − Fn](x) ≤ c2 x^{−(τ−1)}. (6.2.40)

Then, there exists a constant c*2 > 0 such that, for all x ≥ 1,

[1 − F*n](x) ≤ c*2 x^{−(τ−2)}, (6.2.41)

and there exists a cν > 0 such that, for all b ≥ 1,

νn(b) ≤ cν b^{3−τ}. (6.2.42)

Proof The bound in (6.2.41) follows from Lemma 1.14, the bound in (6.2.42) from (1.4.3) in Lemma 1.13 with a = 2 > τ − 1 when τ ∈ (2, 3). For both lemmas, the assumptions follow from (6.2.40) (which, in turn, equals (6.2.23)).

With Lemmas 6.7 and 6.8 in hand, we are ready to choose (bl)l≥0 and to complete the proof of Theorem 6.6:


Proof of Theorem 6.6. Take kn = 2(1 − ε) log log n/|log (τ − 2)|. By (6.2.26) and (6.2.30),

P(distNRn(w)(o1, o2) ≤ kn) ≤ 1/n + ∑_{k=1}^{kn} [ (2/n) ∑_{i∈[n]} P(Fk(i)) + (1/n²) ∑_{i,j∈[n]: i≠j} P(Ek(i, j)) ],

where the contribution 1/n is due to i = j, for which distNRn(w)(i, i) = 0. We use Lemmas 6.7 and 6.8 to provide bounds on P(Fk(i)) and P(Ek(i, j)). These bounds are quite similar.

We first describe how we choose the truncation values (bl)l≥0 so that [1 − F*n](bk) is so small that P(Fk(i)) is small, and, for this choice of (bl)l≥0, we show that P(Ek(i, j)) is small. Intuitively, this means that it is quite unlikely that i or j is connected to a vertex at distance k with too high weight, i.e., having weight at least bk. At the same time, it is also unlikely that there is a path π ∈ Pk(i, j) whose weights are all small, i.e., for which wπk ≤ bk for every k ≤ kn, because bk is too large.

By Lemma 6.7, we wish to choose bk so that P(Fk(i)) ≤ wi [1 − F*n](bk) ∏_{l=1}^{k−1} νn(bl) is small. Below (6.1.6), it is argued that bk ≈ bk−1^{1/(τ−2)}. In order to make this probability small, we take bk somewhat larger. We now make this argument precise.

We take δ ∈ (0, τ − 2) sufficiently small and let

a = 1/(τ − 2 − δ) > 1. (6.2.43)

Take b0 = e^A for some constant A ≥ 0 sufficiently large, and define (bl)l≥0 recursively by

bl = bl−1^a, so that bl = b0^{a^l} = e^{A(τ−2−δ)^{−l}}. (6.2.44)

We start from (6.2.31). By Lemma 6.7, we obtain an upper bound on P(Fk(i)) in terms of factors νn(bl) and [1 − F*n](bk), which are bounded in Lemma 6.8. We start by applying the bound on νn(bl) to obtain

∏_{l=1}^{k−1} νn(bl) ≤ ∏_{l=1}^{k−1} cν bl^{3−τ} = cν^{k−1} e^{A(3−τ) ∑_{l=1}^{k−1} a^l} ≤ cν^{k−1} e^{A(3−τ) a^k/(a−1)} = cν^{k−1} bk^{(3−τ)/(a−1)}. (6.2.45)

Combining (6.2.45) with the bound on [1 − F*n](bk) in Lemma 6.8 yields

P(Fk(i)) ≤ c*2 wi cν^{k−1} bk^{−(τ−2)+(3−τ)/(a−1)}. (6.2.46)

Since 3 − τ + δ < 1 when τ ∈ (2, 3) and δ ∈ (0, τ − 2),

(τ − 2) − (3 − τ)/(a − 1) = (τ − 2) − (3 − τ)(τ − 2 − δ)/(3 − τ + δ) = δ/(3 − τ + δ) > δ, (6.2.47)

so that

P(Fk(i)) ≤ c*2 wi cν^{k−1} bk^{−δ}. (6.2.48)


As a result, for each δ > 0,

(1/n) ∑_{i∈[n]} ∑_{k=1}^{kn} P(Fk(i)) ≤ (1/n) ∑_{i∈[n]} c*2 wi ∑_{k≥1} cν^{k−1} bk^{−δ} = O(∑_{k≥1} cν^{k−1} bk^{−δ}) ≤ ε, (6.2.49)

by (6.2.44), since ℓn/n = O(1), and when we take A = A(δ, ε) sufficiently large.

by (6.2.44) and when we take A = A(δ, ε) sufficiently large.Similarly, since bl ≥ 1, by (6.2.45),

P(Ek(i, j)) ≤wiwj`n

k−1∏l=1

νn(bl ∧ bk−l) ≤wiwj`n

ck−1ν b

2(3−τ)/(a−1)dk/2e , (6.2.50)

so that, using further that l 7→ bl is increasing,

kn∑k=1

1

n2

∑i,j∈[n]

P(Ek(i, j)) ≤1

n2

kn∑k=1

∑i,j∈[n]

wiwj`n

ck−1ν b

2(3−τ)/(a−1)dk/2e (6.2.51)

≤ `nn2knc

kn−1ν b

2(3−τ)/(a−1)dkn/2e ,

by (6.2.44) and the fact that k 7→ bk is monotonically increasing. We completethe proof by analyzing this bound.

Recall that k ≤ kn = 2(1 − ε) log log n/|log(τ − 2)|. Take δ = δ(ε) > 0 so small that (τ − 2 − δ)^{−(kn+1)/2} ≤ (log n)^{1−ε/4}. Then, by (6.2.44),

b⌈kn/2⌉ ≤ e^{A(τ−2−δ)^{−(kn+1)/2}} ≤ e^{A(log n)^{1−ε/4}}, (6.2.52)

and we conclude that

∑_{k=1}^{kn} (1/n²) ∑_{i,j∈[n]} P(Ek(i, j)) ≤ (ℓn/n²) kn cν^{kn} exp(2A(3 − τ)(log n)^{1−ε/4}) = o(1), (6.2.53)

since kn = O(log log n) and ℓn/n² = Θ(1/n). This completes the proof of Theorem 6.6.

In Exercise 6.14, the above argument is extended to show that the sequence of random variables (distNRn(w)(o1, o2) − 2 log log n/|log (τ − 2)|)− is tight.

6.3 The log log upper bound

In this section, we prove the log log upper bound on typical graph distances in the case where the asymptotic weight distribution has infinite variance. Recall the comparison of neighborhoods of vertices in NRn(w) to branching processes discussed in Section 3.5.3. Throughout this section, we assume that there exist τ ∈ (2, 3), α > 1/2 and c1 such that, uniformly in n and x ≤ n^α,

[1 − Fn](x) ≥ c1 x^{−(τ−1)}. (6.3.1)


The bound in (6.3.1) corresponds to the lower bound in (6.1.4). The main result in this section is the following theorem:

Theorem 6.9 (A log log upper bound on typical distance for τ ∈ (2, 3)) Suppose that the empirical distribution function Fn of the weights w = (wi)i∈[n] satisfies Conditions 1.1(a)-(b) and (6.3.1). Then, for every ε > 0, as n → ∞,

P(distNRn(w)(o1, o2) ≤ 2(1 + ε) log log n/|log (τ − 2)| | distNRn(w)(o1, o2) < ∞) → 1. (6.3.2)

The same results hold for CLn(w) and GRGn(w) under the same conditions.

The proof of Theorem 6.9 is organized as follows. We start by showing that the giant-weight vertices, i.e., the vertices with extremely high weight, larger than $n^\alpha$, are all connected to one another. Thus, the giant-weight vertices form a complete graph; this is often referred to as a clique in the random graph community. In the second step, we show that connections from a vertex to the set of giant-weight vertices occur at distance at most $(1+\varepsilon)\log\log n/|\log(\tau-2)|$. The latter is only true when the vertex is in the giant connected component, a fact that we need to carefully take into account. In the final step, we complete the proof of Theorem 6.9. We now start by defining the set of giant-weight vertices.

The giant-weight vertices form a clique

Recall the definition of $\alpha>1/2$ in (6.3.1). Let
$$\mathrm{Giant}_n=\{i\in[n]:w_i\ge n^\alpha\}\qquad(6.3.3)$$
denote the set of vertices with giant weights. Let $A\subseteq[n]$. We say that $A$ forms a clique when the edges $a_1a_2$ are occupied for all $a_1,a_2\in A$. We continue by proving that, whp, $\mathrm{Giant}_n$ forms a clique:

Lemma 6.10 (High-weight vertices form a clique). Under the conditions of Theorem 6.9,
$$P(\mathrm{Giant}_n\text{ does not form a clique})\le n^2e^{-n^{2\alpha}/\ell_n}.\qquad(6.3.4)$$
The same result holds for $\mathrm{CL}_n(w)$ under the same conditions; for $\mathrm{GRG}_n(w)$, the diameter of $\mathrm{Giant}_n$ is whp at most 2.

Proof. Let $a_1,a_2\in\mathrm{Giant}_n$, so that $w_{a_1},w_{a_2}\ge n^\alpha$. There are at most $|\mathrm{Giant}_n|^2\le n^2$ pairs of vertices in $\mathrm{Giant}_n$, so that
$$P(\mathrm{Giant}_n\text{ does not form a clique})\le n^2\max_{a_1,a_2\in\mathrm{Giant}_n}P(a_1a_2\text{ vacant}).\qquad(6.3.5)$$
The edge $a_1a_2$ is vacant with probability
$$P(a_1a_2\text{ vacant})=e^{-w_{a_1}w_{a_2}/\ell_n}\le e^{-n^{2\alpha}/\ell_n},\qquad(6.3.6)$$
since $w_a\ge n^\alpha$ for every $a\in\mathrm{Giant}_n$. Multiplying out gives the result. For $\mathrm{CL}_n(w)$, $P(a_1a_2\text{ vacant})=0$ for $n$ large, since $w_{a_1}w_{a_2}\ge n^{2\alpha}\ge\ell_n$, so the same proof applies.


For $\mathrm{GRG}_n(w)$, we need to strengthen this analysis slightly. Indeed, for $\mathrm{GRG}_n(w)$, for all $a_1,a_2\in\mathrm{Giant}_n$,
$$P(a_1a_2\text{ present})\ge\frac{n^{2\alpha}}{\ell_n+n^{2\alpha}}=1-\Theta(n^{1-2\alpha})\ge\tfrac12.\qquad(6.3.7)$$
Thus, the diameter of $\mathrm{Giant}_n$ is stochastically bounded by the diameter of $\mathrm{ER}_n(p)$ with $p=\tfrac12$, and it suffices to prove that the diameter of $\mathrm{ER}_n(\tfrac12)$ is whp bounded by 2. For this, we note that
$$P\big(\mathrm{diam}(\mathrm{ER}_n(\tfrac12))>2\big)\le n^2P\big(\mathrm{dist}_{\mathrm{ER}_n(1/2)}(1,2)>2\big).\qquad(6.3.8)$$
The event $\{\mathrm{dist}_{\mathrm{ER}_n(1/2)}(1,2)>2\}$ implies that all two-hop paths between 1 and 2 are vacant, so that, by the independence of the $n-2$ two-hop paths,
$$P\big(\mathrm{dist}_{\mathrm{ER}_n(1/2)}(1,2)>2\big)\le(1-\tfrac14)^{n-2},\qquad(6.3.9)$$
which, combined with (6.3.8), completes the proof.
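The last step — that $\mathrm{ER}_n(\tfrac12)$ has diameter at most 2 whp — is easy to check numerically. Below is a minimal sketch in Python (the helper names `er_graph` and `diameter_at_most_two` are ours, not part of the text); already at $n=60$ the union bound $n^2(3/4)^{n-2}$ is of order $10^{-4}$, so every sampled graph should have diameter at most 2.

```python
import itertools
import random

def er_graph(n, p, rng):
    """Adjacency sets of an Erdos-Renyi graph ER_n(p)."""
    adj = [set() for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def diameter_at_most_two(adj):
    """dist(i,j) <= 2 for all pairs: either an edge or a common neighbor."""
    n = len(adj)
    return all(
        j in adj[i] or adj[i] & adj[j]
        for i, j in itertools.combinations(range(n), 2)
    )

rng = random.Random(0)
# Union bound (6.3.8)-(6.3.9): P(diam > 2) <= n^2 (3/4)^(n-2), tiny for n = 60.
results = [diameter_at_most_two(er_graph(60, 0.5, rng)) for _ in range(20)]
print(all(results))
```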

Connections to Giant_n occur at log log n distances

We next show that vertices that survive up to distance $m$ have a high probability of connecting to $\mathrm{Giant}_n$ using a path of at most $(1+\varepsilon)\log\log n/|\log(\tau-2)|$ edges:

Proposition 6.11 (Connecting to $\mathrm{Giant}_n$). Let $i\in[n]$ be such that $w_i>1$. Under the conditions of Theorem 6.9, there exist $c,c_1^\star>0$ and $\eta>0$ such that
$$P\Big(\mathrm{dist}_{\mathrm{NR}_n(w)}(i,\mathrm{Giant}_n)\ge(1+\varepsilon)\frac{\log\log n}{|\log(\tau-2)|}\Big)\le ce^{-c_1^\star w_i^\eta}.\qquad(6.3.10)$$
Consequently, with $W_m(i)=\sum_{k\in\partial B_m(i)}w_k$ denoting the total weight of the vertices at graph distance $m$ from $i$,
$$P\Big(\mathrm{dist}_{\mathrm{NR}_n(w)}(\partial B_m(i),\mathrm{Giant}_n)\ge(1+\varepsilon)\frac{\log\log n}{|\log(\tau-2)|}\ \Big|\ B_m(i)\Big)\le ce^{-c_1^\star W_m(i)^\eta}.\qquad(6.3.11)$$

Proof. We start by proving (6.3.10). The bound in (6.3.10) is trivial unless $w_i$ is large. We let $x_0=i$, and define, recursively,
$$x_\ell=\mathop{\mathrm{argmax}}\{w_j:j\in[n],\ x_{\ell-1}j\text{ occupied}\}.\qquad(6.3.12)$$
Thus, $x_\ell$ is the maximal-weight neighbor of $x_{\ell-1}$. We stop the above recursion when $w_{x_\ell}\ge n^\alpha$, since then $x_\ell\in\mathrm{Giant}_n$. Recall the heuristic below (6.1.6), which shows that a vertex with weight $w$ is whp connected to a vertex with weight $w^{1/(\tau-2)}$. We now make this precise.

We take $a=1/(\tau-2+\delta)$, where we choose $\delta>0$ so small that $a>1$. By (6.2.33),
$$P(w_{x_{\ell+1}}<w_{x_\ell}^a\mid(x_s)_{s\le\ell})=e^{-w_{x_\ell}\sum_{l\colon w_l\ge w_{x_\ell}^a}w_l/\ell_n}=e^{-w_{x_\ell}[1-F_n^\star](w_{x_\ell}^a)}.\qquad(6.3.13)$$

We split the argument depending on whether $w_{x_\ell}^a\le n^\alpha$ or not. Firstly, when $w_{x_\ell}^a\le n^\alpha$, by (6.3.1) and uniformly for $x\le n^\alpha$,
$$[1-F_n^\star](x)\ge\frac{xn}{\ell_n}[1-F_n](x)\ge c_1^\star x^{-(\tau-2)},\qquad(6.3.14)$$
where, for $n$ large enough, we can take $c_1^\star=c_1/(2E[W])$. Therefore,
$$P(w_{x_{\ell+1}}<w_{x_\ell}^a\mid(x_s)_{s\le\ell})\le e^{-c_1^\star w_{x_\ell}^{1-(\tau-2)a}}\le e^{-c_1^\star w_{x_\ell}^\delta},\qquad(6.3.15)$$
since $a=1/(\tau-2+\delta)>1$, so that $1-(\tau-2)a=a\delta>\delta$.

Secondly, when $w_{x_\ell}^a>n^\alpha$, but $w_{x_\ell}<n^\alpha$, we can use (6.3.14) for $x=n^\alpha$ to obtain
$$P(w_{x_{\ell+1}}<n^\alpha\mid(x_s)_{s\le\ell})\le e^{-c_1^\star w_{x_\ell}n^{-\alpha(\tau-2)}}\le e^{-c_1^\star n^{\alpha[1-(\tau-2)a]/a}}\le e^{-c_1^\star n^{\alpha\delta/a}}.\qquad(6.3.16)$$
Therefore, in both cases, and with $\eta=\alpha\delta/a$,
$$P(w_{x_{\ell+1}}<(n^\alpha\wedge w_{x_\ell}^a)\mid(x_s)_{s\le\ell})\le e^{-c_1^\star w_{x_\ell}^\eta}.\qquad(6.3.17)$$

As a result, when $x_\ell$ is such that $w_{x_\ell}$ is quite large, whp $w_{x_{\ell+1}}\ge w_{x_\ell}^a$. This produces, whp, a short path to $\mathrm{Giant}_n$. We now investigate the properties of this path.

Let the recursion stop at some integer time $k$. The key observation is that, when this occurs, we must have that $w_{x_{\ell+1}}\ge w_{x_\ell}^a$ for each $\ell\le k-1$, where $k$ is such that $w_{x_{k-1}}\in[n^{\alpha/a},n^\alpha]$, while $w_{x_k}\ge n^\alpha$. Then, we conclude that the following facts are true:

(1) $w_{x_\ell}\ge w_{x_0}^{a^\ell}=w_i^{a^\ell}$ for every $\ell\le k-1$;
(2) $\mathrm{dist}_{\mathrm{NR}_n(w)}(i,\mathrm{Giant}_n)\le k$.

By (1), $w_{x_{k-1}}\ge w_i^{a^{k-1}}$, and $w_{x_{k-1}}\in[n^{\alpha/a},n^\alpha]$. Therefore, $w_i^{a^{k-1}}\le n^\alpha$, which, in turn, implies that
$$a^{k-1}\le\alpha\log n,\quad\text{or}\quad k-1\le\frac{\log\log n+\log\alpha}{\log a}.\qquad(6.3.18)$$

Let $k_n=(1+\varepsilon)\frac{\log\log n}{|\log(\tau-2)|}$. By (1) and (2), when $\mathrm{dist}_{\mathrm{NR}_n(w)}(i,\mathrm{Giant}_n)>k_n$ occurs, there must exist an $\ell\le k_n$ such that $w_{x_{\ell+1}}\le n^\alpha\wedge w_{x_\ell}^a$. We conclude that
$$P\big(\mathrm{dist}_{\mathrm{NR}_n(w)}(i,\mathrm{Giant}_n)\ge k_n\big)\le\sum_{\ell=0}^{k_n}P(w_{x_{\ell+1}}\le n^\alpha\wedge w_{x_\ell}^a)\qquad(6.3.19)$$
$$\le\sum_{\ell=0}^{k_n}E\big[P(w_{x_{\ell+1}}\le n^\alpha\wedge w_{x_\ell}^a\mid(x_s)_{s\le\ell})\big]\le\sum_{\ell=0}^{k_n}E\big[e^{-c_1^\star w_{x_\ell}^\eta}\big]\le\sum_{\ell=0}^{k_n}e^{-c_1^\star w_i^{\eta a^\ell}}\le ce^{-c_1^\star w_i^\eta}.$$
This completes the proof of (6.3.10). The proof of (6.3.11) is similar, by conditioning on $B_m(i)$ and by noting that we can interpret $\partial B_m(i)$ as a single vertex having weight $W_m(i)=\sum_{k\in\partial B_m(i)}w_k$.
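The doubling-exponent mechanism behind (6.3.12)–(6.3.18) can be illustrated with a deterministic toy computation (a sketch under the idealized assumption that every step achieves $w\mapsto w^a$ exactly, which the proof only guarantees with high probability; the helper name `greedy_steps` is ours):

```python
import math

def greedy_steps(w0, a, n, alpha):
    """Iterate w -> w^a until w >= n^alpha and return the number of steps.
    This mimics the ideal behaviour of the maximal-weight-neighbour
    recursion (6.3.12), where each step raises the weight to the power a."""
    w, steps = w0, 0
    while w < n ** alpha:
        w = w ** a
        steps += 1
    return steps

n, tau, alpha = 10 ** 8, 2.5, 0.6
delta = 0.05
a = 1.0 / (tau - 2 + delta)   # a = 1/(tau - 2 + delta) > 1 as in the proof
w0 = 2.0

k = greedy_steps(w0, a, n, alpha)
# Exact version of (6.3.18) with the log log w0 correction kept:
# a^k log w0 >= alpha log n at the stopping time, so
# k <= log(alpha log n / log w0) / log a + 1.
bound = math.log(math.log(n ** alpha) / math.log(w0)) / math.log(a) + 1
print(k, k <= bound)
```

The number of steps grows doubly-logarithmically in $n$, which is exactly the source of the $\log\log n$ distances.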


Completion of the proof of Theorem 6.9

To prove the upper bound in Theorem 6.9, for $\varepsilon\in(0,1)$, we take
$$k_n=(1+\varepsilon)\frac{\log\log n}{|\log(\tau-2)|},\qquad(6.3.20)$$
so that it suffices to show, for every $\varepsilon>0$,
$$\lim_{n\to\infty}P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)\le2k_n\mid\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)<\infty)=1.\qquad(6.3.21)$$
Since
$$P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)\le2k_n\mid\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)<\infty)=\frac{P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)\le2k_n)}{P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)<\infty)},\qquad(6.3.22)$$
this follows from the two bounds
$$\limsup_{n\to\infty}P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)<\infty)\le\zeta^2,\qquad(6.3.23)$$
$$\liminf_{n\to\infty}P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)\le2k_n)\ge\zeta^2,\qquad(6.3.24)$$
with $\zeta>0$ the survival probability of the branching-process approximation to the neighborhoods of $\mathrm{NR}_n(w)$ identified in Theorem 3.15. For (6.3.23), we split, for some $m\ge1$,
$$P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)<\infty)\le P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)\le2m)\qquad(6.3.25)$$
$$+\ P(|\partial B_m(o_1)|>0,\ |\partial B_m(o_2)|>0,\ \mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)>2m).$$
By Corollary 2.18, $P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)\le2m)=o(1)$, and, by (2.4.2) in Corollary 2.17,
$$P(|\partial B_m(o_1)|>0,\ |\partial B_m(o_2)|>0,\ \mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)>2m)=P(|\partial B_m(o)|>0)^2+o(1),\qquad(6.3.26)$$
where $\partial B_m(o)$ is the set of vertices at distance $m$ from the root in the local weak limit of $\mathrm{NR}_n(w)$ identified in Theorem 3.15. The right-hand side of (6.3.26) converges to $\zeta^2$ when $m\to\infty$. This proves (6.3.23).

To prove (6.3.24), we fix $m\ge1$ and write
$$P(2m<\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)\le2k_n)\qquad(6.3.27)$$
$$\ge P\big(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_i,\mathrm{Giant}_n)\le k_n,\ i=1,2,\ \mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)>2m\big)$$
$$\ge P(|\partial B_m(o_1)|>0,\ |\partial B_m(o_2)|>0,\ \mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)>2m)-2P\big(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,\mathrm{Giant}_n)>k_n,\ |\partial B_m(o_1)|>0\big).$$
By (6.3.26), the first term converges to $\zeta_m^2=P(|\partial B_m(o)|>0)^2$, which in turn converges to $\zeta^2$ when $m\to\infty$.


For the second term, we condition on $B_m(o_1)$, and use that $\partial B_m(o_1)$ is measurable w.r.t. $B_m(o_1)$, to obtain
$$P\big(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,\mathrm{Giant}_n)>k_n,\ |\partial B_m(o_1)|>0\big)\qquad(6.3.28)$$
$$=E\big[P\big(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,\mathrm{Giant}_n)>k_n\mid B_m(o_1)\big)\mathbb{1}_{\{|\partial B_m(o_1)|>0\}}\big].$$
By Proposition 6.11,
$$P\big(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,\mathrm{Giant}_n)>k_n\mid B_m(o_1)\big)\le ce^{-c_1^\star W_m(o_1)^\eta}.\qquad(6.3.29)$$
By Theorem 3.15 and Conditions 1.1(a)-(b), we obtain that
$$W_m(o_1)\overset{d}{\longrightarrow}\sum_{i=1}^{|\partial B_m(o)|}W_i^\star,\qquad(6.3.30)$$
where $(W_i^\star)_{i\ge1}$ are i.i.d. copies of random variables with distribution function $F^\star$. Therefore,
$$W_m(o_1)\overset{P}{\longrightarrow}\infty\qquad(6.3.31)$$
when first $n\to\infty$ followed by $m\to\infty$, where we use that $|\partial B_m(o)|\overset{P}{\longrightarrow}\infty$ as $m\to\infty$ conditionally on $|\partial B_m(o)|>0$. As a result,
$$P\big(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,\mathrm{Giant}_n)>k_n\mid B_m(o_1)\big)\mathbb{1}_{\{|\partial B_m(o_1)|>0\}}\overset{P}{\longrightarrow}0,\qquad(6.3.32)$$
which, by Lebesgue's Dominated Convergence Theorem [Volume 1, Theorem A.1], implies that
$$E\big[ce^{-c_1^\star W_m(o_1)^\eta}\mathbb{1}_{\{|\partial B_m(o_1)|>0\}}\big]\to0,\qquad(6.3.33)$$
when first $n\to\infty$ followed by $m\to\infty$. This proves (6.3.24), and thus completes the proof of the upper bound in Theorem 6.9.
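To make the doubly-logarithmic scale concrete, the following Monte Carlo sketch (our own illustration, not part of the proof; the helper names are hypothetical) samples $\mathrm{NR}_n(w)$ with $\tau\in(2,3)$ and compares graph distances between random connected pairs with $2\log\log n/|\log(\tau-2)|$. At such small $n$ the asymptotics have not fully set in, so the comparison is only indicative: distances should simply be very small.

```python
import math
import random
from collections import deque

def nr_graph(w, rng):
    """Norros-Reittu graph: edge {i,j} occupied w.p. 1 - exp(-w_i w_j / ell_n)."""
    n = len(w)
    ell = sum(w)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < 1.0 - math.exp(-w[i] * w[j] / ell):
                adj[i].append(j)
                adj[j].append(i)
    return adj

def bfs_dist(adj, src):
    """Graph distances from src via breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for nb in adj[v]:
            if nb not in dist:
                dist[nb] = dist[v] + 1
                queue.append(nb)
    return dist

rng = random.Random(3)
n, tau = 800, 2.2
# Pareto-type weights with [1 - F_n](x) ~ x^{-(tau-1)}
w = [(n / i) ** (1.0 / (tau - 1)) for i in range(1, n + 1)]
adj = nr_graph(w, rng)

dists = []
for _ in range(200):
    o1, o2 = rng.randrange(n), rng.randrange(n)
    d = bfs_dist(adj, o1).get(o2)
    if o1 != o2 and d is not None:
        dists.append(d)

target = 2 * math.log(math.log(n)) / abs(math.log(tau - 2))
print(max(dists), round(target, 2))
```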

6.4 Path counting and log upper bound for finite-variance weights

In this section, we give the proof of the log upper bound for finite-variance weights. For this, we use the second-moment method to show that whp there exists a path of at most $(1+\varepsilon)\log_\nu n$ edges between $o_1$ and $o_2$. To apply the second-moment method, we give a bound on the variance of the number of paths of given lengths using path-counting techniques. This section is organized as follows. In Section 6.4.1, we highlight the path-counting techniques. In Section 6.4.2, we apply these methods to give upper bounds on distances for finite-variance weights. We also investigate the case where $\tau=3$, and prove that typical distances are bounded by $\log n/\log\log n$ under appropriate conditions as well. We close in Section 6.4.3 by proving Theorems 3.16 and 6.1(ii).


6.4.1 Path-counting techniques

In this section, we study path-counting techniques in the context of inhomogeneous random graphs. We generalize the setting somewhat, and consider an IRG on the vertex set $I$ with edge probabilities $p_{ij}=u_iu_j$, for some weights $(u_i)_{i\in I}$. We obtain $\mathrm{CL}_n(w)$ by taking $u_i=w_i/\sqrt{\ell_n}$ and $I=[n]$. Since the $\mathrm{NR}_n(w)$ random graph is closely related to $\mathrm{CL}_n(w)$, this suffices for our purposes.

For $a,b\in I$ and $k\ge1$, let
$$N_k(a,b)=\#\{\pi\in\mathcal{P}_k(a,b):\pi\text{ occupied}\}\qquad(6.4.1)$$
denote the number of occupied paths of length $k$ between the vertices $a$ and $b$, where $\mathcal{P}_k(a,b)$ denotes the collection of $k$-step self-avoiding paths from $a$ to $b$. Let
$$n_k(a,b)=E[N_k(a,b)]\qquad(6.4.2)$$
denote the expected number of occupied paths of length $k$ connecting $a$ and $b$. Define
$$\bar n_k(a,b)=u_au_b\Big(\sum_{i\in I\setminus\{a,b\}}u_i^2\Big)^{k-1},\qquad\underline{n}_k(a,b)=u_au_b\Big(\sum_{i\in I_{a,b,k}}u_i^2\Big)^{k-1},\qquad(6.4.3)$$
where $I_{a,b,k}$ is the subset of $I$ in which $a$ and $b$, as well as the $k+2$ vertices with highest weights, have been removed. In Section 6.2, we have implicitly proved an upper bound on $E[N_k(a,b)]$ of the form (see also Exercise 6.15)
$$n_k(a,b)\le\bar n_k(a,b).\qquad(6.4.4)$$
In this section, we prove that $\underline n_k(a,b)$ is a lower bound on $n_k(a,b)$, and use these bounds to prove a variance bound on $N_k(a,b)$.
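For small instances, $E[N_k(a,b)]$ can be computed by brute-force enumeration of self-avoiding paths, which makes the sandwich $\underline n_k(a,b)\le n_k(a,b)\le\bar n_k(a,b)$ easy to verify numerically (a sketch with our own helper names; the weights below are arbitrary test values):

```python
import itertools

def expected_paths(u, a, b, k):
    """E[N_k(a,b)] for the IRG with p_ij = u_i u_j: sum over all
    self-avoiding paths a = pi_0, ..., pi_k = b of prod_l u_{pi_l} u_{pi_{l+1}}."""
    interior = [v for v in range(len(u)) if v not in (a, b)]
    total = 0.0
    for mid in itertools.permutations(interior, k - 1):
        path = (a,) + mid + (b,)
        prob = 1.0
        for x, y in zip(path, path[1:]):
            prob *= u[x] * u[y]
        total += prob
    return total

u = [0.02 * (i + 1) for i in range(12)]   # small weights, so p_ij <= 1
a, b, k = 0, 11, 3
nk = expected_paths(u, a, b, k)

interior = sorted(u[v] for v in range(len(u)) if v not in (a, b))
# Upper bound: remove only a and b; lower bound: additionally remove
# the k + 2 highest-weight vertices, as in the definition of I_{a,b,k}.
upper = u[a] * u[b] * sum(x * x for x in interior) ** (k - 1)
lower = u[a] * u[b] * sum(x * x for x in interior[: -(k + 2)]) ** (k - 1)
print(lower < nk < upper)
```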

Before stating our main result, we introduce some notation. Let
$$\nu_I=\sum_{i\in I}u_i^2,\qquad\gamma_I=\sum_{i\in I}u_i^3\qquad(6.4.5)$$
denote the sums of squares and third powers of $(u_i)_{i\in I}$, respectively. Our aim is to show that whp paths of length $k$ exist between the vertices $a$ and $b$ for an appropriate choice of $k$. We do this by applying a second-moment method to $N_k(a,b)$, for which we need a lower bound on $E[N_k(a,b)]$ and an upper bound on $\mathrm{Var}(N_k(a,b))$ that are such that $\mathrm{Var}(N_k(a,b))=o(E[N_k(a,b)]^2)$ (recall [Volume 1, Theorem 2.18]). We prove such bounds in the following proposition, which is interesting in its own right:

Proposition 6.12 (Variance of numbers of paths). For any $k\ge1$, $a,b\in I$ and $(u_i)_{i\in I}$,
$$E[N_k(a,b)]\ge\underline n_k(a,b),\qquad(6.4.6)$$
while, assuming that $\nu_I>1$,
$$\mathrm{Var}(N_k(a,b))\le\bar n_k(a,b)+\bar n_k(a,b)^2\Big(\frac{\gamma_I\nu_I^2}{\nu_I-1}\Big(\frac1{u_a}+\frac1{u_b}\Big)+\frac{\gamma_I^2\nu_I}{u_au_b(\nu_I-1)^2}+e_k\Big),\qquad(6.4.7)$$
where
$$e_k=k\Big(1+\frac{\gamma_I}{u_a\nu_I}\Big)\Big(1+\frac{\gamma_I}{u_b\nu_I}\Big)\frac{\nu_I}{\nu_I-1}\big(e^{2k^3\gamma_I^2/\nu_I^3}-1\big).$$

The random variable $N_k(a,b)$ is a sum of indicators. If all these indicators were independent, then the variance would be at most $\bar n_k(a,b)$, which is the first term on the right-hand side of (6.4.7). The second term on the right-hand side of (6.4.7) accounts for the positive dependence between the indicators of two paths being occupied.

We apply Proposition 6.12 in cases where $E[N_k(a,b)]\ge\underline n_k(a,b)\to\infty$, taking $I$ to be a large subset of $[n]$ and $u_i=w_i/\sqrt{\ell_n}$. In this case, $\nu_I\approx\nu_n\approx\nu>1$. In our applications of Proposition 6.12, the ratio $\bar n_k(a,b)/\underline n_k(a,b)$ will be bounded, and $k^3\gamma_I^2/\nu_I^3=o(1)$, so that the last term is an error term. The starting and end vertices $a,b\in I$ will correspond to a union of vertices in $[n]$ of quite large size. As a result, $\gamma_I/u_a$ and $\gamma_I/u_b$ are typically small, so that
$$\frac{\mathrm{Var}(N_k(a,b))}{E[N_k(a,b)]^2}\approx\frac{\gamma_I\nu_I^2}{\nu_I-1}\Big(\frac1{u_a}+\frac1{u_b}\Big)+\frac{\gamma_I^2\nu_I}{u_au_b(\nu_I-1)^2}\qquad(6.4.8)$$
is small. The choice of $a$, $b$ and $I$ is quite delicate, which explains why we formulate Proposition 6.12 in such generality.
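As a sanity check on the first-moment computation that the second-moment method builds on, one can simulate the IRG and compare the empirical mean of $N_k(a,b)$ with its exact expectation (a Monte Carlo sketch with our own helper names; in the homogeneous case below, $E[N_2(a,b)]=8\cdot(1/2)^4=1/2$):

```python
import itertools
import random

def count_occupied_paths(u, a, b, k, rng):
    """Sample the IRG with p_ij = u_i u_j once and count the occupied
    self-avoiding paths of length k from a to b."""
    n = len(u)
    edge = {
        (i, j): rng.random() < u[i] * u[j]
        for i, j in itertools.combinations(range(n), 2)
    }
    def occupied(x, y):
        return edge[(min(x, y), max(x, y))]
    interior = [v for v in range(n) if v not in (a, b)]
    count = 0
    for mid in itertools.permutations(interior, k - 1):
        path = (a,) + mid + (b,)
        if all(occupied(x, y) for x, y in zip(path, path[1:])):
            count += 1
    return count

rng = random.Random(2)
u = [0.5] * 10          # homogeneous weights: every edge present w.p. 1/4
a, b, k = 0, 9, 2
samples = [count_occupied_paths(u, a, b, k, rng) for _ in range(4000)]
mean = sum(samples) / len(samples)
# Exact mean: 8 interior vertices v, each path (a, v, b) occupied w.p. (1/4)^2.
print(abs(mean - 0.5) < 0.05)
```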

Proof. We note that $N_k(a,b)$ is a sum of indicators:
$$N_k(a,b)=\sum_{\pi\in\mathcal{P}_k(a,b)}\mathbb{1}_{\{\pi\text{ occupied}\}}.\qquad(6.4.9)$$
As a result,
$$E[N_k(a,b)]=\sum_{\pi\in\mathcal{P}_k(a,b)}P(\pi\text{ occupied})=\sum_{\pi\in\mathcal{P}_k(a,b)}\prod_{l=0}^{k-1}u_{\pi_l}u_{\pi_{l+1}}=u_au_b\sum_{\pi\in\mathcal{P}_k(a,b)}\prod_{l=1}^{k-1}u_{\pi_l}^2,\qquad(6.4.10)$$
since $\pi_0=a$ and $\pi_k=b$ for $\pi\in\mathcal{P}_k(a,b)$. Further,
$$\sum_{\pi\in\mathcal{P}_k(a,b)}\prod_{l=1}^{k-1}u_{\pi_l}^2={\sum_{i_1,\dots,i_{k-1}\in I\setminus\{a,b\}}}^{\!\!*}\ \prod_{l=1}^{k-1}u_{i_l}^2,\qquad(6.4.11)$$
where we recall that $\sum^*_{i_1,\dots,i_r\in I}$ denotes a sum over distinct indices. Each sum over $i_j$ yields a factor that is at least $\sum_{i\in I_{a,b,k}}u_i^2$, which proves (6.4.6).

To compute $\mathrm{Var}(N_k(a,b))$, we again start from (6.4.9), which yields
$$\mathrm{Var}(N_k(a,b))=\sum_{\pi,\rho\in\mathcal{P}_k(a,b)}\big[P(\pi,\rho\text{ occupied})-P(\pi\text{ occupied})P(\rho\text{ occupied})\big].\qquad(6.4.12)$$


For $\pi,\rho\in\mathcal{P}_k(a,b)$, we denote by $\pi\cap\rho$ the set of edges that the paths $\pi$ and $\rho$ have in common. The occupation statuses of $\pi$ and $\rho$ are independent precisely when $\pi\cap\rho=\varnothing$, so that
$$\mathrm{Var}(N_k(a,b))\le\sum_{\substack{\pi,\rho\in\mathcal{P}_k(a,b)\\\pi\cap\rho\neq\varnothing}}P(\pi,\rho\text{ occupied}).\qquad(6.4.13)$$
Define $\rho\setminus\pi$ to be the set of edges in $\rho$ that are not part of $\pi$, so that
$$P(\pi,\rho\text{ occupied})=P(\pi\text{ occupied})P(\rho\text{ occupied}\mid\pi\text{ occupied})=\prod_{l=0}^{k-1}u_{\pi_l}u_{\pi_{l+1}}\prod_{e\in\rho\setminus\pi}u_{\underline e}u_{\overline e},\qquad(6.4.14)$$
where, for an edge $e=\{x,y\}$, we write $\underline e=x$, $\overline e=y$. When $\pi=\rho$,
$$P(\pi,\rho\text{ occupied})=P(\pi\text{ occupied}),\qquad(6.4.15)$$
and this contributes $E[N_k(a,b)]\le\bar n_k(a,b)$ to the bound on $\mathrm{Var}(N_k(a,b))$. From now on, we consider $\pi\neq\rho$. The probability in (6.4.14) needs to be summed over all possible pairs of paths $(\pi,\rho)$ with $\pi\neq\rho$ that share at least one edge. In order to do this effectively, we start by introducing some notation.

Let $l=|\pi\cap\rho|$ denote the number of edges in $\pi\cap\rho$, so that $l\ge1$ precisely when $\pi\cap\rho\neq\varnothing$. Since $\pi\neq\rho$ are self-avoiding paths between $a$ and $b$, $l$ cannot be equal to $k-1$ or $k$, so that we consider $l\le k-2$ from now on. Let $k-l=|\rho\setminus\pi|$ be the number of edges in $\rho$ that are not part of $\pi$.

Let $m$ denote the number of connected subpaths in $\rho\setminus\pi$, so that $m\ge1$ whenever $\pi\neq\rho$. Since $\pi_0=\rho_0=a$ and $\pi_k=\rho_k=b$, these subpaths start and end in vertices along the path $\pi$. We can view these subpaths as excursions of the path $\rho$ from the path $\pi$. By construction, between two excursions, there is at least one edge that $\pi$ and $\rho$ have in common.

Fix $m$. We define $\mathrm{Shape}(\pi,\rho)$, the shape of the pair $(\pi,\rho)$, by
$$\mathrm{Shape}(\pi,\rho)=(\vec x_{m+1},\vec s_m,\vec t_m,\vec o_{m+1},\vec r_{m+1}),\qquad(6.4.16)$$
where

(1) $\vec x_{m+1}\in\mathbb{N}_0^{m+1}$, and $x_j\ge0$ is the length of the subpath of $\rho\cap\pi$ in between the $(j-1)$st and $j$th subpath of $\pi\setminus\rho$. Here $x_1$ is the number of common edges in the subpath of $\rho\cap\pi$ that contains $a$, while $x_{m+1}$ is the number of common edges in the subpath of $\rho\cap\pi$ that contains $b$, so that $x_1\ge0$ and $x_{m+1}\ge0$. For $j\in\{2,\dots,m\}$, $x_j\ge1$;
(2) $\vec s_m\in\mathbb{N}^m$, and $s_j\ge1$ is the number of edges in the $j$th subpath of $\pi\setminus\rho$;
(3) $\vec t_m\in\mathbb{N}^m$, and $t_j\ge1$ is the number of edges in the $j$th subpath of $\rho\setminus\pi$;
(4) $\vec o_{m+1}\in[m+1]^{m+1}$, and $o_j$ is the order of the $j$th common subpath in $\rho\cap\pi$ of the path $\pi$ in $\rho$, i.e., $o_2=5$ means that the second subpath that $\pi$ has in common with $\rho$ is the 5th subpath that $\rho$ has in common with $\pi$. Note that $o_1=1$ and $o_{m+1}=m+1$, since $\pi$ and $\rho$ start and end in $a$ and $b$, respectively;


(5) $\vec r_{m+1}\in\{0,1\}^{m+1}$ is such that $r_j$ describes the direction in which the $j$th common subpath in $\rho\cap\pi$ of the path $\pi$ is traversed by $\rho$, with $r_j=1$ when the direction is the same for $\pi$ and $\rho$, and 0 otherwise. Thus, $r_1=r_{m+1}=1$.

[Figure 6.3: An example of a pair of paths $(\pi,\rho)$ and its corresponding shape, with $m=4$ excursions, common-subpath lengths $x_1,\dots,x_5$, excursion lengths $s_1,\dots,s_4$ and $t_1,\dots,t_4$, orders $(o_1,\dots,o_5)=(1,2,4,3,5)$ and directions $(r_1,\dots,r_5)=(1,1,1,0,1)$.]

The information in $\mathrm{Shape}(\pi,\rho)$ is precisely what is necessary to piece together the topology of the pair of paths, except for the information about the vertices involved in $\pi$ and $\rho$. See Figure 6.3 for an example of a pair of paths $(\pi,\rho)$ and its corresponding shape. With $l=|\pi\cap\rho|$, we have
$$\sum_{j=1}^{m+1}x_j=l,\qquad\sum_{j=1}^ms_j=\sum_{j=1}^mt_j=k-l.\qquad(6.4.17)$$

Let $\mathrm{Shape}_{m,l}$ denote the set of shapes corresponding to pairs of paths $(\pi,\rho)$ with $m$ excursions and $l$ common edges, so that (6.4.17) holds. Then,
$$\mathrm{Var}(N_k(a,b))\le\bar n_k(a,b)+\sum_{l=1}^{k-2}\sum_{m=1}^{k-l}\sum_{\sigma\in\mathrm{Shape}_{m,l}}\ \sum_{\substack{\pi,\rho\in\mathcal{P}_k(a,b)\\\mathrm{Shape}(\pi,\rho)=\sigma}}P(\pi,\rho\text{ occupied}).\qquad(6.4.18)$$
When $\mathrm{Shape}(\pi,\rho)=\sigma$ for some $\sigma\in\mathrm{Shape}_{m,l}$, and since $\pi$ and $\rho$ both start and end in $a$ and $b$, the union of paths $\pi\cup\rho$ visits $(k+1)+(k-l-m)=2k+1-l-m$ distinct vertices. The vertex $a$ is in $1+\delta_{x_1,0}$ edges, and $b$ is in $1+\delta_{x_{m+1},0}$ edges. Of the other $k-1$ vertices in $\pi$, precisely $2m-\delta_{x_1,0}-\delta_{x_{m+1},0}$ are part of three edges, and $k-1-2m+\delta_{x_1,0}+\delta_{x_{m+1},0}$ are part of two edges. The remaining $k-l-m$ vertices in $\rho$ that are not part of $\pi$ are part of precisely two edges. By construction, the vertices within each of $\pi$ and $\rho$ are distinct, since the paths are self-avoiding. Therefore, denoting $a_1=\delta_{x_1,0}$, $a_{m+1}=\delta_{x_{m+1},0}$,
$$P(\pi,\rho\text{ occupied})=u_a^{1+a_1}u_b^{1+a_{m+1}}\prod_{s=1}^{2m-a_1-a_{m+1}}u_{v_s}^3\prod_{t=2m-a_1-a_{m+1}+1}^{2k-1-l-m}u_{v_t}^2,\qquad(6.4.19)$$
where $(v_1,\dots,v_{2k-1-l-m})$ denote the vertices of $\pi\cup\rho$ other than $a$ and $b$, ordered so that the vertices in three edges come first.

where (v1, . . . , vk+1+l−m) ∈ Ik−1+l−m.For a fixed σ ∈ Shapem,l now bound the sum over π, ρ ∈ Pk(a, b) such that

Page 265: Random graphs and complex networksrhofstad/NotesRGCNII.pdf1.2 Random graphs and real-world networks9 1.3 Random graph models11 1.3.1 Erd}os-R enyi random graph 11 1.3.2 Inhomogeneous

6.4 Path counting and log upper bound for finite-variance weights 243

Shape(π, ρ) = σ from above by summing (6.4.19) over all (v1, . . . , vk−1+l−m) ∈Ik−1+l−m, to obtain for any σ ∈ Shapem,l,∑

π, ρ ∈ Pk(a, b)Shape(π, ρ) = σ

P(π, ρ occupied) (6.4.20)

≤ uaubγ2mI ν2k−1−3m−l

I

(uaνIγI

)δx1,0(ubνIγI

)δxm+1,0

= nk(a, b)2γ2(m+1)I ν−3(m−1)−l

I

( γIuaνI

)1−δx1,0( γIubνI

)1−δxm+1,0 .

Therefore, we arrive at

Var(Nk(a, b)) ≤ nk(a, b) + nk(a, b)2k−2∑l=1

k−l∑m=1

γ2(m−1)I ν−3(m−1)−l

I (6.4.21)

×∑

σ∈Shapem,l

( γIuaνI

)1−δx1,0( γIubνI

)1−δxm+1,0 .

We continue to compute the number of shapes in the following lemma:

Lemma 6.13 (The number of shapes). Fix $m\ge1$ and $l\le k-2$. For $m=1$, the number of shapes in $\mathrm{Shape}_{m,l}$ with fixed $a_1=\delta_{x_1,0}$, $a_{m+1}=\delta_{x_{m+1},0}$ is at most $l$ when $a_1=a_{m+1}=0$, equals 1 when $a_1+a_{m+1}=1$, and equals 0 when $a_1=a_{m+1}=1$. For $m\ge2$, the number of shapes in $\mathrm{Shape}_{m,l}$ with fixed $a_1=\delta_{x_1,0}$, $a_{m+1}=\delta_{x_{m+1},0}$ is bounded by
$$2^{m-1}(m-1)!\binom{k-l-1}{m-1}^2\binom{l}{m-a_1-a_{m+1}}.\qquad(6.4.22)$$
Consequently, for all $m\ge2$,
$$|\mathrm{Shape}_{m,l}|\le k\frac{(2k^3)^{m-1}}{(m-1)!}.\qquad(6.4.23)$$

Proof. Since $r_1=r_{m+1}=1$, there are $2^{m-1}$ choices for the directions in which the common subpaths can be traversed. Since there are $m$ excursions, there are $m+1$ common subpaths. The first subpath contains the vertex $a$, the last contains the vertex $b$. Thus, there are $(m-1)!$ possible orders $\vec o_{m+1}$ of the common subpaths once we have fixed the directions in which they are traversed.

In counting the number of possible $\vec x_{m+1},\vec s_m,\vec t_m$, we repeatedly use the fact that there are $\binom{a+b-1}{b-1}$ possible sequences $(y_1,\dots,y_b)\in\mathbb{N}_0^b$ such that $\sum_{j=1}^by_j=a$. This can be seen by representing the sequence as a string of $a$ ones and $b-1$ zeros: the $b-1$ zeros can be placed in $\binom{a+b-1}{b-1}$ possible ways, and a sequence $(y_1,\dots,y_b)\in\mathbb{N}_0^b$ with $\sum_{j=1}^by_j=a$ is obtained uniquely by letting $y_i$ be the number of ones in between the $(i-1)$st and $i$th zero. Similarly, there are $\binom{a-1}{b-1}$ possible sequences $(y_1,\dots,y_b)\in\mathbb{N}^b$ such that $\sum_{j=1}^by_j=a$, since we can apply the previous equality to $(y_1-1,\dots,y_b-1)\in\mathbb{N}_0^b$.
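The two stars-and-bars counts used here are easy to confirm by brute force (a quick check, with a small composition-counting helper of our own):

```python
from itertools import product
from math import comb

def count_compositions(a, b, weak):
    """Count sequences (y_1, ..., y_b) with sum a; parts >= 0 if weak, else >= 1."""
    lo = 0 if weak else 1
    return sum(1 for y in product(range(lo, a + 1), repeat=b) if sum(y) == a)

a, b = 7, 3
print(count_compositions(a, b, True) == comb(a + b - 1, b - 1))   # parts in N_0
print(count_compositions(a, b, False) == comb(a - 1, b - 1))      # parts in N
```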


Using the above, we continue to count the number of shapes. The number of $(s_1,\dots,s_m)\in\mathbb{N}^m$ such that $s_j\ge1$ and $\sum_{j=1}^ms_j=k-l$ equals
$$\binom{k-l-1}{m-1}.\qquad(6.4.24)$$
The same applies to $(t_1,\dots,t_m)\in\mathbb{N}^m$ such that $t_j\ge1$ and $\sum_{j=1}^mt_j=k-l$. In counting the number of possible $\vec x_{m+1}$ such that $\sum_{j=1}^{m+1}x_j=l$, we need to count their numbers separately for $x_1=0$ and $x_1\ge1$, and for $x_{m+1}=0$ and $x_{m+1}\ge1$. When $m=1$, the number is zero when $x_1=x_2=0$, since $x_1=x_2=0$ implies that the paths share no edges. Denote $a_1=\delta_{x_1,0}$, $a_{m+1}=\delta_{x_{m+1},0}$, and suppose that $m-a_1-a_{m+1}\ge0$. Then, there are at most
$$\binom{l}{m-a_1-a_{m+1}}\qquad(6.4.25)$$
possible choices of $\vec x_{m+1}$ with fixed $a_1=\delta_{x_1,0}$, $a_{m+1}=\delta_{x_{m+1},0}$. The claim follows by multiplying these bounds on the number of choices for $\vec r_{m+1}$, $\vec o_{m+1}$, $\vec s_m$, $\vec t_m$ and $\vec x_{m+1}$.

To prove (6.4.23), we continue by bounding
$$\binom{k-l-1}{m-1}^2(m-1)!=\frac{1}{(m-1)!}\Big(\frac{(k-l-1)!}{(k-l-m)!}\Big)^2\le\frac{k^{2(m-1)}}{(m-1)!},\qquad(6.4.26)$$
and, using that $\binom{a}{b}\le a^b/b!$ and $l\le k$,
$$\binom{l}{m-a_1-a_{m+1}}\le\frac{l^{m-a_1-a_{m+1}}}{(m-a_1-a_{m+1})!}\le k^m.\qquad(6.4.27)$$
Therefore, the number of shapes in $\mathrm{Shape}_{m,l}$ is, for each $l\ge1$ and $m\ge2$, bounded by
$$2^{m-1}\frac{k^{2(m-1)}}{(m-1)!}k^m=k\frac{(2k^3)^{m-1}}{(m-1)!}.\qquad(6.4.28)$$
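The factorial bound (6.4.26), which drives (6.4.28), can be spot-checked over a grid of small values (our own quick verification):

```python
from math import comb, factorial

ok = True
for k in range(3, 16):
    for l in range(1, k - 1):          # l <= k - 2
        for m in range(2, k - l + 1):  # m >= 2 as in (6.4.28)
            lhs = comb(k - l - 1, m - 1) ** 2 * factorial(m - 1)
            rhs = k ** (2 * (m - 1)) / factorial(m - 1)
            ok = ok and lhs <= rhs
print(ok)
```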

We are now ready to complete the proof of Proposition 6.12:

Proof of Proposition 6.12. By (6.4.21) and Lemma 6.13, it suffices to sum
$$(m-1)!\binom{k-l-1}{m-1}^2\binom{l}{m-a_1-a_{m+1}}\Big(\frac{2\gamma_I^2}{\nu_I^3}\Big)^{m-1}\nu_I^{-l}\Big(\frac{\gamma_I}{u_a\nu_I}\Big)^{1-a_1}\Big(\frac{\gamma_I}{u_b\nu_I}\Big)^{1-a_{m+1}}\qquad(6.4.29)$$
over $l\in[k-2]$, $m\in[k-l]$ and $a_1,a_{m+1}\in\{0,1\}$, where, by convention, $\binom{l}{-1}=0$.


We start with $m=1$, for which we obtain that the sum of (6.4.29) over the other variables equals
$$\gamma_I\Big(\frac1{u_a}+\frac1{u_b}\Big)\sum_{l=1}^\infty\nu_I^{-(l-1)}+\frac{\gamma_I^2}{u_au_b\nu_I}\sum_{l=1}^\infty l\nu_I^{-(l-1)}=\frac{\gamma_I\nu_I^2}{\nu_I-1}\Big(\frac1{u_a}+\frac1{u_b}\Big)+\frac{\gamma_I^2\nu_I}{u_au_b(\nu_I-1)^2},\qquad(6.4.30)$$
where we use that, for $\nu>1$,
$$\sum_{l=0}^\infty\nu^{-l}=\frac{\nu}{\nu-1},\qquad\sum_{l=0}^\infty l\nu^{-(l-1)}=\frac{\nu^2}{(\nu-1)^2}.\qquad(6.4.31)$$
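The two series identities in (6.4.31) can be checked numerically for any $\nu>1$ (a throwaway verification):

```python
nu = 1.7
# Truncate the series far beyond where the terms become negligible.
geo = sum(nu ** (-l) for l in range(0, 400))
weighted = sum(l * nu ** (-(l - 1)) for l in range(0, 400))
print(abs(geo - nu / (nu - 1)) < 1e-10,
      abs(weighted - (nu / (nu - 1)) ** 2) < 1e-10)
```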

The terms in (6.4.30) are the first two terms appearing on the right-hand side of (6.4.7).

This leaves us to bound the contribution when $m\ge2$. Since (6.4.23) is independent of $l$, we can start by summing (6.4.29) over $l\ge1$ and over $a_1,a_{m+1}\in\{0,1\}$ to obtain a bound of the form
$$k\Big(1+\frac{\gamma_I}{u_a\nu_I}\Big)\Big(1+\frac{\gamma_I}{u_b\nu_I}\Big)\frac{\nu_I}{\nu_I-1}\sum_{m\ge2}\frac{(2k^3)^{m-1}}{(m-1)!}\Big(\frac{\gamma_I^2}{\nu_I^3}\Big)^{m-1}\qquad(6.4.32)$$
$$=k\Big(1+\frac{\gamma_I}{u_a\nu_I}\Big)\Big(1+\frac{\gamma_I}{u_b\nu_I}\Big)\frac{\nu_I}{\nu_I-1}\big(e^{2k^3\gamma_I^2/\nu_I^3}-1\big).$$
The term in (6.4.32) is the last term appearing on the right-hand side of (6.4.7). Summing the bounds in (6.4.30) and (6.4.32) proves (6.4.7).

Exercises 6.16–6.18 study various consequences of our path-counting techniques. In the next section, we use Proposition 6.12 to prove upper bounds on graph distances.

6.4.2 Logarithmic distance bounds for finite-variance weights

In this section, we prove that two uniformly chosen vertices that are conditioned to be connected are whp within distance $(1+\varepsilon)\log_\nu n$, as formulated in the following theorem:

Theorem 6.14 (Logarithmic upper bound on graph distances in $\mathrm{NR}_n(w)$). Assume that Conditions 1.1(a)-(c) hold, where $\nu=E[W^2]/E[W]\in(1,\infty)$. Then, for any $\varepsilon>0$,
$$P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)\le(1+\varepsilon)\log_\nu n\mid\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)<\infty)=1+o(1).\qquad(6.4.33)$$
The same results hold for $\mathrm{CL}_n(w)$ and $\mathrm{GRG}_n(w)$ under the same conditions.

Organization of the proof of Theorem 6.14

We prove Theorem 6.14 by combining the branching-process comparison with a second-moment method, using Proposition 6.12 on the number of paths of a given length. More precisely, we fix $m\ge1$ large, and recall that $B_m(o_1)$ and $B_m(o_2)$ denote the sets of vertices at distance at most $m$ from $o_1$ and $o_2$, respectively, and that $\partial B_m(o_1)$ and $\partial B_m(o_2)$ denote the sets of vertices at distance precisely equal to $m$. We condition on $B_m(o_1)$ and $B_m(o_2)$ such that $\partial B_m(o_1)\neq\varnothing$ and $\partial B_m(o_2)\neq\varnothing$. By the local weak convergence in Theorem 3.11, the probability of the latter event is close to $\zeta_m^2$, where $\zeta_m=P(\partial B_m(o)\neq\varnothing)$ is the probability that the branching process that is the local weak limit of $\mathrm{NR}_n(w)$ survives to generation $m$. See also Corollary 2.17, which proves the asymptotic independence of the neighborhoods of $o_1$ and $o_2$. Then, $\zeta_m\to\zeta$ when $m\to\infty$, and, conditionally on $|\partial B_m(o)|>0$, $|\partial B_m(o)|\ge M$ whp, for any $M$, as $m\to\infty$. This explains the branching-process approximation.

We now state the precise branching-process approximation result that we will rely upon. We take $u_i=w_i/\sqrt{\ell_n}$,
$$a=\partial B_m(o_1),\qquad b=\partial B_m(o_2),\qquad(6.4.34)$$
so that
$$u_a=\frac1{\sqrt{\ell_n}}\sum_{i\in\partial B_m(o_1)}w_i=W_m(o_1)/\sqrt{\ell_n},\qquad u_b=\frac1{\sqrt{\ell_n}}\sum_{i\in\partial B_m(o_2)}w_i=W_m(o_2)/\sqrt{\ell_n}.\qquad(6.4.35)$$

We formalize the above ideas in the following lemma:

Lemma 6.15 (Branching-process approximation). As $n\to\infty$,
$$(W_m(o_1),W_m(o_2))\overset{d}{\longrightarrow}\Big(\sum_{j=1}^{Z_m^{(1)}}W^{\star(1)}(j),\ \sum_{j=1}^{Z_m^{(2)}}W^{\star(2)}(j)\Big),\qquad(6.4.36)$$
where $(Z_m^{(1)},Z_m^{(2)})_{m\ge0}$ are the generation sizes of two independent branching processes, and $(W^{\star(1)}(j))_{j\ge1}$ and $(W^{\star(2)}(j))_{j\ge1}$ are two independent sequences of i.i.d. random variables with distribution function $F^\star$.

Proof. By Corollary 2.17, $|\partial B_m(o_1)|$ and $|\partial B_m(o_2)|$ jointly converge in distribution to $(Z_m^{(1)},Z_m^{(2)})$, which are independent generation sizes of the local weak limit of $\mathrm{NR}_n(w)$ as in Theorem 3.15. Each of the individuals in $\partial B_m(o_1)$ and $\partial B_m(o_2)$ receives a mark $M_i$, and its weight is $w_{M_i}$. By Proposition 3.13, these marks are i.i.d. random variables conditioned to be unthinned. Whp, no vertex in $B_m(o_1)\cup B_m(o_2)$ is thinned. Then, $W_m(o_i)=\sum_{j=1}^{|\partial B_m(o_i)|}W_n^\star(j)$, where $(W_n^\star(j))_{j\ge1}$ are i.i.d. copies of $W_n^\star$. By Condition 1.1(a), $W_n^\star\overset{d}{\longrightarrow}W^\star$, so that $W_m(o_i)\overset{d}{\longrightarrow}\sum_{j=1}^{|\partial B_m(o_i)|}W^{\star(i)}(j)$. The joint convergence follows in a similar fashion, now using local weak convergence in probability.

Second-moment method and path counting

Fix $k_n=(1+\varepsilon)\log_\nu n$. We next present the details of the second-moment method that shows that whp, on the event that $\partial B_m(o_1)\neq\varnothing$ and $\partial B_m(o_2)\neq\varnothing$, there exists a path of length $k_n-2m$ connecting $\partial B_m(o_1)$ and $\partial B_m(o_2)$. This ensures that, on the event that $\partial B_m(o_1)\neq\varnothing$ and $\partial B_m(o_2)\neq\varnothing$, the event $\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)\le k_n$ occurs whp. For this, we take $u_i=w_i/\sqrt{\ell_n}$. We aim to apply Proposition 6.12, for which we fix $K\ge1$ sufficiently large and take $a=\partial B_m(o_1)$, $b=\partial B_m(o_2)$ and
$$I_{a,b}=\{i\in[n]:w_i\le K\}\setminus(B_m(o_1)\cup B_m(o_2)).\qquad(6.4.37)$$

In order to apply Proposition 6.12, we start by investigating the constants appearing in it:

Lemma 6.16 (Parameters in path counting). Conditionally on $B_m(o_1)$ and $B_m(o_2)$, and with $a=\partial B_m(o_1)$, $b=\partial B_m(o_2)$, for $k=(1+\varepsilon)\log_\nu n$,
$$\underline n_k(a,b)\overset{P}{\longrightarrow}\infty,\qquad\bar n_k(a,b)=(1+o_P(1))\,\underline n_k(a,b),\qquad(6.4.38)$$
and, as $n\to\infty$,
$$\frac{\mathrm{Var}(N_k(a,b))}{E[N_k(a,b)]^2}\le\frac{K\nu^2}{\nu-1}\Big(\frac1{\sqrt{\ell_n}u_a}+\frac1{\sqrt{\ell_n}u_b}\Big)+\frac{K^2\nu^2}{(\nu-1)\ell_nu_au_b}+o_P(1).\qquad(6.4.39)$$

Proof. By (6.4.3),
$$\underline n_k(a,b)=u_au_b\,\nu_{I_{a,b,k}}^{k-1},\qquad(6.4.40)$$
and
$$\frac{\bar n_k(a,b)}{\underline n_k(a,b)}=\big(\nu_{I_{a,b}}/\nu_{I_{a,b,k}}\big)^{k-1}.\qquad(6.4.41)$$
We start by investigating $\nu_{I_{a,b}}$. Denote
$$\nu(K)=\frac{E[W^2\mathbb{1}_{\{W\le K\}}]}{E[W]}.\qquad(6.4.42)$$
Then, by (6.4.37) and the fact that $B_m(o_1)$ and $B_m(o_2)$ contain a finite number of vertices,
$$\lim_{n\to\infty}\nu_{I_{a,b}}=\nu(K).\qquad(6.4.43)$$
The same applies to $\nu_{I_{a,b,k}}$. Then, with $K>0$ chosen so large that $\nu(K)\ge\nu-\varepsilon/2$ and with $k=(1+\varepsilon)\log_\nu n$,
$$\underline n_k(a,b)=u_au_b\,\nu_{I_{a,b,k}}^{k-1}=\frac{W_m(o_1)W_m(o_2)}{\ell_n\,\nu_{I_{a,b,k}}}\,n^{(1+\varepsilon)\log\nu_{I_{a,b,k}}/\log\nu}\overset{P}{\longrightarrow}\infty,\qquad(6.4.44)$$
where $K$ and $n$ are taken so large that $(1+\varepsilon)\log\nu_{I_{a,b,k}}/\log\nu>1$, and we use that $\ell_n=\Theta(n)$. This proves the first property in (6.4.38).

To prove the second property in (6.4.38), we note that the set $I_{a,b,k}$ is obtained from $I_{a,b}$ by removing the $k+2$ vertices with highest weights. Since $w_i\le K$ for all $i\in I_{a,b}$ (recall (6.4.37)), $\nu_{I_{a,b}}\le\nu_{I_{a,b,k}}+(k+2)K^2/\ell_n$. Since $k\le A\log n$, we therefore arrive at
$$\frac{\bar n_k(a,b)}{\underline n_k(a,b)}\le\big(1+(k+2)K^2/(\ell_n\nu_{I_{a,b,k}})\big)^{k-1}\le e^{2k^2K^2/(\ell_n\nu_{I_{a,b,k}})}\overset{P}{\longrightarrow}1,\qquad(6.4.45)$$
as required.

To prove (6.4.39), we rely on Proposition 6.12. We have already shown that $E[N_k(a,b)]\ge\underline n_k(a,b)\overset{P}{\longrightarrow}\infty$, so that the first term on the right-hand side of (6.4.7) is $o_P(E[N_k(a,b)]^2)$. Further, by (6.4.37),
$$\gamma_I\le\nu_I\max_{i\in I}u_i\le\frac{\nu_IK}{\sqrt{\ell_n}},\qquad(6.4.46)$$
so that, for $k\le A\log n$ with $A>1$ fixed,
$$k\Big(1+\frac{\gamma_I}{u_a\nu_I}\Big)\Big(1+\frac{\gamma_I}{u_b\nu_I}\Big)\big(e^{2k^3\gamma_I^2/\nu_I^3}-1\big)=o_P(1).\qquad(6.4.47)$$
Substituting these bounds into (6.4.7) and using (6.4.38) yields the claim.

Completion of the proof of Theorem 6.14

Now we are ready to complete the proof of Theorem 6.14. We must show that
$$P(k_n<\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)<\infty)=o(1).\qquad(6.4.48)$$
Indeed, then $P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)>k_n\mid\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)<\infty)=o(1)$, since $P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)<\infty)\to\zeta^2>0$ by Theorem 3.17. We rewrite
$$P(k_n<\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)<\infty)=P(k_n<\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)<\infty,\ \partial B_m(o_1)\neq\varnothing,\ \partial B_m(o_2)\neq\varnothing)\qquad(6.4.49)$$
$$\le P\big(N_{k_n-2m}(\partial B_m(o_1),\partial B_m(o_2))=0,\ \partial B_m(o_1)\neq\varnothing,\ \partial B_m(o_2)\neq\varnothing\big).$$
Recall that $k=k_n=(1+\varepsilon)\log_\nu n$. By the Chebychev inequality [Volume 1, Theorem 2.18] and Lemma 6.16, the conditional probability of $N_{k-2m}(a,b)=0$ given $B_m(o_1),B_m(o_2)$ is at most
$$\frac{\mathrm{Var}(N_{k-2m}(a,b))}{E[N_{k-2m}(a,b)]^2}\le\frac{K\nu^2}{\nu-1}\Big(\frac1{\sqrt{\ell_n}u_a}+\frac1{\sqrt{\ell_n}u_b}\Big)+\frac{K^2\nu^2}{(\nu-1)\ell_nu_au_b}+o_P(1).\qquad(6.4.50)$$
When $\partial B_m(o_1)\neq\varnothing$ and $\partial B_m(o_2)\neq\varnothing$, by (6.4.35) and Lemma 6.15,
$$\frac1{\sqrt{\ell_n}u_a}+\frac1{\sqrt{\ell_n}u_b}\overset{d}{\longrightarrow}\Big(\sum_{j=1}^{Z_m^{(1)}}W^{\star(1)}(j)\Big)^{-1}+\Big(\sum_{j=1}^{Z_m^{(2)}}W^{\star(2)}(j)\Big)^{-1}\overset{P}{\longrightarrow}0,\qquad(6.4.51)$$
where the last convergence holds when $m\to\infty$. Therefore,
$$P\big(N_{k-2m}(a,b)=0\mid\partial B_m(o_1)\neq\varnothing,\ \partial B_m(o_2)\neq\varnothing\big)\overset{P}{\longrightarrow}0,\qquad(6.4.52)$$
and, by Lebesgue's Dominated Convergence Theorem [Volume 1, Theorem A.1],
$$P(\mathrm{dist}_{\mathrm{NR}_n(w)}(o_1,o_2)>k_n,\ \partial B_m(o_1)\neq\varnothing,\ \partial B_m(o_2)\neq\varnothing)\to0,\qquad(6.4.53)$$
when first $n\to\infty$ followed by $m\to\infty$, which completes the proof.

We close this section by describing what happens when τ = 3, and there areno slowly-varying functions.


Distances for the critical case τ = 3

When τ = 3, w_i is approximately c(n/i)^{1/2}. It turns out that this changes the distances only by a doubly-logarithmic factor:

Theorem 6.17 (Graph distances in NR_n(w) in the critical case τ = 3) Assume that Conditions 1.1(a)–(b) hold, and that there exist constants c_2 > c_1 > 0 and α > 0 such that, for all x ≤ n^α,

[1 − F_n](x) ≥ c_1/x², (6.4.54)

and, for all x ≥ 0,

[1 − F_n](x) ≤ c_2/x². (6.4.55)

Then, conditionally on dist_{NR_n(w)}(o_1, o_2) < ∞,

dist_{NR_n(w)}(o_1, o_2) · (log log n/log n) P−→ 1. (6.4.56)

The same results hold for CLn(w) and GRGn(w) under the same conditions.

The lower bound in Theorem 6.17 is already stated in Corollary 6.5. The upperbound can be proved using the path-counting techniques in Proposition 6.12 andadaptations of it. We now sketch the proof.

Let η > 0 and let

α_n = e^{ν_n^{1−η}}. (6.4.57)

Define the core of NR_n(w) to be

Core_n = {i : w_i ≥ α_n}. (6.4.58)
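To get a feel for how small this core is, the following sketch evaluates it for the critical-case weights w_i = √(n/i) (i.e., τ = 3; the choices c = 1, n, and η below are illustrative, not taken from the text): it computes ν_n = Σ_i w_i²/Σ_i w_i, the threshold α_n of (6.4.57), and the number of vertices in Core_n:

```python
import math

# Critical-case weights w_i = sqrt(n / i); c = 1, n, and eta are illustrative.
n = 100_000
eta = 0.2
w = [math.sqrt(n / i) for i in range(1, n + 1)]

nu_n = sum(x * x for x in w) / sum(w)     # nu_n = Theta(log n) when tau = 3
alpha_n = math.exp(nu_n ** (1 - eta))     # core threshold alpha_n of (6.4.57)
core = [i for i, x in enumerate(w, start=1) if x >= alpha_n]

# The core is a very small set consisting of the highest-weight vertices.
print(len(core), len(core) < n / 100)
```

Only a handful of the highest-weight vertices make it into Core_n, which is what makes the two-step strategy below (reach the core quickly, then travel within it) efficient.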

The proof of Theorem 6.17 follows from the following two propositions:

Proposition 6.18 (Typical distances in the core) Under the conditions of Theorem 6.17, let V′_1, V′_2 ∈ Core_n be chosen with probability proportional to their weight, i.e.,

P(V′_i = j) = w_j / ∑_{v∈Core_n} w_v, (6.4.59)

and let H′_n be the graph distance between V′_1 and V′_2 in Core_n. Then, for any ε > 0, there exists an η > 0 such that

P(H′_n ≤ (1 + ε) log n/log log n) → 1. (6.4.60)

Proposition 6.19 (From the periphery to the core) Under the conditions of Theorem 6.17, let o_1, o_2 be two vertices chosen uniformly at random from [n]. Then, for any η > 0,

P(d_{NR_n(w)}(o_1, Core_n) ≤ ν_n^{1−η}, d_{NR_n(w)}(o_2, Core_n) ≤ ν_n^{1−η}) → ζ². (6.4.61)


Proof of Theorem 6.17 subject to Propositions 6.18–6.19. To see that Propositions 6.18–6.19 imply Theorem 6.17, we note that

d_{NR_n(w)}(o_1, o_2) ≤ d_{NR_n(w)}(o_1, Core_n) + d_{NR_n(w)}(o_2, Core_n) + d_{NR_n(w)}(V′_1, V′_2), (6.4.62)

where V′_1, V′_2 ∈ Core_n are the vertices in Core_n found first in the breadth-first search from o_1 and o_2, respectively. Then, by Proposition 3.13, V′_1, V′_2 ∈ Core_n are chosen with probability proportional to their weight. Fix k_n = log n/log log n. We conclude that, when n is so large that ν_n^{1−η} ≤ εk_n/4,

P(d_{NR_n(w)}(o_1, o_2) ≤ (1 + ε)k_n) (6.4.63)
≥ P(d_{NR_n(w)}(o_i, Core_n) ≤ ν_n^{1−η}, i = 1, 2)
× P(d_{NR_n(w)}(V′_1, V′_2) ≤ (1 + ε/2)k_n | d_{NR_n(w)}(o_i, Core_n) ≤ ν_n^{1−η}, i = 1, 2).

By Proposition 6.19, the first probability converges to ζ², and, by Proposition 6.18, the second probability converges to 1. We conclude that

P(d_{NR_n(w)}(o_1, o_2) ≤ (1 + ε) log n/log log n) → ζ².

Since also P(d_{NR_n(w)}(o_1, o_2) < ∞) → ζ², this completes the proof.

The proofs of Propositions 6.18–6.19 follow from path-counting techniques similar to the ones carried out above. We now sketch their proofs, starting with Proposition 6.18:

Proof of Proposition 6.18. We take

a = V′_1, b = V′_2, I = {i : w_i ∈ [K, √α_n]}. (6.4.64)

The whole point is that there exists a constant c > 0 such that

ν_I ≥ c log α_n = c ν_n^{1−η}, (6.4.65)

while u_a ≥ α_n/√ℓ_n and u_b ≥ α_n/√ℓ_n, so that

E[N_k(a, b)] ≈ α_n² c^k ν_n^{k(1−η)}/ℓ_n → ∞ (6.4.66)

for k = log n/((1 − η) log ν_n) ≤ (1 + ε/2) log n/log ν_n when η is such that 1/(1 − η) ≤ 1 + ε/2. Further,

γ_I ≤ √α_n/√ℓ_n, (6.4.67)

so that Var(N_k(a, b))/E[N_k(a, b)]² → 0 by Proposition 6.12. Since ν_n = Θ(log n) when τ = 3 (see (6.4.54)–(6.4.55) in Theorem 6.17), this completes the proof.

See Exercises 6.19–6.21 for some further properties of paths within the core.

Proof of Proposition 6.19. We again condition on ∂B_m(o_1) ≠ ∅, ∂B_m(o_2) ≠ ∅, the probability of which converges to ζ² when first n → ∞ followed by m → ∞. Then, we perform a second-moment method on the number of paths between ∂B_m(o_i) and Core_n. For this, we take k = ν_n^{1−η} and

a = ∂B_m(o_1), b = Core_n, I = {i : w_i ≤ K} \ (B_m(o_1) ∪ B_m(o_2)). (6.4.68)

Then we follow the proof in (6.4.50)–(6.4.53) to show that

P(d_{NR_n(w)}(o_1, Core_n) > ν_n^{1−η}, ∂B_m(o_1) ≠ ∅, ∂B_m(o_2) ≠ ∅) → 0, (6.4.69)

as required. Note, for this, that, conditionally on B_m(o_1), B_m(o_2),

E[N_k(a, b)] ≈ ν(K)^k W_m(o_1) (1/ℓ_n) ∑_{i∈Core_n} w_i, (6.4.70)

where ν(K) → ∞ as K → ∞, and where, by (6.4.54),

(1/n) ∑_{i∈Core_n} w_i ≥ α_n[1 − F_n](α_n) ≥ c/α_n. (6.4.71)

Therefore, E[N_k(a, b)] → ∞ as soon as k ≥ 2 log α_n/log ν(K), which is satisfied for K sufficiently large and k = ν_n^{1−η}. Exercise 6.22 asks you to complete the above proof.

6.4.3 Distances for IRGn(κ): Proofs of Theorems 3.16 and 6.1(ii)

In this section, we use the path-counting techniques in Section 6.4.1 to give some missing proofs for general inhomogeneous random graphs. We assume that (κ_n) is a graphical sequence of kernels with irreducible limit κ, and with ν = ‖T_κ‖ ∈ (1, ∞). We start by proving Theorem 6.1(ii), and then we use it to prove Theorem 3.16.

Logarithmic upper bound on distances in Theorem 6.1(ii)

Without loss of generality, we may assume that κ is bounded, i.e., sup_{x,y} κ(x, y) < ∞. Indeed, we can always stochastically dominate IRG_n(κ_n) with an unbounded κ_n from below by an IRG_n with a bounded kernel that approximates it arbitrarily well. Since graph distances increase when κ_n decreases, if we prove Theorem 6.1(ii) in the bounded case, then the unbounded case follows immediately. Further, we can approximate a bounded κ_n from above by a finite-type kernel. Therefore, it now suffices to prove Theorem 6.1(ii) for a kernel of finite type.

Let us set up the notation for the finite-type case. For u, v ∈ [n], we let κ_n(u, v) = κ_{n;i_u,i_v}, where i_u ∈ [k] denotes the type of vertex u ∈ [n]. We assume that, for all i, j ∈ [k],

lim_{n→∞} κ_{n;i,j} = κ_{i,j}, (6.4.72)

and

lim_{n→∞} μ_n(i) = μ(i), where μ_n(i) = (1/n) #{v ∈ [n] : i_v = i}. (6.4.73)

In this case, ‖T_{κ_n}‖ is the largest eigenvalue of the matrix M^{(n)}_{i,j} = κ_{n;i,j} μ_n(j), which converges to the largest eigenvalue of the matrix M_{i,j} = κ_{i,j} μ(j), which equals ν = ‖T_κ‖ ∈ (1, ∞) by assumption. The extra factor in the lower bound below (6.4.78) arises due to the depletion-of-points effect, which guarantees that the paths used are self-avoiding. Without loss of generality, we assume that μ(i) > 0 for all i ∈ [k]. This sets the stage for our analysis.
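In the finite-type case, ν can thus be computed numerically as the Perron–Frobenius eigenvalue of the k × k matrix M_{i,j} = κ_{i,j}μ(j). A minimal sketch by power iteration; the two-type kernel κ and type distribution μ below are illustrative choices, not taken from the text:

```python
# Spectral radius of M[i][j] = kappa[i][j] * mu[j] via power iteration;
# for a nonnegative matrix this converges to the Perron-Frobenius eigenvalue.
def spectral_radius(kappa, mu, iters=200):
    k = len(mu)
    M = [[kappa[i][j] * mu[j] for j in range(k)] for i in range(k)]
    x = [1.0] * k
    lam = 1.0
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(k)) for i in range(k)]
        lam = max(y)                  # Rayleigh-type estimate of the eigenvalue
        x = [v / lam for v in y]      # renormalize the iterate
    return lam

kappa = [[2.0, 1.0], [1.0, 2.0]]  # illustrative symmetric two-type kernel
mu = [0.5, 0.5]                   # illustrative type distribution
nu = spectral_radius(kappa, mu)   # largest eigenvalue of [[1.0, 0.5], [0.5, 1.0]]
print(nu)                         # prints 1.5, so this kernel is supercritical
```

Since ν = 1.5 > 1 here, this illustrative kernel lies in the supercritical regime assumed throughout this section.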

We fix m ≥ 1, and assume that ∂B_m(o_1), ∂B_m(o_2) ≠ ∅. We will prove that

P(d_{IRG_n(κ_n)}(o_1, o_2) ≤ (1 + ε) log_ν n | B_m(o_1), B_m(o_2)) = 1 + o_P(1). (6.4.74)

We follow the proof of Theorem 6.14, and rely on path-counting techniques. We again take

a = ∂B_m(o_1), b = ∂B_m(o_2), (6.4.75)

and

I_{a,b} = [n] \ (B_m(o_1) ∪ B_m(o_2)). (6.4.76)

Recall that

N_k(a, b) = #{π ∈ P_k(a, b) : π occupied}. (6.4.77)

We aim to use the second-moment method for N_k(a, b), for which we investigate the mean and variance of N_k(a, b). We compute

E[N_k(a, b)] = ∑_{π∈P_k(a,b)} P(π occupied) = ∑_{π∈P_k(a,b)} ∏_{l=0}^{k} κ_n(π_l, π_{l+1})/n ≤ (1/n)⟨x, T^k_{κ_n} y⟩, (6.4.78)

where x = (x_i)_{i=1}^{k} and y = (y_i)_{i=1}^{k}, with x_i the number of type-i vertices in ∂B_m(o_1) and y_i the number of type-i vertices in ∂B_m(o_2), respectively. An identical lower bound holds with an extra factor (nμ_{n,min} − k)/(nμ_{n,min}), where μ_{n,min} = min_{j∈[k]} μ_n(j) → min_{j∈[k]} μ(j) > 0 by assumption.

Recall the notation and results in Section 3.4, and in particular Theorem 3.8(b).

The types of o_1 and o_2 are asymptotically independent, and the probability that o_1 has type j equals μ_n(j), which converges to μ(j). On the event that the type of o_1 equals j_1, the vector of the numbers of individuals in ∂B_m(o_1) converges in distribution to (Z_m^{(1,j_1)}(i))_{i∈[k]}, which, by Theorem 3.8(b), is close to M_∞ x_κ(i) for some strictly positive random variable M_∞. We conclude that

x d−→ (Z_m^{(1,j_1)}(i))_{i∈[k]}, y d−→ (Z_m^{(2,j_2)}(i))_{i∈[k]}, (6.4.79)

where the limiting branching processes are independent. Equation (6.4.79) replaces the convergence in Lemma 6.15 for GRG_n(w).

We conclude that, for k = k_n = (1 + ε) log_ν n, and conditionally on a = ∂B_m(o_1), b = ∂B_m(o_2) such that ∂B_m(o_1), ∂B_m(o_2) ≠ ∅,

E[N_k(a, b)] P−→ ∞. (6.4.80)

This completes the analysis of the first moment.


We next analyse Var(N_k(a, b)) and prove that Var(N_k(a, b))/E[N_k(a, b)]² P−→ 0. Here our proof will be slightly more sketchy.

Recall (6.4.18). In this case, we compute, to replace (6.4.19),

P(π, ρ occupied) = P(π occupied) P(ρ occupied | π occupied). (6.4.81)

Recall the definition of a shape in (6.4.16). Fix σ ∈ Shape_{m,l} and ρ ∈ P_k(a, b) with Shape(π, ρ) = σ. The factor P(ρ occupied | π occupied), summed out over the free vertices of ρ (i.e., those that are not also vertices of π), gives rise to m factors of the form T^{t_i}_{κ_n}(i_{π_{u_i}}, i_{π_{v_i}})/n for i ∈ [m + 1] and some vertices π_{u_i} and π_{v_i} in the path (π_i)_{i=0}^{k}. We use that, uniformly in t ≥ 1,

(k²/n) max_{i,j∈[k]} T^t_{κ_n}(i, j) ≤ C(k²/n)‖T_{κ_n}‖^t, (6.4.82)

where k² is an upper bound on the number of choices for π_{u_i} and π_{v_i}. Thus, for each of the m subpaths of length t_i, we obtain a factor C(k²/n)‖T_{κ_n}‖^{t_i}. We arrive at

∑_{π,ρ∈P_k(a,b) : Shape(π,ρ)=σ} P(π, ρ occupied) ≤ E[N_k(a, b)] ∏_{i=1}^{m} C(k²/n)‖T_{κ_n}‖^{t_i} ≤ E[N_k(a, b)] ‖T_{κ_n}‖^{k−l} (Ck²/n)^m. (6.4.83)

This replaces (6.4.20), and the proof can now be completed in the same way as the proof of (6.4.7), combined with the proof of (6.4.48) in the proof of Theorem 6.14. We omit further details.

Concentration of the giant in Theorem 3.16

By Theorem 3.11, we know that IRG_n(κ_n) converges in probability in the local weak sense. Therefore, we aim to apply Theorem 2.26, for which we need to verify that (2.5.7) holds.

We rewrite the expectation appearing in (2.5.7) as

(1/n²) E[#{x, y ∈ [n] : |C(x)|, |C(y)| ≥ k, x ↮ y}] = P(|C(o_1)|, |C(o_2)| ≥ k, o_1 ↮ o_2). (6.4.84)

We condition on B_k(o_1) and B_k(o_2), and note that the events {|C(o_1)| ≥ k} and {|C(o_2)| ≥ k} are measurable with respect to B_k(o_1) and B_k(o_2), to obtain

P(|C(o_1)|, |C(o_2)| ≥ k, o_1 ↮ o_2) = E[1{|C(o_1)|,|C(o_2)|≥k} P(o_1 ↮ o_2 | B_k(o_1), B_k(o_2))]. (6.4.85)

By Exercise 6.23, together with Theorem 3.11,

lim_{k→∞} lim_{n→∞} P(∂B_k(o_1) = ∅, |C(o_1)| ≥ k) = 0. (6.4.86)


Therefore,

P(|C(o_1)|, |C(o_2)| ≥ k, o_1 ↮ o_2) = E[1{|C(o_1)|,|C(o_2)|≥k, ∂B_k(o_1),∂B_k(o_2)≠∅} P(o_1 ↮ o_2 | B_k(o_1), B_k(o_2))]. (6.4.87)

When ∂B_k(o_1), ∂B_k(o_2) ≠ ∅, it immediately follows that |C(o_1)|, |C(o_2)| ≥ k. Thus,

P(|C(o_1)|, |C(o_2)| ≥ k, o_1 ↮ o_2) = E[1{∂B_k(o_1),∂B_k(o_2)≠∅} P(o_1 ↮ o_2 | B_k(o_1), B_k(o_2))] + o_k(1), (6.4.88)

where o_k(1) denotes a quantity that tends to zero when first n → ∞ followed by k → ∞.

In the proof of Theorem 6.1(ii), we have shown that, on {∂B_k(o_1), ∂B_k(o_2) ≠ ∅},

P(d_{IRG_n(κ_n)}(o_1, o_2) ≤ (1 + ε) log_ν n | B_k(o_1), B_k(o_2)) = 1 + o_P(1). (6.4.89)

In particular, on {∂B_k(o_1), ∂B_k(o_2) ≠ ∅},

P(o_1 ↮ o_2 | B_k(o_1), B_k(o_2)) = o_P(1). (6.4.90)

Since also P(o_1 ↮ o_2 | B_k(o_1), B_k(o_2)) ≤ 1, the Dominated Convergence Theorem [Volume 1, Theorem A.1] completes the proof of (2.5.7) for IRG_n(κ_n), as required.

6.5 Related results on distances for inhomogeneous random graphs

In this section, we discuss some related results for inhomogeneous random graphs. While we give intuition about their proofs, we shall not include them in full detail.

6.5.1 The diameter in inhomogeneous random graphs

In this section, we investigate the diameter of IRG_n(κ_n), which is defined to be the maximal finite graph distance between any pair of vertices, i.e., the diameter diam(G) of the graph G equals

diam(G) = max_{u,v : dist_G(u,v)<∞} dist_G(u, v). (6.5.1)

We shall see that for IRG_n(κ) the diameter tends to be much larger than the typical graph distances, which is due to long thin lines that are distributed as IRG_n(κ̂) with a subcritical kernel κ̂, by a duality principle for IRG_n(κ). Before we state the results, we introduce the notion of the dual kernel:


Definition 6.20 (Dual kernel for IRG_n(κ)) Let (κ_n) be a sequence of supercritical kernels with limit κ. The dual kernel is the kernel κ̂ defined by κ̂(x, y) = κ(x, y), with reference measure μ̂ given by dμ̂(x) = (1 − ζ_κ(x))μ(dx).

The dual kernel describes the graph after the removal of the giant component. Here, the reference measure μ̂ measures the size of the graph. A vertex of type x is in the giant component with probability ζ_κ(x), in which case it must be removed. Thus, μ̂ describes the proportion of vertices of the various types that lie outside the giant component. As before, we define the operator T_κ̂ by the equality

(T_κ̂ f)(x) = ∫_S κ̂(x, y) f(y) dμ̂(y) = ∫_S κ(x, y) f(y) [1 − ζ_κ(y)] μ(dy), (6.5.2)

and we write ‖T_κ̂‖ for

‖T_κ̂‖ = sup{‖T_κ̂ f‖_{μ̂,2} : f ≥ 0, ‖f‖_{μ̂,2} = 1}, (6.5.3)

where

‖f‖²_{μ̂,2} = ∫_S f²(x) μ̂(dx). (6.5.4)

Theorem 6.21 (The diameter of IRG_n(κ) in the finite-type case) Let (κ_n) be a sequence of kernels with limit κ, which has finitely many types. If 0 < ‖T_κ‖ < 1, then

diam(IRG_n(κ_n))/log n P−→ 1/log ‖T_κ‖^{−1} (6.5.5)

as n → ∞. If ‖T_κ‖ > 1 and κ is irreducible, then

diam(IRG_n(κ_n))/log n P−→ 2/log ‖T_κ̂‖^{−1} + 1/log ‖T_κ‖, (6.5.6)

where κ̂ is the dual kernel to κ.

If we compare Theorem 6.21 to Theorem 6.2, we see that the diameter has the same scaling as the typical graph distance, but that the limit in probability of diam(IRG_n(κ))/log n is strictly larger than the one for d_{IRG_n(κ)}(o_1, o_2)/log n conditioned on d_{IRG_n(κ)}(o_1, o_2) < ∞. This effect is particularly noticeable when τ ∈ (2, 3), where d_{IRG_n(κ)}(o_1, o_2)/log log n, conditionally on d_{IRG_n(κ)}(o_1, o_2) < ∞, converges in probability to a finite limit, while diam(IRG_n(κ))/log n converges to a non-zero limit. This can be explained by noticing that the diameter in IRG_n(κ) arises due to very thin lines of length of order log n. Since these thin lines involve only very few vertices, they do not contribute to d_{IRG_n(κ)}(o_1, o_2), but they do contribute to diam(IRG_n(κ)). This is another reason why we prefer to work with typical graph distances rather than with the diameter. Exercise 6.24 investigates the consequences for ER_n(λ/n).
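The gap between the two limits is easy to evaluate numerically for ER_n(λ/n), anticipating Exercise 6.24, where ‖T_κ‖ = λ and ‖T_κ̂‖ equals the dual parameter μ_λ < 1 solving μe^{−μ} = λe^{−λ}. A sketch (λ = 2 is an illustrative choice):

```python
import math

# Dual parameter mu_lambda < 1 solves mu * exp(-mu) = lam * exp(-lam),
# found here by bisection on (0, 1), where x * exp(-x) is increasing.
def dual_parameter(lam, tol=1e-12):
    target = lam * math.exp(-lam)
    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = 2.0                                  # illustrative supercritical choice
mu = dual_parameter(lam)
typical = 1.0 / math.log(lam)              # limit of d(o1, o2) / log n
diam = 2.0 / math.log(1.0 / mu) + 1.0 / math.log(lam)  # diameter limit
print(round(mu, 4), round(typical, 3), round(diam, 3))
```

For λ = 2 the diameter constant comes out close to 3.7, well above the typical-distance constant 1/log 2 ≈ 1.44, illustrating the strict gap discussed above.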

We will not prove Theorem 6.21 here. For GRG_n(w), it will follow from Theorem 7.16, which proves a related result for the configuration model.


6.5.2 Distance fluctuations for GRGn(w) with finite-variance degrees

We continue by studying the fluctuations of the typical graph distance when E[W²] < ∞. We shall impose a slightly stronger condition on the distribution function F of W, namely, that there exist τ > 3 and c > 0 such that

1 − F(x) ≤ cx^{−(τ−1)}. (6.5.7)

Equation (6.5.7) implies that the degrees have finite variance, see Exercise 6.25.

Theorem 6.22 (Limit law for the typical graph distance in CL_n(w)) Assume that (6.5.7) is satisfied, and let ν = E[W²]/E[W] > 1. For k ≥ 1, define a_k = ⌊log_ν k⌋ − log_ν k ∈ (−1, 0]. Then, for CL_n(w) with w as in (1.3.15), there exist random variables (R_a)_{a∈(−1,0]} with lim_{K→∞} sup_{a∈(−1,0]} P(|R_a| > K) = 0 such that, as n → ∞ and for all k ∈ ℤ,

P(d_{NR_n(w)}(o_1, o_2) − ⌊log_ν n⌋ = k | d_{NR_n(w)}(o_1, o_2) < ∞) = P(R_{a_n} = k) + o(1). (6.5.8)

While Theorem 6.1 implies that, conditionally on o_1 and o_2 being connected, d_{NR_n(w)}(o_1, o_2)/log n P−→ 1/log ν, Theorem 6.22 implies that the fluctuations of d_{NR_n(w)}(o_1, o_2) around log_ν n remain uniformly bounded in probability.

The random variables (R_a)_{a∈(−1,0]} can be determined in terms of the limit law of a branching-process approximation of the neighborhoods of CL_n(w), and depend sensitively on a. This implies that, although (d_{NR_n(w)}(o_1, o_2) − ⌊log_ν n⌋)_{n=2}^∞ is a tight sequence of random variables, it does not converge weakly. See Exercises 6.26–6.28 for further properties of this sequence of random variables.

6.5.3 Distances in GRGn(w) for τ = 3

Theorem 6.23 (Critical case: interpolation) Consider GRG_n(W), where the i.i.d. weights satisfy

P(W_1 > k) = k^{−2}(log k)^{2α+o(1)} (6.5.9)

for some α > 0. Consider two vertices o_1, o_2 chosen independently and uniformly at random from the largest connected component C_max of GRG_n(W). Then

dist_{GRG_n(W)}(o_1, o_2) = (1 + o_P(1)) (1/(1 + 2α)) log n/log log n. (6.5.10)

6.6 Notes and discussion

Notes on Section 6.1

Theorem 6.1 is a simplified version of (Bollobas et al., 2007, Theorem 3.14). A first version of Theorem 6.1 was proved in Chung and Lu (2002a, 2003) for the expected degree random graph, in the case of admissible deterministic weights.


We refer to (Chung and Lu, 2003, p. 94) for the definition of admissible degree sequences.

Theorem 6.3 for the expected degree random graph, or Chung-Lu model, was first proved by Chung and Lu (2002a, 2003), in the case of deterministic weights w_i = c·(i/n)^{−1/(τ−1)}, having average degree strictly greater than 1 and maximum degree m satisfying log m ≫ log n/log log n. These restrictions were lifted in (Durrett, 2007, Theorem 4.5.2). Indeed, the bound on the average degree is not necessary since, for τ ∈ (2, 3), ν = ∞, and therefore the IRG is always supercritical. An upper bound as in Theorem 6.3 for the Norros-Reittu model with i.i.d. weights is proved by Norros and Reittu (2006).

Theorem 6.2 has a long history, and many versions of it have been proven in the literature. We refer the reader to Chung and Lu (2002a, 2003) for the Chung-Lu model, and to van den Esker et al. (2008) for its extensions to the Norros-Reittu model and the generalized random graph. Theorem 6.3 has also been proved in many versions, both fully as well as in partial forms; see Norros and Reittu (2006), Chung and Lu (2002a, 2003), as well as Dereich et al. (2012).

Notes on Section 6.2

As far as we are aware, the proof of Theorem 6.4 is new in the present context. Similar arguments have often been used, though, to prove lower bounds on distances in various situations.

The truncated first moment method in the proof of Theorem 6.6 is inspired by Dereich et al. (2012).

Notes on Section 6.3

Theorem 6.9 is novel in its precise form, and also its proof is different from the ones in the literature. See the notes of Section 6.1 for the relevant references.

Notes on Section 6.4

The path-counting techniques in Proposition 6.12 are novel. They are inspired by the path-counting techniques used by Eckhoff et al. (2013) for smallest-weight problems on the complete graph, where many of the counting arguments already appeared. Related proofs of the upper bound on dist_{NR_n(w)}(o_1, o_2) when ν < ∞, as in Theorem 6.14, often rely on branching-process comparisons up to a generation m = m_n → ∞.

Notes on Section 6.5

Theorem 6.22 is proved by van den Esker et al. (2008), both in the case of i.i.d. degrees as well as for deterministic weights under a mild further condition on the distribution function.

Theorem 6.21 is a special case of (Bollobas et al., 2007, Theorem 3.16). In the special case of ER_n(λ/n), it extends a previous result of Chung and Lu (2001) that proves logarithmic asymptotics of diam(ER_n(λ/n)), and it negatively answers a question of Chung and Lu (2001). Related results for the configuration model, which also imply results for the generalized random graph, can be found


in Fernholz and Ramachandran (2007); see also Theorem 7.16 below. See also Riordan and Wormald (2010) for additional results and branching-process proofs; there, also the case where λ > 1 with λ − 1 ≫ n^{−1/3} is discussed. Similar results have been derived by Ding et al. (2011, 2010).

6.7 Exercises for Chapter 6

Exercise 6.1 (Typical distances in ER_n(λ/n)) Fix λ > 1. Use either Theorem 6.1 or Theorem 6.2 to prove that

d_{ER_n(λ/n)}(o_1, o_2)/log n P−→ 1/log λ,

conditionally on d_{ER_n(λ/n)}(o_1, o_2) < ∞.

Exercise 6.2 (Typical distances when ν = ∞) Prove that

d_{NR_n(w)}(o_1, o_2)/log n P−→ 0

when ν = ∞, by using an appropriate truncation argument and monotonicity.

Exercise 6.3 (Power-law tails in key example of deterministic weights) Let w be defined as in (1.3.15), and assume that F satisfies

1 − F(x) = x^{−(τ−1)}L(x), (6.7.1)

where the exponent satisfies τ ∈ (2, 3) and x ↦ L(x) is slowly varying. Prove that (6.1.4) holds.

Exercise 6.4 (Power-law tails for i.i.d. weights) Let w = (w_i)_{i∈[n]} be i.i.d. weights with distribution function F satisfying (6.7.1) with τ ∈ (2, 3), where x ↦ L(x) is slowly varying. Prove that (6.1.4) holds with probability converging to 1.

Exercise 6.5 (Bound on truncated forward degree ν_n(b)) Prove (6.2.42) by combining (1.4.2) in Lemma 1.13 with ℓ_n = Θ(n) by Conditions 1.1(a)–(b).

Exercise 6.6 (Conditions (6.2.1) and Condition 6.4) Show that when there is precisely one vertex with weight w_1 = √n, whereas w_i = λ > 1 for all i ≥ 2, then (6.2.1) holds, but Condition 6.4 does not. Argue that the upper bound derived in Theorem 6.4 is not sharp, since vertex 1 can occur at most once in a self-avoiding path.

Exercise 6.7 (Lower bound on fluctuations) Adapt the proof of Theorem 6.4 to show that, for every ε > 0, we can find a constant K = K(ε) > 0 such that

P(d_{NR_n(w)}(o_1, o_2) ≤ log_{ν_n}(n) − K) ≤ ε. (6.7.2)

Conclude that if log ν_n = log ν + o(1/log n), then the same statement holds with log_ν n replacing log_{ν_n} n.

Exercise 6.8 (Proof of Corollary 6.5) Adapt the proof of Theorem 6.4 to prove Corollary 6.5.


Exercise 6.9 (Lower bound on typical distances for τ = 3) Let w_i = c√(n/i), so that τ = 3. Prove that ν_n/log n → c. Use Corollary 6.5 to obtain that, for any ε > 0,

P(d_{NR_n(w)}(o_1, o_2) ≤ (1 − ε) log n/log log n) = o(1). (6.7.3)

Exercise 6.10 (Lower bound on typical distances for τ ∈ (2, 3)) Let w_i = c/i^{1/(τ−1)} with τ ∈ (2, 3). Prove that there exists a constant c′ > 0 such that ν_n ≥ c′n^{(3−τ)/(τ−1)}. Show that Corollary 6.5 implies that d_{NR_n(w)}(o_1, o_2) ≥ (τ − 1)/(3 − τ) in this case. How useful is this bound?

Exercise 6.11 (Convergence in probability of typical distance in IRG_n(κ_n)) Suppose that the graphical sequence of kernels (κ_n) satisfies sup_{x,y,n} κ_n(x, y) < ∞, where the limit κ is irreducible and ν = ‖T_κ‖ > 1. Prove that Theorem 3.16 together with Theorem 6.1(i–ii) imply that, conditionally on d_{IRG_n(κ_n)}(o_1, o_2) < ∞,

d_{IRG_n(κ_n)}(o_1, o_2)/log n P−→ 1/log ν. (6.7.4)

Exercise 6.12 (Convergence in probability of typical distance in IRG_n(κ_n)) Suppose that the graphical sequence of kernels (κ_n) converges to κ, where κ is irreducible and ‖T_κ‖ = ∞. Prove that Theorem 3.16 together with Theorem 6.1(iii) imply that, conditionally on d_{IRG_n(κ_n)}(o_1, o_2) < ∞,

d_{IRG_n(κ_n)}(o_1, o_2)/log n P−→ 0. (6.7.5)

Exercise 6.13 (Distance between fixed vertices) Show that (6.2.31) and Lemma 6.7 imply that, for all a, b ∈ [n] with a ≠ b,

P(dist_{NR_n(w)}(a, b) ≤ k_n) ≤ (w_a w_b/ℓ_n) ∑_{k=1}^{k_n} ∏_{l=1}^{k−1} ν_n(b_l ∧ b_{k−l}) + (w_a + w_b) ∑_{k=1}^{k⋆} [1 − F⋆_n](b_k) ∏_{l=1}^{k} ν_n(b_l). (6.7.6)

Exercise 6.14 (Lower bound on fluctuations?) Adapt the proof of Theorem 6.6 to show that, for every ε > 0, we can find a constant K = K(ε) > 0 such that

P(d_{NR_n(w)}(o_1, o_2) ≤ 2 log log n/|log(τ − 2)| − K) ≤ ε. (6.7.7)

Hint: choose b_k = L b_{k−1}^{1/(τ−2)}, where the constant L > 0 is chosen sufficiently large.

Exercise 6.15 (Upper bound on the expected number of paths) Prove (6.4.4) for an inhomogeneous random graph with vertex set I and with edge probabilities p_{ij} = u_i u_j for every i, j ∈ I.

Exercise 6.16 (Variance of two paths) Prove that Var(N_k(a, b)) ≤ E[N_k(a, b)] for k = 2.


Exercise 6.17 (Variance of three paths) Compute Var(N_3(a, b)) explicitly, and compare it to the bound in (6.4.7).

Exercise 6.18 (Variance of paths for ER_n(λ/n)) Let A, B ⊆ [n], and let N_k(A, B) denote the number of paths of length k connecting A to B (where a path connecting A and B avoids A and B except for its starting and end points). Show that, for k ≤ K log n,

E[N_k(A, B)] = λ^k (|A||B|/n) (1 − (|A| + |B|)/n)^k (1 + o(1)). (6.7.8)

Use Proposition 6.12 to bound the variance of N_k(A, B), and prove that

N_k(A, B)/E[N_k(A, B)] P−→ 1 (6.7.9)

when |A|, |B| → ∞ with |A| + |B| = o(n/k).

Exercise 6.19 (ν_n bound for τ = 3) Prove that (6.4.54) and (6.4.55) imply that ν_I ≥ c log α_n, by using

(1/n) ∑_{i∈I} w_i² = E[W_n² 1{W_n∈[K,√α_n]}] = 2 ∫_K^{√α_n} x[F_n(√α_n) − F_n(x)] dx. (6.7.10)

Exercise 6.20 (Expected number of paths within Core_n diverges) Prove that

E[N_k(a, b)] → ∞

for a = V′_1, b = V′_2 and k = log n/((1 − η) log ν_n).

Exercise 6.21 (Concentration of number of paths within Core_n) Prove that

Var(N_k(a, b))/E[N_k(a, b)]² → 0

for a = V′_1, b = V′_2 and k = log n/((1 − η) log ν_n).

Exercise 6.22 (Completion of the proof of Proposition 6.18) Complete the proof of Proposition 6.18 by adapting the arguments in (6.4.50)–(6.4.53).

Exercise 6.23 (Size versus boundary of large clusters) Let the graph G = (V, E) converge locally weakly in probability. Prove that

lim_{k→∞} lim_{n→∞} P(∂B_k(o_1) = ∅, |C(o_1)| ≥ k) = 0. (6.7.11)

Exercise 6.24 (Diameter of ER_n(λ/n)) Recall Theorem 6.21. For ER_n(λ/n), show that ‖T_κ‖ = λ and ‖T_κ̂‖ = μ_λ, where μ_λ is the dual parameter in [Volume 1, (3.6.6)], so that Theorem 6.21 becomes

diam(ER_n(λ/n))/log n P−→ 2/log μ_λ^{−1} + 1/log λ. (6.7.12)

Exercise 6.25 (Finite-variance degrees when (6.5.7) holds) Prove that (6.5.7) implies that E[W²] < ∞. Use this to prove that the degrees have uniformly bounded variance when (6.5.7) holds.


Exercise 6.26 (Tightness of centered typical graph distances in CL_n(w)) Prove that, under the conditions of Theorem 6.22, and conditionally on d_{NR_n(w)}(o_1, o_2) < ∞, the sequence (d_{NR_n(w)}(o_1, o_2) − ⌊log_ν n⌋)_{n=2}^∞ is tight.

Exercise 6.27 (Non-convergence of centered typical graph distances in CL_n(w)) Prove that, under the conditions of Theorem 6.22, and conditionally on d_{NR_n(w)}(o_1, o_2) < ∞, the sequence d_{NR_n(w)}(o_1, o_2) − ⌊log_ν n⌋ does not converge weakly when the distribution of R_a depends continuously on a and there exist a, b ∈ (−1, 0] such that the distribution of R_a is not equal to that of R_b.

Exercise 6.28 (Extension of Theorem 6.22 to GRG_n(w) and NR_n(w)) Use [Volume 1, Theorem 6.18] to prove that Theorem 6.22 holds verbatim for GRG_n(w) and NR_n(w) when (6.5.7) holds.


Chapter 7

Small-world phenomena in the configuration model

Abstract

In this chapter, we investigate the connectivity structure of the configuration model by investigating its typical distances and its diameter. We also show that the configuration model is whp connected when the minimal degree is at least three.

In this chapter, we investigate graph distances in the configuration model (CM). We start with a motivating example.

Motivating example

Recall Figure 1.4(a), in which graph distances in the Autonomous Systems (AS) graph in the Internet are shown. Such distances are known under the name AS-count. A relevant question is whether such a histogram can be predicted by the graph distances in a random graph model with degree structure and size similar to those of the AS graph. Simulations indicating the properties of the typical graph distance for τ ∈ (2, 3) can be seen in Figure 7.1. In it, the distances in the AS graph are compared to those in CM_n(d), where n equals the number of AS, which is n = 10,940, and τ = 2.25 is the best approximation to the exponent of the power law for the degree sequence of the AS graph. We see that the simulated graph distances in this CM_n(d) and the AS-counts are quite close.

Figure 7.1 Number of AS traversed in hopcount data (blue) compared to the model (purple) with τ = 2.25, n = 10,940.



This figure again raises the question of how graph distances depend on properties of the random graph, such as its size n, as well as on its structure. The configuration model is highly flexible, in the sense that it allows one to choose the degree distribution in great generality. Thus, we can use the CM to single out the relation between graph distances and the degree structure, in the same way that it allows us to investigate the size of the giant component and the connectivity as functions of the degree distribution, as in Chapter 4. Finally, we can verify whether the graph distances are closely related to those in inhomogeneous random graphs, as discussed in Chapter 6, so as to spot another sign of the desired universality of structural properties of random graphs with similar degree distributions.

Organization of this chapter

This chapter is organized as follows. In Section 7.1, we study the typical graph distance in the configuration model. In Section 7.2, we prove these distance results, using path-counting techniques and comparisons to branching processes. In Section 7.3, we study infinite-mean branching processes, as these arise in the configuration model with infinite-variance degrees. In Section 7.4, we study the diameter of the configuration model. In Section 7.5, we state further results on the configuration model. We close this chapter with notes and discussion in Section 7.6, and with exercises in Section 7.7.

7.1 The small-world phenomenon in CMn(d)

In this section, we describe the main results on distances in the configuration model, both in the case of finite-variance degrees and in the case of infinite-variance degrees. These results will be proved in the following sections.

Distances in configuration models with finite-variance degrees

We start by analyzing the typical graph distance in the configuration model CMn(d) when Conditions 1.5(a)–(c) hold:

Theorem 7.1 (Typical distances in CM_n(d) for finite-variance degrees) In the configuration model CM_n(d), where the degrees d = (d_i)_{i∈[n]} satisfy Conditions 1.5(a)–(c) and where ν > 1, conditionally on dist_{CM_n(d)}(o_1, o_2) < ∞,

dist_{CM_n(d)}(o_1, o_2)/log n P−→ 1/log ν. (7.1.1)

Theorem 7.1 shows that the typical distances in CMn(d) are of order log_ν n, and it is thus similar in spirit to Theorem 6.2. We shall see that its proof is also quite similar.
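The logarithmic scaling in Theorem 7.1 can also be checked empirically. The following is a minimal simulation sketch (not the book's code): it pairs half-edges uniformly at random to generate CMn(d) for a 3-regular degree sequence, where ν = E[D(D − 1)]/E[D] = 2, and compares sampled graph distances with log_ν n.

```python
import math
import random

def configuration_model(degrees, rng):
    """Pair half-edges uniformly at random; returns an adjacency list.
    The result is a multigraph: self-loops and multiple edges may occur."""
    half_edges = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(half_edges)
    adj = [[] for _ in degrees]
    for i in range(0, len(half_edges), 2):
        u, v = half_edges[i], half_edges[i + 1]
        adj[u].append(v)
        adj[v].append(u)
    return adj

def graph_distance(adj, s, t):
    """Breadth-first-search graph distance; math.inf if t is unreachable."""
    if s == t:
        return 0
    seen, frontier, d = {s}, [s], 0
    while frontier:
        d += 1
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v == t:
                    return d
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    return math.inf

rng = random.Random(1)
n = 20_000
degrees = [3] * n          # 3-regular, so nu = E[D(D-1)]/E[D] = 2
adj = configuration_model(degrees, rng)
dists = [graph_distance(adj, rng.randrange(n), rng.randrange(n))
         for _ in range(30)]
finite = [d for d in dists if d < math.inf]
print(sum(finite) / len(finite), math.log(n) / math.log(2))
```

For n = 20,000, the sampled average should lie close to log₂ n ≈ 14.3, in line with (7.1.1).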

Finite mean, infinite variance degrees

We next study the typical distance in the configuration model with degrees having finite mean and infinite variance. We start by formulating the precise condition on the degrees that we shall work with. This condition is identical to the condition on Fn for NRn(w) formulated in (6.1.4). Recall that Fn(x) denotes the proportion of vertices having degree at most x. Then, we assume that there exists a τ ∈ (2, 3) such that, for all δ > 0, there exist c1 = c1(δ) and c2 = c2(δ) such that, uniformly in n,

\[
c_1 x^{-(\tau - 1 + \delta)} \leq [1 - F_n](x) \leq c_2 x^{-(\tau - 1 - \delta)}, \tag{7.1.2}
\]

where the upper bound holds for every x ≥ 1, while the lower bound is only required to hold for 1 ≤ x ≤ n^α for some α > 1/2. The typical distances of CMn(d) under the infinite-variance condition in (7.1.2) are identified in the following theorem:

Theorem 7.2 (Typical distances in CMn(d) for τ ∈ (2, 3)) Let the degrees d = (d_i)_{i∈[n]} in the configuration model CMn(d) satisfy Conditions 1.5(a)-(b) and (7.1.2). Then, conditionally on dist_{CMn(d)}(o1, o2) < ∞,

\[
\frac{\mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2)}{\log\log n} \xrightarrow{\mathbb{P}} \frac{2}{|\log(\tau - 2)|}. \tag{7.1.3}
\]

Theorem 7.2 is similar in spirit to Theorem 6.3 for NRn(w).
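To get a feel for the difference between Theorems 7.1 and 7.2, the following sketch (with illustrative parameter values, not taken from the text) evaluates the two predicted orders of growth, log_ν n versus 2 log log n/|log(τ − 2)|:

```python
import math

def finite_variance_order(n, nu):
    # Theorem 7.1: typical distance grows like log_nu n
    return math.log(n) / math.log(nu)

def infinite_variance_order(n, tau):
    # Theorem 7.2: typical distance grows like 2 log log n / |log(tau - 2)|
    return 2 * math.log(math.log(n)) / abs(math.log(tau - 2))

for n in (10**6, 10**9, 10**12):
    print(n,
          round(finite_variance_order(n, 2.0), 1),
          round(infinite_variance_order(n, 2.5), 1))
```

Even at n = 10^12, the ultrasmall prediction for τ = 2.5 stays below 10, while log₂ n has grown to about 40: this is the "ultrasmall world" of infinite-variance degrees.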

7.2 Proofs of small-world results CMn(d)

In this section, we give the proofs of Theorems 7.1 and 7.2, describing the small-world properties of CMn(d). These proofs are adaptations of the proofs of Theorems 6.2 and 6.3, and we focus on the differences. This section is organized as follows. In Section 7.2.1, we give a branching process approximation for the neighborhoods of a pair of uniform vertices in CMn(d). In Section 7.2.2, we perform path-counting techniques similar to those in Section 6.4.1, adapted to the configuration model, where edges are formed by pairing half-edges.

7.2.1 Branching process approximation

In this section, we summarize some links between the breadth-first exploration in CMn(d) and branching processes.

Let, by convention, Z_0(i) = 1 and, for m ≥ 1, let Z_m(i) denote the number of unpaired half-edges incident to vertices at graph distance m − 1 from vertex i, so that Z_1(i) = d_i = S_0. Thus, Z_2(i) is obtained after pairing all the Z_1(i) half-edges at distance 1 from the root. Then, Z_m(i) equals S_{T_m(i)}, where T_m(i) is the time at which all the Z_{m−1}(i) half-edges at distance m − 2 from vertex i have been paired. The above describes the breadth-first exploration from a single vertex i. Let, as usual, V_1 and V_2 be two vertices chosen uniformly at random from [n], and denote Z^{(n;i)}_m = Z_m(V_i), so that Z^{(n;i)}_m is the number of unpaired half-edges at distance m − 1 from vertex V_i. The following corollary shows that, for all m ≥ 1, the processes (Z^{(n;1)}_l, Z^{(n;2)}_l)_{l=0}^m are close to two independent two-stage branching processes:


Corollary 7.3 (Coupling of neighborhoods of two vertices) Let the degrees d = (d_i)_{i∈[n]} satisfy Conditions 1.5(a)-(b). Let (Z^{(1)}_l, Z^{(2)}_l)_{l≥0} be two independent branching processes with offspring distribution D in the first generation, and offspring distribution D⋆ − 1 in all further generations. Then, for every m ≥ 1,

\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) \leq 2m\big) = o(1), \tag{7.2.1}
\]

and

\[
(Z^{(n;1)}_l, Z^{(n;2)}_l)_{l=0}^{m} \xrightarrow{d} (Z^{(1)}_l, Z^{(2)}_l)_{l=0}^{m}. \tag{7.2.2}
\]

Corollary 7.3 follows from the local weak convergence in probability in Theorem 4.1, combined with Corollaries 2.17–2.18.
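The limiting object in Corollary 7.3 can be simulated directly. The sketch below (with an arbitrary illustrative degree distribution, uniform on {1, …, 4}, chosen here for concreteness) draws the generation sizes of a two-stage branching process: offspring D at the root and the size-biased variable D⋆ − 1 in all later generations.

```python
import random

rng = random.Random(7)

support = [1, 2, 3, 4]
weights = [1, 1, 1, 1]                                   # D uniform on {1,2,3,4}
sb_weights = [k * w for k, w in zip(support, weights)]   # P(D* = k) ~ k P(D = k)

def sample_D():
    return rng.choices(support, weights)[0]

def sample_Dstar_minus_1():
    return rng.choices(support, sb_weights)[0] - 1

def two_stage_generations(m):
    """Generation sizes (Z_0, ..., Z_m): offspring D at the root,
    offspring D* - 1 in every further generation."""
    sizes = [1]
    z = sample_D()
    for _ in range(m):
        sizes.append(z)
        z = sum(sample_Dstar_minus_1() for _ in range(z))
    return sizes

print(two_stage_generations(6))
```

For this distribution, E[D⋆ − 1] = E[D(D − 1)]/E[D] = 2, so the generation sizes grow roughly like 2^m, in line with ν = 2.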

7.2.2 Path-counting techniques

In this section, we present path-counting techniques similar to those in Section 6.4.1. Since CMn(d) is a multigraph, and not a simple graph as NRn(w) is, we need to be precise about what a path in CMn(d) is. We start by introducing some notation.

A path π of length k in CMn(d) means a sequence

\[
\pi = (\pi_0, s_0), (\pi_1, s_1, t_1), \ldots, (\pi_{k-1}, s_{k-1}, t_{k-1}), (\pi_k, t_k), \tag{7.2.3}
\]

where π_i ∈ [n] denotes the ith vertex along the path, s_i ∈ [d_{π_i}] denotes the label of the half-edge incident to π_i, and t_{i+1} ∈ [d_{π_{i+1}}] denotes the label of the half-edge incident to π_{i+1}. In particular, multiple edges between π_i and π_{i+1} give rise to distinct paths through the same vertices. For a path π, we write π ⊆ CMn(d) when the path π in (7.2.3) is present in CMn(d), so that the half-edge corresponding to s_i is paired with the half-edge corresponding to t_{i+1} for i = 0, . . . , k − 1. Without loss of generality, we assume throughout that the path π is simple, i.e., that π_0, . . . , π_k are distinct vertices.

In this section, we perform first and second moment computations on the number of paths present in CMn(d). We start by proving upper bounds on the expected number of paths.

Upper bounds on the expected number of paths in CMn(d)

For a, b ∈ [n], I ⊆ [n] and k ≥ 1, we let P_k(a, b) = P_k(a, b; I) denote the set of k-paths that only use vertices in I, and we let

\[
N_k(a, b) = N_k(a, b; I) = \#\{\pi \in \mathcal{P}_k(a, b) \colon \pi \subseteq \mathrm{CM}_n(d)\} \tag{7.2.4}
\]

denote the number of paths of length k between the vertices a and b. Then, we prove the following upper bound on the expected number of paths connecting a and b:

Proposition 7.4 (Expected numbers of paths) For any k ≥ 1, a, b ∈ [n] and (d_i)_{i∈[n]},

\[
\mathbb{E}[N_k(a, b)] \leq \frac{d_a d_b \, \ell_n}{(\ell_n - 2k + 1)(\ell_n - 2k)} \, \nu_I^{k-1}, \tag{7.2.5}
\]

where

\[
\nu_I = \sum_{i \in I \setminus \{a, b\}} \frac{d_i(d_i - 1)}{\ell_n}. \tag{7.2.6}
\]

Proof The probability that the path π in (7.2.3) is present in CMn(d) equals

\[
\prod_{i=1}^{k} \frac{1}{\ell_n - 2i + 1}, \tag{7.2.7}
\]

and the number of paths with fixed vertices π_0, . . . , π_k is equal to

\[
d_{\pi_0} \Big( \prod_{i=1}^{k-1} d_{\pi_i}(d_{\pi_i} - 1) \Big) d_{\pi_k}. \tag{7.2.8}
\]

Substituting π_0 = a, π_k = b, we arrive at

\[
\mathbb{E}[N_k(a, b)] = \frac{d_a d_b}{\ell_n - 2k + 1} \sum^{*}_{\pi_1, \ldots, \pi_{k-1}} \prod_{i=1}^{k-1} \frac{d_{\pi_i}(d_{\pi_i} - 1)}{\ell_n - 2i + 1}, \tag{7.2.9}
\]

where the sum is over distinct elements of I \ {a, b}. Let R denote the subset of vertices of I \ {a, b} for which d_i ≥ 2. Then,

\[
\mathbb{E}[N_k(a, b)] = \frac{d_a d_b}{\ell_n - 2k + 1} \sum^{*}_{\pi_1, \ldots, \pi_{k-1} \in R} \prod_{i=1}^{k-1} \frac{d_{\pi_i}(d_{\pi_i} - 1)}{\ell_n - 2i + 1}. \tag{7.2.10}
\]

By an inequality of Maclaurin (Hardy et al., 1988, Theorem 52), for r = |R|, 2 ≤ k ≤ r + 1 and any (a_i)_{i∈R} with a_i ≥ 0,

\[
\frac{(r - k + 1)!}{r!} \sum^{*}_{\pi_1, \ldots, \pi_{k-1} \in R} \prod_{i=1}^{k-1} a_{\pi_i} \leq \Big( \frac{1}{r} \sum_{i \in R} a_i \Big)^{k-1}. \tag{7.2.11}
\]
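Maclaurin's inequality (7.2.11) is easy to verify numerically by brute force. In this sketch (with arbitrary test values), the left side is the sum over all ordered tuples of distinct indices:

```python
import itertools
import math
import random

rng = random.Random(0)
r, k = 6, 4                                  # k - 1 = 3 interior vertices
a = [rng.uniform(0.0, 5.0) for _ in range(r)]

# Left side of (7.2.11): sum over ordered (k-1)-tuples of distinct indices,
# scaled by (r-k+1)!/r!; this equals e_{k-1}(a) / binom(r, k-1).
lhs = (math.factorial(r - k + 1) / math.factorial(r)) * sum(
    math.prod(a[i] for i in tup)
    for tup in itertools.permutations(range(r), k - 1)
)
rhs = (sum(a) / r) ** (k - 1)                # right side: (mean of a)^(k-1)
print(lhs, rhs, lhs <= rhs)
```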

Let a_i = d_i(d_i − 1), so that

\[
\sum_{i \in R} a_i = \ell_n \nu_I. \tag{7.2.12}
\]

We arrive at

\[
\mathbb{E}[N_k(a, b)] \leq \frac{d_a d_b}{\ell_n - 2k + 1} \big(\ell_n \nu_I / r\big)^{k-1} \prod_{i=1}^{k-1} \frac{r - i + 1}{\ell_n - 2i + 1} \tag{7.2.13}
\]
\[
\leq \frac{d_a d_b}{\ell_n - 2k + 1} \, \frac{\ell_n}{\ell_n - 2k} \, \nu_I^{k-1} \prod_{i=0}^{k-2} \frac{1 - i/r}{1 - 2i/\ell_n}.
\]

Further, ℓ_n = Σ_{i∈[n]} d_i ≥ 2r, so that 1 − i/r ≤ 1 − 2i/ℓ_n. Substitution yields the required bound.
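Proposition 7.4 can be checked exactly on a tiny example by enumerating all (ℓ_n − 1)!! pairings of the half-edges. The following sketch (illustrative, not from the text) does this for the degree sequence (3, 3, 2, 2) and k = 2, counting half-edge-labeled simple paths as in (7.2.3):

```python
def all_matchings(halfedges):
    """Enumerate all perfect matchings of the half-edge list."""
    if not halfedges:
        yield []
        return
    first, rest = halfedges[0], halfedges[1:]
    for j in range(len(rest)):
        remaining = rest[:j] + rest[j + 1:]
        for m in all_matchings(remaining):
            yield [(first, rest[j])] + m

def count_paths(edges, a, b, k):
    """Simple k-paths from a to b; parallel edges give distinct paths."""
    def extend(v, visited_v, used_e, steps):
        if steps == k:
            return 1 if v == b else 0
        total = 0
        for idx, (x, y) in enumerate(edges):
            if idx in used_e or v not in (x, y):
                continue
            w = y if x == v else x
            if w in visited_v or (w == b and steps + 1 < k):
                continue
            total += extend(w, visited_v | {w}, used_e | {idx}, steps + 1)
        return total
    return extend(a, {a}, frozenset(), 0)

degrees = [3, 3, 2, 2]
halfedges = [(v, i) for v, d in enumerate(degrees) for i in range(d)]
a, b, k = 0, 1, 2
ell = sum(degrees)                       # ell_n = 10, so 9!! = 945 pairings
total = matchings = 0
for m in all_matchings(halfedges):
    edges = [(u[0], w[0]) for u, w in m]
    total += count_paths(edges, a, b, k)
    matchings += 1
expected = total / matchings             # exact E[N_k(a, b)]
nu_I = sum(d * (d - 1) for i, d in enumerate(degrees) if i not in (a, b)) / ell
bound = degrees[a] * degrees[b] * ell / ((ell - 2 * k + 1) * (ell - 2 * k)) \
        * nu_I ** (k - 1)
print(matchings, expected, bound)
```

Here the exact expectation is 36/63 ≈ 0.571, comfortably below the bound ≈ 0.857 from (7.2.5).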


Logarithmic lower bound typical distances CMn(d)

With Proposition 7.4 at hand, we can immediately prove the lower bound on the typical graph distance in the case where the degrees have finite second moment (as in Theorem 6.4):

Theorem 7.5 (Logarithmic lower bound typical distances CMn(d)) Assume that

\[
\limsup_{n \to \infty} \nu_n > 1, \tag{7.2.14}
\]

where

\[
\nu_n = \mathbb{E}[D_n(D_n - 1)] / \mathbb{E}[D_n]. \tag{7.2.15}
\]

Then, for any ε > 0,

\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) \leq (1 - \varepsilon) \log_{\nu_n} n\big) = o(1). \tag{7.2.16}
\]

We leave the proof of Theorem 7.5, which is almost identical to that of Theorem 6.4 with (7.2.5) in hand, as Exercise 7.2.

Truncated first moment method and log log lower bound for τ ∈ (2, 3)

We next extend the above upper bounds on the expected number of paths to deal with the case where τ ∈ (2, 3), where, similarly to the setting in Section 6.2.2 in which NRn(w) was investigated, we need to truncate the degrees occurring in the arising paths. Our main result is as follows:

Theorem 7.6 (Log log lower bound on typical distances in CMn(d)) Suppose that the degrees d = (d_i)_{i∈[n]} satisfy Condition 1.5(a) and that there exist τ ∈ (2, 3) and c_2 such that, for all x ≥ 1,

\[
[1 - F_n](x) \leq c_2 x^{-(\tau - 1)}. \tag{7.2.17}
\]

Then, for every ε > 0,

\[
\mathbb{P}\Big( \mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) \leq (1 - \varepsilon) \frac{2 \log\log n}{|\log(\tau - 2)|} \Big) = o(1). \tag{7.2.18}
\]

The proof of Theorem 7.6 is identical to that of Theorem 6.6, and we discuss the changes only. For a fixed set of distinct vertices (π_0, . . . , π_k), (7.2.7)-(7.2.8) yield that the probability that there exist edges between π_{i−1} and π_i for all i = 1, . . . , k in CMn(d) is bounded above by

\[
\frac{d_{\pi_0} d_{\pi_k}}{\ell_n - 2k + 1} \prod_{i=1}^{k-1} \frac{d_{\pi_i}(d_{\pi_i} - 1)}{\ell_n - 2i + 1}. \tag{7.2.19}
\]

Equation (7.2.19) replaces the similar identity (6.2.7) for NRn(w). We see that w_{π_0} and w_{π_k} in (6.2.7) are replaced by d_{π_0} and d_{π_k} in (7.2.19); for i = 1, . . . , k − 1, the factors w²_{π_i} in (6.2.7) are replaced by d_{π_i}(d_{π_i} − 1) in (7.2.19); and the factors ℓ_n in (6.2.7) are replaced by (ℓ_n − 2i + 1) in (7.2.19).


Define, as in (6.2.32),

\[
\nu_n(b) = \frac{1}{\ell_n} \sum_{i \in [n]} d_i(d_i - 1) \mathbb{1}_{\{d_i \leq b\}}. \tag{7.2.20}
\]

Then, we can adapt the arguments in Section 6.2.2 to obtain (see in particular Exercise 6.13) that

\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{CM}_n(d)}(a, b) \leq k_n\big) \leq \frac{d_a d_b}{\ell_n} \sum_{k=1}^{k_n} \frac{\ell_n^k (\ell_n - 2k - 1)!!}{(\ell_n - 1)!!} \prod_{l=1}^{k-1} \nu_n(b_l \wedge b_{k-l}) \tag{7.2.21}
\]
\[
\qquad + (d_a + d_b) \sum_{k=1}^{k^\star} \frac{\ell_n^k (\ell_n - 2k - 1)!!}{(\ell_n - 1)!!} [1 - F^\star_n](b_k) \prod_{l=1}^{k} \nu_n(b_l),
\]

i.e., the bound in (6.7.6) is changed by the factors ℓ_n^k (ℓ_n − 2k − 1)!!/(ℓ_n − 1)!! in the sums. For k = O(log log n) and when Conditions 1.5(a)-(b) hold,

\[
\frac{\ell_n^k (\ell_n - 2k - 1)!!}{(\ell_n - 1)!!} = \prod_{i=1}^{k} \frac{\ell_n}{\ell_n - 2i + 1} = 1 + O(k^2 / \ell_n) = 1 + o(1), \tag{7.2.22}
\]

so this change has only a minor effect. Since (6.2.42) in Lemma 6.7 applies under the conditions of Theorem 7.6, we can follow the proof of Theorem 6.6 verbatim. This completes the proof of Theorem 7.6.
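The ratio in (7.2.22) is easy to evaluate via its product form. A small numeric sketch (illustrative values):

```python
import math

def df_ratio(ell, k):
    """ell^k (ell - 2k - 1)!! / (ell - 1)!! = prod_{i=1}^k ell / (ell - 2i + 1)."""
    out = 1.0
    for i in range(1, k + 1):
        out *= ell / (ell - 2 * i + 1)
    return out

n = 10**6
ell = 3 * n                                # e.g. total degree of a 3-regular graph
k = int(2 * math.log(math.log(n)))         # k = O(log log n); here k = 5
print(df_ratio(ell, k))                    # 1 + O(k^2 / ell), so very close to 1
```

As a sanity check against the double factorials directly: with ℓ = 10 and k = 2, both sides equal 10² · 5!!/9!! = 100 · 15/945 = 100/63.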

Second moment method for the number of paths in CMn(d)

We next extend the above first moment bounds on the number of paths in CMn(d) to second moment methods. Define

\[
\bar{n}_k(a, b) = \frac{\ell_n^k (\ell_n - 2k - 1)!!}{(\ell_n - 1)!!} \, \frac{d_a d_b}{\ell_n} \Big( \sum_{i \in I \setminus \{a, b\}} \frac{d_i(d_i - 1)}{\ell_n} \Big)^{k-1}, \tag{7.2.23}
\]

\[
\underline{n}_k(a, b) = \frac{d_a d_b}{\ell_n} \Big( \sum_{i \in I_{a,b,k}} \frac{d_i(d_i - 1)}{\ell_n} \Big)^{k-1}, \tag{7.2.24}
\]

where I_{a,b,k} is the subset of I in which a and b, as well as the k − 1 indices with highest degrees, have been removed. Let

\[
\nu_I = \frac{1}{\ell_n} \sum_{i \in I} d_i(d_i - 1), \qquad \gamma_I = \frac{1}{\ell_n^{3/2}} \sum_{i \in I} d_i(d_i - 1)(d_i - 2). \tag{7.2.25}
\]

The following proposition replaces the similar Proposition 6.12, which was crucial in deriving upper bounds on typical distances:

Proposition 7.7 (Variance of number of paths) For any k ≥ 1, a, b ∈ I and (d_i)_{i∈I},

\[
\mathbb{E}[N_k(a, b)] \geq \underline{n}_k(a, b), \tag{7.2.26}
\]

while, assuming that ν_I > 1,

\[
\mathrm{Var}(N_k(a, b)) \leq \bar{n}_k(a, b) + \bar{n}_k(a, b)^2 \Big( \frac{\gamma_I \nu_I^2}{\nu_I - 1} \Big( \frac{1}{d_a} + \frac{1}{d_b} \Big) + \frac{\gamma_I^2 \nu_I}{d_a d_b (\nu_I - 1)^2} + e'_k \Big), \tag{7.2.27}
\]

where

\[
e'_k = \Big( \prod_{i=1}^{k} \frac{\ell_n - 2i + 1}{\ell_n - 2i - 2k + 1} - 1 \Big) \tag{7.2.28}
\]
\[
\qquad + k \frac{\ell_n^{2k} (\ell_n - 4k - 1)!!}{(\ell_n - 1)!!} \Big( 1 + \frac{\gamma_I}{d_a \nu_I} \Big) \Big( 1 + \frac{\gamma_I}{d_b \nu_I} \Big) \frac{\nu_I}{\nu_I - 1} \big( \mathrm{e}^{2k^3 \gamma_I^2 / \nu_I^3} - 1 \big).
\]

Proof The proof of (7.2.26) follows immediately from (7.2.9), together with the fact that 1/(ℓ_n − 2i + 1) ≥ 1/ℓ_n.

For the proof of (7.2.27), we follow the proof of (6.4.7), and discuss the differences only. We recall that

\[
N_k(a, b) = \sum_{\pi \in \mathcal{P}_k(a, b)} \mathbb{1}_{\{\pi \subseteq \mathrm{CM}_n(d)\}} \tag{7.2.29}
\]

is the number of paths π of length k between the vertices a and b, where a path is defined in (7.2.3). Since N_k(a, b) is a sum of indicators, its variance can be written as

\[
\mathrm{Var}(N_k(a, b)) = \sum_{\pi, \rho \in \mathcal{P}_k(a, b)} \big[ \mathbb{P}(\pi, \rho \subseteq \mathrm{CM}_n(d)) - \mathbb{P}(\pi \subseteq \mathrm{CM}_n(d)) \, \mathbb{P}(\rho \subseteq \mathrm{CM}_n(d)) \big]. \tag{7.2.30}
\]

Equation (7.2.30) replaces (6.4.12) for NRn(w). We say that two paths π and ρ are disjoint when they use distinct sets of half-edges. Thus, it is possible that the vertex sets {π_1, . . . , π_{k−1}} and {ρ_1, . . . , ρ_{k−1}} have a non-empty intersection, but then the half-edges leading in and out of the joint vertices for π and ρ must be distinct. For NRn(w), pairs of paths using different edges are independent, so that these pairs do not contribute to Var(N_k(a, b)). For CMn(d), instead, for disjoint pairs π and ρ,

\[
\mathbb{P}(\pi, \rho \subseteq \mathrm{CM}_n(d)) = \prod_{i=1}^{k} \frac{\ell_n - 2i + 1}{\ell_n - 2i - 2k + 1} \, \mathbb{P}(\pi \subseteq \mathrm{CM}_n(d)) \, \mathbb{P}(\rho \subseteq \mathrm{CM}_n(d)), \tag{7.2.31}
\]

which explains the first contribution to e'_k. For the other contributions, we follow the proof of (6.4.12) for NRn(w), and omit further details.

With Proposition 7.7 at hand, we can adapt the proof of Theorem 6.14 to CMn(d) to prove the following theorem:

Theorem 7.8 (Logarithmic upper bound graph distances CMn(d)) Assume that Conditions 1.5(a)-(c) hold, where ν = E[D(D − 1)]/E[D] ∈ (1, ∞). Then, for any ε > 0,

\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) \leq (1 + \varepsilon) \log_{\nu} n \mid \mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) < \infty\big) = 1 + o(1). \tag{7.2.32}
\]

We leave the proof of Theorem 7.8 as Exercise 7.3.

7.2.3 A log log upper bound on the diameter of the core for τ ∈ (2, 3)

In order to prove the upper bound on the typical distance for CMn(d) in Theorem 7.2, we use a different approach compared to the one in the proof of Theorem 6.9. Our proof of the upper bound on the typical distance for CMn(d) in Theorem 7.2 is organized as follows:

(a) We first prove an upper bound on the diameter of the core of CMn(d), which consists of all vertices of degree at least (log n)^σ for some σ > 0. This is the content of Theorem 7.9 below.

(b) After the proof of Theorem 7.9, we use a second moment method to prove that any vertex whose exploration survives to a sufficiently large distance is whp quickly connected to the core.

Together, these two steps prove the upper bound on the typical distance for CMn(d) in Theorem 7.2. The bound on the diameter of the core is also useful in studying the diameter of CMn(d) when τ ∈ (2, 3) and dmin ≥ 3, which we discuss in Section 7.4 below.

We take σ > 1/(3 − τ) and define the core Coren of the configuration model to be

\[
\mathrm{Core}_n = \{i \colon d_i \geq (\log n)^{\sigma}\}, \tag{7.2.33}
\]

i.e., the set of vertices with degree at least (log n)^σ. Then, the diameter of the core is bounded in the following theorem, which is interesting in its own right:

Theorem 7.9 (Diameter of the core) Fix τ ∈ (2, 3) and assume that (7.1.2) holds. For any σ > 1/(3 − τ), the diameter of Coren is with high probability bounded above by

\[
\frac{2 \log\log n}{|\log(\tau - 2)|} + 1. \tag{7.2.34}
\]

Proof We note that (7.1.2) implies that, for some α ∈ (1/2, 1/(τ − 1)),

\[
\max_{i \in [n]} d_i \geq u_1, \qquad \text{where } u_1 = n^{\alpha}. \tag{7.2.35}
\]

Define

\[
\Gamma_1 = \{i \colon d_i \geq u_1\}, \tag{7.2.36}
\]

so that Γ_1 ≠ ∅. For some constant C > 0 to be determined later on, and for k ≥ 2, we recursively define

\[
u_k = C \log n \, (u_{k-1})^{\tau - 2}. \tag{7.2.37}
\]


Then, we define

\[
\Gamma_k = \{i \colon d_i \geq u_k\}. \tag{7.2.38}
\]

We identify u_k in the following lemma:

Lemma 7.10 (Identification of (u_k)_{k≥1}) For every k ≥ 1,

\[
u_k = C^{a_k} (\log n)^{b_k} n^{c_k}, \tag{7.2.39}
\]

where

\[
c_k = \alpha (\tau - 2)^{k-1}, \qquad a_k = b_k = \frac{1}{3 - \tau} \big[ 1 - (\tau - 2)^{k-1} \big]. \tag{7.2.40}
\]

Proof We note that c_k, b_k, a_k satisfy the recursions, for k ≥ 2,

\[
c_k = (\tau - 2) c_{k-1}, \qquad b_k = 1 + (\tau - 2) b_{k-1}, \qquad a_k = 1 + (\tau - 2) a_{k-1}, \tag{7.2.41}
\]

with initial conditions c_1 = α, a_1 = b_1 = 0. Solving the recursions yields our claim.
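The closed form in Lemma 7.10 can be confirmed numerically against the recursion (7.2.37) (a sketch with illustrative parameters τ = 2.5, α = 0.6, C = 3):

```python
import math

tau, alpha, C, n = 2.5, 0.6, 3.0, 10**8
logn = math.log(n)

# Recursion (7.2.37): u_1 = n^alpha, u_k = C log n * u_{k-1}^(tau - 2)
u = [n ** alpha]
for _ in range(9):
    u.append(C * logn * u[-1] ** (tau - 2))

# Closed form (7.2.39)-(7.2.40), with a_k = b_k
def u_closed(k):
    c_k = alpha * (tau - 2) ** (k - 1)
    a_k = (1 - (tau - 2) ** (k - 1)) / (3 - tau)
    return C ** a_k * logn ** a_k * n ** c_k

max_rel_err = max(abs(u[k - 1] - u_closed(k)) / u_closed(k) for k in range(1, 11))
print(max_rel_err)        # agreement up to floating-point error
```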

In order to study connectivity of sets in CMn(d), we rely on the following lemma, which is of independent interest:

Lemma 7.11 (Connectivity of sets in CMn(d)) For any two sets of vertices A, B ⊆ [n],

\[
\mathbb{P}(A \text{ not directly connected to } B) \leq \mathrm{e}^{-d_A d_B / (2 \ell_n)}, \tag{7.2.42}
\]

where, for any A ⊆ [n],

\[
d_A = \sum_{i \in A} d_i \tag{7.2.43}
\]

denotes the total degree of the vertices in A.

Proof There are d_A half-edges incident to the set A, which we pair one by one. After having paired k half-edges, all to half-edges that are not incident to B, the probability to pair the next half-edge to a half-edge that is not incident to B equals

\[
1 - \frac{d_B}{\ell_n - 2k + 1} \leq 1 - \frac{d_B}{\ell_n}. \tag{7.2.44}
\]

Some half-edges incident to A may attach to other half-edges incident to A, so that possibly fewer than d_A half-edges need to be paired to pair all half-edges incident to A. However, since each pairing uses up at most 2 half-edges incident to A, we need to pair at least d_A/2 half-edges, so that

\[
\mathbb{P}(A \text{ not directly connected to } B) \leq \Big( 1 - \frac{d_B}{\ell_n} \Big)^{d_A / 2} \leq \mathrm{e}^{-d_A d_B / (2 \ell_n)}, \tag{7.2.45}
\]

where we use that 1 − x ≤ e^{−x}.

The key step in the proof of Theorem 7.9 is the following proposition, showing that whp every vertex in Γ_k is connected to a vertex in Γ_{k−1}:


Proposition 7.12 (Connectivity between Γ_{k−1} and Γ_k) Fix τ ∈ (2, 3) and assume that Conditions 1.5(a)-(b) and (7.1.2) hold. Fix k ≥ 2, and take C > 2E[D]/c. Then, the probability that there exists an i ∈ Γ_k that is not directly connected to Γ_{k−1} is o(n^{−δ}), for some δ > 0 that is independent of k.

Proof We note that, by definition,

\[
\sum_{i \in \Gamma_{k-1}} d_i \geq u_{k-1} |\Gamma_{k-1}| = u_{k-1} n [1 - F_n](u_{k-1}). \tag{7.2.46}
\]

By (7.1.2), and since k ↦ u_k is decreasing with u_1 = n^α,

\[
[1 - F_n](u_{k-1}) \geq c (u_{k-1})^{1 - \tau}. \tag{7.2.47}
\]

As a result, we obtain that for every k ≥ 2,

\[
\sum_{i \in \Gamma_{k-1}} d_i \geq c n (u_{k-1})^{2 - \tau}. \tag{7.2.48}
\]

By (7.2.48) and Lemma 7.11, using Boole's inequality, the probability that there exists an i ∈ Γ_k that is not directly connected to Γ_{k−1} is bounded by

\[
n \, \mathrm{e}^{- \frac{u_k n u_{k-1} [1 - F_n(u_{k-1})]}{2 \ell_n}} \leq n \, \mathrm{e}^{- \frac{c u_k (u_{k-1})^{2-\tau}}{2 \mathbb{E}[D_n]}} = n^{1 - \frac{cC}{2 \mathbb{E}[D_n]}}. \tag{7.2.49}
\]

By Conditions 1.5(a)-(b), E[D_n] → E[D], so that, as n → ∞ and taking C > 2E[D]/c, we obtain the claim for any δ < cC/(2E[D]) − 1.

We now complete the proof of Theorem 7.9:

Proof of Theorem 7.9. Fix

\[
k^\star = \frac{\log\log n}{|\log(\tau - 2)|}. \tag{7.2.50}
\]

As a result of Proposition 7.12, whp, the diameter of Γ_{k⋆} is at most 2k⋆ + 1, because the distance between any vertex in Γ_{k⋆} and Γ_1 is at most k⋆, while, by Exercise 7.4, Γ_1 forms a complete graph. Therefore, it suffices to prove that

\[
\mathrm{Core}_n \subseteq \Gamma_{k^\star}. \tag{7.2.51}
\]

By (7.2.33) and (7.2.38), in turn, this follows when u_{k⋆} ≤ (log n)^σ, for any σ > 1/(3 − τ). According to Lemma 7.10,

\[
u_{k^\star} = C^{a_{k^\star}} (\log n)^{b_{k^\star}} n^{c_{k^\star}}. \tag{7.2.52}
\]

We note that n^{c_{k⋆}} = e^{α(τ−2)^{k⋆−1} log n}. Since, for 2 < τ < 3,

\[
x (\tau - 2)^{\log x / |\log(\tau - 2)|} = x \cdot x^{-1} = 1, \tag{7.2.53}
\]

we find with x = log n that (τ − 2)^{k⋆} log n = 1, so that n^{c_{k⋆}} = e^{α/(τ−2)} = Θ(1). Further, b_k → 1/(3 − τ) as k → ∞, so that (log n)^{b_{k⋆}} = (log n)^{1/(3−τ)+o(1)}, and a_k = b_k, so that also C^{a_{k⋆}} = C^{1/(3−τ)+o(1)}. We conclude that

\[
u_{k^\star} = (\log n)^{1/(3-\tau) + o(1)}, \tag{7.2.54}
\]


so that, by picking n sufficiently large, we can ensure that 1/(3 − τ) + o(1) ≤ σ. This completes the proof of Theorem 7.9.
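To see the scales in the proof of Theorem 7.9 concretely, the following sketch (illustrative constants) computes k⋆ from (7.2.50) and iterates (7.2.37), showing that u_{k⋆} ends up polylogarithmic in n:

```python
import math

tau, alpha, C = 2.5, 0.6, 3.0

def u_kstar(n):
    kstar = math.ceil(math.log(math.log(n)) / abs(math.log(tau - 2)))
    u = n ** alpha
    for _ in range(kstar - 1):
        u = C * math.log(n) * u ** (tau - 2)   # recursion (7.2.37)
    return kstar, u

for n in (10**6, 10**12, 10**24):
    kstar, u = u_kstar(n)
    print(n, kstar, round(u, 1), round(math.log(u) / math.log(math.log(n)), 2))
```

The last column is the effective exponent of log n in u_{k⋆}; as n grows, it decreases toward 1/(3 − τ) = 2, matching (7.2.54).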

We continue by using Theorem 7.9 to prove a log log n upper bound on the typical distance dist_{CMn(d)}(o1, o2) in the case where τ ∈ (2, 3). We start by describing the setting. We assume that there exist τ ∈ (2, 3), α > 1/2 and c1 such that, uniformly in n and for x ≤ n^α,

\[
[1 - F_n](x) \geq c_1 x^{-(\tau - 1)}. \tag{7.2.55}
\]

Theorem 7.13 (A log log upper bound on typical distance for τ ∈ (2, 3)) Suppose that the empirical distribution function Fn of the degrees d = (d_i)_{i∈[n]} satisfies Conditions 1.5(a)-(b) and (7.2.55). Then, for every ε > 0,

\[
\lim_{n \to \infty} \mathbb{P}\Big( \mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) \leq \frac{2(1 + \varepsilon) \log\log n}{|\log(\tau - 2)|} \,\Big|\, \mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) < \infty \Big) = 1. \tag{7.2.56}
\]

Proof We make crucial use of the branching process approximation in Section 7.2.1. We let o1, o2 denote two vertices chosen uniformly at random from [n], and we recall that, for i ∈ {1, 2}, Z^{(n;i)}_m denotes the number of unpaired or free half-edges incident to vertices in B_m(o_i). By Corollary 7.3, (Z^{(n;1)}_l, Z^{(n;2)}_l)_{l=0}^m can whp be perfectly coupled to (Z^{(1)}_l, Z^{(2)}_l)_{l=0}^m, which are two independent unimodular or two-stage branching processes in which the root has offspring distribution D, and individuals in all further generations have offspring distribution D⋆ − 1.

We condition on B_m(o_1), B_m(o_2) being such that Z^{(n;1)}_m ≥ 1, Z^{(n;2)}_m ≥ 1, and couple these to (Z^{(1)}_m, Z^{(2)}_m). We note that, conditionally on Z^{(1)}_m ≥ 1, Z^{(2)}_m ≥ 1, we have Z^{(1)}_m →P ∞ and Z^{(2)}_m →P ∞ as m → ∞.

We will condition on (B_m(o_1), B_m(o_2)), denote the conditional distribution by P_m, and denote the expectation and variance under the measure P_m by E_m and Var_m, respectively. We collapse the vertices in B_m(o_1) to a single vertex a_1 and those in B_m(o_2) to a single vertex a_2. The distribution of the resulting random graph is again a configuration model, with degrees d_{a_1} = Z^{(n;1)}_m, d_{a_2} = Z^{(n;2)}_m and vertex set R = ([n] ∪ {a_1, a_2}) \ (B_m(o_1) ∪ B_m(o_2)).

We apply Proposition 7.7 with k = ε log log n, a = a_i, b = Coren and with I = {i ∈ R : d_i ≤ K}. Then, Proposition 7.7 gives that, conditionally on (B_m(o_1), B_m(o_2)) such that Z^{(n;1)}_m ≥ 1, Z^{(n;2)}_m ≥ 1,

\[
\mathbb{P}_m(N_k(a_i, b) = 0) \leq \frac{\mathrm{Var}_m(N_k(a_i, b))}{\mathbb{E}_m[N_k(a_i, b)]^2} \leq O(K) \Big( \frac{1}{Z^{(n;1)}_m} + \frac{1}{Z^{(n;2)}_m} \Big) \xrightarrow{\mathbb{P}} 0, \tag{7.2.57}
\]

in the iterated limit where first n → ∞, followed by m → ∞. As a result, conditionally on (B_m(o_1), B_m(o_2)) such that Z^{(n;1)}_m ≥ 1, Z^{(n;2)}_m ≥ 1, with probability at least 1 − o(1), N_k(a_i, b) ≥ 1, so that, on this event,

\[
\mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) \leq \mathrm{diam}_{\mathrm{CM}_n(d)}(\mathrm{Core}_n) + 2k \leq \frac{2(1 + \varepsilon) \log\log n}{|\log(\tau - 2)|}. \tag{7.2.58}
\]


We further use that, by Theorem 4.4,

\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) < \infty\big) \to \zeta^2, \tag{7.2.59}
\]

while

\[
\mathbb{P}\big(Z^{(n;1)}_m \geq 1, Z^{(n;2)}_m \geq 1\big) \to \zeta^2 \tag{7.2.60}
\]

in the iterated limit where first n → ∞, followed by m → ∞. As a result, with k_n = 2(1 + ε) log log n/|log(τ − 2)|,

\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) \leq k_n \mid \mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) < \infty\big) \geq \frac{\mathbb{P}\big(\mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) \leq k_n, Z^{(n;1)}_m \geq 1, Z^{(n;2)}_m \geq 1\big)}{\mathbb{P}\big(\mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) < \infty\big)} \tag{7.2.61}
\]
\[
\qquad = \frac{\mathbb{P}\big(Z^{(n;1)}_m \geq 1, Z^{(n;2)}_m \geq 1\big) - o(1)}{\mathbb{P}\big(\mathrm{dist}_{\mathrm{CM}_n(d)}(o_1, o_2) < \infty\big)} = 1 - o(1),
\]

as required. This completes the proof of Theorem 7.13.

Exercise 7.5 explores an alternative proof of Theorem 7.13.

7.3 Branching processes with infinite mean

When τ ∈ (2, 3), the branching processes (Z^{(1)}_j)_{j≥0} and (Z^{(2)}_j)_{j≥0} are well defined, but have infinite mean in generations 2, 3, etc. This leads us to consider branching processes with infinite mean. In this section, we give a scaling result for the generation sizes of such branching processes. This result will be crucial to describe the fluctuations of the typical distances in CMn(d), and also allows us to understand how the ultrasmall distances of order log log n arise. The main result in this section is the following theorem:

Theorem 7.14 (Branching processes with infinite mean) Let (Z_n)_{n≥0} be a branching process with offspring distribution Z_1 = X having distribution function F_X. Assume that there exist α ∈ (0, 1) and a non-negative, non-increasing function x ↦ γ(x), such that

\[
x^{-\alpha - \gamma(x)} \leq 1 - F_X(x) \leq x^{-\alpha + \gamma(x)}, \qquad \text{for large } x, \tag{7.3.1}
\]

where x ↦ γ(x) satisfies

(i) x ↦ x γ(x) is non-decreasing,
(ii) ∫_0^∞ γ(e^{e^x}) dx < ∞, or, equivalently, ∫_e^∞ γ(y)/(y log y) dy < ∞.

Then α^n log(Z_n ∨ 1) →a.s. Y, with P(Y = 0) equal to the extinction probability of (Z_n)_{n≥0}.

In the analysis of the configuration model, α = τ − 2, as α corresponds to the tail exponent of the size-biased random variable D⋆ (recall Lemma 1.14). Theorem 7.14 covers the case where the branching process has an offspring distribution with very thick tails. Indeed, it is not hard to show that Theorem 7.14 implies that E[X^s] = ∞ for every s > α ∈ (0, 1) (see Exercise 7.7 below).

We do not prove Theorem 7.14 in full generality. Rather, we prove it in a simpler, yet still quite general, case, in which γ(x) = (log x)^{γ−1} for some γ ∈ [0, 1). See Exercise 7.6 to see that this case indeed satisfies the assumptions in Theorem 7.14.
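For γ(x) = (log x)^{γ−1}, condition (ii) can be seen directly: γ(e^{e^x}) = e^{(γ−1)x}, whose integral over [0, ∞) equals 1/(1 − γ). A quick numeric check (assuming the illustrative value γ = 0.4):

```python
import math

gamma = 0.4
# gamma(x) = (log x)^(gamma - 1)  =>  gamma(e^{e^x}) = e^{(gamma - 1) x}
dx, total, x = 1e-4, 0.0, 0.0
while x < 60.0:                    # tail beyond x = 60 is negligible
    total += math.exp((gamma - 1.0) * x) * dx
    x += dx
print(total, 1.0 / (1.0 - gamma))  # Riemann sum vs exact value 5/3
```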

Proof of Theorem 7.14 for γ(x) = (log x)^{γ−1}. The proof is divided into four main steps. Define

\[
M_n = \alpha^n \log(Z_n \vee 1). \tag{7.3.2}
\]

We shall first assume that P(Z_1 ≥ 1) = 1, so that the process survives almost surely. We start by splitting M_n in a suitable way.

The split

For i ≥ 1, we define

\[
Y_i = \alpha^i \log\Big( \frac{Z_i \vee 1}{(Z_{i-1} \vee 1)^{1/\alpha}} \Big). \tag{7.3.3}
\]

We can write

\[
M_n = Y_1 + Y_2 + \cdots + Y_n. \tag{7.3.4}
\]

From this split, it is clear that almost sure convergence of M_n follows when the sum Σ_{i=1}^∞ Y_i converges, which, in turn, is the case when

\[
\sum_{i=1}^{\infty} \mathbb{E}\big[ |Y_i| \big] < \infty. \tag{7.3.5}
\]

This is what we prove in the following three steps.
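The telescoping identity (7.3.4) is purely algebraic and can be sanity-checked on any positive sequence with Z_0 = 1 (a fast-growing toy sequence here, not an actual branching process):

```python
import math
import random

rng = random.Random(3)
alpha = 0.5

Z = [1]                                  # Z_0 = 1, then Z_i = Z_{i-1}^2 * r_i
for _ in range(8):
    Z.append(Z[-1] ** 2 * rng.randint(1, 5))

n = len(Z) - 1
M_n = alpha ** n * math.log(Z[n])
Y = [alpha ** i * math.log(Z[i] / Z[i - 1] ** (1 / alpha))
     for i in range(1, n + 1)]
print(M_n, sum(Y))                       # equal by telescoping
```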

Inserting normalization sequences

We next investigate E[|Y_i|]. We prove by induction on i that there exist constants κ < 1 and K > 0 such that

\[
\mathbb{E}\big[ |Y_i| \big] \leq K \kappa^i. \tag{7.3.6}
\]

For i = 1, this follows from the fact that, when (7.3.1) holds, the random variable Y_1 = α log(Z_1 ∨ 1) has a bounded absolute expectation. This initializes the induction hypothesis. We next turn to the advancement of the induction hypothesis. For this, we recall the definition of u_n in [Volume 1, (2.6.7)], which states that

\[
u_n = \inf\{x \colon 1 - F_X(x) \leq 1/n\}. \tag{7.3.7}
\]

The interpretation of u_n is that it indicates the order of magnitude of max_{i=1}^n X_i, where (X_i)_{i=1}^n are i.i.d. random variables with distribution function F_X.

Then we define

\[
U_i = \alpha^i \log\Big( \frac{u_{Z_{i-1} \vee 1}}{(Z_{i-1} \vee 1)^{1/\alpha}} \Big), \qquad V_i = \alpha^i \log\Big( \frac{Z_i \vee 1}{u_{Z_{i-1} \vee 1}} \Big). \tag{7.3.8}
\]


Then, Y_i = U_i + V_i, so that

\[
\mathbb{E}\big[ |Y_i| \big] \leq \mathbb{E}\big[ |U_i| \big] + \mathbb{E}\big[ |V_i| \big]. \tag{7.3.9}
\]

We bound each of these terms separately.

Bounding the normalizing constants

In this step, we analyse the normalizing constants n ↦ u_n, assuming (7.3.1), and use this, as well as the induction hypothesis, to bound E[|U_i|]. When (7.3.1) holds and since lim_{x→∞} γ(x) = 0, for every ε > 0 there exists a constant C_ε ≥ 1 such that, for all n ≥ 1,

\[
u_n \leq C_\varepsilon n^{1/\alpha + \varepsilon}. \tag{7.3.10}
\]

This gives a first bound on n ↦ u_n. We next substitute this bound into (7.3.1) and use that x ↦ x γ(x) is non-decreasing, together with γ(x) = (log x)^{γ−1}, to obtain that

\[
1 + o(1) = n[1 - F_X(u_n)] \leq n u_n^{-\alpha + \gamma(u_n)} \leq n u_n^{-\alpha} \mathrm{e}^{(\log(C_\varepsilon n^{1/\alpha + \varepsilon}))^{\gamma}}, \tag{7.3.11}
\]

which, in turn, implies that there exists a constant c > 0 such that

\[
u_n \leq n^{1/\alpha} \mathrm{e}^{c (\log n)^{\gamma}}. \tag{7.3.12}
\]

In a similar way, we can show the matching lower bound u_n ≥ n^{1/α} e^{−c(log n)^γ}. As a result,

\[
\mathbb{E}\big[ |U_i| \big] \leq c \alpha^i \mathbb{E}\big[ (\log(Z_{i-1} \vee 1))^{\gamma} \big]. \tag{7.3.13}
\]

Using the concavity of x ↦ x^γ for γ ∈ [0, 1), as well as Jensen's inequality, we arrive at

\[
\mathbb{E}\big[ |U_i| \big] \leq c \alpha^i \big( \mathbb{E}[\log(Z_{i-1} \vee 1)] \big)^{\gamma} = c \alpha^i \big( \alpha^{-(i-1)} \mathbb{E}[M_{i-1}] \big)^{\gamma} \leq c \alpha^{i(1-\gamma)} \mathbb{E}[M_{i-1}]^{\gamma}. \tag{7.3.14}
\]

By (7.3.4) and (7.3.6), which imply that E[M_{i−1}] ≤ Kκ/(1 − κ), we arrive at

\[
\mathbb{E}\big[ |U_i| \big] \leq c \alpha^{i(1-\gamma)} \Big( \frac{K \kappa}{1 - \kappa} \Big)^{\gamma}. \tag{7.3.15}
\]

Logarithmic moment of an asymptotically stable random variable

In this step, we bound E[|V_i|]. We note that, by [Volume 1, Theorem 2.33], for Z_i quite large, the random variable (Z_i ∨ 1)/u_{Z_{i−1}∨1} should be close to a stable random variable. We make use of this fact by bounding

\[
\mathbb{E}\big[ |V_i| \big] \leq \alpha^i \sup_{m \geq 1} \mathbb{E}\big[ \big| \log(S_m / u_m) \big| \big], \tag{7.3.16}
\]

where S_m = X_1 + · · · + X_m, and (X_i)_{i=1}^m are i.i.d. copies of the offspring distribution X. We shall prove that there exists a constant C > 0 such that, for all m ≥ 1,

\[
\mathbb{E}\big[ \big| \log(S_m / u_m) \big| \big] \leq C. \tag{7.3.17}
\]

In order to prove (7.3.17), we note that it suffices to bound

\[
\mathbb{E}\big[ \big( \log(S_m / u_m) \big)_+ \big] \leq C_+, \qquad \mathbb{E}\big[ \big( \log(S_m / u_m) \big)_- \big] \leq C_-, \tag{7.3.18}
\]


where, for x ∈ ℝ, x_+ = max{x, 0} and x_− = max{−x, 0}. Since |x| = x_+ + x_−, we then obtain (7.3.17) with C = C_+ + C_−. We start by investigating E[(log(S_m/u_m))_−]. We note that (log x)_− = log(x^{−1} ∨ 1), so that

\[
\mathbb{E}\big[ \big( \log(S_m / u_m) \big)_- \big] = \mathbb{E}\big[ \log\big( u_m / (S_m \wedge u_m) \big) \big], \tag{7.3.19}
\]

where x ∧ y = min{x, y}. The function x ↦ log(u_m/(x ∧ u_m)) is non-increasing, and, since S_m ≥ X_{(m)}, where X_{(m)} = max_{1≤i≤m} X_i, we arrive at

\[
\mathbb{E}\big[ \log\big( u_m / (S_m \wedge u_m) \big) \big] \leq \mathbb{E}\big[ \log\big( u_m / (X_{(m)} \wedge u_m) \big) \big]. \tag{7.3.20}
\]

We next use that, for x ≥ 1, x ↦ log x is concave, so that, for every s,

\[
\mathbb{E}\big[ \log\big( u_m / (X_{(m)} \wedge u_m) \big) \big] = \frac{1}{s} \mathbb{E}\big[ \log\big( (u_m / (X_{(m)} \wedge u_m))^s \big) \big] \leq \frac{1}{s} \log\Big( \mathbb{E}\big[ (u_m / (X_{(m)} \wedge u_m))^s \big] \Big) \leq \frac{1}{s} + \frac{1}{s} \log\big( u_m^s \mathbb{E}[X_{(m)}^{-s}] \big), \tag{7.3.21}
\]

where, in the last step, we made use of the fact that u_m/(x ∧ u_m) ≤ 1 + u_m/x. Now rewrite X_{(m)}^{−s} = (−Y_{(m)})^s, where Y_j = −X_j^{−1} and Y_{(m)} = max_{1≤j≤m} Y_j. Clearly, Y_j ∈ [−1, 0] since X_j ≥ 1, so that E[(−Y_1)^s] < ∞. Also, u_m(−Y_{(m)}) = u_m/X_{(m)} converges in distribution to E^{1/α}, where E is exponential with mean 1, so it follows from (Pickands III, 1968, Theorem 2.1) that, as m → ∞,

\[
u_m^s \mathbb{E}\big[ X_{(m)}^{-s} \big] = \mathbb{E}\big[ (u_m / X_{(m)})^s \big] \to \mathbb{E}[E^{s/\alpha}] < \infty. \tag{7.3.22}
\]

We proceed with E[(log(S_m/u_m))_+], for which the proof is a slight adaptation of the above argument. Now we make use of the fact that (log x)_+ = log(x ∨ 1) ≤ 1 + x for x ≥ 0, so that we must bound

\[
\mathbb{E}\big[ \log\big( (S_m \vee u_m) / u_m \big) \big] = \frac{1}{s} \mathbb{E}\big[ \log\big( ((S_m \vee u_m) / u_m)^s \big) \big] \leq \frac{1}{s} + \log\Big( \mathbb{E}\big[ (S_m / u_m)^s \big] \Big). \tag{7.3.23}
\]

The discussion on (Hall, 1981, Page 565 and Corollary 1) yields, for s < α, E[S_m^s] = E[|S_m|^s] ≤ 2^{s/2} λ_s(m), for some function λ_s(m) depending on s, m and F_X. Using the discussion on (Hall, 1981, Page 564), we have that λ_s(m) ≤ C_s^s m^{s/α} l(m^{1/α})^s, where l( · ) is a slowly-varying function. With some more effort, it can be shown that we can replace l(m^{1/α}) by ℓ(m), where ℓ is such that u_m = m^{1/α} ℓ(m), which gives

\[
\mathbb{E}\big[ \log\big( (S_m \vee u_m) / u_m \big) \big] \leq \frac{1}{s} + \log \mathbb{E}\Big[ \Big( \frac{S_m}{u_m} \Big)^s \Big] \leq \frac{1}{s} + 2^{s/2} C_s^s m^{s/\alpha} \ell(m)^s u_m^{-s} = \frac{1}{s} + 2^{s/2} C_s^s, \tag{7.3.24}
\]

which proves the first bound in (7.3.18) with C_+ = 1/s + 2^{s/2} C_s^s.

Completion of the proof of Theorem 7.14 when X ≥ 1

Combining (7.3.9) with (7.3.15) and (7.3.16)–(7.3.17), we arrive at

\[
\mathbb{E}\big[ |Y_i| \big] \leq c \alpha^{i(1-\gamma)} \Big( \frac{K \kappa}{1 - \kappa} \Big)^{\gamma} + C \alpha^i \leq K \kappa^i, \tag{7.3.25}
\]

when we take κ = α^{1−γ} and we take K sufficiently large, for example K ≥ 2C and K ≥ 2c(Kκ/(1 − κ))^γ. This completes the proof when the offspring distribution X satisfies X ≥ 1.

Completion of the proof of Theorem 7.14

We finally extend the result to the setting where X = 0 with positive probability. Since E[X] = ∞, the survival probability ζ = P(Z_n ≥ 1 for all n ≥ 0) satisfies ζ > 0. Conditionally on extinction, clearly Z_n →a.s. 0, so that α^n log(Z_n ∨ 1) →a.s. 0 on the extinction event; thus Y = 0 conditionally on extinction. It remains to prove that α^n log(Z_n ∨ 1) converges almost surely on the survival event. By [Volume 1, Theorem 3.12], we have that, conditionally on survival,

\[
\frac{Z^{(\infty)}_n}{Z_n} \xrightarrow{a.s.} \xi > 0, \tag{7.3.26}
\]

where we recall that Z^{(∞)}_n is the number of individuals in the nth generation that have an infinite line of descent. By [Volume 1, Theorem 3.11], and conditionally on survival, (Z^{(∞)}_n)_{n≥0} is again a branching process, now with offspring distribution p^{(∞)} given in [Volume 1, (3.4.2)]. Note that, in particular, P(Z^{(∞)}_1 ≥ 1) = 1, and we wish to apply Theorem 7.14 to Z^{(∞)}_n instead of Z_n. It is not hard to show that p^{(∞)} in [Volume 1, (3.4.2)] also satisfies the conditions in Theorem 7.14, with the function x ↦ γ⋆(x) given by γ⋆(x) = γ(x) + c/log x. Thus, conditionally on survival,

\[
\alpha^n \log(Z^{(\infty)}_n \vee 1) \xrightarrow{a.s.} Y^{(\infty)}, \tag{7.3.27}
\]

and combining (7.3.26) and (7.3.27), it immediately follows that, conditionally on survival,

\[
\alpha^n \log(Z_n \vee 1) \xrightarrow{a.s.} Y^{(\infty)}. \tag{7.3.28}
\]

We conclude that Theorem 7.14 holds, where Y = 0 with probability η = 1 − ζ and Y = Y^{(∞)} with probability ζ.

We finally state some properties of the a.s. limit Y of (α^n log(Z_n ∨ 1))_{n≥0}, of which we omit a proof:

Theorem 7.15 (Limiting variable for infinite-mean branching processes) Under the conditions of Theorem 7.14,

lim_{x→∞} log P(Y > x)/x = −1, (7.3.29)

where Y is the a.s. limit of α^n log(Z_n ∨ 1).

Theorem 7.15 can be understood from the fact that, by (7.3.2)–(7.3.3),

Y = ∑_{i=1}^{∞} Y_i, (7.3.30)


where

Y_1 = α log(Z_1 ∨ 1). (7.3.31)

By (7.3.1),

P(Y_1 > x) = P(Z_1 > e^{x/α}) = e^{−x(1+o(1))}, (7.3.32)

which shows that Y_1 satisfies (7.3.29). The equality in (7.3.30), together with (7.3.3), suggests that the tails of Y are equal to those of Y_1, which heuristically explains (7.3.29).
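The tail behavior (7.3.32) is easy to probe numerically. The sketch below is ours, not from the book: it assumes the concrete offspring law Z_1 = ⌈U^{−1/α}⌉ with U uniform on (0, 1], for which P(Z_1 > x) ≈ x^{−α}, samples Y_1 = α log(Z_1 ∨ 1), and estimates log P(Y_1 > x)/x, which should be close to −1:

```python
import random
import math

random.seed(1)
alpha = 0.5
# assumed offspring law: Z1 = ceil(U^{-1/alpha}), so that P(Z1 > x) ~ x^{-alpha}
ys = []
for _ in range(200_000):
    u = 1.0 - random.random()          # uniform on (0, 1]
    z1 = math.ceil(u ** (-1.0 / alpha))
    ys.append(alpha * math.log(z1))    # Y1 = alpha * log(Z1 v 1)

def log_tail(x):
    # empirical estimate of log P(Y1 > x) / x, which should approach -1
    p = sum(1 for y in ys if y > x) / len(ys)
    return math.log(p) / x

print(log_tail(4.0), log_tail(6.0))    # both should be close to -1
```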

Intuition behind ultrasmall distances using Theorem 7.14

We now use the results in Theorem 7.14 to explain how we can understand the ultrasmall distances in the configuration model. First, note that

P(dist_{CM_n(d)}(o_1, o_2) ≤ k) = (1/n) E[|B_k(o_1)|]. (7.3.33)

This suggests that dist_{CM_n(d)}(o_1, o_2) should be closely related to the value of k (if any exists) such that E[|B_k(o_1)|] = Θ(n). Using the branching-process approximation

|B_k(o_1)| = Z^{(n;1)}_k ≈ Z^{(1)}_k ≈ e^{(τ−2)^{−k} Y_1 (1+o_P(1))} (7.3.34)

(the first approximation being true for smallish k, but not necessarily for large k), this suggests that dist_{CM_n(d)}(o_1, o_2) ≈ k_n, where, by Theorem 7.14,

Θ(n) = Z^{(1)}_{k_n} = e^{(τ−2)^{−k_n} Y_1 (1+o_P(1))}, (7.3.35)

which suggests that dist_{CM_n(d)}(o_1, o_2) ≈ log log n/|log(τ − 2)|. Of course, for such values of k, the branching-process approximation may fail miserably, and in fact it does. This is exemplified by the fact that e^{(τ−2)^{−k_n} Y(1+o_P(1))} can become much larger than n, which is clearly impossible for |B_{k_n}(o_1)|.

More intriguingly, comparing with Theorem 7.2, we see that the proposed typical distances are a factor 2 too small. The reason is that the double-exponential growth can clearly no longer be valid when Z^{(n;1)}_k becomes too large, so that Z^{(n;1)}_k must be far away from Z^{(1)}_k in this regime. The whole problem is that we are using the branching-process approximation well beyond its expiration date.

So, let us try this again, but rather than using the branching-process approximation for one neighborhood, let us use it from two sides. Now we rely on the statement that

P(dist_{CM_n(d)}(o_1, o_2) ≤ 2k) = P(B_k(o_1) ∩ B_k(o_2) ≠ ∅). (7.3.36)

Again using (7.3.35), we see that |B_k(o_1)| ≈ e^{(τ−2)^{−k} Y_1(1+o_P(1))} and |B_k(o_2)| ≈ e^{(τ−2)^{−k} Y_2(1+o_P(1))}, where Y_1 and Y_2 are independent. These grow roughly at the same pace, and in particular |B_k(o_1)| and |B_k(o_2)| reach n^{Θ(1)} roughly at the same time, namely, when k ≈ log log n/|log(τ − 2)|. Thus, we conclude that dist_{CM_n(d)}(o_1, o_2) ≈ 2 log log n/|log(τ − 2)|, as rigorously proved in Theorem 7.2. We will see in Theorem 7.21 below that this growth from two sides also allows for more detailed branching-process approximations.
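This two-sided heuristic is visible in simulations. The sketch below is ours (the degree law d_i = max(2, ⌊U^{−1/(τ−1)}⌋), the value τ = 5/2, and all parameters are illustrative assumptions, not prescribed by the book): it builds CM_n(d) by uniform stub-matching and compares the average distance between random pairs with 2 log log n/|log(τ − 2)|:

```python
import random
import math
from collections import deque, defaultdict

random.seed(7)
n, tau = 100_000, 2.5
# i.i.d. degrees with tail exponent tau - 1, forced to be at least 2
deg = [max(2, int((1.0 - random.random()) ** (-1.0 / (tau - 1)))) for _ in range(n)]
# configuration model: list every half-edge and pair them uniformly at random
stubs = [v for v in range(n) for _ in range(deg[v])]
if len(stubs) % 2:
    stubs.pop()
random.shuffle(stubs)
adj = defaultdict(list)
for a, b in zip(stubs[::2], stubs[1::2]):
    adj[a].append(b)
    adj[b].append(a)

def dist(s, t):
    # breadth-first search from s until t is found
    seen, queue = {s: 0}, deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            return seen[v]
        for w in adj[v]:
            if w not in seen:
                seen[w] = seen[v] + 1
                queue.append(w)
    return math.inf

ds = [dist(random.randrange(n), random.randrange(n)) for _ in range(30)]
finite = [d for d in ds if d < math.inf]
avg = sum(finite) / len(finite)
print(avg, 2 * math.log(math.log(n)) / abs(math.log(tau - 2)))
```

The two printed numbers should be of comparable size; exact agreement is not expected at this n, since the convergence is in log log n.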

7.4 The diameter of the configuration model

We continue the discussion of distances in the configuration model by investigating the diameter of the model.

7.4.1 The diameter of the configuration model: logarithmic case

Before stating the main result, we introduce some notation. Recall that G*_D(x) is the probability generating function of p* = (p*_k)_{k≥0} defined in (4.1.1) (recall also (4.2.7)). We recall that ξ is the extinction probability of the branching process with offspring distribution p defined in (4.2.3), and we further define

μ = G*_D′(ξ) = ∑_{k=1}^{∞} k ξ^{k−1} p*_k. (7.4.1)

When ξ < 1, we also have that μ ≤ 1. Then, the main result is as follows:

Theorem 7.16 (Diameter of the configuration model) Let Conditions 1.5(a)–(b) hold. Assume that n_1 = 0 when p_1 = 0, and that n_2 = 0 when p_2 = 0. Then,

diam(CM_n(d))/log n →P 1_{ν>1}/log ν + 2·1_{p_1>0}/|log μ| + 1_{p_1=0, p_2>0}/|log p*_2|. (7.4.2)

We note that, by Theorems 7.1 and 7.16, the diameter of the configuration model is strictly larger than the typical graph distance, except when p_1 = p_2 = 0. In the latter case, the degrees are at least three, so that thin lines are not possible, and the configuration model is whp connected (recall Theorem 4.15). We also remark that Theorem 7.16 applies not only to the finite-variance case, but also to the finite-mean, infinite-variance case. In the latter case, the diameter is of order log n unless p_1 = p_2 = 0, in which case Theorem 7.16 implies that the diameter is o_P(log n). By [Volume 1, Corollary 7.17], Theorem 7.16 also applies to uniform random graphs with a given degree sequence. This will be used in the examples below:

Random regular graphs

Let r ≥ 3 be the degree of the random regular graph. By [Volume 1, Corollary 7.17], the diameter of a random r-regular graph has with high probability the same asymptotics as the diameter of CM_n(d), where d_i = r for every i ∈ [n]. Thus, p_r = 1 and p_i = 0 for every i ≠ r. We assume that nr is even, so that the degree sequence is feasible. It is not hard to see that all assumptions of Theorem 7.16 are satisfied. Moreover, ν = r − 1. When r ≥ 3, we thus obtain that

diam(CM_n(d))/log n →P 1/log(r − 1). (7.4.3)

When r = 2, on the other hand, the graph is critical, so that there is no giant component. Since ν = 1, we have that μ = ν = 1, so that diam(CM_n(d)) is of larger order than log n. This is quite reasonable, since the graph consists of a collection of cycles, and the diameter of such a graph equals half the length of its longest cycle. Exercises 7.10–7.11 explore the length of the longest cycle in a random 2-regular graph, and the consequence of having a long cycle for the diameter.

Erdős–Rényi random graph

We next study the diameter of ER_n(λ/n) for λ > 1. By [Volume 1, Theorem 5.12], Conditions 1.5(a)–(b) hold with p_k = e^{−λ} λ^k/k!. Also, μ = μ_λ, the dual parameter in [Volume 1, (3.6.6)] (see Exercise 7.12).

We again make essential use of [Volume 1, Theorem 7.18], which relates the configuration model and the generalized random graph. We note that ER_n(λ/n) is the same as GRG_n(w), where (recall [Volume 1, Exercise 6.1])

w_i = nλ/(n − λ). (7.4.4)

Clearly, w = (nλ/(n − λ))_{i∈[n]} satisfies Conditions 1.1(a)–(c), so that the degree sequence of ER_n(λ/n) also satisfies Conditions 1.5(a)–(c), where the convergence holds in probability (recall [Volume 1, Theorem 5.12]). From the above identifications, using [Volume 1, Theorem 7.18], we find that

diam(ER_n(λ/n))/log n →P 1/log λ + 2/|log μ_λ|. (7.4.5)

This identifies the diameter of the Erdős–Rényi random graph. In particular, for this case, Theorem 7.16 agrees with Theorem 6.21.
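For concreteness (the numbers below are ours, not from the book), take λ = 2 in (7.4.5). The dual parameter μ_λ ∈ (0, 1) solves μ e^{−μ} = λ e^{−λ}, so that

```latex
\mu_2 e^{-\mu_2} = 2e^{-2} \approx 0.2707
\;\Rightarrow\; \mu_2 \approx 0.4064,
\qquad
\frac{\operatorname{diam}(\mathrm{ER}_n(2/n))}{\log n}
\;\xrightarrow{\mathbb{P}}\;
\frac{1}{\log 2} + \frac{2}{|\log \mu_2|}
\approx 1.44 + 2.22 = 3.66.
```

Thus, for λ = 2 the diameter is roughly 3.7 log n, while the typical distance is only about 1.44 log n.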

A sketch of the proof of Theorem 7.16

We recognize the term log_ν n as corresponding to the typical distances in the graph (recall Theorem 7.1). Except when p_1 = p_2 = 0, there is a correction to this term, which is due to long, yet very thin, neighborhoods.

When p_1 > 0 yet ν > 1, the outside of the giant component is a subcritical graph, which locally looks like a subcritical branching process due to the duality principle. In the subcritical case, there exist trees of size up to Θ(log n), and the maximal diameter of such trees is close to log n/|log μ| for some μ < 1. In the supercritical case, instead, in the complement of the giant, there exist trees of size up to Θ(log n), and the maximal diameter of such trees is close to log n/|log μ|, where μ is the dual parameter. Of course, these will not be the components of maximal diameter in the supercritical case.

In the supercritical case, we can view the giant as consisting of the 2-core and all the trees that hang off it. The 2-core is the maximal subgraph of the giant for which every vertex has degree at least 2. It turns out that the trees that hang off the 2-core have a law very similar to that of the subcritical trees outside of the giant. Therefore, the maximal diameter of such trees is again of order log n/|log μ|. Asymptotically, the diameter is realized by pairs of vertices for which the diameters of the trees that they are in, when cutting away the 2-core, are the largest, thus giving rise to the log n/|log μ| contribution, whereas the distance between the two vertices closest to them in the 2-core is close to log_ν n. This can be understood by realizing that, for most vertices, these trees have bounded height, so that the typical distances in the 2-core and in the giant are closely related.

The above intuition does not give the right answer when p_1 = 0 yet p_2 > 0. Assume that n_1 = 0. Then the giant and the 2-core are close to each other, so that the above argument does not apply. Instead, it turns out that the diameter is realized by the diameter of the 2-core, which is close to log n[1/log ν + 1/|log p*_2|]. Indeed, the 2-core contains long paths of degree-2 vertices, and the length of the longest such path is close to log n/|log p*_2|. Therefore, the longest distance between a vertex inside this long path and the ends of the path is close to log n/[2|log p*_2|]. Now it turns out that pairs of such vertices realize the asymptotic diameter, which explains why the diameter is then close to log n[1/log ν + 1/|log p*_2|].

Finally, we discuss what happens when p_1 = p_2 = 0. In this case, the assumption in Theorem 7.16 implies that n_1 = n_2 = 0, so that d_min ≥ 3. Then, CM_n(d) is whp connected, and the 2-core is the graph itself. Also, there cannot be any long thin paths, since every vertex has degree at least 3, so that local neighborhoods grow exponentially with overwhelming probability. Therefore, the graph distances and the diameter have the same asymptotics, as stated in Theorem 7.16.

The above case distinctions explain the intuition behind Theorem 7.16. This intuition is far from a proof. The proof by Fernholz and Ramachandran (2007) involves a precise analysis of the trees that are attached to the 2-core and of their maximal diameter, as well as a proof that the pairs of vertices identified above really do create shortest paths that are close to the diameter. We will not discuss this further.

7.4.2 Diameter of the configuration model for τ ∈ (2, 3): log log case

We next use Theorem 7.9 to study the diameter of CM_n(d) when τ ∈ (2, 3). Note that, by Theorem 7.16, the diameter equals a positive constant times log n when p_1 + p_2 > 0. Therefore, we turn to the case where p_1 = p_2 = 0. When d_min ≥ 3, we know by Theorem 4.15 that CM_n(d) is whp connected. The main result is as follows:

Theorem 7.17 (Diameter of CM_n(d) for τ ∈ (2, 3)) Suppose that the empirical distribution function F_n of the degrees d = (d_i)_{i∈[n]} satisfies Conditions 1.5(a)–(b) and that (7.1.2) holds. Assume further that d_min = min_{i∈[n]} d_i ≥ 3 and p_{d_min} = P(D = d_min) > 0. Then,

diam(CM_n(d))/log log n →P 2/|log(τ − 2)| + 2/log(d_min − 1). (7.4.6)

When comparing Theorem 7.17 to Theorem 7.2, we see that, for d_min ≥ 3, the diameter is of the same order log log n as the typical distance, but that the constant differs. As we have also seen in the sketch of the proof of Theorem 7.16, the diameter is due to pairs of vertices that have small or thin local neighborhoods. When p_1 + p_2 > 0, these thin parts are close to paths, and they can have length that is logarithmic in n. When d_min ≥ 3, however, even the thinnest possible neighborhoods are bounded below by a binary tree. Indeed, by assumption, there is a positive proportion of vertices of degree d_min. As a result, we will see that the expected number of vertices whose neighborhood up to distance (1 − ε) log log n/log(d_min − 1) contains only vertices of degree d_min tends to infinity. The minimal path between two such vertices then consists of three parts: the two paths that the vertices follow to leave their minimally connected neighborhoods, and the path between the boundaries of these minimally connected neighborhoods. These boundaries are at the typical distance 2 log log n/|log(τ − 2)|, as in Theorem 7.2. This explains the extra term 2 log log n/log(d_min − 1) in Theorem 7.17.
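As a worked instance of (7.4.6) (the numbers are ours): for τ = 5/2 and d_min = 3, both |log(τ − 2)| and log(d_min − 1) equal log 2, so the diameter is asymptotically exactly twice the typical distance of Theorem 7.2:

```latex
\frac{\operatorname{diam}(\mathrm{CM}_n(\boldsymbol d))}{\log\log n}
\;\xrightarrow{\mathbb{P}}\;
\frac{2}{|\log(1/2)|}+\frac{2}{\log 2}
=\frac{4}{\log 2}\approx 5.77,
\qquad
\frac{\operatorname{dist}_{\mathrm{CM}_n(\boldsymbol d)}(o_1,o_2)}{\log\log n}
\;\xrightarrow{\mathbb{P}}\;
\frac{2}{\log 2}\approx 2.89.
```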

We will not give the entire proof of Theorem 7.17, but rather a detailed sketch of it. We sketch the upper and lower bounds on the diameter, starting with the lower bound, which is the easier part:

Lower bound on the diameter

We call a vertex v minimally-k-connected when all i ∈ B_k(v) satisfy d_i = d_min, so that all vertices at distance at most k from v have the minimal degree. Let M_k denote the number of minimally-k-connected vertices. To prove the lower bound on the diameter, we show that M_k →P ∞ for k = (1 − ε) log log n/log(d_min − 1). Following this, we show that two minimally-k-connected vertices o_1, o_2 are whp such that the distance between ∂B_k(o_1) and ∂B_k(o_2) is at least 2(1 − ε) log log n/|log(τ − 2)|. We start by computing the first and second moments of M_k in the following lemma:

Lemma 7.18 (Moments of the number of minimally-k-connected vertices) Let CM_n(d) satisfy d_min ≥ 3 and n_{d_min} > d_min(d_min − 1)^{k−1}. Then, for all k ≥ 1,

E[M_k] = n_{d_min} ∏_{i=1}^{d_min(d_min−1)^{k−1}} d_min(n_{d_min} − (i − 1))/(ℓ_n − 2i + 1), (7.4.7)

and, for k such that d_min(d_min − 1)^{k−1} ≤ ℓ_n/8,

E[M_k^2] ≤ E[M_k]^2 + E[M_k][ (d_min/(d_min − 2))(d_min − 1)^k + 2 n_{d_min} d_min^2 (d_min − 1)^{2k}/((d_min − 2) ℓ_n) ]. (7.4.8)

Consequently, for k_n ≤ (1 − ε) log log n/log(d_min − 1),

M_{k_n} →P ∞. (7.4.9)


Proof We start by proving (7.4.7). We note that each vertex of degree d_min has the same probability of being minimally-k-connected, and that there are precisely n_{d_min} vertices of degree d_min, so that

E[M_k] = n_{d_min} P(v is minimally-k-connected), (7.4.10)

where v is any fixed vertex with d_v = d_min. Vertex v with d_v = d_min is minimally-k-connected when all its half-edges at distance at most k are paired to half-edges incident to distinct vertices having minimal degree d_min, and no cycles occur in B_k(v). When i − 1 half-edges are paired to distinct vertices of degree d_min, the probability that the ith half-edge is again paired to a distinct vertex of degree d_min equals

d_min(n_{d_min} − (i − 1))/(ℓ_n − 2i + 1). (7.4.11)

Since, for v to be minimally-k-connected, there are d_min(d_min − 1)^{k−1} half-edges that need to be paired to distinct vertices of degree d_min, this proves (7.4.7).

To prove (7.4.8), we note that

E[M_k^2] = ∑_{v_1,v_2∈[n]} P(v_1, v_2 are both minimally-k-connected). (7.4.12)

We split the above probability depending on whether B_k(v_1) ∩ B_k(v_2) = ∅ or not. The contribution to E[M_k^2] due to B_k(v_1) ∩ B_k(v_2) = ∅ is, similarly to the proof of (7.4.7), equal to

n_{d_min}(n_{d_min} − i_{k−1}) ∏_{i=1}^{2i_k} d_min(n_{d_min} − (i − 1))/(ℓ_n − 2i + 1), (7.4.13)

where we abbreviate i_k = d_min(d_min − 1)^{k−1} and note that n_{d_min} − i_{k−1} > 0, since, by assumption, n_{d_min} > d_min(d_min − 1)^{k−1}.

We use that i ↦ d_min(n_{d_min} − (i − 1))/(ℓ_n − 2i + 1) is decreasing, since

(n_{d_min} − (i − 1))/(ℓ_n − 2i + 1) ≥ (n_{d_min} − i)/(ℓ_n − 2i − 1) (7.4.14)

precisely when ℓ_n ≥ 2n_{d_min} + 1, which is true since ℓ_n ≥ d_min n_{d_min} ≥ 3n_{d_min}. Therefore, the contribution to E[M_k^2] from v_1 and v_2 satisfying B_k(v_1) ∩ B_k(v_2) = ∅ is at most

n_{d_min}^2 ( ∏_{i=1}^{i_k} d_min(n_{d_min} − (i − 1))/(ℓ_n − 2i + 1) )^2 = E[M_k]^2, (7.4.15)

which is the first contribution to the r.h.s. of (7.4.8).

We are left to deal with the contribution to E[M_k^2] from v_1 and v_2 such that B_k(v_1) ∩ B_k(v_2) ≠ ∅. When v_1 is minimally-k-connected,

|B_k(v_1)| = 1 + ∑_{l=1}^{k} d_min(d_min − 1)^{l−1} = 1 + d_min[(d_min − 1)^k − 1]/(d_min − 2) ≤ d_min(d_min − 1)^k/(d_min − 2). (7.4.16)

Therefore, the contribution due to v_2 ∈ B_k(v_1) is bounded by

E[M_k] d_min(d_min − 1)^k/(d_min − 2), (7.4.17)

which is the second contribution to the r.h.s. of (7.4.8). Finally, we study the case where B_k(v_1) ∩ B_k(v_2) ≠ ∅ but v_2 ∉ B_k(v_1). In this case, one of the d_min(d_min − 1)^{k−1} half-edges incident to ∂B_k(v_1) needs to be paired to one of the d_min(d_min − 1)^{l−k} half-edges incident to ∂B_{l−k}(v_2), where l = dist_{CM_n(d)}(v_1, v_2) ∈ [2k]\[k]. Conditionally on v_1 being minimally-k-connected and v_2 being minimally-(l−k)-connected, the probability that this occurs is at most

d_min(d_min − 1)^{k−1} · d_min(d_min − 1)^{l−k}/(ℓ_n − 2i_k − 2i_{l−k} + 1) ≤ 2 d_min(d_min − 1)^k · d_min(d_min − 1)^{l−k−1}/ℓ_n, (7.4.18)

where, in the last inequality, we used that d_min(d_min − 1)^{k−1} ≤ ℓ_n/8, so that ℓ_n − 2i_k − 2i_{l−k} + 1 ≥ ℓ_n/2. Therefore, this contribution is bounded by

E[M_k] ∑_{l=k+1}^{2k} E[M_{l−k}] · 2 d_min(d_min − 1)^k · d_min(d_min − 1)^{l−k−1}/ℓ_n. (7.4.19)

We bound E[M_{l−k}] ≤ n_{d_min} and sum

∑_{l=k+1}^{2k} (d_min − 1)^{l−k−1} ≤ (d_min − 1)^k/(d_min − 2), (7.4.20)

to arrive at the third and final contribution to the r.h.s. of (7.4.8).
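The product formula (7.4.7) can be evaluated directly. The sketch below is ours (the concrete degree sequence — 30% of the vertices of degree d_min = 3, the rest of degree 5 — is an assumed choice, not from the book): it prints log E[M_k]/log n, which stays positive only while d_min(d_min − 1)^{k−1} |log λ_{d_min}| is well below log n, in line with (7.4.22):

```python
import math

# hypothetical degree sequence: n_min vertices of degree d_min = 3, the rest of degree 5
n = 10**6
d_min = 3
n_min = int(0.3 * n)                      # n_{d_min}
ell = d_min * n_min + 5 * (n - n_min)     # total number of half-edges ell_n

def log_expected_Mk(k):
    # log of E[M_k] as given by the product formula (7.4.7)
    m = d_min * (d_min - 1) ** (k - 1)    # number of half-edges to be paired
    log_e = math.log(n_min)
    for i in range(1, m + 1):
        log_e += math.log(d_min * (n_min - (i - 1))) - math.log(ell - 2 * i + 1)
    return log_e

# here lambda_{d_min} = d_min * n_min / ell ~ 0.20, so each pairing costs a constant factor
for k in range(1, 5):
    print(k, round(log_expected_Mk(k) / math.log(n), 3))
```

At this moderate n the exponent decays quickly in k; the regime E[M_{k}] = n^{1−o(1)} of (7.4.22) only becomes visible for very large n, since k may grow like log log n.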

To complete the proof of the lower bound on the diameter, we fix ε > 0 sufficiently small and take k* = (1 − ε) log log n/log(d_min − 1). Clearly,

d_min(d_min − 1)^{k*−1} ≤ (log n)^{1−ε} ≤ ℓ_n/8, (7.4.21)

so that, in particular, we may use Lemma 7.18. We note that n_{d_min}/n → p_{d_min} by Condition 1.5(a), and p_{d_min} > 0 by assumption. Therefore, by Conditions 1.5(a)–(b), d_min n_{d_min}/ℓ_n → λ_{d_min}, where we define λ_{d_min} = d_min p_{d_min}/E[D]. By (7.4.7) in Lemma 7.18, for every δ > 0 and n sufficiently large,

E[M_k] ≥ n(p_{d_min} − δ)(λ_{d_min} − δ)^{d_min(d_min−1)^{k−1}}. (7.4.22)

As a result, E[M_{k*}] ≥ n(p_{d_min} − δ)(λ_{d_min} − δ)^{(log n)^{1−ε}}. Further, by (7.4.8) in Lemma 7.18,

Var(M_{k*}) = o(E[M_{k*}]^2), (7.4.23)


so that

M_{k*}/E[M_{k*}] →P 1. (7.4.24)

We conclude that, whp, M_{k*} ≥ n^{1−o(1)}. Since each minimally-k*-connected vertex uses up at most

1 + ∑_{l=1}^{k*} d_min(d_min − 1)^{l−1} = n^{o(1)} (7.4.25)

vertices of degree d_min, whp there must be at least two minimally-k*-connected vertices whose k*-neighborhoods are disjoint. We fix two such vertices and denote them by v_1 and v_2. We note that v_1 and v_2 have precisely d_min(d_min − 1)^{k*−1} unpaired half-edges in ∂B_{k*}(v_1) and ∂B_{k*}(v_2), respectively. Let A_{12} denote the event that v_1, v_2 are minimally-k*-connected with their k*-neighborhoods being disjoint.

Conditionally on A_{12}, the random graph obtained by collapsing the half-edges in ∂B_{k*}(v_1) to a single vertex a and the half-edges in ∂B_{k*}(v_2) to a single vertex b is a configuration model on the vertex set {a, b} ∪ [n] \ (B_{k*}(v_1) ∪ B_{k*}(v_2)), having degrees d′ given by d′_a = d′_b = d_min(d_min − 1)^{k*−1} and d′_i = d_i for every i ∈ [n] \ (B_{k*}(v_1) ∪ B_{k*}(v_2)).

By the truncated first-moment method on paths, performed in the proof of Theorem 7.6 (recall (7.2.21)), it follows that, for any ε > 0,

P( dist_{CM_n(d)}(∂B_{k*}(v_1), ∂B_{k*}(v_2)) ≤ (1 − ε) 2 log log n/|log(τ − 2)| | A_{12} ) = o(1). (7.4.26)

Therefore, whp,

diam(CM_n(d)) ≥ (1 − ε) 2 log log n/|log(τ − 2)| + 2k* = (1 − ε) log log n [2/|log(τ − 2)| + 2/log(d_min − 1)]. (7.4.27)

Since ε > 0 is arbitrary, this proves the lower bound on diam(CM_n(d)) in Theorem 7.17.

Sketch of the upper bound on the diameter

We now sketch the proof of the upper bound on the diameter. We aim to prove that, with k_n = log log n [2/|log(τ − 2)| + 2/log(d_min − 1)],

P(∃ v_1, v_2 ∈ [n]: dist_{CM_n(d)}(v_1, v_2) ≥ (1 + ε)k_n) = o(1). (7.4.28)

We already know that whp diam(Core_n) ≤ 2 log log n/|log(τ − 2)| + 1, and we will assume this from now on. Then (7.4.28) follows when we can show that

P(∃ v ∈ [n]: dist_{CM_n(d)}(v, Core_n) ≥ (1 + ε) log log n/log(d_min − 1)) = o(1), (7.4.29)

and we will argue that, uniformly in v ∈ [n],

P(dist_{CM_n(d)}(v, Core_n) ≥ (1 + ε) log log n/log(d_min − 1)) = o(1/n), (7.4.30)

which would prove (7.4.28).

To prove (7.4.30), it is convenient to explore the neighborhood of the vertex v by only pairing up the first d_min half-edges incident to v and the first d_min − 1 half-edges incident to any other vertex appearing in the exploration. We call the resulting graph the k-exploration graph of v. One can show that it is quite unlikely that there are many cycles within this exploration graph, so that it is actually close to a tree. Therefore, with k* = (1 + ε/2) log log n/log(d_min − 1), the number of vertices on the boundary of the k*-exploration graph is close to

(d_min − 1)^{k*} ≈ (log n)^{1+ε/2}. (7.4.31)

This is large, but not extremely large. However, one of these vertices is bound to have quite a large degree, and thus, from this vertex, it is quite likely that we connect to Core_n quickly, meaning in o(log log n) steps. More precisely, the main ingredient in the upper bound is the statement that the k*-exploration graph whp connects to Core_n in fewer than k* additional steps. We omit further details.
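The count in (7.4.31) is a one-line computation from the choice of k*:

```latex
(d_{\min}-1)^{k^\star}
=\exp\!\bigl(k^\star \log(d_{\min}-1)\bigr)
=\exp\!\bigl((1+\varepsilon/2)\log\log n\bigr)
=(\log n)^{1+\varepsilon/2}.
```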

7.5 Related results for the configuration model

In this section, we discuss related results for the configuration model. We start by discussing distances in infinite-mean configuration models.

7.5.1 Distances for infinite-mean degrees

In this section, we assume that there exist τ ∈ (1, 2) and c > 0 such that

lim_{x→∞} x^{τ−1}[1 − F](x) = c. (7.5.1)

We study the configuration model CM_n(d) where the degrees d = (d_i)_{i∈[n]} are an i.i.d. sequence of random variables with distribution function F satisfying (7.5.1).

We make heavy use of the notation used in [Volume 1, Theorem 7.23], which we first recall. Recall that the random probability distribution P = (P_i)_{i≥1} is given by

P_i = Z_i/Z, (7.5.2)

where Z_i = Γ_i^{−1/(τ−1)} and Γ_i = ∑_{j=1}^{i} E_j with (E_i)_{i≥1} an i.i.d. sequence of exponential random variables with parameter 1, and where Z = ∑_{i≥1} Z_i. Recall further that M_{P,k} has a multinomial distribution with parameters k and probabilities P = (P_i)_{i≥1}. Thus, M_{P,k} = (B_1, B_2, . . .), where, conditionally on P = (P_i)_{i≥1}, B_i is the number of outcomes i in k independent trials in which each outcome equals i with probability P_i. In [Volume 1, Theorem 7.23], the random variable M_{P,D_1} appears, where D_1 is independent of P = (P_i)_{i≥1}. We let M^{(1)}_{P,D_1} and M^{(2)}_{P,D_2} be two such random variables that are conditionally independent given P = (P_i)_{i≥1}.


Figure 7.2: Empirical probability mass function of dist_{CM_n(d)}(o_1, o_2) for τ = 1.8 and n = 10^3, 10^4, 10^5. (Plot not reproduced.)

In terms of this notation, the main result on distances in CM_n(d) when the degrees have infinite mean is the following:

Theorem 7.19 (Distances in CM_n(d) with i.i.d. infinite-mean degrees) Fix τ ∈ (1, 2) in (7.5.1) and let (d_i)_{i∈[n]} be a sequence of i.i.d. copies of D. Then, CM_n(d) satisfies

lim_{n→∞} P(dist_{CM_n(d)}(o_1, o_2) = 2) = 1 − lim_{n→∞} P(dist_{CM_n(d)}(o_1, o_2) = 3) = p_F ∈ (0, 1). (7.5.3)

The probability p_F can be identified as the probability that M^{(1)}_{P,D_1} and M^{(2)}_{P,D_2} have an identical outcome, i.e., there is an outcome that occurs both in M^{(1)}_{P,D_1} and in M^{(2)}_{P,D_2}, where D_1 and D_2 are two i.i.d. copies of D.

Proof We sketch the proof of Theorem 7.19. First, whp, both d_1 ≤ log n and d_2 ≤ log n. The event that dist_{CM_n(d)}(o_1, o_2) = 1 occurs precisely when one of the d_1 half-edges of vertex 1 is paired to one of the d_2 half-edges of vertex 2. Also, with high probability, ℓ_n ≥ n^{1/(τ−1)−ε}. Therefore, on the event that ℓ_n ≥ n^{1/(τ−1)−ε}, d_1 ≤ log n and d_2 ≤ log n, the probability that dist_{CM_n(d)}(o_1, o_2) = 1 is bounded above by

(log n)^2/n^{1/(τ−1)−ε} = o(1). (7.5.4)

We note that the proof of [Volume 1, Theorem 7.23] implies that M^{(1)}_{P,d_1} records the numbers of edges between vertex 1 and the vertices with the largest degrees. Indeed, M^{(1)}_{P,d_1} = (B^{(1)}_1, B^{(1)}_2, . . .), where B^{(1)}_i is the number of edges between vertex 1 and the vertex with degree d_{(n+1−i)}. The same applies to vertex 2. As a result, when M^{(1)}_{P,d_1} and M^{(2)}_{P,d_2} have an identical outcome, the typical graph distance equals 2. We are left to prove that the typical graph distance is bounded by 3 with high probability.

By [Volume 1, (2.6.17)], we have that ξ_k k^{1/(τ−1)} →P 1 as k → ∞. Thus, the probability that vertex 1 is not connected to any of the vertices corresponding to (d_{(n+1−i)})_{i=1}^{K} converges to 0 as K tends to infinity. Let P_n denote the conditional probability given the degrees (d_i)_{i∈[n]}. For i ∈ [n], we let v_i be the vertex corresponding to the ith largest degree d_{(n+1−i)}. By Lemma 7.11,

P_n(v_i not directly connected to v_j) ≤ e^{−d_{(n+1−i)} d_{(n+1−j)}/(2ℓ_n)}. (7.5.5)

Moreover, d_{(n+1−i)}, d_{(n+1−j)} ≥ n^{1/(τ−1)−ε} with high probability for n sufficiently large and any ε > 0, while whp ℓ_n ≤ n^{1/(τ−1)+ε}. As a result, whp,

P_n(v_i not directly connected to v_j) ≤ e^{−n^{1/(τ−1)−3ε}}. (7.5.6)

Therefore, for fixed K and for every i, j ∈ [K], the vertices v_i and v_j are whp neighbors. This implies that the vertices corresponding to the highest order statistics form a complete graph. We have already concluded that vertex 1 is whp connected to v_i for some i ≤ K. In the same way, vertex 2 is whp connected to v_j for some j ≤ K. Since v_i is whp connected to v_j, we conclude that

P_n(dist_{CM_n(d)}(o_1, o_2) ≤ 3) = 1 − o(1). (7.5.7)

This completes the proof.
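A small experiment in the spirit of Figure 7.2 illustrates Theorem 7.19. The sketch below is ours (τ = 1.8, the degree law d_i = ⌈U^{−1/(τ−1)}⌉ and n = 10^4 are assumed illustrative choices, not from the book): it builds CM_n(d) and records how often the distance between random pairs equals 2 or 3:

```python
import random
import math
from collections import deque, defaultdict

random.seed(3)
n, tau = 10_000, 1.8
# i.i.d. degrees with infinite mean: P(D > x) ~ x^{-(tau-1)}; note D >= 2 a.s. here
deg = [math.ceil((1.0 - random.random()) ** (-1.0 / (tau - 1))) for _ in range(n)]
stubs = [v for v in range(n) for _ in range(deg[v])]
if len(stubs) % 2:
    stubs.pop()
random.shuffle(stubs)
adj = defaultdict(set)
for a, b in zip(stubs[::2], stubs[1::2]):
    adj[a].add(b)
    adj[b].add(a)

def dist(s, t, cap=6):
    # breadth-first search from s, truncated at distance cap
    seen, queue = {s: 0}, deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            return seen[v]
        if seen[v] >= cap:
            continue
        for w in adj[v]:
            if w not in seen:
                seen[w] = seen[v] + 1
                queue.append(w)
    return math.inf

ds = [dist(random.randrange(n), random.randrange(n)) for _ in range(200)]
frac23 = sum(1 for d in ds if d in (2, 3)) / len(ds)
print(frac23)  # should be close to 1, as in (7.5.3)
```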

7.5.2 Fluctuation of distances for finite-variance degrees

We continue to study the fluctuations of the distances in the configuration model, starting with the case where the degrees have finite variance. We need a limit result from branching-process theory before we can identify the limiting random variables (R_a)_{a∈(−1,0]}.

Recall that (Z_k)_{k≥0} denotes the unimodular branching process in which, in the first generation, the offspring has distribution D with distribution function F and, in the second and further generations, the offspring has distribution D* − 1, where D* has the size-biased distribution of D. The process (Z_k/(E[D]ν^{k−1}))_{k≥1} is a martingale with uniformly bounded expectation and consequently converges almost surely to a limit (see, e.g., [Volume 1, Theorem 2.24 and Exercise 2.26]):

lim_{n→∞} Z_n/(E[D]ν^{n−1}) = W a.s. (7.5.8)

In the theorem below, we need two independent copies W_1 and W_2 of W.

Theorem 7.20 (Limit law for the typical distance in CM_n(d)) Let (d_i)_{i∈[n]} be a sequence of i.i.d. copies of a random variable D, and assume that there exist τ > 3 and c < ∞ such that, for all x ≥ 1,

[1 − F](x) ≤ c x^{−(τ−1)}, (7.5.9)

and let ν > 1. For k ≥ 1, let a_k = ⌊log_ν k⌋ − log_ν k ∈ (−1, 0]. Then, CM_n(d) satisfies that there exist random variables (R_a)_{a∈(−1,0]} such that, as n → ∞ and for all k ∈ Z,

P(dist_{CM_n(d)}(o_1, o_2) − ⌊log_ν n⌋ = k | dist_{CM_n(d)}(o_1, o_2) < ∞) = P(R_{a_n} = k) + o(1). (7.5.10)

The random variables (R_a)_{a∈(−1,0]} can be identified as

P(R_a > k) = E[ exp(−κ ν^{a+k} W_1 W_2) | W_1 W_2 > 0 ], (7.5.11)

where W_1 and W_2 are independent copies of the limit W in (7.5.8), and where κ = E[D]/(ν − 1).

In words, Theorem 7.20 states that, for τ > 3, the graph distance dist_{CM_n(d)}(o_1, o_2) between two randomly chosen connected vertices grows like log_ν n, where n is the size of the graph, and that the fluctuations around this mean remain uniformly bounded in n.

The law of R_a is involved and can in most cases not be computed exactly. The reason for this is that the random variables W that appear in its statement are hard to compute explicitly (see also [Volume 1, Chapter 3]).

There are two examples where the law of W is known. The first example is when all degrees in the graph are equal to some r ≥ 3, so that we obtain the random r-regular graph. In this case, E[D] = r, ν = r − 1, and W = 1 a.s. In particular, P(dist_{CM_n(d)}(o_1, o_2) < ∞) = 1 − o(1). Therefore,

P(R_a > k) = exp(−(r/(r − 2))(r − 1)^{a+k}), (7.5.12)

and dist_{CM_n(d)}(o_1, o_2) is asymptotically equal to log_{r−1} n.

The second example is when p* is the probability mass function of a geometric random variable, in which case the branching process with offspring distribution p*, conditioned to be positive, converges to an exponential random variable with parameter 1. This example corresponds to

p*_j = p(1 − p)^{j−1}, so that p_j = p(1 − p)^{j−2}/(j c_p), for all j ≥ 1, (7.5.13)

where c_p is a normalization constant. For p > 1/2, the law of W is the same as that of the sum of D_1 copies of a random variable Y, where Y = 0 with probability (1 − p)/p and Y is an exponential random variable with parameter 1 with probability (2p − 1)/p. Even in this simple case, the computation of the exact law of R_a is non-trivial.
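The r-regular formula (7.5.12) indeed follows by specializing (7.5.11), since there W_1 = W_2 = 1 a.s., ν = r − 1 and E[D] = r:

```latex
\kappa=\frac{\mathbb{E}[D]}{\nu-1}=\frac{r}{r-2},
\qquad
\mathbb{P}(R_a>k)
=\mathbb{E}\bigl[e^{-\kappa \nu^{a+k} W_1 W_2}\,\big|\,W_1W_2>0\bigr]
=\exp\Bigl(-\tfrac{r}{r-2}(r-1)^{a+k}\Bigr).
```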

7.5.3 Fluctuation of distances for infinite-variance degrees

We next study the fluctuations of typical distances in CM_n(d) in the setting where the degrees are i.i.d. and satisfy that there exist τ ∈ (2, 3), γ ∈ [0, 1) and C < ∞ such that

x^{−τ+1−C(log x)^{γ−1}} ≤ 1 − F(x) ≤ x^{−τ+1+C(log x)^{γ−1}}, for large x. (7.5.14)


The condition in (7.5.14) is such that the results in Theorem 7.14 apply. Then, we can identify the fluctuations of the typical graph distance in CM_n(d) as follows:

Theorem 7.21 (Fluctuations of the graph distance in CM_n(d) for infinite-variance degrees) Let (d_i)_{i∈[n]} be a sequence of i.i.d. copies of a random variable D. Fix τ ∈ (2, 3) and assume that (7.5.14) holds. Then, CM_n(d) satisfies that there exist random variables (R_a)_{a∈(−1,0]} such that, as n → ∞ and for all l ∈ Z,

P( dist_{CM_n(d)}(o_1, o_2) = 2⌊log log n/|log(τ − 2)|⌋ + l | dist_{CM_n(d)}(o_1, o_2) < ∞ ) = P(R_{a_n} = l) + o(1), (7.5.15)

where a_n = ⌊log log n/|log(τ − 2)|⌋ − log log n/|log(τ − 2)| ∈ (−1, 0]. Here, the random variables (R_a)_{a∈(−1,0]} are given by

P(R_a > l) = P( min_{s∈Z} [(τ − 2)^{−s} Y_1 + (τ − 2)^{s−c_l} Y_2] ≤ (τ − 2)^{⌈l/2⌉+a} | Y_1 Y_2 > 0 ),

where c_l = 1 if l is even and c_l = 0 otherwise, and Y_1, Y_2 are two independent copies of the limit random variable in Theorem 7.14.

In words, Theorem 7.21 states that for τ ∈ (2, 3), the graph distance dist_{CM_n(d)}(o_1, o_2) between two randomly chosen connected vertices grows proportionally to the log log of the size of the graph, and that the fluctuations around this mean remain uniformly bounded in n.
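The doubly logarithmic scaling can also be observed in a small simulation. The sketch below is our own illustration, not the book's construction: all function names are ours, degrees are drawn from a Pareto-type law with P(D ≥ x) ≈ x^{−(τ−1)}, half-edges are paired uniformly to form the configuration model, and the distance between two random connected vertices is compared with 2 log log n/|log(τ − 2)|.

```python
import math
import random
from collections import deque

def sample_degrees(n, tau, rng):
    # P(D >= x) ~ x^{-(tau-1)}: invert a uniform; infinite variance for tau in (2, 3).
    return [math.ceil((1.0 - rng.random()) ** (-1.0 / (tau - 1))) for _ in range(n)]

def configuration_model(degrees, rng):
    # Pair the half-edges uniformly at random (self-loops and multi-edges allowed).
    half_edges = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(half_edges) % 2 == 1:
        half_edges.append(0)  # make the total degree even
    rng.shuffle(half_edges)
    adj = [[] for _ in degrees]
    for i in range(0, len(half_edges), 2):
        u, v = half_edges[i], half_edges[i + 1]
        adj[u].append(v)
        adj[v].append(u)
    return adj

def graph_distance(adj, s, t):
    # Plain breadth-first search; math.inf if s and t are not connected.
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return math.inf

rng = random.Random(42)
n, tau = 100_000, 2.5
adj = configuration_model(sample_degrees(n, tau, rng), rng)
while True:  # condition on the two vertices being distinct and connected
    o1, o2 = rng.randrange(n), rng.randrange(n)
    d = graph_distance(adj, o1, o2)
    if o1 != o2 and d < math.inf:
        break
print("distance:", d, " 2 log log n / |log(tau - 2)|:",
      round(2 * math.log(math.log(n)) / abs(math.log(tau - 2)), 2))
```

For n = 10^5 and τ = 2.5 the predicted centering is about 7.1, while log n ≈ 11.5; repeated runs give distances within a few steps of the doubly logarithmic value.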

We next discuss an extension, obtained by possibly truncating the degree distribution. In order to state the result, we make the following assumption:

Condition 7.22 (Truncated infinite-variance degrees) There exists a β_n ∈ (0, 1/(τ − 1)] such that F_n(x) = 1 for x ≥ n^{β_n(1+ε)} for all ε > 0, while for all x ≤ n^{β_n(1−ε)},

1 − F_n(x) = L_n(x)/x^{τ−1},   (7.5.16)

with τ ∈ (2, 3), and a function L_n(x) that satisfies, for some constant C_1 > 0 and η ∈ (0, 1),

e^{−C_1(log x)^η} ≤ L_n(x) ≤ e^{C_1(log x)^η}.   (7.5.17)

Theorem 7.23 (Fluctuations of distances in CM_n(d) for truncated infinite-variance degrees) Let (d_i)_{i∈[n]} satisfy Condition 7.22 for some τ ∈ (2, 3). Assume that d_min ≥ 2, and that there exists κ > 0 such that

max{d_TV(F_n, F), d_TV(F⋆_n, F⋆)} ≤ n^{−κβ_n}.   (7.5.18)

When β_n → 1/(τ − 1), we further require that the limit random variable Y in Theorem 7.14 has no point mass on (0, ∞). Then,

dist_{CM_n(d)}(o_1, o_2) − 2 log log(n^{β_n})/|log(τ − 2)| − 1/(β_n(3 − τ))   (7.5.19)

is a tight sequence of random variables.


Which of the two terms in (7.5.19) dominates depends sensitively on the choice of β_n. When β_n → β ∈ (0, 1/(τ − 1)], the first term dominates. When β_n = (log n)^{−γ} for some γ ∈ (0, 1), the second term dominates. Both terms are of the same order of magnitude when β_n = Θ(1/log log n). Graph distances are always much smaller than log n, the boundary point being β_n = Θ(1/log n), in which case n^{β_n} = Θ(1) and instead Theorem 7.1 applies. Thus, even after the truncation of the degrees, in the infinite-variance case, distances are always ultra-small.
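The competition between the two centering terms can be made concrete with a few lines of arithmetic. The snippet below is our own illustration (the function name and the sample choices of β_n are arbitrary); writing the two terms of Theorem 7.23 as 2 log log(n^{β_n})/|log(τ − 2)| and 1/(β_n(3 − τ)), it evaluates both for several β_n.

```python
import math

def centering_terms(n, beta, tau):
    # The two centering terms: note log log(n^beta) = log(beta * log n).
    first = 2 * math.log(beta * math.log(n)) / abs(math.log(tau - 2))
    second = 1 / (beta * (3 - tau))
    return first, second

n, tau = 10**8, 2.5
cases = {
    "beta_n -> beta constant  ": 1 / (tau - 1),
    "beta_n = 1/log log n     ": 1 / math.log(math.log(n)),
    "beta_n = (log n)^(-1/2)  ": math.log(n) ** -0.5,
}
for label, beta in cases.items():
    first, second = centering_terms(n, beta, tau)
    print(f"{label} first = {first:5.2f}  second = {second:5.2f}")
```

For constant β_n the doubly logarithmic term wins, for β_n = (log n)^{−1/2} the truncation term wins, and for β_n = Θ(1/log log n) the two are comparable, in line with the discussion above.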

Much more precise results are known, as we explain now. For this, we need some extra notation. Define

Y^{(n)}_i = (τ − 2)^{t_n} log(Z^{(n;i)}_{t_n})   (7.5.20)

for an appropriate t_n, which is the first t for which max{Z^{(n;1)}_t, Z^{(n;2)}_t} ≥ n^{ρ_n} for some appropriate ρ_n < β_n. As it turns out, (Y^{(n)}_1, Y^{(n)}_2) →d (Y_1, Y_2). Further, let b^{(i)}_n be the fractional part of

(log log(n^{β_n}) − log Y^{(n)}_i) / |log(τ − 2)|.

Then the most precise result available for graph distances in CMn(d) for infinite-variance degrees is the following theorem:

Theorem 7.24 (Fluctuations of distances in CM_n(d) for truncated infinite-variance degrees) Assume that the assumptions in Theorem 7.23 hold. Then,

dist_{CM_n(d)}(o_1, o_2) − 2 log log(n^{β_n})/|log(τ − 2)| − b^{(1)}_n − b^{(2)}_n − ⌈(1/β_n − (τ − 2)^{b^{(1)}_n} − (τ − 2)^{b^{(2)}_n})/(3 − τ)⌉ →d −1 − log(Y_1 Y_2)/|log(τ − 2)|.   (7.5.21)

Theorem 7.24 not only describes the precise fluctuations of the graph distances in the infinite-variance case, but also identifies how these depend on the precise truncation of the degrees. While the weak limit in Theorem 7.24 looks different from that in Theorem 7.21, it turns out that they are closely related. As Exercise 7.17 shows, Condition 7.22 applies, for example, to degrees with an exponential truncation. Exercise 7.18 investigates paths in CM_n(d) where, instead of truncating the degrees, we only allow for paths through vertices of degree at most n^{β_n}. This gives insight into the length of paths that must avoid the highest-degree vertices.

7.6 Notes and discussion

Notes on Section 7.1

Distances in the configuration model were first obtained in a non-rigorous way in Newman et al. (2000b, 2002); see also Cohen and Havlin (2003) for results on ultra-small distances. Theorem 7.1 is proved by van der Hofstad et al. (2005). Theorem 7.2 is proved by van der Hofstad et al. (2007a).


Notes on Section 7.2

Proposition 7.4 is adapted from (Janson, 2010b, Lemma 5.1). The path-counting techniques used in Section 7.2 are novel, but the combinatorial arguments follow Eckhoff et al. (2013), where shortest weighted paths in the complete graph are investigated. Comparisons to branching processes appear in many papers on the configuration model (see, in particular, Bhamidi et al. (2010a), van der Hofstad et al. (2005), van der Hofstad et al. (2007a)). We have strived for a construction that is most transparent and complete. The proof of Theorem 7.9 is close in spirit to the analysis in Reittu and Norros (2004), the only difference being that we have simplified the argument slightly.

Notes on Section 7.3

Theorem 7.14 is proved by Davies (1978), whose proof we follow. A related result, under stronger conditions, appeared in Darling (1970). Branching processes with infinite mean have attracted considerable attention; see, e.g., Schuh and Barbour (1977); Seneta (1973) and the references therein. There is a balance between the generality of the results and the conditions on the offspring distribution, and in our opinion Theorem 7.14 strikes a nice balance in that the result is relatively simple and the conditions fairly general.

Notes on Section 7.4

Theorem 7.16 is (Fernholz and Ramachandran, 2007, Theorem 4.1). Theorem 7.17 is proved by Caravenna et al. (2019). A log n lower bound on the diameter when τ ∈ (2, 3) is also proved by van der Hofstad et al. (2007b), but is substantially weaker than Theorem 7.16. A sharper result on the diameter of random regular graphs was obtained by Bollobás and Fernández de la Vega (1982). For a nice discussion and results about the existence of a large k-core, we refer to Janson and Luczak (2007).

Notes on Section 7.5

Theorem 7.19 is proved in van den Esker et al. (2006). The explicit identification of P(dist_{CM_n(d)}(o_1, o_2) = 2) is novel. One might argue that including degrees larger than n − 1 is artificial in a network with n vertices. In fact, in many real-world networks, the degree is bounded by a physical constant. Therefore, in van den Esker et al. (2006), also the case where the degrees are conditioned to be smaller than n^α is considered, where α is an arbitrary positive number. Of course, we cannot condition on the degrees to be at most M, where M is fixed and independent of n, since in this case the degrees are uniformly bounded, and this case is treated in van der Hofstad et al. (2005) instead. Therefore, van den Esker et al. (2006) considers cases where the degrees are conditioned to be at most a given power of n. In this setting, it turns out that the average distance is equal to k + 3 with high probability whenever α ∈ (1/(τ + k), 1/(τ + k − 1)). It can be expected that a much more detailed picture can be derived when instead conditioning on the degrees being at most n^{β_n}, as in van der Hofstad and Komjáthy (2017) for the τ ∈ (2, 3) case.


Theorem 7.20 is proved by van der Hofstad et al. (2005). Theorem 7.21 is proved by van der Hofstad et al. (2007a). The assumption that the limit Y in Theorem 7.14, conditionally on Y > 0, has a density was implicitly made by van der Hofstad et al. (2007a). However, this assumption is not necessary for Theorem 7.21. Instead, it suffices that P(0 < Y ≤ ε) vanishes as ε ↓ 0 (see, e.g., (van der Hofstad et al., 2007a, Lemma 4.6)), which is generally true.

Theorem 7.23 is proved by van der Hofstad and Komjáthy (2017). There, also the assumption that the random variable Y has no point mass in (0, ∞) was explicitly made and discussed in detail. Several extensions have been proved by van der Hofstad and Komjáthy (2017). Theorem 7.24 is (van der Hofstad and Komjáthy, 2017, Corollary 1.9). We refer to van der Hofstad and Komjáthy (2017) for more details.

7.7 Exercises for Chapter 7

Exercise 7.1 (Random regular graph) Fix r ≥ 2 and consider the r-regular graph on n vertices, where nr is even. Show that d_TV(p⋆^{(n)}, p⋆) = 0.

Exercise 7.2 (Proof of Theorem 7.5) Let o_1, o_2 be two independent vertices chosen uniformly at random from [n]. Use Proposition 7.4 with a = o_1, b = o_2, I = [n] to prove Theorem 7.5.

Exercise 7.3 (Proof of Theorem 7.8) Use Proposition 7.7 to prove Theorem 7.8 by adapting the proof of Theorem 6.14.

Exercise 7.4 (Γ_1 is a complete graph) Use Lemma 7.11 and α > 1/2 to show that, whp, Γ_1 in (7.2.36) forms a complete graph, i.e., whp, every i, j ∈ Γ_1 are direct neighbors in CM_n(d).

Exercise 7.5 (Alternative proof of Theorem 7.13) Give an alternative proof of Theorem 7.13 by adapting the proof of Theorem 6.9.

Exercise 7.6 (Example of infinite-mean branching process) Prove that γ(x) = (log x)^{γ−1} for some γ ∈ [0, 1) satisfies the assumptions in Theorem 7.14.

Exercise 7.7 (Infinite mean under conditions of Theorem 7.14) Prove that E[X] = ∞ when the conditions in Theorem 7.14 are satisfied. Extend this to show that E[X^s] = ∞ for every s > α ∈ (0, 1).

Exercise 7.8 (Conditions in Theorem 7.14 for individuals with infinite line of descent) Prove that p^{(∞)} in [Volume 1, (3.4.2)] satisfies the conditions in Theorem 7.14 with the function x ↦ γ⋆(x), given by γ⋆(x) = γ(x) + c/log x.

Exercise 7.9 (Convergence for Z_n + 1) Show that, under the conditions of Theorem 7.14, also α^n log(Z_n + 1) converges to Y almost surely.

Exercise 7.10 (Diameter of soup of cycles) Prove that in a graph consisting solely of cycles, the diameter is equal to the length of the longest cycle divided by 2.


Exercise 7.11 (Longest cycle in the 2-regular graph) What is the limit law of the size of the longest cycle of the 2-regular graph?

Exercise 7.12 (Parameters for ER_n(λ/n)) Fix λ > 1. Prove that ν = λ and µ = µ_λ, where µ_λ ∈ [0, 1) is the dual parameter, i.e., the unique µ satisfying

µ e^{−µ} = λ e^{−λ}.   (7.7.1)
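For a concrete feel for the dual parameter in (7.7.1), it can be computed by bisection, since x ↦ xe^{−x} is strictly increasing on [0, 1]. The small sketch below is ours (the function name solve_dual is not from the text):

```python
import math

def solve_dual(lam, tol=1e-12):
    # Bisection for mu in [0, 1) with mu * exp(-mu) = lam * exp(-lam);
    # x -> x * exp(-x) is strictly increasing on [0, 1].
    target = lam * math.exp(-lam)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * math.exp(-mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu = solve_dual(2.0)
print(mu, mu * math.exp(-mu), 2.0 * math.exp(-2.0))
```

For λ = 2 this gives µ_λ ≈ 0.4, with both sides of (7.7.1) agreeing to machine precision.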

Exercise 7.13 (Typical distance is at least 2 whp) Complete the argument that P(dist_{CM_n(d)}(o_1, o_2) = 1) = o(1) in the proof of Theorem 7.19.

Exercise 7.14 (Typical distance equals 2 whp for τ = 1) Let (d_i)_{i∈[n]} be a sequence of i.i.d. copies of D with distribution function F satisfying that x ↦ [1 − F](x) is slowly varying at ∞. Prove that CM_n(d) satisfies dist_{CM_n(d)}(o_1, o_2) →P 2.

Exercise 7.15 (Convergence along subsequences, van der Hofstad et al. (2005)) Fix an integer n_1. Prove that, under the assumptions in Theorem 7.20, and conditionally on dist_{CM_n(d)}(o_1, o_2) < ∞, along the subsequence n_k = ⌊n_1 ν^{k−1}⌋, the sequence of random variables dist_{CM_n(d)}(o_1, o_2) − ⌊log_ν n_k⌋ converges in distribution to R_{a_{n_1}} as k → ∞.

Exercise 7.16 (Tightness of the hopcount, van der Hofstad et al. (2005)) Prove that, under the assumptions in Theorem 7.20,

(i) with probability 1 − o(1) and conditionally on dist_{CM_n(d)}(o_1, o_2) < ∞, the random variable dist_{CM_n(d)}(o_1, o_2) is in between (1 ± ε) log_ν n for any ε > 0;

(ii) conditionally on dist_{CM_n(d)}(o_1, o_2) < ∞, the random variables dist_{CM_n(d)}(o_1, o_2) − log_ν n form a tight sequence, i.e.,

lim_{K→∞} limsup_{n→∞} P( |dist_{CM_n(d)}(o_1, o_2) − log_ν n| ≤ K | dist_{CM_n(d)}(o_1, o_2) < ∞ ) = 1.   (7.7.2)

As a consequence, prove that the same result applies to a uniform random graph with degrees (d_i)_{i∈[n]}. Hint: Make use of [Volume 1, Theorem 7.21].

Exercise 7.17 (Exponential truncation for degrees) Suppose that A_n = Θ(n^{β_n}). Show that Condition 7.22 holds when

1 − F_n(x) = e^{−x/A_n}/x^{τ−1},   (7.7.3)

with τ ∈ (2, 3).

Exercise 7.18 (Paths through vertices with degree constraints) Suppose that Condition 7.22 holds for β_n = 1/(τ − 1). Show that Theorem 7.24 can be used to show that the minimal number of edges in paths connecting o_1 and o_2, for which all the degrees on the path are at most n^{α_n}, where α_n ≫ 1/log n, scales as in (7.5.21) with β_n replaced by α_n.


Chapter 8

Small-world phenomena inpreferential attachment models

Abstract

In this chapter, we investigate graph distances in preferential attachment models. We focus on typical distances as well as the diameter of preferential attachment models. We again rely on path-counting techniques.

Motivation: local structure versus small-world properties

In Chapters 6–7, we have seen that random graphs with infinite-variance degrees are ultra-small worlds. This can be understood informally through two effects: (a) such random graph models contain super-hubs, whose degrees are much larger than n^{1/2} and which form a complete graph of connections, and these super-hubs are often part of shortest paths between two typical vertices; and (b) vertices of large degree d ≫ 1 are typically connected to vertices of much larger degree, more precisely, of degree roughly d^{1/(τ−2)}. Combined, these two effects mean that it takes roughly log log n/|log(τ − 2)| steps from a typical vertex to reach one of the super-hubs, and thus roughly 2 log log n/|log(τ − 2)| steps to connect two typical vertices to each other. Of course, the proofs are more technical, but this is the bottom line.

The above explanation depends crucially on the local structure of the generalized random graph as well as the configuration model, in which indeed vertices of high degree are whp connected to vertices of much higher degree. In this chapter, we will see that for the preferential attachment model, such considerations need to be subtly adapted. Indeed, fix δ ∈ (−m, 0) and m ≥ 2, so that the degree distribution has infinite variance. Then, in preferential attachment models, vertices of large degree d tend to be the old vertices, but old vertices are not necessarily connected to much older vertices, which would be necessary to increase their degree from d to d^{1/(τ−2)}. However, vertices of degree d ≫ 1 do tend to be connected to typical vertices, which in turn tend to be connected to vertices of degree roughly d^{1/(τ−2)}. We conclude that distances are about twice as large in preferential attachment models as they are in the corresponding generalized random graphs or configuration models, and that this can be explained by the differences in the local connectivity structure. Unfortunately, due to its dynamic nature, the results for preferential attachment models are harder to prove, and they are somewhat less complete.

Throughout this chapter, we work with the preferential attachment model defined in [Volume 1, Section 8.2], which we denote by (PA^{(m,δ)}_n)_{n≥1}, unless stated otherwise. We recall that this model starts with a single vertex with m self-loops at time t = 1, and at each time a vertex is added with m edges that are attached to the vertices in the graph with probabilities given in [Volume 1, (8.2.1)] for m = 1, and as described on [Volume 1, page 282] for m ≥ 2. This model can also be obtained by identifying blocks of m vertices in (PA^{(1,δ/m)}_n)_{n≥1}. We sometimes also discuss other variants of the model, such as (PA^{(m,δ)}_n(b))_{n≥1}, in which the m = 1 model does not have any self-loops, or (PA^{(m,δ)}_n(d))_{n≥1}, which is particularly nice due to its relation to Pólya urn schemes (recall Theorems 5.8 and 5.10).

Organization of this chapter

In Section 8.1, we start by showing that preferential attachment trees (where every vertex comes in with one edge to earlier vertices, so that m = 1) have logarithmic height. In Section 8.2, we investigate graph distances in PA^{(m,δ)}_n and formulate our main results. In Section 8.3, we investigate path-counting techniques in preferential attachment models, which we use in Section 8.4 to prove logarithmic lower bounds on distances, and in Section 8.5 to prove doubly logarithmic lower bounds. In Section 8.6, we prove the matching logarithmic upper bounds on graph distances for δ ≥ 0, and in Section 8.7 the matching doubly logarithmic upper bound for δ < 0. In Section 8.8, we discuss the diameter of preferential attachment models for m ≥ 2. In Section 8.9, we discuss further results about distances in preferential attachment models. We close this chapter with notes and discussion in Section 8.10, and with exercises in Section 8.11.

Connectivity notation preferential attachment models

In this chapter, we write u →_j v when the jth edge from u is connected to v, where u ≥ v. We further write u ←→ v when there exists a path of edges connecting u and v, so that v ∈ C(u), where C(u) is the connected component of u. Finally, we write u ∼ v when {u, v} is an edge in the preferential attachment model.

8.1 Logarithmic distances in preferential attachment trees

In this section, we investigate distances in scale-free trees, which arise for m = 1. We start by studying the typical distance from the root and the height of the tree PA^{(1,δ)}_n, where the height of a tree T on n vertices is defined as

height(T) = max_{v∈[n]} dist_T(1, v),   (8.1.1)

followed by its typical distances between pairs of vertices and its diameter. The conclusion is that the two-vertex quantities are twice as large as the one-vertex quantities, as can perhaps be expected on trees:

Theorem 8.1 (Typical distance from root in scale-free trees) Fix m = 1 and δ > −1. Let θ be the non-negative solution of

θ + (1 + δ)(1 + log θ) = 0.   (8.1.2)

Then, as n → ∞,

dist_{PA^{(1,δ)}_n}(1, o)/log n →P (1 + δ)/(2 + δ),   height(PA^{(1,δ)}_n)/log n →P (1 + δ)/((2 + δ)θ).   (8.1.3)
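The first limit in Theorem 8.1 can be illustrated by direct simulation. The sketch below is ours, not the book's construction: it grows a preferential attachment tree without self-loops, in the spirit of the model PA_n^{(1,δ)}(b), attaching each new vertex to an earlier vertex with probability proportional to its degree plus δ (the mixture sampler used here assumes δ ≥ 0), and compares the average depth with (1 + δ)/(2 + δ) · log n.

```python
import math
import random

def pa_tree(n, delta, rng):
    # Vertex t >= 3 attaches to v < t with probability proportional to
    # (degree of v) + delta, sampled as a mixture of a uniform vertex
    # (the "+delta" part) and a uniform edge endpoint (the degree part).
    # This mixture decomposition requires delta >= 0.
    parent = [0, 0, 1]        # index 0 unused; vertex 2 attaches to vertex 1
    endpoints = [1, 2]        # each vertex listed once per incident edge
    for t in range(3, n + 1):
        existing, edges = t - 1, t - 2
        if rng.random() < delta * existing / (2 * edges + delta * existing):
            v = rng.randrange(1, t)
        else:
            v = rng.choice(endpoints)
        parent.append(v)
        endpoints.append(v)
        endpoints.append(t)
    return parent

def depth(parent, v):
    # Number of steps from v up to the root 1.
    d = 0
    while v != 1:
        v = parent[v]
        d += 1
    return d

rng = random.Random(1)
n, delta = 100_000, 1.0
parent = pa_tree(n, delta, rng)
avg = sum(depth(parent, rng.randrange(1, n + 1)) for _ in range(2000)) / 2000
print(round(avg / math.log(n), 3), "vs", round((1 + delta) / (2 + delta), 3))
```

With δ = 1 the predicted slope is 2/3; the simulated ratio comes out close to it, with the O(1) corrections to (8.1.3) still visible at n = 10^5.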

Theorem 8.2 (Typical distance and diameter of scale-free trees) Fix m = 1 and δ > −1. Let θ be the non-negative solution of (8.1.2). Then

dist_{PA^{(1,δ)}_n}(o_1, o_2)/log n →P 2(1 + δ)/(2 + δ),   diam(PA^{(1,δ)}_n)/log n →P 2(1 + δ)/((2 + δ)θ).   (8.1.4)

The proofs of Theorems 8.1–8.2 rely on the fact that PA^{(1,δ)}_n consists of a collection of trees with precisely one self-loop each. We shall perform explicit path-counting techniques to prove Theorem 8.2. Exercises 8.1–8.2 explore properties of the constant θ.

In the proof of Theorems 8.1–8.2, it will be useful to work with PA^{(1,δ)}_n(b) instead of PA^{(1,δ)}_n, for which a similar result holds:

Theorem 8.3 (Distances in scale-free trees PA^{(1,δ)}_n(b)) Fix m = 1 and δ > −1, and let θ be the non-negative solution of (8.1.2). Then

dist_{PA^{(1,δ)}_n(b)}(1, o)/log n →P (1 + δ)/(2 + δ),   height(PA^{(1,δ)}_n(b))/log n →a.s. (1 + δ)/((2 + δ)θ),   (8.1.5)

and

dist_{PA^{(1,δ)}_n(b)}(o_1, o_2)/log n →P 2(1 + δ)/(2 + δ),   diam(PA^{(1,δ)}_n(b))/log n →P 2(1 + δ)/((2 + δ)θ).   (8.1.6)

We start by proving the upper bound in Theorem 8.3. We remark that the proof will indicate that the almost sure limit of the height in Theorem 8.3 does not depend on the precise starting configuration of the graph PA^{(1,δ)}_2(b), which will be useful in extending the results in Theorem 8.3 to Theorems 8.1–8.2.

In the proof of the upper bound, we make use of the following result, which computes the probability mass function of the distance between vertex v and the root 1. Before stating the result, we need some more notation. We write u → v, i.e., u →_1 v, when, in (PA^{(1,δ)}_n(b))_{n≥1}, the edge of vertex u is connected to vertex v. Note that for this to happen, we need u > v. For u = π_0 > π_1 > · · · > π_k = 1, and denoting ~π = (π_0, π_1, . . . , π_k), we write the event that ~π is present in the tree PA^{(1,δ)}_n(b) as

{~π ⊆ PA^{(1,δ)}_n(b)} = ⋂_{i=0}^{k−1} {π_i → π_{i+1}}.   (8.1.7)

For a configuration of PA^{(1,δ)}_n(b), we let dist(u, v) denote the unique value of k such that there exists a ~π satisfying u = π_0 → π_1 → · · · → π_{k−1} → π_k = v. Then the probability mass function of dist(u, v) can be explicitly identified as follows:


Proposition 8.4 (Distribution of dist(u, v) in PA^{(1,δ)}_n(b)) Fix m = 1 and δ > −1. Then, for all u > v,

P(dist(u, v) = k) = ((1 + δ)/(2 + δ))^k · [Γ(u + 1/(2+δ)) Γ(v)] / [Γ(v + 1/(2+δ)) Γ(u + 1)] · Σ_{~π} ∏_{i=1}^{k−1} 1/π_i,   (8.1.8)

where the sum is over ordered vectors ~π = (π_0, . . . , π_k) of length k + 1 with π_0 = u and π_k = v. Further,

P(~π ⊆ PA^{(1,δ)}_n(b)) = ((1 + δ)/(2 + δ))^k · [Γ(u + 1/(2+δ)) Γ(v)] / [Γ(v + 1/(2+δ)) Γ(u + 1)] · ∏_{i=1}^{k−1} 1/π_i.   (8.1.9)

Proof of Proposition 8.4. Since the path between vertices u and v is unique, we obtain

P(dist(u, v) = k) = Σ_{~π} P( ⋂_{i=0}^{k−1} {π_i → π_{i+1}} ),   (8.1.10)

where again the sum is over all ordered vectors ~π = (π_0, . . . , π_k) of length k + 1 with π_0 = u and π_k = v. Therefore, (8.1.8) follows immediately from (8.1.9).

We claim that the events {π_i → π_{i+1}} are independent, i.e., for every sequence ~π = (π_0, . . . , π_k),

P( ⋂_{i=0}^{k−1} {π_i → π_{i+1}} ) = ∏_{i=0}^{k−1} P(π_i → π_{i+1}).   (8.1.11)

We prove the independence in (8.1.11) by induction on k ≥ 1. For k = 1, there is nothing to prove, and this initializes the induction hypothesis. To advance the induction hypothesis in (8.1.11), we condition on PA^{(1,δ)}_{π_0−1}(b) to obtain

P(~π ⊆ PA^{(1,δ)}_n(b)) = E[ P( ⋂_{i=0}^{k−1} {π_i → π_{i+1}} | PA^{(1,δ)}_{π_0−1}(b) ) ] = E[ 1_{⋂_{i=1}^{k−1} {π_i → π_{i+1}}} P(π_0 → π_1 | PA^{(1,δ)}_{π_0−1}(b)) ],   (8.1.12)

since the event ⋂_{i=1}^{k−1} {π_i → π_{i+1}} is measurable with respect to PA^{(1,δ)}_{π_0−1}(b), because π_0 − 1 ≥ π_i for all i ∈ [k − 1]. Furthermore, from [Volume 1, (8.2.2)],

P(π_0 → π_1 | PA^{(1,δ)}_{π_0−1}(b)) = (D_{π_1}(π_0 − 1) + δ) / ((2 + δ)(π_0 − 1)).   (8.1.13)

In particular,

P(π_0 → π_1) = E[ (D_{π_1}(π_0 − 1) + δ) / ((2 + δ)(π_0 − 1)) ].   (8.1.14)


Therefore,

P(~π ⊆ PA^{(1,δ)}_n(b)) = E[ 1_{⋂_{i=1}^{k−1} {π_i → π_{i+1}}} (D_{π_1}(π_0 − 1) + δ) / ((2 + δ)(π_0 − 1)) ] = P( ⋂_{i=1}^{k−1} {π_i → π_{i+1}} ) E[ (D_{π_1}(π_0 − 1) + δ) / ((2 + δ)(π_0 − 1)) ],   (8.1.15)

since the random variable D_{π_1}(π_0 − 1) only depends on how many edges are connected to π_1 after time π_1, and is thus independent of the event ⋂_{i=1}^{k−1} {π_i → π_{i+1}}, which only depends on the attachment of the edges up to and including time π_1. We conclude that

P(~π ⊆ PA^{(1,δ)}_n(b)) = P(π_0 → π_1) P( ⋂_{i=1}^{k−1} {π_i → π_{i+1}} ).   (8.1.16)

The claim in (8.1.11) for k follows from the induction hypothesis. Combining (8.1.10) with (8.1.11) and (8.1.14), we obtain that

P(~π ⊆ PA^{(1,δ)}_n(b)) = ∏_{i=0}^{k−1} E[ (D_{π_{i+1}}(π_i − 1) + δ) / ((2 + δ)(π_i − 1)) ].   (8.1.17)

By [Volume 1, (8.11.3)], for all n ≥ s,

E[ (D_s(n) + δ) / ((2 + δ)n) ] = (1 + δ) [Γ(n + 1/(2+δ)) Γ(s)] / [(2 + δ) n Γ(n) Γ(s + 1/(2+δ))] = ((1 + δ)/(2 + δ)) · [Γ(n + 1/(2+δ)) Γ(s)] / [Γ(n + 1) Γ(s + 1/(2+δ))],   (8.1.18)

so that

P(~π ⊆ PA^{(1,δ)}_n(b)) = ((1 + δ)/(2 + δ))^k ∏_{i=0}^{k−1} [Γ(π_i + 1/(2+δ)) Γ(π_{i+1})] / [Γ(π_i + 1) Γ(π_{i+1} + 1/(2+δ))]   (8.1.19)

= ((1 + δ)/(2 + δ))^k · [Γ(π_0 + 1/(2+δ)) Γ(π_k)] / [Γ(π_k + 1/(2+δ)) Γ(π_0 + 1)] · ∏_{i=1}^{k−1} 1/π_i

= ((1 + δ)/(2 + δ))^k · [Γ(u + 1/(2+δ)) Γ(v)] / [Γ(v + 1/(2+δ)) Γ(u + 1)] · ∏_{i=1}^{k−1} 1/π_i.   (8.1.20)

This completes the proof of Proposition 8.4.
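The telescoping step from (8.1.19) to (8.1.20) can be verified numerically in log-space. The check below is ours; it uses math.lgamma to avoid overflow of the Gamma factors.

```python
import math

def log_product_form(pi, a):
    # log of prod_i Gamma(pi_i + a) Gamma(pi_{i+1}) / (Gamma(pi_i + 1) Gamma(pi_{i+1} + a))
    return sum(math.lgamma(pi[i] + a) + math.lgamma(pi[i + 1])
               - math.lgamma(pi[i] + 1) - math.lgamma(pi[i + 1] + a)
               for i in range(len(pi) - 1))

def log_closed_form(pi, a):
    # log of Gamma(u + a) Gamma(v) / (Gamma(v + a) Gamma(u + 1)) * prod over
    # interior vertices of 1/pi_i, with u = pi_0 and v = pi_k
    u, v = pi[0], pi[-1]
    return (math.lgamma(u + a) + math.lgamma(v) - math.lgamma(v + a)
            - math.lgamma(u + 1) - sum(math.log(p) for p in pi[1:-1]))

delta = 0.5
a = 1 / (2 + delta)
pi = [60, 31, 17, 9, 4, 1]  # a decreasing path u = pi_0 -> ... -> pi_k = v
gap = abs(log_product_form(pi, a) - log_closed_form(pi, a))
print("discrepancy:", gap)  # identical up to floating-point rounding
```

The two expressions agree exactly, since the interior factors Γ(π_i + a)/Γ(π_i + a) cancel and Γ(π_i)/Γ(π_i + 1) = 1/π_i.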

Proof of the upper bounds in Theorem 8.3. We first use Proposition 8.4 to prove that, for o chosen uniformly at random from [n],

dist(1, o)/log n ≤ (1 + o_P(1)) (1 + δ)/(2 + δ),   (8.1.21)


and, almost surely for n large enough,

dist(1, n)/log n ≤ (1 + ε) (1 + δ)/((2 + δ)θ),   (8.1.22)

where θ is the non-negative solution of (8.1.2). We start by noting that (8.1.21)–(8.1.22) immediately prove (8.1.5). Further, (8.1.22) implies that, almost surely for n large enough,

height(PA^{(1,δ)}_n(b)) ≤ (1 + ε) (1 + δ)/((2 + δ)θ) · log n.   (8.1.23)

By the triangle inequality,

dist_{PA^{(1,δ)}_n(b)}(o_1, o_2) ≤ dist(1, o_1) + dist(1, o_2),   (8.1.24)

diam(PA^{(1,δ)}_n(b)) ≤ 2 height(PA^{(1,δ)}_n(b)),   (8.1.25)

so that (8.1.21)–(8.1.22) imply the upper bounds in Theorem 8.3.

We proceed to prove (8.1.21) and (8.1.22), and start with some preparations.

We use (8.1.8) and symmetry to obtain

P(dist(1, n) = k) = ((1 + δ)/(2 + δ))^k · Γ(n + 1/(2+δ)) / [Γ(1 + 1/(2+δ)) Γ(n + 1)] · Σ*_{~t_{k−1}} (1/(k − 1)!) ∏_{i=1}^{k−1} 1/t_i,   (8.1.26)

where the sum now is over all vectors ~t_{k−1} = (t_1, . . . , t_{k−1}) with 1 < t_i < n and distinct coordinates. We can upper bound this sum by leaving out the restriction that the coordinates of ~t_{k−1} are distinct, so that

P(dist(1, n) = k) ≤ ((1 + δ)/(2 + δ))^k · Γ(n + 1/(2+δ)) / [Γ(1 + 1/(2+δ)) Γ(n + 1)] · (1/(k − 1)!) ( Σ_{s=2}^{n−1} 1/s )^{k−1}.   (8.1.27)

Since x ↦ 1/x is monotonically decreasing,

Σ_{s=2}^{n−1} 1/s ≤ ∫_1^n (1/x) dx = log n.   (8.1.28)

Also, we use [Volume 1, (8.3.9)] to bound, for some constant K_δ > 0,

P(dist(1, n) = k) ≤ K_δ n^{−(1+δ)/(2+δ)} ((1+δ)/(2+δ) · log n)^{k−1} / (k − 1)! = K_δ P( Poi((1+δ)/(2+δ) · log n) = k − 1 ).   (8.1.29)
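The rewriting in (8.1.29) uses only that e^{−λ} = n^{−(1+δ)/(2+δ)} for λ = (1+δ)/(2+δ) · log n, so that the bound is K_δ times a Poisson probability. A two-line numerical check of ours:

```python
import math

n, delta, k = 10**6, 0.5, 7
lam = (1 + delta) / (2 + delta) * math.log(n)
# the polynomial bound in (8.1.29), without the constant K_delta ...
bound = n ** (-(1 + delta) / (2 + delta)) * lam ** (k - 1) / math.factorial(k - 1)
# ... equals the Poisson(lam) probability of the value k - 1
poisson = math.exp(-lam) * lam ** (k - 1) / math.factorial(k - 1)
print(bound, poisson)  # equal up to rounding
```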

Now we are ready to prove (8.1.21). We note that o is chosen uniformly at random from [n], so that, with C denoting a generic constant that may change from line to line,

P(dist(1, o) = k) = (1/n) Σ_{s=1}^{n} P(dist(s, 1) = k) ≤ (1/n) Σ_{s=1}^{n} K_δ s^{−(1+δ)/(2+δ)} ((1+δ)/(2+δ) · log s)^{k−1} / (k − 1)!

≤ ((1+δ)/(2+δ) · log n)^{k−1} / (n (k − 1)!) · Σ_{s=1}^{n} K_δ s^{−(1+δ)/(2+δ)} ≤ C ((1+δ)/(2+δ) · log n)^{k−1} / (k − 1)! · n^{−(1+δ)/(2+δ)}

= C P( Poi((1+δ)/(2+δ) · log n) = k − 1 ).   (8.1.30)

Therefore,

P(dist(1, o) > k) ≤ C P( Poi((1+δ)/(2+δ) · log n) ≥ k ).   (8.1.31)

Now we fix ε > 0 and take k = k_n = (1 + ε)(1 + δ)/(2 + δ) · log n, to arrive at

P(dist(1, o) > k_n) ≤ C P( Poi((1+δ)/(2+δ) · log n) ≥ (1 + ε)(1 + δ)/(2 + δ) · log n ) = o(1),   (8.1.32)

by the law of large numbers, for any ε > 0, as required.

We continue to prove (8.1.22). By (8.1.29),

P(dist(1, n) > k) ≤ K_δ P( Poi((1+δ)/(2+δ) · log n) ≥ k ).   (8.1.33)

Now take k_n = a log n with a > (1 + δ)/(2 + δ). We use the Borel–Cantelli lemma to see that {dist(1, n) > k_n} occurs only finitely often when (8.1.33) is summable. We then use the large-deviation bounds for Poisson random variables in [Volume 1, Exercise 2.20] with λ = (1 + δ)/(2 + δ) to obtain that

P(dist(1, n) > a log n) ≤ K_δ n^{−[a(log(a(2+δ)/(1+δ)) − 1) + (1+δ)/(2+δ)]}.   (8.1.34)

Let x be the solution of

x (log(x(2 + δ)/(1 + δ)) − 1) + (1 + δ)/(2 + δ) = 1,   (8.1.35)

so that x = (1 + δ)/((2 + δ)θ). Then, for every a > x,

P(dist(1, n) > a log n) = O(n^{−p}),   (8.1.36)

where

p = a (log(a(2 + δ)/(1 + δ)) − 1) + (1 + δ)/(2 + δ) > 1.   (8.1.37)

As a result, by the Borel–Cantelli lemma, the event {dist(1, n) > k_n} occurs only finitely often, and we conclude that (8.1.22) holds.
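The identification x = (1 + δ)/((2 + δ)θ) can be cross-checked numerically: solve (8.1.2) for θ by bisection and substitute into the left-hand side of (8.1.35). The sketch and the solver name below are ours.

```python
import math

def solve_theta(delta, tol=1e-13):
    # theta + (1 + delta)(1 + log theta) = 0 has a unique root in (0, 1):
    # the left-hand side is increasing in theta, tends to -inf as theta -> 0+,
    # and equals 2 + delta > 0 at theta = 1.
    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid + (1 + delta) * (1 + math.log(mid)) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

delta = 0.5
theta = solve_theta(delta)
x = (1 + delta) / ((2 + delta) * theta)
lhs = x * (math.log(x * (2 + delta) / (1 + delta)) - 1) + (1 + delta) / (2 + delta)
print(theta, x, lhs)  # lhs should equal 1, as in (8.1.35)
```

Here x(2 + δ)/(1 + δ) = 1/θ, so the check reduces exactly to the defining equation (8.1.2) for θ.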


Proof of the lower bound on dist_{PA^{(1,δ)}_n(b)}(1, o) in Theorem 8.3. We use (8.1.30) to obtain that

P( dist_{PA^{(1,δ)}_n(b)}(1, o) ≤ k ) ≤ C P( Poi((1+δ)/(2+δ) · log n) ≤ k ).   (8.1.38)

Fix k_n = (1 − ε)(1 + δ)/(2 + δ) · log n, and note that P( dist_{PA^{(1,δ)}_n(b)}(1, o) ≤ k_n ) = o(1) by the law of large numbers.

To complete the proof of height(PA^{(1,δ)}_n(b))/log n →a.s. (1 + δ)/((2 + δ)θ) in Theorem 8.3, we use the second moment method to prove that the event that height(PA^{(1,δ)}_n(b)) ≤ (1 − ε)(1 + δ)/((2 + δ)θ) · log n has vanishing probability. Together with (8.1.23), this certainly proves that height(PA^{(1,δ)}_n(b))/log n →P (1 + δ)/((2 + δ)θ). However, since (height(PA^{(1,δ)}_n(b)))_{n≥1} is a non-decreasing sequence of random variables, this convergence even holds almost surely, as we argue in more detail below Proposition 8.5.

We formalize the statement that height(PA^{(1,δ)}_n(b)) ≤ (1 − ε)(1 + δ)/((2 + δ)θ) · log n is unlikely in the following proposition:

Proposition 8.5 (Height of PA^{(1,δ)}_n(b) converges in probability) For every ε > 0, there exists an η = η(ε) > 0 such that

P( height(PA^{(1,δ)}_n(b)) ≤ (1 − ε)(1 + δ)/((2 + δ)θ) · log n ) = O(n^{−η}).   (8.1.39)

Proof of the lower bound on height(PA^{(1,δ)}_n(b)) in Theorem 8.3 subject to Proposition 8.5. Fix α > 0, and take t_k = t_k(α) = e^{αk}. For any α > 0, by Proposition 8.5 and the fact that t_k^{−η} is summable, almost surely, height(PA^{(1,δ)}_{t_k}(b)) ≥ (1 − ε)(1 + δ)/((2 + δ)θ) · log t_k for all k large enough. This proves the almost sure lower bound on height(PA^{(1,δ)}_n(b)) along the subsequence (t_k)_{k≥0}. To extend this to an almost sure lower bound as n → ∞, we use that n ↦ height(PA^{(1,δ)}_n(b)) is non-decreasing, so that, for every n ∈ [t_{k−1}, t_k],

height(PA^{(1,δ)}_n(b)) ≥ height(PA^{(1,δ)}_{t_{k−1}}(b)) ≥ (1 − ε)(1 + δ)/((2 + δ)θ) · log t_{k−1} ≥ (1 − ε)(1 − α)(1 + δ)/((2 + δ)θ) · log n,   (8.1.40)

where the second inequality is the almost sure lower bound on height(PA^{(1,δ)}_{t_{k−1}}(b)), and the third follows since log t_{k−1} ≥ (1 − α) log t_k ≥ (1 − α) log n for k large enough. The above bound holds for all ε, α > 0, so that letting ε, α ↓ 0 proves our claim.

Proof of Proposition 8.5. We perform a path-counting argument. We fix T ∈ [n] and k ∈ N. Recall that a path ~π = (π_0, . . . , π_k) is a sequence of vertices. In this section, we assume that π_i > π_{i+1}, since our paths will be part of the scale-free tree PA^{(1,δ)}_n(b). We write ~π ⊆ PA^{(1,δ)}_n(b) for the event that the edge from π_i is connected to π_{i+1} for all i = 0, . . . , k − 1. We let

N_k(n) = #{ ~π ⊆ PA^{(1,δ)}_n(b) : π_k ∈ [T] }   (8.1.41)

denote the number of k-step paths in PA^{(1,δ)}_n(b) with an endpoint in [T]. By Proposition 8.4,

E[N_k(n)] = ((1 + δ)/(2 + δ))^k Σ_{π_k=1}^{T} Γ(π_k)/Γ(π_k + 1/(2+δ)) Σ_{π_0=π_k}^{n} Γ(π_0 + 1/(2+δ))/Γ(π_0 + 1) · Σ_{~π} ∏_{i=1}^{k−1} 1/π_i,   (8.1.42)

where the inner sum is over all ordered ~π = (π_0, . . . , π_k) with the given endpoints π_0 and π_k. We can bound this from below by

E[N_k(n)] ≥ ((1 + δ)/(2 + δ))^k Σ_{π_k=1}^{T} Γ(π_k)/Γ(π_k + 1/(2+δ)) Σ_{π_0=π_k}^{n} Γ(π_0 + 1/(2+δ))/Γ(π_0 + 1) · (1/(k − 1)!) Σ_{~t_k} ∏_{i=1}^{k−1} 1/t_i,   (8.1.43)

where now the sum is over all vectors ~t_k = (t_1, . . . , t_{k−1}) with distinct coordinates t_i ∈ [π_k + 1, π_0 − 1]. For fixed π_0, π_k,

Σ_{~t_k} ∏_{i=1}^{k−1} 1/t_i ≥ ( Σ_{s=π_k+k}^{π_0−1} 1/s )^{k−1}.   (8.1.44)

We can lower bound

Σ_{s=π_k+k}^{π_0−1} 1/s ≥ ∫_{π_k+k}^{π_0} (1/x) dx = log(π_0/(π_k + k)) ≥ (1 − ε) log n,   (8.1.45)

when π_0 ≥ n/2, π_k ≤ T and log[2(T + k)] ≤ ε log n. Thus, we conclude that

E[N_k(n)] ≥ ((1 + δ)/(2 + δ))^k Σ_{π_k=1}^{T} Γ(π_k)/Γ(π_k + 1/(2+δ)) Σ_{π_0=n/2}^{n} Γ(π_0 + 1/(2+δ))/Γ(π_0 + 1) · (1/(k − 1)!) [(1 − ε) log n]^{k−1}.   (8.1.46)

Using [Volume 1, (8.3.9)], we therefore arrive at

E[N_k(n)] ≥ (1 + o(1)) ((1 + δ)/(2 + δ))^k Σ_{π_k=1}^{T} π_k^{−1/(2+δ)} Σ_{π_0=n/2}^{n} π_0^{−(1+δ)/(2+δ)} · (1/(k − 1)!) [(1 − ε) log n]^{k−1}

≥ c ((1 + δ)/(2 + δ))^k T^{(1+δ)/(2+δ)} n^{1/(2+δ)} (1/(k − 1)!) [(1 − ε) log n]^{k−1}

= c T^{(1+δ)/(2+δ)} n^{[1+(1−ε)(1+δ)]/(2+δ)} P( Poi((1+δ)/(2+δ) · (1 − ε) log n) = k − 1 ).   (8.1.47)


When we take k = kn = (1−ε)(1+δ)

(2+δ)θlog n, then

P(Poi(1 + δ

2 + δ(1− ε) log n

)= k − 1

)≥ cn−(1−ε)/

√log n, (8.1.48)

so that

E[Nk(n)] ≥ T (1+δ)/(2+δ)nε/(2+δ)+o(1). (8.1.49)

We next take T = nε to arrive at

E[Nk(n)] ≥ nε+o(1). (8.1.50)
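The bookkeeping in (8.1.47)–(8.1.50) can be sanity-checked numerically: the factorial factor is rewritten as a Poisson point probability via $\lambda^{k-1}/(k-1)! = \mathrm{e}^{\lambda}\,\mathbb{P}(\mathrm{Poi}(\lambda)=k-1)$, and with $T=n^{\varepsilon}$ the exponents combine to $\varepsilon$. A minimal sketch, with illustrative parameter values:

```python
import math

def poisson_pmf(lam, k):
    # P(Poi(lam) = k), computed via logs for numerical stability
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

delta, eps, n, k = 1.0, 0.1, 10**6, 7
lam = (1 + delta) / (2 + delta) * (1 - eps) * math.log(n)

# Rewriting used in (8.1.47): lam^{k-1}/(k-1)! = e^{lam} * P(Poi(lam) = k-1),
# which is where the factor n^{(1-eps)(1+delta)/(2+delta)} in the exponent comes from.
lhs = lam ** (k - 1) / math.factorial(k - 1)
rhs = math.exp(lam) * poisson_pmf(lam, k - 1)
assert abs(lhs - rhs) / lhs < 1e-9

# Exponent bookkeeping in (8.1.49)-(8.1.50): with T = n^eps,
# T^{(1+delta)/(2+delta)} * n^{eps/(2+delta)} = n^{eps}.
expo = eps * (1 + delta) / (2 + delta) + eps / (2 + delta)
assert abs(expo - eps) < 1e-12
```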

This provides the required lower bound on $\mathbb{E}[N_k(n)]$. We defer the proof of an upper bound on $\mathrm{Var}(N_k(n))$ to Section 8.3, where we prove that there exist $C>0$ and $\eta=\eta(\varepsilon)>0$ such that $\mathrm{Var}(N_k(n)) \leq C\,\mathbb{E}[N_k(n)]^2 n^{-\eta}$ (see Lemma 8.12 below). As a result, by the Chebychev inequality ([Volume 1, Theorem 2.18]),
\[
\mathbb{P}(N_k(n)=0) \leq \frac{\mathrm{Var}(N_k(n))}{\mathbb{E}[N_k(n)]^2} \leq C n^{-\eta}.
\tag{8.1.51}
\]
Since $\mathrm{height}(\mathrm{PA}^{(1,\delta)}_n(b)) \geq k$ when $N_k(n)\geq 1$, this proves the claim in Proposition 8.5.

We complete this section by proving Theorems 8.2–8.3:

Proof of Theorems 8.2 and 8.3. We first prove the upper bound on the diameter of $\mathrm{PA}^{(1,\delta)}_n(b)$ in Theorem 8.3, for which we use that
\[
\mathrm{diam}\big(\mathrm{PA}^{(1,\delta)}_n(b)\big) \leq 2\cdot \mathrm{height}\big(\mathrm{PA}^{(1,\delta)}_n(b)\big).
\tag{8.1.52}
\]
Equation (8.1.52) together with the upper bound in Theorem 8.3 imply that
\[
\limsup_{n\to\infty} \frac{\mathrm{diam}(\mathrm{PA}^{(1,\delta)}_n(b))}{\log n} \leq \frac{2(1+\delta)}{(2+\delta)\theta}.
\tag{8.1.53}
\]
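The inequality (8.1.52) holds deterministically for any tree rooted at vertex $1$, since a path between two vertices passes through a common ancestor. The following sketch grows a small preferential-attachment tree (a simplified variant in which vertex $t$ attaches to $v<t$ with probability proportional to $\mathrm{degree}(v)+\delta$; the exact initial condition of model (b) is not reproduced, and all function names are illustrative) and checks the inequality:

```python
import random
from collections import deque

def pa_tree(n, delta, seed=0):
    """Grow a PA(1, delta)-style tree on vertices 1..n: vertex t attaches to
    v in [t-1] with probability proportional to deg(v) + delta.
    (A simplified sketch; the initial condition of model (b) is not exact.)"""
    rng = random.Random(seed)
    deg = [0, 1]          # deg[0] unused; vertex 1 starts with one unit of degree
    parent = [0, 0]       # parent[1] = 0 is a sentinel for the root
    for t in range(2, n + 1):
        total = sum(d + delta for d in deg[1:])
        r, acc, v = rng.uniform(0, total), 0.0, 1
        for w in range(1, t):
            acc += deg[w] + delta
            if r <= acc:
                v = w
                break
        parent.append(v)
        deg[v] += 1
        deg.append(1)
    return parent

def depths(parent):
    d = [0] * len(parent)
    for v in range(2, len(parent)):
        d[v] = d[parent[v]] + 1
    return d

def tree_diameter(parent):
    """Diameter of the tree, via a double breadth-first search."""
    n = len(parent) - 1
    adj = [[] for _ in range(n + 1)]
    for v in range(2, n + 1):
        adj[v].append(parent[v])
        adj[parent[v]].append(v)
    def bfs(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            x = q.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    q.append(y)
        far = max(dist, key=dist.get)
        return far, dist[far]
    a, _ = bfs(1)
    _, diam = bfs(a)
    return diam

parent = pa_tree(300, 0.5)
height = max(depths(parent))
assert tree_diameter(parent) <= 2 * height   # inequality (8.1.52)
```

Across seeds, the ratio diameter/height stays in $[1,2]$: the diameter is at least the depth of the deepest vertex and at most twice it.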

For the lower bound, we use the lower bound on $\mathrm{diam}(\mathrm{PA}^{(1,\delta)}_n(b))$ in Theorem 8.3 and the decomposition of scale-free trees in Theorem 5.4. Theorem 5.4 states that the scale-free tree $\mathrm{PA}^{(1,\delta)}_n(b)$ can be decomposed into two scale-free trees, having a similar distribution as copies $\mathrm{PA}^{(1,\delta)}_{S_1(n)}(b_1)$ and $\mathrm{PA}^{(1,\delta)}_{n-S_1(n)}(b_2)$, where $(\mathrm{PA}^{(1,\delta)}_n(b_1))_{n\geq1}$ and $(\mathrm{PA}^{(1,\delta)}_n(b_2))_{n\geq1}$ are independent scale-free tree processes, and the law of $S_1(t)$ is described in (5.1.29). By this tree decomposition,
\[
\mathrm{diam}\big(\mathrm{PA}^{(1,\delta)}_n(b)\big) \geq \mathrm{height}\big(\mathrm{PA}^{(1,\delta)}_{S_1(n)}(b_1)\big) + \mathrm{height}\big(\mathrm{PA}^{(1,\delta)}_{n-S_1(n)}(b_2)\big).
\tag{8.1.54}
\]
The two trees $(\mathrm{PA}^{(1,\delta)}_n(b_1))_{n\geq1}$ and $(\mathrm{PA}^{(1,\delta)}_n(b_2))_{n\geq1}$ are not exactly equal in distribution to $(\mathrm{PA}^{(1,\delta)}_n(b))_{n\geq1}$, because the initial degrees of the starting vertices at time $n=2$ are different. However, the precise almost-sure scaling in Theorem 5.4 does not depend in a sensitive way on $d_1$ and $d_2$, and also the height of the scale-free tree in Theorem 8.3 does not depend on the starting graphs $\mathrm{PA}^{(1,\delta)}_2(b_1)$ and $\mathrm{PA}^{(1,\delta)}_2(b_2)$ (see the remark below Theorem 8.3). Since $S_1(n)/n \stackrel{\mathrm{a.s.}}{\longrightarrow} U$, with $U$ having a Beta distribution with parameters $a = (3+\delta)/(2+\delta)$ and $b = (1+\delta)/(2+\delta)$, we obtain that $\mathrm{height}(\mathrm{PA}^{(1,\delta)}_{S_1(n)}(b_1))/\log n \stackrel{\mathrm{a.s.}}{\longrightarrow} \frac{1+\delta}{(2+\delta)\theta}$ and

$\mathrm{height}(\mathrm{PA}^{(1,\delta)}_{n-S_1(n)}(b_2))/\log n \stackrel{\mathrm{a.s.}}{\longrightarrow} \frac{1+\delta}{(2+\delta)\theta}$. Thus, we conclude that, almost surely for large enough $n$,
\[
\frac{\mathrm{diam}(\mathrm{PA}^{(1,\delta)}_n(b))}{\log n} \geq (1-\varepsilon)\,\frac{2(1+\delta)}{(2+\delta)\theta}.
\tag{8.1.55}
\]
Combining (8.1.53) and (8.1.55) proves the convergence of $\mathrm{diam}(\mathrm{PA}^{(1,\delta)}_n(b))/\log n$

in Theorem 8.3.

We proceed with the proof of the convergence of $\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(o_1,o_2)/\log n$ in Theorem 8.3. We write
\[
\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(o_1,o_2) = \mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,o_1) + \mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,o_2) - 2\,\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,V),
\tag{8.1.56}
\]
where $V$ is the last vertex that is on both the path from $1$ to $o_1$ and the path from $1$ to $o_2$. The asymptotics of the first two terms are identified in (8.1.5) in Theorem 8.3, so that it suffices to show that $\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,V) = o_{\mathbb{P}}(\log n)$. We defer this result to Lemma 8.13 below, after we have discussed path-counting techniques for preferential attachment models.

To prove Theorem 8.2, we note that the connected components of $\mathrm{PA}^{(1,\delta)}_n$ are similar in distribution to single scale-free trees $\mathrm{PA}^{(1,\delta)}_{t_1}(b_1),\dots,\mathrm{PA}^{(1,\delta)}_{t_{N_n}}(b_{N_n})$, apart from the initial degree of the root. Here $t_i$ denotes the size of the $i$th tree at time $n$, and we recall that $N_n$ denotes the total number of trees at time $n$. Since $N_n/\log n \stackrel{d}{\longrightarrow} (1+\delta)/(2+\delta)$ (recall Exercise 5.23), whp the largest connected component has size at least $\varepsilon n/\log n$. Since
\[
\log\big(\varepsilon n/\log n\big) = \log n\,(1+o(1)),
\tag{8.1.57}
\]
Theorem 8.2 follows along the same lines as in the proof of Theorem 8.3.

8.2 Small-world phenomena in preferential attachment models

In the next sections we investigate distances in preferential attachment models for $m\geq2$. These results are not as complete as those for inhomogeneous random graphs or the configuration model, as discussed in Chapters 6 and 7, respectively. This is partly due to the fact that the dynamic preferential attachment models are substantially harder to analyze than these static models. We investigate both the diameter and typical distances.

By Theorem 5.26, $\mathrm{PA}^{(m,\delta)}_n$ is whp connected when $m\geq2$. Recall that in a connected graph, the typical distance $\mathrm{dist}_{\mathrm{PA}^{(m,\delta)}_n}(o_1,o_2)$ is the graph distance between two vertices chosen uniformly at random from the vertex set $[n]$. Recall further that the power-law degree exponent for $\mathrm{PA}^{(m,\delta)}_n$ equals $\tau = 3+\delta/m$. Therefore, $\tau>3$ precisely when $\delta>0$. For the generalized random graph and the configuration model, we have seen that distances are logarithmic in the size of the graph when $\tau>3$, and doubly logarithmic when $\tau\in(2,3)$. We will see similar results for $\mathrm{PA}^{(m,\delta)}_n$.

Logarithmic distances in PA models with $m\geq2$ and $\delta>0$.

We start by investigating the case where $\delta>0$, so that the power-law degree exponent $\tau=3+\delta/m$ satisfies $\tau>3$. In this case, the typical distance is logarithmic in the size of the graph:

Theorem 8.6 (Logarithmic bounds for typical distances of $\mathrm{PA}^{(m,\delta)}_n$ for $\delta>0$). Fix $m\geq2$ and $\delta>0$. There exists $\nu\in(1,\infty)$ such that, as $n\to\infty$,
\[
\mathrm{dist}_{\mathrm{PA}^{(m,\delta)}_n}(o_1,o_2)/\log_{\nu}(n) \stackrel{\mathbb{P}}{\longrightarrow} 1.
\tag{8.2.1}
\]
We will see that $\nu$ again refers to the exponential growth of an appropriate multi-type branching process, given by the Pólya point tree.

Distances in PA models with $m\geq2$ and $\delta=0$.

For $\delta=0$, $\tau=3$. For $\mathrm{NR}_n(w)$, distances grow as $\log n/\log\log n$ in this case (recall Theorem 6.17). The same turns out to be true for $\mathrm{PA}^{(m,\delta)}_n$:

Theorem 8.7 (Typical distances of $\mathrm{PA}^{(m,\delta)}_n$ for $\delta=0$). Fix $m\geq2$ and $\delta=0$. As $n\to\infty$,
\[
\mathrm{dist}_{\mathrm{PA}^{(m,0)}_n}(o_1,o_2)\,\frac{\log\log n}{\log n} \stackrel{\mathbb{P}}{\longrightarrow} 1.
\tag{8.2.2}
\]
Theorem 8.7 shows that distances for $\tau=3$ in $\mathrm{PA}^{(m,\delta)}_n$ are similar to those in $\mathrm{NR}_n(w)$.

Doubly logarithmic distances in PA models with $m\geq2$ and $\delta<0$.

We close this section by discussing the case where $\delta\in(-m,0)$, so that $\tau\in(2,3)$. In this case, it turns out that distances again grow doubly logarithmically in the size of the graph:

Theorem 8.8 (Doubly logarithmic asymptotics for typical distances for $\delta<0$). Fix $m\geq2$ and assume that $\delta\in(-m,0)$. As $n\to\infty$,
\[
\frac{\mathrm{dist}_{\mathrm{PA}^{(m,\delta)}_n}(o_1,o_2)}{\log\log n} \stackrel{\mathbb{P}}{\longrightarrow} \frac{4}{|\log(\tau-2)|}.
\tag{8.2.3}
\]
Interestingly, the term $4/|\log(\tau-2)|$ appearing in Theorem 8.8 replaces the term $2/|\log(\tau-2)|$ in Theorem 6.3 for the Norros-Reittu model $\mathrm{NR}_n(w)$ and in Theorem 7.2 for the configuration model $\mathrm{CM}_n(d)$ when the power-law exponent $\tau$ satisfies $\tau\in(2,3)$. Thus, typical distances are twice as big for $\mathrm{PA}^{(m,\delta)}_n$ compared to $\mathrm{CM}_n(d)$ with the same power-law exponent. This can be explained intuitively as follows. For the configuration model $\mathrm{CM}_n(d)$, vertices with high degrees are likely to be directly connected (see e.g. Lemma 7.11). For $\mathrm{PA}^{(m,\delta)}_n$, this is not the case. However, vertices with high degrees are likely to be at distance two, as whp there is a young vertex that connects to both of the old high-degree vertices. This makes distances in $\mathrm{PA}^{(m,\delta)}_n$ effectively twice as big as those for $\mathrm{CM}_n(d)$ with the same degree sequence. This effect is special to $\delta<0$ and is studied in more detail in Exercise 8.3.
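The three regimes of Theorems 8.6–8.8 can be restated in a small helper (an illustrative classifier, not part of the text; it merely encodes the statements above):

```python
def distance_regime(m, delta):
    """Typical-distance scaling of PA^(m,delta)_n per Theorems 8.6-8.8 (m >= 2)."""
    assert m >= 2 and delta > -m
    tau = 3 + delta / m
    if delta > 0:        # tau > 3: logarithmic distances (Theorem 8.6)
        return "log n"
    if delta == 0:       # tau = 3 (Theorem 8.7)
        return "log n / log log n"
    # -m < delta < 0: tau in (2, 3); the constant is 4/|log(tau - 2)|,
    # twice the configuration-model constant 2/|log(tau - 2)| (Theorem 8.8).
    return "log log n"

assert distance_regime(2, 1.0) == "log n"
assert distance_regime(2, 0.0) == "log n / log log n"
assert distance_regime(3, -1.5) == "log log n"
```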

Universality in distances for scale-free graphs

The available results are all consistent with the prediction that distances in preferential attachment models have the same asymptotics as distances in the configuration model with the same degree sequence. This suggests a strong form of universality, which is interesting in its own right. However, certain local effects of the graph may change graph distances a little, as exemplified by the fact that distances in Theorem 8.8 are twice as big as for the Norros-Reittu model $\mathrm{NR}_n(w)$ in Theorem 6.3 and for the configuration model $\mathrm{CM}_n(d)$ in Theorem 7.2. This shows that the details of the model are relevant.

Organisation of the proof of small-world properties of the PAM

We prove these results in the following four sections. The proof is organized as follows. We start in Section 8.3 by discussing path-counting techniques for preferential attachment models. While preferential attachment models lack the kind of independence or weak dependence between the edge statuses present in inhomogeneous random graphs and configuration models, it turns out that the probability of the existence of paths can still be bounded from above by products of probabilities, due to an inherent negative correlation between edge statuses. This allows us to still obtain upper bounds for the expected number of paths of given lengths connecting several vertices. Therefore, the lower bounds on typical graph distances can be derived in a highly similar way as for rank-1 inhomogeneous random graphs in Theorem 6.4 or for the configuration model in Theorem 7.5. These bounds also apply to $\delta=0$. For $\delta<0$, instead, we again use a truncated path-counting argument akin to the proofs of Theorems 6.6 and 7.6. The resulting lower bounds are derived in Sections 8.4 and 8.5 for $\delta\geq0$ and $\delta<0$, respectively.

For the upper bounds, instead, we can rely on the weak upper bounds for $m=1$ in Section 8.1 for $\delta>0$, which are obviously not sharp. For $\delta<0$, instead, we again define a notion of a core, as for the configuration model in Theorem 7.9. However, due to the dynamics, the precise definition is more involved and incorporates the dynamics in an appropriate way. The upper bounds are derived in Sections 8.6 and 8.7 for $\delta\geq0$ and $\delta<0$, respectively, completing the proofs of the typical distances in $\mathrm{PA}^{(m,\delta)}_n$. Sub-results will be stated explicitly, as these are sometimes sharper than the results stated in this section and are thus interesting in their own right.

8.3 Path counting in preferential attachment models

In this section we study the probability that a certain path is present in $\mathrm{PA}^{(m,\delta)}_n$. Recall that we call a path $\vec\pi = (\pi_0,\pi_1,\dots,\pi_l)$ self-avoiding when $\pi_i\neq\pi_j$ for all $0\leq i<j\leq l$. The following proposition studies the probability that a path $\vec\pi$ is present in $\mathrm{PA}^{(m,\delta)}_n$:

Proposition 8.9 (Path counting in $\mathrm{PA}^{(m,\delta)}_n$). Denote $\gamma = m/(2m+\delta)$. Fix $l\geq0$ and let $\vec\pi = (\pi_0,\pi_1,\dots,\pi_l)$ be an $l$-step self-avoiding path consisting of the $l+1$ unordered vertices $\pi_0,\pi_1,\dots,\pi_l$. Then, there exists a constant $C>0$ such that, for all $l\geq1$,
\[
\mathbb{P}\big(\vec\pi \subseteq \mathrm{PA}^{(m,\delta)}_n\big) \leq (Cm^2)^{l} \prod_{i=0}^{l-1} \frac{1}{(\pi_i\wedge\pi_{i+1})^{\gamma}(\pi_i\vee\pi_{i+1})^{1-\gamma}}.
\tag{8.3.1}
\]
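The bound (8.3.1) is easy to evaluate numerically. The sketch below (the helper name and the choice $C=1$ are illustrative assumptions, since the proposition does not specify $C$) computes the right-hand side for a candidate path; note that it is invariant under reversing the path, since each factor depends only on the unordered pair $\{\pi_i,\pi_{i+1}\}$:

```python
def path_bound(path, m, delta, C=1.0):
    """Right-hand side of (8.3.1) for a candidate path.
    C stands in for the unspecified absolute constant of Proposition 8.9."""
    gamma = m / (2 * m + delta)
    bound = (C * m * m) ** (len(path) - 1)
    for a, b in zip(path, path[1:]):
        lo, hi = min(a, b), max(a, b)
        bound *= 1.0 / (lo ** gamma * hi ** (1 - gamma))
    return bound

p = [3, 17, 5, 120, 44]
# the bound is symmetric under path reversal
assert abs(path_bound(p, 2, 1.0) - path_bound(p[::-1], 2, 1.0)) < 1e-12
```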

Paths are formed by repeatedly forming edges. When $m=1$, paths always go from younger to older vertices. When $m\geq2$, this monotonicity property of paths is lost, which generally makes proofs harder. We start by investigating intersections of events that specify which edges are present in $\mathrm{PA}^{(m,\delta)}_n$.

We recall some notation. The event that the $j$th edge of vertex $u$ is attached to the earlier vertex $v$, where $u,v\in[n]$, is denoted by
\[
u \xrightarrow{\,j\,} v, \qquad j\in[m].
\tag{8.3.2}
\]
It will often be convenient to translate statements from $\mathrm{PA}^{(m,\delta)}_n$ to $\mathrm{PA}^{(1,\delta/m)}_{nm}$. Indeed, for $\mathrm{PA}^{(m,\delta)}_n$, the event $u \xrightarrow{\,j\,} v$ means that in $\mathrm{PA}^{(1,\delta/m)}_{nm}$, the edge from vertex $m(u-1)+j$ is attached to one of the vertices $[mv]\setminus[m(v-1)]$.

It is a direct consequence of the definition of PA models that the event in (8.3.2) increases the preference for vertex $v$, and hence decreases (in a relative way) the preference for the other vertices in $[n]\setminus\{v\}$. It should be intuitively clear that another way of expressing this effect is to say that, for different $v_1\neq v_2$, the events $u_1 \xrightarrow{\,j_1\,} v_1$ and $u_2 \xrightarrow{\,j_2\,} v_2$ are negatively correlated. We now formalize this result. For an integer $n_v\geq1$, we denote the event that the $j_i^{(v)}$th edge of vertex $u_i^{(v)}$ is attached to the earlier vertex $v$, for every $i\in[n_v]$, by
\[
E_{n_v,v} = \bigcap_{i=1}^{n_v} \Big\{ u_i^{(v)} \xrightarrow{\,j_i^{(v)}\,} v \Big\}.
\tag{8.3.3}
\]
We start by proving that the events $E_{n_v,v}$, for different $v$, are negatively correlated for each choice of $k\geq1$ and all possible choices of $u_i^{(v)}, j_i^{(v)}$:

Lemma 8.10 (Negative correlation for connection of edges). For distinct $v_1,v_2,\dots,v_k\in[n]$ and all $n_{v_1},\dots,n_{v_k}\geq1$, both for $\mathrm{PA}^{(m,\delta)}_n$ and for $\mathrm{PA}^{(m,\delta)}_t(b)$,
\[
\mathbb{P}\Big(\bigcap_{s=1}^{k} E_{n_{v_s},v_s}\Big) \leq \prod_{s=1}^{k} \mathbb{P}\big(E_{n_{v_s},v_s}\big).
\tag{8.3.4}
\]

Proof. We only prove the statement for $\mathrm{PA}^{(m,\delta)}_n$; the proof for $\mathrm{PA}^{(m,\delta)}_t(b)$ is identical. We define the edge number of the event $u \xrightarrow{\,j\,} v$ to be $m(u-1)+j$, which is the order of the edge when we consider the edges as being attached in sequence in $\mathrm{PA}^{(1,\delta/m)}_{mt}$.

We use induction on the largest edge number present in the events $E_{n_{v_1},v_1},\dots,E_{n_{v_k},v_k}$. The induction hypothesis is that (8.3.4) holds for all $k$, all distinct $v_1,v_2,\dots,v_k\in[n]$, all $n_{v_1},\dots,n_{v_k}\geq1$, and all choices of $u_i^{(v_s)}, j_i^{(v_s)}$ such that $\max_{i,s} m(u_i^{(v_s)}-1)+j_i^{(v_s)} \leq e$, where the induction is performed with respect to $e$.

To initialize the induction, we note that for $e=1$ the induction hypothesis holds trivially, since $\bigcap_{s=1}^{k} E_{n_{v_s},v_s}$ can be empty or consist of exactly one event, and in the latter case there is nothing to prove. This initializes the induction.

To advance the induction, we assume that (8.3.4) holds for all $k$, all distinct $v_1,v_2,\dots,v_k\in[n]$, all $n_{v_1},\dots,n_{v_k}\geq1$, and all choices of $u_i^{(v_s)}, j_i^{(v_s)}$ such that $\max_{i,s} m(u_i^{(v_s)}-1)+j_i^{(v_s)} \leq e-1$, and we extend it to all $k$, all distinct $v_1,v_2,\dots,v_k\in[n]$, all $n_{v_1},\dots,n_{v_k}\geq1$, and all choices of $u_i^{(v_s)}, j_i^{(v_s)}$ such that $\max_{i,s} m(u_i^{(v_s)}-1)+j_i^{(v_s)} \leq e$. Clearly, for all $k$, all distinct $v_1,v_2,\dots,v_k\in[n]$, all $n_{v_1},\dots,n_{v_k}\geq1$, and all choices of $u_i^{(v_s)}, j_i^{(v_s)}$ such that $\max_{i,s} m(u_i^{(v_s)}-1)+j_i^{(v_s)} \leq e-1$, the bound follows from the induction hypothesis, so we may restrict attention to the case where $\max_{i,s} m(u_i^{(v_s)}-1)+j_i^{(v_s)} = e$.

We note that there is a unique choice of $u,j$ such that $m(u-1)+j = e$. There are two possibilities: (1) either there is exactly one choice of $s$ and $u_i^{(v_s)}, j_i^{(v_s)}$ such that $u_i^{(v_s)}=u$, $j_i^{(v_s)}=j$, or (2) there are at least two such choices. In the latter case, $\bigcap_{s=1}^{k} E_{n_{v_s},v_s} = \varnothing$, since the $e$th edge is connected to a unique vertex. Hence, there is nothing to prove.

We are left to investigate the case where there exists a unique $s$ and $u_i^{(v_s)}, j_i^{(v_s)}$ such that $u_i^{(v_s)}=u$, $j_i^{(v_s)}=j$. Denote the restriction of $E_{n_{v_s},v_s}$ to all other edges by
\[
E'_{n_{v_s},v_s} = \bigcap_{i\in[n_{v_s}]\colon (u_i^{(v_s)},j_i^{(v_s)})\neq(u,j)} \Big\{ u_i^{(v_s)} \xrightarrow{\,j_i^{(v_s)}\,} v_s \Big\}.
\tag{8.3.5}
\]

Then we can write
\[
\bigcap_{t=1}^{k} E_{n_{v_t},v_t} = \Big\{ u \xrightarrow{\,j\,} v_s \Big\} \cap E'_{n_{v_s},v_s} \cap \bigcap_{t\colon v_t\neq v_s} E_{n_{v_t},v_t}.
\tag{8.3.6}
\]
By construction, all the edge numbers of the events in $E'_{n_{v_s},v_s} \cap \bigcap_{t\colon v_t\neq v_s} E_{n_{v_t},v_t}$ are at most $e-1$.

By conditioning, we obtain
\[
\mathbb{P}\Big(\bigcap_{t=1}^{k} E_{n_{v_t},v_t}\Big) \leq \mathbb{E}\Big[ \mathbb{1}_{E'_{n_{v_s},v_s} \cap \bigcap_{t\colon v_t\neq v_s} E_{n_{v_t},v_t}}\, \mathbb{P}_{e-1}\big(u \xrightarrow{\,j\,} v_s\big) \Big],
\tag{8.3.7}
\]
where $\mathbb{P}_{e-1}$ denotes the conditional probability given the edge attachments up to the $(e-1)$st edge, or, equivalently, given $\mathrm{PA}^{(1,\delta/m)}_{e-1}$, and we have used that the event $E'_{n_{v_s},v_s} \cap \bigcap_{t\colon v_t\neq v_s} E_{n_{v_t},v_t}$ is measurable with respect to $\mathrm{PA}^{(1,\delta/m)}_{e-1}$. We compute
\[
\mathbb{P}_{e-1}\big(u \xrightarrow{\,j\,} v_s\big) = \frac{D_{v_s}(u-1,j-1)+\delta}{z_{u,j}},
\tag{8.3.8}
\]
where we recall that $D_{v_s}(u-1,j-1)$ is the degree of vertex $v_s$ after the first $j-1$ edges of vertex $u$ have been attached, and we write the normalization in (8.3.8) as
\[
z_{u,j} = z_{u,j}(\delta,m) = (2m+\delta)(u-1) + (j-1)(2+\delta/m) + 1+\delta.
\tag{8.3.9}
\]

We wish to use the induction hypothesis. For this, we note that
\[
D_{v_s}(u-1,j-1) = m + \sum_{(u',j')\colon m(u'-1)+j'\leq e-1} \mathbb{1}_{\{u' \xrightarrow{\,j'\,} v_s\}},
\tag{8.3.10}
\]
where we recall that $e-1 = m(u-1)+j-1$.

Each of the events $u' \xrightarrow{\,j'\,} v_s$ in (8.3.10) has edge number strictly smaller than $e$ and occurs with a non-negative multiplicative constant. As a result, we may use the induction hypothesis for each of these terms. Thus, we obtain, using also $m+\delta\geq0$, that
\[
\begin{aligned}
\mathbb{P}\Big(\bigcap_{s=1}^{k} E_{n_{v_s},v_s}\Big) &\leq \frac{m+\delta}{z_{u,j}}\, \mathbb{P}\big(E'_{n_{v_s},v_s}\big) \prod_{t\colon v_t\neq v_s} \mathbb{P}\big(E_{n_{v_t},v_t}\big) \\
&\quad + \sum_{(u',j')\colon m(u'-1)+j'\leq e-1} \frac{\mathbb{P}\big(E'_{n_{v_s},v_s} \cap \{u' \xrightarrow{\,j'\,} v_s\}\big)}{z_{u,j}} \prod_{t\colon v_t\neq v_s} \mathbb{P}\big(E_{n_{v_t},v_t}\big).
\end{aligned}
\tag{8.3.11}
\]
We use (8.3.10) to recombine the above as
\[
\mathbb{P}\Big(\bigcap_{s=1}^{k} E_{n_{v_s},v_s}\Big) \leq \mathbb{E}\Big[\mathbb{1}_{E'_{n_{v_s},v_s}}\, \frac{D_{v_s}(u-1,j-1)+\delta}{z_{u,j}}\Big] \prod_{t\colon v_t\neq v_s} \mathbb{P}\big(E_{n_{v_t},v_t}\big),
\tag{8.3.12}
\]
and the advancement is completed when we note that
\[
\mathbb{E}\Big[\mathbb{1}_{E'_{n_{v_s},v_s}}\, \frac{D_{v_s}(u-1,j-1)+\delta}{z_{u,j}}\Big] = \mathbb{P}\big(E_{n_{v_s},v_s}\big).
\tag{8.3.13}
\]
This advances the induction hypothesis. The claim in Lemma 8.10 then follows by induction.
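The negative correlation of Lemma 8.10 can be probed by simulation. The sketch below uses a simplified $\mathrm{PA}(1,\delta)$-style tree (an illustrative initial condition, not the exact model of the text; all names are hypothetical) and compares the joint frequency of two attachment events with distinct targets against the product of their marginal frequencies:

```python
import random

def attach_parents(n, delta, rng):
    """One run of a PA(1, delta)-style tree; returns parent[t] for t = 2..n.
    Vertex t attaches to v < t with probability proportional to deg(v) + delta."""
    deg = {1: 1}
    parent = {}
    for t in range(2, n + 1):
        total = sum(deg[v] + delta for v in range(1, t))
        r, acc = rng.uniform(0, total), 0.0
        for v in range(1, t):
            acc += deg[v] + delta
            if r <= acc:
                parent[t] = v
                deg[v] += 1
                break
        deg[t] = 1
    return parent

rng = random.Random(1)
runs, delta = 20000, 0.0
both = m1 = m2 = 0
for _ in range(runs):
    par = attach_parents(6, delta, rng)
    e1 = par[4] == 1      # event: vertex 4 attaches to vertex 1
    e2 = par[5] == 2      # event: vertex 5 attaches to vertex 2 (a distinct target)
    both += e1 and e2
    m1 += e1
    m2 += e2
joint, product = both / runs, (m1 / runs) * (m2 / runs)
# Lemma 8.10 predicts joint <= product; allow a small Monte Carlo margin.
assert joint <= product + 0.01
```

Note that the targets must be distinct: edges attaching to the *same* vertex reinforce each other and are positively correlated, which is exactly why the joint events within one $E_{n_v,v}$ are handled by the exact formulas of Lemma 8.11 rather than by a product bound.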

We next study the probabilities of the events $E_{n_v,v}$ when $n_v\leq2$:

Lemma 8.11 (Edge-connection events for at most two edges). Consider $\mathrm{PA}^{(m,\delta)}_n$ and denote $\gamma = m/(2m+\delta)$. There exist absolute constants $M_1=M_1(\delta,m)$, $M_2=M_2(\delta,m)$, such that the following bounds hold:

(i) For $m=1$ and any $u>v$,
\[
\mathbb{P}\big(u \xrightarrow{\,1\,} v\big) = \frac{1+\delta}{2+\delta}\, \frac{\Gamma(u)\,\Gamma(v-\frac{1}{2+\delta})}{\Gamma(u+\frac{1+\delta}{2+\delta})\,\Gamma(v)}.
\tag{8.3.14}
\]
Consequently, for $m\geq2$ and any $1\leq j\leq m$ and $u>v$,
\[
\mathbb{P}\big(u \xrightarrow{\,j\,} v\big) = \frac{m+\delta}{2m+\delta}\, \frac{1}{u^{1-\gamma}v^{\gamma}}\,(1+o(1)) \leq \frac{M_1}{u^{1-\gamma}v^{\gamma}},
\tag{8.3.15}
\]
the asymptotics in the first equality in (8.3.15) referring to the limit when $v$ grows large.

(ii) For $m=1$ and any $u_2>u_1>v$,

\[
\mathbb{P}\big(u_1 \xrightarrow{\,1\,} v,\ u_2 \xrightarrow{\,1\,} v\big) = \frac{1+\delta}{2+\delta}\, \frac{\Gamma(u_2)\,\Gamma(u_1+\frac{1}{2+\delta})\,\Gamma(v+\frac{1+\delta}{2+\delta})}{\Gamma(u_2+\frac{1+\delta}{2+\delta})\,\Gamma(u_1+1)\,\Gamma(v+\frac{3+\delta}{2+\delta})}.
\tag{8.3.16}
\]
Consequently, for $m\geq2$ and any $1\leq j_1,j_2\leq m$ and $u_2>u_1>v$,
\[
\begin{aligned}
\mathbb{P}\big(u_1 \xrightarrow{\,j_1\,} v,\ u_2 \xrightarrow{\,j_2\,} v\big) &\leq \frac{m+\delta}{2m+\delta}\, \frac{m+1+\delta}{2m+\delta}\, \frac{1}{(u_1u_2)^{1-\gamma}v^{2\gamma}}\,(1+o(1)) \\
&\leq \frac{M_2}{(u_1u_2)^{1-\gamma}v^{2\gamma}},
\end{aligned}
\tag{8.3.17}
\]
the asymptotics in the first inequality in (8.3.17) referring to the limit when $v$ grows large.
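The step from the exact Gamma expressions to the power bounds in (8.3.15) and (8.3.17) uses the Stirling-type asymptotics $\Gamma(x)/\Gamma(x+a)\sim x^{-a}$ from [Volume 1, (8.3.9)]. A quick numerical check of the first equality in (8.3.15) for $m=1$ (so $\gamma=1/(2+\delta)$), with illustrative values of $u$, $v$ and $\delta$:

```python
import math

delta = 1.0
gamma = 1.0 / (2.0 + delta)          # m = 1, so gamma = m/(2m + delta) = 1/(2 + delta)
u, v = 10**6, 10**4                  # u > v, both large

# Exact expression (8.3.14), evaluated with log-Gamma for numerical stability.
exact = (1 + delta) / (2 + delta) * math.exp(
    math.lgamma(u) + math.lgamma(v - 1 / (2 + delta))
    - math.lgamma(u + (1 + delta) / (2 + delta)) - math.lgamma(v))

# Asymptotic form from (8.3.15) with m = 1.
asympt = (1 + delta) / (2 + delta) / (u ** (1 - gamma) * v ** gamma)

assert abs(exact / asympt - 1) < 1e-3
```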

Proof. We only prove (8.3.14) and (8.3.16); the bounds in (8.3.15) and (8.3.17) follow immediately from the Stirling-type formula in [Volume 1, (8.3.9)].

By the definition of $(\mathrm{PA}^{(m,\delta)}_n)_{n\geq1}$ in terms of $(\mathrm{PA}^{(1,\delta/m)}_n)_{n\geq1}$, the case $m=1$ implies the result for general $m\geq1$, where the factors of $m$ follow from the fact that vertex $u$ in $\mathrm{PA}^{(m,\delta)}_n$ corresponds to the vertices $m(u-1)+1,\dots,mu$ in $\mathrm{PA}^{(1,\delta/m)}_{mn}$, which are all at most $mu$. Note, in particular, that $u \xrightarrow{\,j\,} v$ for $m\geq2$ in $\mathrm{PA}^{(m,\delta)}_n$ is equivalent to $m(u-1)+j \xrightarrow{\,1\,} [mv]\setminus[m(v-1)]$ in $\mathrm{PA}^{(1,\delta/m)}_{mn}$. We will thus start the proofs of both parts (i) and (ii) with $m=1$, and then use the results obtained to deduce the results for $m\geq2$.

Fix $m=1$ and consider part (i). For (8.3.14), we use [Volume 1, Theorem 8.2] to compute, for $u>v$,
\[
\begin{aligned}
\mathbb{P}\big(u \xrightarrow{\,1\,} v\big) &= \mathbb{E}\big[\mathbb{E}[\mathbb{1}_{\{u \xrightarrow{\,1\,} v\}} \mid \mathrm{PA}^{(1,\delta)}_{u-1}]\big] = \mathbb{E}\Big[\frac{D_v(u-1)+\delta}{(2+\delta)(u-1)+1+\delta}\Big] \\
&= \frac{1+\delta}{(2+\delta)(u-1)+1+\delta}\, \frac{\Gamma(u)\,\Gamma(v-\frac{1}{2+\delta})}{\Gamma(u-1+\frac{1+\delta}{2+\delta})\,\Gamma(v)} \\
&= \frac{1+\delta}{2+\delta}\, \frac{\Gamma(u)\,\Gamma(v-\frac{1}{2+\delta})}{\Gamma(u+\frac{1+\delta}{2+\delta})\,\Gamma(v)},
\end{aligned}
\tag{8.3.18}
\]
as required. The proof of (8.3.15) follows by recalling that $u \xrightarrow{\,j\,} v$ occurs when $m(u-1)+j \xrightarrow{\,1\,} [mv]\setminus[m(v-1)]$. Since $j\in[m]$ and $m\geq2$ is fixed, we obtain (8.3.15).

We proceed with the proof of part (ii) and start with (8.3.16). Take $u_2>u_1$. We compute
\[
\begin{aligned}
\mathbb{P}\big(u_1 \xrightarrow{\,1\,} v,\ u_2 \xrightarrow{\,1\,} v\big) &= \mathbb{E}\Big[\mathbb{P}\big(u_1 \xrightarrow{\,1\,} v,\ u_2 \xrightarrow{\,1\,} v \,\big|\, \mathrm{PA}^{(1,\delta)}_{u_2-1}\big)\Big] \\
&= \mathbb{E}\Big[\mathbb{1}_{\{u_1 \xrightarrow{\,1\,} v\}}\, \frac{D_v(u_2-1)+\delta}{(u_2-1)(2+\delta)+1+\delta}\Big].
\end{aligned}
\tag{8.3.19}
\]

We use the iteration, for $u_2-1\geq u$,
\[
\begin{aligned}
\mathbb{E}\big[\mathbb{1}_{\{u_1 \xrightarrow{\,1\,} v\}}\,(D_v(u)+\delta)\big] &= \Big(1+\frac{1}{(2+\delta)(u-1)+1+\delta}\Big)\, \mathbb{E}\big[\mathbb{1}_{\{u_1 \xrightarrow{\,1\,} v\}}\,(D_v(u-1)+\delta)\big] \\
&= \frac{u}{u-1+\frac{1+\delta}{2+\delta}}\, \mathbb{E}\big[\mathbb{1}_{\{u_1 \xrightarrow{\,1\,} v\}}\,(D_v(u-1)+\delta)\big] \\
&= \frac{\Gamma(u+1)\,\Gamma(u_1+\frac{1+\delta}{2+\delta})}{\Gamma(u+\frac{1+\delta}{2+\delta})\,\Gamma(u_1+1)}\, \mathbb{E}\big[\mathbb{1}_{\{u_1 \xrightarrow{\,1\,} v\}}\,(D_v(u_1)+\delta)\big].
\end{aligned}
\tag{8.3.20}
\]

Therefore,
\[
\begin{aligned}
\mathbb{P}\big(u_1 \xrightarrow{\,1\,} v,\ u_2 \xrightarrow{\,1\,} v\big) &= \frac{1}{(u_2-1)(2+\delta)+1+\delta}\, \frac{\Gamma(u_2)\,\Gamma(u_1+\frac{1+\delta}{2+\delta})}{\Gamma(u_2-1+\frac{1+\delta}{2+\delta})\,\Gamma(u_1+1)}\, \mathbb{E}\big[\mathbb{1}_{\{u_1 \xrightarrow{\,1\,} v\}}\,(D_v(u_1)+\delta)\big] \\
&= \frac{1}{2+\delta}\, \frac{\Gamma(u_2)\,\Gamma(u_1+\frac{1+\delta}{2+\delta})}{\Gamma(u_2+\frac{1+\delta}{2+\delta})\,\Gamma(u_1+1)}\, \mathbb{E}\big[\mathbb{1}_{\{u_1 \xrightarrow{\,1\,} v\}}\,(D_v(u_1)+\delta)\big].
\end{aligned}
\tag{8.3.21}
\]

We are led to compute $\mathbb{E}\big[\mathbb{1}_{\{u_1 \xrightarrow{\,1\,} v\}}\,(D_v(u_1)+\delta)\big]$. We use recursion to obtain
\[
\begin{aligned}
\mathbb{E}\big[\mathbb{1}_{\{u_1 \xrightarrow{\,1\,} v\}}\,(D_v(u_1)+\delta) \,\big|\, \mathrm{PA}^{(1,\delta)}_{u_1-1}\big] &= \mathbb{E}\big[\mathbb{1}_{\{u_1 \xrightarrow{\,1\,} v\}}\,(D_v(u_1)-D_v(u_1-1)) \,\big|\, \mathrm{PA}^{(1,\delta)}_{u_1-1}\big] \\
&\quad + \mathbb{E}\big[\mathbb{1}_{\{u_1 \xrightarrow{\,1\,} v\}}\,(D_v(u_1-1)+\delta) \,\big|\, \mathrm{PA}^{(1,\delta)}_{u_1-1}\big] \\
&= \frac{(D_v(u_1-1)+\delta)(D_v(u_1-1)+1+\delta)}{(u_1-1)(2+\delta)+1+\delta}.
\end{aligned}
\tag{8.3.22}
\]

By [Volume 1, Proposition 8.15],
\[
\mathbb{E}[(D_v(u)+\delta)(D_v(u)+1+\delta)] = \frac{1}{c_2(u)}\,\mathbb{E}[Z_{v,2}(u)] = \frac{c_2(v)}{c_2(u)}\,(2+\delta)(1+\delta).
\tag{8.3.23}
\]
Recalling that $c_k(j) = \Gamma(j+\frac{1+\delta}{2+\delta})/\Gamma(j+\frac{k+1+\delta}{2+\delta})$, this brings us to
\[
\mathbb{E}[(D_v(u)+\delta)(D_v(u)+1+\delta)] = \frac{\Gamma(u+\frac{3+\delta}{2+\delta})\,\Gamma(v+\frac{1+\delta}{2+\delta})}{\Gamma(u+\frac{1+\delta}{2+\delta})\,\Gamma(v+\frac{3+\delta}{2+\delta})}\,(2+\delta)(1+\delta).
\tag{8.3.24}
\]

Consequently,
\[
\begin{aligned}
\mathbb{E}\big[\mathbb{1}_{\{u_1 \xrightarrow{\,1\,} v\}}\,(D_v(u_1)+\delta)\big] &= \frac{\Gamma(u_1+\frac{1}{2+\delta})\,\Gamma(v+\frac{1+\delta}{2+\delta})}{[(u_1-1)(2+\delta)+1+\delta]\,\Gamma(u_1-1+\frac{1+\delta}{2+\delta})\,\Gamma(v+\frac{3+\delta}{2+\delta})}\,(2+\delta)(1+\delta) \\
&= (1+\delta)\, \frac{\Gamma(u_1+\frac{1}{2+\delta})\,\Gamma(v+\frac{1+\delta}{2+\delta})}{\Gamma(u_1+\frac{1+\delta}{2+\delta})\,\Gamma(v+\frac{3+\delta}{2+\delta})}.
\end{aligned}
\tag{8.3.25}
\]

Combining (8.3.21)–(8.3.25), we arrive at
\[
\begin{aligned}
\mathbb{P}\big(u_1 \xrightarrow{\,1\,} v,\ u_2 \xrightarrow{\,1\,} v\big) &= \frac{1+\delta}{2+\delta}\, \frac{\Gamma(u_2)\,\Gamma(u_1+\frac{1+\delta}{2+\delta})}{\Gamma(u_2+\frac{1+\delta}{2+\delta})\,\Gamma(u_1+1)} \times \frac{\Gamma(u_1+\frac{1}{2+\delta})\,\Gamma(v+\frac{1+\delta}{2+\delta})}{\Gamma(u_1+\frac{1+\delta}{2+\delta})\,\Gamma(v+\frac{3+\delta}{2+\delta})} \\
&= \frac{1+\delta}{2+\delta}\, \frac{\Gamma(u_2)\,\Gamma(u_1+\frac{1}{2+\delta})\,\Gamma(v+\frac{1+\delta}{2+\delta})}{\Gamma(u_2+\frac{1+\delta}{2+\delta})\,\Gamma(u_1+1)\,\Gamma(v+\frac{3+\delta}{2+\delta})},
\end{aligned}
\tag{8.3.26}
\]
as claimed in (8.3.16).

The proof of (8.3.17) follows again by recalling that $u \xrightarrow{\,j\,} v$ occurs when $m(u-1)+j \xrightarrow{\,1\,} [mv]\setminus[m(v-1)]$. Now there are two possibilities, depending on whether $m(u_1-1)+j_1 \xrightarrow{\,1\,} v_1$ and $m(u_2-1)+j_2 \xrightarrow{\,1\,} v_2$ for the same $v_1=v_2\in[mv]\setminus[m(v-1)]$, or for two different $v_1,v_2\in[mv]\setminus[m(v-1)]$.

In the case where $v_1=v_2$, we use (8.3.16) to obtain a contribution that is asymptotically equal to
\[
\frac{m}{m^2}\, \frac{m+\delta}{2m+\delta}\, \frac{1}{(u_1u_2)^{1-\gamma}v^{2\gamma}}\,(1+o(1)),
\tag{8.3.27}
\]
where the factor $m$ comes from the $m$ distinct choices for $v_1$, and the factor $1/m^2$ originates since we should multiply $u_1$, $u_2$ and $v$ in (8.3.16) by $m$.

In the case where $v_1\neq v_2$, we use the negative correlation in Lemma 8.10 to bound this contribution from above by the product of the probabilities in (8.3.15), so that this contribution is asymptotically bounded by
\[
\frac{m(m-1)}{m^2}\, \Big(\frac{m+\delta}{2m+\delta}\Big)^2\, \frac{1}{(u_1u_2)^{1-\gamma}v^{2\gamma}}\,(1+o(1)).
\tag{8.3.28}
\]
Summing (8.3.27) and (8.3.28) completes the proof of (8.3.17).

With Lemmas 8.10 and 8.11 in hand, we are ready to prove Proposition 8.9:

Proof of Proposition 8.9. Since $\vec\pi$ is self-avoiding, we can write
\[
\big\{\vec\pi \subseteq \mathrm{PA}^{(m,\delta)}_n\big\} = \bigcap_{s=1}^{k} E_{n_{v_s},v_s},
\tag{8.3.29}
\]
where either
\[
E_{n_{v_s},v_s} = \big\{u \xrightarrow{\,j\,} v_s\big\}
\tag{8.3.30}
\]
for some $u>v_s$ and some $j\in[m]$, or
\[
E_{n_{v_s},v_s} = \big\{u_1 \xrightarrow{\,j_1\,} v_s,\ u_2 \xrightarrow{\,j_2\,} v_s\big\},
\tag{8.3.31}
\]
for some $u_1,u_2>v_s$ and some $j_1,j_2\in[m]$.

In the first case, by (8.3.15),
\[
\mathbb{P}\big(E_{n_{v_s},v_s}\big) = \mathbb{P}\big(u \xrightarrow{\,j\,} v_s\big) \leq \frac{M_1}{u^{1-\gamma}v_s^{\gamma}},
\tag{8.3.32}
\]
whereas in the second case, according to (8.3.17),
\[
\mathbb{P}\big(E_{n_{v_s},v_s}\big) = \mathbb{P}\big(u_1 \xrightarrow{\,j_1\,} v_s,\ u_2 \xrightarrow{\,j_2\,} v_s\big) \leq \frac{M_2}{(u_1u_2)^{1-\gamma}v_s^{2\gamma}} = \frac{M_2}{u_1^{1-\gamma}v_s^{\gamma}\, u_2^{1-\gamma}v_s^{\gamma}}.
\tag{8.3.33}
\]
In both cases $M_i$, $i=1,2$, is an absolute constant. Lemma 8.10 then yields (8.3.1), where the factor $m^{2l}$ originates from the number of possible choices of $j_i\in[m]$ for $i\in[k]$ and of the possible $s_i$ that are collapsed to the same vertex.

We close this section by proving some results that have been used earlier. The following variance estimate was used in the proof of Proposition 8.5:

Lemma 8.12 (A variance estimate on $N_k(n)$). Recall the definition of $N_k(n)$ in (8.1.41) and let $T=n^{\varepsilon}$ for some $\varepsilon>0$. Then there exist constants $C>0$ and $\eta=\eta(\varepsilon)>0$ such that
\[
\mathrm{Var}(N_k(n)) \leq C\,\mathbb{E}[N_k(n)]^2\, n^{-\eta}.
\tag{8.3.34}
\]

Proof. By Exercise 8.4, when the path $\vec\pi=(\pi_0,\dots,\pi_k)$ is completely disjoint from $\vec\rho=(\rho_0,\dots,\rho_k)$,
\[
\mathbb{P}\Big(\bigcap_{i=0}^{k-1}\{\pi_i \to \pi_{i+1}\} \cap \bigcap_{i=0}^{k-1}\{\rho_i \to \rho_{i+1}\}\Big) \leq \mathbb{P}\Big(\bigcap_{i=0}^{k-1}\{\pi_i \to \pi_{i+1}\}\Big)\, \mathbb{P}\Big(\bigcap_{i=0}^{k-1}\{\rho_i \to \rho_{i+1}\}\Big).
\tag{8.3.35}
\]
Therefore, the indicators of disjoint paths are negatively correlated. As a result, we can bound
\[
\mathrm{Var}(N_k(n)) \leq \sum_{\vec\pi,\vec\rho\colon \vec\pi\cap\vec\rho\neq\varnothing} \mathbb{P}\big(\vec\pi,\vec\rho \subseteq \mathrm{PA}^{(1,\delta)}_n(b)\big).
\tag{8.3.36}
\]
Since $m=1$, the paths $\vec\pi,\vec\rho$ of length $k$ must merge at some point, before moving off to their common end point in $[n]$. When $\vec\rho=\vec\pi$, we obtain a contribution $\mathbb{E}[N_k(n)]$, so that from now on we assume that $\vec\pi\neq\vec\rho$.

Write $\vec\pi=(\pi_0,\dots,\pi_k)$ and $\vec\rho=(\rho_0,\dots,\rho_k)$ with $\vec\pi\neq\vec\rho$. Then there must be an $l\in[k-1]$ such that $\pi_j=\rho_j$ for all $j=l,\dots,k$. Fix two such paths $\vec\pi$ and $\vec\rho$ for which $\pi_j=\rho_j$ for all $j=l,\dots,k$, while $\pi_{l-1}\neq\rho_{l-1}$. Then, by Lemma 8.10,
\[
\begin{aligned}
\mathbb{P}\big(\vec\pi,\vec\rho \subseteq \mathrm{PA}^{(1,\delta)}_n(b)\big) &\leq \prod_{i=1}^{l-1} \mathbb{P}(\pi_{i-1}\to\pi_i)\,\mathbb{P}(\rho_{i-1}\to\rho_i) \\
&\quad\times \mathbb{P}(\pi_{l-1},\rho_{l-1}\to\pi_l) \prod_{j=l+1}^{k} \mathbb{P}(\pi_{j-1}\to\pi_j).
\end{aligned}
\tag{8.3.37}
\]
By (8.1.9) in Proposition 8.4,
\[
\prod_{i=1}^{l-1} \mathbb{P}(\pi_{i-1}\to\pi_i) = \Big(\frac{1+\delta}{2+\delta}\Big)^{l-1} \frac{\Gamma(\pi_0+\frac{1}{2+\delta})\,\Gamma(\pi_{l-1})}{\Gamma(\pi_{l-1}+\frac{1}{2+\delta})\,\Gamma(\pi_0+1)} \prod_{i=1}^{l-2}\frac{1}{\pi_i}.
\tag{8.3.38}
\]
By symmetry, we may assume without loss of generality that $\pi_{l-1}>\rho_{l-1}$. Then, by (8.3.16),
\[
\mathbb{P}(\pi_{l-1},\rho_{l-1}\to\pi_l) = (1+\delta)\, \frac{\Gamma(\rho_{l-1}-\frac{\delta}{2+\delta})\,\Gamma(\pi_{l-1}-\frac{1+\delta}{2+\delta})\,\Gamma(\pi_l)}{\Gamma(\rho_{l-1}+\frac{1}{2+\delta})\,\Gamma(\pi_{l-1})\,\Gamma(\pi_l+\frac{2}{2+\delta})}.
\tag{8.3.39}
\]

As a result,
\[
\begin{aligned}
\mathbb{P}\big(\vec\pi,\vec\rho \subseteq \mathrm{PA}^{(1,\delta)}_n(b)\big) &\leq \Big(\frac{1+\delta}{2+\delta}\Big)^{l+k-3} \frac{\Gamma(\pi_0+\frac{1}{2+\delta})\,\Gamma(\pi_{l-1})}{\Gamma(\pi_{l-1}+\frac{1}{2+\delta})\,\Gamma(\pi_0+1)}\, \frac{\Gamma(\rho_0+\frac{1}{2+\delta})\,\Gamma(\rho_{l-1})}{\Gamma(\rho_{l-1}+\frac{1}{2+\delta})\,\Gamma(\rho_0+1)} \\
&\quad\times (1+\delta)\, \frac{\Gamma(\rho_{l-1}-\frac{\delta}{2+\delta})\,\Gamma(\pi_{l-1}-\frac{1+\delta}{2+\delta})\,\Gamma(\pi_l)}{\Gamma(\rho_{l-1}+\frac{1}{2+\delta})\,\Gamma(\pi_{l-1})\,\Gamma(\pi_l+\frac{2}{2+\delta})} \prod_{i=1}^{l-2}\frac{1}{\pi_i\rho_i} \\
&\quad\times \frac{\Gamma(\pi_l+\frac{1}{2+\delta})\,\Gamma(\pi_k)}{\Gamma(\pi_k+\frac{1}{2+\delta})\,\Gamma(\pi_l+1)} \prod_{i=l+1}^{k-1}\frac{1}{\pi_i} \\
&= (1+\delta)\Big(\frac{1+\delta}{2+\delta}\Big)^{l+k-3} \frac{\Gamma(\pi_0+\frac{1}{2+\delta})}{\Gamma(\pi_0+1)}\, \frac{\Gamma(\rho_0+\frac{1}{2+\delta})}{\Gamma(\rho_0+1)}\, \frac{\Gamma(\rho_{l-1})\,\Gamma(\rho_{l-1}-\frac{\delta}{2+\delta})}{\Gamma(\rho_{l-1}+\frac{1}{2+\delta})\,\Gamma(\rho_{l-1}+\frac{1}{2+\delta})} \\
&\quad\times \frac{\Gamma(\pi_l+\frac{1}{2+\delta})}{\Gamma(\pi_l+\frac{2}{2+\delta})}\, \frac{\Gamma(\pi_k)}{\Gamma(\pi_k+\frac{1}{2+\delta})}\, \frac{1}{\pi_{l-1}-\frac{1+\delta}{2+\delta}} \prod_{i=1}^{l-2}\frac{1}{\pi_i\rho_i} \prod_{i=l}^{k-1}\frac{1}{\pi_i}.
\end{aligned}
\tag{8.3.40}
\]
By [Volume 1, (8.3.9)], this can be bounded by
\[
C\Big(\frac{1+\delta}{2+\delta}\Big)^{l+k} (\pi_0\rho_0)^{-(1+\delta)/(2+\delta)}\, (\pi_l\pi_k)^{-1/(2+\delta)} \prod_{i=1}^{l-1}\frac{1}{\rho_i} \prod_{i=1}^{k-1}\frac{1}{\pi_i},
\tag{8.3.41}
\]
where $C$ is a uniform constant. We need to sum the above over $l<k$, over all decreasing $\vec\pi=(\pi_0,\dots,\pi_k)$ with $\pi_k\in[T]$, and over all decreasing $(\rho_0,\dots,\rho_{l-1})$ with $\rho_{l-1}>\pi_l$.

The sum can be bounded from above by summing over all decreasing $(\rho_0,\dots,\rho_{l-1})$ and bounding $\pi_l^{-1/(2+\delta)}\leq1$; as in (8.1.43), for each fixed $l$ this yields an upper bound of the form
\[
\mathbb{E}[N_k(n)]\, \Big(\frac{1+\delta}{2+\delta}\Big)^{l}\, \frac{(\log n)^{l}}{l!} \sum_{s\in[T]} s^{-(1+\delta)/(2+\delta)}.
\tag{8.3.42}
\]
Therefore,
\[
\mathrm{Var}(N_k(n)) \leq \mathbb{E}[N_k(n)] + \mathbb{E}[N_k(n)] \sum_{l=1}^{k-1} \Big(\frac{1+\delta}{2+\delta}\Big)^{l}\, \frac{(\log n)^{l}}{l!} \sum_{s\in[T]} s^{-(1+\delta)/(2+\delta)}.
\tag{8.3.43}
\]
When $T=n^{\varepsilon}$, it is not hard to adapt the arguments in (8.1.43)–(8.1.49) to show that this is at most $\mathbb{E}[N_k(n)]^2 n^{-\eta}$ for some $\eta=\eta(\varepsilon)>0$, as required.

We use similar means as in the above argument to show the missing ingredient $\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,V) = o_{\mathbb{P}}(\log n)$ in the proof of Theorem 8.3:

Lemma 8.13 (Distance to most-recent common ancestor). Fix $o_1,o_2$ to be two vertices in $[n]$ chosen independently and uniformly at random, and let $V$ be the last vertex that the path from $1$ to $o_1$ and that from $1$ to $o_2$ have in common in $\mathrm{PA}^{(1,\delta)}_n(b)$. Then,
\[
\frac{\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,V)}{\log n} \stackrel{\mathbb{P}}{\longrightarrow} 0.
\tag{8.3.44}
\]

Proof. We will show that, for every $\varepsilon>0$,
\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,V) > \varepsilon\log n\big) = o(1).
\tag{8.3.45}
\]
By definition,
\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,V) > \varepsilon\log n\big) = \frac{1}{n^2} \sum_{k>\varepsilon\log n}\ \sum_{v,u_1,u_2,s_1,s_2} \mathbb{P}\big(\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,v)=k,\ s_1,s_2\to v,\ s_1\leftrightarrow u_1,\ s_2\leftrightarrow u_2\big).
\tag{8.3.46}
\]

We use Lemma 8.10 to bound this by
\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,V) > \varepsilon\log n\big) \leq \frac{1}{n^2} \sum_{k>\varepsilon\log n}\ \sum_{v,u_1,u_2,s_1,s_2} \mathbb{P}(\mathrm{dist}(1,v)=k)\, \mathbb{P}(s_1,s_2\to v)\, \mathbb{P}(s_1\leftrightarrow u_1)\, \mathbb{P}(s_2\leftrightarrow u_2),
\tag{8.3.47}
\]
where we also rely on (8.1.11) to note that
\[
\mathbb{P}(\mathrm{dist}(s_1,u_1)=l) = \sum_{\pi_1,\dots,\pi_{l-1}} \mathbb{P}(s_1\to\pi_1\to\cdots\to\pi_{l-1}\to u_1) = \sum_{\pi_1,\dots,\pi_{l-1}} \mathbb{P}(s_1\to\pi_1)\,\mathbb{P}(\pi_{l-1}\to u_1) \prod_{i=1}^{l-2} \mathbb{P}(\pi_i\to\pi_{i+1}).
\tag{8.3.48}
\]
This allows us to recombine products of probabilities into connection events.

By (8.1.9) in Proposition 8.4,
\[
\begin{aligned}
\mathbb{P}(\mathrm{dist}(s_1,u_1)=l) &= \Big(\frac{1+\delta}{2+\delta}\Big)^{l} \frac{\Gamma(u_1+\frac{1}{2+\delta})\,\Gamma(s_1)}{\Gamma(s_1+\frac{1}{2+\delta})\,\Gamma(u_1+1)} \sum_{\vec\pi_l}\prod_{i=1}^{l-1}\frac{1}{\pi_i} \\
&\leq \Big(\frac{1+\delta}{2+\delta}\Big)^{l} \frac{\Gamma(u_1+\frac{1}{2+\delta})\,\Gamma(s_1)}{\Gamma(s_1+\frac{1}{2+\delta})\,\Gamma(u_1+1)}\, \frac{\log(u_1/s_1)^{l-1}}{(l-1)!}.
\end{aligned}
\tag{8.3.49}
\]

Summing this over $l\geq1$ leads to
\[
\mathbb{P}(s_1\leftrightarrow u_1) \leq C\Big(\frac{u_1}{s_1}\Big)^{(1+\delta)/(2+\delta)} \frac{\Gamma(u_1+\frac{1}{2+\delta})\,\Gamma(s_1)}{\Gamma(s_1+\frac{1}{2+\delta})\,\Gamma(u_1+1)} \leq C\Big(\frac{u_1}{s_1}\Big)^{(1+\delta)/(2+\delta)} s_1^{-1/(2+\delta)}\, u_1^{-(1+\delta)/(2+\delta)} = \frac{C}{s_1}.
\tag{8.3.50}
\]
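The summation over $l\geq1$ behind (8.3.50) is an exponential series: $\sum_{l\geq1} \big(\frac{1+\delta}{2+\delta}\big)^l \frac{\log(u_1/s_1)^{l-1}}{(l-1)!} = \frac{1+\delta}{2+\delta}\,(u_1/s_1)^{(1+\delta)/(2+\delta)}$. A numerical confirmation, with illustrative values:

```python
import math

delta, u1, s1 = 0.5, 4096, 16
a = (1 + delta) / (2 + delta)
L = math.log(u1 / s1)

# partial sum of a^l * L^(l-1)/(l-1)! over l = 1, 2, ...; the tail is negligible
series = sum(a ** l * L ** (l - 1) / math.factorial(l - 1) for l in range(1, 60))
closed = a * (u1 / s1) ** a          # = a * exp(a * L)
assert abs(series - closed) < 1e-9
```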

Thus, we obtain
\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,V) > \varepsilon\log n\big) \leq \frac{C}{n^2} \sum_{k>\varepsilon\log n}\ \sum_{v,s_1,s_2} \mathbb{P}(\mathrm{dist}(1,v)=k)\, \mathbb{P}(s_1,s_2\to v)\, \frac{(n-s_1)(n-s_2)}{s_1s_2}.
\tag{8.3.51}
\]

By (8.3.17) in Lemma 8.11, we can further bound this by
\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,V) > \varepsilon\log n\big) \leq C \sum_{k>\varepsilon\log n}\ \sum_{v,s_1,s_2} \mathbb{P}(\mathrm{dist}(1,v)=k)\, \frac{1}{(s_1s_2)^{2-\gamma}\,v^{2\gamma}},
\tag{8.3.52}
\]
with the restriction that $v<s_1,s_2$. Summing out over $s_1,s_2>v$ and using that $2-\gamma>1$ leads to
\[
\mathbb{P}\big(\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,V) > \varepsilon\log n\big) \leq C \sum_{k>\varepsilon\log n}\ \sum_{v} \mathbb{P}(\mathrm{dist}(1,v)=k)\, \frac{1}{v^2}.
\tag{8.3.53}
\]

Again using (8.3.49), we are led to
\[
\begin{aligned}
\mathbb{P}\big(\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,V) > \varepsilon\log n\big) &\leq C \sum_{k>\varepsilon\log n}\ \sum_{v} \Big(\frac{1+\delta}{2+\delta}\Big)^{k}\, \frac{\Gamma(v+\frac{1}{2+\delta})}{\Gamma(v+1)}\, \frac{(\log v)^{k-1}}{(k-1)!}\, \frac{1}{v^2} \\
&\leq C \sum_{k>\varepsilon\log n}\ \sum_{v} \Big(\frac{1+\delta}{2+\delta}\Big)^{k}\, \frac{(\log v)^{k-1}}{(k-1)!}\, \frac{1}{v^{2+(1+\delta)/(2+\delta)}} \\
&\leq C\, \mathbb{P}\big(\mathrm{Poi}\big((1+\delta)(\log W)/(2+\delta)\big) \geq \varepsilon\log n\big),
\end{aligned}
\tag{8.3.54}
\]
where $\mathbb{P}(W=v) = A/v^{2+(1+\delta)/(2+\delta)}$ for an appropriate $A>0$ and for all $v\geq1$. Since $(1+\delta)(\log W)/(2+\delta)$ is a finite random variable, this probability vanishes, as required.

In Exercise 8.8, you are asked to conclude from the above proof that $\mathrm{dist}_{\mathrm{PA}^{(1,\delta)}_n(b)}(1,V)$ is a tight sequence of random variables.

8.4 Small-world effect in PAMs: logarithmic lower bounds for δ ≥ 0

In this section we prove lower bounds on distances in $\mathrm{PA}^{(m,\delta)}_n$ with $m\geq2$, using the bounds derived in Section 8.3 on the probability that a path exists in $\mathrm{PA}^{(m,\delta)}_n$, which is our main tool in this section. We prove the lower bounds on distances for $\delta>0$ in Section 8.4.1 and for $\delta=0$ in Section 8.4.2. The doubly logarithmic lower bounds on distances for $\delta<0$ are deferred to Section 8.5.

8.4.1 Logarithmic lower bounds on distances for δ > 0

In this section we investigate logarithmic lower bounds on the distances when δ > 0, in which case γ = m/(2m + δ) < 1/2, as stated in the lower bound in Theorem 8.6. We start by proving a lower bound with a suboptimal constant. By Proposition 8.9,

P(dist_{PA^{(m,δ)}_n}(1, n) = k) ≤ c^k ∑_{~π} ∏_{j=0}^{k−1} 1/((π_j ∧ π_{j+1})^γ (π_j ∨ π_{j+1})^{1−γ}),   (8.4.1)

where c = m²C, and where the sum is over all self-avoiding paths ~π = (π₀, . . . , π_k) with π_k = n and π₀ = 1. Define

f_k(i, n) = ∑_{~π} ∏_{j=0}^{k−1} 1/((π_j ∧ π_{j+1})^γ (π_j ∨ π_{j+1})^{1−γ}),   (8.4.2)

where now the sum is over all self-avoiding ~π = (π₀, . . . , π_k) with π_k = n and π₀ = i, so that

P(dist_{PA^{(m,δ)}_n}(i, n) = k) ≤ c^k f_k(i, n).   (8.4.3)

We study the function f_k(i, n) in the following lemma, which shows that k ↦ f_k(i, n) grows at most exponentially, and gives uniform control over the dependence on i and n as well:

Lemma 8.14 (A bound on f_k) Fix γ < 1/2. Then, for every b > γ such that γ + b < 1, there exists a C_{γ,b} > 0 such that, for every 1 ≤ i < n and all k ≥ 1,

f_k(i, n) ≤ C^k_{γ,b}/(i^b n^{1−b}).   (8.4.4)

We will prove sharper asymptotics for the graph distances in PA^{(m,δ)}_n in the sequel. However, Lemma 8.14 turns out to be convenient in that proof as well.

Proof We prove the lemma using induction on k ≥ 1. To initialize the induction hypothesis, we note that, for 1 ≤ i < n and every b ≥ γ,

f₁(i, n) = 1/((i ∧ n)^γ (i ∨ n)^{1−γ}) = 1/(i^γ n^{1−γ}) = (1/n)(n/i)^γ ≤ (1/n)(n/i)^b = 1/(i^b n^{1−b}).   (8.4.5)

This initializes the induction hypothesis whenever C_{γ,b} ≥ 1.


To advance the induction hypothesis, note that

f_k(i, n) ≤ ∑_{s=1}^{i−1} 1/(s^γ i^{1−γ}) f_{k−1}(s, n) + ∑_{s=i+1}^{∞} 1/(i^γ s^{1−γ}) f_{k−1}(s, n).   (8.4.6)

We now bound each of these two contributions in (8.4.6), making use of the induction hypothesis. We bound the first sum in (8.4.6) by

∑_{s=1}^{i−1} 1/(s^γ i^{1−γ}) f_{k−1}(s, n) ≤ C^{k−1}_{γ,b} ∑_{s=1}^{i−1} 1/(s^γ i^{1−γ}) · 1/(s^b n^{1−b}) = C^{k−1}_{γ,b}/(i^{1−γ} n^{1−b}) ∑_{s=1}^{i−1} 1/s^{γ+b}   (8.4.7)
  ≤ (1/(1 − γ − b)) · C^{k−1}_{γ,b}/(i^b n^{1−b}),

since γ + b < 1. We bound the second sum in (8.4.6) by

∑_{s=i+1}^{∞} 1/(i^γ s^{1−γ}) f_{k−1}(s, n) ≤ C^{k−1}_{γ,b} ∑_{s=i+1}^{n−1} 1/(i^γ s^{1−γ}) · 1/(s^b n^{1−b}) + C^{k−1}_{γ,b} ∑_{s=n+1}^{∞} 1/(i^γ s^{1−γ}) · 1/(n^b s^{1−b})
  = C^{k−1}_{γ,b}/(i^γ n^{1−b}) ∑_{s=i+1}^{n−1} 1/s^{1−γ+b} + C^{k−1}_{γ,b}/(i^γ n^b) ∑_{s=n+1}^{∞} 1/s^{2−γ−b}
  ≤ (1/(b − γ)) · C^{k−1}_{γ,b}/(i^b n^{1−b}) + (1/(1 − γ − b)) · C^{k−1}_{γ,b}/(i^b n^{1−b}),   (8.4.8)

since 1 − γ + b > 1, 2 − γ − b > 1, b > γ and (n/i)^γ ≤ (n/i)^b. We conclude that

f_k(i, n) ≤ C^{k−1}_{γ,b}/(i^b n^{1−b}) · (1/(b − γ) + 2/(1 − γ − b)) ≤ C^k_{γ,b}/(i^b n^{1−b}),   (8.4.9)

whenever

C_{γ,b} = 1/(b − γ) + 2/(1 − γ − b) ≥ 1.   (8.4.10)

This advances the induction hypothesis, and completes the proof of Lemma 8.14.
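The induction in Lemma 8.14 can also be checked numerically. The sketch below (an illustration, not part of the text; the parameter values are arbitrary choices satisfying γ < b and γ + b < 1) iterates the relaxed recursion (8.4.6), which upper-bounds f_k(i, n) by dropping the self-avoidance constraint and truncating the infinite sum, and verifies the claimed bound C^k_{γ,b}/(i^b n^{1−b}) with the constant from (8.4.10):

```python
# Numerical sanity check of Lemma 8.14 (illustrative parameter choices).
gamma, b = 0.3, 0.4
n, N = 50, 400                      # endpoint vertex n; truncate sums at N

def p(a, c):
    """Edge weight 1 / ((a ∧ c)^gamma (a ∨ c)^(1 - gamma))."""
    lo, hi = min(a, c), max(a, c)
    return lo ** (-gamma) * hi ** (gamma - 1.0)

C = 1.0 / (b - gamma) + 2.0 / (1.0 - gamma - b)   # the constant of (8.4.10)

# F[i] holds the relaxed f_k(i, n); start from paths of length k = 1.
F = [0.0] + [p(i, n) for i in range(1, N + 1)]
for k in range(1, 5):
    for i in range(1, n):
        assert F[i] <= C ** k / (i ** b * n ** (1 - b))
    if k < 4:
        # advance (8.4.6): f_{k+1}(i, n) <= sum_s p(i, s) f_k(s, n)
        F = [0.0] + [sum(p(i, s) * F[s] for s in range(1, N + 1) if s != i)
                     for i in range(1, N + 1)]
print("bound of Lemma 8.14 holds for k = 1, ..., 4")
```

Truncating the sums at N only decreases the left-hand side, so the check is conservative.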

We next extend the above discussion to typical distances:

Lemma 8.15 (Typical distances for δ > 0) Fix m ≥ 1 and δ > 0. Let dist_{PA^{(m,δ)}_n}(o₁, o₂) be the distance between two uniformly chosen vertices in [n]. Then, whp, dist_{PA^{(m,δ)}_n}(o₁, o₂) ≥ c₂ log n for c₂ = c₂(m, δ) > 0 sufficiently small.

Proof By Lemma 8.14, with K = log(cC_{γ,b} ∨ 2) and γ < b < 1 − γ, and for all 1 ≤ i < j ≤ n,

P(dist_{PA^{(m,δ)}_n}(i, j) = k) ≤ c^k f_k(i, j) ≤ e^{Kk}/(i^b j^{1−b}).   (8.4.11)

As a result,

P(dist_{PA^{(m,δ)}_n}(i, j) ≤ c₂ log n) ≤ n^{Kc₂}/(i^b j^{1−b}) · e^K/(e^K − 1),   (8.4.12)


and thus, using also ∑_{i=1}^{j−1} i^{−b} ≤ j^{1−b}/(1 − b),

P(dist_{PA^{(m,δ)}_n}(o₁, o₂) ≤ c₂ log n) = (2/n²) ∑_{1≤i<j≤n} P(dist_{PA^{(m,δ)}_n}(i, j) ≤ c₂ log n)   (8.4.13)
  ≤ (2/n²) ∑_{1≤i<j≤n} n^{Kc₂}/(i^b j^{1−b}) · e^K/(e^K − 1) = O(n^{Kc₂−1}) = o(1),

for every c₂ > 0 such that Kc₂ + 1 < 2.
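The logarithmic growth of typical distances for δ > 0 can be illustrated by simulation. The sketch below is a simplified variant chosen for illustration only: each new vertex attaches m edges to earlier vertices with probability proportional to degree plus δ, whereas the book's PA^{(m,δ)}_n differs in details such as self-loops and intermediate degree updates. It grows such a graph and measures the distance between two uniform vertices by breadth-first search:

```python
import random
from collections import deque

random.seed(1)
m, delta, n = 2, 1.0, 1500

# Grow a simplified preferential attachment graph on [n]: vertex t attaches
# m edges, each to a vertex v < t chosen with probability ~ (degree of v) + delta.
deg = {1: 0}
adj = {v: [] for v in range(1, n + 1)}
for t in range(2, n + 1):
    older = list(range(1, t))
    weights = [deg[v] + delta for v in older]
    deg[t] = 0
    for _ in range(m):
        v = random.choices(older, weights=weights)[0]
        adj[t].append(v)
        adj[v].append(t)
        deg[v] += 1
        deg[t] += 1
        weights[v - 1] += 1          # keep weights in sync with degrees

def dist(s, t):
    """Graph distance via BFS; the graph is connected by construction."""
    seen, queue = {s: 0}, deque([s])
    while queue:
        x = queue.popleft()
        if x == t:
            return seen[x]
        for y in adj[x]:
            if y not in seen:
                seen[y] = seen[x] + 1
                queue.append(y)
    return -1

o1, o2 = random.randrange(1, n + 1), random.randrange(1, n + 1)
print("distance between two uniform vertices:", dist(o1, o2))
```

With δ > 0, measured distances in such simulations grow logarithmically in n, in line with the c₂ log n lower bound (and the matching upper bounds of Section 8.6).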

8.4.2 Extension to lower bounds on distances for δ = 0 and m ≥ 2

In this section we investigate lower bounds on the distances in PA^{(m,δ)}_n when m ≥ 2 and δ = 0, and prove the lower bound in Theorem 8.7.

We again start from Proposition 8.9, which, as we will show, implies that for δ = 0,

k*_n = log n / log(2Cm² log n)   (8.4.14)

is a lower bound for the typical distances in PA^{(m,δ)}_n. Consider a path ~π = (π₀, π₁, . . . , π_k); then (8.3.1) in Proposition 8.9 implies that

P(~π ⊆ PA^{(m,δ)}_n) ≤ (Cm²)^k ∏_{j=0}^{k−1} 1/√(π_j π_{j+1}) = ((Cm²)^k/√(π₀π_k)) ∏_{j=1}^{k−1} 1/π_j.   (8.4.15)

We use ∑_{a=1}^{n} 1/√a ≤ ∫₀^n 1/√x dx = 2√n to compute that

P(dist_{PA^{(m,δ)}_n}(o₁, o₂) = k) ≤ (4(Cm²)^k/n²) ∑_{~π} (1/√(π₀π_k)) ∏_{j=1}^{k−1} 1/π_j   (8.4.16)
  ≤ (4(Cm²)^k/n) ∑_{1≤π₁,...,π_{k−1}≤n} ∏_{j=1}^{k−1} 1/π_j.

Thus,

P(dist_{PA^{(m,δ)}_n}(o₁, o₂) = k) ≤ (4(Cm²)^k/n) (∑_{s=1}^{n} 1/s)^{k−1}   (8.4.17)
  ≤ (4(Cm²)^k/n) (log n)^{k−1} ≤ 4(1/2)^k (log n)^{−1} → 0,

precisely when (2Cm² log n)^k ≤ n or, equivalently,

k ≤ log n / log(2Cm² log n).   (8.4.18)


Equality in (8.4.18) holds for k = k*_n in (8.4.14). This implies that the typical distances are whp at least k*_n in (8.4.14), and completes the proof of the lower bound on the graph distances for δ = 0 in Theorem 8.7.

In Exercise 8.6, you are asked to prove that the distance between vertices n − 1 and n is also whp at least k*_n in (8.4.14). In Exercise 8.7, you are asked to check whether the above proof also implies that the distance between vertices 1 and 2 is whp at least k*_n in (8.4.14).
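To get a feel for the growth of k*_n in (8.4.14), the sketch below evaluates it for a few values of n. The constant C from Proposition 8.9 is not specified in this excerpt, so C = 1 is an arbitrary illustrative choice:

```python
import math

C, m = 1.0, 2    # C is a placeholder value; m >= 2 as in Theorem 8.7

def k_star(n):
    """The lower bound k*_n = log n / log(2 C m^2 log n) from (8.4.14)."""
    return math.log(n) / math.log(2 * C * m ** 2 * math.log(n))

for n in (10 ** 3, 10 ** 6, 10 ** 9, 10 ** 12):
    print(f"n = {n:.0e}:  k*_n = {k_star(n):.2f}")
```

Note that k*_n grows like log n / log log n, slower than log n itself but still unboundedly, matching the fact that δ = 0 is the boundary between the logarithmic (δ > 0) and doubly logarithmic (δ < 0) regimes.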

8.5 Typical distance in PA-models: log log-lower bound for δ < 0

In this section we prove the lower bound in Theorem 8.8. We do so in a more general setting, by assuming an upper bound on the probability that paths exist in the model that is inspired by Proposition 8.9:

Assumption 8.16 (Path probabilities) There exist κ and γ such that, for all n and all self-avoiding paths ~π = (π₀, . . . , π_k) ∈ [n]^{k+1},

P(~π ⊆ PA_n) ≤ ∏_{i=1}^{k} κ (π_{i−1} ∧ π_i)^{−γ} (π_{i−1} ∨ π_i)^{γ−1}.   (8.5.1)

By Proposition 8.9, Assumption 8.16 is satisfied for PA^{(m,δ)}_n with γ = m/(2m + δ). We expect log log-distances in such networks if and only if δ ∈ (−m, 0), so that 1/2 < γ < 1. Theorem 8.17, which is the main result in this section, gives a lower bound on the typical distance in this case:

Theorem 8.17 (Doubly logarithmic lower bound on distances in PAMs) Let (PA_n)_{n≥1} be a random graph model that satisfies Assumption 8.16 for some γ satisfying 1/2 < γ < 1. Fix random vertices o₁ and o₂ chosen independently and uniformly from [n], and define

τ = (1 + γ)/γ ∈ (2, 3).   (8.5.2)

Then, for K large,

P(dist_{PA_n}(o₁, o₂) ≥ 4 log log n/| log(τ − 2)| − K) = 1 − o(1).   (8.5.3)

Here, o(1) refers to a term that vanishes when first n → ∞ and then K → ∞.

We will prove Theorem 8.17 in the form where | log(τ − 2)| is replaced with log(γ/(1 − γ)). For PA^{(m,δ)}_n, γ = m/(2m + δ), so that

γ/(1 − γ) = m/(m + δ) = 1/(τ − 2),   (8.5.4)

where we recall that τ = 3 + δ/m. Therefore, Theorem 8.17 proves the lower bound in Theorem 8.8, and even extends this to tightness from below for the distances.
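The identity (8.5.4) is easy to verify numerically; the following sketch checks it for a few arbitrary admissible parameter pairs:

```python
# Check (8.5.4): with gamma = m/(2m + delta) and tau = 3 + delta/m,
# gamma/(1 - gamma) = m/(m + delta) = 1/(tau - 2).
for m, delta in [(2, -1.0), (3, -2.5), (2, -0.5)]:
    gamma = m / (2 * m + delta)
    tau = 3 + delta / m
    assert 0.5 < gamma < 1 and 2 < tau < 3
    lhs = gamma / (1 - gamma)
    assert abs(lhs - m / (m + delta)) < 1e-12
    assert abs(lhs - 1 / (tau - 2)) < 1e-12
print("identity (8.5.4) verified")
```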


We will prove a slightly different version of Theorem 8.17, namely, that, uniformly in u, v ≥ εn, we can choose K = K_ε > 0 sufficiently large so that, uniformly in n ≥ 1,

P(dist_{PA_n}(u, v) ≤ 4 log log n/| log(τ − 2)| − K) ≤ ε.   (8.5.5)

Since P(o1 ≤ εn) ≤ ε, this clearly implies Theorem 8.17.

The proof of Theorem 8.17 is based on a constrained or truncated first moment method, similar to the ones used for NR_n(w) in Theorem 6.6 and for CM_n(d) in Theorem 7.6. Since the probability of the existence of paths in Assumption 8.16 satisfies a rather different bound compared to those for NR_n(w) and CM_n(d), this truncated first moment method looks rather different from those presented in the proofs of Theorems 6.6 and 7.6.

Let us now briefly explain the truncated first moment method. We start with an explanation of the (unconstrained) first moment bound and its shortcomings. Let u, v be distinct vertices of PA_n. We think of u, v ≥ εn for ε > 0 sufficiently small. Then, for k_n ∈ ℕ,

P(dist_{PA_n}(u, v) ≤ 2k_n) = P(⋃_{k=1}^{2k_n} ⋃_{~π} {~π ⊆ PA_n}) ≤ ∑_{k=1}^{2k_n} ∑_{~π} ∏_{j=1}^{k} p(π_{j−1}, π_j),   (8.5.6)

where ~π = (π₀, . . . , π_k) is a self-avoiding path in PA_n with π₀ = u and π_k = v and, for a, b ∈ ℕ, we define

p(a, b) = κ (a ∧ b)^{−γ} (a ∨ b)^{γ−1}.   (8.5.7)

To each path ~π = (π0, . . . , πk), we assign the weight

p(~π) = ∏_{j=1}^{k} p(π_{j−1}, π_j),   (8.5.8)

so that the upper bound is just the sum of the weights of all paths from u to v of length no more than 2k_n.

The shortcoming of the above bound is that the paths that contribute most to the total weight are those that connect u, respectively v, quickly to vertices with extremely small indices, and consequently extremely high degrees. Since, on the other hand, such paths are quite unlikely to be present, the first moment bound grossly overestimates the probability. This explains why such paths have to be removed in order to get a reasonable estimate, and why this removal leads only to small errors.
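This domination by small-index vertices can be made concrete. The sketch below (illustrative, with κ = 1 and arbitrary parameter values) computes the two-step weight ∑_w p(u, w) p(w, v) and shows that the hundred smallest indices already carry most of it:

```python
gamma = 0.75                       # some 1/2 < gamma < 1
n, u, v = 10 ** 5, 90_000, 95_000  # two "late" vertices, as in u, v >= eps*n

def p(a, b):
    """The weight kappa (a ∧ b)^(-gamma) (a ∨ b)^(gamma-1) with kappa = 1."""
    lo, hi = min(a, b), max(a, b)
    return lo ** (-gamma) * hi ** (gamma - 1.0)

total = sum(p(u, w) * p(w, v) for w in range(1, n + 1) if w not in (u, v))
head = sum(p(u, w) * p(w, v) for w in range(1, 101))
print(f"share of two-step weight carried by w <= 100: {head / total:.2f}")
```

For w < u, v the summand is proportional to w^{−2γ}, so the sum concentrates on the smallest w; yet a path through such a vertex is quite unlikely to actually be present, which is exactly why these paths are truncated away.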

To this end we define a decreasing sequence g = (g_l)_{l=0,...,k} of positive integers and consider a path ~π = (π₀, . . . , π_k) to be good when π_l ∧ π_{k−l} ≥ g_l for all l ∈ {0, . . . , k}. We denote the event that there exists a good path of length k


between u and v by E_k(u, v). We further denote by F_l(u) the event that there exists a bad path of length l in PA_n starting at u. This means that there exists a path ~π with u = π₀ ↔ · · · ↔ π_l such that π₀ ≥ g₀, . . . , π_{l−1} ≥ g_{l−1}, but π_l < g_l, i.e., a path that crosses the threshold after exactly l steps. For fixed vertices u, v ≥ g₀, we thus obtain

P(dist_{PA_n}(u, v) ≤ 2k) ≤ ∑_{l=1}^{k} P(F_l(u)) + ∑_{l=1}^{k} P(F_l(v)) + ∑_{l=1}^{2k} P(E_l(u, v)).   (8.5.9)

The truncated first moment estimate arises by bounding the probabilities of the existence of certain good or bad paths by their expected number. Due to the split into good and bad paths, these sums behave better than without this split.

Equation (8.5.9) is identical to the inequality (6.2.31) used in the proof of Theorem 6.6. However, the notion of good has changed, due to the fact that vertices no longer have a weight, but rather an age, and vertices that have appeared early in PA_n are the most likely to have large degrees. This explains why good vertices have high indices for PA_n, while good vertices have small weights for NR_n(w).

By assumption,

P(~π ⊆ PA_n) ≤ p(~π),   (8.5.10)

so that, for u ≥ g₀ and l ∈ [k], with ~π = (π₀, . . . , π_l) and π₀ = u,

P(F_l(u)) ≤ ∑_{π₁=g₁}^{n} · · · ∑_{π_{l−1}=g_{l−1}}^{n} ∑_{π_l=1}^{g_l−1} p(~π).   (8.5.11)

Given ε > 0, we choose g₀ = ⌈εn⌉ and (g_j)_{j=0,...,k} decreasing fast enough that the first two summands on the right-hand side of (8.5.9) together are no larger than 2ε. For l ∈ [k] and u, v ∈ [n], we set

f_{l,n}(u, v) := 1_{u≥g₀} ∑_{π₁=g₁}^{n} · · · ∑_{π_{l−1}=g_{l−1}}^{n} p(u, π₁, . . . , π_{l−1}, v),   (8.5.12)

and set f_{0,n}(u, v) = 1_{u=v} 1_{u≤n}. To rephrase the truncated first moment estimate in terms of f_{l,n}, note that p is symmetric, so that, for all l ≤ 2k,

P(E_l(u, v)) ≤ ∑_{π₁=g₁}^{n} · · · ∑_{π_{⌊l/2⌋}=g_{⌊l/2⌋}}^{n} · · · ∑_{π_{l−1}=g₁}^{n} p(u, π₁, . . . , π_{⌊l/2⌋}) p(π_{⌊l/2⌋}, . . . , π_{l−1}, v)
  = ∑_{π_{⌊l/2⌋}=g_{⌊l/2⌋}}^{n} f_{⌊l/2⌋,n}(u, π_{⌊l/2⌋}) f_{⌈l/2⌉,n}(v, π_{⌊l/2⌋}).   (8.5.13)

Using the recursive representation

f_{k+1,n}(u, v) = ∑_{w=g_k}^{n} f_{k,n}(u, w) p(w, v),   (8.5.14)


we establish upper bounds on f_{k,n}(u, v) and use these to show that the rightmost term in (8.5.9) remains small when k = k_n is chosen appropriately. This leads to the lower bounds on the typical distance in Theorem 8.17. Let us now make these ideas precise, starting with the derivation of some intermediate results, as well as bounds on the growth of k ↦ f_{k,n}(u, v).

We assume that Assumption 8.16 holds for a γ ∈ (1/2, 1) with a fixed constant κ.

Recall the definition of f_{k,n} and the key estimates (8.5.9), (8.5.11) and (8.5.13), which combined give the truncated first moment bound

P(dist_{PA_n}(u, v) ≤ 2k_n) ≤ ∑_{k=1}^{k_n} ∑_{w=1}^{g_k−1} f_{k,n}(u, w) + ∑_{k=1}^{k_n} ∑_{w=1}^{g_k−1} f_{k,n}(v, w)   (8.5.15)
  + ∑_{k=1}^{2k_n} ∑_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^{n} f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋}).

The remaining task of the proof is to choose k_n ∈ ℕ, as well as a decreasing sequence (g_k)_{k=0}^{k_n} with 2 ≤ g_{k_n} ≤ · · · ≤ g₀ ≤ n, that allow us to bound the right-hand side of (8.5.15).

Our aim is to provide an upper bound of the form

f_{k,n}(u, v) ≤ α_k v^{−γ} + 1_{v>g_{k−1}} β_k v^{γ−1},   (8.5.16)

for suitably chosen parameters α_k, β_k ≥ 0. Key to this choice is the following lemma:

Lemma 8.18 (A recursive bound on f_{k,n} for γ ∈ (1/2, 1)) Let γ ∈ (1/2, 1) and suppose that 2 ≤ ℓ ≤ n, α, β ≥ 0 and q : [n] → [0, ∞) satisfies

q(w) ≤ 1_{w≥ℓ}(α w^{−γ} + β w^{γ−1}) for all w ∈ [n].   (8.5.17)

Then there exists a constant c = c(γ, κ) > 1 such that, for all u ∈ [n],

∑_{w=1}^{n} q(w) p(w, u) ≤ c(α log(n/ℓ) + β n^{2γ−1}) u^{−γ}   (8.5.18)
  + c 1_{u>ℓ}(α ℓ^{1−2γ} + β log(n/ℓ)) u^{γ−1}.

Proof We use (8.5.7) to rewrite

∑_{w=1}^{n} q(w) p(w, u) = ∑_{w=u∨ℓ}^{n} q(w) p(w, u) + 1_{u>ℓ} ∑_{w=ℓ}^{u−1} q(w) p(w, u)   (8.5.19)
  ≤ ∑_{w=u∨ℓ}^{n} κ(α w^{−γ} + β w^{γ−1}) w^{γ−1} u^{−γ} + 1_{u>ℓ} ∑_{w=ℓ}^{u−1} κ(α w^{−γ} + β w^{γ−1}) w^{−γ} u^{γ−1}.


Simplifying the sums leads to

∑_{w=1}^{n} q(w) p(w, u) ≤ κ(α ∑_{w=u∨ℓ}^{n} w^{−1} + β ∑_{w=u∨ℓ}^{n} w^{2γ−2}) u^{−γ}   (8.5.20)
  + κ 1_{u>ℓ}(α ∑_{w=ℓ}^{u−1} w^{−2γ} + β ∑_{w=ℓ}^{u−1} w^{−1}) u^{γ−1}
  ≤ κ(α log(n/(ℓ − 1)) + (β/(2γ − 1)) n^{2γ−1}) u^{−γ}
  + κ 1_{u>ℓ}((α/(2γ − 1)) (ℓ − 1)^{1−2γ} + β log(u/(ℓ − 1))) u^{γ−1}.

This immediately implies the assertion, since ℓ ≥ 2 and u ∈ [n] by assumption.

We aim to apply Lemma 8.18 iteratively. We use induction in k to prove that there exist (g_k)_{k≥0}, (α_k)_{k≥1} and (β_k)_{k≥1} such that

f_{k,n}(u, v) ≤ α_k v^{−γ} + β_k v^{γ−1} for all u, v ∈ [n].   (8.5.21)

The sequences (gk)k≥0, (αk)k≥1 and (βk)k≥1 are chosen as follows:

Definition 8.19 (Choices of the parameters (g_k)_{k≥0}, (α_k)_{k≥1} and (β_k)_{k≥1}) Fix R > 1 and ε ∈ (0, 1). We define

g₀ = ⌈εn⌉, α₁ = R g₀^{γ−1}, β₁ = R g₀^{−γ},   (8.5.22)

and recursively, for k ≥ 1,

(1) g_k is the smallest integer such that

(1/(1 − γ)) α_k g_k^{1−γ} ≥ 6ε/(π²k²);   (8.5.23)

(2)

α_{k+1} = c(α_k log(n/g_k) + β_k n^{2γ−1});   (8.5.24)

(3)

β_{k+1} = c(α_k g_k^{1−2γ} + β_k log(n/g_k)),   (8.5.25)

where c = c(γ, κ) > 1 is the constant appearing in Lemma 8.18.

One can check that k ↦ g_k is non-increasing, while k ↦ α_k and k ↦ β_k are non-decreasing.
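This monotonicity can be checked numerically by iterating Definition 8.19. In the sketch below, the values of n, ε, γ, R and the Lemma 8.18 constant c are arbitrary illustrative choices, since the text does not fix them:

```python
import math

gamma, eps, R, c = 0.75, 0.1, 2.0, 2.0
n = 10 ** 12

g = [math.ceil(eps * n)]                  # g_0 = ceil(eps * n)
alpha = [R * g[0] ** (gamma - 1)]         # alpha_1
beta = [R * g[0] ** (-gamma)]             # beta_1
for k in range(1, 8):
    # g_k: smallest integer with alpha_k g_k^{1-gamma}/(1-gamma) >= 6 eps/(pi^2 k^2)
    target = 6 * eps / (math.pi ** 2 * k ** 2)
    gk = max(1, math.ceil((target * (1 - gamma) / alpha[-1]) ** (1 / (1 - gamma))))
    a_k, b_k = alpha[-1], beta[-1]
    alpha.append(c * (a_k * math.log(n / gk) + b_k * n ** (2 * gamma - 1)))
    beta.append(c * (a_k * gk ** (1 - 2 * gamma) + b_k * math.log(n / gk)))
    g.append(gk)

assert all(g[i] >= g[i + 1] for i in range(len(g) - 1))            # non-increasing
assert all(alpha[i] <= alpha[i + 1] for i in range(len(alpha) - 1))
assert all(beta[i] <= beta[i + 1] for i in range(len(beta) - 1))
print("g_k:", g[:4])
```

Even for n as large as 10^{12}, g_k collapses from εn to 1 within a couple of steps: it decays doubly-exponentially, which matches the fact that only k_n ≍ log log n truncation levels are needed.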

We recall that f_{k,n}(u, v) was introduced in (8.5.12), with p(z, w) defined in (8.5.7). As a consequence, f_{k,n} satisfies the recursion, for all k ≥ 1,

f_{k+1,n}(u, v) = ∑_{z=g_k}^{n} f_{k,n}(u, z) p(z, v).   (8.5.26)

The following lemma derives recursive bounds on f_{k,n}:


Lemma 8.20 (Recursive bound on f_{k,n}) For the sequences in Definition 8.19, for every u, v ∈ [n] and k ∈ ℕ,

f_{k,n}(u, v) ≤ α_k v^{−γ} + 1_{v>g_{k−1}} β_k v^{γ−1}.   (8.5.27)

Proof We prove (8.5.27) by induction on k. For k = 1, using α₁ = R g₀^{γ−1} and β₁ = R g₀^{−γ},

f_{1,n}(u, v) = p(u, v) 1_{u≥g₀} ≤ R g₀^{γ−1} v^{−γ} + 1_{v>g₀} R g₀^{−γ} v^{γ−1} = α₁ v^{−γ} + 1_{v>g₀} β₁ v^{γ−1},   (8.5.28)

as required. This initializes the induction hypothesis. We now proceed to advance the induction: suppose that g_{k−1}, α_k and β_k are such that

f_{k,n}(u, v) ≤ α_k v^{−γ} + 1_{v>g_{k−1}} β_k v^{γ−1}.   (8.5.29)

We use the recursive property of f_{k,n} in (8.5.26). We apply Lemma 8.18, with ℓ = g_k and q(w) = f_{k,n}(u, w) 1_{w≥g_k}, so that, by Definition 8.19,

f_{k+1,n}(u, v) ≤ c[α_k log(n/g_k) + β_k n^{2γ−1}] v^{−γ}   (8.5.30)
  + c 1_{v>g_k}[α_k g_k^{1−2γ} + β_k log(n/g_k)] v^{γ−1}
  = α_{k+1} v^{−γ} + 1_{v>g_k} β_{k+1} v^{γ−1},

as required. This advances the induction hypothesis, and thus completes the proof.

We next use (8.5.21) to prove Theorem 8.17. Summing (8.5.27) in Lemma 8.20 over w and using (8.5.23), we obtain

∑_{w=1}^{g_k−1} f_{k,n}(v, w) ≤ (1/(1 − γ)) α_k (g_k − 1)^{1−γ} ≤ 6ε/(π²k²),   (8.5.31)

which, when summed over all k ≥ 1, is bounded by ε, since ∑_{k≥1} k^{−2} = π²/6. Hence the first two summands on the right-hand side of (8.5.15) together are smaller than 2ε. This shows that the probability that there exists a bad path from either u or v is small, uniformly in u, v ≥ εn. It thus remains to deal with the good paths.

For this, it remains to choose k_n as large as possible, while ensuring that g_{k_n} ≥ 2 and at the same time

∑_{k=1}^{2k_n} ∑_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^{n} f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋}) = o(1).   (8.5.32)

Proving (8.5.32) will be the main content of the remainder of this section.

Recall from Definition 8.19 that g_k is the smallest integer satisfying (8.5.23) and that the parameters α_k, β_k are defined via equalities in (8.5.24)–(8.5.25). To establish lower bounds on the decay of g_k, we instead investigate the growth of η_k = n/g_k > 0 for large k. For this, we first derive a recursive bound on η_k:


Lemma 8.21 (Recursive bound on η_k) Recall Definition 8.19, and let η_k = n/g_k. Then there exists a constant C > 0 such that

η_{k+2}^{1−γ} ≤ C[η_k^γ + η_{k+1}^{1−γ} log η_{k+1}].   (8.5.33)

Proof By definition of gk in (8.5.23),

η_{k+2}^{1−γ} = n^{1−γ} g_{k+2}^{γ−1} ≤ n^{1−γ} (1/(1 − γ)) (π²/(6ε)) (k + 2)² α_{k+2}.   (8.5.34)

By the definition of α_{k+2} in (8.5.24),

n^{1−γ} (1/(1 − γ)) (π²/(6ε)) (k + 2)² α_{k+2}   (8.5.35)
  = n^{1−γ} (c/(1 − γ)) (π²/(6ε)) (k + 2)² [α_{k+1} log η_{k+1} + β_{k+1} n^{2γ−1}].

We bound each of the two terms in (8.5.35) separately. By the definition of g_k, the relation in (8.5.23) holds with the opposite inequality if we replace g_k by g_k − 1 in the left-hand side. This, with k + 1 instead of k, yields

α_{k+1} ≤ (6(1 − γ)ε/(π²(k + 1)²)) (g_{k+1} − 1)^{γ−1}.   (8.5.36)

Since g_{k+1} ≥ 2,

(g_{k+1} − 1)^{γ−1} ≤ 2^{1−γ} g_{k+1}^{γ−1}.   (8.5.37)

We conclude that the first term in (8.5.35) is bounded by

n^{1−γ} (c/(1 − γ)) (π²/(6ε)) (k + 2)² α_{k+1} log η_{k+1}   (8.5.38)
  ≤ n^{1−γ} (c 2^{1−γ}/(1 − γ)) (π²/(6ε)) (k + 2)² (6(1 − γ)ε/(π²(k + 1)²)) g_{k+1}^{γ−1} log η_{k+1}
  = c 2^{1−γ} ((k + 2)²/(k + 1)²) η_{k+1}^{1−γ} log η_{k+1},

which is part of the second term in (8.5.33). We now have to bound the second term in (8.5.35) and show that it is bounded by the right-hand side of (8.5.33). This term equals

n^{1−γ} (c/(1 − γ)) (π²/(6ε)) (k + 2)² β_{k+1} n^{2γ−1} = (c/(1 − γ)) (π²/(6ε)) (k + 2)² β_{k+1} n^γ.   (8.5.39)

We use the definition of β_{k+1} in (8.5.25) to write

(c/(1 − γ)) (π²/(6ε)) (k + 2)² β_{k+1} n^γ = (c/(1 − γ)) (π²/(6ε)) (k + 2)² n^γ c[α_k g_k^{1−2γ} + β_k log η_k],   (8.5.40)

which again leads to two terms that we bound separately. For the first term in


(8.5.40), we again use the fact that α_k ≤ 2^{1−γ} (6(1 − γ)ε/(π²k²)) g_k^{γ−1} to arrive at

π2k2 gγ−1k , we arrive at that

c

1− γπ2

6ε(k + 2)2nγcαkg

1−2γk (8.5.41)

≤ c21−γ

1− γπ2

6ε(k + 2)2nγc

6(1− γ)ε

π2k2gγ−1k g1−2γ

k = c221−γ (k + 2)2

k2ηγk ,

which contributes to the first term on the right-hand side of (8.5.33). By Definition 8.19, we have c β_k n^{2γ−1} ≤ α_{k+1}, so that, using (8.5.36) and (8.5.37), the second term in (8.5.40) is bounded by

2γ−1 ≤ αk+1, so that, using (8.5.36) and(8.5.37), the second term in (8.5.40) is bounded by

(c/(1 − γ)) (π²/(6ε)) (k + 2)² n^γ c β_k log η_k   (8.5.42)
  ≤ (c/(1 − γ)) (π²/(6ε)) (k + 2)² α_{k+1} n^{1−γ} log η_k
  ≤ c 2^{1−γ} ((k + 2)²/(k + 1)²) g_{k+1}^{γ−1} n^{1−γ} log η_k = c 2^{1−γ} ((k + 2)²/(k + 1)²) η_{k+1}^{1−γ} log η_k.

Since k ↦ g_k is non-increasing, k ↦ η_k is non-decreasing, so that

c 2^{1−γ} ((k + 2)²/(k + 1)²) η_{k+1}^{1−γ} log η_k ≤ c 2^{1−γ} ((k + 2)²/(k + 1)²) η_{k+1}^{1−γ} log η_{k+1},   (8.5.43)

which again contributes to the second term in (8.5.33). This proves that both terms in (8.5.40), and thus the second term in (8.5.35), are bounded by the right-hand side of (8.5.33).

Putting together all the bounds and taking a sufficiently large constant C = C(γ), we obtain (8.5.33).

We can now obtain a useful bound on the growth of ηk:

Proposition 8.22 (Inductive bound on η_k) Recall Definition 8.19, and let η_k = n/g_k. Assume that ε ∈ (0, 1). Then, there exists a constant B = B_ε such that, for any k = O(log log n),

η_k ≤ e^{B(τ−2)^{−k/2}},   (8.5.44)

where we recall τ from (8.5.2).

Recall that we sum over π_k ≥ g_k, which is equivalent to n/π_k ≤ η_k. The sums in (8.5.15) are such that the summands obey this bound for appropriate values of k. Compare this to (6.2.31), where, instead, the weights obey the bounds w_{π_k} ≤ b_k. We see that η_k plays a similar role as b_k. Recall indeed that w_{π_k} is close to the degree of π_k in GRG_n(w), while the degree of vertex π_k in PA^{(m,δ)}_n is close to (n/π_k)^{1/(τ−1)}. Thus, the truncation n/π_k ≤ η_k can be interpreted as a bound of order e^{(τ−2)^{−k/2}} on the degree of π_k. Note, however, that b_k ≈ e^{(τ−2)^{−k}} by (6.2.44),


so that b_k grows roughly twice as fast as η_k. This is again a sign that distances in PA^{(m,δ)}_n for δ < 0 are twice as large as those in GRG_n(w).

Proof We prove the proposition by induction on k, and start by initializing the induction. For k = 0,

η₀ = n/g₀ = n/⌈εn⌉ ≤ ε^{−1} ≤ e^B,   (8.5.45)

when B ≥ log(1/ε), which initializes the induction. We next advance the induction hypothesis. Suppose that the statement is true for l ∈ [k − 1]; we will advance it to k. We use that

(z + w)^{1/(1−γ)} ≤ 2^{1/(1−γ)} (z^{1/(1−γ)} + w^{1/(1−γ)}).   (8.5.46)

By Lemma 8.21, we can then write, for a different constant C,

η_k ≤ C[η_{k−2}^{γ/(1−γ)} + η_{k−1} (log η_{k−1})^{1/(1−γ)}]   (8.5.47)
  = C[η_{k−2}^{1/(τ−2)} + η_{k−1} (log η_{k−1})^{1/(1−γ)}].

Using this inequality, we can write

η_{k−2} ≤ C[η_{k−4}^{1/(τ−2)} + η_{k−3} (log η_{k−3})^{1/(1−γ)}],   (8.5.48)

so that, by (z + w)^{1/(τ−2)} ≤ 2^{1/(τ−2)} (z^{1/(τ−2)} + w^{1/(τ−2)}),

η_k ≤ C(2C)^{1/(τ−2)} [η_{k−4}^{1/(τ−2)²} + η_{k−3}^{1/(τ−2)} (log η_{k−3})^{1/[(1−γ)(τ−2)]}]   (8.5.49)
  + C η_{k−1} (log η_{k−1})^{1/(1−γ)}.

Renaming 2C as C for simplicity, and iterating these bounds, we obtain

η_k ≤ C^{∑_{l=0}^{k/2} (τ−2)^{−l}} η₀^{(τ−2)^{−k/2}}   (8.5.50)
  + ∑_{i=1}^{k/2} C^{∑_{l=0}^{i−1} (τ−2)^{−l}} η_{k−2i+1}^{(τ−2)^{−(i−1)}} (log η_{k−2i+1})^{(τ−2)^{−(i−1)}/(1−γ)}.

For the first term in (8.5.50), we use the upper bound η0 ≤ 1/ε to obtain

C^{∑_{l=0}^{k/2} (τ−2)^{−l}} η₀^{(τ−2)^{−k/2}} ≤ C^{∑_{l=0}^{k/2} (τ−2)^{−l}} e^{log(1/ε)(τ−2)^{−k/2}} ≤ ½ e^{B(τ−2)^{−k/2}},   (8.5.51)

again when B is taken sufficiently large compared to log(1/ε) and log C. For the second term in (8.5.50), we use the induction hypothesis to obtain

∑_{i=1}^{k/2} C^{∑_{l=0}^{i−1} (τ−2)^{−l}} η_{k−2i+1}^{(τ−2)^{−(i−1)}} (log η_{k−2i+1})^{(τ−2)^{−(i−1)}/(1−γ)}   (8.5.52)
  ≤ ∑_{i=1}^{k/2} C^{∑_{l=0}^{i−1} (τ−2)^{−l}} e^{B(τ−2)^{−(k−1)/2}} [B(τ−2)^{−(k−2i+1)/2}]^{(τ−2)^{−(i−1)}/(1−γ)}.


We can write

e^{B(τ−2)^{−(k−1)/2}} = e^{B(τ−2)^{−k/2}} e^{B(τ−2)^{−k/2}(√(τ−2) − 1)}.   (8.5.53)

Since √(τ−2) − 1 < 0, for k = O(log log n), we can take B large enough such that, uniformly in k ≥ 1,

∑_{i=1}^{k/2} C^{∑_{l=0}^{i−1} (τ−2)^{−l}} e^{B(τ−2)^{−k/2}(√(τ−2) − 1)} [B(τ−2)^{−(k−2i+1)/2}]^{(τ−2)^{−(i−1)}/(1−γ)} < 1/2.   (8.5.54)

We can now sum the bounds in (8.5.51) and (8.5.52)–(8.5.54) to obtain

η_k ≤ (1/2 + 1/2) e^{B(τ−2)^{−k/2}} = e^{B(τ−2)^{−k/2}},   (8.5.55)

as required. This advances the induction hypothesis, and thus completes the proof of Proposition 8.22.

We are now ready to complete the proof of Theorem 8.17:

Completion of the proof of Theorem 8.17. Recall that we were left to prove (8.5.32), i.e., that, uniformly in u, v ≥ εn,

∑_{k=1}^{2k_n} ∑_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^{n} f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋}) = o(1).   (8.5.56)

We note that, by Proposition 8.22,

g_k = n/η_k ≥ n e^{−B(τ−2)^{−k/2}}.   (8.5.57)

We now use (8.5.27) in Lemma 8.20 to estimate, writing w = π_{⌊k/2⌋},

∑_{k=1}^{2k_n} ∑_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^{n} f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋})   (8.5.58)
  ≤ ∑_{k=1}^{2k_n} ∑_{w=g_{⌊k/2⌋}}^{n} (α_{⌊k/2⌋} w^{−γ} + 1_{w>g_{⌊k/2⌋−1}} β_{⌊k/2⌋} w^{γ−1})(α_{⌈k/2⌉} w^{−γ} + 1_{w>g_{⌈k/2⌉−1}} β_{⌈k/2⌉} w^{γ−1}).

We use that k ↦ α_k and k ↦ β_k are non-decreasing, while k ↦ g_k is non-increasing, to estimate the above as

∑_{k=1}^{2k_n} ∑_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^{n} f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋})   (8.5.59)
  ≤ ∑_{k=1}^{2k_n} ∑_{w=g_{⌊k/2⌋}}^{n} (α_{⌈k/2⌉} w^{−γ} + 1_{w>g_{⌈k/2⌉−1}} β_{⌈k/2⌉} w^{γ−1})²
  ≤ 2 ∑_{k=1}^{2k_n} ∑_{w=g_{⌊k/2⌋}}^{n} (α²_{⌈k/2⌉} w^{−2γ} + 1_{w>g_{⌈k/2⌉−1}} β²_{⌈k/2⌉} w^{2(γ−1)}).


This gives two terms that we estimate one by one. We start with the first: using that γ > 1/2 and that k ↦ α_k is non-decreasing, while k ↦ g_k is non-increasing,

2 ∑_{k=1}^{2k_n} ∑_{w=g_{⌊k/2⌋}}^{n} α²_{⌈k/2⌉} w^{−2γ} ≤ (2/(2γ−1)) ∑_{k=1}^{2k_n} α²_{⌈k/2⌉} g_{⌊k/2⌋}^{1−2γ}   (8.5.60)
  ≤ (2/(2γ−1)) ∑_{k=1}^{2k_n} α²_{⌈k/2⌉} g_{⌈k/2⌉}^{1−2γ} ≤ C (η_{k_n}/n) ∑_{k=1}^{k_n} α_k² g_k^{2−2γ}.

Using (8.5.23), we obtain

α_k² g_k^{2−2γ} ≤ (6ε(1 − γ)/(π²k²))²,   (8.5.61)

so that

2 ∑_{k=1}^{2k_n} ∑_{w=g_{⌊k/2⌋}}^{n} α²_{⌈k/2⌉} w^{−2γ} ≤ C (η_{k_n+1}/n) ∑_{k≥1} ε² k^{−4} ≤ C η_{k_n+1} ε²/n.   (8.5.62)

This bounds the first term on the right-hand side of (8.5.59). For the second term on the right-hand side of (8.5.59), we again use that k ↦ g_k is non-increasing, to obtain

2 ∑_{k=1}^{2k_n} ∑_{w=g_{⌈k/2⌉−1}}^{n} β²_{⌈k/2⌉} w^{2(γ−1)} ≤ (4/(2γ−1)) ∑_{k=1}^{k_n} β_k² n^{2γ−1}.   (8.5.63)

By the definition of α_{k+1} in (8.5.24), we get β_k n^{2γ−1} ≤ α_{k+1}. Thus,

∑_{k=1}^{k_n} β_k² n^{2γ−1} ≤ ∑_{k=1}^{k_n} α²_{k+1} n^{1−2γ} ≤ (1/n) η_{k_n+1}^{2−2γ} ∑_{k=1}^{k_n} α²_{k+1} g_{k+1}^{2−2γ} ≤ C η_{k_n+1} ε²/n,   (8.5.64)

as in (8.5.62). We conclude that, using (8.5.44) in Proposition 8.22,

∑_{k=1}^{2k_n} ∑_{π_{⌊k/2⌋}=g_{⌊k/2⌋}}^{n} f_{⌊k/2⌋,n}(u, π_{⌊k/2⌋}) f_{⌈k/2⌉,n}(v, π_{⌊k/2⌋})   (8.5.65)
  ≤ C ε² η_{k_n+1}/n ≤ (C ε²/n) e^{B(τ−2)^{−(k_n+1)/2}}.

Hence, choosing

k_n ≤ 2 log log n/| log(τ − 2)| − K,   (8.5.66)

we obtain that, for ε > 0 and K = Kε sufficiently large,

P(dist_{PA^{(m,δ)}_n}(u, v) ≤ 2k_n) ≤ 2ε + ε = 3ε,   (8.5.67)

whenever u, v ≥ g₀ = ⌈εn⌉. Note from (8.5.44) in Proposition 8.22 that this choice also ensures that g_{k_n} = n/η_{k_n} ≥ 2 when K is taken sufficiently large. This implies the statement of Theorem 8.17.
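To see what the resulting lower bound looks like quantitatively, the sketch below evaluates 4 log log n/| log(τ − 2)| from Theorem 8.17 for illustrative parameters (the additive constant K is omitted):

```python
import math

def lower_bound(n, m, delta):
    """4 log log n / |log(tau - 2)| with tau = 3 + delta/m, as in (8.5.3)."""
    tau = 3 + delta / m
    assert 2 < tau < 3
    return 4 * math.log(math.log(n)) / abs(math.log(tau - 2))

for n in (10 ** 4, 10 ** 8, 10 ** 16):
    print(f"n = {n:.0e}:  bound = {lower_bound(n, m=2, delta=-1.0):.2f}")
```

Squaring n (replacing log n by 2 log n) increases the bound only by the constant 4 log 2/| log(τ − 2)|, the signature of log log growth.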


8.6 Small-world PAMs: logarithmic upper bounds for δ ≥ 0

In this section, we derive the upper bounds on the typical distances as stated in Section 8.2 for δ ≥ 0. This section is organised as follows. In Section 8.6.1, we start by investigating the logarithmic upper bounds in Theorem 8.6. We close in Section ?? by studying the upper bound on typical distances for δ = 0 in Theorem 8.7.

8.6.1 Logarithmic upper bounds for δ > 0

In this section we prove upper bounds on distances in PA^{(m,δ)}_n. We start by proving a relatively simple logarithmic bound on the typical distances for (PA^{(m,δ)}_n)_{n≥1}, after which we extend the result to the optimal constant, which requires more work:

Proof of a logarithmic upper bound in Theorem 8.6. We start by proving a logarithmic bound on the diameter of (PA^{(m,δ)}_n(b))_{n≥1}. Since (PA^{(m,δ)}_n(b))_{n≥1} is obtained from (PA^{(1,δ/m)}_{mn}(b))_{n≥1} by collapsing m successive vertices, diam(PA^{(m,δ)}_n(b)) ≤ diam(PA^{(1,δ/m)}_{mn}(b)), and the result follows from Theorem 8.3.

We next extend this result to a logarithmic bound on typical distances for

(PA^{(m,δ)}_n)_{n≥1}. We make use of various results from Chapter 5. We start by using the local weak convergence result in Theorem 5.24. This implies that |B_r^{(n)}(o_n^{(1)})| and |B_r^{(n)}(o_n^{(2)})| converge to two independent sizes of the rth generations of the Pólya point tree, which is a multi-type branching process. Denote the limits by Z_r^{(1)} and Z_r^{(2)}. Since this multi-type branching process is supercritical, each will contain a large number of vertices when r is large. We follow the first edge of the vertices in the boundaries of B_r^{(n)}(o_n^{(1)}) and B_r^{(n)}(o_n^{(2)}), and subsequently follow these edges leading to older vertices. By Theorem 5.5, these paths will end in vertex 1 with strictly positive probability, conditionally independently given U′. As a result, whp, one of these paths will indeed end in vertex 1. We conclude that, for r > 1 large, the typical distance in (PA^{(m,δ)}_n)_{n≥1} is at most 2r plus the diameter of the tree containing vertex 1 in PA^{(1,δ/m)}_{nm}, which is of logarithmic order again by Theorem 8.3 and its extension to (PA^{(1,δ)}_n)_{n≥1}. This completes the proof.

8.7 Small-world effect in PAMs: loglog upper bounds for δ < 0

In this section we investigate PA^{(m,δ)}_n with m ≥ 2 and δ ∈ (−m, 0), and prove a first step towards the upper bound in Theorem 8.8.

The proof of Theorem 8.8 is divided into two key steps. In the first, in Theorem 8.23 in Section 8.7.1, we give a bound on the diameter of the core, which consists of the vertices with degree at least a certain power of log n. This argument is close in spirit to the argument in Reittu and Norros (2004) used to prove bounds on typical distances in the configuration model, but substantial adaptations are necessary to deal with preferential attachment; this argument was already adapted to the configuration model in Theorem 7.9. After this, in Theorem 8.28,


we derive a bound on the distance between vertices with a small degree and the core. We start by defining and investigating the core of the preferential attachment model. In the sequel, it will be convenient to prove Theorem 8.8 for 2n rather than for n. Clearly, this does not make any difference for the results.

8.7.1 The diameter of the core for δ < 0

We adapt the proof of Theorem 7.9 to PA^{(m,δ)}_n. We recall that

τ = 3 + δ/m,   (8.7.1)

so that −m < δ < 0 corresponds to τ ∈ (2, 3). Throughout this section, we fix m ≥ 2.

We take σ > 1/(3 − τ) = −m/δ > 1 and define the core Core_n of the PA-model PA^{(m,δ)}_{2n} to be

Core_n = {i ∈ [n] : D_i(n) ≥ (log n)^σ},   (8.7.2)

i.e., all the vertices in [n] which at time n have degree at least (log n)^σ.

Let us explain the philosophy of the proof. Note that Core_n only requires information on PA^{(m,δ)}_n, while we will study its diameter in PA^{(m,δ)}_{2n}. This will allow us to use the edges originating from vertices in [2n] \ [n] as a sprinkling of the graph that creates shortcuts in PA^{(m,δ)}_n. Such shortcuts will tremendously shorten the distances. We will call the vertices that create such shortcuts n-connectors.

Basically, this argument will show that a vertex v ∈ [n] of large degree D_v(n) ≫ 1 will likely have an n-connector to a vertex u ∈ [n] satisfying D_u(n) ≥ D_v(n)^{1/(τ−2)}. Thus, it takes two steps to link a vertex of large degree to another vertex of even larger degree. In the proof for the configuration model in Theorem 7.9, this happened in only one step. Therefore, distances in PA^{(m,δ)}_n are (at least in terms of upper bounds) twice as large as the corresponding ones for a configuration model with a similar degree structure. Let us now state our main result, for which we need to define some notation.

For a graph G with vertex set [n] and a given edge set, recall that we write dist_G(i, j) for the shortest-path distance between i and j in the graph G. Also, for A ⊆ [n], we write

diam_n(A) = max_{i,j∈A} dist_{PA^{(m,δ)}_n}(i, j).   (8.7.3)

Then, the diameter of the core in the graph PA^{(m,δ)}_{2n}, which we denote by diam_{2n}(Core_n), is bounded in the following theorem:

Theorem 8.23 (Diameter of the core) Fix m ≥ 2 and δ ∈ (−m, 0). For every σ > 1/(3−τ), whp there exists a K = K_δ > 0 such that

diam_{2n}(Core_n) ≤ 4 log log n / |log(τ − 2)| + K. (8.7.4)


336 Small-world phenomena in preferential attachment models

The proof of Theorem 8.23 is divided into several smaller steps. We start by proving that the diameter of the inner core Inner_n, which is defined by

Inner_n = {i ∈ [n] : D_i(n) ≥ n^{1/(2(τ−1))} (log n)^{−1/2}}, (8.7.5)

is whp bounded by some finite constant K_δ < ∞. After this, we will show that the distance from the outer core, which is Outer_n = Core_n \ Inner_n, to the inner core can be bounded by a fixed constant times log log n. This also shows that the diameter of the outer core is bounded by a different constant times log log n. We now give the details, starting with the diameter of the inner core:

Proposition 8.24 (Diameter of the inner core) Fix m ≥ 2 and δ ∈ (−m, 0). Then, whp,

diam_{2n}(Inner_n) < K_δ. (8.7.6)

Before proving Proposition 8.24, we first introduce the important notion of an n-connector between two sets of vertices A, B ⊆ [n], which plays a crucial role throughout the proof. We say that the vertex j ∈ [2n] \ [n] is an n-connector between A and B if one of the edges incident to j connects to a vertex in A and another edge incident to j connects to a vertex in B. Thus, when there exists an n-connector between A and B, the distance between A and B in PA^{(m,δ)}_{2n} is at most 2. The following lemma bounds the probability that no n-connector exists:

Lemma 8.25 (Connectivity sets in PA^{(m,δ)}_{2n}) Fix m ≥ 2 and δ ∈ (−m, 0). There exists η > 0 such that, for any two sets of vertices A, B ⊆ [n],

P(no n-connector for A and B | PA^{(m,δ)}_n) ≤ e^{−η D_A(n) D_B(n)/n}, (8.7.7)

where, for any A ⊆ [n],

D_A(n) = ∑_{i∈A} D_i(n) (8.7.8)

denotes the total degree of the vertices in A at time n.

Proof We note that, for two sets of vertices A and B, conditionally on PA^{(m,δ)}_n, the probability that j ∈ [2n] \ [n] is an n-connector for A and B is at least

(D_A(n) + δ|A|)(D_B(n) + δ|B|) / [2n(2m + δ)]², (8.7.9)

independently of whether or not the other vertices are n-connectors. Since D_i(n) + δ ≥ m + δ > 0 for every i ≤ n, and δ < 0, for every i ∈ B,

D_i(n) + δ = D_i(n)(1 + δ/D_i(n)) ≥ D_i(n)(1 + δ/m) = D_i(n)(m + δ)/m, (8.7.10)

and, thus, also D_A(n) + δ|A| ≥ D_A(n)(m + δ)/m. As a result, for η = (m + δ)²/(2m(2m + δ))² > 0, the probability that j ∈ [2n] \ [n] is an n-connector for A and B is at least η D_A(n) D_B(n)/n², independently of whether or not the other vertices are n-connectors. Therefore, the probability that there is no n-connector for A and B is, conditionally on PA^{(m,δ)}_n, bounded above by

(1 − η D_A(n) D_B(n)/n²)^n ≤ e^{−η D_A(n) D_B(n)/n}, (8.7.11)

as required.
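The quantitative content of Lemma 8.25 can be sanity-checked numerically. The sketch below is not from the book; the parameter choices m = 2, δ = −1 and the degree totals D_A, D_B are illustrative. It computes η as in the proof and verifies the elementary inequality (1 − x)^n ≤ e^{−xn} used in (8.7.11):

```python
import math

# Illustrative parameters (assumption: m = 2, delta = -1, so tau = 2.5).
m, delta = 2, -1.0
eta = (m + delta) ** 2 / (2 * m * (2 * m + delta)) ** 2  # eta from the proof of Lemma 8.25
assert eta > 0  # positive precisely because delta > -m

# Each vertex j in [2n] \ [n] is an n-connector with probability at least
# eta * D_A * D_B / n^2, so the no-connector probability is at most
# (1 - eta*D_A*D_B/n^2)^n <= exp(-eta*D_A*D_B/n), as in (8.7.11).
n, D_A, D_B = 10 ** 5, 500.0, 300.0
x = eta * D_A * D_B / n ** 2
assert 0 <= x <= 1
assert (1 - x) ** n <= math.exp(-x * n)
```

For these illustrative values, η = (m+δ)²/(2m(2m+δ))² = 1/144; the inequality (1 − x)^n ≤ e^{−xn} holds for all x ∈ [0, 1], which is what the final step of the proof uses.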

We make use of Lemma 8.25 in several places throughout the proof, particularly with B = {i}. The bound in Lemma 8.25 replaces the related bound in Lemma 7.11 for the configuration model.

We now give the proof of Proposition 8.24:

Proof of Proposition 8.24. By [Volume 1, Theorem 8.3], whp Inner_n contains at least √n vertices; denote the first √n vertices of Inner_n by I. Observe that n^{1/(τ−1)−1} ↓ 0 for τ > 2, so that, for any i, j ∈ I, the probability that there exists an n-connector for i and j is bounded below by

1 − exp{−η n^{1/(τ−1)−1}(log n)^{−1}} ≥ p_n ≡ n^{1/(τ−1)−1}(log n)^{−2}, (8.7.12)

for n sufficiently large.

We wish to couple Inner_n with an Erdős–Rényi random graph with N_n = √n vertices and edge probability p_n, which we denote by ER_{N_n}(p_n). For this, for i, j ∈ [N_n], we say that an edge between i and j is present when there exists an n-connector connecting the ith and jth vertex in I. We now prove that this graph is stochastically bounded below by ER_{N_n}(p_n). Note that (8.7.12) does not guarantee this coupling; instead, we should prove that the lower bound holds uniformly when i and j belong to I. For this, we order the N_n(N_n − 1)/2 edges in an arbitrary way, and bound the conditional probability that the lth edge is present, conditionally on the previous edges, from below by p_n, for every l. This proves the claimed stochastic domination by ER_{N_n}(p_n).

Indeed, the lth edge is present precisely when there exists an n-connector connecting the corresponding vertices in I, which we call i and j. Moreover, we shall not make use of the vertices that were used to n-connect the previous edges. This removes at most N_n(N_n − 1)/2 ≤ n/2 possible n-connectors, after which at least another n/2 remain. The probability that one of them is an n-connector for the ith and jth vertex in I is, for n sufficiently large, bounded below by

1 − exp{−η n^{1/(τ−1)−2}(log n)^{−1} · n/2} = 1 − exp{−η n^{1/(τ−1)−1}(log n)^{−1}/2} ≥ p_n ≡ n^{1/(τ−1)−1}(log n)^{−2}, (8.7.13)

using 1 − e^{−x} ≥ x/2 for x ∈ [0, 1] and η/2 ≥ (log n)^{−1} for n sufficiently large.

This proves the claimed stochastic domination of the random graph on the vertices I by ER_{N_n}(p_n). Next, we show that diam(ER_{N_n}(p_n)) is whp bounded by a uniform constant.

For this we use the result in (Bollobás, 2001, Corollary 10.12), which gives sharp bounds on the diameter of an Erdős–Rényi random graph. Indeed, this result implies that if p^d n^{d−1} − 2 log n → ∞, while p^{d−1} n^{d−2} − 2 log n → −∞, then diam(ER_n(p)) = d whp. In our case, n = N_n = n^{1/2} and p = p_n = n^{1/(τ−1)−1}(log n)^{−2}, which implies that, whp, (τ−1)/(3−τ) < d ≤ (τ−1)/(3−τ) + 1. Thus, we obtain that the diameter of I in PA^{(m,δ)}_{2n} is whp bounded by 2d ≤ 2((τ−1)/(3−τ) + 1).

We finally show that, for any i ∈ Inner_n \ I, the probability that there does not exist an n-connector between i and I is small. Indeed, since D_I(n) ≥ √n · n^{1/(2(τ−1))}(log n)^{−1/2} and D_i(n) ≥ n^{1/(2(τ−1))}(log n)^{−1/2}, by Lemma 8.25 the probability that there is no n-connector between i and I is bounded above by e^{−η n^{1/(τ−1)−1/2}(log n)^{−1}}, which is tiny since τ < 3. This proves that whp the distance between any vertex i ∈ Inner_n \ I and I is bounded by 2, which, together with the fact that diam_{2n}(I) ≤ 2((τ−1)/(3−τ) + 1), implies that diam_{2n}(Inner_n) ≤ 2((τ−1)/(3−τ) + 2) ≡ K_δ.

We proceed by studying the distances between the outer core Outer_n = Core_n \ Inner_n and the inner core Inner_n in the following proposition, which is the main ingredient in the proof:

Proposition 8.26 (Distance between outer and inner core) Fix m ≥ 2 and δ ∈ (−m, 0). The inner core Inner_n can whp be reached from any vertex in the outer core Outer_n using no more than 2 log log n / |log(τ−2)| edges in PA^{(m,δ)}_{2n}, i.e., whp,

max_{i∈Outer_n} min_{j∈Inner_n} dist_{PA^{(m,δ)}_{2n}}(i, j) ≤ 2 log log n / |log(τ − 2)|. (8.7.14)

Proof Recall that

Outer_n = Core_n \ Inner_n, (8.7.15)

and define

Γ_1 = Inner_n = {i : D_i(n) ≥ u_1}, (8.7.16)

where

u_1 = n^{1/(2(τ−1))}(log n)^{−1/2}. (8.7.17)

We now recursively define a sequence u_k, for k ≥ 2, so that, for any vertex i ∈ [n] with degree at least u_k, the probability that there is no n-connector for the vertex i and the set

Γ_{k−1} = {j : D_j(n) ≥ u_{k−1}}, (8.7.18)

conditionally on PA^{(m,δ)}_n, is tiny, as we will show in Lemma 8.27 below. According to Lemma 8.25 and [Volume 1, Exercise 8.20], this probability is at most

exp{−η B n [u_{k−1}]^{2−τ} u_k / n}. (8.7.19)

To make this sufficiently small, we define

u_k = D log n (u_{k−1})^{τ−2}, (8.7.20)

with D exceeding (ηB)^{−1}. Note that the recursion in (8.7.20) is identical to that in (7.2.37). Therefore, by Lemma 7.10,

u_k = D^{a_k}(log n)^{b_k} n^{c_k}, (8.7.21)


where

c_k = (τ−2)^{k−1}/(2(τ−1)),  b_k = (1 − (τ−2)^{k−1})/(3−τ) − (1/2)(τ−2)^{k−1},  a_k = (1 − (τ−2)^{k−1})/(3−τ). (8.7.22)

The key step in the proof of Proposition 8.26 is the following lemma:
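The closed form (8.7.21)–(8.7.22) can be checked against the recursion (8.7.20) numerically. The following sketch is not from the book; the values of τ, D and n are illustrative choices. It iterates u_k = D log n (u_{k−1})^{τ−2} starting from u_1 in (8.7.17) and compares the result with D^{a_k}(log n)^{b_k} n^{c_k}:

```python
import math

def u_closed(k, tau, D, n):
    # a_k, b_k, c_k from (8.7.22)
    r = (tau - 2) ** (k - 1)
    c = r / (2 * (tau - 1))
    b = (1 - r) / (3 - tau) - r / 2
    a = (1 - r) / (3 - tau)
    return D ** a * math.log(n) ** b * n ** c

def u_recursive(k, tau, D, n):
    u = n ** (1 / (2 * (tau - 1))) * math.log(n) ** (-0.5)  # u_1 from (8.7.17)
    for _ in range(k - 1):
        u = D * math.log(n) * u ** (tau - 2)                # recursion (8.7.20)
    return u

tau, D, n = 2.5, 3.0, 10 ** 8  # illustrative, with tau in (2, 3)
for k in range(1, 10):
    rec, cf = u_recursive(k, tau, D, n), u_closed(k, tau, D, n)
    assert abs(rec - cf) < 1e-9 * cf  # agree up to floating-point error
```

The agreement for every k reflects that Lemma 7.10 solves the recursion exactly, with u_k decreasing from a power of n towards a power of log n.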

Lemma 8.27 (Connectivity between Γ_{k−1} and Γ_k) Fix m ≥ 2, k ≥ 2 and δ ∈ (−m, 0). Then the probability that there exists an i ∈ Γ_k that is not at distance two from Γ_{k−1} in PA^{(m,δ)}_{2n} is o(n^{−1}).

Proof We note that, by [Volume 1, Exercise 8.20], with probability exceeding 1 − o(n^{−1}), for all k,

∑_{i∈Γ_{k−1}} D_i(n) ≥ B n [u_{k−1}]^{2−τ}. (8.7.23)

On the event that the bounds in (8.7.23) hold, we obtain by Lemma 8.25 that the conditional probability, given PA^{(m,δ)}_n, that there exists an i ∈ Γ_k such that there is no n-connector between i and Γ_{k−1} is bounded, using Boole's inequality, by

n e^{−ηB[u_{k−1}]^{2−τ}u_k} = n e^{−ηBD log n} = o(n^{−1}), (8.7.24)

where we have used (8.7.20) and have taken D > 2(ηB)^{−1}.

We now complete the proof of Proposition 8.26. Fix

k* = ⌊ log log n / |log(τ − 2)| ⌋. (8.7.25)

As a result of Lemma 8.27, the distance between Γ_{k*} and Inner_n is at most 2k*. Therefore, we are done when we can show that

Outer_n ⊆ {i : D_i(n) ≥ (log n)^σ} ⊆ Γ_{k*} = {i : D_i(n) ≥ u_{k*}}, (8.7.26)

so that it suffices to prove that (log n)^σ ≥ u_{k*} for any σ > 1/(3−τ). For this, we note that, by Lemma 7.10, as noted in (8.7.21),

u_{k*} = D^{a_{k*}}(log n)^{b_{k*}} n^{c_{k*}}. (8.7.27)

We compute that n^{c_{k*}} = O(1) = (log n)^{o(1)}, (log n)^{b_{k*}} = (log n)^{1/(3−τ)+o(1)}, and D^{a_{k*}} = (log n)^{o(1)}. Thus,

u_{k*} = (log n)^{1/(3−τ)+o(1)}, (8.7.28)

so that, since σ > 1/(3−τ), by picking n sufficiently large we obtain (log n)^σ ≥ u_{k*}. This completes the proof of Proposition 8.26.
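The claim that n^{c_{k*}} = O(1) can be illustrated numerically: since k* ≥ log log n/|log(τ−2)| − 1, we have (τ−2)^{k*−1} ≤ (τ−2)^{−2}/log n, so n^{c_{k*}} ≤ exp((τ−2)^{−2}/(2(τ−1))). The sketch below is not from the book; τ = 2.5 and the values of n are illustrative choices:

```python
import math

def k_star(n, tau):
    return math.floor(math.log(math.log(n)) / abs(math.log(tau - 2)))  # (8.7.25)

tau = 2.5  # illustrative value in (2, 3)
bound = math.exp((tau - 2) ** (-2) / (2 * (tau - 1)))  # uniform bound on n^{c_{k*}}
for n in [10 ** 6, 10 ** 12, 10 ** 24, 10 ** 48, 10 ** 96]:
    k = k_star(n, tau)
    c = (tau - 2) ** (k - 1) / (2 * (tau - 1))  # c_{k*} from (8.7.22)
    # n^{c_{k*}} = exp(c * log n) stays bounded as n grows
    assert n ** c <= bound
```

Even as n ranges over 90 orders of magnitude, n^{c_{k*}} stays below the fixed constant, which is exactly what the step n^{c_{k*}} = (log n)^{o(1)} requires.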

Proof of Theorem 8.23. By Proposition 8.26, whp every vertex in Outer_n is within distance 2 log log n/|log(τ − 2)| of Inner_n, while, by Proposition 8.24, whp diam_{2n}(Inner_n) ≤ K_δ. Combining these two bounds shows that whp

diam_{2n}(Core_n) ≤ K_δ + 4 log log n/|log(τ − 2)|.

This proves Theorem 8.23.


8.7.2 Connecting the periphery to the core for δ < 0

In this section, we extend the results of the previous section and, in particular, study the distance between the vertices not in the core Core_n and the core. The main result in this section is the following theorem:

Theorem 8.28 (Connecting the periphery to the core) Fix m ≥ 2 and δ ∈ (−m, 0). For every σ > 1/(3−τ), the distance between a uniformly chosen vertex o_1 ∈ [2n] and Core_n in PA^{(m,δ)}_{2n} is whp bounded from above by C log log log n for some C > 0.

Together with Theorem 8.23, Theorem 8.28 proves the upper bound in Theorem 8.8:

Proof of the upper bound in Theorem 8.8. Choose o_1, o_2 ∈ [2n] uniformly at random. We bound, using the triangle inequality,

dist_{PA^{(m,δ)}_{2n}}(o_1, o_2) ≤ dist_{PA^{(m,δ)}_{2n}}(o_1, Core_n) + dist_{PA^{(m,δ)}_{2n}}(o_2, Core_n) + diam_{2n}(Core_n). (8.7.29)

By Theorem 8.28, the first two terms are each whp bounded by C log log log n. Further, by Theorem 8.23, the third term is bounded by (1 + o_P(1)) 4 log log n/|log(τ − 2)|, which completes the proof of the upper bound in Theorem 8.8.

Exercise 8.9 shows that dist_{PA^{(m,δ)}_{2n}}(o_1, o_2) − 4 log log n/|log(τ − 2)| is upper tight when dist_{PA^{(m,δ)}_{2n}}(o_1, Core_n) is tight. Exercise 8.10 investigates the tightness of dist_{PA^{(m,δ)}_{2n}}(o′_1, Core_n) when o′_1 is chosen uniformly at random in [n].

Proof of Theorem 8.28. We use the same ideas as in the proof of Theorem 8.23, but now start from a vertex of large degree at time n instead. We need to show that, for fixed ε > 0, a uniformly chosen vertex o ∈ [(2 − ε)n] can be connected to Core_n using no more than C log log log n edges in PA^{(m,δ)}_{2n}. This is done in two steps.

In the first step, we explore the neighborhood of o in PA^{(m,δ)}_{2n} until we find a vertex v_0 with degree D_{v_0}(n) ≥ u_0, where u_0 will be determined below. Denote the set of all vertices in PA^{(m,δ)}_{2n} that can be reached from o using exactly k different edges of PA^{(m,δ)}_{2n} by S_k. Denote the first k for which there is a vertex in S_k whose degree at time n is at least u by

T^{(o)}_u = inf{k : S_k ∩ {v : D_v(n) ≥ u} ≠ ∅}. (8.7.30)

Recall the local weak convergence in Theorem 5.24, as well as the fact that each vertex v has m older neighbors v_1, . . . , v_m, whose locations x_{v_1}, . . . , x_{v_m} are uniform on [0, x_v], where x_v is the location of v. Therefore, whp, there will be a vertex in S_k with arbitrarily small location, and thus also with arbitrarily large degree at time n. As a result, there exists a C = C_{u,ε} such that, for sufficiently large n,

P(T^{(o)}_u ≥ C_{u,ε}) ≤ ε. (8.7.31)


The second step is to show that a vertex v_0 satisfying D_{v_0}(n) ≥ u_0 for sufficiently large u_0 can be joined to the core using O(log log log n) edges. To this end, we apply Lemma 8.25 to obtain that, for any vertex a with D_a(n) ≥ w_a, the probability that there does not exist an n-connector between a and a vertex b with D_b(n) ≥ w_b, conditionally on PA^{(m,δ)}_n, is at most

exp{−η D_a(n) D_B(n)/n}, (8.7.32)

where B = {b : D_b(n) ≥ w_b}. Since, as in (8.7.23),

D_B(n) ≥ w_b #{b : D_b(n) ≥ w_b} ≡ w_b N_{≥w_b}(n) ≥ c n w_b^{2−τ}, (8.7.33)

we thus obtain that the probability that such a b does not exist is at most

exp{−η′ w_a w_b^{2−τ}}, (8.7.34)

where η′ = ηc. Fix ε > 0 such that (1 − ε)/(τ − 2) > 1. We then iteratively take u_k = u_{k−1}^{(1−ε)/(τ−2)}, to see that the probability that there exists a k for which there does not exist a v_k with D_{v_k}(n) ≥ u_k is at most

∑_{l=1}^{k} exp{−η′ u_{l−1}^{ε}}. (8.7.35)

Since

u_k = u_0^{κ^k}, where κ = (1 − ε)/(τ − 2) > 1, (8.7.36)

we obtain that the probability that there exists a k for which there does not exist a v_k with D_{v_k}(n) ≥ u_k is at most

∑_{l=1}^{k} exp{−η′ u_0^{εκ^{l−1}}}. (8.7.37)

Now fix k = k_n = C log log log n and choose u_0 so large that

∑_{l=1}^{k_n} exp{−η′ u_0^{εκ^{l−1}}} ≤ ε/2. (8.7.38)

Then we obtain that, with probability at least 1 − ε/2, v_0 is connected in k_n steps to a vertex v_{k_n} with D_{v_{k_n}}(n) ≥ u_0^{κ^{k_n}}. Since, for C ≥ 1/log κ,

u_0^{κ^{k_n}} ≥ u_0^{log log n} ≥ (log n)^σ, (8.7.39)

when log u_0 ≥ σ, we obtain that v_{k_n} ∈ Core_n whp.
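The parameter choices at the end of the proof can be illustrated numerically. The sketch below is not from the book; the values of τ, ε and n are illustrative choices. With C = 1/log κ and log u_0 = σ, it checks that k_n = ⌈C log log log n⌉ iterations of u_k = u_{k−1}^κ reach the core threshold (log n)^σ, as in (8.7.39):

```python
import math

tau, eps = 2.5, 0.1                 # illustrative; chosen so that (1-eps)/(tau-2) > 1
kappa = (1 - eps) / (tau - 2)
assert kappa > 1
sigma = 1 / (3 - tau) + 0.5         # any sigma > 1/(3-tau)
u0 = math.exp(sigma)                # so that log u0 = sigma
C = 1 / math.log(kappa)

for n in [10 ** 6, 10 ** 9, 10 ** 15]:
    k_n = math.ceil(C * math.log(math.log(math.log(n))))
    # kappa^{k_n} >= log log n, hence u0^{kappa^{k_n}} >= (log n)^sigma
    assert kappa ** k_n >= math.log(math.log(n))
    assert u0 ** (kappa ** k_n) >= math.log(n) ** sigma
```

This makes the double-exponential speed-up explicit: only triply-logarithmically many doubly-exponential degree jumps are needed before the core is reached.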

8.8 Diameters in preferential attachment models

In this section, we investigate the diameter in preferential attachment models. We start by discussing logarithmic bounds for δ > 0, continue with the doubly-logarithmic bounds for δ < 0, and finally discuss the case where δ = 0.


Logarithmic bounds on the diameter

The following theorem shows that, for δ > 0, the diameter grows logarithmically in the size of the graph:

Theorem 8.29 (A log n bound for the diameter in PAMs) Fix m ≥ 1 and δ > 0. For PA^{(m,δ)}_n there exist 0 < b_1 < b_2 < ∞ such that, as n → ∞,

P(b_1 log n ≤ diam(PA^{(m,δ)}_n) ≤ b_2 log n) = 1 − o(1). (8.8.1)

Similar bounds hold for PA^{(m,δ)}_n(b).

Proof The lower bound on the diameter of PA^{(m,δ)}_n follows from the lower bound in Theorem 8.6. The upper bound for PA^{(m,δ)}_n(b) has already been proved in Section 8.6.1. We omit the proof for PA^{(m,δ)}_n, which can be completed along similar lines as in Section 8.6.1.

Doubly-logarithmic bounds on the diameter

When δ < 0, we know that the typical distances grow like log log n. The following theorem shows that this extends to the diameter of PA^{(m,δ)}_n:

Theorem 8.30 (Diameter of PA^{(m,δ)}_n for δ < 0) Fix m ≥ 2 and δ ∈ (−m, 0). For PA^{(m,δ)}_n, as n → ∞,

diam(PA^{(m,δ)}_n)/log log n  P−→  4/|log(τ − 2)| + 2/log m. (8.8.2)

Theorem 8.30 is an adaptation of Theorem 7.17 to PA^{(m,δ)}_n. Indeed, the first term on the right-hand side of (8.8.2) corresponds to the typical distances as in Theorem 8.17, just like the first term on the right-hand side of (7.4.6) in Theorem 7.17 corresponds to the typical distances in Theorem 7.2. Further, the additional term 2/log m can be interpreted as 2/log(d_min − 1), where d_min is the minimal degree of an internal vertex in the neighborhood of a vertex in PA^{(m,δ)}_n, as in Theorem 7.17. In turn, this can be interpreted as twice 1/log m, where log log n/log m can be viewed as the depth of the worst trap.

For the upper bound, we explore the r-neighborhoods of vertices with r = (1 + ε) log log n/log m. Then, these boundaries are so large that, with probability of order 1 − o(1/n²), pairs of such boundaries are quickly connected to the core Core_n. An application of Theorem 8.23 then completes the upper bound.

We give some more details on the lower bound. The proof of the lower bound proceeds by defining the notion of a vertex being k-minimally connected, which means that the k-neighborhood of the vertex is as small as possible. In this case, this means that it has size m^k (rather than d_min(d_min − 1)^{k−1} as it is in CM_n(d)). It is then shown that there are plenty of such k-minimally connected vertices as long as k ≤ (1 − ε) log log n/log m, similarly to the analysis in Lemma 7.18 for CM_n(d). However, the analysis itself is quite a bit harder, due to the more involved nature of PA^{(m,δ)}_n compared to CM_n(d). We provide a few details, and show that the number M_k of k-minimally connected vertices tends to infinity in probability.

We define

ℳ_k = {v ∈ [n] : D_v(n) = m, D_u(n) = m + 1 ∀u ∈ B_{k−1}(v) \ {v}, B_k(v) ∩ [n/2] = ∅}, (8.8.3)

and

M_k = |ℳ_k|. (8.8.4)

In words, ℳ_k consists of those vertices whose k-neighborhood is minimally connected at time n and contains no vertices in [n/2]. The following lemma shows that, for k ≤ (1 − ε) log log n/log m, M_k P−→ ∞:

Lemma 8.31 (Many minimally k-connected vertices in PA^{(m,δ)}_n) Consider PA^{(m,δ)}_n and PA^{(m,δ)}_n(d) with m ≥ 2 and δ ∈ (−m, 0). For k ≤ (1 − ε) log log n/log m,

M_k P−→ ∞. (8.8.5)

Proof We prove the lemma for PA^{(m,δ)}_n(d), and rely on the arguments in the proof of Proposition 5.23.

As in the proof of Lemma 7.18, we use a second-moment method. We start by analyzing E[M_k]. Note that

E[M_k] = n P(o ∈ ℳ_k) = n ∑_{t : V(t) ⊆ [n]\[n/2]} P(B_k(o) = t), (8.8.6)

where the sum is over all rooted trees t with root o, vertex set V(t) ⊆ [n] \ [n/2], depth k, root degree m, and degree m + 1 for all other non-leaf vertices. By the analysis in the proof of Proposition 5.23, see also (5.3.62),

P_n(B_k(o) = t) = ∏_{v∈V(t)} ψ_v^{a′_v} ∏_{s=1}^{n} (1 − ψ_s)^{b′_s} ∏_{v∈V(t)} ∏_{u,j : u ↛_j v} [1 − p(u, v)], (8.8.7)

where we recall p(u, v) in (5.3.61), and we now let

a′_s = 1_{{s∈V(t)}} ∑_{u∈V(t)} 1_{{u∼s, u>s}},  b′_s = ∑_{u,v∈V(t)} 1_{{u →_j v}} 1_{{s∈(v,u]}}. (8.8.8)

Using (5.3.63)–(5.3.64), this simplifies to

P_n(B_k(o) = t) = ((1 + o_P(1))/n) ∏_{v∈V(t)} ψ_v^{a′_v} e^{−m ∑_{u∈(v,n]} p(u,v)} ∏_{s=1}^{n} (1 − ψ_s)^{b′_s}, (8.8.9)

where the factor 1/n comes from the probability that o equals the root of t. By (5.3.65)–(5.3.66), this further simplifies to

P_n(B_k(o) = t) = ((1 + o_P(1))/n) ∏_{v∈V(t)\{o}} ψ_v^{a′_v} e^{−(2m+δ)vψ_v(1−(v/n)^{1−χ})} ∏_{s=1}^{n} (1 − ψ_s)^{b′_s}. (8.8.10)


Therefore, also

P(B_k(o) = t) = ((1 + o_P(1))/n) ∏_{v∈V(t)} E[ψ_v^{a′_v} e^{−(2m+δ)vψ_v(1−(v/n)^{1−χ})}(1 − ψ_v)^{b′_v}] (8.8.11)
× ∏_{s∈[n]\V(t)} (β_s + b′_s − 1)_{b′_s}/(α + β_s + b′_s − 1)_{b′_s},

where, for v ∈ V(t), a′_v = 1 unless v is the root of t. It is not hard to see that there exists a q > 0 such that, uniformly in v ∈ [n] \ [n/2],

n E[ψ_v e^{−(2m+δ)vψ_v(1−(v/n)^{1−χ})}(1 − ψ_v)^{b′_v}] ≥ q. (8.8.12)

Denote

i_k = |V(t)| = (m^{k+1} − 1)/(m − 1). (8.8.13)

Then, since b′_s = 0 for all s ∈ [n/2], using the argument leading to (??), there exists an η > 0 such that

∏_{s∈[n]\V(t)} (β_s + b′_s − 1)_{b′_s}/(α + β_s + b′_s − 1)_{b′_s} ≥ η^{i_k}. (8.8.14)

Therefore, uniformly in t such that V(t) ⊆ [n] \ [n/2],

P(B_k(o) = t) ≥ n^{−i_k}(qη)^{i_k}. (8.8.15)

For every collection of vertices V ⊆ [n] \ [n/2] of size i_k, there is at least one tree t such that V(t) = V. Since the total number of such sets equals \binom{n/2}{i_k}, we arrive at

E[M_k] ≥ n \binom{n/2}{i_k} n^{−i_k}(qη)^{i_k}. (8.8.16)

Note that, for k ≤ (1 − ε) log log n/log m,

i_k ≤ (m/(m−1)) m^k ≤ (m/(m−1)) (log n)^{1−ε}. (8.8.17)

Therefore, we can further bound from below

\binom{n/2}{i_k} ≥ (n/2 − i_k)^{i_k}/i_k! ≥ 2^{−i_k} (n^{i_k}/i_k!)(1 + o(1)), (8.8.18)

to arrive at

E[M_k] ≥ n (qη/2)^{i_k}/i_k! ≥ n^{1−ε} → ∞. (8.8.19)

This proves that E[M_k] → ∞.
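The counting bounds (8.8.13) and (8.8.17) are elementary and can be checked directly. The sketch below is not from the book; m = 2 and ε = 0.2 are illustrative choices. It compares i_k with the geometric sum 1 + m + ⋯ + m^k that defines the minimally connected tree, and with the bound (m/(m−1))(log n)^{1−ε}:

```python
import math

m, eps = 2, 0.2  # illustrative choices
for n in [10 ** 4, 10 ** 8, 10 ** 16]:
    k_max = int((1 - eps) * math.log(math.log(n)) / math.log(m))
    for k in range(1, k_max + 1):
        i_k = (m ** (k + 1) - 1) // (m - 1)                  # (8.8.13)
        assert i_k == sum(m ** j for j in range(k + 1))      # 1 + m + ... + m^k
        assert i_k <= m / (m - 1) * math.log(n) ** (1 - eps)  # (8.8.17)
```

The point of (8.8.17) is that the tree is only polylogarithmically large, so that the factors (qη/2)^{i_k} and 1/i_k! in (8.8.19) cost only n^{o(1)}.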


We continue with E[M_k²], which we write as

E[M_k²] = n² P(o_1, o_2 ∈ ℳ_k) = E[M_k] + n² P(o_1, o_2 ∈ ℳ_k, o_1 ≠ o_2) (8.8.20)
= E[M_k] + n² ∑_{t_1,t_2 : V(t_1),V(t_2) ⊆ [n]\[n/2]} P(B_k(o_1) = t_1, B_k(o_2) = t_2, o_1 ≠ o_2).

We note that if o_1 ∈ ℳ_k, then o_2 ∈ B_{k−1}(o_1) \ {o_1} is not possible. Therefore, only t_1 and t_2 for which V(t_1) and V(t_2) intersect at most at their boundaries contribute. We refrain from giving all details, but the proof of (8.8.11) can then be followed for the above event, to show that

E[M_k²] = E[M_k]²(1 + o(1)), (8.8.21)

so that, for k ≤ (1 − ε) log log n/log m,

M_k/E[M_k] P−→ 1. (8.8.22)

Together with (8.8.19), this completes the proof that M_k P−→ ∞ for k ≤ (1 − ε) log log n/log m.

To complete the lower bound on diam(PA^{(m,δ)}_n) in Theorem 8.30, we take two vertices u, v ∈ ℳ_k with k = (1 − ε) log log n/log m. Then, by definition, ∂B_k(u), ∂B_k(v) ⊆ [n] \ [n/2]. The proof of Theorem 8.17 can then be adapted to show that, whp, the distance between ∂B_k(u) and ∂B_k(v) is still bounded from below by 4 log log n/|log(τ − 2)|. The lower bound on the diameter then follows whp from

diam(PA^{(m,δ)}_n) ≥ dist_{PA^{(m,δ)}_n}(u, v) (8.8.23)
= 2k + dist_{PA^{(m,δ)}_n}(∂B_k(u), ∂B_k(v))
≥ 2(1 − ε) log log n/log m + 4 log log n/|log(τ − 2)|.

This gives an informal proof of the lower bound.

The critical case of δ = 0

We complete this section by discussing the diameter for δ = 0:

Theorem 8.32 (Diameter of PA^{(m,0)}_n for δ = 0) Fix m ≥ 2. For PA^{(m,0)}_n, as n → ∞,

diam(PA^{(m,0)}_n) log log n / log n  P−→  1. (8.8.24)

As we see, the diameter in Theorem 8.32 and the typical distances in Theorem 8.7 behave similarly. This is rather unique to the critical δ = 0 setting.

The proof of Theorem 8.32 relies on the linearized chord diagram (LCD) description of the δ = 0 model by Bollobás and Riordan (2004a).


8.9 Further results on distances in preferential attachment models

8.9.1 Distance evolution for δ < 0

In this section, we survey results about the evolution of graph distances. By this, we mean that we study t ↦ dist_t(i, j) = dist_{PA^{(m,δ)}_t}(i, j). In Exercise 8.12, you are asked to show that t ↦ dist_t(i, j) is non-increasing. Exercises 8.13–8.17 study various aspects of the evolution of distances.

The main result below shows how distances between uniform vertices in [n] decrease as time progresses. This is done by investigating what the distance between these vertices is in PA^{(m,δ)}_t for all t ≥ n:

Theorem 8.33 (Evolution of distances) Consider (PA^{(m,δ)}_t)_{t≥1} with m ≥ 2 and δ ∈ (−m, 0). Choose o_1^{(n)}, o_2^{(n)} uniformly at random from [n]. Then

sup_{t≥n} | dist_{PA^{(m,δ)}_t}(o_1^{(n)}, o_2^{(n)}) − ( 4⌊(log log n − log(1 ∨ log(t/n)))/|log(τ − 2)|⌋ ∨ 4 ) | (8.9.1)

is a tight sequence of random variables.

Theorem 8.33 is very strong, as it describes very precisely how the distances decrease as the graph PA^{(m,δ)}_t grows. Further, Theorem 8.33 also proves that dist_{PA^{(m,δ)}_n}(o_1^{(n)}, o_2^{(n)}) − 4 log log n/|log(τ − 2)| is a tight sequence of random variables. While the lower tightness (i.e., the tightness of [dist_{PA^{(m,δ)}_n}(o_1^{(n)}, o_2^{(n)}) − 4 log log n/|log(τ − 2)|]_−) follows from Theorem 8.17, we have not proved the upper tightness (i.e., the tightness of [dist_{PA^{(m,δ)}_n}(o_1^{(n)}, o_2^{(n)}) − 4 log log n/|log(τ − 2)|]_+) in Theorem 8.8 (recall (8.7.29)). Exercises 8.18–8.19 investigate what happens when t = t_n = n e^{(log n)^α} for α ∈ (0, 1) and α > 1, respectively. This leaves open the interesting case where α = 1.

The proof of Theorem 8.33 by Jorritsma and Komjáthy (Prepr.(2020)) uses ideas related to those in the present chapter, but significantly extends them in order to prove the uniformity in t ≥ n. While we do not present the full proof here, we do explain why dist_{PA^{(m,δ)}_t}(o_1^{(n)}, o_2^{(n)}) = 2 whp when t = t_n ≫ n^{−2m/δ}, which is clearly an interesting case and explains why the supremum in (8.9.1) can basically be restricted to t ∈ [n, n^{−2m/δ+o(1)}]. This sheds light on what happens precisely when t = t_n = n e^{(log n)^α} for α = 1, a case that is left open in the above exercises.

The probability that one of the m edges of vertex n + t + 1 connects to u and one to v (which certainly makes the distance between u and v equal to 2) is close to

m(m−1) E[ ((D_u(n+t) + δ)/(2m(n+t) + (n+t)δ)) ((D_v(n+t) + δ)/(2m(n+t) + (n+t)δ)) | PA^{(m,δ)}_n ] (8.9.2)
= (1 + o(1)) (m(m−1)/((2m+δ)² t²)) E[(D_u(n+t) + δ)(D_v(n+t) + δ) | PA^{(m,δ)}_n]
= (1 + o(1)) (m(m−1)/((2m+δ)² t²)) (D_u(n) + δ)(D_v(n) + δ) (t/n)^{2/(2+δ/m)}
= (1 + o(1)) (m(m−1)/(2m+δ)²) (D_u(n) + δ)(D_v(n) + δ) t^{−2(m+δ)/(2m+δ)} n^{−2/(2+δ/m)}.

When taking u = o_1^{(n)}, v = o_2^{(n)}, we have that D_u(n) d−→ D_1, D_v(n) d−→ D_2, where (D_1, D_2) are two i.i.d. copies of a random variable with the asymptotic degree distribution P(D = k) = p_k. Thus, the conditional expectation of the total number of double attachments to both o_1^{(n)} and o_2^{(n)} up to time n + t is close to

∑_{s=1}^{t} (m(m−1) D_1 D_2/(2m+δ)²) s^{−2(m+δ)/(2m+δ)} n^{−2m/(2m+δ)} (8.9.3)
≈ (1 + o(1)) (m(m−1) D_1 D_2/((2m+δ)(−δ))) n^{−2m/(2m+δ)} t^{−δ/(2m+δ)},

which becomes Θ(1) when t = K n^{−2m/δ}. The above events, for different t, are close to being independent. This suggests that the process of attaching to both o_1^{(n)} and o_2^{(n)} is, conditionally on their degrees (D_1, D_2), Poisson with some random intensity.
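The Riemann-sum approximation behind (8.9.3) can be checked numerically. The sketch below is not from the book; m = 2, δ = −1 and t = 10^6 are illustrative choices. It compares ∑_{s≤t} s^{−2(m+δ)/(2m+δ)} with the closed form ((2m+δ)/(−δ)) t^{−δ/(2m+δ)}:

```python
m, delta = 2, -1.0  # illustrative, with delta in (-m, 0)
p = 2 * (m + delta) / (2 * m + delta)   # exponent of s in (8.9.3); here 2/3
t = 10 ** 6
partial_sum = sum(s ** (-p) for s in range(1, t + 1))
# Integral approximation: sum_{s <= t} s^{-p} ~ t^{1-p}/(1-p), and
# 1 - p = -delta/(2m+delta), so the prefactor equals (2m+delta)/(-delta).
approx = (2 * m + delta) / (-delta) * t ** (-delta / (2 * m + delta))
assert abs(partial_sum - approx) / approx < 0.02  # within a few percent
```

Since −δ/(2m+δ) > 0, the expected number of double attachments indeed grows as a positive power of t, reaching order one at the polynomial time scale t = K n^{−2m/δ} identified above.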

8.9.2 Distances in the Bernoulli PAM

Recall the Bernoulli preferential attachment model introduced in Section 1.3.6. The nice thing about this model is that its degree structure is understood much more generally than for preferential attachment models with a fixed number of edges. This also allows one to zoom in on particular instances of the preferential attachment function. While this is quite a different model, for example because the graph is whp not connected, unlike PA^{(m,δ)}_n for m ≥ 2 and n large, in terms of distances it behaves rather similarly to fixed-degree models such as PA^{(m,δ)}_n. This is exemplified by the following theorem, which applies in the infinite-variance degree setting:

Theorem 8.34 (Evolution of distances for scale-free BPA^{(f)}_n) Let BPA^{(f)}_n be the sublinear preferential attachment model obtained from a concave attachment rule f satisfying

f(k) = γk + β, (8.9.4)

where γ ∈ (1/2, 1), so that τ = 1 + 1/γ ∈ (2, 3). Then Theorem 8.33 applies to this setting when we restrict to t ≥ n such that o_1^{(n)} and o_2^{(n)} are connected in BPA^{(f)}_n.


In particular, conditionally on o_1^{(n)} and o_2^{(n)} being connected in BPA^{(f)}_n,

dist_{BPA^{(f)}_n}(o_1^{(n)}, o_2^{(n)}) − 4 log log n/|log(τ − 2)| (8.9.5)

is a tight sequence of random variables.

The situation of affine preferential attachment functions f in (8.9.4) where γ ∈ (0, 1/2), so that τ = 1 + 1/γ > 3, is not so well understood, but one can conjecture that, again, the distance between o_1^{(n)} and o_2^{(n)} is whp logarithmic, with base related to the operator norm of the multi-type branching process that describes the local weak limit.

The following theorem describes nicely how the addition of an extra power of a logarithm in the degree distribution affects the distances:

Theorem 8.35 (Critical case: interpolation) Let BPA^{(f)}_n be the sublinear preferential attachment model obtained from a concave attachment rule f satisfying

f(k) = k/2 + (α/2) k/log k + o(k/log k), (8.9.6)

for some α > 0. Consider two vertices o_1, o_2 chosen independently and uniformly at random from the largest connected component C_max of BPA^{(f)}_n. Then

dist_{BPA^{(f)}_n}(o_1, o_2) = (1 + o_P(1)) (1/(1+α)) log n/log log n. (8.9.7)

Comparing Theorem 8.35 to Theorem 6.23, we see that, for large α, the distances in BPA^{(f)}_n are about twice as large as those in GRG_n(W) for a degree distribution with the same asymptotic form. This can be seen as an explanation of the occurrence of the extra factor 2 in Theorem 8.8 compared to Theorem 6.3 for the Norros–Reittu model NR_n(w) and Theorem 7.2 for the configuration model CM_n(d) when the power-law exponent τ satisfies τ ∈ (2, 3). Note that this extra factor is absent precisely when α = 0.

8.10 Notes and discussion

Notes on Section 8.1

Scale-free trees have received substantial attention in the literature; we refer to Bollobás and Riordan (2004b), Pittel (1994), and the references therein. Theorem 8.2 is (Pittel, 1994, Theorem 1).

There is a beautiful result on the height of trees using branching processes due to Kingman (1975), of which Pittel (1994) makes crucial use. This approach is based on exponential martingales, and allows for a relatively short proof of the lower bound on the height of the tree.

There is a close analogy between PA^{(1,δ)}_n and so-called uniform recursive trees. In uniform recursive trees, we grow a tree such that at time 1, we have a unique vertex called the root, with label 1. At time n, we add a vertex and connect it to a uniformly chosen vertex in the tree. See Smythe and Mahmoud (1994) for a survey of recursive trees.

A variant of a uniform recursive tree is the case where the probability that a newly added vertex is attached to a vertex is proportional to the degree of that vertex (and, for the root, the degree of the root plus one). This process is called a random plane-oriented recursive tree. For a uniform recursive tree of size n, Pittel (1994) proved that the maximal distance between the root and any other vertex is with high probability equal to (1/(2γ)) log n (1 + o(1)), where γ satisfies (8.1.2) with δ = 0. It is not hard to see that this implies that the maximal graph distance between any two vertices in the uniform recursive tree is equal to (1/γ) log n (1 + o(1)).

Notes on Section 8.2.

Theorem 8.6 is proved by Dommers et al. (2010). Theorem 8.7 is proved by Bollobás and Riordan (2004a). More precisely, the result for the diameter of PA^{(m,0)}_n in Theorem 8.32 is proved by Bollobás and Riordan (2004a). This proof can rather easily be extended to a proof for the typical distances.

A weaker version of Theorem 8.8 is proved by Dommers et al. (2010). The current theorem is inspired by Dereich et al. (2012).

Notes on Section 8.3.

The bound in Proposition 8.9 for δ = 0 was proved in (Bollobas and Riordan, 2004a, Lemma 3) in a rather different way. The current version for all δ is taken from Dommers et al. (2010), and also its proof is adapted from there.

Notes on Section 8.4.

The proofs in this section for δ = 0 first appeared in (Bollobas and Riordan, 2004a, Section 4).

Notes on Section 8.5.

The proof of Theorem 8.17 is adapted from Dereich et al. (2012).

Notes on Section 8.6.

Notes on Section 8.7.

Theorem 8.33 is proved by Jorritsma and Komjathy (Prepr. (2020)), who study the more general problem of first-passage percolation on the preferential attachment model. In first-passage percolation, the edges are weighted, and these weights can be interpreted as the traversal times of edges in a rumor-spreading model. Jorritsma and Komjathy (Prepr. (2020)) then study the time it takes a rumor to travel from a random source to a random destination. They obtain sharp results on the leading-order asymptotics of this traversal time, as well as on these quantities from a dynamical perspective.



Theorem 8.23 is proved by Dommers et al. (2010), who also first proved a weaker upper bound than in Theorem 8.8. Theorem 8.28 was proved by Dereich et al. (2012).

Notes on Section 8.8.

Theorem 8.29 is proved by Dommers et al. (2010). Theorem 8.32 is proved by Bollobas and Riordan (2004a). Theorem 8.30 is proved by Caravenna et al. (2019).

Notes on Section 8.9.

Theorem 8.35 is proved by Dereich et al. (2017). Theorem 8.34 is proved by Jorritsma and Komjathy (Prepr. (2020)).

8.11 Exercises for Chapter 8

Exercise 8.1 (Bound on θ) Prove that the solution θ of (8.1.2) satisfies θ < 1. What does this imply for the diameter and typical distances in scale-free trees?

Exercise 8.2 (Bound on θ) Prove that the solution θ of (8.1.2) satisfies θ ∈ (0, e^{−1}).

Exercise 8.3 (Early vertices are whp at distance 2 for δ < 0) Let δ ∈ (−m, 0) and m ≥ 2. Show that, for i, j fixed,

lim_{n→∞} P( dist_{PA_n^{(m,δ)}}(i, j) ≤ 2 ) = 1.  (8.11.1)

Exercise 8.4 (Negative correlations for m = 1) Show that, when m = 1, Lemma 8.10 implies that, when (π_0, ..., π_k) contains different coordinates from (ρ_0, ..., ρ_k), then

P( ⋂_{i=0}^{k−1} {π_i → π_{i+1}} ∩ ⋂_{i=0}^{k−1} {ρ_i → ρ_{i+1}} )
    ≤ P( ⋂_{i=0}^{k−1} {π_i → π_{i+1}} ) P( ⋂_{i=0}^{k−1} {ρ_i → ρ_{i+1}} ).  (8.11.2)

Exercise 8.5 (Extension of (8.3.16) to PA_n^{(1,δ)}(b)) Prove that, for PA_n^{(1,δ)}(b), (8.3.16) is replaced with

P( u_1 →^{1} v, u_2 →^{1} v ) = (1 + δ) Γ(u_1 − δ/(2+δ)) Γ(u_2 − (1+δ)/(2+δ)) Γ(v) / [ Γ(u_1 + 1/(2+δ)) Γ(u_2) Γ(v + 2/(2+δ)) ].  (8.11.3)

Exercise 8.6 (Distance between n − 1 and n in PA_n^{(m,0)}) Show that, whp,

dist_{PA_n^{(m,0)}}(n − 1, n) ≥ k_n^⋆,  (8.11.4)

where k_n^⋆ is defined in (8.4.14).



Exercise 8.7 (Distance between vertices 1 and 2 in PA_n^{(m,0)}) Check what the analysis in Section 8.4.2 implies for dist_{PA_n^{(m,0)}}(1, 2). In particular, where does the proof that dist_{PA_n^{(m,0)}}(1, 2) ≥ k_n^⋆, where k_n^⋆ is defined in (8.4.14), fail, and why should it?

Exercise 8.8 (Most recent common ancestor in PA_n^{(1,δ)}) Fix o_1, o_2 to be two vertices in [n] chosen uniformly at random, and let V be the oldest vertex that the path from 1 to o_1 and that from 1 to o_2 have in common in PA_n^{(1,δ)}. Use the proof of Lemma 8.13 to show that dist_{PA_n^{(1,δ)}(b)}(1, V) is tight.

Exercise 8.9 (Upper tightness criterion for centered dist_{PA_{2n}^{(m,δ)}}(o_1, o_2)) Fix δ ∈ (−m, 0). Use (8.7.29) and Theorem 8.23 to show that dist_{PA_{2n}^{(m,δ)}}(o_1, o_2) − 2 log log n / |log(τ − 2)| is upper tight when dist_{PA_{2n}^{(m,δ)}}(o_1, Core_n) is tight.

Exercise 8.10 (Tightness of dist_{PA_{2n}^{(m,δ)}}(o′, Core_n)) Fix δ ∈ (−m, 0) and let o′ be chosen uniformly at random from [n]. Show that dist_{PA_{2n}^{(m,δ)}}(o′, Core_n) is a tight sequence of random variables, by using n-connectors.

Exercise 8.11 (Path to upper tightness criterion for centered dist_{PA_{2n}^{(m,δ)}}(o_1, o_2)) How can the ideas in the previous two exercises be combined to show that

dist_{PA_{2n}^{(m,δ)}}(o_1, o_2) − 2 log log n / |log(τ − 2)|

is upper tight? You do not have to give the full proof, but rather convince yourself that this is indeed true, by adapting the proof of Theorem 8.28. [Hint: Use that o_1 ∈ [2(1 − ε)n] with probability at least ε when o_1 ∈ [2n] is chosen uniformly. Then only use vertices in [2n] \ [2(1 − ε)n] as possible n-connectors.]

Exercise 8.12 (Monotonicity of distances in PA_t^{(m,δ)}) Fix m ≥ 1 and δ > −m. Show that t ↦ dist_t(i, j) = dist_{PA_t^{(m,δ)}}(i, j) is non-increasing, since edges are only ever added as t grows.

Exercise 8.13 (Distance evolution in PA_t^{(1,δ)}) Fix m = 1 and δ > −1. Show that t ↦ dist_t(i, j) = dist_{PA_t^{(1,δ)}}(i, j) is constant for t ≥ i ∨ j.

Exercise 8.14 (Distance structure on N due to PA_t^{(m,δ)}) Fix m ≥ 1 and δ > −m. Use Exercise 8.12 to show that dist_t(i, j) →^{a.s.} dist_∞(i, j) < ∞ for all i, j ≥ 1. Thus, dist_∞ is a distance function on N.

Exercise 8.15 (Nearest neighbors on N due to PA_t^{(m,δ)}) Fix m ≥ 2 and δ > −m. Compute P(dist_∞(i, j) = 1).

Exercise 8.16 (Infinitely many neighbors) Fix m ≥ 2 and δ > −m. Show that #{j : dist_∞(i, j) = 1} = ∞ a.s. for all i ≥ 1.

Exercise 8.17 (Eventually distances at most 2 in PA_t^{(m,δ)}) Fix m ≥ 2 and δ ∈ (−m, 0). Show that dist_∞(i, j) ≤ 2 for all i, j.

Exercise 8.18 (Evolution of distances in PA_t^{(m,δ)}: critical parametric choice) Fix m ≥ 2 and δ ∈ (−m, 0). Choose o_1^{(n)}, o_2^{(n)} uniformly at random from [n], and take t = t_n = n e^{(log n)^α} for some α ∈ (0, 1). Use Theorem 8.33 to identify θ_α such that

dist_{PA_t^{(m,δ)}}(o_1^{(n)}, o_2^{(n)}) / log log n →^{P} θ_α.  (8.11.5)

Exercise 8.19 (Tight distances in PA_t^{(m,δ)} for δ < 0) Fix m ≥ 2 and δ ∈ (−m, 0). Choose o_1^{(n)}, o_2^{(n)} uniformly at random from [n], and take t = t_n = n e^{(log n)^α} for some α > 1. Use Theorem 8.33 to show that dist_{PA_t^{(m,δ)}}(o_1^{(n)}, o_2^{(n)}) is a tight sequence of random variables.

Exercise 8.20 (Degree evolution in PA_t^{(m,δ)}) Fix m ≥ 1 and δ > −m. Take a vertex v such that D_v(n) = k. Show that, for t ≥ 1, the random process t ↦ D_v(n + t) evolves like a Polya urn starting with k red balls and weight k + δ, while the number and weight of the blue balls are n − k and (2m + δ)n − k − δ. Identify the Beta random variable U for which, for all t ≥ 1,

P( D_v(n + t) = k + l | D_v(n) = k ) = E[ P(Bin(t, U) = l) ].  (8.11.6)


Part IV

Related models and problems




Summary of Part III.

In Part III, we have investigated the small-world behavior of random graphs, extending the results on the existence and uniqueness of the giant component as informally described in Meta Theorem A on page 217. It turns out that the results are all quite similar, even though the details of the models differ substantially. We can summarize the results obtained in the following meta theorem:

Meta Theorem B. (Small- and ultra-small-world characteristics) In a random graph model with power-law degrees having power-law exponent τ, the typical distances within the giant component of a graph of size n are of order log log n when τ ∈ (2, 3), while they are of order log n when τ > 3. Further, these typical distances are highly concentrated.

Informally, these results quantify the 'six degrees of separation' paradigm in random graphs, where we see that random graphs with very heavy-tailed degrees have ultra-small typical distances, as could perhaps be expected. The proofs of these results are often similar as well, relying on clever path-counting techniques. In particular, the results show that, in generalized random graphs and configuration models alike, in the τ ∈ (2, 3) regime, vertices of high degree, say k, are typically connected to vertices of even higher degree, of order k^{1/(τ−2)}. In the preferential attachment model, on the other hand, this is not true; yet vertices of degree k tend to be connected to vertices of degree k^{1/(τ−2)} in two steps, making typical distances roughly twice as large.

Overview of Part IV.

In this part, we study several related random graph models that can be seen as extensions of the simple models studied so far. They incorporate features that we have not seen yet, such as directed edges, clustering, communities and/or geometry. The common thread through this part is the study of the extent to which the main results informally described in Meta Theorems A (see page 217) and B (see above) remain valid, and, if not, how they need to be adapted. We will not give complete proofs, but instead informally explain why results are similar to those in Meta Theorems A and B, or why instead they are different.


Chapter 9

Related Models

Abstract

In this chapter, we discuss some related random graph models that have been studied in the literature. We explain their relevance, as well as some of their properties. We discuss directed random graphs, random graphs with community structure, as well as spatial random graphs.

Organization of this chapter

We start in Section 9.1 by extensively discussing two examples of real-world networks. The aim there is to show that the models discussed so far tend not to be highly appropriate for real-world examples. We choose the example of citation networks, which are directed, have substantial clustering, have a hierarchical community structure, and possibly even a spatial component to them. While we focus on citation networks, we also highlight that we could have chosen other real-world examples as well. Section 9.2 continues by discussing directed versions of the random graphs studied in this book. In Sections 9.3 and 9.4, we introduce several random graph models that have a community structure, so as to model the communities occurring in many (maybe even most?) real-world networks. Section 9.3 studies the setting where communities are macroscopic, in the sense that there is a bounded number of communities even when the network size tends to infinity. In Section 9.4, instead, we look at the setting where the communities have a bounded average size. We will argue that both settings are relevant. In Section 9.5, we discuss random graph models that have a spatial component to them, and explain how the spatial structure gives rise to high clustering. We close this chapter with notes and discussion in Section 9.6, and with exercises in Section 9.7.

9.1 Two real-world network examples

In this section, we discuss two real-world examples of networks: citation networks in Section 9.1.1 and terrorist networks in Section 9.1.2. These examples show that many real-world networks are different from the stylized network models that we have discussed so far. In fact, many real-world networks are directed, in that edges (or arcs) point from one vertex to another, and not necessarily in a symmetric way between pairs of vertices. Also, real-world networks often display a community structure, in that certain parts are more densely connected internally than to the rest of the network. Sometimes such communities are relatively small, comprising just a few vertices, and sometimes they are quite large; even communities with sizes of the order of the entire network exist.




9.1.1 Citation networks

In this section, we discuss citation networks, in which vertices denote scientific papers and directed edges correspond to citations of one paper by another. Obviously, such citations are directed, since it makes a difference whether your paper cites mine or my paper cites yours, leading to a directed network. In such a network, a directed edge is sometimes called an arc.

Obviously, citation networks grow in time. Indeed, papers do not disappear, so a citation, once made in a published paper, does not disappear either. Further, their growth is enormous. Figure 9.1(a) shows that the number of papers in various fields grows exponentially with time, meaning that more and more papers are being written. If you ever wondered why scientists seem to become ever busier, then this may be an apparent explanation. In Figure 9.1(a), we have displayed the number of papers in three different domains, namely Probability and Statistics (PS), Electrical Engineering (EE), and Biotechnology and Applied Microbiology (BT). The data come from the Web of Science database. While the exponential growth is quite prominent in the data, it is unclear how this exponential growth arises. It could either be due to the fact that the number of journals listed in Web of Science grows over time, or to journals containing more and more papers. However, the exponential growth was observed as early as the 1980s; see the book by Derek de Solla Price (1986), appropriately called 'Little Science, Big Science'.

As you can see, we have already restricted ourselves to certain subfields of science, the reason being that the publication and citation cultures in different fields are vastly different. Thus, we have attempted to move to a situation in which the networks that we investigate are a little more homogeneous. For this, it is relevant to be able to distinguish such fields, and to decide which papers (or journals) contribute to which field. This is a fairly daunting task, as you can imagine. It is also an ill-defined task, as no subdomain is truly homogeneous. Let me restrict myself to probability and statistics, as I happen to know this area best. In probability and statistics, there are subdomains that are very pure, as well as areas that are highly applied, such as applied statistics. These areas do indeed have different publication and citation cultures. Thus, science as a whole is probably hierarchical, where large scientific disciplines can be identified that can, in turn, be subdivided into smaller subdomains, etc. However, one should stop somewhere, and the current three scientific disciplines are homogeneous enough to make our point.

Figure 9.1 (a) Number of publications per year (logarithmic y-axis). (b) Log-log plot of the in-degree distribution tail in citation networks.

Figure 9.2 Degree distribution for papers from 1984 over time.

Figure 9.1(b) shows a log-log plot of the in-degree distributions in these three citation networks. We notice that these datasets have empirical power-law citation distributions. Thus, on average, papers attract few citations, but the amount of variability in the number of citations is rather substantial. We are also interested in the dynamics of the citation distribution of the papers published in a given year, as time proceeds. This can be observed in Figure 9.2. We see a dynamical power law, meaning that at any time the degree distribution of a cohort of papers from a given time period (in this case 1984) is close to a power law, but the exponent changes over time (and in fact decreases, which corresponds to heavier tails). When time grows quite large, the power law approaches a fixed value.

Interestingly, the existence of power-law in-degrees in citation networks also has a long history. Already in 1965, Derek de Solla Price (1965) observed it, and even proposed a model for it that relied on a preferential attachment mechanism, more than two decades before Barabasi and Albert (1999) proposed their preferential attachment model.

We wish to discuss two further properties of citation networks and their dynamics. In Figure 9.3, we see that the majority of papers stop receiving citations after some time, while a few others keep being cited for longer times. This inhomogeneity in the evolution of node degrees is not present in classical PAMs, where the degree of every fixed vertex grows as a positive power of the graph size. Figure 9.3 shows that the numbers of citations of papers published in the same year can be rather different, and the majority of papers actually stop receiving citations quite soon. In particular, after a first increase, the average increment of citations decreases over time (see Figure 9.4). We observe a difference in this aging effect between the PS dataset and the other two datasets, due to the fact that in PS, scientists tend to cite older papers than in EE or BT, again exemplifying the differences in citation and publication patterns across fields. Nevertheless, the average increment of citations received by papers in different years tends to decrease over time for all three datasets.

A last characteristic that we observe is the lognormal distribution of the age of cited papers. In Figure 9.5, we plot the distribution of the age of cited papers, looking at references made by papers in different years. We have used a 20-year time window in order to compare different citing years. Notice that this lognormal distribution seems to be very similar across different years, and its shape is similar across different fields.

Let us summarize the differences between citation networks and the random graph models that form the basis of network science. First, citation networks are directed, unlike the typical undirected models that we have discussed so far. However, it is not hard to adapt our models to become directed, and we will explain this in Section 9.2. Secondly, citation networks have a substantial community structure, in that parts of the network exist that are much more densely connected than the network as a whole. We can argue that communities exist both on a macroscopic scale, for example in terms of the various scientific disciplines that science consists of, and on a microscopic scale, where research networks of small groups of scientists create subnetworks that are more densely connected. One could even argue that geography plays an important role in citation networks, since many collaborations between scientists are within their own university or country, even though we all work with various researchers around the globe.

Citation networks are dynamic, like preferential attachment models (PAMs), but their time evolution is quite different from that of PAMs, as the linear growth in PAMs is replaced by exponential growth in citation networks. Further, papers in citation networks seem to age, as seen in both Figures 9.3 and 9.4, in that citation rates become smaller for large times, in such a way that typical papers even completely stop receiving citations at some (random) point in time.

Figure 9.3 Time evolution of the number of citations of samples of 20 randomly chosen papers from 1980 for PS and EE, and from 1982 for BT.

Figure 9.4 Average degree increment over a 20-year time window for papers published in different years. PS presents an aging effect different from EE and BT, showing that papers in PS receive citations for longer than papers in EE and BT.

Figure 9.5 Distribution of the age of cited papers for different citing years.

In conclusion, finding an appropriate model for citation networks is quite a challenge, and one should be quite humble in one's expectation that the standard models come anywhere near the complexity of real-world networks.

9.1.2 Terrorist networks

TO DO 9.1: Add a discussion on terrorist networks

9.1.3 General real-world network models

We conclude that real-world networks tend to have many properties that are different from those of the models discussed so far in this book. Indeed, they often have a pronounced community structure, which can act at a microscopic scale, at a macroscopic scale, or anything in between; i.e., such networks are hierarchically organised. Real-world networks can be directed, with all the particulars that this gives rise to. The spatial positioning of the vertices can have a profound effect on the likelihood of edges being present. Real-world networks often change over time, and should therefore be considered as temporal networks. In some cases, several networks even operate together, and can thus not be seen as separate networks. For example, in transporting people within a country, the railroad network and the road network are both highly relevant. At larger distances, airline networks become involved as well. Thus, to study how people move around the globe, we cannot study each of these networks in isolation. In science, the collaboration networks of authors and the citation networks of papers together give a much clearer picture of how science works than either one in isolation, even though these more restricted views can offer useful insight.

One may become rather overwhelmed by the complexity that real-world networks provide. Indeed, high-level complex network science is part of complexity theory, the science of complex systems. However, the past decades have given rise to a wealth of insights, often based on relatively simple models such as the ones discussed so far. Indeed, Box (1976) is often quoted as saying that "All models are wrong but some are useful". In Box (1979), a more elaborate version of this quote reads as follows:

Now it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations. For example, the law PV = RT relating pressure P, volume V and temperature T of an "ideal" gas via a constant R is not exactly true for any real gas, but it frequently provides a useful approximation and furthermore its structure is informative since it springs from a physical view of the behavior of gas molecules. For such a model there is no need to ask the question "Is the model true?". If "truth" is to be the "whole truth" the answer must be "No". The only question of interest is "Is the model illuminating and useful?".

Thus, we should not feel discouraged at all! In particular, it is important to know when to include an extra feature into a model, so that the model at hand becomes more "useful". For this, the first step is to come up with models that do incorporate these extra features. It turns out that many of the models discussed so far can easily be adapted so as to include features such as directedness, community structure and geometry. Further, it is straightforward to combine such properties, so as to include several of them at once. The simpler models that do not have such features then serve as a useful point of comparison, and can thus act as a "benchmark" for more complex situations. The understanding of such simple models often helps us in understanding the more complex models, since many properties, tools and ideas can easily be extended to the more complex settings. In some cases, the extra feature gives rise to richer behavior, which then merits being studied in full detail. In these steps, network science has moved significantly forward compared to the models described so far. The aim of this chapter is to highlight some of the lessons learned.

Below, we will discuss directed random graphs in Section 9.2, random graphs with macroscopic or global communities in Section 9.3, random graphs with microscopic or local communities in Section 9.4, and spatial random graphs in Section 9.5.

9.2 Directed random graphs

Many real-world networks are directed, in the sense that edges are oriented. For example, in the World-Wide Web, the vertices are web pages, and the edges are the hyperlinks between them, which are clearly oriented. One could naturally forget about these directions, but that would discard a wealth of information. For example, in citation networks, it makes a substantial difference whether my paper cites yours, or your paper cites mine.

This section is organised as follows. A directed graph is often called a digraph, and we start by defining digraphs. After this, we discuss various models invented for directed graphs: directed inhomogeneous random graphs in Section 9.2.1, directed configuration models in Section 9.2.2, and directed preferential attachment models in Section 9.2.3.

A digraph D = (V(D), E(D)) on the vertex set V(D) = [n] has an edge set E(D) that is a subset of the set [n]² = {(u, v) : u, v ∈ [n]} of all ordered pairs of elements of [n]. Elements of E(D) are called directed edges or arcs. In many cases, a direction on the edges is natural; for example, citation networks and the World-Wide Web are naturally directed.

The connectivity structure of digraphs.

The orientation of the edges causes substantial differences in the connectivity structure of the graphs involved. Indeed, a given vertex v has both a forward connected component, consisting of all the vertices that it is connected to, and a backward connected component. Further, every vertex has a strongly connected component (SCC), which consists of those vertices to which there exist both a forward and a backward path. See Exercise 9.1 for a description of the topology of digraphs that ensures that the SCC of a vertex is well defined, in that it does not depend on the vertex chosen.

The different notions of connectivity divide the graph into several disjoint parts. Often, there is a unique largest strongly connected component (SCC) that contains a positive proportion of the graph. This is the part of the graph that is most strongly connected. There are collections of vertices that are forward connected to the SCC, but not backward; these form the IN part of the graph. Further, there are collections of vertices that are backward connected to the SCC, but not forward; these form the OUT part of the graph. Finally, there are parts of the graph that are neither, and consist of their own SCCs and IN and OUT parts. See Figure 9.6 (which is [Volume 1, Figure 1.19] repeated) for a description of these parts for the WWW, as well as their relative sizes.
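For a small digraph, this decomposition can be computed directly from forward and backward reachability. The following sketch is our own illustration (function names are ours): it extracts the largest SCC together with its IN and OUT parts. For large graphs, one would use a linear-time SCC algorithm such as Tarjan's instead of the quadratic reachability computation used here.

```python
from collections import defaultdict

def bow_tie_decomposition(vertices, edges):
    """Return (largest SCC, IN part, OUT part) of a digraph.

    Quadratic-time sketch: the SCC of v is the intersection of the sets
    of vertices forward- and backward-reachable from v.
    """
    fwd, bwd = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)

    def reach(adj, start):  # iterative depth-first reachability
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    scc_of = {v: frozenset(reach(fwd, v) & reach(bwd, v)) for v in vertices}
    giant = max(scc_of.values(), key=len)
    v0 = next(iter(giant))
    in_part = reach(bwd, v0) - giant   # forward connected to the SCC
    out_part = reach(fwd, v0) - giant  # backward connected to the SCC
    return giant, in_part, out_part

# Toy bow-tie: 1 -> 2 -> 3 -> 1 is the SCC, 0 lies in IN, 4 in OUT.
edges = [(1, 2), (2, 3), (3, 1), (0, 1), (3, 4)]
scc, in_part, out_part = bow_tie_decomposition(range(5), edges)
print(sorted(scc), sorted(in_part), sorted(out_part))  # [1, 2, 3] [0] [4]
```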



Figure 9.6 The WWW according to Broder et al. (2000): the SCC (56 million nodes), IN (44 million nodes), OUT (44 million nodes) and Tendrils (44 million nodes), together with Tubes and Disconnected components.

Local weak convergence of digraphs.

It turns out that there are several ways to define local weak convergence for digraphs. This is due to the fact that LWC is defined in terms of neighborhoods, which in turn depend on the notion of connectivity that one wishes to use. For the neighborhood B_r^{(G)}(v), do we wish to explore all vertices that can be reached from v (which is relevant when v is the source of an infection), or all vertices from which we can reach v (which is relevant when investigating whether v can be infected by others, or in the case of PageRank)? These notions are relevant when exploring a graph from or towards a vertex, and we will call them exploration neighborhoods. One might also consider all edges at the same time, thus ignoring the direction of the edges for the local neighborhoods.

Let dist_G(u, v) be the (directed) graph distance from u to v, i.e., the minimal number of directed edges needed to connect u to v. Note that, for digraphs, dist_G(u, v) and dist_G(v, u) may be different. We will consider the forward exploration neighborhood B_r^{(G)}(u) = (V(B_r^{(G)}(u)), E(B_r^{(G)}(u)), u) defined by

V(B_r^{(G)}(u)) = {v ∈ V(G) : dist_G(u, v) ≤ r},  (9.2.1)
E(B_r^{(G)}(u)) = {(x, y) ∈ E(G) : dist_G(u, x), dist_G(u, y) ≤ r},  (9.2.2)

as in (2.2.1). For the backward exploration neighborhood, dist_G(u, v) ≤ r in the definition of V(B_r^{(G)}(u)) is replaced by dist_G(v, u) ≤ r, and dist_G(u, x), dist_G(u, y) in the definition of E(B_r^{(G)}(u)) is replaced by dist_G(x, u), dist_G(y, u). One could even consider the forward-backward exploration neighborhood, which is the union of the forward and the backward exploration neighborhoods.
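The forward and backward exploration neighborhoods in (9.2.1)–(9.2.2) amount to breadth-first search along out-edges or in-edges, respectively. The sketch below (function name ours) computes the vertex set V(B_r^{(G)}(u)) of either exploration neighborhood.

```python
from collections import defaultdict, deque

def exploration_neighborhood(edges, u, r, direction="forward"):
    """Vertex set of the forward (or backward) exploration neighborhood
    B_r(u): all vertices within directed distance r of u, following
    out-edges ("forward") or in-edges ("backward")."""
    adj = defaultdict(list)
    for x, y in edges:
        if direction == "forward":
            adj[x].append(y)
        else:
            adj[y].append(x)
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        if dist[v] == r:  # do not explore beyond radius r
            continue
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)

# On the cycle 0 -> 1 -> 2 -> 3 -> 0 with an extra edge 4 -> 0, the two
# directions give different radius-2 neighborhoods of vertex 0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 0)]
print(sorted(exploration_neighborhood(edges, 0, 2, "forward")))   # [0, 1, 2]
print(sorted(exploration_neighborhood(edges, 0, 2, "backward")))  # [0, 2, 3, 4]
```

The example illustrates the asymmetry mentioned above: dist_G(u, v) and dist_G(v, u) can differ, so the forward and backward neighborhoods of the same vertex need not coincide.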

The notion of isomorphism in Definition 2.3, and the metric on directed rooted graphs, are straightforwardly adapted from Definition 2.4. From this moment on, the definitions of forward local weak convergence of deterministic digraphs, and of forward local weak convergence of random digraphs in their various settings, are straightforwardly adapted as well.

There is a minor catch, though. While the notion of forward exploration neighborhood keeps track of the out-degrees $d^{(out)}_v$ for all $v \in B_{r-1}^{(G)}(u)$, it does not keep track of the in-degrees $d^{(in)}_v$ for $v \in B_{r-1}^{(G)}(u)$. Similarly, the notion of backward exploration neighborhood keeps track of the in-degrees $d^{(in)}_v$ for all $v \in B_{r-1}^{(G)}(u)$, but not of the out-degrees $d^{(out)}_v$ for $v \in B_{r-1}^{(G)}(u)$. Below, we need these as well. For this, we add a degree-mark to the vertices that indicates the in-degrees in the forward exploration neighborhood, and the out-degrees of the vertices in the backward exploration neighborhood. Particularly in random graphs that are locally tree-like, the edges in the direction other than the one explored will often go to vertices that are far away, and are thus often not in the explored neighborhood. Thus, to use this information, we need to keep track of it explicitly.

Let us explain how this marking is defined. For a vertex $v$, let $m(v)$ denote its mark. We then define an isomorphism $\phi\colon V(G_1) \to V(G_2)$ between two labeled rooted graphs $(G_1, o_1)$ and $(G_2, o_2)$ to be an isomorphism between $(G_1, o_1)$ and $(G_2, o_2)$ that respects the marks, i.e., for which $m_1(v) = m_2(\phi(v))$ for every $v \in V(G_1)$, where $m_1$ and $m_2$ denote the degree-mark functions on $G_1$ and $G_2$, respectively. We then define $R_\star$ as in (2.1.2), and the metric on rooted degree-marked graphs as in (2.1.3). We call the resulting notions of LWC marked forward LWC and marked backward LWC, respectively.

Even when considering forward-backward neighborhoods, the addition of marks is necessary. Indeed, while for the root of this graph we do know its in- and out-degree by construction, for the other vertices in the forward and backward neighborhoods this information is still not available. While the above discussion may not be relevant for all questions that one may wish to investigate using local weak convergence techniques, it is useful for the discussion of PageRank, as we discuss next.

Exercises 9.3–9.5 investigate the various notions of local weak convergence for some directed random graphs that are naturally derived from undirected graphs.

Weak convergence of PageRank.

Recall the discussion in Section 2.4.4 about local weak convergence of PageRank on undirected graphs. Here we will extend this discussion to digraphs, which are far more relevant in practice.

Recall the definition of PageRank from [Volume 1, Section 1.5]. Let us first explain the solution in the absence of dangling ends, so that $d^{(out)}_i \geq 1$ for all $i \in [n]$. Let $G = ([n], E)$ denote the web graph, so that $[n]$ corresponds to the web pages and a directed edge $(i,j) \in E$ is present precisely when page $i$ refers to page $j$. Denote the out-degree of vertex $i$ by $d^{(out)}_i$. Fix $\alpha \in (0,1)$. Then, we let the vector of PageRanks $(R_i)_{i\in[n]}$ be the unique solution to the equation

$$R_i = \alpha \sum_{j \to i} \frac{R_j}{d^{(out)}_j} + 1 - \alpha, \qquad (9.2.3)$$

satisfying the normalization $\sum_{i\in[n]} R_i = n$ (recall (2.4.51)). The parameter $\alpha \in (0,1)$ is called the damping factor, and guarantees that (9.2.3) has a unique solution. This solution can be understood in terms of the stationary distribution of a bored surfer. Indeed, denote $\pi_i = R_i/n$, so that $(\pi_i)_{i\in[n]}$ is a probability distribution that satisfies a similar relation as $(R_i)_{i\in[n]}$ in (9.2.3), namely

$$\pi_i = \alpha \sum_{j \to i} \frac{\pi_j}{d^{(out)}_j} + \frac{1-\alpha}{n}. \qquad (9.2.4)$$

Therefore, $(\pi_i)_{i\in[n]}$ is the stationary distribution of a random walker that, with probability $\alpha$, jumps according to a simple random walk, i.e., it chooses any of the out-edges with equal probability, while with probability $1-\alpha$, the walker jumps to a uniform location.

In the presence of dangling ends, we can simply redistribute their mass equally over all vertices, so that (9.2.3) becomes

$$R_i = \alpha \sum_{j \to i} \frac{R_j}{d^{(out)}_j} + \frac{\alpha}{n} \sum_{j \in D} R_j + 1 - \alpha, \qquad (9.2.5)$$

where $D \subseteq [n]$ denotes the collection of dangling ends (note that vertices $j \in D$ have no out-edges, so they never contribute to the first sum; when $D = \varnothing$, (9.2.5) reduces to (9.2.3)).

The damping factor $\alpha$ is quite crucial. When $\alpha = 0$, the stationary distribution is just $\pi_i = 1/n$ for every $i$, so that all pages have PageRank 1. This is not very informative. On the other hand, PageRank concentrates on dangling ends when $\alpha$ is close to one, and this is also not what we want. Experimentally, $\alpha = 0.85$ seems to work well and strikes a nice balance between these two extremes.

We next investigate the convergence of the PageRank distribution on a directed graph sequence $(G_n)_{n\geq 1}$ that converges locally weakly:

Theorem 9.1 (Existence of asymptotic PageRank distribution) Consider a sequence of directed random graphs $(G_n)_{n\in\mathbb{N}}$. Then, the following hold:

(i) If $G_n$ converges in distribution in the marked backward local weak sense, then there exists a limiting distribution $R_\varnothing$, with $\mathbb{E}[R_\varnothing] \leq 1$, such that

$$R_{o_n}^{(G_n)} \xrightarrow{d} R_\varnothing. \qquad (9.2.6)$$

(ii) If $G_n$ converges in probability in the marked backward local weak sense, then there exists a limiting distribution $R_\varnothing$, with $\mathbb{E}[R_\varnothing] \leq 1$, such that, for every $r > 0$,

$$\frac{1}{n} \sum_{v\in[n]} \mathbb{1}_{\{R_v^{(G_n)} > r\}} \xrightarrow{\mathbb{P}} \mathbb{P}(R_\varnothing > r). \qquad (9.2.7)$$

Theorem 9.1 is the directed version of Theorem 2.24. Interestingly, the positivity of the damping factor also allows us to give a power-iteration formula for $R^{(G_n)}_v$, and thus for $R_\varnothing$. Indeed, let

$$A^{(G_n)}_{i,j} = \frac{\mathbb{1}_{\{j \to i\}}}{d^{(out)}_j}, \qquad i,j \in V(G_n), \qquad (9.2.8)$$

denote the (normalized) adjacency matrix of the graph $G_n$. Then, $R^{(G_n)}_v$ can be computed as

$$R^{(G_n)}_v = (1-\alpha) \sum_{k=0}^{\infty} \alpha^k \sum_{i \in V(G_n)} \big(A^{(G_n)}\big)^k_{v,i}. \qquad (9.2.9)$$

As a result, when $G_n$ converges in distribution in the marked backward local weak sense with limit $(G, \varnothing)$, then also $R_\varnothing$ can be computed as

$$R_\varnothing = (1-\alpha) \sum_{k=0}^{\infty} \alpha^k \sum_{i \in V(G)} \big(A^{(G)}\big)^k_{\varnothing,i}, \qquad (9.2.10)$$

where $A^{(G)}_{i,j}$ is the normalized adjacency matrix of the backward local weak limit $G$.
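To make the recursion concrete, the following sketch (ours, not the book's code; the example graphs are illustrative) iterates (9.2.3) on a small digraph without dangling ends, with the normalization $\sum_i R_i = n$:

```python
def pagerank(n, edges, alpha=0.85, iters=200):
    """Iterate R_i = alpha * sum_{j -> i} R_j / d_j^out + (1 - alpha),
    assuming every vertex has out-degree >= 1 (no dangling ends).
    Normalization: sum_i R_i = n."""
    d_out = [0] * n
    for (j, i) in edges:   # arc (j, i): page j refers to page i
        d_out[j] += 1
    R = [1.0] * n
    for _ in range(iters):
        new = [1.0 - alpha] * n
        for (j, i) in edges:
            new[i] += alpha * R[j] / d_out[j]
        R = new
    return R

# On a directed 3-cycle every vertex plays the same role, so R_i = 1:
R = pagerank(3, [(0, 1), (1, 2), (2, 0)])
print([round(x, 6) for x in R])  # → [1.0, 1.0, 1.0]
```

Since each iteration multiplies the error by $\alpha < 1$, this converges geometrically, in line with the power-iteration formula (9.2.9).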

The power-law hypothesis for PageRank.

Recall from [Volume 1, Section 1.5] that the PageRank power-law hypothesis states that the PageRank distribution satisfies a power law with the same exponent as that of the in-degree. By Theorem 9.1, this can be rephrased by stating that $\mathbb{P}(R_\varnothing > r) \approx r^{-\tau_{\rm in}}$ when $\mathbb{P}(D^{(in)}_\varnothing > r) \approx r^{-\tau_{\rm in}}$ (we are on purpose being vague about what $\approx$ means in this context).

Exercises 9.6–9.7 investigate the implications of (9.2.10) for the power-law hypothesis for random graphs having bounded out-degrees.

We continue by discussing some models of directed random graphs, or random digraphs. We start in Section 9.2.1 by discussing inhomogeneous random digraphs; in Section 9.2.2 we continue with the directed configuration model; and in Section 9.2.3, we close with directed preferential attachment models.

9.2.1 Directed inhomogeneous random graphs

Let $(x_i)_{i\in[n]}$ be a sequence of variables with values in $\mathcal{S}$ such that the empirical distribution of $(x_i)_{i\in[n]}$ approximates a measure $\mu$ as $n\to\infty$. That is, we assume that, for each $\mu$-continuous Borel set $A \subseteq \mathcal{S}$, as $n\to\infty$,

$$\frac{1}{n}\,\big|\{i \in [n] \colon x_i \in A\}\big| \to \mu(A). \qquad (9.2.11)$$

Here we say that a Borel set $A$ is $\mu$-continuous whenever its boundary $\partial A$ has zero probability, i.e., $\mu(\partial A) = 0$.

Given $n$, let $G_n$ be the random digraph on the vertex set $(x_i)_{i\in[n]}$ with independent arcs having probabilities

$$p_{ij} = \mathbb{P}\big((x_i, x_j) \in E(G_n)\big) = 1 \wedge \big(\kappa(x_i, x_j)/n\big), \qquad i,j \in [n]. \qquad (9.2.12)$$

We denote the resulting graph by $\mathrm{DIRG}_n(\kappa)$. Combining $\mathcal{S}$, $\mu$, and $\kappa$, we obtain a large class of inhomogeneous digraphs with independent arcs. Obviously, the model includes digraphs whose in-degree and out-degree distributions obey power laws. This general model was studied in detail in Chapter 3. We are a little less general than there, since we assume that $\kappa$ in (9.2.12) does not depend on $n$. This simplifies the exposition considerably. Note that in the case of (undirected) random graphs it is necessary to assume, in addition, that the kernel $\kappa$ is symmetric. In the definition of the digraphs $G_n$ for $n \geq 2$, we do not require the symmetry of the kernel.
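A minimal sketch of how $\mathrm{DIRG}_n(\kappa)$ can be sampled from (9.2.12); the code and the constant kernel chosen here (the directed Erdős-Rényi case) are our illustrative example, not from the text:

```python
import random

def sample_dirg(types, kernel, seed=0):
    """Sample DIRG_n(kappa): arc (i, j) is present independently
    with probability min(1, kappa(x_i, x_j) / n)."""
    rng = random.Random(seed)
    n = len(types)
    edges = []
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < min(1.0, kernel(types[i], types[j]) / n):
                edges.append((i, j))
    return edges

# Directed Erdos-Renyi random graph: constant kernel kappa = 2.
n = 1000
edges = sample_dirg([0] * n, lambda s, t: 2.0)
# As in (9.2.13), the number of arcs per vertex concentrates around
# the integral of kappa, here 2:
print(len(edges) / n)
```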

Assumptions on the kernel.

We need to impose further conditions on the kernel $\kappa$, like those in Chapter 3. Namely, we need to assume that the kernel $\kappa$ is irreducible. That is, for any measurable $A \subseteq \mathcal{S}$, the identity $(\mu\times\mu)\big(\{(s,t) \in A \times (\mathcal{S}\setminus A)\colon \kappa(s,t) \neq 0\}\big) = 0$ implies that either $\mu(A) = 0$ or $\mu(\mathcal{S}\setminus A) = 0$. In addition, we assume that $\kappa$ is continuous almost everywhere on $(\mathcal{S}\times\mathcal{S}, \mu\times\mu)$, and that the number of arcs in $\mathrm{DIRG}_n(\kappa)$, denoted by $|E(\mathrm{DIRG}_n(\kappa))|$, satisfies, as $n\to\infty$,

$$\frac{1}{n}\,|E(\mathrm{DIRG}_n(\kappa))| \xrightarrow{\mathbb{P}} \int_{\mathcal{S}\times\mathcal{S}} \kappa(s,t)\,\mu(ds)\,\mu(dt) < \infty. \qquad (9.2.13)$$

Note that here we implicitly assume that κ is integrable.

Examples of directed inhomogeneous random graphs

We next discuss some examples of DIRGn(κ).

Directed Erdős-Rényi random graph. The most basic example is the directed Erdős-Rényi random graph, in which $p_{ij} = p_{ji} = \lambda/n$. In this case, $\kappa(x_i, x_j) \equiv \lambda$.

Finite-type directed inhomogeneous random graphs. Slightly more involved are kernels of finite type, in which case $(s,t) \mapsto \kappa(s,t)$ takes on finitely many values. Such kernels are also highly convenient for approximating more general models, as has been exemplified in the undirected setting in Chapter 3.

Directed rank-1 inhomogeneous random graphs. We next generalize rank-1 inhomogeneous random graphs to the directed setting. For $v \in [n]$, let $w^{(in)}_v$ and $w^{(out)}_v$ be its respective in- and out-weights, which will have the interpretation of the asymptotic average in- and out-degrees, respectively, under a summation-symmetry condition on the weights. Then, the directed generalized random graph $\mathrm{DGRG}_n(\boldsymbol{w})$ has edge probabilities given by

$$p_{ij} = p^{(\mathrm{DGRG})}_{ij} = \frac{w^{(out)}_i w^{(in)}_j}{\ell_n + w^{(out)}_i w^{(in)}_j}, \qquad (9.2.14)$$

where

$$\ell_n = \frac{1}{2} \sum_{i\in[n]} \big(w^{(out)}_i + w^{(in)}_i\big). \qquad (9.2.15)$$

Let $(W^{(out)}_n, W^{(in)}_n) = (w^{(out)}_o, w^{(in)}_o)$ denote the out- and in-weights of a uniformly chosen vertex $o \in [n]$. Similarly to Condition 1.1, we assume that

$$(W^{(out)}_n, W^{(in)}_n) \xrightarrow{d} (W^{(out)}, W^{(in)}), \quad \mathbb{E}[W^{(out)}_n] \to \mathbb{E}[W^{(out)}], \quad \mathbb{E}[W^{(in)}_n] \to \mathbb{E}[W^{(in)}], \qquad (9.2.16)$$

where $(W^{(out)}, W^{(in)})$ is the limiting out- and in-weight distribution. Exercise 9.8 investigates the expected number of edges in this setting.

When thinking of $w^{(out)}_i$ and $w^{(in)}_i$ as corresponding to the approximate out- and in-degrees of vertex $i \in [n]$, it is reasonable to assume that

$$\mathbb{E}[W^{(in)}] = \mathbb{E}[W^{(out)}]. \qquad (9.2.17)$$

Indeed, we know, with $D^{(out)}_i$ and $D^{(in)}_i$ denoting the out- and in-degrees of vertex $i \in [n]$, that (recall Exercise 9.2)

$$\sum_{i\in[n]} D^{(out)}_i = \sum_{i\in[n]} D^{(in)}_i. \qquad (9.2.18)$$

Thus, if indeed $w^{(out)}_i$ and $w^{(in)}_i$ are the approximate out- and in-degrees of vertex $i \in [n]$, then also

$$\sum_{i\in[n]} w^{(out)}_i \approx \sum_{i\in[n]} w^{(in)}_i, \qquad (9.2.19)$$

which, assuming (9.2.16), proves (9.2.17). As discussed in more detail in Section 1.3.2 (see in particular (1.3.8) and (1.3.9)), many related versions of the rank-1 inhomogeneous random graph exist. We refrain from giving more details here.

Multi-type marked branching processes for DIRGn(κ).

For large $n$, the local weak convergence and phase transition in the digraph $G_n$ can be described in terms of the survival probabilities of related multi-type Galton-Watson branching processes with type space $\mathcal{S}$. Let us introduce the necessary mixed-Poisson branching processes now. Given $s \in \mathcal{S}$, let $\mathcal{X}(s)$ and $\mathcal{Y}(s)$ denote the Galton-Watson processes starting from a particle of type $s \in \mathcal{S}$, such that the number of children with types in a subset $A \subseteq \mathcal{S}$ of a particle of type $t \in \mathcal{S}$ has a Poisson distribution with means

$$\int_A \kappa(t,u)\,\mu(du), \quad \text{and} \quad \int_A \kappa(u,t)\,\mu(du), \qquad (9.2.20)$$

respectively. These numbers are independent for disjoint subsets $A$ and for different particles. These two branching processes correspond to the forward and backward limits of $\mathrm{DIRG}_n(\kappa)$.

We now extend the discussion by also defining the marks. When we consider the branching process $\mathcal{X}(s)$, to each individual of type $t$ we associate an independent mark having a Poisson distribution with mean $\int_{\mathcal{S}} \kappa(u,t)\,\mu(du)$. When we consider the branching process $\mathcal{Y}(s)$, to each individual of type $t$ we instead associate an independent mark having a Poisson distribution with mean $\int_{\mathcal{S}} \kappa(t,u)\,\mu(du)$. These random variables correspond to the 'in-degrees' for the forward exploration process $\mathcal{X}(s)$, and the 'out-degrees' for the backward exploration process $\mathcal{Y}(s)$. Finally, for the forward-backward setting, we let the marked branching processes $\mathcal{X}(s)$ and $\mathcal{Y}(s)$ be independent. We call these objects Poisson marked branching processes with kernel $\kappa$. As in Section 3.4.3, we let $T_\kappa$ be defined as in (3.4.13), i.e., for $f\colon \mathcal{S}\to\mathbb{R}$, we let $(T_\kappa f)(x) = \int_{\mathcal{S}} \kappa(x,y) f(y)\,\mu(dy)$.

We now come to the main results on $\mathrm{DIRG}_n(\kappa)$, which involve its local weak convergence and its phase transition.

Local weak convergence for DIRGn(κ)

Recall that $\mathcal{C}_{\rm max}$ denotes the maximal SCC, and $\mathcal{C}_{(2)}$ the second-largest SCC, in $\mathrm{DIRG}_n(\kappa)$. Here ties are broken arbitrarily when needed. The following theorem describes the local weak convergence of $\mathrm{DIRG}_n(\kappa)$:

Theorem 9.2 (Local weak convergence of DIRGn(κ)) Suppose that $\kappa$ is irreducible, continuous almost everywhere on $(\mathcal{S}\times\mathcal{S}, \mu\times\mu)$, and that (9.2.13) holds. Then, $\mathrm{DIRG}_n(\kappa)$ converges in probability in the marked forward, backward, and forward-backward local weak convergence sense to the above Poisson marked branching processes with kernel $\kappa$, where the law of the type of the root $\varnothing$ is $\mu$.

We do not give a proof of Theorem 9.2, and refer to Section 9.6 for its history. In Exercise 9.10, you are asked to determine the local weak limit of the Erdős-Rényi random graph. Exercise 9.11 proves Theorem 9.2 in the case of finite-type kernels, while Exercise 9.12 investigates local weak convergence of the directed generalized random graph.

The phase transition in DIRGn(κ)

The critical point for the emergence of the giant SCC is determined by the averaged joint survival probability

$$\zeta = \int_{\mathcal{S}} \zeta_{\mathcal{X}(s)}\,\zeta_{\mathcal{Y}(s)}\,\mu(ds) \qquad (9.2.21)$$

being positive. Here $\zeta_{\mathcal{X}(s)}$ and $\zeta_{\mathcal{Y}(s)}$ denote the non-extinction probabilities of $\mathcal{X}(s)$ and $\mathcal{Y}(s)$, respectively. The following theorem describes the phase transition in $\mathrm{DIRG}_n(\kappa)$:

Theorem 9.3 (Phase transition in DIRGn(κ)) Suppose that $\kappa$ is irreducible, continuous almost everywhere on $(\mathcal{S}\times\mathcal{S}, \mu\times\mu)$, and that (9.2.13) holds.

(a) When $\nu = \|T_\kappa\| > 1$, $\zeta$ in (9.2.21) satisfies $\zeta \in (0,1]$ and

$$|\mathcal{C}_{\rm max}|/n \xrightarrow{\mathbb{P}} \zeta, \qquad (9.2.22)$$

while $|\mathcal{C}_{(2)}|/n \xrightarrow{\mathbb{P}} 0$ and $|E(\mathcal{C}_{(2)})|/n \xrightarrow{\mathbb{P}} 0$.

(b) When $\nu = \|T_\kappa\| < 1$, $\zeta$ in (9.2.21) satisfies $\zeta = 0$, and $|\mathcal{C}_{\rm max}|/n \xrightarrow{\mathbb{P}} 0$ and $|E(\mathcal{C}_{\rm max})|/n \xrightarrow{\mathbb{P}} 0$.

Theorem 9.3 is the directed version of Theorem 3.16. Exercises 9.13 and 9.14 investigate the conditions for a giant component to exist for the directed Erdős-Rényi random graph and generalized random graphs.

9.2.2 The directed configuration model

One way to obtain a directed version of $\mathrm{CM}_n(\boldsymbol{d})$ is to give each edge a direction, chosen with probability $\tfrac{1}{2}$, independently of all other edges. In this model, however, the correlation coefficient between the in- and out-degree of vertices is close to one, particularly when the degrees are large (see Exercise 9.15). In real-world applications, correlations between in- and out-degrees can be positive or negative, depending on the precise application, so we aim to formulate a model that is more general. Therefore, we formulate a general model of directed graphs, where we can prescribe both the in- and out-degrees of vertices.

Let $\boldsymbol{d}^{(in)} = (d^{(in)}_i)_{i\in[n]}$ be a sequence of in-degrees, where $d^{(in)}_i$ denotes the in-degree of vertex $i$. Similarly, we let $\boldsymbol{d}^{(out)} = (d^{(out)}_i)_{i\in[n]}$ be a sequence of out-degrees. Naturally, we need that

$$\sum_{i\in[n]} d^{(in)}_i = \sum_{i\in[n]} d^{(out)}_i \qquad (9.2.23)$$

in order for a graph with in- and out-degree sequence $\boldsymbol{d} = (\boldsymbol{d}^{(in)}, \boldsymbol{d}^{(out)})$ to exist (recall Exercise 9.2).

We think of $d^{(in)}_i$ as the number of in-half-edges incident to vertex $i$, and $d^{(out)}_i$ as the number of out-half-edges incident to vertex $i$. The directed configuration model $\mathrm{DCM}_n(\boldsymbol{d})$ is obtained by pairing each in-half-edge to a uniformly chosen out-half-edge. The resulting graph is a random multigraph, where each vertex $i$ has in-degree $d^{(in)}_i$ and out-degree $d^{(out)}_i$. Similarly to $\mathrm{CM}_n(\boldsymbol{d})$, $\mathrm{DCM}_n(\boldsymbol{d})$ can have self-loops as well as multiple edges. A self-loop arises at vertex $i$ when one of its in-half-edges is paired to one of its out-half-edges. Let $(D^{(in)}_n, D^{(out)}_n)$ denote the in- and out-degree of a vertex chosen uniformly at random from $[n]$.
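The pairing construction just described can be sketched directly; the code is ours, and the degree sequences below are an arbitrary example satisfying (9.2.23):

```python
import random

def directed_configuration_model(d_in, d_out, seed=0):
    """Pair each in-half-edge to a uniform out-half-edge; returns a list
    of arcs (j, i) meaning j -> i.  The result is a multigraph:
    repeated arcs and self-loops are possible."""
    assert sum(d_in) == sum(d_out)  # condition (9.2.23)
    rng = random.Random(seed)
    in_half = [i for i, d in enumerate(d_in) for _ in range(d)]
    out_half = [j for j, d in enumerate(d_out) for _ in range(d)]
    rng.shuffle(out_half)  # uniform pairing
    return list(zip(out_half, in_half))

d_in, d_out = [1, 2, 0, 1], [2, 0, 1, 1]
arcs = directed_configuration_model(d_in, d_out)
# The realized degrees always match the prescribed sequences:
realized_in = [sum(1 for (_, i) in arcs if i == v) for v in range(4)]
realized_out = [sum(1 for (j, _) in arcs if j == v) for v in range(4)]
print(realized_in, realized_out)  # → [1, 2, 0, 1] [2, 0, 1, 1]
```

By construction, every pairing realizes the prescribed in- and out-degrees exactly, whatever the outcome of the uniform shuffle.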

We continue to investigate the strongly connected component of $\mathrm{DCM}_n(\boldsymbol{d})$. Assume, similarly to Condition 1.5(a)-(b), that

$$(D^{(in)}_n, D^{(out)}_n) \xrightarrow{d} (D^{(in)}, D^{(out)}), \qquad (9.2.24)$$

and that

$$\mathbb{E}[D^{(in)}_n] \to \mathbb{E}[D^{(in)}], \quad \text{and} \quad \mathbb{E}[D^{(out)}_n] \to \mathbb{E}[D^{(out)}]. \qquad (9.2.25)$$

Naturally, by (9.2.23), this implies that $\mathbb{E}[D^{(out)}] = \mathbb{E}[D^{(in)}]$. Let

$$p_{k,l} = \mathbb{P}(D^{(in)} = k, D^{(out)} = l) \qquad (9.2.26)$$

denote the asymptotic joint in- and out-degree distribution. We refer to $(p_{k,l})_{k,l\geq 0}$ simply as the asymptotic degree distribution of $\mathrm{DCM}_n(\boldsymbol{d})$. The distribution $(p_{k,l})_{k,l\geq 0}$ plays a similar role for $\mathrm{DCM}_n(\boldsymbol{d})$ as $(p_k)_{k\geq 0}$ does for $\mathrm{CM}_n(\boldsymbol{d})$. We further define

$$p^{\star(in)}_k = \sum_l l\, p_{k,l}\big/\mathbb{E}[D^{(out)}], \qquad p^{\star(out)}_l = \sum_k k\, p_{k,l}\big/\mathbb{E}[D^{(in)}]. \qquad (9.2.27)$$

The distribution $(p^{\star(in)}_k)_{k\geq 0}$ corresponds to the asymptotic in-degree of the tail of a uniformly chosen edge in $\mathrm{DCM}_n(\boldsymbol{d})$, and $(p^{\star(out)}_l)_{l\geq 0}$ to the asymptotic out-degree of its head.

The local weak limit of the directed configuration model

Let us now formulate the construction of the appropriate marked forward and backward branching processes that will arise as the local weak limit of $\mathrm{DCM}_n(\boldsymbol{d})$. For the forward branching process, we let the root have out-degree with distribution $p^{(out)}_l = \mathbb{P}(D^{(out)} = l) = \sum_{k\geq 0} p_{k,l}$, whereas every vertex except the root has independent out-degree with law $(p^{\star(out)}_l)_{l\geq 0}$. Further, for a vertex of out-degree $l$, we let the mark (which will correspond to its asymptotic in-degree) be $k$ with probability $p_{k,l}/p^{(out)}_l$. For the marked backward branching process, we reverse the roles of in and out. For the marked forward-backward branching process, we let the root have joint in- and out-degree distribution $(p_{k,l})_{k,l\geq 0}$, and define the forward and backward processes and marks as before. We call the above branching process the marked modular branching process with degree distribution $(p_{k,l})_{k,l\geq 0}$.

Then, we have the following local weak limit result:

Theorem 9.4 (Local weak convergence of DCMn(d)) Suppose that the out- and in-degrees in the directed configuration model $\mathrm{DCM}_n(\boldsymbol{d})$ satisfy (9.2.24) and (9.2.25). Then, $\mathrm{DCM}_n(\boldsymbol{d})$ converges in probability in the marked forward, backward, and forward-backward local weak convergence sense to the above marked modular branching process with degree distribution $(p_{k,l})_{k,l\geq 0}$.

It will not come as a surprise that Theorem 9.4 is the directed version of Theorem 4.1. Exercise 9.16 asks you to give a proof of Theorem 9.4.

The giant component in the directed configuration model

Recall that the strongly connected component of a vertex $v$ is the set of vertices $u$ for which there are directed paths from $v$ to $u$ and from $u$ to $v$, so that $u$ is both in the forward and the backward cluster of $v$. We let $\mathcal{C}_{\rm max}$ denote the largest strongly connected component in $\mathrm{DCM}_n(\boldsymbol{d})$.

Let $\theta^{(in)}$ and $\theta^{(out)}$ be the survival probabilities of the branching processes with offspring distributions $(p^{\star(in)}_k)_{k\geq 0}$ and $(p^{\star(out)}_k)_{k\geq 0}$, respectively, and define

$$\zeta^{(in)} = 1 - \sum_{k,l} p_{k,l} (1-\theta^{(in)})^k, \qquad \zeta^{(out)} = 1 - \sum_{k,l} p_{k,l} (1-\theta^{(out)})^l. \qquad (9.2.28)$$

Then, $\zeta^{(out)}$ has the interpretation of the asymptotic probability that a uniform vertex has a large forward cluster, while $\zeta^{(in)}$ has the interpretation of the asymptotic probability that a uniform vertex has a large backward cluster. Here, the backward cluster of a vertex $v$ consists of all vertices $u$ that are connected to $v$, and the forward cluster of $v$ consists of all vertices $u$ for which $v$ is connected to $u$. Further, let

$$\psi = \sum_{k,l} p_{k,l} (1-\theta^{(in)})^k (1-\theta^{(out)})^l, \qquad (9.2.29)$$

so that $\psi$ has the interpretation of the asymptotic probability that a uniform vertex has finite forward and backward clusters. We conclude that $1 - \psi$ is the probability that a uniform vertex has either a large forward or a large backward cluster, and thus

$$\zeta = \zeta^{(out)} + \zeta^{(in)} - (1 - \psi) \qquad (9.2.30)$$

has the interpretation of the asymptotic probability that a uniform vertex has both a large forward and a large backward cluster.

Finally, we let

$$\nu = \sum_{k=0}^{\infty} k\, p^{\star(in)}_k = \sum_{k,l} k l\, p_{k,l}\big/\mathbb{E}[D^{(out)}] = \frac{\mathbb{E}[D^{(in)} D^{(out)}]}{\mathbb{E}[D^{(out)}]}. \qquad (9.2.31)$$

Alternatively, $\nu = \sum_{l=0}^{\infty} l\, p^{\star(out)}_l$. Then, the main result concerning the size of the giant is as follows:

Theorem 9.5 (Phase transition in DCMn(d)) Suppose that the out- and in-degrees in the directed configuration model $\mathrm{DCM}_n(\boldsymbol{d})$ satisfy (9.2.24) and (9.2.25).

(a) When $\nu > 1$, $\zeta$ in (9.2.30) satisfies $\zeta \in (0,1]$ and

$$|\mathcal{C}_{\rm max}|/n \xrightarrow{\mathbb{P}} \zeta, \qquad (9.2.32)$$

while $|\mathcal{C}_{(2)}|/n \xrightarrow{\mathbb{P}} 0$ and $|E(\mathcal{C}_{(2)})|/n \xrightarrow{\mathbb{P}} 0$.

(b) When $\nu < 1$, $\zeta$ in (9.2.30) satisfies $\zeta = 0$, and $|\mathcal{C}_{\rm max}|/n \xrightarrow{\mathbb{P}} 0$ and $|E(\mathcal{C}_{\rm max})|/n \xrightarrow{\mathbb{P}} 0$.

Theorem 9.5 is the adaptation to $\mathrm{DCM}_n(\boldsymbol{d})$ of Theorem 4.4 for $\mathrm{CM}_n(\boldsymbol{d})$. In Exercise 9.17, you are asked to prove that the probability that $|\mathcal{C}_{\rm max}|/n$ exceeds $\zeta + \varepsilon$ vanishes whenever a graph sequence converges locally weakly in probability in the marked forward-backward sense. In Exercise 9.18, this is used to prove Theorem 9.5(b).
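For illustration, take $D^{(in)}$ and $D^{(out)}$ independent Poisson($\lambda$), so that both size-biased laws in (9.2.27) are again Poisson($\lambda$), $\theta^{(in)} = \theta^{(out)}$, and $\nu = \lambda$. The quantities in (9.2.28)-(9.2.30) can then be evaluated numerically; this worked example is ours, not from the text:

```python
import math

def survival(lam, iters=200):
    """Fixed point of theta = 1 - exp(-lam * theta): the survival
    probability of a Poisson(lam) branching process."""
    theta = 1.0
    for _ in range(iters):
        theta = 1.0 - math.exp(-lam * theta)
    return theta

lam = 2.0                         # here nu = E[D_in D_out] / E[D_out] = lam
theta_in = theta_out = survival(lam)
# (9.2.28): zeta = 1 - E[(1 - theta)^D] with D Poisson(lam):
zeta_in = 1.0 - math.exp(-lam * theta_in)
zeta_out = 1.0 - math.exp(-lam * theta_out)
# (9.2.29): by independence of D_in and D_out, psi factorizes:
psi = math.exp(-lam * theta_in) * math.exp(-lam * theta_out)
# (9.2.30): asymptotic fraction of vertices in the giant SCC:
zeta = zeta_out + zeta_in - (1.0 - psi)
print(round(zeta, 4))  # ≈ 0.635, which equals theta^2 in this symmetric case
```

Note that $\zeta = \theta^2 < \theta$ here: having both a large forward and a large backward cluster is rarer than having either one, so the giant SCC is smaller than the undirected giant at the same mean degree.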

Logarithmic typical distances in the directed configuration model

We continue by studying the small-world nature of typical distances in the directed configuration model. Let $u, v \in [n]$, and let $\mathrm{dist}_{\mathrm{DCM}_n(\boldsymbol{d})}(u,v)$ denote the graph distance from $u$ to $v$, i.e., the minimal number of directed edges needed to connect $u$ to $v$. Our main result is as follows:

Theorem 9.6 (Logarithmic typical distances in DCMn(d)) Suppose that the out- and in-degrees in the directed configuration model $\mathrm{DCM}_n(\boldsymbol{d})$ satisfy (9.2.24) and (9.2.25), and assume that

$$\nu = \frac{\mathbb{E}[D^{(in)} D^{(out)}]}{\mathbb{E}[D^{(out)}]} > 1. \qquad (9.2.33)$$

Further, assume that

$$\mathbb{E}[(D^{(in)}_n)^2] \to \mathbb{E}[(D^{(in)})^2] < \infty, \qquad \mathbb{E}[(D^{(out)}_n)^2] \to \mathbb{E}[(D^{(out)})^2] < \infty. \qquad (9.2.34)$$

Then, conditionally on $o_1 \to o_2$,

$$\frac{\mathrm{dist}_{\mathrm{DCM}_n(\boldsymbol{d})}(o_1, o_2)}{\log n} \xrightarrow{\mathbb{P}} \frac{1}{\log \nu}. \qquad (9.2.35)$$

Theorem 9.6 is the directed version of Theorem 7.1. The philosophy behind the proof is quite similar: by using a breadth-first exploration process, we see that $|\partial B^{(out)}_r(o_1)|$ grows roughly like $\nu^r$, so in order to 'catch' $o_2$, one would need $r \approx \log_\nu(n)$. Of course, by this stage the branching-process approximation has started to fail, which is why one needs to grow the neighborhoods from two sides.
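A quick numeric illustration of the scaling in (9.2.35), with illustrative values of $n$ and $\nu$:

```python
import math

def typical_distance(n, nu):
    """Leading-order typical distance log_nu(n) from (9.2.35)."""
    return math.log(n) / math.log(nu)

# With nu = 2, squaring n merely doubles the typical distance:
print(round(typical_distance(10**6, 2.0), 1))   # → 19.9
print(round(typical_distance(10**12, 2.0), 1))  # → 39.9
```

This is the small-world effect: distances grow only logarithmically in the size of the graph.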

It would be tempting to believe that (9.2.34) is not precisely needed, and this is the content of Exercise ??. There are no results as of yet that investigate the typical distances $\mathrm{dist}_{\mathrm{DCM}_n(\boldsymbol{d})}(o_1, o_2)$ when $\nu = \infty$. However, as you are asked to prove in Exercise ??, it is possible to show that $\mathrm{dist}_{\mathrm{DCM}_n(\boldsymbol{d})}(o_1, o_2) = o_{\mathbb{P}}(\log n)$ in this case.

Logarithmic diameter in the directed configuration model

We next investigate the logarithmic asymptotics of the diameter of $\mathrm{DCM}_n(\boldsymbol{d})$. Here, we define the diameter of the digraph $\mathrm{DCM}_n(\boldsymbol{d})$ to be equal to

$$\mathrm{diam}(\mathrm{DCM}_n(\boldsymbol{d})) = \max_{u,v\colon u\to v} \mathrm{dist}_{\mathrm{DCM}_n(\boldsymbol{d})}(u,v). \qquad (9.2.36)$$

In order to state the result, we introduce some notation. Let

$$f(s,t) = \mathbb{E}\big[s^{D^{(in)}} t^{D^{(out)}}\big] \qquad (9.2.37)$$

be the bivariate generating function of $(D^{(in)}, D^{(out)})$. Recall that $\theta^{(in)}$ and $\theta^{(out)}$ are the survival probabilities of the branching processes with offspring distributions $(p^{\star(in)}_k)_{k\geq 0}$ and $(p^{\star(out)}_k)_{k\geq 0}$, respectively. Write

$$\nu^{(in)} = \frac{\partial}{\partial s} f(1-\theta^{(in)}, 1), \quad \text{and} \quad \nu^{(out)} = \frac{\partial}{\partial t} f(1, 1-\theta^{(out)}). \qquad (9.2.38)$$

Then, the diameter of the directed configuration model $\mathrm{DCM}_n(\boldsymbol{d})$ behaves as follows:

Theorem 9.7 (Logarithmic diameter in DCMn(d)) Suppose that the out- and in-degrees in the directed configuration model $\mathrm{DCM}_n(\boldsymbol{d})$ satisfy (9.2.24) and (9.2.25). Further, assume that (9.2.34) holds. Then, when $\nu = \mathbb{E}[D^{(in)} D^{(out)}]/\mathbb{E}[D^{(out)}] > 1$,

$$\frac{\mathrm{diam}(\mathrm{DCM}_n(\boldsymbol{d}))}{\log n} \xrightarrow{\mathbb{P}} \frac{1}{\log \nu^{(in)}} + \frac{1}{\log \nu} + \frac{1}{\log \nu^{(out)}}. \qquad (9.2.39)$$

Theorem 9.7 is the directed version of Theorem 7.16. The interpretation of the different terms is similar to that in Theorem 7.16: the terms involving $\nu^{(in)}$ and $\nu^{(out)}$ indicate the depths of the deepest traps, where a trap indicates that the neighborhood survives for a long time without gaining substantial mass. The term involving $\nu^{(in)}$ corresponds to the largest in-trap, and that involving $\nu^{(out)}$ to the largest out-trap. These numbers are determined by first taking $r$ such that

$$\mathbb{P}\big(|\partial B_r(o)| \in [1,K]\big) = \Theta(1/n), \qquad (9.2.40)$$

where $\partial B_r(o)$ corresponds to the boundary of the backward $r$-neighborhood for $\nu^{(in)}$, and to that of the forward $r$-neighborhood for $\nu^{(out)}$, while $K$ is arbitrary and large. Due to large deviations for supercritical branching processes, one can expect that

$$\mathbb{P}\big(|\partial B_r(o)| \in [1,K]\big) \approx (\nu^{(in/out)})^r. \qquad (9.2.41)$$

Then, we can identify $r^{(in)} = \log_{\nu^{(in)}}(n)$ and $r^{(out)} = \log_{\nu^{(out)}}(n)$; the relevant rates in (9.2.41) are given by (9.2.38). For those special vertices $u, v$ for which $|\partial B^{(in)}_r(u)| \in [1,K]$ and $|\partial B^{(out)}_r(v)| \in [1,K]$, it then takes around $\log_\nu(n)$ steps to connect $\partial B^{(in)}_r(u)$ to $\partial B^{(out)}_r(v)$, explaining the asymptotics in Theorem 9.7. Of course, proving that this heuristic is correct is quite a bit harder.

It is tempting to conjecture that Theorem 9.7 remains valid under weaker assumptions than those in (9.2.34); however, this has not been shown, and it may be hard.

9.2.3 Directed preferential attachment models

Bollobás, Borgs, Chayes and Riordan (2003) investigate a directed preferential attachment model and prove that the degrees obey a power law similar to the one in [Volume 1, Theorem 8.3]. For its definition and the available results on the degree structure, we refer the reader to [Volume 1, Section 8.9]. Unfortunately, the type of properties investigated in this book have so far not been analyzed for this random graph model. In particular, there is no description of the strongly connected component, nor of the typical distances and diameter of this model.

One can also interpret normal preferential attachment models as directed graphs by orienting edges from young to old. This can be a useful perspective, for example when modeling temporal networks in which younger vertices can only connect to older vertices, such as citation networks. We refrain from discussing this in more detail, as the connectivity structure of such directed versions is not so interesting. For example, the strongly connected component is always small (see Exercise 9.19).


9.3 Random graphs with community structure: global communities

Many real-world networks have communities that are global in their size. For example, when partitioning science into its core fields, citation networks have such a global community structure. Also, in Belgian telecommunication networks of who calls whom, the division into the French- and the Flemish-speaking parts is clearly visible, while in U.S. politics, the partition into Republicans and Democrats has a pronounced effect on the network structure of social interactions between politicians. In this section, we discuss random graph models for networks with a global community structure. This section is organised as follows. In Section 9.3.1, we discuss stochastic blockmodels, which are the models of choice for networks with community structures. In Section 9.3.2, we discuss degree-corrected stochastic blockmodels, which are similar to stochastic blockmodels, but allow for more inhomogeneity in the degree structure. In Section 9.3.3, we discuss configuration models with global communities, and we close in Section 9.3.4 with preferential attachment models with global communities. We introduce the models, state the most important results in them, and also discuss the topic of community detection in such models, a topic that has attracted considerable attention due to its practical importance.

9.3.1 Stochastic blockmodel

We have already encountered stochastic blockmodels as inhomogeneous random graphs with finitely many types in Chapter 3. Here, we repeat the definition, after which we focus on the (highly interesting and challenging) community detection results.

Fix $r \geq 2$ and suppose we have a graph with $r$ different types of vertices. Let $\mathcal{S} = \{1, \ldots, r\}$. Let $n_i$ denote the number of vertices of type $i$, and let $\mu_n(i) = n_i/n$. Let $\mathrm{IRG}_n(\kappa)$ be the random graph where two vertices of types $i$ and $j$, respectively, are joined by an edge with probability $n^{-1}\kappa(i,j)$ (for $n \geq \max \kappa$). Then $\kappa$ is equivalent to an $r \times r$ matrix, and the random graph $\mathrm{IRG}_n(\kappa)$ has vertices of $r$ different types (or colors). We will assume that the type distribution $\mu_n$ satisfies

$$\lim_{n\to\infty} n_i/n = \mu_i. \qquad (9.3.1)$$

Exercise 3.11 then shows that the resulting graph is graphical, so that the results in Chapters 3 and 6 apply. As a result, we will not spend much time on the degree distribution, giant and graph distances in this model, as they have been addressed there. Exercise 9.20 studies the degree structure, while Exercise 9.21 investigates when a giant exists in this model.

Let us mention that, for the stochastic blockmodel to be a good model for networks with a global community structure, one would expect that the edge probabilities of internal edges, i.e., edges between vertices of the same type, are larger than those of external edges, i.e., edges between vertices of different types. In formulas, this means that $\kappa_{i,i} > \kappa_{i,j}$ for all $i \neq j \in \mathcal{S}$. For example, the bipartite Erdős-Rényi random graph, which has a structure that is quite opposite to a random graph with global communities (as vertices are only neighbors of vertices of a different type), is not considered a stochastic blockmodel.

Community detection in stochastic blockmodels

We next discuss the topic of community detection in stochastic blockmodels. Before we can say anything about when it is possible to detect communities, we must first define what this means. A community detection algorithm is an assignment $\pi\colon [n] \to [r]$, where $\pi(i) = s$ means that the algorithm assigns type $s$ to vertex $i$. In what follows, we will assume that the communities have equal size. Then, in a random guess for the group memberships, two vertices will be guessed to be of the same type with probability $1/r$. As a result, we are only impressed with the performance of a community detection algorithm when it does far better than random guessing. This explains the following definition:

Definition 9.8 (Solvable community detection) Consider a stochastic blockmodel where there are the same number of vertices of each of the r types, and where σ(i) denotes the type of vertex i ∈ [n]. We call a community detection problem solvable when there exists an algorithm σ̂ : [n] → [r] and an ε > 0 such that, whp as n → ∞,

1/n² ∑_{i,j∈[n]} [1_{σ̂(i)=σ̂(j), σ(i)=σ(j)} − 1/r] ≥ ε,  (9.3.2)

and otherwise we call the problem unsolvable.

The problem is the most difficult when the degrees of all the different types are the same. This is not so surprising, as otherwise one may aim to classify based on the degrees in the graph. As a result, from now on, we will assume that the degrees of all types of vertices are the same. Some ideas about how one can prove that the problem is solvable can be obtained from Exercises 9.22 and 9.23.

We also assume that there are just two types, so that we can take pij = a/n for vertices of the same type, and pij = b/n for vertices of opposite types. Here we think of a > b. The question whether the community detection problem is solvable is answered in the following theorem:

Theorem 9.9 (Stochastic blockmodel threshold) Take n to be even. Consider a stochastic blockmodel of two types, each having n/2 vertices, where the edge probability is pij = a/n for vertices of the same type, and pij = b/n for vertices of opposite types. Then, the community detection problem is solvable as in Definition 9.8 when

(a − b)² / [2(a + b)] > 1,  (9.3.3)

while it is unsolvable when

(a − b)² / [2(a + b)] < 1.  (9.3.4)
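To make the threshold concrete, here is a minimal sketch (our own, not from the book; the function names are ours) that samples a two-type stochastic blockmodel and evaluates the quantity (a − b)²/[2(a + b)] from (9.3.3)–(9.3.4):

```python
import random

def sample_sbm(n, a, b, seed=0):
    """Sample a two-type stochastic blockmodel: the first n/2 vertices have type 1,
    the rest type 2; edge probability a/n within a type and b/n between types."""
    rng = random.Random(seed)
    sigma = [1] * (n // 2) + [2] * (n // 2)  # true types
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = a / n if sigma[i] == sigma[j] else b / n
            if rng.random() < p:
                edges.append((i, j))
    return sigma, edges

def detection_snr(a, b):
    """The quantity (a - b)^2 / [2(a + b)] appearing in (9.3.3)-(9.3.4)."""
    return (a - b) ** 2 / (2 * (a + b))

sigma, edges = sample_sbm(2000, a=8.0, b=2.0, seed=1)
print(len(edges), detection_snr(8.0, 2.0))  # SNR = 1.8 > 1: solvable regime
```

For a = 8 and b = 2, the expected number of edges is close to 5000, and the threshold quantity equals 1.8, so the detection problem is in the solvable regime of Theorem 9.9.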


Theorem 9.9 is quite surprising. Indeed, it shows that not only should a > b hold in order to have a chance to perform community detection, but the difference should also be sufficiently large compared to a + b. Further, the transition in Theorem 9.9 is sharp, in the sense that (9.3.3) and (9.3.4) really complement each other. It is unclear what happens in the critical case where (a − b)² = 2(a + b). The solvable case in (9.3.3) is sometimes called an 'achievability result', the unsolvable case in (9.3.4) an 'impossibility result'.

We will not give the proof of Theorem 9.9, as it is quite involved. The proof of the solvable case also shows that the correlation between the estimated and the true types tends to 1 when (a − b)²/[2(a + b)] grows large.

In Exercise 9.24, you are asked to show that (9.3.3) implies that a − b > 2 and a + b > 2, and to conclude that a giant exists in this setting.

While the results for a general number of types r are less complete, there is an achievability result when pi,i = a/n and pi,j = b/n for all i ≠ j in [r], in which (9.3.3) is replaced by

(a − b)² / [r(a + (r − 1)b)] > 1,  (9.3.5)

which indeed reduces to (9.3.3) for r = 2. Also, many results are known about whether efficient algorithms for community detection exist. In general, this means that not only should an algorithm exist, but it should also be computable in reasonable time (say Θ(n log n)). We refer to Section 9.6 for a more elaborate discussion of such results.

Let us close this section by explaining how thresholds such as (9.3.3) and (9.3.5) can be interpreted. Interestingly, there is a close connection to multi-type branching processes. Consider a branching process with finitely many types and offspring matrix Mi,j = κi,jµi, which is r × r. Kesten and Stigum (1966) asked in this context when it would be possible to estimate the type of the root when observing the types of the vertices in generation k for very large k. Let λ1 > λ2 be the two largest eigenvalues of M. Then, the Kesten-Stigum criterion states that this is possible with probability strictly larger than 1/r when

λ2² / λ1 > 1.  (9.3.6)

Next, consider a general finite-type inhomogeneous random graph, with limiting type distribution (µi)i∈[r] and expected-neighbor matrix Mi,j = κi,jµi, which is r × r. Obviously, the local weak limit of the stochastic blockmodel is the above multi-type branching process, so the link can indeed be expected. Under the condition in (9.3.6), it is believed that the community detection problem is solvable, and that communities can even be detected in polynomial time. For r = 2, this is sharp, as we have seen above. For r ≥ 3, the picture is much more involved. It is believed that for r ≥ 4, a double phase transition occurs: detection should be possible in polynomial time when λ2²/λ1 > 1, much harder but still possible (i.e., the best algorithms take exponentially long) when c∗ < λ2²/λ1 < 1 for some 0 < c∗ < 1, and information-theoretically impossible when λ2²/λ1 < c∗. However, this is not yet known in the general case.

The way to get from a condition like (9.3.6) to an algorithm for community detection is to use the two largest eigenvalues of the non-backtracking matrix of the graph, and to obtain an estimate of the partition from the corresponding eigenvectors. The leading eigenvalue converges to λ1 in probability, while the second is bounded by |λ2|. This, together with a good approximation of the corresponding eigenvectors, suggests a specific estimation procedure that we explain now.

Let B be the non-backtracking matrix of the graph G. This means that B is indexed by the oriented edges E⃗(G) = {(u, v) : {u, v} ∈ E(G)}, so that B = (Be,f)e,f∈E⃗(G). For an edge e ∈ E⃗(G), denote e = (e1, e2), and write

Be,f = 1_{e2=f1, e1≠f2},  (9.3.7)

which indicates that e ends in the vertex in which f starts, but e is not the reversal of f. The latter property explains the name non-backtracking matrix.

Now we come to eigenvalues. We restrict ourselves to the case where r = 2, even though some of the results extend with modifications to higher values of r. Let λ1(B) and λ2(B) denote the two leading eigenvalues of B. Then, for the stochastic blockmodel,

λ1(B) P−→ λ1,  λ2(B) P−→ λ2,  (9.3.8)

where we recall that λ1 > λ2 are the two largest eigenvalues of M. It turns out that for the Erdos-Renyi random graph with edge probability (a + b)/[2n] ≡ α/n, the first eigenvalue satisfies λ1(B) P−→ λ1 = α, while the second eigenvalue satisfies λ2(B) ≤ √α + oP(1). Note that this does not follow from (9.3.8), since M is then a 1 × 1 matrix. For the stochastic blockmodel with r = 2, we instead have that λ2(B) P−→ λ2 = (a − b)/2. Thus, when

λ2(B)² / λ1(B) > 1,  (9.3.9)

then we can expect that the graph is a stochastic blockmodel, while if the reverse inequality holds, then we are not even sure whether the model is an Erdos-Renyi random graph or a stochastic blockmodel instead. In the latter case, the graph is so random and homogeneously distributed that we will not be able to make a good estimate for the types of the vertices, which strongly suggests that this case is unsolvable. This at least informally explains (9.3.6).

We next explain how the above analysis of eigenvalues can be used to estimate the types. Assume that λ2²/λ1 > 1. Let ξ2(B) : E⃗(G) → R denote the normalized eigenvector corresponding to λ2(B). We fix a constant θ > 0. Then, we estimate σ̂(v) = 1 when

∑_{e : e2=v} ξ2(e) ≥ θ/√n,  (9.3.10)


and otherwise we estimate σ̂(v) = 2, for the deterministic threshold θ > 0 fixed above. This estimation can then be shown to achieve (9.3.2) due to the sufficient separation of the eigenvalues.
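The spectral procedure above can be sketched in code. The following is an illustrative implementation, our own and not from the book: it samples a two-type stochastic blockmodel, applies the non-backtracking operator of (9.3.7) implicitly via the identity (Bx)_e = ∑_{f : f1 = e2} x_f − x_{rev(e)}, extracts the two leading eigenpairs by power iteration with Hotelling deflation (an implementation choice, not prescribed by the text), and labels each vertex by the sign of the aggregated second eigenvector, in the spirit of (9.3.10) with threshold zero.

```python
import numpy as np

def sbm_edges(n, a, b, rng):
    # two equal communities: sigma = 0 for the first half, 1 for the second
    sigma = np.repeat([0, 1], n // 2)
    p = np.where(sigma[:, None] == sigma[None, :], a / n, b / n)
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return sigma, np.argwhere(upper)

def nb_multiply(x, tails, heads, rev):
    # (Bx)_e = sum over f with f1 = e2 and f2 != e1 of x_f = out_sum[e2] - x[rev(e)]
    out_sum = np.bincount(tails, weights=x, minlength=tails.max() + 1)
    return out_sum[heads] - x[rev]

def power_iterate(multiply, dim, iters, rng):
    x = rng.standard_normal(dim)
    lam = 0.0
    for _ in range(iters):
        y = multiply(x)
        lam = np.linalg.norm(y)  # magnitude of the dominant eigenvalue
        x = y / lam
    return lam, x

rng = np.random.default_rng(3)
n, a, b = 1000, 14.0, 2.0
sigma, edges = sbm_edges(n, a, b, rng)
m = len(edges)
# oriented edges: (u, v) for indices < m, (v, u) for indices >= m; reversal swaps halves
tails = np.concatenate([edges[:, 0], edges[:, 1]])
heads = np.concatenate([edges[:, 1], edges[:, 0]])
rev = np.concatenate([np.arange(m, 2 * m), np.arange(m)])

Bx = lambda x: nb_multiply(x, tails, heads, rev)          # x -> Bx
Btx = lambda y: nb_multiply(y, heads, tails, rev)         # y -> B^T y
lam1, v1 = power_iterate(Bx, 2 * m, 300, rng)             # leading (right) eigenpair
_, w1 = power_iterate(Btx, 2 * m, 300, rng)               # leading left eigenvector
defl = lambda x: Bx(x) - lam1 * v1 * (w1 @ x) / (w1 @ v1) # Hotelling deflation
lam2, xi2 = power_iterate(defl, 2 * m, 300, rng)          # second eigenpair

# aggregate the second eigenvector over incoming oriented edges, as in (9.3.10)
score = np.bincount(heads, weights=xi2, minlength=n)
labels = (score >= 0).astype(int)
acc = max(np.mean(labels == sigma), np.mean(labels != sigma))
print(round(lam1, 2), round(lam2, 2), round(acc, 2))
```

For a = 14 and b = 2, one expects λ1(B) ≈ (a + b)/2 = 8 and λ2(B) ≈ (a − b)/2 = 6, so λ2²/λ1 ≈ 4.5 > 1 and the sign of the aggregated eigenvector recovers most of the partition.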

9.3.2 Degree-corrected stochastic blockmodel

While the stochastic blockmodel is a nice model for communities, it has degrees with Poisson tails (recall Theorem 3.4). Thus, in order to account for the abundant inhomogeneity in real-world networks, an adaptation of the stochastic blockmodel has been proposed in which the degrees are more flexible. This is called the degree-corrected stochastic blockmodel. The degree-corrected stochastic blockmodel is an inhomogeneous random graph that takes features of both the rank-1 inhomogeneous random graphs as well as of the stochastic blockmodel. Just like in the rank-1 setting, there are various possible versions of the model. Here we stick to a version for which the strongest community detection results have been proved. Let us now define the model.

For each vertex v, we sample a random variable Xv, where we assume that (Xv)v∈[n] are i.i.d. These will be the vertex weights. Conditionally on (Xv)v∈[n], we then assume that the edge between vertices u and v is present with probability

puv = (κσ(u),σ(v) XuXv / n) ∧ 1,  (9.3.11)

where σ(u) ∈ [r] denotes the type of vertex u.

Let us first discuss this setting in the simplest case where r = 1. If the weights (xv)v∈[n] in (9.3.11) were fixed, then we could take them as xv = wv√(n/ℓn), where ℓn = ∑_{v∈[n]} wv, in order to obtain the Chung-Lu model CLn(w). This intuition will be helpful in what follows. Unfortunately, when (wv)v∈[n] are i.i.d. and xv = wv√(n/ℓn), then (xv)v∈[n] are not i.i.d., so it is not obvious how to transfer between the settings. As a result, we stick to the setting in (9.3.11).

From now on, we will assume that there are r types of vertices, each of which occurs roughly (or precisely, depending on the setting) equally often. We also assume that κσ,σ′ takes on two values: κi,i = a and κi,j = b when i ≠ j. This leads to a model very similar to the stochastic blockmodel, except that the weight structure (Xv)v∈[n] adds some additional inhomogeneity in the vertex roles, where vertices with high weights generally have larger degrees than those with small weights.

Exercises 9.25–9.26 study the degree structure of the degree-corrected stochastic blockmodel, while Exercise 9.27 investigates when a giant exists in this model.

Community detection in degree-corrected stochastic blockmodels

We now come to the main result of this section, which involves the solvability of the estimation problem in degree-corrected stochastic blockmodels:

Theorem 9.10 (Degree-corrected stochastic blockmodel threshold) Take n to be even. Consider a degree-corrected stochastic blockmodel of two types, each having n/2 vertices, where the edge probabilities are given by (9.3.11), with κi,i = a, and κi,j = b for i ≠ j. The community detection problem is unsolvable when

(a − b)² E[X²] / [2(a + b)] < 1.  (9.3.12)

Assume further that there exists β > 8 such that

P(X > x) ≤ 1/x^β.  (9.3.13)

Then, the community detection problem is solvable as in Definition 9.8 when

(a − b)² E[X²] / [2(a + b)] > 1.  (9.3.14)

The impossibility result (9.3.12) in Theorem 9.10 is extended to all r ≥ 2 under the condition that (a − b)² E[X²] < r(a + b). The crux in the proof is to show that, for two vertices o1, o2 chosen uniformly at random, and with Gn = ([n], E(Gn)) the realization of the graph of the degree-corrected stochastic blockmodel,

P(σ(o1) = s | σ(o2), Gn) P−→ 1/r,  (9.3.15)

for every s ∈ [r]. Thus, asymptotically, the type of o2 gives no information about the type of o1. This makes detection quite hard, and explains (9.3.12). The proof of the achievability result follows a spectral argument similar to that for the ordinary stochastic blockmodel discussed in Section 9.3.1, and we refrain from discussing it further here. We can expect the power-law bound in (9.3.13) to be too strict, and the results to extend under slightly milder assumptions.

9.3.3 Configuration models with global communities

The stochastic blockmodel is an adaptation of the Erdos-Renyi random graph to incorporate global communities, and the degree-corrected stochastic blockmodel is an adaptation of the Chung-Lu model. In a similar way, one can adapt the configuration model to incorporate global communities. Surprisingly, this has not attracted substantial attention in the literature, which is why this section is relatively short. Let us discuss the obvious setting though. Let every vertex v ∈ [n] have a type σ(v) ∈ [r].

For a vertex v ∈ [n] and a type s ∈ [r], we let d_v^{(s)} denote the number of half-edges to be connected from vertex v to vertices of type s. We first specify the structure of the graph between vertices of the same type. Let ∑_{v : σ(v)=s} d_v^{(s)} be the total number of half-edges between vertices of type s, and assume that this number is even. We assume that the graph on {v : σ(v) = s} is a configuration model with ns = #{v ∈ [n] : σ(v) = s} vertices and degrees (d_v^{(s)})_{v : σ(v)=s}. This specifies the degrees within communities.

For the structure of the edges between vertices of different types, we recall what the bipartite configuration model is. We suppose that we have vertices of two types, say type 1 and type 2, and we have n1 and n2 vertices of these two types respectively. Let the type of a vertex v again be given by σ(v) ∈ {1, 2}. Vertices of type 1 have degrees (dv)_{v : σ(v)=1} and vertices of type 2 have degrees (dv)_{v : σ(v)=2}. We assume that

∑_{v : σ(v)=1} dv = ∑_{v : σ(v)=2} dv,  (9.3.16)

and we pair the half-edges incident to vertices of type 1 uniformly at random to those incident to vertices of type 2. As a result, there are only edges between vertices of types 1 and 2, and the total number of edges is given in (9.3.16).

Using the above definition, we let the edges between vertices of types s ≠ t be given by a bipartite configuration model between the vertices in {v : σ(v) = s} and {v : σ(v) = t}, where the former have degrees (d_v^{(t)})_{v : σ(v)=s} and the latter degrees (d_v^{(s)})_{v : σ(v)=t}. To make the construction feasible, we assume that

∑_{v : σ(v)=s} d_v^{(t)} = ∑_{v : σ(v)=t} d_v^{(s)}.  (9.3.17)

Special cases of this model are the configuration model, for which r = 1, and the bipartite configuration model itself, for which r = 2 and d_v^{(t)} = 0 for every v with σ(v) = t.

Let ns denote the number of vertices of type s. We again assume, as in (9.3.1), that the type distribution µn(s) = ns/n satisfies, for all s ∈ [r],

lim_{n→∞} ns/n = µs.  (9.3.18)

Also, in order to describe the local and global properties of the configuration models with global communities, one should make assumptions similar to those for the original configuration model in Condition 1.5, but now for the matrix of degree distributions. For example, it is natural to assume that, for all s ∈ [r], the joint distribution function of all the type degrees satisfies

F_n^{(s)}(x1, . . . , xr) = (1/ns) ∑_{v : σ(v)=s} 1_{d_v^{(1)} ≤ x1, ..., d_v^{(r)} ≤ xr} → F^{(s)}(x1, . . . , xr),  (9.3.19)

for all x1, . . . , xr ∈ R and some limiting joint distribution function F^{(s)} : R^r → [0, 1]. Further, it is natural to assume that an adaptation of Condition 1.5(a) holds for all these degrees, such as that, for all s, t ∈ [r],

(1/ns) ∑_{v : σ(v)=s} d_v^{(t)} → E[D^{(s,t)}].  (9.3.20)

While the configuration model, as well as its bipartite version, have attracted some attention, the above extension has so far remained unexplored. Exercises 9.28–9.30 informally investigate some of its properties.
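The construction above is easy to implement. The sketch below is our own (the data layout is an assumption): it pairs the within-type half-edges uniformly at random and pairs the cross-type half-edges as a bipartite configuration model, producing a multigraph in which vertex v has degree ∑_s d_v^{(s)} by construction.

```python
import random

def cm_with_communities(d, types, seed=0):
    """d[v][s] = number of half-edges from vertex v towards type s (d_v^{(s)});
    types[v] in {0, ..., r-1}. Returns a list of (multi)edges."""
    rng = random.Random(seed)
    r = len(d[0])
    edges = []
    # within-type pairings: a configuration model per type
    for s in range(r):
        stubs = [v for v in range(len(d)) if types[v] == s for _ in range(d[v][s])]
        rng.shuffle(stubs)
        assert len(stubs) % 2 == 0  # total within-type degree must be even
        edges += [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]
    # cross-type pairings: a bipartite configuration model per pair s < t
    for s in range(r):
        for t in range(s + 1, r):
            side_s = [v for v in range(len(d)) if types[v] == s for _ in range(d[v][t])]
            side_t = [v for v in range(len(d)) if types[v] == t for _ in range(d[v][s])]
            assert len(side_s) == len(side_t)  # feasibility condition (9.3.17)
            rng.shuffle(side_t)
            edges += list(zip(side_s, side_t))
    return edges

# toy example: r = 2, four vertices per type
types = [0, 0, 0, 0, 1, 1, 1, 1]
d = [[2, 1]] * 4 + [[1, 2]] * 4  # (d_v^{(0)}, d_v^{(1)}) per vertex
edges = cm_with_communities(d, types, seed=3)
print(len(edges))  # 4 within type 0 + 4 within type 1 + 4 cross = 12 edges
```

As in the ordinary configuration model, the pairing may create self-loops and multiple edges, which become negligible under regularity conditions such as (9.3.19)–(9.3.20).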


9.3.4 Preferential attachment models with global communities

We next investigate preferential attachment models with global communities. Again assume that there are r communities. The preferential attachment model with global communities is naturally a dynamic random graph model, where now every vertex n has a type σ(n) ∈ [r]. The graph is considered to be directed, where all edges go from young to old. Each vertex comes in with out-degree equal to m, as in the normal preferential attachment model. We will see that the extension studied in the literature is most similar to (PA_n^{(m,0)}(b))_{n≥0}, but an extension to (PA_n^{(m,δ)}(b))_{n≥0} will be discussed as well later on. In a similar way, extensions to (PA_n^{(m,δ)}(a))_{n≥0} can be formulated.

We start with an initial graph at time n0 given by G_{n0}, in which we assume that every vertex has out-degree m, and every vertex v ∈ [n0] has a label σ(v). The graph then evolves as follows. At time n + 1, let vertex n + 1 have a type σ(n + 1) that is chosen in an i.i.d. way from [r], where

µs = P(σ(n + 1) = s)  (9.3.21)

is the type distribution. We consider the m edges incident to vertex n + 1 to be half-edges, similarly to how the configuration model is constructed. We give all the half-edges incident to a vertex v the label σ(v). Further, let the matrix κ : [r] × [r] → [0,∞) be the affinity matrix. If σ(n + 1) = s, then give each half-edge of label t a weight κs,t. Choose a half-edge according to these weights, meaning that a half-edge x with label t has a probability proportional to κs,t to be chosen as the pair of any of the m half-edges incident to vertex n + 1. We do this for all m half-edges incident to vertex n + 1 independently. This creates the graph Gn+1 at time n + 1. The dynamics is iterated indefinitely.
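A minimal simulation of these dynamics (our own sketch; the list-based bookkeeping, the cycle-shaped initial graph, and the choice to update half-edge counts only after all m attachments are implementation assumptions) keeps, for each label, the list of half-edges with that label, recorded by their owning vertex. A new vertex of type s first picks a label t with probability proportional to κ_{s,t} times the number of label-t half-edges, and then a uniform half-edge of that label:

```python
import random

def pa_with_communities(n, m, r, mu, kappa, seed=0):
    """Grow a preferential attachment graph with global communities.
    mu[s] = type probabilities, kappa[s][t] = affinity weights."""
    rng = random.Random(seed)
    # initial graph: r vertices (one per type) joined by m edges along a cycle
    types = list(range(r))
    half = [[] for _ in range(r)]  # half[t] = owners of label-t half-edges
    edges = []
    for v in range(r):
        for _ in range(m):
            w = (v + 1) % r
            edges.append((v, w))
            half[types[v]].append(v)
            half[types[w]].append(w)
    for v in range(r, n):
        s = rng.choices(range(r), weights=mu)[0]
        types.append(s)
        new_half = []
        for _ in range(m):
            # label t chosen with prob proportional to kappa[s][t] * #label-t half-edges
            t = rng.choices(range(r), weights=[kappa[s][t] * len(half[t]) for t in range(r)])[0]
            target = rng.choice(half[t])
            edges.append((v, target))
            new_half += [(s, v), (types[target], target)]
        for lab, owner in new_half:  # update half-edge lists after all m choices
            half[lab].append(owner)
    return types, edges

types, edges = pa_with_communities(5000, m=2, r=2,
                                   mu=[0.5, 0.5],
                                   kappa=[[4.0, 1.0], [1.0, 4.0]], seed=7)
print(len(edges))
```

Every attachment adds two half-edges (one at the new vertex, one at the target), so the total number of half-edges grows by 2m per step, consistent with the −2ηs term in (9.3.22) below.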

Degree distribution of PAMs with global communities

We start by investigating the degree distribution in the preferential attachment models with global communities. We first introduce some notation.

Let ηn(s) denote the fraction of half-edges with label s, and let ηn = (ηn(s))s∈[r] denote the empirical distribution of the labels of half-edges. For a probability distribution η on [r] and a type s ∈ [r], let

hs(η) = µs + ∑_{t∈[r]} µt κs,t ηs / (∑_{t'∈[r]} κt,t' ηt') − 2ηs.  (9.3.22)

Then, the half-edge label distribution satisfies

ηn(s) a.s.−→ η∗(s), where hs(η∗) = 0 for all s ∈ [r].  (9.3.23)

The probability distribution η∗ can be shown to be the unique probability distribution that solves hs(η∗) = 0 for all s ∈ [r]. We next define the crucial parameters in the model.
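Numerically, one convenient way to find η∗ (our own sketch, assuming the reading of (9.3.22) above) is a damped fixed-point iteration on the stationarity equations hs(η) = 0, i.e., on ηs = (µs + ∑_t µt κs,t ηs / ∑_{t'} κt,t' ηt')/2:

```python
def solve_eta(mu, kappa, iters=5000, damp=0.5):
    """Damped fixed-point iteration for eta* with h_s(eta*) = 0, cf. (9.3.22)-(9.3.23)."""
    r = len(mu)
    eta = [1.0 / r] * r  # start from the uniform distribution
    for _ in range(iters):
        denom = [sum(kappa[t][u] * eta[u] for u in range(r)) for t in range(r)]
        attach = [sum(mu[t] * kappa[s][t] * eta[s] / denom[t] for t in range(r))
                  for s in range(r)]
        new = [(mu[s] + attach[s]) / 2.0 for s in range(r)]
        eta = [damp * eta[s] + (1 - damp) * new[s] for s in range(r)]
    return eta

def h(eta, mu, kappa):
    """The function h_s(eta) of (9.3.22), used to check the residual."""
    r = len(mu)
    denom = [sum(kappa[t][u] * eta[u] for u in range(r)) for t in range(r)]
    return [mu[s] + sum(mu[t] * kappa[s][t] * eta[s] / denom[t] for t in range(r)) - 2 * eta[s]
            for s in range(r)]

mu, kap = [0.7, 0.3], [[2.0, 1.0], [1.0, 2.0]]
eta = solve_eta(mu, kap)
print([round(x, 4) for x in eta], max(abs(v) for v in h(eta, mu, kap)))
```

In the symmetric case (uniform µ and κ constant on and off the diagonal) the iteration stays at the uniform distribution, which is then the fixed point η∗(s) = 1/r.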

For s, t ∈ [r], let

θ∗s,t = κs,t / (2 ∑_{t'∈[r]} κt,t' η∗(t')),  (9.3.24)

and write

θ∗s = ∑_{t∈[r]} µt θ∗s,t.  (9.3.25)

For a vertex v ∈ [n], we let σ(v) ∈ [r] be its type, and ns = #{v : σ(v) = s} denote the type counts. We next study the degree distribution in the above preferential attachment models with global communities. For s ∈ [r], define

P_k^{(s)}(n) = (1/n) ∑_{v∈[n]} 1_{Dv(n)=k, σ(v)=s}  (9.3.26)

to be the per-type degree distribution in the model, where Dv(n) denotes the degree of vertex v at time n. The main result on the degree distribution is as follows:

Theorem 9.11 (Degrees in preferential attachment models with global communities) In the above preferential attachment models with global communities, for every s ∈ [r],

P_k^{(s)}(n) a.s.−→ pk(θ∗s),  (9.3.27)

where θ∗s is defined in (9.3.25), and where, for θ > 0,

pk(θ) = [Γ(m + 1/θ) / (θ Γ(m))] · [Γ(k) / Γ(k + 1 + 1/θ)].  (9.3.28)
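As a quick sanity check of (9.3.28) (our own computation, not from the text), one can verify numerically that pk(θ) sums to one over k ≥ m and decays like k^{−(1+1/θ)}. For θ = 1/2 and m = 2, (9.3.28) simplifies to pk = 12/[k(k + 1)(k + 2)]:

```python
from math import lgamma, exp

def p_k(k, theta, m):
    """The limiting degree distribution (9.3.28), computed via log-gamma for stability."""
    return exp(lgamma(m + 1 / theta) - lgamma(m)
               + lgamma(k) - lgamma(k + 1 + 1 / theta)) / theta

theta, m = 0.5, 2
total = sum(p_k(k, theta, m) for k in range(m, 200000))
print(round(total, 4))   # close to 1: a probability distribution on k >= m
print(p_k(2, theta, m))  # equals 12/(2*3*4) = 0.5
```

The ratio pk(θ)·k^{1+1/θ} is roughly constant in k, which is the power-law tail with exponent 1 + 1/θ referred to in the surrounding discussion.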

Exercise 9.31 shows that the limiting distribution in (9.3.28) is indeed a probability distribution. Theorem 9.11 shows that the degree distribution has a power-law tail, as can be expected from the fact that the preferential attachment mechanism is profoundly present. Moreover, (9.3.27) also shows that the degrees of vertices of type s satisfy a power law with an exponent that depends sensitively on the type through the key parameters (θ∗s)s∈[r]. Exercise 9.32 shows that the global degree distribution also converges, as can be expected by the convergence of the per-type degree distributions. Further, Exercise 9.33 shows that the global degree distribution has a power-law tail with exponent τ = 1 + 1/max_{s∈[r]} θ∗s, provided that µs > 0 for all s ∈ [r].

Related properties of the preferential attachment models with global communities seem not to have been investigated, so we move to the community detection problem.


Community detection in PAMs with global communities

It is important to discuss what we assume to be known. We think of the graph as being directed, with edges directed from young to old. Further, we assume that m and the probability distribution (µs)s∈[r] are known. Finally, and probably most importantly, we assume that the labels or ages of the vertices in the graph are known. Some of these parameters can be estimated (m being the easiest one). While for stochastic blockmodels it is clear that the most difficult case is the one where the communities are all equally large and have the same inter- and intra-community edge probabilities, this is not obvious here. However, to mimic the stochastic blockmodel setting, you can keep the case where µs = 1/r in mind as a key example. Also, the setting where κs,s = a for all s ∈ [r] and κs,t = b for all s, t ∈ [r] with s ≠ t is particularly interesting, where, to model community structure, a > b is natural. By scaling invariance, we may assume that b = 1. In this case, η∗(s) = 1/r for all s ∈ [r], θ∗s,s = ar/[2(a + r − 1)], while θ∗s,t = r/[2(a + r − 1)] for all s, t ∈ [r] with s ≠ t. Also, θ∗s = 1/2 for all s ∈ [r]. This case is probably the most difficult.

The aim is to estimate the type σ(v) for all v ∈ [n], based on the above information. This can be done by several algorithms. One algorithm performs the estimation of the label of v on the basis of the neighbors of v. A second algorithm does so on the basis of the degree of v at time n. Let Errn denote the fraction of errors in the above algorithms. In the above setting, it can be shown that

Errn P−→ Err,  (9.3.29)

for some limiting constant Err depending on the algorithm. The precise form of Err is known, but difficult to obtain rigorously, as it relies on a continuous-time branching process approximation of the graph evolution. In particular, in the setting where µs = 1/r, κs,s = a > 1 for all s ∈ [r] and κs,t = 1 for all s, t ∈ [r] with s ≠ t, it is unclear whether Err < (r − 1)/r.

Many more detailed results have been proved, for example that the probability that the label of vertex t is estimated wrongly converges, uniformly for all t ∈ [n] \ [δn]. Also, there exists an algorithm that estimates σ(v) correctly whp provided that v = o(n). We refrain from discussing this further.

9.4 Random graphs with community structure: local communities

In the previous section, we investigated settings where the models have a finite number of communities, making the communities global. This setting is realistic when we would like to partition a network of choice into a finite number of parts, for example corresponding to the main scientific fields in citation or collaboration networks, or the continents in the Internet. However, in many other settings this is not realistic. Indeed, most communities in social networks correspond to smaller entities, such as school classes, families, sports teams, etc. In most real-world settings, it is not even clear what communities look like. As a result, community detection has become an art. Needless to say, such techniques are highly relevant, to find and use the available structure in real-world networks.

The topic is relevant, since most models (including the models with global community structure from Section 9.3) have rather low clustering. For example, consider a general inhomogeneous random graph IRGn(κn) with kernel κn. Assume that κn(x, y) ≤ n. Then, we can compute that the expected number of triangles in IRGn(κn) equals

E[# triangles in IRGn(κn)] = (1/(6n³)) ∑_{i,j,k∈[n]} κn(xi, xj) κn(xj, xk) κn(xk, xi).  (9.4.1)

Under relatively weak conditions on the kernel κn, it follows that

E[# triangles in IRGn(κn)] → (1/6) ∫_{S³} κ(x1, x2) κ(x2, x3) κ(x3, x1) µ(dx1) µ(dx2) µ(dx3).  (9.4.2)

Thus, the expected number of triangles remains bounded, and therefore the clustering coefficient converges to zero as 1/n. In many real-world networks, particularly in social networks, the clustering coefficient is strictly positive.
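For the Erdos-Renyi random graph ERn(λ/n), where the kernel is constant, κ ≡ λ, the limit in (9.4.2) equals λ³/6. A small simulation (our own sketch, counting triangles as trace(A³)/6) illustrates this:

```python
import numpy as np

def mean_triangles_er(n, lam, reps, rng):
    """Average number of triangles in ER_n(lam/n) over `reps` samples."""
    counts = []
    for _ in range(reps):
        upper = np.triu(rng.random((n, n)) < lam / n, k=1)
        a = (upper | upper.T).astype(float)  # symmetric 0/1 adjacency matrix
        counts.append(np.trace(a @ a @ a) / 6.0)  # each triangle counted 6 times
    return float(np.mean(counts))

rng = np.random.default_rng(0)
est = mean_triangles_er(n=200, lam=2.0, reps=300, rng=rng)
print(round(est, 2), 2.0 ** 3 / 6)  # estimate vs the limit lambda^3/6 ≈ 1.33
```

The expected number of triangles stays of order one while the number of wedges grows linearly in n, which is why the clustering coefficient vanishes as 1/n.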

In this section, we discuss random graph models with a community structure in which most communities are quite small. Indeed, in most cases it is assumed that the average community size is bounded. However, since the community sizes can also be quite large, we might call them mesoscopic rather than microscopic. See Figures 9.7 and 9.8 for some empirical data on real-world networks. Figure 9.7 shows that there is enormous variability in the tail probabilities of the degree distribution, the community-size distribution and the inter-community degrees in real-world networks. Figure 9.8 shows that the edge densities of communities generally decrease with their sizes. Here, the edge density of a community of size s is given by 2ein/[s(s − 1)], where ein is the number of edges inside the community. Here, the communities were extracted (or detected) using the so-called Louvain method. See Stegehuis et al. (2016b) for details.

We conclude that we need to be flexible in our community structure to be able to realistically model the community structure in real-world networks. Below, we discuss several models that attempt to do so. We start in Section 9.4.1 by discussing inhomogeneous random graphs with community structures. We continue in Section 9.4.2 by discussing the hierarchical configuration model as well as some close cousins, followed by a discussion of random intersection graphs in Section 9.4.3 and exponential random graphs in Section 9.4.4.

9.4.1 Inhomogeneous random graphs with communities

In this section, we discuss a model similar to the inhomogeneous random graph IRGn(κ) that incorporates clustering. The idea behind this model is that instead of only adding edges independently, we can also add other graphs on r vertices in an independent way. For example, we could study a graph where each pair of vertices is independently connected with probability λ/n, as for ERn(λ/n), but where additionally each collection of triples forms a triangle with probability µ/n², independently for all triplets and independently of the status of the edges. Here the exponent n^{-2} is chosen so as to make the expected number of triangles containing a vertex bounded. See Exercise 9.34 for the clustering in such random graphs. In social networks, also complete graphs of size four, five, etc., are present more often than in the usual random graph. Therefore, we also wish to add those independently. We start by introducing the model, followed by an examination of some of its properties.

Figure 9.7 Tail probabilities of the degrees, community sizes s and the inter-community degrees k in real-world networks. a) Amazon co-purchasing network, b) Gowalla social network, c) English word relations, d) Google web graph. Pictures taken from Stegehuis et al. (2016b).

Model introduction

In order to formulate this general version of the model, we start by introducing some notation. We repeatedly make use of notation similar to that in Chapter 3.

Let F consist of one representative of each isomorphism class of finite connected graphs, chosen so that if F ∈ F has r vertices, then V(F) = [r] = {1, 2, . . . , r}. Simple examples of such F are the complete graphs on r vertices, but other examples are possible as well. Recall that S denotes the type space. Given F ∈ F with r vertices, let κF be a measurable function from S^r to [0,∞). The function κF is called the kernel corresponding to F. A sequence κ = (κF)F∈F is a kernel family.

Figure 9.8 The relation between the denseness of a community 2ein/(s² − s) and the community size s can be approximated by a power law. a) Amazon co-purchasing network, b) Gowalla social network, c) English word relations, d) Google web graph. Pictures taken from Stegehuis et al. (2016b).

Let κ be a kernel family and n an integer. We define a random graph IRGn(κ) with vertex set [n]. First, let x1, x2, . . . , xn ∈ S be i.i.d. (independent and identically distributed) with distribution µ. Given x = (x1, . . . , xn), construct IRGn(κ) as follows, starting with the empty graph. For each r and each F ∈ F with |F| = r, and for every r-tuple of distinct vertices (v1, . . . , vr) ∈ [n]^r, add a copy of F on the vertices v1, . . . , vr (with vertex i of F mapped to vi) with probability

p(v1, . . . , vr; F) = (κF(x_{v1}, . . . , x_{vr}) / n^{r−1}) ∧ 1,  (9.4.3)

all these choices being independent. We shall often call the added copies of the various F that together form IRGn(κ) atoms, as they may be viewed as indivisible building blocks. Sometimes we refer to them as small graphs, although there is in general no bound on their sizes.

The reason for dividing by n^{r−1} in (9.4.3) is that we wish to consider sparse graphs; indeed, our main interest is the case when IRGn(κ) has O(n) edges. As it turns out, we can be slightly more general; however, when κF is integrable (which we shall always assume), the expected number of added copies of each graph F is O(n). Note that all incompletely specified integrals are with respect to the appropriate r-fold product measure µ^r on S^r.

There are several plausible choices for the normalization in (9.4.3). The one we have chosen means that if κF = c is constant, then (asymptotically) there are on average cn copies of F in total, and each vertex is on average in rc copies of F. An alternative is to divide the expression in (9.4.3) by r; then (asymptotically) each vertex would on average be in c copies of F. Another alternative, natural when adding cliques only but less so in the general case, would be to divide by r!; this is equivalent to considering unordered sets of r vertices instead of ordered r-tuples. When there is only one kernel, corresponding to adding edges, this would correspond to the normalization used in Bollobas et al. (2007), and in particular to that of the classical model ERn(λ/n); the normalization we use here differs from this by a factor of 2.

In the special case where all κF are zero apart from κ_{K2}, the kernel corresponding to an edge, we recover (essentially) a special case of the model of Bollobas et al. (2007); we call this the edge-only case, since we add only edges, not larger graphs. We write κ2 for κ_{K2}. Note that in the edge-only case, given x, two vertices i and j are joined with probability

(κ2(xi, xj) + κ2(xj, xi))/n + O((κ2(xi, xj) + κ2(xj, xi))²/n²).  (9.4.4)

The correction term will never matter, so we may as well replace κ2 by its sym-metrized version. In fact, we shall always assume that κF is invariant underpermutations of the vertices of the graph F .

For any kernel family κ, let κe be the corresponding edge kernel, defined by

κe(x, y) = ∑_F ∑_{ij∈E(F)} ∫_{S^{V(F)\{i,j}}} κF(x1, . . . , xi−1, x, xi+1, . . . , xj−1, y, xj+1, . . . , x|F|), (9.4.5)

where the second sum runs over all 2E(F) ordered pairs (i, j) with ij ∈ E(F), and we integrate over all variables apart from x and y. Note that the sum need not always converge; since every term is positive, this causes no problems: we simply allow κe(x, y) = ∞ for some x, y. Given xi and xj, the probability that i and j are joined in IRGn(κ) is at most κe(xi, xj)/n + O(1/n²). In other words, κe captures the edge probabilities in IRGn(κ), but not the correlations.

The number of edges

Before proceeding to deeper properties, let us note that the expected number of added copies of F is (1 + O(n^{−1})) n ∫_{S^{|F|}} κF. Unsurprisingly, the actual number turns out to be concentrated about this mean. Let

ξ(κ) = ∑_{F∈F} E(F) ∫_{S^{|F|}} κF = (1/2) ∫_{S²} κe ≤ ∞ (9.4.6)

be the asymptotic edge density of κ. Since every copy of F contributes E(F) edges, the following theorem is almost obvious, provided that we can ignore overlapping edges:

Theorem 9.12 (Edge density in IRGn(κ)) As n → ∞,

E[E(IRGn(κ))/n] → ξ(κ) ≤ ∞. (9.4.7)

Moreover, if ξ(κ) < ∞, then

E(IRGn(κ))/n →P ξ(κ). (9.4.8)


In other words, Theorem 9.12 states that if ξ(κ) < ∞ then E(IRGn(κ)) = ξ(κ)n + oP(n), and if ξ(κ) = ∞ then E(IRGn(κ)) > Cn whp for every constant C. We conclude that the model is sparse precisely when ξ(κ) < ∞. This is certainly true when κF is uniformly bounded, but may also be true more generally under some integrability assumptions.

The giant

We next consider the emergence of the giant component. When studying the component structure of IRGn(κ), the model can be simplified somewhat. Recalling that the atoms F ∈ F are connected by definition, when we add an atom F to a graph G, the effect on the component structure is simply to unite all components of G that meet the vertex set of F, so only the vertex set of F matters, not its graph structure. We say that κ is a clique kernel family if the only non-zero kernels are those corresponding to complete graphs; the corresponding random graph model IRGn(κ) is a clique model. For questions concerning component structure, it thus suffices to study clique models. For clique kernels we write κr for κKr; as above, we always assume that κr is symmetric, here meaning invariant under all permutations of the coordinates of S^r. Given a general kernel family κ, the corresponding (symmetrized) clique kernel family is given by κ = (κr)r≥2, with

κr(x1, . . . , xr) = ∑_{F∈F: |F|=r} (1/r!) ∑_{π∈Gr} κF(xπ(1), . . . , xπ(r)), (9.4.9)

where Gr denotes the symmetric group of permutations of [r]. (This is consistent with our notation κ2 = κK2.) When considering the size (meaning the number of vertices) of the giant component in IRGn(κ), we may always replace κ by the corresponding clique kernel family.

In our analysis we also consider the linear operator Tκe defined by

Tκe(f)(x) = ∫_S κe(x, y) f(y) dµ(y), (9.4.10)

where κe is defined by (9.4.5). We need to impose some sort of integrability condition on our kernel family:

Definition 9.13 (i) A kernel family κ = (κF)F∈F is integrable if

∫κ = ∑_{F∈F} |F| ∫_{S^{|F|}} κF < ∞. (9.4.11)

This means that the expected number of atoms containing a given vertex is bounded.

(ii) A kernel family κ = (κF)F∈F is edge integrable if

∑_{F∈F} E(F) ∫_{S^{|F|}} κF < ∞; (9.4.12)

equivalently, ξ(κ) < ∞. This means that the expected number of edges in IRGn(κ) is O(n), see Theorem 9.12, and thus that the expected degree of a given vertex is bounded.

Note that a clique kernel family (κr) is integrable if and only if ∑_{r≥2} r ∫_{S^r} κr < ∞, and edge integrable if and only if ∑_{r≥2} r(r − 1) ∫_{S^r} κr < ∞.

The main result concerning the phase transition of IRGn(κ) is that if κ is an integrable kernel family satisfying a certain extra assumption, then the normalized size of the giant component in IRGn(κ) is simply ζ(κ) + oP(1). The extra assumption is an irreducibility assumption similar to Definition 3.3(ii) that essentially guarantees that the graph does not split into two pieces: we say that a symmetric kernel κe : S² → [0, ∞) is reducible if

∃A ⊂ S with 0 < µ(A) < 1 such that κe = 0 a.e. on A× (S \A);

otherwise κe is irreducible. Thus, κe is irreducible if

A ⊆ S and κe = 0 a.e. on A× (S \A) implies µ(A) = 0 or µ(S \A) = 0.

We are now ready to formulate the main result in this section involving the phase transition in IRGn(κ). Recall that |Cmax| denotes the number of vertices in the largest connected component of the graph under consideration, and |C(2)| the number of vertices in its second largest component.

Theorem 9.14 (The giant in clustered inhomogeneous random graphs) Let κ′ = (κ′F)F∈F be an irreducible, integrable kernel family, and let κ = (κr)r≥2 be the corresponding clique kernel family, given by (9.4.9). Then, there exists a ζ(κ) ∈ [0, 1) such that

|Cmax| = ζ(κ)n + oP(n), (9.4.13)

and |C(2)| = oP(n).

Theorem 9.14 is proved by showing that (in the clique kernel case) the branching process that captures the 'local structure' of IRGn(κ) is a good approximation, and indeed describes the existence of the giant. Along the way, it is also proved that, in the clique kernel setting, IRGn(κ) converges in the local weak sense to this 'branching process'. This is, however, not a branching process in the usual sense, but rather a branching process that describes the connections between the cliques.

For Theorem 9.14 to be useful, we would like to know something about ζ(κ), which can be calculated from x ↦ ζκ(x), which is in turn the largest solution to the functional equation

f(x) = 1 − e^{−Sκ(f)(x)}. (9.4.14)

We can think of ζκ(x) as the probability that a vertex of type x ∈ S is in a 'large' connected component. The question when ζ(κ) > 0 is settled in the following theorem:

Theorem 9.15 (Condition for existence of the giant component) Let κ be an integrable clique kernel family. Then, ζ(κ) > 0 if and only if ‖Tκe‖ > 1. Furthermore, if κ is irreducible and ‖Tκe‖ > 1, then ζκ(x) is the unique non-zero solution to the functional equation (9.4.14), and ζκ(x) > 0 holds for a.e. x.

In general, ‖Tκe‖ may be rather hard to calculate. When we suppose that each κr is constant, however, this can be done. Indeed, say that κr = cr. Then κe(x, y) = ∑_r r(r − 1)cr = 2ξ(κ) for all x and y, so that

‖Tκe‖ = 2ξ(κ). (9.4.15)

This is perhaps surprising: it tells us that for such uniform clique kernels, the critical point where a giant component emerges is determined only by the total number of edges added; it does not matter what size cliques they lie in, even though, for example, the third edge in every triangle is 'wasted'. This turns out not to be true for arbitrary kernel families, where, rather, each atom needs to be replaced by a clique.
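The threshold ‖Tκe‖ = 2ξ(κ) = 1 is easy to check by simulation. The sketch below (illustrative Python, not part of the formal development) builds the clique model with a constant triangle kernel κ3 = c3, so that 2ξ(κ) = 6c3 and the critical value is c3 = 1/6; the number of added triangles is approximated by its mean c3·n, and repeated vertices within a triple are not excluded.

```python
import random

def giant_fraction(n, c3, seed=0):
    """Clique model with only a constant triangle kernel kappa_3 = c3:
    add about c3*n triangles on uniformly chosen vertex triples, then
    return the size of the largest component divided by n (union-find)."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry

    for _ in range(int(c3 * n)):  # expected number of triangles is ~ c3*n
        u, v, w = rng.randrange(n), rng.randrange(n), rng.randrange(n)
        union(u, v)
        union(v, w)

    sizes = {}
    for x in range(n):
        r = find(x)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n

sup = giant_fraction(20000, 0.30)  # 2*xi = 1.8 > 1: supercritical
sub = giant_fraction(20000, 0.10)  # 2*xi = 0.6 < 1: subcritical
```

For c3 = 0.3 the largest component occupies a positive fraction of the vertices, while for c3 = 0.1 its relative size is negligible, consistent with (9.4.15).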

9.4.2 Configuration models with community structure

In this section, we investigate several models that are related to the configuration model, yet have a pronounced community structure. We start by discussing the hierarchical configuration model and its properties. We then discuss a particular version that goes under the name of the household model, and we close by discussing a model where triangles are explicitly added to the model.

Hierarchical configuration model: model introduction

The configuration model also has low clustering, which often makes it inappropriate in applied contexts. A possible solution to overcome this low clustering is to introduce a community or household structure. Consider the configuration model CMN(d) with a degree sequence d = (di)i∈[N] satisfying Conditions 1.5(a)-(b), now with N taking the role of n. We now replace each of the vertices by a small graph: vertex i is replaced by a local graph Gi. We assign each of the di half-edges incident to vertex i to a vertex in Gi in an arbitrary way. Thus, vertex i is replaced by the pair of the community graph Gi = (V(Gi), E(Gi)) and the inter-community degrees d(b) = (d(b)_u)u∈V(Gi), satisfying ∑_{u∈V(Gi)} d(b)_u = di. Naturally, the size of the graph becomes n = ∑_{i∈[N]} |V(Gi)|.
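In code, the two-level construction reads as follows (an illustrative sketch, not from the book: the community graphs and the assignment of inter-community half-edges are supplied as explicit inputs, the half-edges are paired by a uniform random matching as in CMN(d), and multi-edges are collapsed).

```python
import random

def hierarchical_configuration_model(communities, inter_degrees, seed=0):
    """Sketch of the hierarchical configuration model.

    communities[i]   = (n_i, edges_i): community graph G_i on vertices 0..n_i-1
    inter_degrees[i] = list of local vertices of community i, one entry per
                       inter-community half-edge (so its length is d_i).
    Returns an edge set on global vertices (i, u) = vertex u of community i.
    Requires an even total number of half-edges.
    """
    rng = random.Random(seed)
    edges = set()
    half_edges = []
    for i, (n_i, local_edges) in enumerate(communities):
        for (u, v) in local_edges:          # intra-community edges
            edges.add(((i, u), (i, v)))
        for u in inter_degrees[i]:          # half-edges sticking out of G_i
            half_edges.append((i, u))
    rng.shuffle(half_edges)                 # uniform pairing of half-edges
    for a, b in zip(half_edges[::2], half_edges[1::2]):
        edges.add((a, b))
    return edges

# Example: four triangle communities, one outgoing half-edge per local vertex.
tri = (3, [(0, 1), (1, 2), (0, 2)])
edges = hierarchical_configuration_model([tri] * 4, [[0, 1, 2]] * 4)
```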

As a result, we obtain a graph with two levels of hierarchy, whose local structure is described by the local graphs (Gi)i∈[N], whereas its global structure is described by the configuration model CMN(d). This model is called the hierarchical configuration model. A natural assumption is that the degree sequence d = (di)i∈[N] satisfies Conditions 1.5(a)-(b), while the empirical distribution of the graphs satisfies that, as N → ∞,

µn(H, ~d) = (1/N) ∑_{i∈[N]} 1{Gi = H, (d(b)_u)_{u∈V(Gi)} = ~d} → µ(H, ~d), (9.4.16)

for every connected graph H and degree vector ~d = (dh)_{h∈V(H)}, where µ is some probability distribution on graphs with integer marks associated to the vertices. We assume that µn(H, ~d) = 0 for all H that are disconnected. Indeed, we think of the graphs (Gi)i∈[N] as describing the community structure of the graph, so it makes sense to assume that the (Gi)i∈[N] are connected. In particular, (9.4.16) shows that a typical community has bounded size. We will often also make assumptions on the average size of the community of a random vertex. For this, it will be necessary to impose that, with µn(H) = ∑_~d µn(H, ~d) and µ(H) = ∑_~d µ(H, ~d) the community distributions,

∑_H |V(H)| µn(H) = (1/N) ∑_{i∈[N]} |V(Gi)| → ∑_H |V(H)| µ(H) < ∞. (9.4.17)

Equation (9.4.17) indicates that the community of a random individual has a tight size, since the community size of a random vertex has the size-biased community distribution (see Exercise 9.36).

The degree structure of the hierarchical configuration model is built into the model, and is thus of less interest. We next discuss the giant and distances in the hierarchical configuration model.

The giant in the hierarchical configuration model

We now investigate the existence and size of the giant component in the hierarchical configuration model. Let Cmax be the largest connected component in the hierarchical configuration model, and let C(2) denote the second largest cluster (breaking ties arbitrarily when needed). The main result concerning the size of the giant is the following theorem:

Theorem 9.16 (Giant in the hierarchical configuration model) Assume that the inter-community degree sequence d = (di)i∈[N] satisfies Conditions 1.5(a)-(b), while the communities satisfy (9.4.16) and (9.4.17). Then, there exists ζ ∈ [0, 1] such that

|Cmax|/n →P ζ, |C(2)|/n →P 0. (9.4.18)

Let D denote the asymptotic degree distribution of d = (di)i∈[N] in Conditions 1.5(a)-(b), and write ν = E[D(D − 1)]/E[D]. Then, ζ > 0 precisely when ν > 1.

Since the communities (Gi)i∈[N ] are connected, the sizes of the clusters in thehierarchical configuration are closely related to those in CMN(d). Indeed, forv ∈ [n], let iv denote the vertex for which v ∈ V (Giv). Then,

|C (v)| =∑

i∈C ′(iv)

|V (Gi)|, (9.4.19)

where C ′(i) denotes the connected component of i in CMN(d). This allows oneto move back and forth between the hierarchical configuration model and itscorresponding configuration model CMN(d) that describes the inter-communityconnections.


It also allows us to identify the limit ζ. Let ξ ∈ [0, 1] be the extinction probability, in the local weak limit of CMN(d), of a vertex of degree 1, so that a vertex of degree d survives with probability 1 − ξ^d. Then,

ζ = ∑_H ∑_~d |V(H)| µ(H, ~d)[1 − ξ^d], (9.4.20)

where d = ∑_{h∈V(H)} dh. Further, ξ = 1 precisely when ν ≤ 1, see e.g., Theorem 4.4. This explains the result in Theorem 9.16. In Exercise 9.38, you are asked to fill in the details.

Before moving to graph distances, we discuss an example of the hierarchical configuration model that has attracted attention under the name of the configuration model with household structure.

The configuration model with household structure

In the configuration model with household structure, each Gi is a complete graph of size |V(Gi)|, while each inter-community degree equals 1, i.e., d(b)_u = 1 for all u ∈ V(Gi). As a result, for v ∈ [n], its degree is equal to |V(Giv)|, where we recall that iv is such that v ∈ V(Giv). This is a rather interesting example, where every vertex is in a 'household', and every household member has precisely one connection to the outside, which connects it to another household. In this case, the degree distribution is equal to

Fn(x) = (1/n) ∑_{v∈[n]} 1{|V(Giv)| ≤ x} = (1/n) ∑_{i∈[N]} |V(Gi)| 1{|V(Gi)| ≤ x}. (9.4.21)

As a result, the degree distribution in the household model is the size-biased version of the degree distribution in the configuration model that describes its inter-community structure. In particular, this implies that if the limiting degree distribution D in CMN(d) is a power law with exponent τ′, then that in the household model is a power law with exponent τ = τ′ − 1. This is sometimes called a power-law shift, and is clearly visible in Figure 9.9.
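The power-law shift is a pure size-biasing effect, which is easy to check numerically. The sketch below (illustrative parameters, not from the book) draws Pareto-like community sizes with tail exponent 2.5, i.e., a power law with τ′ = 3.5, and compares the community-size tail with its size-biased version, the degree tail of the household model.

```python
import random

# Household-model heuristic: a uniform vertex lies in community i with
# probability proportional to |V(G_i)|, so vertex degrees follow the
# size-biased community-size distribution, shifting the exponent by one.
rng = random.Random(42)
sizes = [int(rng.paretovariate(2.5)) + 2 for _ in range(200_000)]
total = sum(sizes)

def tail(x):              # P(S > x): community-size tail
    return sum(1 for s in sizes if s > x) / len(sizes)

def size_biased_tail(x):  # P(S* > x) = E[S 1{S > x}] / E[S]: degree tail
    return sum(s for s in sizes if s > x) / total

# the size-biased tail is heavier, and increasingly so for larger x
ratios = [(x, size_biased_tail(x) / tail(x)) for x in (4, 8, 16)]
```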

Graph distances in the hierarchical configuration model

We next turn to graph distances in the hierarchical configuration model. Again, a link can be made to the distances in the configuration model, particularly when the diameters of the community graphs (Gi)i∈[N] are uniformly bounded. Indeed, in this case, one can expect the graph distances in the hierarchical configuration model to be of the same order of magnitude as in the configuration model CMN(d).

This link is easiest to formulate for the household model. Indeed, there the diameter of the community graphs is 1, since all community graphs are complete graphs. Denote the hierarchical configuration model by HCMn(G). Then, unless u = v or the half-edge incident to u is paired to that incident to v,

distHCMn(G)(u, v) = 2 distCMN(d)(iu, iv) − 1,

where we recall that iv is such that v ∈ V(Giv). Thus, distances in HCMn(G) are asymptotically twice as large as those in CMN(d). The reason is that paths in the household model alternate between intra-community edges and inter-community edges. This is because the inter-community degrees are all equal to 1, so there is no way to jump to a vertex using an inter-community edge and leave through one again. For general inter-community degrees, this is different.

Figure 9.9 The degree distribution of a household model follows a power law with a smaller exponent than the community-size distribution and the outside-degree distribution.

Configuration model with clustering

We close this section on adaptations of the configuration model with local communities by introducing a model that has not yet attracted a lot of attention in the mathematical community.

The low clustering of CMn(d) can be resolved by introducing households as described above. Alternatively, and in the spirit of the clustered inhomogeneous random graphs described in Section 9.4.1, we can also introduce clustering directly. In the configuration model with clustering, we assign two types of 'degrees' to each vertex i ∈ [n]. We let d(si)_i denote the number of simple half-edges incident to vertex i, and we let d(tr)_i denote the number of triangles that vertex i is part of. In this terminology, the degree di of vertex i is equal to di = d(si)_i + 2d(tr)_i. We can then say that there are d(si)_i half-edges incident to vertex i, and d(tr)_i third-triangles.

The graph is built by (a) recursively choosing two half-edges uniformly at random without replacement, and pairing them into edges (as for CMn(d)); and (b) choosing triples of third-triangles uniformly at random and without replacement, and drawing edges between the three vertices incident to the third-triangles that are chosen.
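The two pairing steps (a) and (b) can be sketched as follows (illustrative Python; as in the usual configuration model, self-loops and multi-edges are kept).

```python
import random

def cm_with_clustering(d_si, d_tr, seed=0):
    """Configuration model with clustering (sketch): pair simple half-edges
    uniformly at random, and join third-triangles in uniform triples.
    d_si[v] = number of simple half-edges at v, d_tr[v] = number of
    triangles v is part of.  Requires sum(d_si) even and sum(d_tr)
    divisible by 3.  Returns a multigraph edge list."""
    rng = random.Random(seed)
    simple = [v for v in range(len(d_si)) for _ in range(d_si[v])]
    third = [v for v in range(len(d_tr)) for _ in range(d_tr[v])]
    rng.shuffle(simple)
    rng.shuffle(third)
    edges = []
    for u, v in zip(simple[::2], simple[1::2]):                 # step (a)
        edges.append((u, v))
    for u, v, w in zip(third[::3], third[1::3], third[2::3]):   # step (b)
        edges.extend([(u, v), (v, w), (u, w)])
    return edges

d_si = [1, 1, 2, 2]     # simple half-edges
d_tr = [1, 1, 1, 0]     # third-triangles: one triangle among vertices 0, 1, 2
edges = cm_with_clustering(d_si, d_tr)
# the degree of v is d_si[v] + 2 * d_tr[v]
```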

Let (D(si)_n, D(tr)_n) denote the numbers of simple edges and triangles incident to a uniform vertex in [n], and assume that (D(si)_n, D(tr)_n) →d (D(si), D(tr)) for some limiting distribution (D(si), D(tr)). Newman (2009) performs a generating-function analysis of when a giant component is expected to exist. The criterion Newman finds is that

(E[(D(si))²]/E[D(si)] − 2)(2E[(D(tr))²]/E[D(tr)] − 3) < 2E[D(si)D(tr)]/(E[D(si)]E[D(tr)]). (9.4.22)

When D(tr) = 0 a.s., so that there are no triangles, this reduces to

E[(D(si))²]/E[D(si)] − 2 > 0, (9.4.23)

which is equivalent to ν = E[D(si)(D(si) − 1)]/E[D(si)] > 1.

It would be of interest to analyze this model mathematically. While the extra triangles do create extra clustering in the graph, in that the graph is no longer locally tree-like, it is less clear what the community structure of the graph is. Of course, the above setting can be generalized to arbitrary cliques and possibly other community structures, but this will make the mathematical analysis substantially more involved.
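Newman's criterion (9.4.22) is straightforward to evaluate for a given joint distribution of (D(si), D(tr)); the helper below is an illustrative sketch (the dictionary encoding of the distribution is ours, not the book's or Newman's).

```python
# Evaluate both sides of (9.4.22) for a joint distribution of the number of
# simple half-edges and triangles per vertex, given as {(s, t): probability}.

def newman_lhs_rhs(dist):
    E_s = sum(p * s for (s, t), p in dist.items())
    E_t = sum(p * t for (s, t), p in dist.items())
    E_s2 = sum(p * s * s for (s, t), p in dist.items())
    E_t2 = sum(p * t * t for (s, t), p in dist.items())
    E_st = sum(p * s * t for (s, t), p in dist.items())
    lhs = (E_s2 / E_s - 2) * (2 * E_t2 / E_t - 3)
    rhs = 2 * E_st / (E_s * E_t)
    return lhs, rhs

# Example: half the vertices have 3 simple half-edges, half have 1, and every
# vertex is in exactly one triangle.
lhs, rhs = newman_lhs_rhs({(3, 1): 0.5, (1, 1): 0.5})
```

Here lhs = (5/2 − 2)(2 − 3) = −1/2 and rhs = 2, so the criterion lhs < rhs holds.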

9.4.3 Random intersection graphs

In most of the above models, the local communities that vertices are part of partition the vertex set. However, in most real-world applications, particularly in social networks, vertices are not just part of one community, but often of several. We now present a model, the random intersection graph, in which the group memberships are rather general. In a random intersection graph, vertices are members of groups, and the group memberships arise in a random way. Once the group memberships are chosen, the random intersection graph is constructed by placing an edge between two vertices precisely when they are both members of the same group. Formally, let V(Gn) denote the vertex set, and A(Gn) the collection of groups. Let M(Gn) = {(v, a): v is in group a} ⊆ V(Gn) × A(Gn) denote the group memberships. Then, the edge set of the random intersection graph with these group memberships is

E(Gn) = {{u, v}: (u, a), (v, a) ∈ M(Gn) for some a ∈ A(Gn)}. (9.4.24)

Thus, the random intersection graph is a deterministic function of the group memberships. In turn, the groups give rise to a community structure in the resulting network. Since vertices can be in several groups, the groups no longer form a partition. It is even possible that pairs of vertices are together in several groups, even though we will see that this is rare.
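The construction (9.4.24) is a deterministic projection of the membership relation, as the following sketch shows (illustrative Python; the group labels and member sets are hypothetical inputs).

```python
from itertools import combinations

def intersection_graph(memberships):
    """Random intersection graph as a deterministic function of the group
    memberships, following (9.4.24): join u and v whenever some group
    contains both.  `memberships` maps each group label to its member set."""
    edges = set()
    for members in memberships.values():
        for u, v in combinations(sorted(members), 2):
            edges.add((u, v))
    return edges

# Example: vertex 2 sits in both groups, linking the two groups' members.
edges = intersection_graph({"a": {1, 2, 3}, "b": {2, 4}})
# edges == {(1, 2), (1, 3), (2, 3), (2, 4)}
```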

The group memberships occur through some random process, and there are many possibilities for this. We will discuss several different versions of the model, as they have been introduced in the literature, and then focus on one particular type of model to state the main results. In general, we do wish our models to be sparse. Suppose that the probability that vertex v is part of group a equals pva, and that group a has on average ma group elements. Then the average degree of vertex v equals

E[Dv] = ∑_{a∈A(Gn)} pva (ma − 1), (9.4.25)

which we aim to keep bounded, or at least convergent. We now discuss several different choices.

Random intersection graphs with independent group-memberships

The most studied model is when there are n vertices and m = mn groups, often with m = βn^α for some α > 0, and each vertex is independently connected to each group with probability pn. In this case, pva = pn and ma = npn, so that (9.4.25) turns into

E[Dv] ≈ n pn² m = β n^{1+α} pn², (9.4.26)

and choosing pn = γ n^{−(1+α)/2} yields expected degree βγ².

A more flexible version is obtained by giving a weight wv to each of the vertices, for example in an i.i.d. way, and then taking

pva = (γ wv pn) ∧ 1, (9.4.27)

and making all the edges between vertices and groups conditionally independent given the weights (wv)v∈[n]. In the theorem below, we assume that (wv)v∈[n] is a sequence of i.i.d. random variables with finite mean:

Theorem 9.17 (Degrees in random intersection graph with i.i.d. vertex weights) Consider the above random intersection graph, where the number of groups equals m = βn^α, the vertices have i.i.d. weights (wv)v∈[n] with law W ∼ F and finite mean, and where the edge probability equals pva = (γ wv n^{−(1+α)/2}) ∧ 1. Then,

(a) Dv →P 0 when α < 1;
(b) Dv →d ∑_{i=1}^{X} Yi when α = 1, where (Yi)i≥1 are i.i.d. Poi(γ) random variables and X ∼ Poi(βγW);
(c) Dv →d X, where X ∼ Poi(βγ²W), when α > 1.

Theorem 9.17 can be understood as follows. The expected number of groups that individual v belongs to is roughly βγ wv n^{(α−1)/2}. When α < 1, this is close to zero, so that Dv = 0 whp. For α = 1, the number of groups that v is part of is close to Poisson with parameter βγwv, and the number of other individuals in each of these groups is approximately Poi(γ) distributed. For α > 1, individual v belongs to a number of groups that tends to infinity, as pn m ≈ βγ wv n^{(α−1)/2} when n → ∞, while each group has an expected size of order n^{(1−α)/2}, which vanishes. This means that group sizes are generally zero or one, and these choices are asymptotically independent, giving rise to the Poisson distribution specified in part (c).
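The case α = 1 can be illustrated by a rough simulation (illustrative parameters β = γ = 1 and uniform weights with E[W] = 1, so that the mean degree should, for large n, be close to βγ²E[W]² = 1; the parameter choices and variable names are ours).

```python
import random

# Independent-membership model at alpha = 1: m = beta*n groups, and vertex v
# joins group a independently with probability p_va = gamma * w_v / n.
rng = random.Random(7)
n, beta, gamma = 2000, 1.0, 1.0
m = int(beta * n)
w = [rng.uniform(0.5, 1.5) for _ in range(n)]      # i.i.d. weights, E[W] = 1

# group membership lists
members = [[v for v in range(n) if rng.random() < gamma * w[v] / n]
           for _ in range(m)]

# project onto the intersection graph, as in (9.4.24)
neighbors = [set() for _ in range(n)]
for group in members:
    for u in group:
        neighbors[u].update(x for x in group if x != u)

mean_degree = sum(len(s) for s in neighbors) / n   # close to 1 for these parameters
```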


Random intersection graphs with prescribed groups

We next discuss a setting in which the number of groups per vertex and the group sizes are deterministic, and are obtained by randomly pairing vertices to groups. As such, the random intersection graph is obtained by (9.4.24), where now the edges between vertices in V(Gn) and groups in A(Gn) are modeled as a bipartite configuration model.

Above we have studied random intersection graphs where connections are randomly and independently formed between individuals and groups. We now describe a model in which vertex v ∈ [n] belongs to d(ve)_v groups, while group a ∈ [m] has size d(gr)_a. Here n is the number of individuals, while m is the number of groups. Naturally, in order for the model to be well defined, we need that

∑_{v∈[n]} d(ve)_v = ∑_{a∈[m]} d(gr)_a. (9.4.28)

As in (9.4.24), we call two vertices v1 and v2 neighbors when they are connected to the same group, so that the degree of a vertex v is the total number of other vertices u for which there exists a group of which both u and v are members.

We will now focus on this particular setting, for concreteness. We refer to theextensive discussion in Section 9.6 for more details and references to the literature.
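The bipartite-configuration-model construction can be sketched as follows (illustrative Python: membership stubs are matched to group slots by a uniform permutation, the groups are then completed into cliques as in (9.4.24), and repeated memberships of the same vertex in one group are collapsed).

```python
import random
from itertools import combinations

def rig_prescribed(d_ve, d_gr, seed=0):
    """Random intersection graph with prescribed groups: match the
    sum(d_ve) vertex membership stubs to the sum(d_gr) group slots
    uniformly at random (a bipartite configuration model), then connect
    all members of each group.  Requires sum(d_ve) == sum(d_gr), cf. (9.4.28)."""
    rng = random.Random(seed)
    stubs = [v for v in range(len(d_ve)) for _ in range(d_ve[v])]
    slots = [a for a in range(len(d_gr)) for _ in range(d_gr[a])]
    rng.shuffle(stubs)
    members = {a: set() for a in range(len(d_gr))}
    for v, a in zip(stubs, slots):
        members[a].add(v)
    edges = set()
    for group in members.values():
        for u, v in combinations(sorted(group), 2):
            edges.add((u, v))
    return edges

d_ve = [2, 1, 1, 2]   # groups per vertex
d_gr = [3, 3]         # group sizes; totals match as required by (9.4.28)
edges = rig_prescribed(d_ve, d_gr)
```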

Local limit of random intersection graphs with prescribed groups

We now investigate the local limit of the random intersection graph with prescribed degrees and groups:

Theorem 9.18 (Local limit of random intersection graphs with prescribed groups) Consider the random intersection graph with prescribed groups, where the number of groups satisfies mn = βn. Assume that the group membership sequence d(ve) = (d(ve)_v)v∈[n] and the group size sequence d(gr) = (d(gr)_a)a∈[m] both satisfy Conditions 1.5(a)-(b), and that (9.4.28) holds. Then, the model converges locally in probability to a so-called clique tree, where

– the number of groups that the root participates in has law D(ve), which is the limiting law of d(ve);
– the number of groups that every other vertex participates in has law Y⋆ − 1, where Y⋆ is the size-biased version of D(ve);
– the numbers of vertices per group are i.i.d. random variables with law X⋆, where X⋆ is the size-biased version of the limiting law D(gr) of d(gr).

As a result of Theorem 9.18, the degree distribution of the random intersection graph with prescribed groups is equal to

D = ∑_{i=1}^{D(ve)} (X⋆_i − 1), (9.4.29)

which can be compared to Theorem 9.17(b). The intuition behind Theorem 9.18 is that the random intersection graph can easily be obtained from the bipartite configuration model by making all group members direct neighbors. By construction, the local limit of the bipartite configuration model can be described by an alternating branching process of the size-biased vertex and group distributions.

The giant in random intersection graphs with prescribed degrees

We now investigate the existence and size of the giant component in the random intersection graph with prescribed degrees and groups. Let Cmax be its largest connected component, and C(2) its second largest cluster (breaking ties arbitrarily when needed). The main result concerning the size of the giant is the following theorem:

Theorem 9.19 (Giant in random intersection graphs with prescribed degrees) Consider the random intersection graph with prescribed degrees and groups, where the number of groups satisfies mn = βn. Assume that the group membership sequence d(ve) = (d(ve)_v)v∈[n] and the group size sequence d(gr) = (d(gr)_a)a∈[m] both satisfy Conditions 1.5(a)-(b), and that (9.4.28) holds. Then, there exists ζ ∈ [0, 1] such that

|Cmax|/n →P ζ, |C(2)|/n →P 0, (9.4.30)

where ζ is the survival probability of the local limit in Theorem 9.18. Further, ζ > 0 precisely when ν > 1, where

ν = (E[(D(ve) − 1)D(ve)]/E[D(ve)]) · (E[(D(gr) − 1)D(gr)]/E[D(gr)]). (9.4.31)

In terms of the intuition right below Theorem 9.18, the survival probability of the random intersection graph is identical to that of the bipartite configuration model. Due to its alternating nature in odd and even generations, the bipartite configuration model needs to be treated with care. However, the number of individuals in the even generations is a normal branching process with offspring distribution ∑_{i=1}^{Y⋆−1}(X⋆_i − 1), and likewise that in the odd generations with offspring distribution ∑_{i=1}^{X⋆−1}(Y⋆_i − 1). These branching processes have a positive survival probability precisely when their expected offspring exceeds 1. The expected offsprings in the branching processes describing even and odd generations agree by Wald's identity, and this leads to ν in (9.4.31).
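Given the limiting laws of D(ve) and D(gr) as finite distributions, ν in (9.4.31) is a short computation (illustrative helper; distributions are encoded as value-to-probability dictionaries, a convention of ours).

```python
# nu from (9.4.31) for given laws of D_ve (groups per vertex) and D_gr
# (group size).  A giant component is predicted precisely when nu > 1.

def size_biased_offspring(dist):
    # E[(D - 1)D] / E[D]: mean number of further links in the size-biased law
    E = sum(k * p for k, p in dist.items())
    E2 = sum(k * (k - 1) * p for k, p in dist.items())
    return E2 / E

def nu(d_ve, d_gr):
    return size_biased_offspring(d_ve) * size_biased_offspring(d_gr)

# Everyone in exactly 2 groups, all groups of size 3: nu = 1 * 2 = 2 > 1.
val = nu({2: 1.0}, {3: 1.0})
```

By Wald's identity, computing the even-generation offspring mean E[Y⋆ − 1]·E[X⋆ − 1] gives the same product, so a single function serves both generations.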

9.4.4 Exponential random graphs and maximal entropy

Suppose we have a real-world network in which we observe a large number of occurrences of certain subgraphs F ∈ F. For example, while the network is sparse, we do see a linear number of triangles. How can we model such a network? This is particularly relevant in the social sciences, where it was observed early on that social networks have much more clustering, i.e., many more triangles, than one would expect based on many of the classical models. This raises the question of how to devise models that have similar features.

One solution may be to take the subgraph counts for granted, and use as a model a random graph with precisely these subgraph counts. For example, taking the subgraphs to be the stars of any order would fix the degrees of all the vertices, which would lead us to the uniform random graph with prescribed degrees. However, this model is notoriously difficult to work with, and even to simulate. It only becomes more difficult when taking more involved quantities, such as the number of triangles or the number of triangles per vertex, into account. Thus, this solution may be practically impossible. Moreover, the numbers we observe may merely be the end product of a random process, so that we should match the observed values in mean: a soft constraint rather than a hard constraint.

The exponential random graph is a way to leverage the randomness and still obtain a model that one can write down. Indeed, let F be a collection of subgraphs, and suppose that we observe that, on our favorite real-world network, the number of occurrences of subgraph F equals αF for every F ∈ F. Let us now write out what this might mean. Let F be a graph on |V(F)| = m vertices. For a graph G on n vertices and m vertices v1, . . . , vm, let G|(vi)i∈[m] be the subgraph spanned by (vi)i∈[m]. This means that the vertex set of G|(vi)i∈[m] equals [m], while its edge set equals {{i, j}: {vi, vj} ∈ E(G)}. The number of occurrences of F in G can then be written as

NF(G) = ∑_{v1,...,vm∈V(G)} 1{G|(vi)i∈[m] = F}. (9.4.32)

Here, it will be convenient to recall that we may equivalently write G = (xi,j)1≤i<j≤n, where xi,j ∈ {0, 1} and xi,j = 1 if and only if {i, j} ∈ E(G). Then, we can write NF(G) = NF(x).

In order to define a measure, we can take a so-called exponential family of the form

p~β(x) = (1/Zn(~β)) e^{∑_{F∈F} βF NF(x)}, (9.4.33)

where Z = Zn(~β) is the normalization constant

Zn(~β) = ∑_x e^{∑_{F∈F} βF NF(x)}. (9.4.34)

In order to make sure that p~β has the correct mean values for NF, we choose ~β = (βF)F∈F as the solution to

∑_x NF(x) p~β(x) = αF for all F ∈ F. (9.4.35)

In this case, the mean of NF(X), when P(X = x) = p~β(x), is exactly such that E[NF(X)] = αF for all F ∈ F. Further, when conditioning on NF(x) = qF for some parameters (qF)F∈F, the conditional exponential random graph is uniform over the set of graphs with this property. This is a conditioning property of exponential random graphs.


We next discuss two examples that we know quite well, and that arise as exponential random graphs with certain specific subgraph counts:

Example 9.20 (Example: ERn(λ/n)) Take NF (x) =∑

i,j∈[n] xij = |E(Gn)|, sothat we put a restriction on the expected number of edges in the graph. In thiscase, we see that, with Gn = (xi,j)1≤i<j≤n,

Zn(~β) =∑x

eβ|E(Gn)| = (1 + eβ)(n2), (9.4.36)

and

p~β(x) =1

Zeβ|E(Gn)| =

∏1≤i<j≤n

eβxi,j

1 + eβ. (9.4.37)

Thus, the different edges are independent, and an edge is present with probability $e^\beta/(1+e^\beta)$, and absent with probability $1/(1+e^\beta)$. In the sparse setting, we aim for
$$\mathbb E[|E(G_n)|] = \frac{\lambda}{2}(n-1), \qquad(9.4.38)$$
so that the average degree per vertex is precisely equal to $\lambda$. The constraint in (9.4.35) thus reduces to
$$\binom n2 \frac{e^\beta}{1+e^\beta} = \frac{\lambda}{2}(n-1). \qquad(9.4.39)$$
This leads to ER$_n(\lambda/n)$, where
$$\frac{e^\beta}{1+e^\beta} = \frac{\lambda}{n}, \qquad(9.4.40)$$
that is, $e^\beta=\lambda/(n-\lambda)$. This shows that ER$_n(\lambda/n)$ is an example of an exponential random graph with a constraint on the expected number of edges in the graph. Further, by the conditioning property of exponential random graphs, conditionally on $|E(\mathrm{ER}_n(\lambda/n))|=m$, the distribution is uniform over all graphs with $m$ edges.
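The calibration (9.4.40) can be checked numerically; the sketch below (the values of $n$ and $\lambda$ are arbitrary) verifies that $e^\beta=\lambda/(n-\lambda)$ gives per-edge probability $\lambda/n$ and expected edge count $\lambda(n-1)/2$:

```python
import math

n, lam = 1000, 2.5
beta = math.log(lam / (n - lam))                  # e^beta = lam/(n - lam), cf. (9.4.40)
p_edge = math.exp(beta) / (1 + math.exp(beta))    # probability that an edge is present
mean_edges = math.comb(n, 2) * p_edge             # expected number of edges
```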

Example 9.21 (Example: GRG$_n(\boldsymbol w)$) The second example arises when we fix the expected degrees of all the vertices. This arises when we take $N_v(x)=\sum_{j\in[n]} x_{vj} = d_v^{(G_n)}$ for every $v\in[n]$, so that we put a restriction on the expected degree of every vertex in the graph. In this case, we see that, with $G_n=(x_{i,j})_{1\le i<j\le n}$,
$$Z_n(\vec\beta) = \sum_x e^{\sum_{v\in[n]}\beta_v d_v^{(G_n)}} = \sum_x e^{\sum_{1\le i<j\le n}(\beta_i+\beta_j)x_{i,j}} = \prod_{1\le i<j\le n}\big(1+e^{\beta_i+\beta_j}\big), \qquad(9.4.41)$$
and
$$p_{\vec\beta}(x) = \frac{1}{Z_n}\, e^{\sum_{v\in[n]}\beta_v d_v^{(G_n)}} = \prod_{1\le i<j\le n} \frac{e^{(\beta_i+\beta_j)x_{i,j}}}{1+e^{\beta_i+\beta_j}}. \qquad(9.4.42)$$


402 Related Models

Thus, the different edges are independent, and edge $\{i,j\}$ is present with probability $e^{\beta_i+\beta_j}/(1+e^{\beta_i+\beta_j})$, and absent with probability $1/(1+e^{\beta_i+\beta_j})$. In the sparse setting, we aim for
$$\mathbb E\big[d_v^{(G_n)}\big] = \alpha_v, \qquad(9.4.43)$$
so that the average degree of vertex $v$ is precisely equal to $\alpha_v$. The constraint in (9.4.35) thus reduces to
$$\sum_{j\ne v} e^{\beta_v+\beta_j}\big/\big(1+e^{\beta_v+\beta_j}\big) = \alpha_v. \qquad(9.4.44)$$
This leads to GRG$_n(\boldsymbol w)$, where
$$\frac{w_v}{\sqrt{\sum_{u\in[n]} w_u}} = e^{\beta_v}. \qquad(9.4.45)$$
This shows that GRG$_n(\boldsymbol w)$ is an example of an exponential random graph with a constraint on the expected degrees of the vertices in the graph. Further, by the conditioning property of exponential random graphs, conditionally on $d_v^{(G_n)}=\alpha_v$ for all $v\in[n]$, the distribution is uniform over all graphs with these degrees. This gives an alternative proof of Theorem 1.4.
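The system (9.4.44) has no closed-form solution in general, but it can be solved by a fixed-point iteration in the variables $t_v=e^{\beta_v}$, namely $t_v \leftarrow \alpha_v/\sum_{j\ne v} t_j/(1+t_v t_j)$, a standard iteration for this constraint. A minimal sketch (the target degrees below are hypothetical):

```python
n = 50
target = [3.0 + (v % 5) for v in range(n)]   # hypothetical target degrees alpha_v

# Iterate t_v <- alpha_v / sum_{j != v} t_j / (1 + t_v t_j), with t_v = e^{beta_v},
# which is the fixed-point form of the constraint (9.4.44).
t = target[:]
for _ in range(1000):
    t = [target[v] / sum(t[j] / (1 + t[v] * t[j]) for j in range(n) if j != v)
         for v in range(n)]

def expected_degree(v):
    # left-hand side of (9.4.44) for the current t
    return sum(t[v] * t[j] / (1 + t[v] * t[j]) for j in range(n) if j != v)
```

For moderate target degrees the iteration contracts quickly, and the resulting $\beta_v=\log t_v$ satisfy the degree constraints to high precision.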

Now that we have discussed two quite nice examples of exponential random graphs, let us also discuss their intricacies. The above choices, in Examples 9.20 and 9.21, are quite special, in the sense that the exponent in (9.4.33) is linear in the edge occupation statuses $(x_{i,j})_{1\le i<j\le n}$. This gives rise to exponential random graphs that have independent edges. However, when investigating more intricate subgraph counts, such as triangles, this linearity is no longer true. Indeed, the number of triangles is a cubic function of $(x_{i,j})_{1\le i<j\le n}$. In such cases, the edges will no longer be independent, making the exponential random graph very hard to study.

Indeed, the exponential form in (9.4.33) naturally leads to large deviations of random graphs, a topic that is much better understood in the dense setting, where the number of edges grows proportionally to $n^2$. In the sparse setting, such problems are hard, and sometimes ill-defined. We refer to the notes and discussion in Section 9.6 for more background and references.

9.5 Spatial random graphs

The models described so far do not incorporate geometry at all. Yet, geometry may be relevant. Indeed, in many networks, the vertices are located somewhere in space, and their locations may be relevant. People who live closer to one another are more likely to know each other, even though we all know people who live far away from us. This is a very direct link to the geometric properties of networks. However, the geometry may also be much more indirect or latent. This could arise, for example, since people who have similar interests are also more likely to


know one another. Thus, when associating a whole bunch of attributes to vertices in the network, vertices with more similar attributes (age, interests, profession, music preference, etc.) may be more likely to know each other. In any case, we are rather directly led to studying networks where the vertices are embedded in some general geometric space. This is what we refer to as spatial networks.

One further aspect of spatial random graphs deserves to be mentioned. Due to the fact that nearby vertices are more likely to be neighbors, it is also true that two neighbors of a vertex are more likely to be connected. Therefore, geometry leads to clustering, and it may be one of the most natural ways to do so.

9.5.1 Small-world model

The small-world model, proposed by Watts and Strogatz (1998), was the first spatial model of this kind. We again refer to the notes and discussion in Section 9.6 for more background and references, including the history of the model. The aim was to describe how the small-world effect can arise in a simple and natural way through the addition of long-range edges. Here, long-range refers to the underlying geometry, where the vertices are located in some geometric space with a natural intrinsic distance on it. Long-range edges refer to pairs of vertices that are far away in the geometry, yet are neighbors in the network. Such long-range edges can lead to substantial shortcuts, and thus significantly decrease graph distances. Let us describe a first version of the small-world network.

We start with a finite torus, and add random long-range connections to it, independently for each pair of vertices. This gives rise to a graph that is a small perturbation of the original lattice, but has occasional long-range connections that are crucial in order to shrink graph distances. From a practical point of view, we can think of the original graph as being the local description of acquaintances in a social network, while the shortcuts describe the occasional far-apart acquaintances. The main idea is that, even though the shortcuts only form a small part of the connections in the graph, they are crucial in order to make it a small world.

Small-world behavior in the continuous circle model

The simplest version of the model, studied by Barbour and Reinert (2001), is obtained by taking the circle of circumference $n$, and adding a Poisson number of shortcuts with parameter $n\rho/2$, where the starting and ending points of the shortcuts are chosen uniformly at random, independently of each other. This model is called the continuous circle model. Distance is measured as usual along the circle, and the shortcuts have, by convention, length zero. Thus, one can think of this model as the circle where the points along the random shortcut are identified, thus creating a puncture in the circle. Multiple shortcuts then lead to multiple punctures of the circle, and the distance is then the usual distance along the punctured graph. Bear in mind that this is something different from the usual graph distances, which count the number of edges along the shortest path between pairs of vertices. The following result describes this punctured graph distance:


Theorem 9.22 (Distance in continuous circle model) Let $D_n$ be the distance between two uniformly chosen points along the punctured circle in the continuous circle model. Then, for every $\rho>0$, as $n\to\infty$,
$$2\rho D_n/\log(\rho n) \xrightarrow{\;\mathbb P\;} 1. \qquad(9.5.1)$$
More precisely,
$$\rho D_n - \tfrac12\log(\rho n) \xrightarrow{\;d\;} T, \qquad(9.5.2)$$
where $T$ is a random variable satisfying
$$\mathbb P(T>t) = \int_0^\infty \frac{e^{-y}}{1+y\,e^{2t}}\,dy. \qquad(9.5.3)$$
The random variable $T$ can also be described by
$$\mathbb P(T>t) = \mathbb E\big[e^{-e^{2t}W^{(1)}W^{(2)}}\big], \qquad(9.5.4)$$
where $W^{(1)},W^{(2)}$ are two independent exponential random variables with parameter 1. Alternatively, it can be seen that $T=(G_1+G_2-G_3)/2$, where $G_1,G_2,G_3$ are three independent standard Gumbel random variables (see (Barbour and Reinert, 2006, Page 1242)).
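The two descriptions of $T$ can be compared numerically: a quadrature of the integral representation against a Monte Carlo simulation of the Gumbel representation. A sketch (sample sizes and tolerances are arbitrary choices):

```python
import math
import random

def tail_quadrature(t, upper=60.0, steps=120000):
    # P(T > t) = \int_0^infty e^{-y} / (1 + y e^{2t}) dy, via the trapezoid rule
    a = math.exp(2 * t)
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        y = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-y) / (1 + y * a)
    return h * total

def tail_gumbel(t, samples=200000, seed=1):
    # P(T > t) for T = (G1 + G2 - G3)/2 with independent standard Gumbels,
    # sampled via G = -log(-log(U)) for U uniform on (0, 1)
    rng = random.Random(seed)
    g = lambda: -math.log(-math.log(rng.random()))
    hits = sum((g() + g() - g()) / 2 > t for _ in range(samples))
    return hits / samples
```

With these sample sizes, the two tails agree to about two decimal places for moderate $t$.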

Interestingly, the method of proof of Theorem 9.22 (see Barbour and Reinert (2001)) is quite close to the method of proof for Theorem 7.20. Indeed, again the parts of the graph that can be reached within distance at most $t$ are analyzed. Let $P_1$ and $P_2$ be two uniform points along the circle, so that $D_n$ has the same distribution as the distance between $P_1$ and $P_2$. Denote by $R^{(1)}(t)$ and $R^{(2)}(t)$ the parts of the graph that can be reached from $P_1$ and $P_2$, respectively, within distance $t$. Then, $D_n = 2T_n$, where $T_n$ is the first time that $R^{(1)}(t)$ and $R^{(2)}(t)$ have a non-empty intersection. The proof then consists of showing that, up to time $T_n$, the processes $R^{(1)}(t)$ and $R^{(2)}(t)$ are close to certain continuous-time branching processes, primarily due to the fact that the probability that two of the covered intervals overlap is quite small. Then, $W^{(1)}$ and $W^{(2)}$ can be viewed as appropriate martingale limits of these branching processes.

Compared to Theorem 7.20, we see that now the rescaled distance $D_n$, after subtraction of the right multiple of $\log n$, converges in distribution, while in Theorem 7.20, convergence holds at best along subsequences. This is due to the fact that $D_n$ is a continuous random variable, while the graph distance is integer-valued. Therefore, the latter suffers from discretization effects. In the next paragraph, we will see that the graph distances in the small-world model suffer from similar issues.

Small-world behavior in the small-world model

We now extend the analysis to the graph distances in the discrete model, which is highly similar to the original small-world model. In the discrete model, the vertices are located on a discrete torus containing $n$ vertices, and an extra Poi$(n\rho/2)$ number of edges is added uniformly at random. Again, these edges identify the points at the two sides of the edge, rather than being a proper edge between them. Now, $\mathrm{dist}_{G_n}(o_1,o_2)$ denotes the graph distance in this (connected) graph between two vertices $o_1,o_2$ chosen uniformly at random from $[n]$. The following theorem describes the asymptotics of this random variable:

Theorem 9.23 (Distance in small-world model) Let $G_n$ be the discrete small-world model, where vertices are connected to their immediate neighbors along the circle of size $n$, and a Poisson number with parameter $\rho n/2$ of edges is added uniformly at random. Let $\mathrm{dist}_{G_n}(o_1,o_2)$ denote the graph distance between two uniformly chosen vertices $o_1,o_2\in[n]$. Then,
$$\mathrm{dist}_{G_n}(o_1,o_2)/\log n \xrightarrow{\;\mathbb P\;} \frac{1}{\log(1+2\rho)}. \qquad(9.5.5)$$
More precisely,
$$\mathbb P\Big(\mathrm{dist}_{G_n}(o_1,o_2) > \Big\lceil \frac{\log n}{\log(1+2\rho)}\Big\rceil + x\Big) = \mathbb E\Big[e^{-\phi_0^2\,(1+2\rho)^x\, W_1W_2}\Big], \qquad(9.5.6)$$
where $W_1,W_2$ are two i.i.d. copies of an appropriate martingale limit, for a certain $\phi_0=\phi_0(n,\rho)$.

It would be of interest to extend this result to the small-world model where the long-range edges are counted as having length one, rather than length zero. The factor $1+2\rho$ can be understood from the growth of the neighborhoods: in each time step, every boundary point of the explored region advances by one vertex along the circle, and each newly covered vertex is the endpoint of on average $\rho$ shortcuts, each of which (having length zero) immediately opens a new interval that grows in two directions. Thus, the number of boundary points grows roughly by a factor $1+2\rho$ per step.
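The zero-length-shortcut convention can be simulated directly by contracting shortcut endpoints with a union–find structure and running a breadth-first search on the contracted cycle. A simulation sketch (not from the text; for simplicity the Poisson number of shortcuts is replaced by its mean):

```python
import collections
import random

def contracted_smallworld(n, rho, seed=0):
    # n-cycle plus roughly n*rho/2 uniform shortcut pairs whose endpoints
    # are identified (shortcuts have length zero, as in the text).
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):   # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(round(n * rho / 2)):
        parent[find(rng.randrange(n))] = find(rng.randrange(n))

    members = collections.defaultdict(list)
    for v in range(n):
        members[find(v)].append(v)
    return find, members

def distance(find, members, n, u, v):
    # BFS over contracted classes: each cycle step costs 1, shortcuts cost 0.
    target = find(v)
    seen = {find(u): 0}
    queue = collections.deque([find(u)])
    while queue:
        c = queue.popleft()
        if c == target:
            return seen[c]
        for w in members[c]:
            for nb in ((w - 1) % n, (w + 1) % n):
                cn = find(nb)
                if cn not in seen:
                    seen[cn] = seen[c] + 1
                    queue.append(cn)
    return None
```

Since the contracted graph still contains all cycle edges, the simulated distance never exceeds the pure cycle distance, and for large $n$ it is of order $\log n/\log(1+2\rho)$.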

9.5.2 Hyperbolic random graphs

Here we consider the hyperbolic random graph, where vertices lie in a disk of radius $R$ and are connected when their hyperbolic distance is at most $R$ (Krioukov et al., 2010). These graphs are very different from general inhomogeneous random graphs, because their geometry creates more clustered random graphs. The hyperbolic random graph has two key parameters, $\nu$ and $\alpha$. The model samples $n$ vertices on a disk of radius $R=2\log(n/\nu)$, where the density of the radial coordinate $r$ of a vertex $p=(r,\phi)$ is
$$\rho(r) = \alpha\,\frac{\sinh(\alpha r)}{\cosh(\alpha R)-1}. \qquad(9.5.7)$$
Here $\nu$ parametrizes the average degree of the generated networks and $-\alpha$ the so-called negative curvature of the space. The angle $\phi$ of $p$ is sampled uniformly from $[0,2\pi]$, so that the points have a rotationally symmetric distribution. Then, two vertices are connected when their hyperbolic distance is at most $R$. Here, the hyperbolic distance $x=d_{\mathrm H}(u,v)$ between two points at polar coordinates $u=(r,\phi)$ and $v=(r',\phi')$ is given by the hyperbolic law of cosines
$$\cosh(x) = \cosh(r)\cosh(r') - \sinh(r)\sinh(r')\cos(\|\phi-\phi'\|), \qquad(9.5.8)$$
where $\|\phi-\phi'\| = \pi - |\pi - |\phi-\phi'||$ is the difference between the two angles (which is the distance on the circle).
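The hyperbolic law of cosines translates directly into code; a minimal sketch (the clamp against floating-point rounding is ours):

```python
import math

def hyperbolic_distance(r1, phi1, r2, phi2):
    # Hyperbolic law of cosines (9.5.8); the angle difference is measured
    # along the circle.
    dphi = math.pi - abs(math.pi - abs(phi1 - phi2))
    c = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(dphi))
    return math.acosh(max(c, 1.0))   # clamp guards against rounding below 1
```

Two sampled vertices are then declared adjacent when `hyperbolic_distance` between them is at most $R$.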

Page 428: Random graphs and complex networksrhofstad/NotesRGCNII.pdf1.2 Random graphs and real-world networks9 1.3 Random graph models11 1.3.1 Erd}os-R enyi random graph 11 1.3.2 Inhomogeneous

406 Related Models

For a point $i$ with radial coordinate $r_i$, we define its type $t_i$ as
$$t_i = e^{(R-r_i)/2}. \qquad(9.5.9)$$
Then, the degree of vertex $i$ can be approximated by a Poisson random variable with mean $t_i$. Thus, with high probability,
$$D_i = \Theta(t_i), \qquad(9.5.10)$$
where $D_i$ denotes the degree of vertex $i$. Furthermore, the random variables $(t_i)_{i\ge1}$ have a power-law distribution with exponent $\tau$, so that the degrees have a power-law distribution as well. Let us now explain the degree structure in more detail.
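The radial density (9.5.7) can be sampled by inversion, since its distribution function is $F(r)=(\cosh(\alpha r)-1)/(\cosh(\alpha R)-1)$; the sketch below (parameter values arbitrary) also checks the heuristic $\mathbb P(t_i>w)\approx w^{-2\alpha}$ for the type $t_i=e^{(R-r_i)/2}$:

```python
import math

def radial_cdf(r, alpha, R):
    # F(r) = (cosh(alpha r) - 1) / (cosh(alpha R) - 1), obtained by
    # integrating the density (9.5.7)
    return (math.cosh(alpha * r) - 1) / (math.cosh(alpha * R) - 1)

def radial_inverse_cdf(u, alpha, R):
    # inverse of F; feeding it a uniform u on (0, 1) samples the radius
    return math.acosh(1 + u * (math.cosh(alpha * R) - 1)) / alpha

alpha, R = 0.8, 40.0
```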

Degree structure of hyperbolic random graphs

Let
$$P^{(n)}_k = \frac{1}{n}\sum_{v\in[n]} \mathbb 1_{\{D_v=k\}} \qquad(9.5.11)$$
denote the degree distribution in the hyperbolic random graph. As explained informally in the previous paragraph, we may expect that the degree distribution obeys a power law. This is the content of the following theorem:

Theorem 9.24 (Power-law degrees in hyperbolic random graphs) As $n\to\infty$, there exists a probability distribution $(p_k)_{k\ge0}$ such that
$$P^{(n)}_k \xrightarrow{\;\mathbb P\;} p_k, \qquad(9.5.12)$$
where $(p_k)_{k\ge0}$ obeys an asymptotic power law, i.e., there exists a $c>0$ such that
$$p_k = c\,k^{-\tau}(1+o(1)), \qquad(9.5.13)$$
where $\tau=2\alpha+1$.

We deduce that the model is scale-free, meaning that the asymptotic degree distribution has infinite variance, precisely when $\alpha\in(\tfrac12,1)$; otherwise, the degree distribution obeys a power law with a larger degree exponent. The exact form of $p_k$ is identified by Fountoulakis et al. (2020), and involves several special functions. This result is quite impressive, and the proof is quite involved. For the purpose of this book, the exact shape of $p_k$ is not so relevant.

Much more is known about the local structure of the hyperbolic random graph; for example, its local weak limit has been identified. We postpone this discussion to the next section, in which we discuss the local weak limit of geometric inhomogeneous random graphs. It turns out that we can interpret the hyperbolic random graph as a special case, which is very interesting in its own right.

The giant in hyperbolic random graphs

We next study the giant in hyperbolic random graphs. The main result, which is somewhat surprising, is as follows:


Theorem 9.25 (Giant in hyperbolic random graphs) Let $\mathcal C_{\max}$ and $\mathcal C_{(2)}$ be the maximal and second-largest cluster in the hyperbolic random graph with parameters $\alpha>\tfrac12$ and $\nu>0$. Then, the largest connected components satisfy the following properties:

(a) For $\alpha>1$, $|\mathcal C_{\max}|/n \xrightarrow{\;\mathbb P\;} 0$ for all $\nu>0$.

(b) For $\alpha\in(\tfrac12,1)$, there exists a $\zeta$ such that $|\mathcal C_{\max}|/n \ge \zeta>0$ whp, and $|\mathcal C_{(2)}|/n \xrightarrow{\;\mathbb P\;} 0$, for all $\nu>0$.

(c) For $\alpha=1$, there exist $\pi/8 \le \nu_0 \le \nu_1 \le 20\pi$ such that $|\mathcal C_{\max}| \ge n/610$ and $|\mathcal C_{(2)}|/n \xrightarrow{\;\mathbb P\;} 0$ for all $\nu>\nu_1$, while $|\mathcal C_{\max}|/n \xrightarrow{\;\mathbb P\;} 0$ for $\nu\le\nu_0$.

We see that a giant component only exists in the scale-free regime, which is quite surprising. In various other random graphs, a giant component also exists in settings where the degrees have finite variance, particularly when there are sufficiently many edges. This turns out not to be the case for hyperbolic random graphs, and we next explain the cause of this surprising result. The fact that no giant exists when $\tau>3$ is explained in more detail in Section 9.5.3, see below Theorem 9.32. There, it is explained that the hyperbolic random graph is a special case of a one-dimensional geometric inhomogeneous random graph. Giants in one dimension only exist when the degrees have infinite variance. We postpone a further discussion to that section.

More precise results are known in some of these cases. For $\alpha>1$, Bode et al. (2015) showed that $|\mathcal C_{\max}| = \Theta_{\mathbb P}\big(R^2(\log\log R)^3\, n^{1/\alpha}\big)$, while for $\alpha\in(\tfrac12,1)$, Kiwi and Mitsche (2019) proved that the second-largest component is at most polylogarithmic (i.e., at most a power of $\log n$).

Ultra-small distances in hyperbolic random graphs

Obviously, there is little point in studying typical distances in the hyperbolic random graph when there is no giant component, even though such results do shed light on the component structure of smaller components. Thus, we now restrict to the setting where $\alpha\in(\tfrac12,1)$, where typical distances are ultra-small:

Theorem 9.26 (Ultra-small distances in hyperbolic random graphs) For $\alpha\in(\tfrac12,1)$, conditionally on $o_1,o_2$ being connected, with $G_n$ the hyperbolic random graph with parameters $\alpha>\tfrac12$ and $\nu>0$,
$$\frac{d_{G_n}(o_1,o_2)}{\log\log n} \xrightarrow{\;\mathbb P\;} \frac{2}{|\log(2\alpha-1)|}. \qquad(9.5.14)$$

Note that $2\alpha-1=\tau-2$, so that Theorem 9.26 is very similar to the ultra-small nature of the random graphs studied in this book.

Clustering in hyperbolic random graphs

We now study c(k) for the hyperbolic random graph. The main result is as follows:


Figure 9.10 A hyperbolic embedding of the Internet at the Autonomous Systems level by Boguna et al. (2010) (see (Boguna et al., 2010, Figure 3)).

Theorem 9.27 (Clustering spectrum of hyperbolic random graphs) The local clustering coefficient in the hyperbolic random graph satisfies, with high probability,
$$c(k) \propto \begin{cases} k^{-1} & \tau>\tfrac52,\ k\gg1,\\[2pt] k^{4-2\tau} & \tau<\tfrac52,\ 1\ll k\ll\sqrt n,\\[2pt] k^{2\tau-6}\, n^{5-2\tau} & \tau<\tfrac52,\ k\gg\sqrt n. \end{cases} \qquad(9.5.15)$$

In the hyperbolic random graph, the triangles that contribute most to $c(k)$ have vertices of very specific degrees, as we next explain. The typical triangle for $\tau>5/2$ is a triangle in which one vertex has degree $k$, and the other two have constant degree. When $\tau<5/2$ and $k<\sqrt n$, the typical triangle has three vertices of degree $k$. When $k>\sqrt n$, a typical triangle has one vertex of degree $k$ and two of degree $n/k$.
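As a consistency check on (9.5.15), the two $\tau<5/2$ branches agree at the boundary $k=\sqrt n$, since there $k^{4-2\tau}=n^{2-\tau}=k^{2\tau-6}n^{5-2\tau}$. A quick numerical verification (parameter values arbitrary):

```python
import math

def c_low(k, tau):
    return k ** (4 - 2 * tau)                        # branch 1 << k << sqrt(n)

def c_high(k, tau, n):
    return k ** (2 * tau - 6) * n ** (5 - 2 * tau)   # branch k >> sqrt(n)

tau, n = 2.3, 10 ** 8
k = math.sqrt(n)
```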

Page 431: Random graphs and complex networksrhofstad/NotesRGCNII.pdf1.2 Random graphs and real-world networks9 1.3 Random graph models11 1.3.1 Erd}os-R enyi random graph 11 1.3.2 Inhomogeneous

9.5 Spatial random graphs 409

Hyperbolic embeddings of complex networks

In recent years, the question whether hyperbolic geometries can be used to efficiently map real-world complex networks has attracted considerable attention. Indeed, hyperbolic random graphs have very small distances on the one hand, while on the other hand they also have the necessary clustering to make them appropriate for many complex networks. Of course, the question of how to embed networks precisely is highly relevant and also quite difficult.

The example that has attracted the most attention is the Internet. In Figure 9.10, you can see a hyperbolic embedding of the Internet, as performed by Boguna et al. (2010). We see that the regions on the boundary of the outer circle can be grouped in a fairly natural way, where the countries in which the autonomous systems reside seem to be grouped according to their geography, with some exceptions (for example, it is not so clear why Kenya is almost next to the Netherlands). This hyperbolic geometry is first of all quite interesting, but can also be helpful in sustaining the ever-growing Internet traffic (see Boguna et al. (2010) and the references therein).

9.5.3 Geometric inhomogeneous random graphs

We start by defining the geometric inhomogeneous random graph. Here, vertices lie in a general metric space $\mathcal X\subseteq\mathbb R^d$ for some dimension $d\ge1$. We let $(X_i)_{i\in[n]}$ denote their locations, and assume that the $(X_i)_{i\in[n]}$ are chosen in an i.i.d. way from some measure $\mu$ on $\mathcal X$. Often, we will assume that $\mathcal X=[0,1]^d$ is the cube of width 1, with periodic boundary conditions. Further, we assume that each vertex $i$ has a weight $W_i$ associated to it, where $(W_i)_{i\in[n]}$ are assumed to be i.i.d. Often, we assume that the $W_i$ are power-law random variables, even though this is not necessary for the definition.

The edges are conditionally independent given $(x_i)_{i\in[n]}$ and $(w_i)_{i\in[n]}$, where the conditional probability that the edge between $u$ and $v$ is present equals
$$p_{u,v} = p\big(x_u,x_v,w_u,w_v,(w_s)_{s\in[n]\setminus\{u,v\}}\big) = \Theta\bigg(1\wedge \Big(\frac{w_u w_v}{\sum_{i\in[n]} w_i}\Big)^{\max\{\alpha,1\}} \|x_u-x_v\|^{-d\alpha}\bigg), \qquad(9.5.16)$$
where $\alpha>0$ is an appropriate parameter. We will assume that the vertex weights obey a power law, i.e.,
$$\mathbb P(W>w) = L(w)\, w^{-(\tau-1)} \qquad(9.5.17)$$
for some slowly varying function $L\colon[0,\infty)\to(0,\infty)$. As is usual, the literature treats a variety of models and settings, and we refer to the notes and discussion in Section 9.6 for more details. To avoid trivialities, we assume that there exists an $\varepsilon>0$ such that
$$\mathbb P(W\ge\varepsilon) = 1. \qquad(9.5.18)$$
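A concrete finite sample of this model can be generated directly from the definition; the sketch below (not from the text) makes specific choices where the definition leaves freedom: pure Pareto weights, the sup-norm on the torus, and the $\Theta$-constant set to one:

```python
import random

def sample_girg(n, d, tau, alpha, seed=0):
    # One concrete instance of (9.5.16): Pareto weights with
    # P(W > w) = w^{-(tau-1)} for w >= 1, uniform positions on the d-torus,
    # and conditionally independent edges.
    rng = random.Random(seed)
    w = [(1 - rng.random()) ** (-1 / (tau - 1)) for _ in range(n)]
    x = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    total = sum(w)

    def torus_dist(a, b):   # sup-norm with periodic boundary conditions
        return max(min(abs(s - t), 1 - abs(s - t)) for s, t in zip(a, b))

    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            r = torus_dist(x[u], x[v])
            if r == 0:
                p = 1.0
            else:
                p = min(1.0, (w[u] * w[v] / total) ** max(alpha, 1.0)
                        * r ** (-d * alpha))
            if rng.random() < p:
                edges.append((u, v))
    return w, x, edges
```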

We will sometimes need to assume a more restrictive setting:


Assumption 9.28 (Limiting connection probabilities exist) Set $\mathcal X=[-\tfrac12,\tfrac12]^d$ with $d\ge1$. Assume the following:

(a) There exists a measurable event $\mathcal E_n$ such that $\mathbb P(\mathcal E_n)\to1$;

(b) There exist intervals $I_\Delta(n)\subseteq\mathbb R_+$, $I_W(n)\subseteq[1,\infty)$ and a sequence $\varepsilon(n)$ such that $I_\Delta(n)\to\mathbb R_+$, $I_W(n)\to[1,\infty)$ and $\varepsilon(n)=o(1)$ as $n\to\infty$;

(c) There exists $h\colon\mathbb R^d\times\mathbb R\times\mathbb R\to[0,1]$ such that, on the event $\mathcal E_n$, for almost every $x\in[-\tfrac12,\tfrac12]^d$ and all $u,v\in[n]$,
$$\frac{\Big|p\big(x,\,x+\Delta/n^{1/d},\,w_u,w_v,(W_s)_{s\in[n]\setminus\{u,v\}}\big) - h(\Delta,w_u,w_v)\Big|}{h(\Delta,w_u,w_v)} \le \varepsilon(n). \qquad(9.5.19)$$

Having introduced the model, we will first explain how the hyperbolic random graph can be cast into this format, followed by a discussion of the properties of the geometric inhomogeneous random graph (GIRG).

Hyperbolic random graph as GIRG

We now explain that the hyperbolic random graph can be interpreted as a special case of the geometric inhomogeneous random graph. To this end, we embed the disk of the native hyperbolic model into our model with dimension 1; hence we reduce the geometry of the hyperbolic disk to the geometry of a circle, but gain additional freedom, as we can choose the weights of the vertices. Notice that a single point on the hyperbolic disk has measure zero, so we can assume that no vertex has radius $r_v=0$. Recall that $d_{\mathrm H}(u,v)$ denotes the hyperbolic distance defined in (9.5.8). Here, we also explain this relation for the soft hyperbolic random graph, for which the probability that there exists an edge between the vertices $u=(r_u,\phi_u)$ and $v=(r_v,\phi_v)$ equals
$$p_{\mathrm H}(d_{\mathrm H}(u,v)) = \Big(1 + e^{(d_{\mathrm H}(u,v)-R_n)/(2T)}\Big)^{-1}. \qquad(9.5.20)$$

In the limit $T\downarrow0$, this gives rise to the (hard) hyperbolic random graph as studied in the previous section. The identification is given in the following theorem:

Theorem 9.29 (Hyperbolic random graphs as GIRGs) The soft hyperbolic random graph is a geometric inhomogeneous random graph that satisfies Assumption 9.28 with parameters
$$d := 1, \qquad \tau := 2\alpha+1, \qquad \alpha := 1/T, \qquad(9.5.21)$$
(on the right-hand side of $\tau$, $\alpha$ is the hyperbolic parameter, while $\alpha:=1/T$ defines the GIRG parameter) and limiting connection probabilities given by
$$h(\Delta,w_u,w_v) = \frac{1}{1+\Big(\dfrac{c\,\Delta}{w_u w_v}\Big)^{\alpha}}, \qquad(9.5.22)$$
where $c=\sqrt2\,\pi/\nu$. The hard hyperbolic random graph can be mapped to a threshold GIRG, where vertices $u,v\in[n]$ are connected with probability one when $\|x_u-x_v\|\le \nu w_u w_v/\pi$.


Sketch of proof of Theorem 9.29. Here we give a sketch of the proof. We define the mapping
$$w_v := e^{(R-r_v)/2} \qquad\text{and}\qquad x_v := \phi_v/(2\pi), \qquad(9.5.23)$$
where, for $v\in[n]$, $r_v$ is its radial and $\phi_v$ its angular coordinate. The map $(r,\phi)\mapsto(e^{(R-r)/2},\phi/(2\pi))$ is a bijection from the hyperbolic disk to $[1,e^{R/2}]\times\mathbb T_{1,1}$, where $\mathbb T_{1,1}=[-\tfrac12,\tfrac12]$ with periodic boundary conditions. Therefore, it also has an inverse, which we denote by $g(w_v,x_v)=(r_v,\phi_v)$. Then, for $i,j\in[n]$, we let
$$p_{i,j} = p_{i,j}\big((w_i,x_i),(w_j,x_j)\big) = p_{\mathrm H}\big(d_{\mathrm H}(g(w_i,x_i),g(w_j,x_j))\big). \qquad(9.5.24)$$
This maps the soft hyperbolic random graph to an explicit geometric inhomogeneous random graph.

We next verify the conditions on the edge probabilities that are required for geometric inhomogeneous random graphs. We start by computing from (9.5.7) that
$$\mathbb P(r_v\le r) = \int_0^r \rho(s)\,ds = \int_0^r \alpha\,\frac{\sinh(\alpha s)}{\cosh(\alpha R)-1}\,ds = \frac{\cosh(\alpha r)-1}{\cosh(\alpha R)-1}. \qquad(9.5.25)$$
Since $R_n=2\log(n/\nu)$ is large,
$$\mathbb P(r_v\le r) = e^{\alpha(r-R)}(1+o(1)). \qquad(9.5.26)$$
This has the following consequence for the distribution of $w_v$, since
$$\mathbb P(w_v>w) = \mathbb P(r_v<R-2\log w) = w^{-2\alpha}(1+o(1)) = w^{-(\tau-1)}(1+o(1)), \qquad(9.5.27)$$
as required by (9.5.17). In particular, this implies that most vertices in $[n]$ have $r_v\in[R-A,R]$ for $A$ large. Thus, most vertices are close to the boundary of the hyperbolic disk.

We next investigate the role of the radial coordinate. For this, we assume that $r_u\in[0,R]$, which is true almost surely, and $r_u+r_v\ge R$, which occurs with high probability for $R$ large. We use $\cosh(x\pm y)=\cosh(x)\cosh(y)\pm\sinh(x)\sinh(y)$ to rewrite (9.5.8) for $u=(r_u,\phi_u)$ and $v=(r_v,\phi_v)$ as
$$\cosh(d_{\mathrm H}(u,v)) = \cosh(r_u-r_v) + \sinh(r_u)\sinh(r_v)\big[1-\cos(\|\phi_u-\phi_v\|)\big]. \qquad(9.5.28)$$
When $\|\phi_u-\phi_v\|$ is small, as in Assumption 9.28, we thus obtain that
$$\cosh(d_{\mathrm H}(u,v)) = \cosh(r_u-r_v) + \sinh(r_u)\sinh(r_v)\,\|\phi_u-\phi_v\|^2/2\,(1+o(1)). \qquad(9.5.29)$$
Denote $s_n = d_{\mathrm H}(u,v)-R_n$, and note that
$$e^{s_n} = 2\cosh(d_{\mathrm H}(u,v))\,e^{-R_n} - e^{-s_n-2R_n}. \qquad(9.5.30)$$


Further, multiplying (9.5.29) by $e^{-R_n}$ leads to, with $R_n=2\log(n/\nu)$,
$$\begin{aligned} \cosh(d_{\mathrm H}(u,v))\,e^{-R_n} &= e^{r_u-R_n}\,e^{r_v-R_n}\,e^{R_n}\,\|\phi_u-\phi_v\|^2/2\,(1+o(1)) + \cosh(r_u-r_v)\,e^{-R_n} \\ &= \frac{2\pi^2}{\nu^2}\,\frac{n^2(x_u-x_v)^2}{(w_u w_v)^2}\,(1+o(1)) + \nu^2\,\frac{(w_u/w_v)^2+(w_v/w_u)^2}{2n^2}. \end{aligned} \qquad(9.5.31)$$
Combining (9.5.30) and (9.5.31), and substituting this into (9.5.20) for $x_v=x+\Delta/n$, leads to
$$p_{\mathrm H}(d_{\mathrm H}(u,v)) = \Big(1+e^{s_n/(2T)}\Big)^{-1} = \frac{1}{1+\Big(\dfrac{2\pi^2}{\nu^2}\dfrac{\Delta^2}{(w_u w_v)^2}\Big)^{1/(2T)}}\,(1+o(1)) = \frac{1}{1+\Big(\dfrac{c\,\Delta}{w_u w_v}\Big)^{\alpha}}\,(1+o(1)), \qquad(9.5.32)$$
where $c=\sqrt2\,\pi/\nu$ and $\alpha=1/T$.
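The change of coordinates (9.5.23) and its inverse $g$ are straightforward to implement and round-trip exactly (a sketch; the parameter values are arbitrary):

```python
import math

def to_girg(r, phi, R):
    # the map (9.5.23): weight and circle position of a hyperbolic point
    return math.exp((R - r) / 2), phi / (2 * math.pi)

def to_hyperbolic(w, x, R):
    # the inverse map g from the text
    return R - 2 * math.log(w), 2 * math.pi * x

R = 2 * math.log(10 ** 6)   # R_n = 2 log(n / nu) with n = 10^6 and nu = 1
```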

Since hyperbolic random graphs are special cases of the GIRG, we will see that some results can simply be deduced from their analogues for the GIRG. Also, the fact that hyperbolic graphs arise as a one-dimensional GIRG helps us to explain why there is no giant for $\alpha>1$, for which the degree power-law exponent satisfies $\tau>3$.

Degree structure of geometric inhomogeneous random graphs

We start by analysing the degree structure of the graph. Let
$$P^{(n)}_k = \frac{1}{n}\sum_{v\in[n]} \mathbb 1_{\{D_v=k\}} \qquad(9.5.33)$$
denote the degree distribution in the geometric inhomogeneous random graph. In analogy with the hyperbolic random graph, we may expect that the degree distribution obeys a power law. This is the content of the following theorem:

Theorem 9.30 (Power-law degrees in geometric inhomogeneous random graphs) Let the edge probabilities in the geometric inhomogeneous random graph be given by
$$p_{u,v} = \lambda\bigg(1\wedge\Big(\frac{W_uW_v}{n}\Big)^{\max\{\alpha,1\}}\|X_u-X_v\|^{-d\alpha}\bigg), \qquad(9.5.34)$$
where $(W_u)_{u\in[n]}$ are i.i.d. random variables satisfying $\mathbb P(W>w)=c_W w^{-(\tau-1)}(1+o(1))$ for some $c_W>0$ as $w\to\infty$, while $(X_u)_{u\in[n]}$ are i.i.d. uniform random variables on $[-\tfrac12,\tfrac12]^d$.

As $n\to\infty$, there exists a probability distribution $(p_k)_{k\ge0}$ such that
$$P^{(n)}_k \xrightarrow{\;\mathbb P\;} p_k, \qquad(9.5.35)$$
where $(p_k)_{k\ge0}$ obeys an asymptotic power law, i.e., there exists a $c>0$ such that
$$p_k = c\,k^{-\tau}(1+o(1)). \qquad(9.5.36)$$



This theorem has not been proved in the current form anywhere in the literature, so we prove it explicitly here:

Proof of Theorem 9.30. We use a second-moment method, and start by analysing $\mathbb E[P^{(n)}_k]$. Note that
$$\mathbb E[P^{(n)}_k] = \mathbb P(D_{o_n}=k), \qquad(9.5.37)$$
where $o_n\in[n]$ is chosen uniformly at random. Further, note that
$$D_v = \sum_{u\in[n]\setminus\{v\}} I_{u,v}, \qquad(9.5.38)$$
where
$$\mathbb P\big(I_{u,v}=1 \mid (x_i)_{i\in[n]},(w_i)_{i\in[n]}\big) = p\big(x_u,x_v,w_u,w_v,(w_s)_{s\in[n]\setminus\{u,v\}}\big). \qquad(9.5.39)$$

Local limit of geometric inhomogeneous random graphs

We next study the local limit of geometric inhomogeneous random graphs. Before stating the result, we introduce the limiting object. We fix a Poisson process with intensity 1 on $\mathbb R^d$, with an additional point at the origin that will serve as our point of reference. To each point $x\in\mathbb R^d$ in this process, we associate an independent copy of the limiting weight $W_x$. Then, we draw an edge between two vertices $x,y$ with probability $h(x-y,W_x,W_y)$, where these edges are conditionally independent given the points of the Poisson process and the weights of its points. We call the above process the Poisson infinite GIRG with edge probabilities given by $h$. When $h\in\{0,1\}$, we are dealing with a threshold Poisson infinite GIRG with edge statuses given by $h$.

The main result is as follows:

Theorem 9.31 (Local limit of geometric inhomogeneous random graphs) Consider a geometric inhomogeneous random graph under Assumption 9.28. Then, it converges locally weakly to the Poisson infinite GIRG with edge probabilities given by $h$. The same result is true for a threshold geometric inhomogeneous random graph.

Together with Theorem 9.29, Theorem 9.31 also identifies the local weak limit of both the soft as well as the hard hyperbolic random graph.

The giant in geometric inhomogeneous random graphs

We next study the giant in geometric inhomogeneous random graphs. The main result is as follows:

Theorem 9.32 (Giant in geometric inhomogeneous random graphs) Let $\mathcal C_{\max}$ and $\mathcal C_{(2)}$ be the maximal and second-largest cluster in the geometric inhomogeneous random graph with parameter $\tau\in(2,3)$. Assume that the edge probabilities satisfy (9.5.16). Then, the largest connected components satisfy that $|\mathcal C_{\max}|/n\ge\zeta>0$ whp, while $|\mathcal C_{(2)}|/n\xrightarrow{\;\mathbb P\;}0$, for all $\alpha>0$.

Thus, a giant component always exists in the scale-free regime. A general result for $\tau>3$ has not been proved (yet?), and is much more delicate. We will return to this in Section 9.5.5, where we investigate scale-free percolation, which lives on $\mathbb Z^d$. There, the existence of an infinite component containing a positive proportion of the vertices should correspond to the existence of a giant in our finite random graphs.

Let us complete this part by discussing the situation when $d=1$. Here, we can show that, when $\tau>3$, there is no infinite component in the local limit, which suggests that no giant exists in the pre-limit either. Recall Corollary 2.25, but beware that we do not know that the local limit exists in probability. We call a vertex $u$ isolated when there is no edge $\{v_1,v_2\}$ with $v_1\le u\le v_2$ present in the graph. In particular, this means that the connected component of $u$ is finite. We will prove that the expected number of edges $\{v_1,v_2\}$ with $v_1\le u\le v_2$ that are present is bounded. Indeed, for the local limit of the hard hyperbolic random graph, this number is equal to
$$\sum_{v_1\le u\le v_2} \mathbb E\Bigg[\frac{1}{1+\Big(\dfrac{\|v_1-v_2\|^d}{c\,W_{v_1}W_{v_2}}\Big)^{\alpha}}\Bigg]. \qquad(9.5.40)$$
We bound this by
$$C\sum_{v_1\le u\le v_2} \mathbb E\bigg[\frac{(W_{v_1}W_{v_2})^{\alpha}}{1+\|v_1-v_2\|^{d\alpha}}\wedge1\bigg] \le C\sum_{k\ge1} k\,\mathbb E\bigg[\frac{X^{\alpha}}{1+k^{d\alpha}}\wedge1\bigg], \qquad(9.5.41)$$
where $X=W_1W_2$ is the product of two independent $W$ variables. Now, when $\mathbb P(W>w)=w^{-(\tau-1)}$, it is not hard to see that
$$\mathbb P(X^{\alpha}>x) \le C(\log x)\,x^{-(\tau-1)/\alpha}. \qquad(9.5.42)$$
In turn, this implies that
$$\mathbb E\bigg[\frac{X^{\alpha}}{1+k^{d\alpha}}\wedge1\bigg] \le C k^{-(\tau-1)}\log k, \qquad(9.5.43)$$
which, for $\tau>3$, is summable against the factor $k$ in (9.5.41), so that the expected number of edges $\{v_1,v_2\}$ with $v_1\le u\le v_2$ that are present is bounded. In turn, this suggests that this number equals zero with strictly positive probability (beware: these numbers are not independent for different $u$), and this in turn suggests that a positive proportion of the vertices is isolated in the above sense. However, when there is a positive proportion of isolated vertices, there cannot be an infinite component. This intuitively explains why the existence of the giant component is restricted to $\tau\in(2,3)$. Thus, the absence of a giant in hyperbolic random graphs with power-law exponent $\tau>3$ is intimately related to this model being inherently one-dimensional.
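The tail bound (9.5.42) can be made explicit in the special case $\alpha=1$ and pure Pareto weights, where $\mathbb P(W_1W_2>y)=y^{-(\tau-1)}\big(1+(\tau-1)\log y\big)$ exactly. The sketch below checks this closed form against a direct numerical integration (parameter values arbitrary):

```python
import math

tau = 3.5

def tail_exact(y):
    # P(W1 W2 > y) = y^{-(tau-1)} (1 + (tau-1) log y) for pure Pareto
    # weights with P(W > w) = w^{-(tau-1)}, w >= 1
    return y ** (-(tau - 1)) * (1 + (tau - 1) * math.log(y))

def tail_quadrature(y, steps=100000):
    # P(W1 W2 > y) = P(W1 > y) + \int_1^y P(W2 > y/w) f(w) dw, where
    # f(w) = (tau-1) w^{-tau} is the Pareto density; trapezoid rule
    h = (y - 1) / steps
    total = 0.0
    for k in range(steps + 1):
        w = 1 + k * h
        f = (tau - 1) * w ** (-tau) * (y / w) ** (-(tau - 1))
        total += f * (0.5 if k in (0, steps) else 1.0)
    return y ** (-(tau - 1)) + h * total
```

The $\log$ factor in this exact tail is what produces the logarithmic correction in (9.5.42) and (9.5.43).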


9.5 Spatial random graphs 415

Ultra-small distances in geometric inhomogeneous random graphs

Obviously, there is little point in studying typical distances in the hyperbolic random graph when there is no giant component, even though such results do shed light on the component structure of smaller components. Thus, we now restrict to the setting where α ∈ (1/2, 1), where typical distances are ultra-small:

Theorem 9.33 (Ultra-small distances in geometric inhomogeneous random graphs) For τ ∈ (2, 3), conditionally on o1, o2 being connected, with Gn the geometric inhomogeneous random graph defined earlier in this section,

dGn(o1, o2)/log log n →P 2/|log(τ − 2)|.  (9.5.44)

Clustering in geometric inhomogeneous random graphs

We now study the clustering properties of the geometric inhomogeneous random graph. The main result is as follows:

Theorem 9.34 (Clustering in geometric inhomogeneous random graphs) The global clustering coefficient in the geometric inhomogeneous random graph is Θ(1) whp.

9.5.4 Spatial preferential attachment models

In the past years, several spatial preferential attachment models have been considered. We now discuss three such models.

Geometric preferential attachment model.

We next discuss a class of geometric preferential attachment models that combines aspects of random geometric graphs and preferential attachment graphs. Let Gt = (Vt, Et) denote the graph at time t. Let S be the sphere in R3 with area equal to 1. Then, we let Vt be a subset of S of size t.

The process (Gt)t≥0 evolves as follows. At time t = 0, G0 is the empty graph. At time t + 1, given Gt, we obtain Gt+1 as follows. Let xt+1 be chosen uniformly at random from S, and denote Vt+1 = Vt ∪ {xt+1}. We assign m edges to the vertex xt+1, which we connect independently of each other to vertices in Vt(xt+1) ≡ Vt ∩ Br(xt+1), where Br(u) = {x ∈ S : ‖x − u‖ ≤ r} denotes the spherical cap of radius r around u. Let

Dt(xt+1) = ∑_{v ∈ Vt(xt+1)} D_v^{(t)},  (9.5.45)

where D_v^{(t)} denotes the degree of vertex v ∈ Vt in Gt. The m edges are connected to vertices (y1, . . . , ym) conditionally independently given (Gt, xt+1), so that, for all v ∈ Vt(xt+1),

P(yi = v | Gt) = D_v^{(t)}/max(Dt(xt+1), αmA_r t),  (9.5.46)


while

P(yi = xt+1 | Gt) = 1 − Dt(xt+1)/max(Dt(xt+1), αmA_r t),  (9.5.47)

where A_r is the area of Br(u), α ≥ 0 is a parameter, and r is a radius that shall be chosen appropriately. Similarly to the situation of geometric random graphs, the parameter r shall depend on the size of the graph, i.e., we shall be interested in the properties of Gn when r = r_n is chosen appropriately. The main result of Flaxman et al. (2006) is the study of the degree sequence of the arising model. Take r_n = n^{β−1/2} log n, where β ∈ (0, 1/2) is a constant. Finally, let α > 2. Then, there exists a probability distribution (p_k)_{k≥m} such that, whp,

P_k^{(n)} = p_k(1 + o(1)),  (9.5.48)

where (p_k)_{k≥m} satisfies (1.1.7) with τ = 1 + α ∈ (3, ∞). The precise result is

in (Flaxman et al., 2006, Theorem 1(a)) and is quite a bit sharper, as detailed concentration results are proved as well. Further results involve the proof of connectivity of Gn and an upper bound on the diameter when r ≥ n^{−1/2} log n, m ≥ K log n for some large enough K, and α ≥ 0 of order O(log(n/r)). In Flaxman et al. (2007), these results were generalized to the setting where, instead of a unit ball, a smoother version is used.
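The dynamics of (9.5.45)–(9.5.47) can be sketched in a few lines of code. The following is a simplified illustration, not the model of Flaxman et al. itself: it replaces the sphere and its spherical caps by the unit torus [0, 1)^2 with Euclidean balls, and all function names and parameter values are ours:

```python
import math
import random

def torus_dist(x, y):
    # Euclidean distance on the torus [0, 1)^2
    dx = min(abs(x[0] - y[0]), 1.0 - abs(x[0] - y[0]))
    dy = min(abs(x[1] - y[1]), 1.0 - abs(x[1] - y[1]))
    return math.hypot(dx, dy)

def geometric_pa(n, m=2, r=0.2, alpha=3.0, seed=0):
    """Grow n vertices; each newcomer sends m edges to nearby vertices,
    chosen with probability proportional to degree as in (9.5.46), with
    the leftover mass of (9.5.47) turned into a self-loop."""
    rng = random.Random(seed)
    area = math.pi * r * r                 # stand-in for A_r
    pos, deg = [], []
    for t in range(n):
        x = (rng.random(), rng.random())
        near = [v for v in range(t) if torus_dist(pos[v], x) <= r]
        w = [deg[v] for v in near]         # degrees in G_t (snapshot)
        local = sum(w)                      # D_t(x_{t+1})
        norm = max(local, alpha * m * area * max(t, 1))
        pos.append(x)
        deg.append(0)
        for _ in range(m):
            u = rng.random() * norm
            target = t                      # leftover mass -> self-loop
            for v, dv in zip(near, w):
                if u < dv:
                    target = v
                    break
                u -= dv
            deg[target] += 1
            deg[t] += 1
    return deg
```

Each edge contributes two to the total degree, so the total degree after n steps is 2mn regardless of how many edges end up as self-loops.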

Spatial preferential attachment as influence.

We next consider a spatial preferential attachment model with local influence regions, as a model for the Web graph. The model is directed, but it can easily be adapted to an undirected setting. The idea behind this model is that, for normal preferential attachment models, new vertices need to be aware of the degrees of the already present vertices. In reality, it is quite hard to observe the degrees of vertices, and, therefore, we instead let vertices have a region of influence in some metric space, for example the torus [0, 1]^m for some dimension m, on which the metric equals

d(x, y) = min{‖x − y + u‖_∞ : u ∈ {0, 1, −1}^m}.  (9.5.49)

When a new vertex arrives, it is located uniformly at random in the unit cube, and it connects, independently and with fixed probability p, to each older vertex in whose region of influence it lands. These regions of influence evolve as time proceeds, in such a way that the volume of the influence region of vertex i at time t equals

R(i, t) = (A1 D_i^{(t)} + A2)/(t + A3),  (9.5.50)

where now D_i^{(t)} is the in-degree of vertex i at time t, and A1, A2, A3 are parameters chosen such that pA1 ≤ 1. One of the main results of the paper is that this model is a scale-free graph process. Indeed, denote

p_k = (p^k/(1 + kpA1 + pA2)) ∏_{j=0}^{k−1} (jA1 + A2)/(1 + jpA1 + pA2),  (9.5.51)


then (Aiello et al., 2008, Theorem 1.1) shows that whp, for k ≤ (n^{1/8}/log n)^{4pA1/(2pA1+1)}, the degree sequence of the graph of size n satisfies

P_k^{(n)} = p_k(1 + o(1)),  (9.5.52)

and (p_k)_{k≥0} satisfies (1.1.7) with τ = 1 + 1/(pA1) ∈ [2, ∞). Further results involve the study of maximal in-degrees and the total number of edges.
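The influence-region dynamics around (9.5.50) can be sketched as follows. This is a simplified stand-in, not the model of Aiello et al. itself: it uses the planar torus [0, 1)^2 (so the influence region of volume R(i, t) is a ball of radius √(R/π)), and all names and parameter values are ours:

```python
import math
import random

def dist_torus(x, y):
    # Euclidean distance on the torus [0, 1)^2
    dx = min(abs(x[0] - y[0]), 1.0 - abs(x[0] - y[0]))
    dy = min(abs(x[1] - y[1]), 1.0 - abs(x[1] - y[1]))
    return math.hypot(dx, dy)

def spa_influence(n, p=0.7, A1=1.0, A2=1.0, A3=1.0, seed=0):
    """Directed SPA sketch: vertex i has an influence ball of volume
    R(i, t) = (A1*indeg[i] + A2)/(t + A3); a newcomer landing inside it
    links to i independently with probability p (note p*A1 <= 1 here)."""
    rng = random.Random(seed)
    pos, indeg = [], []
    for t in range(n):
        x = (rng.random(), rng.random())
        for i in range(t):
            vol = (A1 * indeg[i] + A2) / (t + A3)
            radius = math.sqrt(min(vol, 1.0) / math.pi)
            if dist_torus(pos[i], x) <= radius and rng.random() < p:
                indeg[i] += 1          # directed edge from the newcomer to i
        pos.append(x)
        indeg.append(0)
    return indeg
```

Note that a vertex's influence region shrinks like 1/t unless its in-degree keeps growing, which is what produces the rich-get-richer effect without the newcomer ever observing degrees directly.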

9.5.5 Complex network models on the hypercubic lattice

In this section, we define a percolation model that interpolates between long-range percolation and the scale-free rank-1 inhomogeneous random graphs discussed in Chapter 3. This model, termed scale-free percolation in Deijfen et al. (2013), provides a percolation model in which the degree of a vertex can have finite mean but infinite variance. Mind that this phenomenon is impossible for independent percolation models, since the independence of the edge variables implies that the variance of the degrees is always bounded by their mean.

Scale-free percolation

Scale-free percolation is defined on the lattice Zd. Let each vertex x ∈ Zd be equipped with an i.i.d. weight W_x. Conditionally on the weights (W_x)_{x∈Zd}, the edges in the graph are independent, and the probability that there is an edge between x and y is defined by

p_{xy} = 1 − e^{−λW_xW_y/|x−y|^α},  (9.5.53)

for α, λ ∈ (0, ∞). We say that the edge {x, y} is occupied with probability p_{xy} and vacant otherwise.

Let us discuss the role of the different parameters in the scale-free percolation model. The parameter α > 0 describes the long-range nature of the model, while we think of λ > 0 as the percolation parameter. The weight distribution is the last parameter that describes the model. We are mainly interested in settings where the W_x have unbounded support in [0, ∞), and then particularly when they vary substantially.

Naturally, the model for fixed λ > 0 and weights (W_x)_{x∈Zd} is the same as the one for λ = 1 and weights (√λ W_x)_{x∈Zd}, so there is some redundancy in the parameters of the model. However, we view the weights (W_x)_{x∈Zd} as creating a random environment in which we study the percolative properties of the model. Thus, we think of the random variables (W_x)_{x∈Zd} as fixed once and for all, and we change the percolation configuration by varying λ. We can thus view our model as percolation in a random environment given by the weights (W_x)_{x∈Zd}. The downside of scale-free percolation is that the edge statuses are no longer independent random variables, but are rather positively correlated (see Exercise 9.40).

Scale-free percolation interpolates between long-range percolation and rank-1 inhomogeneous random graphs. Indeed, we retrieve long-range percolation when we take W_x ≡ 1. We retrieve the Norros-Reittu model in (3.2.8) with i.i.d. vertex weights when we take α = 0 and λ = 1/∑_{i∈[n]} W_i, and consider the model on [n] instead of Zd.

Choice of edge weights

We assume that the distribution F_W of the weights (W_x)_{x∈Zd} has a regularly-varying tail with exponent τ − 1, that is, denoting by W a random variable with the same distribution as W_0 and by F_W its distribution function, we assume that

1 − F_W(w) = P(W > w) = w^{−(τ−1)}L(w),  (9.5.54)

where w ↦ L(w) is a function that varies slowly at infinity. Here we recall that a function L varies slowly at infinity when, for every x > 0,

lim_{t→∞} L(tx)/L(t) = 1.  (9.5.55)

Examples of slowly-varying functions are powers of logarithms. See the classical work by Bingham, Goldie and Teugels, Bingham et al. (1989), for more information about regularly-varying functions. We interpret τ > 1 as the final parameter of our model, next to α, λ (and the dimension d). Of course, there may be many vertex-weight distributions having the asymptotics in (9.5.54) with the same τ, but the role of τ is so important in the sequel that we separate it out.
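The defining limit (9.5.55) is easy to probe numerically. A tiny sketch (the function name, the choice of test points, and the tolerances are ours):

```python
import math

def looks_slowly_varying(L, t=1e12, xs=(0.5, 2.0, 10.0), tol=0.15):
    """Crude numerical probe of lim_{t -> infinity} L(t*x)/L(t) = 1:
    evaluate the ratio at one large t for a few fixed x > 0."""
    return all(abs(L(t * x) / L(t) - 1.0) < tol for x in xs)
```

With this probe, L(w) = log w passes while L(w) = w^{0.1} fails, reflecting that powers of logarithms are slowly varying but genuine powers, however small, are not.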

Write D_x for the degree of x ∈ Zd and note that, by translation invariance, D_x has the same distribution as D_0. The name scale-free percolation is justified by the following theorem:

Theorem 9.35 (Power-law degrees for power-law weights, Deijfen et al. (2013)) Fix d ≥ 1.

(a) Assume that the weight distribution satisfies (9.5.54) with α ≤ d or γ = α(τ − 1)/d ≤ 1. Then P(D_0 = ∞ | W_0 > 0) = 1.

(b) Assume that the weight distribution satisfies (9.5.54) with α > d and γ = α(τ − 1)/d > 1. Then there exists s ↦ ℓ(s) that is slowly varying at infinity such that

P(D_0 > s) = s^{−γ}ℓ(s).  (9.5.56)
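The degree distribution of Theorem 9.35(b) is straightforward to sample. The sketch below restricts the model to a finite box (so degrees are truncated) and assumes pure Pareto weights; the function name and all parameter values are ours:

```python
import math
import random

def sfp_degree_origin(R, alpha, lam, tau, rng):
    """One sample of the origin's degree in scale-free percolation
    restricted to the box [-R, R]^2 (d = 2), with Pareto weights
    P(W > w) = w^{-(tau-1)}, w >= 1, and edge probabilities (9.5.53)."""
    w0 = rng.random() ** (-1.0 / (tau - 1.0))     # weight of the origin
    deg = 0
    for i in range(-R, R + 1):
        for j in range(-R, R + 1):
            if i == 0 and j == 0:
                continue
            wy = rng.random() ** (-1.0 / (tau - 1.0))
            dist = math.hypot(i, j)
            # edge {0, y} occupied with probability 1 - exp(-lam*w0*wy/|y|^alpha)
            if rng.random() < 1.0 - math.exp(-lam * w0 * wy / dist ** alpha):
                deg += 1
    return deg
```

With α = 3.5, τ = 2.5 and d = 2 we have γ = α(τ − 1)/d = 2.625 > 1, so sampled degrees are finite but noticeably heavy-tailed, in line with Theorem 9.35(b).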

The fact that the degrees have a power-law distribution is why this model is called scale-free percolation. The parameter γ measures how many moments of the degree distribution are finite. In general, when edges are present independently, but not with the same probability, it is impossible to have infinite-variance degrees in the long-range setting (see Exercise 9.39). We continue by studying the percolative properties of scale-free percolation. As usual, denote by C(x) = {y : x ←→ y} the connected component or cluster of x, and by |C(x)| the number of vertices in C(x). Further, define the percolation probability as

θ(λ) = P(|C(0)| = ∞),  (9.5.57)


and the critical percolation threshold λ_c as

λ_c = inf{λ : θ(λ) > 0}.  (9.5.58)

It is a priori unclear whether λ_c < ∞ or not. Deijfen et al. (Deijfen et al., 2013, Theorem 3.1) prove that λ_c < ∞ holds in most cases. Indeed, if P(W = 0) < 1, then λ_c < ∞ for all d ≥ 2. Naturally, d = 1 again is special, and the results in (Deijfen et al., 2013, Theorem 3.1) are not optimal there. It is shown that if α ∈ (1, 2] and P(W = 0) < 1, then λ_c < ∞ in d = 1, while if α > 2 and τ > 1 is such that γ = α(τ − 1)/d > 2, then λ_c = ∞ in d = 1.

More interesting is whether λ_c = 0 or not. The following theorem shows that this depends on whether the degrees have infinite variance or not:

Theorem 9.36 (Positivity of the critical value, Deijfen et al. (2013)) Assume that the weight distribution satisfies (9.5.54) with τ > 1 and that α > d.

(a) Assume that γ = α(τ − 1)/d > 2. Then θ(λ) = 0 for small λ > 0, that is, λ_c > 0.

(b) Assume that γ = α(τ − 1)/d < 2. Then θ(λ) > 0 for every λ > 0, that is, λ_c = 0.

In ordinary percolation, instantaneous percolation in the form λ_c = 0 can only occur when the degree of the graph is infinite. The randomness in the vertex weights facilitates instantaneous percolation in scale-free percolation. We see a similar phenomenon for rank-1 inhomogeneous random graphs, such as the Norros-Reittu model. The instantaneous percolation is related to robustness of the random network under consideration. Again we see that graph distances are rather small if γ ∈ (1, 2), whereas graph distances are much larger for γ > 2.

We see that γ ∈ (1, 2), where the variance of the degrees is infinite, is special in the sense that instantaneous percolation occurs, as for rank-1 random graphs. This raises the question to what extent the analogy extends. For example, in rank-1 random graphs, the scaling limits within the scaling window are different for random graphs having infinite third moments of the degrees than for those for which the third moment is finite. This indicates that the critical behavior of scale-free percolation might be different for γ ∈ (2, 3), where the degrees have infinite third moment, compared to γ > 3, where the degrees have finite third moment, particularly in high dimensions. Indeed, we can think of the Norros-Reittu model as a kind of mean-field model for this setting, certainly when we restrict scale-free percolation to the torus. It would be of great interest to investigate these models in more detail.

Another scale-free percolation model by Yukich

In this section, we discuss the results in Yukich (2006) on an infinite scale-free percolation model. Note that, for a transitive graph with fixed degree r and percolation with a fixed percolation parameter p, the degree of each vertex has a binomial distribution with parameters r and p. Since r is fixed, this does not allow for a power-law degree sequence. As a result, it is impossible to obtain a scale-free random graph when dealing with independent percolation, so that we shall abandon the assumption of independence of the different edges, while keeping the assumption of translation invariance.

The model considered in Yukich (2006) is on Zd, and, thus, the definition of a scale-free graph process does not apply literally. We adapt the definition slightly by saying that an infinite random graph is scale-free when

p_k = P(D_o = k),  (9.5.59)

where D_x is the degree of vertex x ∈ Zd and o ∈ Zd is the origin, satisfies (1.4.4). This is a reasonable definition since, letting B_r = [−r, r]^d ∩ Zd denote the cube of width r around the origin and n = (2r + 1)^d, we have, for each k ≥ 0,

P_k^{(n)} = (1/n) ∑_{x∈B_r} 1_{{D_x = k}},  (9.5.60)

which, assuming translation invariance and ergodicity, converges to p_k.

We next describe the model in Yukich (2006). We start by taking an i.i.d. sequence (U_x)_{x∈Zd} of uniform random variables on [0, 1]. Fix δ ∈ (0, 1] and q ∈ (1/d, ∞). The edge {x, y} appears in the random graph precisely when

|x − y| ≤ δ min{U_x^{−q}, U_y^{−q}}.  (9.5.61)

We can think of the ball of radius δU_x^{−q} as the region of influence of x: two vertices are connected precisely when each of them lies in the region of influence of the other. This motivates the choice in (9.5.61). The parameter δ can be interpreted as the probability that nearest neighbors are connected, and in the sequel we restrict ourselves to δ = 1, in which case the infinite connected component equals Zd. We denote the resulting (infinite) random graph by G_q.
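For d = 1 the degree of the origin is easy to sample directly from (9.5.61), since the connection events to different x are conditionally independent given U_o. A hedged sketch (the function name, the truncation `cap` for the rare huge reach, and the parameter choices are ours):

```python
import random

def yukich_degree(q, rng, cap=10**6):
    """One sample of the origin's degree D_o in the model (9.5.61)
    on Z (d = 1, delta = 1); `cap` truncates the rare huge reach."""
    u0 = rng.random()
    reach = min(int(u0 ** (-q)), cap)    # only |x| <= U_o^{-q} can link to o
    deg = 0
    for k in range(1, reach + 1):
        p = k ** (-1.0 / q)              # P(|x| <= U_x^{-q}) for |x| = k
        if rng.random() < p:             # site x = +k
            deg += 1
        if rng.random() < p:             # site x = -k
            deg += 1
    return deg
```

For d = 1 and q = 2 the predicted power-law exponent below gives τ − 1 = 1/(qd − 1) = 1, so the sampled degrees are very heavy-tailed: a few samples with small U_o dominate.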

We next discuss the properties of this model, starting with its scale-free nature. In (Yukich, 2006, Theorem 1.1), it is shown that, with τ = qd/(qd − 1) ∈ (1, ∞), the limit

lim_{k→∞} k^{τ−1} P(D_o ≥ k)  (9.5.62)

exists, so that the model is scale-free with degree power-law exponent τ (recall (1.4.3)). The intuitive explanation of (9.5.62) is as follows. Suppose we condition on the value of U_o = u. Then, the conditional distribution of D_o given that U_o = u is equal to

D_o = ∑_{x∈Zd} 1_{{|x| ≤ min{U_o^{−q}, U_x^{−q}}}} = ∑_{x : |x| ≤ u^{−q}} 1_{{|x| ≤ U_x^{−q}}}.  (9.5.63)

Note that the random variables (1_{{|x| ≤ U_x^{−q}}})_{x∈Zd} are independent Bernoulli random variables with probability of success equal to

P(1_{{|x| ≤ U_x^{−q}}} = 1) = P(U ≤ |x|^{−1/q}) = |x|^{−1/q}.  (9.5.64)


In order for D_o ≥ k to occur for k large, we must have that U_o = u is quite small, and, in this case, a central limit theorem should hold for D_o, with mean equal to

E[D_o | U_o = u] = ∑_{x : |x| ≤ u^{−q}} |x|^{−1/q} = c u^{−(qd−1)}(1 + o(1)),  (9.5.65)

for some explicit constant c = c(q, d). Furthermore, the conditional variance of D_o given that U_o = u is bounded above by its conditional expectation, so that the conditional distribution of D_o given that U_o = u is highly concentrated. We omit the details, and merely note that this can be made precise by using standard large deviations results. Assuming sufficient concentration, we obtain that the probability that D_o ≥ k is asymptotically equal to the probability that U ≤ u_k, where u_k is determined by the equation

E[D_o | U_o = u_k] = c u_k^{−(qd−1)}(1 + o(1)) = k,  (9.5.66)

so that u_k = (k/c)^{−1/(qd−1)}. This suggests that

P(D_o ≥ k) = P(U ≤ u_k)(1 + o(1)) = (k/c)^{−1/(qd−1)}(1 + o(1)),  (9.5.67)

which explains (9.5.62).

We next turn to distances in this scale-free percolation model. For x, y ∈ Zd, we denote by d_{G_q}(x, y) the graph distance (or chemical distance) between the vertices x and y, i.e., the minimal number of edges in G_q connecting x and y. The main result in Yukich (2006) is the following theorem:

Theorem 9.37 (Ultra-small distances for scale-free percolation) For all d ≥ 1 and all q ∈ (1/d, ∞), whp as |x| → ∞,

d_{G_q}(o, x) ≤ 8 + 4 log log |x|.  (9.5.68)

The result in Theorem 9.37 shows that distances in the scale-free percolation model are much smaller than those in normal percolation models. It would be of interest to investigate whether the limit of d_{G_q}(o, x)/log log |x| exists, and, if so, what this limit is.

While Theorem 9.37 resembles the results in Theorem 7.21, there are a few essential differences. First of all, G_q is an infinite graph, whereas the models considered in Theorem 7.21 are all finite. It would be of interest to extend Theorem 9.37 to the setting of finite tori, where the Euclidean norm |x − y| in (9.5.61) is replaced by the Euclidean norm on the torus, and the typical distance H_n is considered. This result is not immediate from the proof of Theorem 9.37. Secondly, in Theorems 7.20 and 7.21, it is apparent that the behavior for τ > 3 is rather different from the behavior for τ ∈ (2, 3). This feature is missing in Theorem 9.37. It would be of interest to find a geometric random graph model in which the difference in behavior between τ > 3 and τ ∈ (2, 3) also appears.

The result in Theorem 9.37 can be compared to similar results for long-range percolation, where edges are present independently, and the probability that the edge {x, y} is present equals |x − y|^{−s+o(1)} for some s > 0. In this case, detailed results exist for the limiting behavior of d(o, x) depending on the value of s. For example, in Benjamini et al. (2004), it is shown that the diameter of this infinite percolation model is equal to ⌈d/(d − s)⌉ a.s. See also Biskup (2004) and the references therein.

Spatial configuration models on the lattice

TO DO 9.14: Introduce the spatial configuration model and results of Deijfen et al.

9.6 Notes and discussion

Notes on Section 9.1.

Citation networks. Our discussion of citation networks is inspired by, and follows, Garavaglia et al. (2017). For citation networks, there is a rich literature proposing models using preferential attachment schemes and adaptations thereof, mainly in the physics literature. Aging effects, i.e., taking the age of a vertex into account in its likelihood to obtain children, have been extensively considered as the starting point to investigate their dynamics; see Wang et al. (2009, 2008); Hajra and Sen (2005, 2006); Csardi (2006). Here the idea is that old papers are less likely to be cited than new papers. Such aging has been observed in many citation-network datasets and makes PAMs with weight functions depending only on the degree ill-suited for them. As mentioned above, such models could more aptly be called old-get-richer models, i.e., in general, old vertices have the highest degrees. In citation networks, instead, papers with many citations appear all the time. Wang et al. (2013) investigate a model that incorporates these effects; see also Wang et al. (2014) for a comment on the methods in this paper. On the basis of empirical data, they suggest a model where the aging function follows a lognormal distribution with paper-dependent parameters, and the preferential attachment function is the identity. Wang et al. (2013) estimate the fitness function, rather than following the more classical approach in which it is taken to be an i.i.d. sample of random variables.

Notes on Section 9.2.

The local weak convergence for PageRank was proved by Garavaglia et al. (2020), where also the notion of the marked backward local weak limit was introduced. We have extended the discussion on local weak convergence for directed graphs here, since these other notions are useful in other contexts as well, for example in studying the strongly connected component.

Directed inhomogeneous random graphs. Bloznelis et al. (2012) study the general directed inhomogeneous random graph studied in this section, and prove Theorem 9.3. Cao and Olvera-Cravioto (2020) continue this analysis, and generalize it substantially. While the local weak convergence result in Theorem 9.2 has not been proved anywhere explicitly, it is the leading idea in the identification of the phase transition, as well as in the description of the limiting branching processes and joint degree distribution. Garavaglia et al. (2020) do investigate the marked directed forward local weak convergence for directed rank-1 inhomogeneous random graphs.

Cao and Olvera-Cravioto (2020) specifically investigate general rank-1 inhomogeneous digraphs. Our choice of edge probabilities in (9.2.14)–(9.2.15) is slightly different from that by Cao and Olvera-Cravioto (2020), particularly since the factor 1/2 in (9.2.15) is missing in Cao and Olvera-Cravioto (2020). We have added it so as to make the average in-degree of a vertex of in-weight w_i^{(in)} approximately w_i^{(in)}. If this were to be true for every i, then we would also need that

∑_{i∈[n]} w_i^{(in)} ≈ ∑_{i∈[n]} w_i^{(out)},

which would imply the limiting statement in (9.2.17). Lee and Olvera-Cravioto (2017) use the results in Cao and Olvera-Cravioto (2020) to prove that the limiting PageRank of such directed generalized random graphs exists, and that the solution obeys the same recurrence relation as on a Galton-Watson tree. In particular, under certain independence assumptions, this implies that the PageRank power-law hypothesis is valid for such models.

Directed configuration models. The directed configuration model was investigated by Cooper and Frieze (2004), where the results discussed here are proved. In fact, the results in Cooper and Frieze (2004) are much more detailed than the one in Theorem 9.5, and also include detailed bounds on the strongly connected component in the subcritical regime, as well as precise bounds on the number of vertices whose forward and backward clusters are large, and the asymptotic size of forward and backward clusters. A substantially simpler proof was given by Cai and Perarnau (2020b).

Both Cooper and Frieze (2004) and Cai and Perarnau (2020b) make additional assumptions on the degree distribution. In particular, they assume that E[D_n^{(in)} D_n^{(out)}] → E[D^{(in)} D^{(out)}] < ∞, which we do not assume. Further, Cooper and Frieze (2004) assume that d is proper, which is a technical requirement on the degree sequence stating that (a) E[(D_n^{(in)})^2] = O(1) and E[(D_n^{(out)})^2] = O(1); (b) E[D_n^{(in)}(D_n^{(out)})^2] = o(n^{1/12} log n). In view of the fact that such conditions do not appear in Theorem 4.4, these conditions can be expected to be suboptimal for Theorem 9.5 to hold, and we next explain how they can be avoided by a suitable truncation argument.

Assume that the out- and in-degrees in the directed configuration model DCM_n(d) satisfy (9.2.24) and (9.2.25). By Exercise 9.17 below, |C_max| ≤ n(ζ + ε) whp for n large and any ε > 0. Exercise 9.17 is proved by an adaptation of the proof of Corollary 2.25 in the undirected setting. Thus, we only need to show that |C_max| ≥ n(ζ − ε) whp for n large and any ε > 0.

Fix K > 1 very large. We now construct a lower-bounding directed configuration model in which all the degrees are bounded by K. This is similar to the construction for the undirected configuration model in Section 4.2.1. When v is such that d_v = d_v^{(out)} + d_v^{(in)} ≥ K, we split v into n_v = ⌈d_v/K⌉ vertices, and we deterministically redistribute all out- and in-half-edges over the n_v vertices in an arbitrary way such that all the vertices that used to correspond to v now have both out- and in-degree bounded by K. Denote the corresponding random graph by DCM′_n.

The resulting degree sequence again satisfies (9.2.24) and (9.2.25). Moreover, for K > 1 large, the limits in (9.2.24) and (9.2.25) for the new degree sequence are quite close to the original limits of the old degree sequence, while the new sequence at the same time has bounded degrees. As a result, we can apply the original result in Cooper and Frieze (2004) or Cai and Perarnau (2020b) in the new setting.
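The splitting step above is purely combinatorial and can be sketched directly (the function name and the particular chunk-wise redistribution are ours; any redistribution with the stated degree bound works):

```python
def split_degrees(d_in, d_out, K):
    """Split every vertex v into ceil(d_v / K) copies (d_v = in + out degree),
    redistributing its half-edges so that every copy has in- and out-degree
    at most K, while preserving the total numbers of in- and out-half-edges."""
    new_in, new_out = [], []
    for din, dout in zip(d_in, d_out):
        copies = max(1, -(-(din + dout) // K))   # ceil((din + dout)/K), at least 1
        for _ in range(copies):
            # hand out the remaining half-edges in chunks of at most K
            take_in = min(din, K)
            take_out = min(dout, K)
            din -= take_in
            dout -= take_out
            new_in.append(take_in)
            new_out.append(take_out)
    return new_in, new_out
```

Since din ≤ d_v and dout ≤ d_v, the ⌈d_v/K⌉ chunks of size at most K always suffice to exhaust both sides, so the half-edge totals are preserved exactly, as the lower-bound argument requires.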

Denote the size of the SCC of DCM′_n by |C′_max|. Obviously, since we split vertices, |C′_max| ≤ |C_max| + ∑_{v∈[n]}(n_v − 1). Therefore, |C_max| ≥ |C′_max| − ∑_{v∈[n]}(n_v − 1). Take K > 0 so large that ∑_{v∈[n]}(n_v − 1) ≤ εn/3 and that ζ′ ≥ ζ − ε/3, where ζ′ is the backward-forward survival probability of the limiting directed DCM′_n and ζ that of DCM_n(d). Finally, for every ε > 0, whp |C′_max| ≥ n(ζ′ − ε/3). As a result, we obtain that, again whp,

|C_max| ≥ |C′_max| − nε/3 ≥ n(ζ′ − 2ε/3) ≥ n(ζ − ε),  (9.6.1)

as required.

Chen and Olvera-Cravioto (2013) study a way to obtain nearly i.i.d. in- and out-degrees in the directed configuration model. Here, the problem is that when ((d_i^{(in)}, d_i^{(out)}))_{i∈[n]} is an i.i.d. bivariate sequence with equal means, then ∑_{i∈[n]}(d_i^{(in)} − d_i^{(out)}) has Gaussian fluctuations at best, so that it will not be zero. Chen and Olvera-Cravioto (2013) indicate how the excess in- or out-half-edges can be removed so as to keep the degrees close to i.i.d. Further, they also show that the removal of self-loops and multiple directed edges does not significantly change the degree distribution (so that, in particular, one would assume that the local weak limits are the same, but Chen and Olvera-Cravioto (2013) stick to the degree distribution).

Theorem 9.7 follows from (Cai and Perarnau, 2020a, Proposition 7.7). Theorem 9.6 was first proved by van der Hoorn and Olvera-Cravioto (2018) under stronger assumptions, but then the claim proved is also much stronger. Indeed, van der Hoorn and Olvera-Cravioto (2018) not only identify the first-order asymptotics as in Theorem 9.7, but also the fluctuations, like those stated for the undirected configuration model in Theorem 7.20. This proof is substantially harder than that of (Cai and Perarnau, 2020a, Proposition 7.7). Theorem 9.7 was proved by Cai and Perarnau (2020a).

Notes on Section 9.3.

Stochastic blockmodel. Stochastic blockmodels were introduced by Holland et al. (1983) in the context of group structures in social networks. Their introduction was inspired by the work of Fienberg and Wasserman (1981). There, also the problem of community detection was discussed. Of course, we have already seen stochastic blockmodels in the context of inhomogeneous random graphs, so our focus here is on the detection problem.


The definition of solvability in Definition 9.8, in particular (9.3.2), has appeared in different forms in the literature. For example, Bordenave et al. (2018) instead use the definition

min_{p : [r]→[r]} (1/n) ∑_{v∈[n]} [1_{{σ(v) = (p∘σ̂)(v)}} − 1/r],  (9.6.2)

where the minimum is over all possible permutations p from [r] to [r], and σ̂ denotes the estimator of the true type assignment σ.

The precise threshold in Theorem 9.9 was conjectured by Decelle et al. (2011),

based on a non-rigorous belief-propagation method. The proof of the impossibility part of the block-model threshold conjecture was first given by Mossel et al. (2015). The proof of the solvable part of the block-model threshold conjecture was first given by Mossel et al. (2018), and independently by Massoulie (2014). Mossel et al. (2016) give an algorithm that maximizes the fraction of vertices labeled correctly. The results were announced in a high-impact format by Krzakala et al. (2013). The achievability result for a general number of types r in (9.3.5) was proved by Abbe and Sandon (2018) and Bordenave et al. (2018). We follow the presentation in Bordenave et al. (2018). The convergence of the eigenvalues in (9.3.8) is (Bordenave et al., 2018, Theorem 4); the estimation of the types in (9.3.10) is investigated in (Bordenave et al., 2018, Theorem 5). There, it is also shown that this choice has the positive-overlap property in (9.6.2), which implies the solvability in Definition 9.8. We have simplified the presentation substantially by considering r = 2 and equal sizes of the groups of the two types.

Degree-corrected stochastic blockmodel. The degree-corrected stochastic blockmodel was first introduced in Karrer and Newman (2011). Consistent estimation of communities in the degree-corrected stochastic blockmodel was investigated by Zhao et al. (2012), under the assumption that the average degree tends to infinity for weak consistency, and grows faster than log n for strong consistency. Sparse settings suffer from a threshold phenomenon similar to that for the original stochastic blockmodel, as derived in Mossel et al. (2018); Massoulie (2014). The impossibility result in Theorem 9.10 was proved by Gulikers et al. (2018). The solvable parts were proved in Gulikers et al. (2017a,b).

Preferential attachment models with community structure. Jordan (2013) investigates preferential attachment models in general metric spaces. When one takes these metric spaces to be discrete sets, one can interpret the geometric location as a type or community label. This interpretation was proposed by Hajek and Sankagiri (2018), where also our results on community detection are proved. We will return to the geometric interpretation of the model in Section 9.5.4. The result in (9.3.23) is stated in (Jordan, 2013, Theorem 2.1). The asymptotics of the degree distribution in Theorem 9.11 is (Jordan, 2013, Theorem 2.2). (Jordan, 2013, Theorem 2.3) further contains some estimates of the number of vertices in given regions and with given degrees. We come back to such issues below. The convergence of the proportion of errors in (9.3.29) is stated in (Hajek and Sankagiri, 2018, Proposition 8), to which we refer for the formula for Err.

A preferential attachment model with community structure, phrased as a coexistence model, was introduced by Antunovic et al. (2016). In their model, contrary to the setting of Jordan (2013), the vertices choose their type based on the types of their neighbors, thus creating the possibility of denser connectivity between vertices of the same type, and thus of community structure. The focus of Antunovic et al. (2016) is the coexistence of all the different types, or rather a winner-takes-all phenomenon, depending on the precise probability of choosing a type given the numbers of neighbors of the various types. Even with two types, the behavior is quite involved and depends sensitively on the type-choosing distribution. For example, under the majority rule (where the type of the new vertex is the majority type among its older neighbors), the winning type takes all, while if this probability is linear in the number of older neighbors of a given type, there is always coexistence.

Notes on Section 9.4.

Empirical properties of real-world networks with community structure were studied in Stegehuis et al. (2016b).

Inhomogeneous random graphs with communities. The inhomogeneous random graph with communities was introduced by Bollobas et al. (2011).

Configuration models with community structure. The hierarchical configuration model was introduced by van der Hofstad et al. (2015). The fit to real-world networks, particularly in the context of epidemics, was studied by Stegehuis et al. (2016a). The configuration model with household structure was investigated in Ball et al. (2009, 2010) in the context of epidemics on social networks. Particularly when studying epidemics on networks, clustering is highly relevant, as clustering slows down the spread of infectious diseases. Random intersection graphs with prescribed degrees and groups are studied in a non-rigorous way in Newman (2003); Newman and Park (2003).

The configuration model with clustering was defined by Newman (2009), who studied it non-rigorously.

Random intersection graphs. Random intersection graphs were introduced by Singer (1996) and further studied in Fill et al. (2000); Karonski et al. (1999); Stark (2004). Theorem 9.17 is (Deijfen and Kets, 2009, Theorem 1.1), where the authors also proved that the clustering can be controlled. The model has also been investigated for more general distributions of groups per vertex by Godehardt and Jaworski (2003) and Jaworski et al. (2006). We refer to Bloznelis et al. (2015) for a survey of recent results.

Rybarczyk (2011) studies various properties of the random intersection graph when each vertex is in precisely d groups that are all chosen uniformly at random from the collection of m groups. In particular, Rybarczyk (2011) proves results on the giant as in Theorem 9.19, as well as on the diameter of the graph, which is ΘP(log n) when the model is sparse.
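The uniform-groups construction just described is easy to simulate; the following minimal sketch (function and variable names are ours) lets each of n vertices join exactly d of m groups uniformly at random, and connects two vertices when they share a group.

```python
import random

def uniform_groups_rig(n, m, d, seed=None):
    """Random intersection graph: each of the n vertices joins exactly d
    of the m groups, chosen uniformly at random without replacement;
    two vertices become adjacent when they share at least one group."""
    rng = random.Random(seed)
    membership = {v: frozenset(rng.sample(range(m), d)) for v in range(n)}
    edges = {(u, v) for u in range(n) for v in range(u + 1, n)
             if membership[u] & membership[v]}
    return membership, edges
```

With d = m every pair of vertices shares all groups, so the graph is complete; the sparse regime studied by Rybarczyk (2011) arises when d is fixed and m grows suitably with n.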

Bloznelis (2009, 2010a,b) studies a general random intersection model, where the sizes of groups are i.i.d. random variables, and the sets of vertices in them are chosen uniformly at random from the vertex set. His results include distances (Bloznelis, 2009) and component sizes (Bloznelis, 2010a,b). Bloznelis (2013) studies the degree and clustering structure in this setting.

Theorem 9.18 is proved by Kurauskas (2015); see also van der Hofstad et al. (2019b). Both papers investigate more general settings: Kurauskas (2015) also allows for settings with independent group memberships, while van der Hofstad et al. (2019b) also allows for more general group structures than the complete graph.

Theorem 9.19 is proved by van der Hofstad et al. (2019a).

Random intersection graphs with communities. van der Hofstad et al. (2019b) propose a model that combines the random intersection graph with more general communities than complete graphs. van der Hofstad et al. (2019b) identify the local limit, as well as the nature of the overlaps between different communities. van der Hofstad et al. (2019a) identify the giant component, also when performing percolation on the model. See Vadon et al. (2019) for an informal description of the model, aimed at a broad audience.

Exponential random graphs. For a general introduction to exponential random graphs, we refer to Snijders et al. (2006); Wasserman and Pattison (1996). Frank and Strauss (1986) discuss the notion of Markov graphs, for which the edges of the graph form a Markov field. The general exponential random graph is only a Markov field when the subgraphs are restricted to edges, stars of any kind, and triangles. This is exemplified by Example 9.21, where general degrees were used and gave rise to a model with independent edges. Kass and Wasserman (1996) discuss relations to Bayesian statistics.

For a discussion on the relation between statistical mechanics and exponential models, we refer to Jaynes (1957). Let us now explain the relation between exponential random graphs and entropy maximization. Let (p_x)_{x∈X} be a probability measure on a general discrete set X. We define its entropy by

H(p) = −∑_{x∈X} p_x log p_x. (9.6.3)

Shannon (1948) proved that the entropy is the unique quantity that is positive, increases with increasing uncertainty, and is additive for independent sources of uncertainty, so it is a very natural quantity. Entropy measures the amount of randomness in a system. The relation to exponential random graphs is that they are the random graphs that, given the conserved expected values of the subgraph counts N_F(G_n), optimize the entropy. Indeed, recall that X = (X_{i,j})_{1≤i<j≤n} are the edge statuses of the graph, so that G_n is uniquely characterized by X, and maximize H(p) over all p such that ∑_x N_F(x) p_x = α_F for some given α_F and all subgraphs F ∈ F, where F is an appropriate set of subgraphs. Then, using Lagrange multipliers, the optimization problem reduces to

p_{β⃗}(x) = (1/Z) e^{∑_{F∈F} β_F N_F(x)}, (9.6.4)

where Z = Z_n(β⃗) is the normalization constant given in (9.4.34), and β⃗ = (β_F)_{F∈F} is chosen as the solution to (9.4.35). This implies that, indeed, the exponential random graph model optimizes the entropy under this constraint.
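This entropy-maximization property can be checked numerically on a toy state space. The sketch below (all names and parameter values are ours) takes X = {0, 1, 2, 3} with a single toy statistic N(x) = x, solves for the Lagrange multiplier β by bisection so that the constraint E[N] = α holds, and compares the entropy of the resulting exponential measure with that of another feasible measure with the same mean.

```python
import math

def entropy(p):
    # H(p) = -sum_x p_x log p_x, as in (9.6.3)
    return -sum(px * math.log(px) for px in p if px > 0)

def tilted(beta, N):
    # exponential family p_beta(x) = e^{beta N(x)} / Z, as in (9.6.4)
    w = [math.exp(beta * Nx) for Nx in N]
    Z = sum(w)
    return [wx / Z for wx in w]

def solve_beta(alpha, N, lo=-50.0, hi=50.0, iters=200):
    # E_beta[N] is increasing in beta, so bisection finds the multiplier
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(p * Nx for p, Nx in zip(tilted(mid, N), N)) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

N = [0, 1, 2, 3]          # toy "subgraph count" N(x) = x on X = {0, 1, 2, 3}
alpha = 1.2               # imposed expected value of N
p_star = tilted(solve_beta(alpha, N), N)
q = [0.3, 0.3, 0.3, 0.1]  # another measure with the same mean 1.2
```

Here entropy(p_star) exceeds entropy(q), illustrating that the exponential tilt is the entropy maximizer among all measures satisfying the same moment constraint.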

See also Kass and Wasserman (1996) for a discussion of maximum entropy, and a reference to its long history as well as critique of the method.

An important question is how one can find the appropriate β⃗ = (β_F)_{F∈F} such that (9.4.35) holds. This is particularly difficult, since the computation of the normalization constant Z_n(β⃗) in (9.4.34) is quite hard. Often, Markov chain Monte Carlo (MCMC) techniques are used to sample from p efficiently. In this case, such MCMC techniques perform a form of Glauber dynamics, for which (9.4.33) is the stationary distribution. One can then try to solve (9.4.35) by keeping track of the value of N_F in the simulation. However, these methods can be very slow, as well as daunting, since the behavior of N_F(X) under p = p_{β⃗} may undergo a phase transition, making ∑_x N_F(x) p_{β⃗}(x) very sensitive to small changes of β⃗. See in particular Chatterjee and Diaconis (2013) for a discussion of this topic.
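As an illustration of such a sampler, here is a minimal single-edge Glauber dynamics for a toy exponential random graph with edge and triangle statistics on a few vertices. The function names, parameter values, and vertex count are ours; this is a sketch of the mechanism, not a production MCMC scheme.

```python
import math
import random

def glauber_step(adj, n, beta_edge, beta_tri, rng):
    """Resample one edge from its conditional law given the rest of the
    graph: the edge {i, j} is present with probability e^h / (1 + e^h),
    where h = beta_edge + beta_tri * (#triangles created by {i, j})."""
    i, j = rng.sample(range(n), 2)
    d_tri = len(adj[i] & adj[j])  # common neighbors = triangles through {i, j}
    h = beta_edge + beta_tri * d_tri
    if rng.random() < 1.0 / (1.0 + math.exp(-h)):
        adj[i].add(j); adj[j].add(i)
    else:
        adj[i].discard(j); adj[j].discard(i)

def sample_ergm(n=8, beta_edge=-1.0, beta_tri=0.2, steps=20000, seed=1):
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for _ in range(steps):
        glauber_step(adj, n, beta_edge, beta_tri, rng)
    return adj
```

Tracking the number of edges and triangles along such a run is the naive route towards solving (9.4.35); the slow mixing and phase-transition issues mentioned above already show up for moderate values of beta_tri.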

Bhamidi et al. (2011) (see also Bhamidi et al. (2008)) have studied the mixing time of the exponential random graph when edges are changed dynamically in a Glauber way. The results are somewhat disappointing, since either the edges are close to being i.i.d. or the mixing is very slow. These results, however, apply only to dense settings, where the number of edges grows quadratically with the number of vertices. This problem is closely related to large deviations for the dense Erdos-Renyi random graph. See also Chatterjee (2017) for background on such large deviations, and Chatterjee and Varadhan (2011) for the original paper.

For more background on sufficient statistics and their relation to symmetries, we refer to Diaconis (1992). For a discussion of the relation between information theory and exponential models, we refer to Shore and Johnson (1980).

Notes on Section 9.5.

The Newman-Watts small-world model. There are various ways of adding long-range connections (for example, by rewiring the existing edges), and we shall focus on the models in Barbour and Reinert (2001, 2004, 2006), for which the strongest mathematical results have been obtained. Small-world models were first introduced and analyzed in Moore and Newman (2000); Newman et al. (2000a); Newman and Watts (1999), and a non-rigorous mean-field analysis of distances in small-world models was performed in Newman et al. (2000a). See Barbour and Reinert (2001) for a discussion of the differences between the exact and mean-field analyses.

Theorem 9.22 is (Barbour and Reinert, 2001, Theorem 3.9). The proof of Theorem 9.22 was extended by Barbour and Reinert (2006) to deal with discrete tori where the shortcuts also contribute one to the graph distance, so that distances are the usual distances on discrete graphs.

Theorem 9.23 follows from (Barbour and Reinert, 2006, Theorem 2.1), which even allows ρ to grow with n (though not too quickly).


A related model was considered by Turova and Vallier (2010). Indeed, Turova and Vallier (2010) study a mixture between subcritical percolation on a finite cube and the Erdos-Renyi random graph. Using the methodology of Bollobas et al. (2007), it is shown that the phase transition is similar to the one described in Theorem 3.16. It would be of interest to verify whether the distance results in Bollobas et al. (2007) can also be used to prove that the distances grow like log_ν n, where n is the size of the graph and ν > 1 an appropriate constant.

Hyperbolic random graph. The hyperbolic random graph was introduced by Krioukov et al. (2010). The first rigorous results were proved by Gugelmann et al. (2012), who proved Theorem 9.24 and identified the exact asymptotic degree distribution (see (Gugelmann et al., 2012, Theorem 2.2)), as well as the positivity of the clustering in the graph (see (Gugelmann et al., 2012, Theorem 2.1)). The statement about the maximal degree in the hyperbolic graph is stated in (Gugelmann et al., 2012, Theorem 2.4). The sharpest results on the degree distribution and clustering were proved by Fountoulakis et al. (2020), who identify the exact clustering coefficient in a quite long and technical paper, and who also proved Theorem 9.27.

Bode et al. (2015) proved Theorem 9.25. Kiwi and Mitsche (2019) study the size of the second-largest component in hyperbolic random graphs. Friedrich and Krohmer (2018) and Kiwi and Mitsche (2015) (see also Friedrich and Krohmer (2015)) study the diameter of hyperbolic graphs. Abdullah et al. (2017) identify the ultra-small-world properties of the hyperbolic random graph in Theorem 9.26.

Blasius et al. (2018) study the size of the largest cliques in hyperbolic graphs.

Geometric inhomogeneous random graph. The geometric inhomogeneous random graph was introduced by Bringmann et al. (2015, 2016). The relation between the hyperbolic random graph and the general GIRG as described in Theorem 9.29 can be found in (Bringmann et al., 2017, Section 7), where limits are derived up to constants. Theorem 9.29 is proved in (Komjathy and Lodewijks, 2020, Section 7), who also study its weighted distances, focussing on the case where τ ∈ (2, 3). Theorem 9.31 is proved by Komjathy and Lodewijks (2020). In more detail, a coupling version of Theorem 9.31 is stated in (Komjathy and Lodewijks, 2020, Claim 3.3), where a blown-up version of the GIRG is bounded from below and above by the limiting model with slightly smaller and larger intensities, respectively. Take a vertex uniformly at random in the GIRG. Then, whp, it is also present in the lower and upper bounding limiting Poisson infinite GIRGs with edge probabilities given by h. Similarly, whp, none of the edges within a ball of intrinsic radius r will be different in the three models, which proves local weak convergence. Local convergence in probability would follow from a coupling of the neighborhoods of two uniformly chosen vertices in the GIRG to two independent limiting copies. Such independence is argued in (Komjathy and Lodewijks, 2020, Proof of Theorem 2.12), in particular around (Komjathy and Lodewijks, 2020, (3.16)), even though this result is not stated there explicitly.

Local weak limits as arising in Theorem 9.31, in turn, were studied by Hirsch (2017). Fountoulakis (2015) studied an early version of a geometric Chung-Lu model.

Scale-free percolation. Graph distances in scale-free percolation have been investigated in Deijfen et al. (2013); Heydenreich et al. (2016) by identifying the number of edges between x and y as a function of |x−y| for x, y in the infinite component. There is some follow-up work on scale-free percolation. Hirsch (2017) proposes a continuum model for scale-free percolation. Deprez et al. (2015) argue that scale-free percolation can be used to model real-life networks. Bringmann et al. (2015, 2016) study this model on a torus and in continuum space, and coin the name geometric inhomogeneous random graphs. Heydenreich et al. (2016) establish recurrence and transience criteria. Deprez et al. (2015) show that when α ∈ (d, 2d), the percolation function is continuous. For long-range percolation, this was proved by Berger (2002). However, in full generality, continuity of the percolation function at λ = λc when λc > 0 is unknown.

Spatial preferential attachment models. Jordan (2013) studies preferential attachment models where the vertices are located in a general metric space. Flaxman et al. (2006, 2007) study geometric preferential attachment models. Aiello et al. (2008) give the interpretation of spatial preferential attachment models in terms of influence regions.

For a relation between preferential attachment graphs with so-called fertility and aging, and a geometric competition-induced growth model for networks, we refer to Berger et al. (2004, 2005) and the references therein.

Complex network models on the hypercubic lattice. Spatial configuration models on the lattice were introduced by Deijfen and Jonasson (2006); see also Deijfen and Meester (2006). Deijfen (2009) studies a related model where the vertices are a Poisson point process on R^d.

9.7 Exercises for Chapter 9

Exercise 9.1 (Topology of the strongly connected component for digraphs) Let G be a digraph. Prove that if u and v are such that u is connected to v and v is connected to u, then the strongly connected components of u and v are the same.

Exercise 9.2 (Sum of out- and in-degrees of a digraph agree) Let G be a digraph for which d_v^{(out)} and d_v^{(in)} denote the out- and in-degree of v ∈ V(G). Show that

∑_{v∈V(G)} d_v^{(out)} = ∑_{v∈V(G)} d_v^{(in)}. (9.7.1)
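Both sums in (9.7.1) count every directed edge exactly once, which a quick check in code (names ours) confirms on a small example:

```python
def degree_sums(vertices, edges):
    """Sum of out-degrees and sum of in-degrees of a digraph, given as a
    list of directed edges (u, v) meaning u -> v."""
    d_out = {v: 0 for v in vertices}
    d_in = {v: 0 for v in vertices}
    for u, v in edges:
        d_out[u] += 1  # edge leaves u
        d_in[v] += 1   # the same edge enters v
    return sum(d_out.values()), sum(d_in.values())

edges = [(0, 1), (1, 2), (2, 0), (0, 2), (3, 0)]
s_out, s_in = degree_sums(range(4), edges)
# both sums equal the number of directed edges
```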

Exercise 9.3 (LWC for randomly directed graphs) Let (Gn)n≥1 be a random graph sequence that converges in probability in the local weak sense. Give each edge e a random orientation, by orienting e = {u, v} as e = (u, v) with probability 1/2 and as e = (v, u) with probability 1/2, independently across edges. Show that the resulting digraph converges in probability in the marked forward and backward and forward-backward local weak convergence sense.


Exercise 9.4 (LWC for randomly directed graphs (Cont.)) In the setting of Exercise 9.3, assume that the convergence of (Gn) is in distribution in the local weak sense. Conclude that the resulting digraph converges in distribution in the marked forward and backward and forward-backward local weak convergence sense.

Exercise 9.5 (LWC for directed version of PA_n^{(m,δ)}(d)) Consider the edges in PA_n^{(m,δ)}(d) to be oriented from young to old, so that the resulting digraph has out-degree m and random in-degrees. Use Theorem 5.8 to show that this digraph converges in probability in the marked forward and backward and forward-backward local weak convergence sense.

Exercise 9.6 (Power-law lower bound for PageRank on directed version of PA_n^{(m,δ)}(d)) Recall the directed version of PA_n^{(m,δ)}(d) in Exercise 9.5. Use (9.2.10) to show that there exists a constant c = c(α, δ, m) > 0 such that

P(R_∅ > r) ≥ c r^{−τ}, where τ = 3 + δ/m (9.7.2)

is the power-law exponent of PA_n^{(m,δ)}(d). What does this say about the PageRank power-law hypothesis for the directed version of PA_n^{(m,δ)}(d)?

Exercise 9.7 (Power-law lower bound for PageRank on digraphs with bounded out-degrees) Let (Gn)n≥1 be a random digraph sequence that converges in probability in the marked backward local weak sense to (D, ∅). Assume that there exist 0 < a, b < ∞ such that d_v^{(out)} ∈ [a, b] for all v ∈ V(Gn). Assume that P(D_∅^{(in)} > r) ≥ c r^{−γ} for some γ > 0. Use (9.2.10) to show that there exists a constant c′ > 0 such that P(R_∅ > r) ≥ c′ r^{−γ}.

Exercise 9.8 (Mean number of edges in DGRGn(w)) Consider the directed generalized random graph, as formulated in (9.2.14)–(9.2.15). Assume that the weight-regularity condition in (9.2.16) holds. Let Xij be the indicator that there is a directed edge from i to j (with Xii = 0 for all i ∈ [n] by convention). Show that

(1/n) E[∑_{i,j∈[n]} X_{ij}] → 2 E[W^{(in)}] E[W^{(out)}] / E[W^{(in)} + W^{(out)}]. (9.7.3)

Assume further that the symmetry condition in (9.2.17) holds, and conclude that

(1/n) E[∑_{i,j∈[n]} X_{ij}] → E[W^{(in)}] = E[W^{(out)}]. (9.7.4)

Exercise 9.9 (Number of edges in DGRGn(w)) In the setting of Exercise 9.8, show that

(1/n) ∑_{i,j∈[n]} X_{ij} −P→ 2 E[W^{(in)}] E[W^{(out)}] / E[W^{(in)} + W^{(out)}]. (9.7.5)

Assume further that the symmetry condition in (9.2.17) holds, and conclude that

(1/n) ∑_{i,j∈[n]} X_{ij} −P→ E[W^{(in)}] = E[W^{(out)}]. (9.7.6)


Exercise 9.10 (Local weak limit of directed Erdos-Renyi random graph) Use Theorem 9.2 to describe the local weak limit of the directed Erdos-Renyi random graph.

Exercise 9.11 (LWC for finite-type directed inhomogeneous random graphs) Adapt the proof of Theorem 3.11 to prove Theorem 9.2 in the case of finite-type kernels. Here, we recall that a kernel κ is called finite type when (s, t) ↦ κ(s, t) takes on finitely many values.

Exercise 9.12 (LWC for DGRGn(w)) Consider the directed generalized random graph, as formulated in (9.2.14)–(9.2.15). Assume that the weight-regularity condition in (9.2.16) holds. Use Theorem 9.2 to determine the local weak limit in probability of DGRGn(w). Is this local weak limit a single- or a multi-type branching process?

Exercise 9.13 (Phase transition for directed Erdos-Renyi random graph) For the directed Erdos-Renyi random graph, show that ζ in Theorem 9.3 satisfies ζ > 0 precisely when λ > 1.

Exercise 9.14 (Phase transition for directed generalized random graph) Consider the directed generalized random graph, as formulated in (9.2.14)–(9.2.15). Assume that the weight-regularity condition in (9.2.16) holds. What is the condition on the asymptotic weight distribution (W^{(out)}, W^{(in)}) in (9.2.16) that is equivalent to ζ > 0 in Theorem 9.3?

Exercise 9.15 (Correlation of out- and in-degree in a randomly directed graph) In an undirected graph G, randomly direct each edge by orienting e = {u, v} as (u, v) with probability 1/2 and as (v, u) with probability 1/2, as in Exercise 9.3. Let v ∈ V(G) be a vertex in G of degree dv. What is the correlation between its out- and in-degree in the randomly directed version of G?
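Before computing the answer, one can explore the joint law numerically. The sketch below (names ours) orients the d edges incident to a fixed vertex with independent fair coins and estimates the correlation of its out- and in-degree empirically.

```python
import random

def orientation_correlation(d, trials=10000, seed=0):
    """Empirical correlation between out- and in-degree of a vertex of
    degree d whose incident edges get independent fair-coin orientations."""
    rng = random.Random(seed)
    outs = [sum(rng.random() < 0.5 for _ in range(d)) for _ in range(trials)]
    ins = [d - o for o in outs]  # every non-outgoing incident edge points inwards
    mo = sum(outs) / trials
    mi = sum(ins) / trials
    cov = sum((o - mo) * (i - mi) for o, i in zip(outs, ins)) / trials
    var_o = sum((o - mo) ** 2 for o in outs) / trials
    var_i = sum((i - mi) ** 2 for i in ins) / trials
    return cov / (var_o * var_i) ** 0.5
```

Since the in-degree is determined by the out-degree once dv is fixed, the simulation already suggests what the exact answer must be.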

Exercise 9.16 (LWC for DCMn(d) in Theorem 9.4) Give a proof of Theorem 9.4 by suitably adapting the proof of Theorem 4.1.

Exercise 9.17 (One-sided law of large numbers for SCC) Adapt the proof of Corollary 2.25 to show that when Gn = ([n], E(Gn)) converges in probability in the local weak sense to (G, o) having distribution µ, then, for every fixed ε > 0,

P(|Cmax| ≤ n(ζ + ε)) → 1, (9.7.7)

where ζ = µ(|C(o)| = ∞) is the forward-backward survival probability of the limiting graph (G, o).

Exercise 9.18 (Subcritical directed configuration model) Let DCMn(d) be a directed configuration model that satisfies the degree-regularity conditions. Let Cmax denote its largest strongly connected component. Use Exercise 9.17 to show that |Cmax|/n −P→ 0 when ζ = 0. This proves the subcritical result in Theorem 9.5(b).

Exercise 9.19 (The strongly connected component in temporal networks) Let G be a temporal network, in which vertices have a time label of their birth and edges are oriented from younger to older vertices. What do the strongly connected components of G look like?

Exercise 9.20 (Degree structure in stochastic blockmodels) Recall the definition of the stochastic blockmodel in Section 9.3.1, and assume that (9.3.1) holds. What is the asymptotic expected degree of this model? When do all vertices have the same asymptotic expected degree?

Exercise 9.21 (The giant in stochastic blockmodels) Recall the definition of the stochastic blockmodel in Section 9.3.1, and assume that (9.3.1) holds. When is there a giant component?

Exercise 9.22 (Degree structure in stochastic blockmodels with unequal expected degrees) Let n be even. Consider the stochastic blockmodel with 2 types and n/2 vertices of each of the types. Let pij = a1/n when i, j both have type 1, pij = a2/n when i, j both have type 2, and pij = b/n when i, j have different types. For i ∈ {1, 2} and k ∈ N0, let Ni,k(n) denote the number of vertices of type i and degree k. Show that

Ni,k(n)/n −P→ P(Poi(λi) = k), (9.7.8)

where λi = (ai + b)/2.
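The convergence in (9.7.8) can be illustrated by simulation. The sketch below (parameter values and names ours) builds one realization of this two-type blockmodel and compares the empirical mean degree of the type-1 vertices with λ1 = (a1 + b)/2.

```python
import random

def sbm_degrees(n, a1, a2, b, seed=0):
    """One realization of the two-type stochastic blockmodel with n/2
    vertices of each type; returns the lists of types and degrees."""
    rng = random.Random(seed)
    types = [1] * (n // 2) + [2] * (n // 2)
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if types[i] == types[j]:
                p = (a1 if types[i] == 1 else a2) / n  # within-type probability
            else:
                p = b / n                              # between-type probability
            if rng.random() < p:
                deg[i] += 1
                deg[j] += 1
    return types, deg

types, deg = sbm_degrees(n=2000, a1=6.0, a2=2.0, b=4.0, seed=42)
mean1 = sum(d for t, d in zip(types, deg) if t == 1) / (2000 // 2)
# mean1 should be close to (a1 + b) / 2 = 5
```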

Exercise 9.23 (Community detection in stochastic blockmodels with unequal expected degrees) In the setting of Exercise 9.22, assume that a1 > a2. Consider the following greedy community detection algorithm: let π(i) = 1 for the n/2 vertices of highest degree, and π(i) = 2 for the remaining vertices (breaking ties randomly when necessary). Argue that this algorithm achieves the solvability condition in (9.3.2).

Exercise 9.24 (Parameter conditions for solvable stochastic blockmodels) Consider the stochastic blockmodel in the setting of Theorem 9.9, and assume that (a − b)² > 2(a + b), so that the community detection problem is solvable. Show that a − b > 2, and thus also a + b > 2. Conclude that this model has a giant. Show that the restriction to the vertices of type 1 (resulting in an Erdos-Renyi random graph of size n/2 and edge probability a/n) also has a giant. What are the conditions for the vertices of type 2 to have a giant?

Exercise 9.25 (Degree structure in degree-corrected stochastic blockmodels) Recall the definition of the degree-corrected stochastic blockmodel in (9.3.11) in Section 9.3.2, and assume that (9.3.1) holds. What is the asymptotic expected degree of a vertex v of weight xv in this model? What are the restrictions on (κi,j)i,j∈S such that the expected degree of vertex v with weight xv is equal to xv(1 + o(1))?

Exercise 9.26 (Equal average degrees in degree-corrected stochastic blockmodels) Recall the definition of the degree-corrected stochastic blockmodel in (9.3.11) in Section 9.3.2, and assume that (9.3.1) holds. Let r ≥ 2 be arbitrary, and assume that κi,j = b for all i ≠ j, while κi,i = a. Assume that µi = 1/r for every i ∈ [r]. Compute the asymptotic average degree of a vertex of type i, and show that it is independent of i.

Exercise 9.27 (The giant in degree-corrected stochastic blockmodels) Recall the definition of the degree-corrected stochastic blockmodel in Section 9.3.2, and assume that (9.3.1) holds. When is there a giant component?

Exercise 9.28 (Degrees in configuration models with global communities) Recall the definition of the configuration models with global communities in Section 9.3.3, and assume that (9.3.18), (9.3.19) and (9.3.20) hold. What is the asymptotic expected degree of this model? When do all vertices have the same asymptotic expected degree?

Exercise 9.29 (Local limit in configuration models with global communities) Recall the definition of the configuration models with global communities in Section 9.3.3, and assume that (9.3.18), (9.3.19) and (9.3.20) hold. What do you think the local limit of this model is? [Note: No proof is expected, but a reasonable argument.]

Exercise 9.30 (Giant in configuration models with global communities) Recall the definition of the configuration models with global communities in Section 9.3.3, and assume that (9.3.18), (9.3.19) and (9.3.20) hold. When do you think there is a giant component? [Note: No proof is expected, but a reasonable argument.]

Exercise 9.31 (Degree distribution in preferential attachment model with global communities) Show that (p_k(θ))_{k≥m} in (9.3.28) is a probability distribution for all θ, i.e., show that ∑_{k≥m} p_k(θ) = 1.

Exercise 9.32 (Global degree distribution in preferential attachment model with global communities) In the preferential attachment models with global communities studied in Theorem 9.11, show that the global degree distribution given by P_k^{(n)} = (1/n) ∑_{v∈[n]} 1_{{D_v(n)=k}} also converges almost surely.

Exercise 9.33 (Global degree distribution in preferential attachment model with global communities) In the preferential attachment models with global communities studied in Theorem 9.11, show that the global degree distribution has a power-law tail with exponent τ = 1 + 1/max_{s∈[r]} θ*_s, provided that µs > 0 for all s ∈ [r].

Exercise 9.34 (Clustering in model with edges and triangles) Show that the global clustering coefficient in the model where each pair of vertices is independently connected with probability λ/n, as for ERn(λ/n), and each triple forms a triangle with probability µ/n², independently for all triples and independently of the status of the edges, converges to µ/(µ + λ²).

Exercise 9.35 (Local limit in inhomogeneous random graph with communities) Recall the definition of the inhomogeneous random graph with communities in Section 9.4.1. What do you think the local limit of this model is? [Note: No proof is expected, but a reasonable argument.]

Exercise 9.36 (Size-biased community size distribution in HCM) In the hierarchical configuration model introduced in Section 9.4.2, choose a vertex o_n uniformly at random from [n]. Let G_{o_n} be the community containing o_n. Show that (9.4.17) implies that |V(G_{o_n})| converges in distribution, and identify its limiting distribution.

Exercise 9.37 (Local limit in hierarchical configuration model) Recall the definition of the hierarchical configuration model in Theorem 9.16. What do you think the local limit of this model is? [Note: No proof is expected, but a reasonable argument.]

Exercise 9.38 (Law of large numbers for |Cmax| in hierarchical configuration model) Use Theorem 4.4 to prove the law of large numbers for the giant in the hierarchical configuration model in Theorem 9.16, and prove that ζ is given by (9.4.20).

Exercise 9.39 (Degree moments in scale-free percolation, Deijfen et al. (2013)) Show that E[D_0^p] < ∞ when p < γ and E[D_0^p] = ∞ when p > γ. In particular, the variance of the degrees is finite precisely when γ > 2.

Exercise 9.40 (Positive correlation between edge statuses in scale-free percolation) Show that, for scale-free percolation, and for all x, y, z distinct and λ > 0,

P({x, y} and {x, z} occupied) ≥ P({x, y} occupied) P({x, z} occupied), (9.7.9)

the inequality being strict when P(W0 = 0) < 1. In other words, the edge statuses are positively correlated.


Appendix A

Some facts about the metric spacestructure of rooted graphs


Abstract

In this section, we highlight some properties and results about metric spaces, including separable metric spaces and Borel measures on them. These results are used throughout Chapters 1–3. We also present some of the missing details in the proof that the space of rooted graphs is a separable metric space, and in fact a Polish space. Finally, we discuss what compact sets look like in this topology, and relate this to tightness criteria.

A.1 Metric spaces

In this section, we discuss metric spaces. We start by defining what a metric is:

Definition A.1 (Distance metric) Let X be a space. A distance on X is a function d : X × X → [0, ∞) such that

(a) 0 ≤ d(x, y) < ∞ for all x, y ∈ X;

(b) d(x, y) = d(y, x) for all x, y ∈ X;

(c) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X.

Definition A.2 (Metric space) Let X be a space and d a metric on it. Then (X, d) is called a metric space.

We next discuss some desirable properties of metric spaces:

Definition A.3 (Complete metric space) Let (X, d) be a metric space. We say that (X, d) is complete when every Cauchy sequence has a limit. Here, a Cauchy sequence is a sequence (xn)n≥1 with xn ∈ X such that, for every ε > 0, there exists an N = N(ε) such that d(xn, xm) ≤ ε for all n, m ≥ N.

Definition A.4 (Separable metric space) Let (X, d) be a metric space. We say that (X, d) is separable if it contains a countable subset that is dense in (X, d). This means that there exists a countable set A ⊆ X such that, for every x ∈ X, there exists a sequence (an)n≥1 with an ∈ A such that d(an, x) → 0.

Definition A.5 (Polish space) The metric space (X, d) is called Polish when it is separable and complete.


A.2 Properties of the metric d_{G⋆} on rooted graphs

For (G, o) ∈ G⋆, let

[G, o] = {(G′, o′) : (G′, o′) ≃ (G, o)} (A.2.1)

denote the equivalence class in G⋆ corresponding to (G, o). We further let

[G⋆] = {[G, o] : (G, o) ∈ G⋆} (A.2.2)

denote the set of equivalence classes in G⋆. This will be the set on which the distance d_{G⋆} acts. In this section, we prove that ([G⋆], d_{G⋆}) is a Polish space:

Theorem A.6 (Rooted graphs form a Polish space) d_{G⋆} is a well-defined metric on [G⋆]. Further, the metric space ([G⋆], d_{G⋆}) is a Polish space.

The proof of Theorem A.6 is divided into several steps. We start in Proposition A.8 by showing that $d_{\mathcal{G}_\star}$ is an ultrametric, which is a slightly stronger property than being a metric, and which also implies that $(\mathcal{G}_\star,d_{\mathcal{G}_\star})$ is a metric space. In Proposition A.10, we show that the metric space $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$ is complete, and in Proposition A.12, we show that it is separable. After that, we complete the proof of Theorem A.6.
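To see why ultrametricity is slightly stronger than being a metric, note that for non-negative numbers the maximum is bounded by the sum, so the ultrametric inequality immediately yields the triangle inequality:

```latex
% max{a,b} <= a + b for a, b >= 0, applied to the ultrametric bound:
d_{\mathcal{G}_\star}\big((G_1,o_1),(G_3,o_3)\big)
  \le \max\big\{d_{\mathcal{G}_\star}((G_1,o_1),(G_2,o_2)),\,
               d_{\mathcal{G}_\star}((G_2,o_2),(G_3,o_3))\big\}
  \le d_{\mathcal{G}_\star}((G_1,o_1),(G_2,o_2))
    + d_{\mathcal{G}_\star}((G_2,o_2),(G_3,o_3)).
```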

In the remainder of this section, we will often work with the $r$-neighborhood $B_r^{(G)}(o)$ of $o$ in $G$. We emphasize that we consider $B_r^{(G)}(o)$ to be a rooted graph, with root $o$ (recall (2.1.1)).
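For intuition, the metric can be computed explicitly on finite rooted trees, where isomorphism of $r$-balls (which for a tree are its depth-$r$ truncations) reduces to equality of the classical AHU canonical codes. The following sketch is not from the book: the adjacency-dictionary representation and the helper names are my own illustration; the formula $d=1/(r^\star+1)$, with $r^\star$ the largest radius at which the balls agree, is the one recalled in (2.1.2).

```python
# Sketch (not from the book): the local metric between two finite rooted
# TREES, using AHU canonical codes so that isomorphism of depth-r
# truncations (the r-balls of a tree) becomes string equality.
def canon(adj, v, parent, depth):
    """AHU canonical code of the subtree below v, truncated at `depth`."""
    if depth == 0:
        return "()"
    kids = sorted(canon(adj, w, v, depth - 1)
                  for w in adj[v] if w != parent)
    return "(" + "".join(kids) + ")"

def d_star(adj1, o1, adj2, o2):
    """1/(r*+1), with r* the largest r such that the r-balls agree."""
    r_max = max(len(adj1), len(adj2))  # balls stabilise beyond the tree size
    for r in range(r_max + 1):
        if canon(adj1, o1, None, r) != canon(adj2, o2, None, r):
            return 1.0 / r  # balls agree up to radius r-1, so r* = r - 1
    return 0.0              # all balls agree: the rooted trees are isomorphic
```

For example, a path on four vertices and a path on three vertices, each rooted at an end, have isomorphic balls up to radius $2$ and first differ at radius $3$, giving distance $1/3$.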

A.2.1 Ultrametric property of $d_{\mathcal{G}_\star}$

In this section, we prove that $d_{\mathcal{G}_\star}\colon\mathcal{G}_\star\times\mathcal{G}_\star\to[0,1]$ is an ultrametric. One of the problems that we have to resolve is that the space of rooted graphs is only defined up to isomorphisms, which means that we have to make sure that $d_{\mathcal{G}_\star}((G_1,o_1),(G_2,o_2))$ is independent of the exact representative we choose in the equivalence classes of $(G_1,o_1)$ and $(G_2,o_2)$. That is the content of the following proposition:

Proposition A.7 ($d_{\mathcal{G}_\star}$ is well-defined on $[\mathcal{G}_\star]$) The equality $d_{\mathcal{G}_\star}((G_1,o_1),(G_2,o_2))=d_{\mathcal{G}_\star}((\bar G_1,\bar o_1),(\bar G_2,\bar o_2))$ holds whenever $(G_1,o_1)\simeq(\bar G_1,\bar o_1)$ and $(G_2,o_2)\simeq(\bar G_2,\bar o_2)$. Consequently, $d_{\mathcal{G}_\star}\colon[\mathcal{G}_\star]\times[\mathcal{G}_\star]\to[0,1]$ is well-defined.

Proposition A.8 (Ultrametricity) The map $d_{\mathcal{G}_\star}\colon\mathcal{G}_\star\times\mathcal{G}_\star\to[0,1]$ is an ultrametric, meaning that

(a) $d_{\mathcal{G}_\star}((G_1,o_1),(G_2,o_2))=0$ precisely when $(G_1,o_1)\simeq(G_2,o_2)$;

(b) $d_{\mathcal{G}_\star}((G_1,o_1),(G_2,o_2))=d_{\mathcal{G}_\star}((G_2,o_2),(G_1,o_1))$ for all $(G_1,o_1),(G_2,o_2)\in\mathcal{G}_\star$;

(c) $d_{\mathcal{G}_\star}((G_1,o_1),(G_3,o_3))\le\max\{d_{\mathcal{G}_\star}((G_1,o_1),(G_2,o_2)),\,d_{\mathcal{G}_\star}((G_2,o_2),(G_3,o_3))\}$ for all $(G_1,o_1),(G_2,o_2),(G_3,o_3)\in\mathcal{G}_\star$.

Before giving the proof of Propositions A.7–A.8, we state and prove an important ingredient in them:



Lemma A.9 (Local neighborhoods determine the graph) Let $(G_1,o_1)$ and $(G_2,o_2)$ be two connected locally finite rooted graphs such that $B_r^{(G_1)}(o_1)\simeq B_r^{(G_2)}(o_2)$ for all $r\ge0$. Then $(G_1,o_1)\simeq(G_2,o_2)$.

Proof We use a subsequence argument. Fix $r\ge0$, and consider the isomorphism $\phi_r\colon B_r^{(G_1)}(o_1)\to B_r^{(G_2)}(o_2)$, which exists by assumption. Extend $\phi_r$ to $(G_1,o_1)$ by defining
$$\psi_r(v)=\begin{cases}\phi_r(v)&\text{for }v\in V(B_r^{(G_1)}(o_1));\\ o_2&\text{otherwise.}\end{cases}\qquad(A.2.3)$$
Our aim is to use $(\psi_r)_{r\ge0}$ to construct an isomorphism between $(G_1,o_1)$ and $(G_2,o_2)$.

Abbreviate $V_r^{(G_1)}=V(B_r^{(G_1)}(o_1))$. Let $\psi_r|_{V_0^{(G_1)}}$ be the restriction of $\psi_r$ to $V_0^{(G_1)}=\{o_1\}$. Then we know that $\psi_r(v)=o_2$ for every $v\in V_0^{(G_1)}$ and $r\ge0$. We next let $\psi_r|_{V_1^{(G_1)}}$ be the restriction of $\psi_r$ to $V_1^{(G_1)}$. Then $\psi_r|_{V_1^{(G_1)}}$ is an isomorphism between $B_1^{(G_1)}(o_1)$ and $B_1^{(G_2)}(o_2)$ for every $r$. Since there are only finitely many such isomorphisms, the same isomorphism $\phi_1'$ needs to be repeated infinitely many times in the sequence $(\psi_r|_{V_1^{(G_1)}})_{r\ge1}$. Let $\mathcal{N}_1$ denote the set of values of $r$ for which
$$\psi_r|_{V_1^{(G_1)}}=\phi_1'\qquad\forall r\in\mathcal{N}_1.\qquad(A.2.4)$$

Now we repeat this argument for $k=2$. Let $\psi_r|_{V_2^{(G_1)}}$ be the restriction of $\psi_r$ to $V_2^{(G_1)}$. Again, $\psi_r|_{V_2^{(G_1)}}$ is an isomorphism between $B_2^{(G_1)}(o_1)$ and $B_2^{(G_2)}(o_2)$ for every $r$. Since there are again only finitely many such isomorphisms, the same isomorphism $\phi_2'$ needs to be repeated infinitely many times in the sequence $(\psi_r|_{V_2^{(G_1)}})_{r\in\mathcal{N}_1}$. Let $\mathcal{N}_2$ denote the set of values of $r\in\mathcal{N}_1$ for which
$$\psi_r|_{V_2^{(G_1)}}=\phi_2'\qquad\forall r\in\mathcal{N}_2.\qquad(A.2.5)$$

We now generalize this argument to general $k\ge2$. Let $\psi_r|_{V_k^{(G_1)}}$ be the restriction of $\psi_r$ to $V_k^{(G_1)}$. Again, $\psi_r|_{V_k^{(G_1)}}$ is an isomorphism between $B_k^{(G_1)}(o_1)$ and $B_k^{(G_2)}(o_2)$ for every $r$. Since there are again only finitely many such isomorphisms, the same isomorphism $\phi_k'$ needs to be repeated infinitely many times in the sequence $(\psi_r|_{V_k^{(G_1)}})_{r\in\mathcal{N}_{k-1}}$. Let $\mathcal{N}_k$ denote the set of values of $r\in\mathcal{N}_{k-1}$ for which
$$\psi_r|_{V_k^{(G_1)}}=\phi_k'\qquad\forall r\in\mathcal{N}_k.\qquad(A.2.6)$$
Then, we see that $(\mathcal{N}_k)_{k\ge1}$ is a decreasing sequence of infinite sets.

Let us define $\psi_\ell'$ to be the first element of the sequence $(\psi_r)_{r\in\mathcal{N}_\ell}$. Then, it follows that $\psi_\ell'(v)=\phi_k'(v)$ for all $\ell\ge k$ and all $v\in V_k^{(G_1)}$. Denote $U_0=\{o_1\}$ and $U_k=V_k^{(G_1)}\setminus V_{k-1}^{(G_1)}$. Since we assume that $G_1$ is connected, we have that $\cup_{k\ge0}U_k=V(G_1)$. It follows that the functions $(\psi_\ell')_{\ell\ge1}$ converge pointwise to
$$\psi(v)=\psi_\infty'(v)=\sum_{k\ge1}\phi_k'(v)\mathbb{1}_{\{v\in U_k\}}.\qquad(A.2.7)$$


We claim that $\psi$ is the desired isomorphism between $(G_1,o_1)$ and $(G_2,o_2)$. The map $\psi$ is clearly bijective, since $\phi_k'\colon U_k\to\phi_k'(U_k)$ is bijective. Further, let $u,v\in V(G_1)$. Denote $k=\max\{d_{G_1}(o_1,u),d_{G_1}(o_1,v)\}$. Then $u,v\in V_k^{(G_1)}$. Because $\phi_k'$ is an isomorphism between $B_k^{(G_1)}(o_1)$ and $B_k^{(G_2)}(o_2)$, it follows that $\phi_k'(u),\phi_k'(v)\in V(B_k^{(G_2)}(o_2))$ and $\{\phi_k'(u),\phi_k'(v)\}\in E(B_k^{(G_2)}(o_2))$ precisely when $\{u,v\}\in E(B_k^{(G_1)}(o_1))$. Since $\psi=\phi_k'$ on $V_k^{(G_1)}$, it then also follows that $\{\psi(u),\psi(v)\}\in E(B_k^{(G_2)}(o_2))$ precisely when $\{u,v\}\in E(B_k^{(G_1)}(o_1))$, as required. Finally, $\psi(o_1)=\phi_k'(o_1)=o_2$ for every $k\ge0$. This completes the proof.

Proof of Proposition A.7. We note that if $(G_1,o_1)\simeq(\bar G_1,\bar o_1)$ and $(G_2,o_2)\simeq(\bar G_2,\bar o_2)$, then $B_r^{(G_1)}(o_1)\simeq B_r^{(G_2)}(o_2)$ if and only if $B_r^{(\bar G_1)}(\bar o_1)\simeq B_r^{(\bar G_2)}(\bar o_2)$. Therefore, $d_{\mathcal{G}_\star}((G_1,o_1),(G_2,o_2))$ is independent of the exact choice of the representative in the equivalence classes of $(G_1,o_1)$ and $(G_2,o_2)$, so that $d_{\mathcal{G}_\star}((G_1,o_1),(G_2,o_2))$ is constant on such equivalence classes. This makes the metric $d_{\mathcal{G}_\star}([G_1,o_1],[G_2,o_2])$ well-defined for $[G_1,o_1],[G_2,o_2]\in[\mathcal{G}_\star]$.

Proof of Proposition A.8. (a) Assume that $d_{\mathcal{G}_\star}((G_1,o_1),(G_2,o_2))=0$. Then we have $B_r^{(G_1)}(o_1)\simeq B_r^{(G_2)}(o_2)$ for all $r\ge0$, so that, by Lemma A.9, also $(G_1,o_1)\simeq(G_2,o_2)$, as required.

The proof of (b) is trivial and omitted. For (c), let
$$r_{ij}=\sup\{r\colon B_r^{(G_i)}(o_i)\simeq B_r^{(G_j)}(o_j)\}.\qquad(A.2.8)$$
Then $B_r^{(G_1)}(o_1)\simeq B_r^{(G_2)}(o_2)$ for all $r\le r_{12}$ and $B_r^{(G_2)}(o_2)\simeq B_r^{(G_3)}(o_3)$ for all $r\le r_{23}$. We conclude that $B_r^{(G_1)}(o_1)\simeq B_r^{(G_3)}(o_3)$ for all $r\le\min\{r_{12},r_{23}\}$, so that $r_{13}\ge\min\{r_{12},r_{23}\}$. This implies that
$$1/(r_{13}+1)\le\max\{1/(r_{12}+1),1/(r_{23}+1)\},$$
which implies the claim (recall (2.1.2)).

A.2.2 Completeness of $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$

In this section, we prove that $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$ is complete:

Proposition A.10 (Completeness) The metric space $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$ is complete.

Before giving the proof of Proposition A.10, we state and prove an important ingredient in it:

Lemma A.11 (Compatible rooted graph sequences have a limit) Let $((G_r,o_r))_{r\ge0}$ be connected locally finite rooted graphs that are compatible, meaning that $B_r^{(G_s)}(o_s)\simeq B_r^{(G_r)}(o_r)$ for all $r\le s$. Then there exists a connected locally finite rooted graph $(G,o)$ such that $(G_r,o_r)\simeq B_r^{(G)}(o)$ for every $r\ge0$. Moreover, $(G,o)$ is unique up to isomorphisms.

Proof The point is that the $(G_r,o_r)$ may not live on the same vertex sets. Therefore, we first construct isomorphic copies $(G_r',o_r')$ of $(G_r,o_r)$ on a common node set, in a compatible way. To do this, denote $\mathcal{V}_r=\{v\in\mathbb{N}\colon v\le N_r\}$, where $N_r=|V(B_r(o_r))|$. We define a sequence of bijections $\phi_r\colon V(B_r(o_r))\to\mathcal{V}_r$ recursively as follows. Let $\phi_0$ be the unique bijection from $V(B_0(o_0))=\{o_0\}$ to $\mathcal{V}_0=\{1\}$. Let $\psi_r$ be an isomorphism between $(G_{r-1},o_{r-1})$ and $(B_{r-1}^{(G_r)}(o_r),o_r)$, and $\eta_r$ an arbitrary bijection from $V(G_r)\setminus V(B_{r-1}^{(G_r)}(o_r))$ to $\mathcal{V}_r\setminus\mathcal{V}_{r-1}$. Define
$$\phi_r(v)=\begin{cases}\phi_{r-1}(\psi_r^{-1}(v))&\text{for }v\in V(B_{r-1}^{(G_r)}(o_r)),\\ \eta_r(v)&\text{for }v\in V(G_r)\setminus V(B_{r-1}^{(G_r)}(o_r)).\end{cases}\qquad(A.2.9)$$
Then we define $(G_r',o_r')=(\phi_r(G_r),\phi_r(o_r))$, where $\phi_r(G_r)$ is the graph consisting of the vertex set $\{\phi_r(v)\colon v\in V(G_r)\}$ and edge set $\{\{\phi_r(u),\phi_r(v)\}\colon\{u,v\}\in E(G_r)\}$. Let us derive some properties of $(G_r',o_r')$. First of all, we see that $o_r'=1$ for every $r\ge0$, by construction. Further, $\phi_r$ is a bijection, so that $B_r^{(G_r')}(o_r')\simeq B_r^{(G_r)}(o_r)=(G_r,o_r)$.

Now we are ready to define $(G,o)$. We define the root as $o=1$, the vertex set of $G$ by $V(G)=\bigcup_{r\ge1}V(G_r')$, and the edge set by $E(G)=\bigcup_{r\ge1}E(G_r')$. Then it follows that $B_r^{(G)}(o)=B_r^{(G_r')}(o_r')\simeq B_r^{(G_r)}(o_r)=(G_r,o_r)$, as required. Further, $(G,o)$ is locally finite and connected, since $(G_r,o_r)$ is for every $r\ge0$. Finally, to verify uniqueness, note that if $(G',o')$ also satisfies $B_r^{(G')}(o')\simeq B_r^{(G_r)}(o_r)=(G_r,o_r)$ for every $r\ge0$, then $B_r^{(G')}(o')\simeq B_r^{(G)}(o)$ for every $r\ge0$, so that $(G,o)\simeq(G',o')$ by Lemma A.9, as required.
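The construction in the proof can be mimicked concretely once the relabelling step has been carried out, i.e., once the compatible balls are given on nested subsets of $\mathbb{N}$ with common root $1$. A minimal sketch (my own illustration, not the book's code), in which the limit is simply the union of vertex and edge sets, as in the definition of $(G,o)$ above:

```python
# Sketch: the limit (G, o) of a compatible sequence of relabelled rooted
# graphs (G'_r, o'_r), realised as the union of their vertex and edge sets.
# Assumes the relabelling maps phi_r have already been applied, so the
# graphs live on nested subsets of {1, 2, ...} with common root 1.
def limit_graph(relabelled):
    """`relabelled` is a list of (vertices, edges) pairs; the root is 1."""
    V, E = set(), set()
    for vertices, edges in relabelled:
        V |= set(vertices)
        E |= {frozenset(e) for e in edges}
    return V, E, 1  # (V(G), E(G), o)

# Balls of the bi-infinite path Z rooted at 0, relabelled to {1,...,2r+1}:
# vertex 1 is the root, labels 2,4,... extend to the right and 3,5,...
# to the left, so consecutive balls are nested as in the proof.
def path_ball(r):
    vertices = list(range(1, 2 * r + 2))
    order = [1]
    for k in range(1, r + 1):
        order = [2 * k + 1] + order + [2 * k]
    edges = list(zip(order, order[1:]))
    return vertices, edges
```

For instance, `limit_graph([path_ball(r) for r in range(4)])` returns a path on 7 vertices rooted at its centre, which is exactly $B_3$ of $\mathbb{Z}$ rooted at the origin.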

Proof of Proposition A.10. To verify completeness, fix a Cauchy sequence $([G_n,o_n])_{n\ge1}$ with representative rooted graphs $((G_n,o_n))_{n\ge1}$. Then, for every $\varepsilon>0$, there exists an $N=N(\varepsilon)\ge0$ such that, for every $n,m\ge N$,
$$d_{\mathcal{G}_\star}([G_n,o_n],[G_m,o_m])\le\varepsilon.\qquad(A.2.10)$$
We note that, by Proposition A.7,
$$d_{\mathcal{G}_\star}([G_n,o_n],[G_m,o_m])=d_{\mathcal{G}_\star}((G_n,o_n),(G_m,o_m)),\qquad(A.2.11)$$
so from now on, we can work with the representatives instead. Since
$$d_{\mathcal{G}_\star}((G_n,o_n),(G_m,o_m))\le\varepsilon,\qquad(A.2.12)$$
we obtain that $B_r^{(G_n)}(o_n)\simeq B_r^{(G_m)}(o_m)$ for all $r\le1/\varepsilon-1$ and $n,m\ge N$. Equivalently, this implies that for every $r\ge1$, there exists an $n_r$ such that, for all $n\ge n_r$,
$$B_r^{(G_n)}(o_n)\simeq B_r^{(G_{n_r})}(o_{n_r}).\qquad(A.2.13)$$
Clearly, we may select the $n_r$ such that $r\mapsto n_r$ is strictly increasing. Define $(G_r',o_r')=B_r^{(G_{n_r})}(o_{n_r})$. Then $((G_r',o_r'))_{r\ge0}$ forms a compatible sequence. By Lemma A.11, there exists a locally finite rooted graph $(G,o)$ such that $B_r^{(G)}(o)\simeq(G_r',o_r')$. But then also
$$B_r^{(G_n)}(o_n)\simeq B_r^{(G_{n_r})}(o_{n_r})=(G_r',o_r').\qquad(A.2.14)$$
This, in turn, implies that, for all $n\ge n_r$,
$$d_{\mathcal{G}_\star}([G,o],[G_n,o_n])=d_{\mathcal{G}_\star}((G,o),(G_n,o_n))\le1/(r+1).\qquad(A.2.15)$$


Since $r\ge1$ is arbitrary, we conclude that $[G_n,o_n]$ converges to $[G,o]$, which is in $[\mathcal{G}_\star]$, so that $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$ is complete.

A.2.3 Separability of $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$

In this section, we prove that $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$ is separable:

Proposition A.12 (Separability) The metric space $(\mathcal{G}_\star,d_{\mathcal{G}_\star})$ is separable.

Proof We need to show that there exists a countable dense subset in $(\mathcal{G}_\star,d_{\mathcal{G}_\star})$. Consider the set of all finite rooted graphs, which is certainly countable. Also, $B_r^{(G)}(o)$ is a finite rooted graph for all $r\ge0$. Finally, $d_{\mathcal{G}_\star}(B_r^{(G)}(o),(G,o))\le1/(r+1)$, so that $B_r^{(G)}(o)$ converges to $(G,o)$ as $r\to\infty$. Thus, the space of finite rooted graphs is dense and countable. This completes the proof that $(\mathcal{G}_\star,d_{\mathcal{G}_\star})$ is separable.

A.2.4 $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$ is Polish: Proof of Theorem A.6

In this section, we use the above results to complete the proof of Theorem A.6:

Proof of Theorem A.6. The function $d_{\mathcal{G}_\star}$ is well-defined on $[\mathcal{G}_\star]\times[\mathcal{G}_\star]$ by Proposition A.7. Proposition A.8 implies that $d_{\mathcal{G}_\star}$ is an (ultra)metric on $[\mathcal{G}_\star]$. Finally, Proposition A.10 proves that $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$ is complete, while Proposition A.12 proves that $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$ is separable. Thus, $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$ is a Polish space.

A.2.5 The laws of neighborhoods determine distributions on $\mathcal{G}_\star$

In this section, we show that the laws of neighborhoods determine distributions on $\mathcal{G}_\star$, as was crucially used in the proof of Theorem 2.6:

Proposition A.13 (Laws of neighborhoods determine distributions) Let $\mu$ and $\mu'$ be two distributions on $\mathcal{G}_\star$ such that $\mu(B_r^{(G)}(o)\simeq H_\star)=\mu'(B_r^{(G)}(o)\simeq H_\star)$ for every $r\ge0$ and every finite rooted graph $H_\star$. Then $\mu=\mu'$.

Proof The measures $\mu$ and $\mu'$ satisfy $\mu=\mu'$ precisely when $\mu(\mathcal{H}_\star)=\mu'(\mathcal{H}_\star)$ for every measurable $\mathcal{H}_\star\subseteq\mathcal{G}_\star$. Fix $\mathcal{H}_\star\subseteq\mathcal{G}_\star$. For $r\ge0$, denote
$$\mathcal{H}_\star(r)=\{(G,o)\colon\exists(G',o')\in\mathcal{H}_\star\text{ such that }B_r^{(G)}(o)\simeq B_r^{(G')}(o')\}.\qquad(A.2.16)$$
Thus, $\mathcal{H}_\star(r)$ contains those rooted graphs whose $r$-neighborhood is the same as that of a rooted graph in $\mathcal{H}_\star$. Clearly, $\mathcal{H}_\star(r)\downarrow\mathcal{H}_\star$ as $r\to\infty$. Therefore, also $\mu(\mathcal{H}_\star(r))\downarrow\mu(\mathcal{H}_\star)$ and $\mu'(\mathcal{H}_\star(r))\downarrow\mu'(\mathcal{H}_\star)$.

Finally, note that $(G,o)\in\mathcal{H}_\star(r)$ if and only if $B_r^{(G)}(o)\in\mathcal{H}_\star(r)$. Thus,
$$\mu(\mathcal{H}_\star(r))=\sum_{H_\star\in\mathcal{H}_\star(r)}\mu(B_r^{(G)}(o)\simeq H_\star).\qquad(A.2.17)$$
Since $\mu(B_r^{(G)}(o)\simeq H_\star)=\mu'(B_r^{(G)}(o)\simeq H_\star)$, we conclude that
$$\mu(\mathcal{H}_\star(r))=\sum_{H_\star\in\mathcal{H}_\star(r)}\mu(B_r^{(G)}(o)\simeq H_\star)=\sum_{H_\star\in\mathcal{H}_\star(r)}\mu'(B_r^{(G)}(o)\simeq H_\star)=\mu'(\mathcal{H}_\star(r)).\qquad(A.2.18)$$


Letting $r\to\infty$, we conclude that $\mu(\mathcal{H}_\star)=\mu'(\mathcal{H}_\star)$, as required.

A.2.6 Compact sets in $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$ and tightness

In Theorem 2.7, we have given a tightness criterion for $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$. In this section, we investigate tightness further, by considering compact sets in the metric space $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$. First, we recall the definition of compactness:

Definition A.14 (Compact sets on general metric spaces) Let $\mathcal{X}$ be a general metric space. A set $\mathcal{K}$ is compact provided that every collection of open sets covering $\mathcal{K}$ has a finite subcover.

Next, we recall the definition of tightness:

Definition A.15 (Tightness on general metric spaces) A sequence of random variables $(X_n)_{n\ge1}$ living on a general metric space $\mathcal{X}$ is tight when, for every $\varepsilon>0$, there exists a compact set $\mathcal{K}=\mathcal{K}_\varepsilon$ such that
$$\limsup_{n\to\infty}\mathbb{P}(X_n\in\mathcal{K}_\varepsilon^c)\le\varepsilon.\qquad(A.2.19)$$

Thus, the notion of tightness is intimately related to compact sets. For real-valued random variables, compact sets can be taken of the form $\mathcal{K}=[-K,K]$. For $([\mathcal{G}_\star],d_{\mathcal{G}_\star})$, we first need to investigate what compact sets look like. This is the content of the next theorem:

Theorem A.16 (Compact sets in $(\mathcal{G}_\star,d_{\mathcal{G}_\star})$) For $(G,o)\in\mathcal{G}_\star$ and $r\ge1$, define
$$\Delta_r(G,o)=\max\{d_v^{(G)}\colon v\in V(B_r^{(G)}(o))\},\qquad(A.2.20)$$
where $d_v^{(G)}$ denotes the degree of $v\in V(G)$. Then, a closed family of rooted graphs $\mathcal{K}\subseteq\mathcal{G}_\star$ is compact if and only if
$$\sup_{(G,o)\in\mathcal{K}}\Delta_r(G,o)<\infty\qquad\text{for all }r\ge1.\qquad(A.2.21)$$

Theorem 2.7 thus states that the uniform integrability of $(d_{o_n}^{(G_n)})_{n\ge1}$ implies (A.2.21), but not vice versa. It is highly interesting that the compact sets in $(\mathcal{G}_\star,d_{\mathcal{G}_\star})$ can be described so explicitly.

Proof Recall from (Rudin, 1991, Theorem A.4) that a closed set $\mathcal{K}$ is compact when it is totally bounded, meaning that for every $\varepsilon>0$, the set $\mathcal{K}$ can be covered by finitely many balls of radius $\varepsilon$. As a result, for every $r\ge1$, there must be rooted graphs $(F_1,o_1),\ldots,(F_\ell,o_\ell)$ such that $\mathcal{K}$ is covered by the finitely many open sets
$$\{(G,o)\colon B_r^{(G)}(o)\simeq B_r^{(F_i)}(o_i)\}.\qquad(A.2.22)$$
Equivalently, every $(G,o)\in\mathcal{K}$ satisfies $B_r^{(G)}(o)\simeq B_r^{(F_i)}(o_i)$ for some $i\in[\ell]$. In turn, this is equivalent to the statement that the set
$$\mathcal{A}_r=\{B_r^{(G)}(o)\colon(G,o)\in\mathcal{K}\}\qquad(A.2.23)$$
is finite for every $r\ge1$.


We finally prove that $\mathcal{A}_r$ is finite for every $r\ge1$ precisely when (A.2.21) holds. Denote $\Delta_r=\sup_{(G,o)\in\mathcal{K}}\Delta_r(G,o)$. If $\Delta_r$ is finite for every $r\ge1$, then, because every $(G,o)\in\mathcal{K}$ is connected, the graphs $B_r^{(G)}(o)$ can have at most
$$|V(B_r^{(G)}(o))|\le1+\Delta_r+\cdots+\Delta_r^r$$
many vertices, so that $\mathcal{A}_r$ is finite. On the other hand, when $\Delta_r=\infty$, then $\mathcal{K}$ contains a sequence of rooted graphs $(G_i,o_i)$ such that $\Delta_r(G_i,o_i)\to\infty$, so that also $|V(B_{r+1}^{(G_i)}(o_i))|\to\infty$. Since rooted graphs with different numbers of vertices are non-isomorphic (recall Exercise 2.1), this shows that $\mathcal{A}_{r+1}$ is infinite.
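The counting bound $|V(B_r^{(G)}(o))|\le1+\Delta_r+\cdots+\Delta_r^r$ used in the proof is easy to check numerically. The following is a small self-contained sketch (an illustration only; the adjacency-dictionary representation and the helper names are mine, not the book's):

```python
# Sketch: verify |V(B_r(o))| <= 1 + D + ... + D^r, where D = Delta_r(G, o)
# is the maximum degree over the vertices of the r-ball, via plain BFS.
from collections import deque

def ball_vertices(adj, o, r):
    """Vertices at graph distance at most r from the root o."""
    dist = {o: 0}
    queue = deque([o])
    while queue:
        v = queue.popleft()
        if dist[v] < r:
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
    return set(dist)

def check_ball_bound(adj, o, r):
    """Compare the r-ball's size against 1 + D + ... + D^r."""
    ball = ball_vertices(adj, o, r)
    delta_r = max(len(adj[v]) for v in ball)          # Delta_r(G, o)
    bound = sum(delta_r ** k for k in range(r + 1))   # 1 + D + ... + D^r
    return len(ball) <= bound
```

For instance, on a complete binary tree of depth 3 rooted at the top, the $2$-ball has $7$ vertices, while $\Delta_2=3$ gives the bound $1+3+9=13$.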

A.3 Notes and discussion

Notes on Section A.1

We draw inspiration from Howes (1995) and Rudin (1987, 1991).

Notes on Section A.2

This section is based to a large extent on (Leskelä, 2019, Appendix B), and some parts of the presented material are copied almost verbatim from there. I am grateful to Lasse Leskelä for making me aware of the subtleties of the proof, as well as for sharing his preliminary version of these notes.


References

Abbe, E., and Sandon, C. 2018. Proof of the achievability conjectures for the general stochasticblock model. Comm. Pure Appl. Math., 71(7), 1334–1406.

Abdullah, M.A., Bode, M., and Fountoulakis, N. 2017. Typical distances in a geometric modelfor complex networks. Internet Math., 38.

Aiello, W., Bonato, A., Cooper, C., Janssen, J., and Pralat, P. 2008. A spatial web graph modelwith local influence regions. Internet Math., 5(1-2), 175–196.

Albert, R., and Barabasi, A.-L. 2002. Statistical mechanics of complex networks. Rev. ModernPhys., 74(1), 47–97.

Aldous, D. 1985. Exchangeability and related topics. Pages 1–198 of: Ecole d’ete de probabilitesde Saint-Flour, XIII—1983. Lecture Notes in Math., vol. 1117. Berlin: Springer.

Aldous, D. 1991. Asymptotic fringe distributions for general families of random trees. Ann.Appl. Probab., 1(2), 228–266.

Aldous, D., and Lyons, R. 2007. Processes on unimodular random networks. Electron. J. Probab.,12, no. 54, 1454–1508.

Aldous, D., and Steele, J.M. 2004. The objective method: probabilistic combinatorial optimiza-tion and local weak convergence. Pages 1–72 of: Probability on discrete structures. Ency-clopaedia Math. Sci., vol. 110. Springer, Berlin.

Angel, O., and Schramm, O. 2003. Uniform infinite planar triangulations. Comm. Math. Phys.,241(2-3), 191–213.

Antunovic, T., Mossel, E., and Racz, M. 2016. Coexistence in preferential attachment networks.Combin. Probab. Comput., 25(6), 797–822.

Arratia, R., Barbour, AD, and Tavare, S. 2003. Logarithmic combinatorial structures: a proba-bilistic approach. European Mathematical Society.

Athreya, K., and Ney, P. 1972. Branching processes. New York: Springer-Verlag. DieGrundlehren der mathematischen Wissenschaften, Band 196.

Athreya, K. B., Ghosh, A. P., and Sethuraman, S. 2008. Growth of preferential attachmentrandom graphs via continuous-time branching processes. Proc. Indian Acad. Sci. Math. Sci.,118(3), 473–494.

Athreya, K.B. 2007. Preferential attachment random graphs with general weight function.Internet Math., 4(4), 401–418.

Ball, F., Sirl, D., and Trapman, P. 2009. Threshold behaviour and final outcome of an epidemicon a random network with household structure. Adv. in Appl. Probab., 41(3), 765–796.

Ball, F., Sirl, D., and Trapman, P. 2010. Analysis of a stochastic SIR epidemic on a randomnetwork incorporating household structure. Math. Biosci., 224(2), 53–73.

Barabasi, A.-L. 2002. Linked: The New Science of Networks. Cambridge, Massachusetts: PerseusPublishing.

Barabasi, A.-L., and Albert, R. 1999. Emergence of scaling in random networks. Science,286(5439), 509–512.

Barbour, A. D., and Reinert, G. 2001. Small worlds. Random Structures Algorithms, 19(1),54–74.

Barbour, A. D., and Reinert, G. 2004. Correction: “Small worlds” [Random Structures Algo-rithms 19 (2001), no. 1, 54–74; MR1848027]. Random Structures Algorithms, 25(1), 115.

Barbour, A. D., and Reinert, G. 2006. Discrete small world networks. Electron. J. Probab., 11,no. 47, 1234–1283 (electronic).

Bender, E.A., and Canfield, E.R. 1978. The asymptotic number of labelled graphs with givendegree sequences. Journal of Combinatorial Theory (A), 24, 296–307.

Benjamini, I., and Schramm, O. 2001. Recurrence of distributional limits of finite planar graphs.Electron. J. Probab., 6, no. 23, 13 pp. (electronic).

445

Page 468: Random graphs and complex networksrhofstad/NotesRGCNII.pdf1.2 Random graphs and real-world networks9 1.3 Random graph models11 1.3.1 Erd}os-R enyi random graph 11 1.3.2 Inhomogeneous

446 References

Benjamini, I., Kesten, H., Peres, Y., and Schramm, O. 2004. Geometry of the uniform spanningforest: transitions in dimensions 4, 8, 12, . . . . Ann. of Math. (2), 160(2), 465–491.

Benjamini, I., Lyons, R., and Schramm, O. 2015. Unimodular random trees. Ergodic TheoryDynam. Systems, 35(2), 359–373.

Berger, N. 2002. Transience, recurrence and critical behavior for long-range percolation. Comm.Math. Phys., 226(3), 531–558.

Berger, N., Borgs, C., Chayes, J. T., D’Souza, R. M., and Kleinberg, R. D. 2004. Competition-induced preferential attachment. Pages 208–221 of: Automata, languages and programming.Lecture Notes in Comput. Sci., vol. 3142. Berlin: Springer.

Berger, N., Borgs, C., Chayes, J. T., D’Souza, R. M., and Kleinberg, R. D. 2005. Degree dis-tribution of competition-induced preferential attachment graphs. Combin. Probab. Comput.,14(5-6), 697–721.

Berger, N., Borgs, C., Chayes, J., and Saberi, A. 2014. Asymptotic behavior and distributionallimits of preferential attachment graphs. Ann. Probab., 42(1), 1–40.

Bhamidi, S., Bresler, G., and Sly, A. 2008. Mixing Time of Exponential Random Graphs. Pages803–812 of: FOCS ’08: Proceedings of the 2008 49th Annual IEEE Symposium on Foundationsof Computer Science. Washington, DC, USA: IEEE Computer Society.

Bhamidi, S., Hofstad, R. van der, and Hooghiemstra, G. 2010a. First passage percolation onrandom graphs with finite mean degrees. Ann. Appl. Probab., 20(5), 1907–1965.

Bhamidi, S., Hofstad, R. van der, and Leeuwaarden, J. van. 2010b. Scaling limits for criticalinhomogeneous random graphs with finite third moments. Electronic Journal of Probability,15, 1682–1702.

Bhamidi, S., Bresler, G., and Sly, A. 2011. Mixing time of exponential random graphs. Ann.Appl. Probab., 21(6), 2146–2170.

Bhamidi, S., Hofstad, R. van der, and Leeuwaarden, J. van. 2012. Novel scaling limits for criticalinhomogeneous random graphs. Ann. Probab., 40(6), 2299–2361.

Bingham, N. H., Goldie, C. M., and Teugels, J. L. 1989. Regular variation. Encyclopedia ofMathematics and its Applications, vol. 27. Cambridge: Cambridge University Press.

Biskup, M. 2004. On the scaling of the chemical distance in long-range percolation models. Ann.Probab., 32(4), 2938–2977.

Blasius, T., Friedrich, T., and Krohmer, A. 2018. Cliques in hyperbolic random graphs. Algo-rithmica, 80(8), 2324–2344.

Bloznelis, M. 2009. A note on loglog distances in a power law random intersection graph.arXiv:0911.5127 [math.PR].

Bloznelis, M. 2010a. Component evolution in general random intersection graphs. SIAM J.Discrete Math., 24(2), 639–654.

Bloznelis, M. 2010b. The largest component in an inhomogeneous random intersection graphwith clustering. Electron. J. Combin., 17(1), Research Papr 110, 17.

Bloznelis, M. 2013. Degree and clustering coefficient in sparse random intersection graphs. Ann.Appl. Probab., 23(3), 1254–1289.

Bloznelis, M., Gotze, F., and Jaworski, J. 2012. Birth of a strongly connected giant in aninhomogeneous random digraph. J. Appl. Probab., 49(3), 601–611.

Bloznelis, M., Godehardt, E., Jaworski, J., Kurauskas, V., and Rybarczyk, K. 2015. Recentprogress in complex network analysis: models of random intersection graphs. Pages 69–78 of:Data Science, Learning by Latent Structures, and Knowledge Discovery. Springer.

Bode, M., Fountoulakis, N., and Muller, T. 2015. On the largest component of a hyperbolicmodel of complex networks. Electron. J. Combin., 22(3), Paper 3.24, 46.

Boguna, M., Papadopoulos, F., and Krioukov, D. 2010. Sustaining the internet with hyperbolicmapping. Nature communications, 1(1), 1–8.

Boldi, P., and Vigna, S. 2004. The WebGraph Framework I: Compression Techniques. Pages595–601 of: Proc. of the Thirteenth International World Wide Web Conference (WWW 2004).Manhattan, USA: ACM Press.

Page 469: Random graphs and complex networksrhofstad/NotesRGCNII.pdf1.2 Random graphs and real-world networks9 1.3 Random graph models11 1.3.1 Erd}os-R enyi random graph 11 1.3.2 Inhomogeneous

References 447

Boldi, P., Rosa, M., Santini, M., and Vigna, S. 2011. Layered Label Propagation: A Mul-tiResolution Coordinate-Free Ordering for Compressing Social Networks. Pages 587–596 of:Srinivasan, Sadagopan, Ramamritham, Krithi, Kumar, Arun, Ravindra, M. P., Bertino, Elisa,and Kumar, Ravi (eds), Proceedings of the 20th international conference on World Wide Web.ACM Press.

Bollobas, B. 1980. A probabilistic proof of an asymptotic formula for the number of labelledregular graphs. European J. Combin., 1(4), 311–316.

Bollobas, B. 2001. Random graphs. Second edn. Cambridge Studies in Advanced Mathematics,vol. 73. Cambridge: Cambridge University Press.

Bollobas, B., and Fernandez de la Vega, W. 1982. The diameter of random regular graphs.Combinatorica, 2(2), 125–134.

Bollobas, B., and Riordan, O. 2004a. The diameter of a scale-free random graph. Combinatorica,24(1), 5–34.

Bollobas, B., and Riordan, O. 2004b. Shortest paths and load scaling in scale-free trees. Phys.Rev. E., 69, 036114.

Bollobas, B., and Riordan, O. 2015. An old approach to the giant component problem. J.Combin. Theory Ser. B, 113, 236–260.

Bollobas, B., Riordan, O., Spencer, J., and Tusnady, G. 2001. The degree sequence of a scale-freerandom graph process. Random Structures Algorithms, 18(3), 279–290.

Bollobas, B., Borgs, C., Chayes, J., and Riordan, O. 2003. Directed scale-free graphs. Pages 132–139 of: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms(Baltimore, MD, 2003). New York: ACM.

Bollobas, B., Janson, S., and Riordan, O. 2005. The phase transition in the uniformly grownrandom graph has infinite order. Random Structures Algorithms, 26(1-2), 1–36.

Bollobas, B., Janson, S., and Riordan, O. 2007. The phase transition in inhomogeneous randomgraphs. Random Structures Algorithms, 31(1), 3–122.

Bollobas, B., Janson, S., and Riordan, O. 2011. Sparse random graphs with clustering. RandomStructures Algorithms, 38(3), 269–323.

Bordenave, C., Lelarge, M., and Massoulie, L. 2018. Nonbacktracking spectrum of randomgraphs: community detection and nonregular Ramanujan graphs. Ann. Probab., 46(1), 1–71.

Box, G.E.P. 1976. Science and statistics. Journal of the American Statistical Association,71(356), 791–799.

Box, G.E.P. 1979. Robustness in the strategy of scientific model building. Pages 201–236 of:Robustness in statistics. Elsevier.

Bringmann, K., Keusch, R., and Lengler, J. 2015. Geometric Inhomogeneous Random Graphs.Preprint arXiv:1511.00576 [cs.SI].

Bringmann, K., Keusch, R., and Lengler, J. 2016. Average Distance in a General Class ofScale-Free Networks with Underlying Geometry. Preprint arXiv:1602.05712 [cs.DM].

Bringmann, Karl, Keusch, Ralph, and Lengler, Johannes. 2017. Sampling Geometric Inhomoge-neous Random Graphs in Linear Time. In: 25th Annual European Symposium on Algorithms(ESA 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.

Britton, T., Deijfen, M., and Martin-Lof, A. 2006. Generating simple random graphs withprescribed degree distribution. J. Stat. Phys., 124(6), 1377–1397.

Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A.,and Wiener, J. 2000. Graph Structure in the Web. Computer Networks, 33, 309–320.

Cai, X.S., and Perarnau, G. 2020a. The diameter of the directed configuration model. PreprintarXiv:2003.04965 [math.PR].

Cai, X.S., and Perarnau, G. 2020b. The giant component of the directed configuration modelrevisited. Preprint arXiv:2004.04998 [math.PR].

Callaway, D.S., Hopcroft, J.E., Kleinberg, J. M., Newman, M. E. J., and Strogatz, S.H. 2001.Are randomly grown graphs really random? Physical Review E, 64, 041902.

Cao, J., and Olvera-Cravioto, M. 2020. Connectivity of a general class of inhomogeneous randomdigraphs. Random Structures Algorithms, 56(3), 722–774.

Page 470: Random graphs and complex networksrhofstad/NotesRGCNII.pdf1.2 Random graphs and real-world networks9 1.3 Random graph models11 1.3.1 Erd}os-R enyi random graph 11 1.3.2 Inhomogeneous

448 References

Caravenna, F., Garavaglia, A., and van der Hofstad, R. 2019. Diameter in ultra-small scale-freerandom graphs. Random Structures Algorithms, 54(3), 444–498.

Chatterjee, S. 2017. Large deviations for random graphs. Lecture Notes in Mathematics, vol.2197. Springer, Cham. Lecture notes from the 45th Probability Summer School held inSaint-Flour, June 2015, Ecole d’Ete de Probabilites de Saint-Flour. [Saint-Flour ProbabilitySummer School].

Chatterjee, S., and Durrett, R. 2009. Contact processes on random graphs with power lawdegree distributions have critical value 0. Ann. Probab., 37(6), 2332–2356.

Chatterjee, S., and Varadhan, S. R. S. 2011. The large deviation principle for the Erdos-Renyirandom graph. European J. Combin., 32(7), 1000–1017.

Chatterjee, Sourav, and Diaconis, Persi. 2013. Estimating and understanding exponential ran-dom graph models. Ann. Statist., 41(5), 2428–2461.

Chen, N., and Olvera-Cravioto, M. 2013. Directed random graphs with given degree distribu-tions. Stoch. Syst., 3(1), 147–186.

Chung, F., and Lu, L. 2001. The diameter of sparse random graphs. Adv. in Appl. Math., 26(4),257–279.

Chung, F., and Lu, L. 2002a. The average distances in random graphs with given expecteddegrees. Proc. Natl. Acad. Sci. USA, 99(25), 15879–15882 (electronic).

Chung, F., and Lu, L. 2002b. Connected components in random graphs with given expecteddegree sequences. Ann. Comb., 6(2), 125–145.

Chung, F., and Lu, L. 2003. The average distance in a random graph with given expecteddegrees. Internet Math., 1(1), 91–113.

Chung, F., and Lu, L. 2004. Coupling online and offline analyses for random power law graphs.Internet Math., 1(4), 409–461.

Chung, F., and Lu, L. 2006a. Complex graphs and networks. CBMS Regional Conference Seriesin Mathematics, vol. 107. Published for the Conference Board of the Mathematical Sciences,Washington, DC.

Chung, F., and Lu, L. 2006b. The volume of the giant component of a random graph with givenexpected degrees. SIAM J. Discrete Math., 20, 395–411.

Cohen, R., and Havlin, S. 2003. Scale-free networks are ultrasmall. Phys. Rev. Lett., 90, 058701,1–4.

Cooper, C., and Frieze, A. 2004. The size of the largest strongly connected component of arandom digraph with a given degree sequence. Combin. Probab. Comput., 13(3), 319–337.

Corten, R. 2012. Composition and Structure of a Large Online Social Network in the Nether-lands. PLOS ONE, 7(4), 1–8.

Csardi, G. 2006. Dynamics of Citation Networks. 4131, 698–709.

Darling, D. A. 1970. The Galton-Watson process with infinite mean. J. Appl. Probability, 7,455–456.

Davies, P. L. 1978. The simple branching process: a note on convergence when the mean isinfinite. J. Appl. Probab., 15(3), 466–480.

Decelle, A., Krzakala, F., Moore, C., and Zdeborova, L. 2011. Asymptotic analysis of thestochastic block model for modular networks and its algorithmic applications. Physical ReviewE, 84(6), 066106.

Deijfen, M. 2009. Stationary random graphs with prescribed iid degrees on a spatial Poissonprocess. Electron. Commun. Probab., 14, 81–89.

Deijfen, M., and Jonasson, J. 2006. Stationary random graphs on Z with prescribed iid degreesand finite mean connections. Electron. Comm. Probab., 11, 336–346 (electronic).

Deijfen, M., and Kets, W. 2009. Random intersection graphs with tunable degree distributionand clustering. Probab. Engrg. Inform. Sci., 23(4), 661–674.

Deijfen, M., and Meester, R. 2006. Generating stationary random graphs on Z with prescribedindependent, identically distributed degrees. Adv. in Appl. Probab., 38(2), 287–298.

Page 471: Random graphs and complex networksrhofstad/NotesRGCNII.pdf1.2 Random graphs and real-world networks9 1.3 Random graph models11 1.3.1 Erd}os-R enyi random graph 11 1.3.2 Inhomogeneous

References 449

Deijfen, M., Hofstad, R. van der, and Hooghiemstra, G. 2013. Scale-free percolation. Annales de l'Institut Henri Poincaré (B) Probability and Statistics, 49(3), 817–838.

Dembo, A., and Montanari, A. 2010a. Gibbs measures and phase transitions on sparse random graphs. Braz. J. Probab. Stat., 24(2), 137–211.

Dembo, A., and Montanari, A. 2010b. Ising models on locally tree-like graphs. The Annals of Applied Probability, 20(2), 565–592.

Deprez, P., Hazra, R., and Wüthrich, M. 2015. Inhomogeneous long-range percolation for real-life network modeling.

Dereich, S., and Mörters, P. 2009. Random networks with sublinear preferential attachment: degree evolutions. Electronic Journal of Probability, 14, 1222–1267.

Dereich, S., and Mörters, P. 2011. Random networks with concave preferential attachment rule. Jahresber. Dtsch. Math.-Ver., 113(1), 21–40.

Dereich, S., and Mörters, P. 2013. Random networks with sublinear preferential attachment: the giant component. Ann. Probab., 41(1), 329–384.

Dereich, S., Mönch, C., and Mörters, P. 2012. Typical distances in ultrasmall random networks. Adv. in Appl. Probab., 44(2), 583–601.

Dereich, S., Mönch, C., and Mörters, P. 2017. Distances in scale free networks at criticality. Electron. J. Probab., 22, Paper No. 77, 38.

Dhara, S., Hofstad, R. van der, Leeuwaarden, J.S.H. van, and Sen, S. 2016. Heavy-tailed configuration models at criticality. Preprint arXiv:1612.00650 [math.PR].

Dhara, S., Hofstad, R. van der, Leeuwaarden, J.S.H. van, and Sen, S. 2017. Critical window for the configuration model: finite third moment degrees. Electron. J. Probab., 22, Paper No. 16, 33.

Diaconis, P. 1992. Sufficiency as statistical symmetry. Pages 15–26 of: American Mathematical Society centennial publications, Vol. II (Providence, RI, 1988). Amer. Math. Soc., Providence, RI.

Ding, J., Kim, J. H., Lubetzky, E., and Peres, Y. 2010. Diameters in supercritical random graphs via first passage percolation. Combin. Probab. Comput., 19(5-6), 729–751.

Ding, J., Kim, J. H., Lubetzky, E., and Peres, Y. 2011. Anatomy of a young giant component in the random graph. Random Structures Algorithms, 39(2), 139–178.

Dommers, S., Hofstad, R. van der, and Hooghiemstra, G. 2010. Diameters in preferential attachment graphs. J. Stat. Phys., 139, 72–107.

Drmota, M. 2009. Random trees. SpringerWienNewYork, Vienna. An interplay between combinatorics and probability.

Durrett, R. 2003. Rigorous result for the CHKNS random graph model. Pages 95–104 of: Discrete random walks (Paris, 2003). Discrete Math. Theor. Comput. Sci. Proc., AC. Assoc. Discrete Math. Theor. Comput. Sci., Nancy.

Durrett, R. 2007. Random graph dynamics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.

Eckhoff, M., Goodman, J., Hofstad, R. van der, and Nardi, F.R. 2013. Short paths for first passage percolation on the complete graph. J. Stat. Phys., 151(6), 1056–1088.

Erdős, P., and Rényi, A. 1959. On random graphs. I. Publ. Math. Debrecen, 6, 290–297.

Erdős, P., and Rényi, A. 1960. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl., 5, 17–61.

Erdős, P., and Rényi, A. 1961a. On the evolution of random graphs. Bull. Inst. Internat. Statist., 38, 343–347.

Erdős, P., and Rényi, A. 1961b. On the strength of connectedness of a random graph. Acta Math. Acad. Sci. Hungar., 12, 261–267.

Esker, H. van den, Hofstad, R. van der, Hooghiemstra, G., and Znamenski, D. 2006. Distances in random graphs with infinite mean degrees. Extremes, 8, 111–140.

Esker, H. van den, Hofstad, R. van der, and Hooghiemstra, G. 2008. Universality for the distance in finite variance random graphs. J. Stat. Phys., 133(1), 169–202.


Faloutsos, C., Faloutsos, P., and Faloutsos, M. 1999. On power-law relationships of the internet topology. Computer Communications Rev., 29, 251–262.

Federico, L., and Hofstad, R. van der. 2017. Critical window for connectivity in the configuration model. Combin. Probab. Comput., 26(5), 660–680.

Fernholz, D., and Ramachandran, V. 2007. The diameter of sparse random graphs. Random Structures Algorithms, 31(4), 482–516.

Fienberg, S., and Wasserman, S. 1981. Categorical data analysis of single sociometric relations. Sociological Methodology, 12, 156–192.

Fill, J., Scheinerman, E., and Singer-Cohen, K. 2000. Random intersection graphs when m = ω(n): an equivalence theorem relating the evolution of the G(n,m,p) and G(n,p) models. Random Structures Algorithms, 16(2), 156–176.

Flaxman, A., Frieze, A., and Vera, J. 2006. A geometric preferential attachment model of networks. Internet Math., 3(2), 187–205.

Flaxman, A., Frieze, A., and Vera, J. 2007. A geometric preferential attachment model of networks II. In: Proceedings of WAW 2007.

Fountoulakis, N. 2015. On a geometrization of the Chung-Lu model for complex networks. J. Complex Netw., 3(3), 361–387.

Fountoulakis, N., Hoorn, P. van der, Müller, T., and Schepers, M. 2020. Clustering in a hyperbolic model of complex networks. Preprint arXiv:2003.05525 [math.PR].

Frank, O., and Strauss, D. 1986. Markov graphs. Journal of the American Statistical Association, 81(395), 832–842.

Friedrich, T., and Krohmer, A. 2015. On the diameter of hyperbolic random graphs. Pages 614–625 of: Automata, languages, and programming. Part II. Lecture Notes in Comput. Sci., vol. 9135. Springer, Heidelberg.

Friedrich, T., and Krohmer, A. 2018. On the diameter of hyperbolic random graphs. SIAM J. Discrete Math., 32(2), 1314–1334.

Gao, P., and Wormald, N. 2016. Enumeration of graphs with a heavy-tailed degree sequence. Adv. Math., 287, 412–450.

Gao, P., Hofstad, R. van der, Southwell, A., and Stegehuis, C. 2018. Counting triangles in power-law uniform random graphs. Preprint arXiv:1812.04289 [math.PR].

Garavaglia, A., Hofstad, R. van der, and Woeginger, G. 2017. The dynamics of power laws: fitness and aging in preferential attachment trees. J. Stat. Phys., 168(6), 1137–1179.

Garavaglia, A., Hofstad, R. van der, and Litvak, N. 2020. Local weak convergence for PageRank. Ann. Appl. Probab., 30(1), 40–79.

Gilbert, E. N. 1959. Random graphs. Ann. Math. Statist., 30, 1141–1144.

Gleiser, P., and Danon, L. 2003. Community structure in jazz. Advances in Complex Systems, 6(4), 565–573.

Godehardt, E., and Jaworski, J. 2003. Two models of random intersection graphs for classification. Pages 67–81 of: Exploratory data analysis in empirical research. Stud. Classification Data Anal. Knowledge Organ. Berlin: Springer.

Goñi, J., Esteban, F., de Mendizabal, N., Sepulcre, J., Ardanza-Trevijano, S., Agirrezabal, I., and Villoslada, P. 2008. A computational analysis of protein-protein interaction networks in neurodegenerative diseases. BMC Systems Biology, 2(1), 52.

Grimmett, G. 1999. Percolation. 2nd edn. Berlin: Springer.

Gugelmann, L., Panagiotou, K., and Peter, U. 2012. Random hyperbolic graphs: degree sequence and clustering. Pages 573–585 of: International Colloquium on Automata, Languages, and Programming. Springer.

Gulikers, L., Lelarge, M., and Massoulié, L. 2017a. Non-backtracking spectrum of degree-corrected stochastic block models. Pages Art. No. 44, 27 of: 8th Innovations in Theoretical Computer Science Conference. LIPIcs. Leibniz Int. Proc. Inform., vol. 67. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern.


Gulikers, L., Lelarge, M., and Massoulié, L. 2017b. A spectral method for community detection in moderately sparse degree-corrected stochastic block models. Adv. in Appl. Probab., 49(3), 686–721.

Gulikers, L., Lelarge, M., and Massoulié, L. 2018. An impossibility result for reconstruction in the degree-corrected stochastic block model. Ann. Appl. Probab., 28(5), 3002–3027.

Hajek, B., and Sankagiri, S. 2018. Community recovery in a preferential attachment graph. Preprint arXiv:1801.06818 [stat.ML].

Hajra, K.B., and Sen, P. 2005. Aging in citation networks. Physica A: Statistical Mechanics and its Applications, 346(1-2), 44–48.

Hajra, K.B., and Sen, P. 2006. Modelling aging characteristics in citation networks. Physica A: Statistical Mechanics and its Applications, 368(2), 575–582.

Hall, P. 1981. Order of magnitude of moments of sums of random variables. J. London Math. Soc., 24(2), 562–568.

Halmos, P. 1950. Measure Theory. D. Van Nostrand Company, Inc., New York, N.Y.

Hardy, G. H., Littlewood, J. E., and Pólya, G. 1988. Inequalities. Cambridge Mathematical Library. Cambridge: Cambridge University Press. Reprint of the 1952 edition.

Harris, T. 1963. The theory of branching processes. Die Grundlehren der Mathematischen Wissenschaften, Bd. 119. Berlin: Springer-Verlag.

Hatami, H., and Molloy, M. 2010. The scaling window for a random graph with a given degree sequence. Pages 1403–1411 of: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics.

Heydenreich, M., Hulshof, T., and Jorritsma, J. 2016. Structures in supercritical scale-free percolation. Preprint arXiv:1604.08180 [math.PR], to appear in Ann. Appl. Probab.

Hirsch, C. 2017. From heavy-tailed Boolean models to scale-free Gilbert graphs. Braz. J. Probab. Stat., 31(1), 111–143.

Hofstad, R. van der. 2017. Random graphs and complex networks. Volume 1. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.

Hofstad, R. van der. 2018+. Random graphs and complex networks. Vol. 2. In preparation, see http://www.win.tue.nl/~rhofstad/NotesRGCNII.pdf.

Hofstad, R. van der, and Komjáthy, J. 2017. When is a scale-free graph ultra-small? J. Stat. Phys., 169(2), 223–264.

Hofstad, R. van der, and Litvak, N. 2014. Degree-degree dependencies in random graphs with heavy-tailed degrees. Internet Math., 10(3-4), 287–334.

Hofstad, R. van der, Hooghiemstra, G., and Van Mieghem, P. 2005. Distances in random graphs with finite variance degrees. Random Structures Algorithms, 27(1), 76–123.

Hofstad, R. van der, Hooghiemstra, G., and Znamenski, D. 2007a. Distances in random graphs with finite mean and infinite variance degrees. Electron. J. Probab., 12(25), 703–766 (electronic).

Hofstad, R. van der, Hooghiemstra, G., and Znamenski, D. 2007b. A phase transition for the diameter of the configuration model. Internet Math., 4(1), 113–128.

Hofstad, R. van der, Leeuwaarden, J.S.H. van, and Stegehuis, C. 2015. Hierarchical configuration model.

Hofstad, R. van der, Janson, S., and Luczak, M. 2016. Component structure of the configuration model: barely supercritical case. Preprint arXiv:1611.05728 [math.PR].

Hofstad, R. van der, Komjáthy, J., and Vadon, V. 2019a. Phase transition in random intersection graphs with communities.

Hofstad, R. van der, Komjáthy, J., and Vadon, V. 2019b. Random intersection graphs with communities.

Holland, P., Laskey, K., and Leinhardt, S. 1983. Stochastic blockmodels: first steps. Social Networks, 5(2), 109–137.

Hoorn, P. van der, and Olvera-Cravioto, M. 2018. Typical distances in the directed configuration model. Ann. Appl. Probab., 28(3), 1739–1792.

Howes, N. 1995. Modern analysis and topology. Universitext. Springer-Verlag, New York.


Janson, S. 2008. The largest component in a subcritical random graph with a power law degree distribution. Ann. Appl. Probab., 18(4), 1651–1668.

Janson, S. 2009a. On percolation in random graphs with given vertex degrees. Electron. J. Probab., 14, no. 5, 87–118.

Janson, S. 2009b. Standard representation of multivariate functions on a general probability space. Electron. Commun. Probab., 14, 343–346.

Janson, S. 2010a. Asymptotic equivalence and contiguity of some random graphs. Random Structures Algorithms, 36(1), 26–45.

Janson, S. 2010b. Susceptibility of random graphs with given vertex degrees. J. Comb., 1(3-4), 357–387.

Janson, S., and Luczak, M. 2007. A simple solution to the k-core problem. Random Structures Algorithms, 30(1-2), 50–62.

Janson, S., and Luczak, M. 2009. A new approach to the giant component problem. Random Structures Algorithms, 34(2), 197–216.

Janson, S., Łuczak, T., and Ruciński, A. 2000. Random graphs. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley-Interscience, New York.

Jaworski, J., Karoński, M., and Stark, D. 2006. The degree of a typical vertex in generalized random intersection graph models. Discrete Math., 306(18), 2152–2165.

Jaynes, E. T. 1957. Information theory and statistical mechanics. Phys. Rev. (2), 106, 620–630.

Jordan, J. 2013. Geometric preferential attachment in non-uniform metric spaces. Electron. J. Probab., 18, no. 8, 15.

Jorritsma, J., and Komjáthy, J. 2020. Distance evolutions in growing preferential attachment graphs. Preprint arXiv:2003.08856 [math.PR].

Joseph, A. 2014. The component sizes of a critical random graph with given degree sequence. Ann. Appl. Probab., 24(6), 2560–2594.

Kallenberg, O. 2002. Foundations of modern probability. Second edn. Probability and its Applications (New York). New York: Springer-Verlag.

Karoński, M., Scheinerman, E., and Singer-Cohen, K. 1999. On random intersection graphs: the subgraph problem. Combin. Probab. Comput., 8(1-2), 131–159. Recent trends in combinatorics (Mátraháza, 1995).

Karrer, B., and Newman, M. E. J. 2011. Stochastic blockmodels and community structure in networks. Physical Review E, 83(1), 016107.

Kass, R.E., and Wasserman, L. 1996. The selection of prior distributions by formal rules. Journal of the American Statistical Association, 91(435), 1343–1370.

Kesten, H., and Stigum, B. P. 1966. A limit theorem for multidimensional Galton-Watson processes. Ann. Math. Statist., 37, 1211–1223.

Kingman, J. F. C. 1975. The first birth problem for an age-dependent branching process. Ann. Probability, 3(5), 790–801.

Kiwi, M., and Mitsche, D. 2015. A bound for the diameter of random hyperbolic graphs. Pages 26–39 of: 2015 Proceedings of the Twelfth Workshop on Analytic Algorithmics and Combinatorics (ANALCO). SIAM, Philadelphia, PA.

Kiwi, M., and Mitsche, D. 2019. On the second largest component of random hyperbolic graphs. SIAM J. Discrete Math., 33(4), 2200–2217.

Komjáthy, J., and Lodewijks, B. 2020. Explosion in weighted hyperbolic random graphs and geometric inhomogeneous random graphs. Stochastic Process. Appl., 130(3), 1309–1367.

Krioukov, D., Papadopoulos, F., Kitsak, M., Vahdat, A., and Boguñá, M. 2010. Hyperbolic geometry of complex networks. Phys. Rev. E (3), 82(3), 036106, 18.

Krioukov, D., Kitsak, M., Sinkovits, R., Rideout, D., Meyer, D., and Boguñá, M. 2012. Network cosmology. Scientific Reports, 2.

Krzakala, F., Moore, C., Mossel, E., Neeman, J., Sly, A., Zdeborová, L., and Zhang, P. 2013. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52), 20935–20940.

Kunegis, J. 2017. The Koblenz Network Collection.


Kurauskas, V. 2015. On local weak limit and subgraph counts for sparse random graphs. Preprint arXiv:1504.08103 [math.PR].

Lee, J., and Olvera-Cravioto, M. 2017. PageRank on inhomogeneous random digraphs. Preprint arXiv:1707.02492 [math.PR].

Leskelä, L. 2019. Random graphs and network statistics. Available at http://math.aalto.fi/~lleskela/LectureNotes004.html.

Leskovec, J., and Krevl, A. 2014 (Jun). SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data.

Leskovec, J., Kleinberg, J., and Faloutsos, C. 2007. Graph evolution: densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1), 2.

Leskovec, J., Lang, K., Dasgupta, A., and Mahoney, M. 2009. Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1), 29–123.

Litvak, N., and Hofstad, R. van der. 2013. Uncovering disassortativity in large scale-free networks. Physical Review E, 87(2), 022801.

Łuczak, T. 1992. Sparse random graphs with a given degree sequence. Pages 165–182 of: Random graphs, Vol. 2 (Poznań, 1989). Wiley-Intersci. Publ. Wiley, New York.

Massoulié, L. 2014. Community detection thresholds and the weak Ramanujan property. Pages 694–703 of: STOC'14—Proceedings of the 2014 ACM Symposium on Theory of Computing. ACM, New York.

McKay, B.D. 1981. Subgraphs of random graphs with specified degrees. Congressus Numerantium, 33, 213–223.

McKay, B.D. 2011. Subgraphs of random graphs with specified degrees. In: Proceedings of the International Congress of Mathematicians 2010 (ICM 2010). Hindustan Book Agency, India.

McKay, B.D., and Wormald, N.C. 1990. Asymptotic enumeration by degree sequence of graphs of high degree. European Journal of Combinatorics, 11(6), 565–580.

Molloy, M., and Reed, B. 1995. A critical point for random graphs with a given degree sequence. Random Structures Algorithms, 6(2-3), 161–179.

Molloy, M., and Reed, B. 1998. The size of the giant component of a random graph with a given degree sequence. Combin. Probab. Comput., 7(3), 295–305.

Moore, C., and Newman, M. E. J. 2000. Epidemics and percolation in small-world networks. Phys. Rev. E, 61, 5678–5682.

Mossel, E., Neeman, J., and Sly, A. 2015. Reconstruction and estimation in the planted partition model. Probab. Theory Related Fields, 162(3-4), 431–461.

Mossel, E., Neeman, J., and Sly, A. 2016. Belief propagation, robust reconstruction and optimal recovery of block models. Ann. Appl. Probab., 26(4), 2211–2256.

Mossel, E., Neeman, J., and Sly, A. 2018. A proof of the block model threshold conjecture. Combinatorica, 38(3), 665–708.

Nachmias, A., and Peres, Y. 2010. Critical percolation on random regular graphs. Random Structures Algorithms, 36(2), 111–148.

Newman, M. E. J. 2003. Properties of highly clustered networks. Physical Review E, 68(2), 026121.

Newman, M. E. J. 2009. Random graphs with clustering. Phys. Rev. Lett., 103(Jul), 058701.

Newman, M. E. J. 2010. Networks: an introduction. Oxford University Press.

Newman, M. E. J., and Park, J. 2003. Why social networks are different from other types of networks. Physical Review E, 68(3), 036122.

Newman, M. E. J., and Watts, D.J. 1999. Scaling and percolation in the small-world network model. Phys. Rev. E, 60, 7332–7344.

Newman, M. E. J., Moore, C., and Watts, D. J. 2000a. Mean-field solution of the small-world network model. Phys. Rev. Lett., 84, 3201–3204.

Newman, M. E. J., Strogatz, S., and Watts, D. 2000b. Random graphs with arbitrary degree distribution and their application. Phys. Rev. E, 64, 026118, 1–17.


Newman, M. E. J., Strogatz, S., and Watts, D. 2002. Random graph models of social networks. Proc. Nat. Acad. Sci., 99, 2566–2572.

Newman, M. E. J., Watts, D. J., and Barabási, A.-L. 2006. The Structure and Dynamics of Networks. Princeton Studies in Complexity. Princeton University Press.

Norros, I., and Reittu, H. 2006. On a conditionally Poissonian graph process. Adv. in Appl. Probab., 38(1), 59–75.

Pemantle, R. 2007. A survey of random processes with reinforcement. Probab. Surv., 4, 1–79 (electronic).

Pickands III, J. 1968. Moment convergence of sample extremes. Ann. Math. Statistics, 39, 881–889.

Pittel, B. 1994. Note on the heights of random recursive trees and random m-ary search trees. Random Structures Algorithms, 5(2), 337–347.

Price, D. J. de S. 1965. Networks of scientific papers. Science, 149, 510–515.

Price, D. J. de S. 1986. Little science, big science... and beyond. Columbia University Press, New York.

Reittu, H., and Norros, I. 2004. On the power law random graph model of massive data networks. Performance Evaluation, 55(1-2), 3–23.

Riordan, O. 2012. The phase transition in the configuration model. Combinatorics, Probability and Computing, 21, 1–35.

Riordan, O., and Wormald, N. 2010. The diameter of sparse random graphs. Combinatorics, Probability & Computing, 19(5-6), 835–926.

Ross, S. M. 1996. Stochastic processes. Second edn. Wiley Series in Probability and Statistics: Probability and Statistics. New York: John Wiley & Sons Inc.

Rudas, A., Tóth, B., and Valkó, B. 2007. Random trees and general branching processes. Random Structures Algorithms, 31(2), 186–202.

Rudin, W. 1987. Real and complex analysis. New York: McGraw-Hill Book Co.

Rudin, W. 1991. Functional analysis. International Series in Pure and Applied Mathematics. New York: McGraw-Hill Inc.

Rybarczyk, K. 2011. Diameter, connectivity, and phase transition of the uniform random intersection graph. Discrete Math., 311(17), 1998–2019.

Schuh, H.-J., and Barbour, A. D. 1977. On the asymptotic behaviour of branching processes with infinite mean. Advances in Appl. Probability, 9(4), 681–723.

Seneta, E. 1973. The simple branching process with infinite mean. I. J. Appl. Probability, 10, 206–212.

Shannon, C. E. 1948. A mathematical theory of communication. Bell System Tech. J., 27, 379–423, 623–656.

Shore, J. E., and Johnson, R. W. 1980. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inform. Theory, 26(1), 26–37.

Singer, K. 1996. Random intersection graphs. ProQuest LLC, Ann Arbor, MI. Thesis (Ph.D.)–The Johns Hopkins University.

Smythe, R., and Mahmoud, H. 1994. A survey of recursive trees. Teor. Imovir. Mat. Stat., 1–29.

Snijders, T.A., Pattison, P., Robbins, G., and Handcock, M. 2006. New specifications for exponential random graph models. Sociological Methodology, 36(1), 99–153.

Söderberg, B. 2002. General formalism for inhomogeneous random graphs. Phys. Rev. E (3), 66(6), 066121, 6.

Söderberg, B. 2003a. Properties of random graphs with hidden color. Phys. Rev. E (3), 68(2), 026107, 12.

Söderberg, B. 2003b. Random graph models with hidden color. Acta Physica Polonica B, 34, 5085–5102.

Söderberg, B. 2003c. Random graphs with hidden color. Phys. Rev. E (3), 68(1), 015102, 4.

Solomonoff, R., and Rapoport, A. 1951. Connectivity of random nets. Bull. Math. Biophys., 13, 107–117.


Stark, D. 2004. The vertex degree distribution of random intersection graphs. Random Structures Algorithms, 24(3), 249–258.

Stegehuis, C., Hofstad, R. van der, and Leeuwaarden, J.S.H. van. 2016a. Epidemic spreading on complex networks with community structures. Scientific Reports, 6, 29748.

Stegehuis, C., Hofstad, R. van der, and Leeuwaarden, J.S.H. van. 2016b. Power-law relations in random networks with communities. Physical Review E, 94, 012302.

Sundaresan, S., Fischhoff, I., Dushoff, J., and Rubenstein, D. 2007. Network metrics reveal differences in social organization between two fission-fusion species, Grevy's zebra and onager. Oecologia, 151(1), 140–149.

Turova, T. 2013. Diffusion approximation for the components in critical inhomogeneous random graphs of rank 1. Random Structures Algorithms, 43(4), 486–539.

Turova, T.S. 2011. The largest component in subcritical inhomogeneous random graphs. Combinatorics, Probability and Computing, 20(1), 131–154.

Turova, T.S., and Vallier, T. 2010. Merging percolation on Z^d and classical random graphs: phase transition. Random Structures Algorithms, 36(2), 185–217.

Ugander, J., Karrer, B., Backstrom, L., and Marlow, C. 2011. The anatomy of the Facebook social graph. Preprint arXiv:1111.4503 [cs.SI].

Vadon, V., Komjáthy, J., and Hofstad, R. van der. 2019. A new model for overlapping communities with arbitrary internal structure. Applied Network Science, 4(1), 42.

Wang, D., Song, C., and Barabási, A.-L. 2013. Quantifying long-term scientific impact. Science, 342(6154), 127–132.

Wang, J., Mei, Y., and Hicks, D. 2014. Comment on "Quantifying long-term scientific impact". Science, 345(6193), 149.

Wang, M., Yu, G., and Yu, D. 2008. Measuring the preferential attachment mechanism in citation networks. Physica A: Statistical Mechanics and its Applications, 387(18), 4692–4698.

Wang, M., Yu, G., and Yu, D. 2009. Effect of the age of papers on the preferential attachment in citation networks. Physica A: Statistical Mechanics and its Applications, 388(19), 4273–4276.

Wasserman, S., and Pattison, P. 1996. Logit models and logistic regressions for social networks. Psychometrika, 61(3), 401–425.

Watts, D. J. 1999. Small worlds. The dynamics of networks between order and randomness. Princeton Studies in Complexity. Princeton, NJ: Princeton University Press.

Watts, D. J. 2003. Six degrees. The science of a connected age. New York: W. W. Norton & Co. Inc.

Watts, D. J., and Strogatz, S. H. 1998. Collective dynamics of 'small-world' networks. Nature, 393, 440–442.

Wormald, N. C. 1981. The asymptotic connectivity of labelled regular graphs. J. Combin. Theory Ser. B, 31(2), 156–167.

Yukich, J. E. 2006. Ultra-small scale-free geometric networks. J. Appl. Probab., 43(3), 665–677.

Zhao, Y., Levina, E., and Zhu, J. 2012. Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist., 40(4), 2266–2292.


Index

Albert-Barabási model, 188
Box, George E.P., 344
  "All models are wrong but some are useful", 344
Branching process
  delayed, 272
  infinite mean, 257
  marked, 106
  multitype, 93
  two-stage, 106
Central limit theorem, 195
Chung-Lu model
  attack vulnerability, 112
  fluctuations of typical distance, 238
Citation networks, 340
  exponential growth, 340
  power-law degree sequences, 340
Clique, 216
Clustering coefficient, 52
Collaboration graph
  random intersection graph, 358
Community structure network, 339
  macroscopic communities, 339
  microscopic communities, 339
Complexity, 110
Configuration model, 121, 245
  average distance, 246
  clustering, 356, 357
  connectivity, 145
  core, 253
  diameter, 263
  diameter core, 253
  directed, 347
  distance, 246
  forward degree distribution, 123
  household structure, 356
  near-critical behavior, 151
  path counting, 248
  phase transition, 127
Connected component
  complexity, 110
Connectivity
  configuration model, 145
De Finetti's Theorem, 160
Death process, 133
Directed network, 339
  arc, 340
Distribution
  Bernoulli, 160
  mixed Poisson, 88
Erdős-Rényi random graph
  diameter, 264
Exchangeable random variables, 160
Helly's Theorem, 162
Hopcount, 6
Household model, 356
Inhomogeneous random graph, 83
  diameter, 236
  dual kernel, 236
  duality principle, 236
  fluctuations of typical distance, 238
  graphical, 85
  irreducible, 85
  small world, 202
  sparsity, 85
  typical graph distance, 201, 237
  ultra-small world, 204
Internet, 6
  hopcount, 6
Internet Movie Database
  random intersection graph, 358
Maclaurin inequality, 249
Mixed marked Poisson branching process, 106
Mixed Poisson distribution, 88
Multitype branching process, 93
  irreducible, 94
  Markov chain, 94
  singular, 94
Network statistics
  clustering coefficient, 52
Path
  self-avoiding, 206
Path counting
  configuration model, 248
Paths, 206
  self-avoiding, 206
Percolation
  long-range, 365
Perron-Frobenius theorem, 94
Phase transition
  configuration model, 127
Pólya urn scheme
  limit theorem, 162, 163
Preferential attachment model, 159, 279
  diameter, 289
  typical distances, 289
Random graph
  typical graph distance, 201
Random intersection graph, 358
Random variables
  exchangeable, 160
Real-world networks, 3
  citation networks, 340
  communities, 339
  directed, 339
Scale-free percolation network, 363
Scale-free trees, 280
Self-avoiding path, 206
Size-biased distribution, 106
Small world, 202
Spatial preferential attachment models, 368
Stochastic domination, 106
Theorem
  De Finetti, 160
  Helly, 162
  Perron-Frobenius, 94
Tree
  height, 280
Two-regular graph
  diameter, 277
  longest cycle, 278
Ultra-small world, 202, 204
Uniform recursive trees, 331
Universality
  typical distances, 291


Glossary 459


Todo list

TO DO 1.1: Refer to recent discussion on power-laws in real-world networks. 6
To be added! 139
Integrate Proposition 5.20 more fluently in the text! 192
Incorporate Proposition 5.23 more fluently in the text! 202
TO DO 5.1: Add local weak convergence for BPAM 206
TO DO 5.2: Extend the discussion of the phase transition for BPAM 208
TO DO 5.3: Add further results! 209
TO DO 6.1: Add case of distances in real-world network... 219
TO DO 9.1: Add a discussion on terrorist networks 361
Explain the 1 + 2ρ! 405
Update from here! 413
TO DO 9.14: Introduce the spatial configuration model and results of Deijfen et al. 422
