					Lectures on Network Systems
Francesco Bullo
With scientific contributions from: Jorge Cortés, Florian Dörfler, and Sonia Martínez
Version v0.81 (4 Jan 2016). This document is intended for personal use: you are allowed to print and photocopy it. All other rights are reserved, e.g., this document (in whole or in part) may not be posted online or shared in any way without express consent. © 2012-16.
Preface

Topics
These lecture notes are intended primarily for graduate students interested in network systems, distributed algorithms, and cooperative control. The objective is to answer basic questions such as: What are fundamental dynamical models of interconnected systems? What are the essential dynamical properties of these models and how are they related to network properties? What are basic estimation, control, and optimization problems for these dynamical models?

The book is organized in two parts: Linear and Nonlinear Systems. The Linear Systems part includes (i) basic concepts and results in matrix theory and graph theory (with an emphasis on Perron–Frobenius theory and algebraic graph theory), and (ii) averaging algorithms in discrete and continuous time, described by static, time-varying and stochastic matrices. The Nonlinear Systems part includes (iii) robotic coordination problems for relative sensing networks, (iv) networks of phase oscillator systems with an emphasis on the Kuramoto model, and (v) virus propagation models, including lumped and network models as well as stochastic and deterministic models. Both parts include motivating examples of network systems and distributed algorithms from sensor, social, robotic and power networks.

"Books which try to digest, coordinate, get rid of the duplication, get rid of the less fruitful methods and present the underlying ideas clearly of what we know now, will be the things the future generations will value."
Richard Hamming (1915-1998), Mathematician

The intended audience
The intended audience is 1st year graduate students in Engineering, Sciences and Applied Mathematics programs. For the first part on Linear Systems, the required background includes competency in linear algebra and only very basic notions of dynamical systems. For the second part on Nonlinear Systems (including coupled oscillators and virus propagation), the required background includes a calculus course.
The treatment is self-contained and does not require a nonlinear systems course. These lecture notes are meant to be taught over a quarter-long course with a total of 35 to 40 hours of contact time. On average, each chapter should require approximately 2 hours of lecture time.

For the benefit of instructors, these lecture notes are supplemented by two documents. First, a complete Answer Key is available to instructors on request. Second, these lecture notes are also available in a “slides” format especially suited for classroom teaching.

Acknowledgments
I wish to thank Sonia Martínez and Jorge Cortés for their fundamental contribution to my understanding and our joint work on distributed algorithms and robotic networks. Their scientific contribution is most obviously present in Chapters 2, 3, and 4. I am grateful to Noah Friedkin for instructive discussions about social influence networks that influenced Chapter 5, and to Florian Dörfler for his extensive contributions to Chapters 13, 14, and 15 and to a large number of exercises. I am grateful to Alessandro Giua for his detailed comments and suggestions. I wish to thank Sandro Zampieri and Wenjun Mei for their contribution to Chapters 16 and 17, and Stacy Patterson for adopting an early version of these notes and providing me with detailed feedback. I wish to thank Jason Marden and Lucy Pao for their invitation to visit the University of Colorado at Boulder and deliver some of these lecture notes. I also acknowledge the generous support of the Army Research Office through grant W911NF-111-0092 and the National Science Foundation through grants CPS-1035917 and CPS-1135819. Finally, a special thank you goes to all students who took this course and all scientists who read these notes. Particular thanks go to Deepti Kannapan, Peng Jia, Fabio Pasqualetti, Sepehr Seifi, John W. Simpson-Porco, Ashish Cherukuri, Alex Olshevsky, and Vaibhav Srivastava for their contributions to these lecture notes and homework solutions.
Santa Barbara, California, USA 29 Mar 2012 — 4 Jan 2016
Lectures on Network Systems, F. Bullo Version v0.81 (4 Jan 2016). Draft not for circulation. Copyright © 2012-16.
Francesco Bullo
Contents

Part I  Linear Systems

1 Motivating Problems and Systems
  1.1 Social influence networks: opinion dynamics
  1.2 Wireless sensor networks: averaging algorithms
  1.3 Compartmental networks: dynamical flows among compartments
  1.4 Appendix: Robotic networks in cyclic pursuit and balancing
  1.5 Appendix: Design problems in wireless sensor networks
    1.5.1 Wireless sensor networks: distributed parameter estimation
    1.5.2 Wireless sensor networks: distributed hypothesis testing
  1.6 Exercises

2 Elements of Matrix Theory
  2.1 Linear systems and the Jordan normal form
    2.1.1 Discrete-time linear systems
    2.1.2 The Jordan normal form
    2.1.3 Semi-convergence and convergence for discrete-time linear systems
  2.2 Row-stochastic matrices and their spectral radius
    2.2.1 The spectral radius for row-stochastic matrices
  2.3 Perron–Frobenius theory
    2.3.1 Classification of nonnegative matrices
    2.3.2 Main results
    2.3.3 Applications to dynamical systems
    2.3.4 Selected proofs
  2.4 Exercises

3 Elements of Graph Theory
  3.1 Graphs and digraphs
  3.2 Paths and connectivity in undirected graphs
  3.3 Paths and connectivity in digraphs
    3.3.1 Connectivity properties of digraphs
    3.3.2 Periodicity of strongly-connected digraphs
    3.3.3 Condensation digraphs
  3.4 Weighted digraphs
  3.5 Database collections and software libraries
  3.6 Exercises

4 The Adjacency Matrix
  4.1 The adjacency matrix
  4.2 Algebraic graph theory: basic and prototypical results
  4.3 Powers of the adjacency matrix, paths and connectivity
  4.4 Graph theoretical properties of primitive matrices
  4.5 Exercises

5 Discrete-time Averaging Systems
  5.1 Averaging with primitive row-stochastic matrices
  5.2 Averaging with reducible matrices
  5.3 Averaging with reducible matrices and multiple sinks
  5.4 Design of weights for undirected graphs: the equal-neighbor model
  5.5 Design of weights for undirected graphs: the Metropolis–Hastings model
  5.6 Centrality measures
  5.7 Exercises

6 The Laplacian Matrix
  6.1 The Laplacian matrix
  6.2 The Laplacian in mechanical networks of springs
  6.3 The Laplacian in electrical networks of resistors
  6.4 Properties of the Laplacian matrix
  6.5 Graph connectivity and the rank of the Laplacian
  6.6 The algebraic connectivity, its eigenvector, and graph partitioning
  6.7 Exercises

7 Continuous-time Averaging Systems
  7.1 Example #1: Flocking behavior for a group of animals
  7.2 Example #2: A simple RC circuit
  7.3 Continuous-time linear systems and their convergence properties
  7.4 The Laplacian flow
  7.5 Design of weight-balanced digraphs from strongly-connected
  7.6 Distributed optimization using the Laplacian flow
  7.7 Exercises

8 The Incidence Matrix and Relative Measurements
  8.1 The incidence matrix
  8.2 Properties of the incidence matrix
  8.3 Distributed estimation from relative measurements
    8.3.1 Problem statement
    8.3.2 Optimal estimation via centralized computation
    8.3.3 Optimal estimation via decentralized computation
  8.4 Cycle and cutset spaces
  8.5 Exercises

9 Compartmental and Positive Systems
  9.1 Introduction and example systems
  9.2 Compartmental systems
  9.3 Positive systems
  9.4 Table of asymptotic behaviors for averaging and positive systems
  9.5 Exercises

10 Convergence Rates, Scalability and Optimization
  10.1 Some preliminary calculations and observations
  10.2 Convergence factors for row-stochastic matrices
  10.3 Cumulative quadratic index for symmetric matrices
  10.4 Circulant network examples and scalability analysis
  10.5 Design of fastest distributed averaging
  10.6 Exercises

11 Time-varying Averaging Algorithms
  11.1 Examples and models of time-varying discrete-time algorithms
    11.1.1 Shared Communication Channel
    11.1.2 Asynchronous Execution
    11.1.3 Models of time-varying averaging algorithms
  11.2 Convergence over time-varying connected graphs
  11.3 Convergence over digraphs connected over time
    11.3.1 Shared communication channel with round robin scheduling
    11.3.2 Convergence theorems for symmetric time-varying algorithms
    11.3.3 Uniform connectivity is required for non-symmetric matrices
  11.4 Analysis methods and proofs
    11.4.1 Bounded solutions and non-increasing max-min function
    11.4.2 Proof of Theorem 11.2: the max-min function is exponentially decreasing
  11.5 Time-varying algorithms in continuous-time
    11.5.1 Undirected graphs
    11.5.2 Directed graphs
  11.6 Exercises

12 Randomized Averaging Algorithms
  12.1 Examples of randomized averaging algorithms
  12.2 A brief review of probability theory
  12.3 Randomized averaging algorithms
    12.3.1 Additional results on uniform symmetric gossip algorithms
    12.3.2 Additional results on the mean-square convergence factor
  12.4 Table of asymptotic behaviors for averaging systems

Part II  Nonlinear Systems

13 Nonlinear Systems and Robotic Coordination
  13.1 Coordination in relative sensing networks
  13.2 Stability theory for dynamical systems
    13.2.1 Main convergence tool: the LaSalle Invariance Principle
    13.2.2 Application #1: Linear and linearized systems
    13.2.3 Application #2: Negative gradient systems
  13.3 A nonlinear rendezvous problem
  13.4 Flocking and Formation Control
  13.5 Rigidity and stability of the target formation
  13.6 Exercises

14 Coupled Oscillators: Basic Models
  14.1 History
  14.2 Examples
    14.2.1 Example #1: A spring network on a ring
    14.2.2 Example #2: The “structure-preserving” power network model
    14.2.3 Example #3: Flocking, schooling, and vehicle coordination
  14.3 Coupled phase oscillator networks
    14.3.1 The geometry of the circle and the torus
    14.3.2 Synchronization notions
    14.3.3 Preliminary results
    14.3.4 The order parameter and the mean field model
  14.4 Exercises

15 Networks of Coupled Oscillators
  15.1 Synchronization of identical oscillators
    15.1.1 An averaging-based approach
    15.1.2 The potential landscape, convergence and phase synchronization
    15.1.3 Phase balancing
  15.2 Synchronization of heterogeneous oscillators
    15.2.1 Synchronization of heterogeneous oscillators over complete homogeneous graphs
    15.2.2 Synchronization of heterogeneous oscillators over weighted undirected graphs
    15.2.3 Appendix: alternative theorem
  15.3 Exercises

16 Virus Propagation: Basic Models
  16.1 The SI model
  16.2 The SIR model
  16.3 The SIS model
  16.4 Exercises

17 Virus Propagation in Contact Networks
  17.1 The stochastic network SI model
  17.2 The network SI model
  17.3 The network SIS model
  17.4 The network SIR model
  17.5 Exercises

Bibliography
 Part I
Linear Systems
 Chapter 1
Motivating Problems and Systems

In this introductory chapter, we introduce some example problems and systems from multiple disciplines. The objective is to motivate our interest in distributed systems, algorithms and control. We look at the following examples: (i) In the context of social influence networks, we discuss a classic reference on how opinions evolve and possibly reach a consensus in groups of individuals. Here, consensus means that the opinions of the individuals are identical. (ii) In the context of wireless sensor networks, we discuss simple distributed averaging algorithms and, in the appendix, two advanced design problems in the context of parameter estimation and hypothesis testing. (iii) In the context of compartmental networks, we discuss dynamical flows among compartments, such as those arising in ecosystems. (iv) Finally, in the context of robotic networks, we discuss simple robotic behaviors for cyclic pursuit and balancing. In all cases we are interested in presenting the basic models and motivating interest in understanding their dynamic behaviors, such as the existence and attractivity of equilibria. We present additional linear examples in later chapters and nonlinear examples in the second part. For a similarly valuable list of related and instructive examples, we refer to (Hendrickx 2008, Chapter 9) and (Garin and Schenato 2010, Section 3.3). Other examples of multi-agent systems and applications can be found in (Bullo et al. 2009; Fuhrmann and Helmke 2015; Mesbahi and Egerstedt 2010).
Chapter 1. Motivating Problems and Systems
1.1 Social influence networks: opinion dynamics
Figure 1.1: Interactions in a social influence network

This example is an illustration of the rich literature on opinion dynamics, starting with the early works by French (1956), Harary (1959), and DeGroot (1974). Specifically, we adopt the setup quite literally from (DeGroot 1974). We consider a group of n individuals who must act together as a team. Each individual has his own subjective probability distribution $F_i$ for the unknown value of some parameter (or, more simply, an estimate of the parameter). We now assume that individual i is apprised of the distribution $F_j$ of each other member $j \neq i$ of the group. Then the DeGroot model predicts that the individual will revise its distribution to be

$$F_i^+ = \sum_{j=1}^{n} a_{ij} F_j,$$

where $a_{ij}$ denotes the weight that individual i assigns to the distribution of individual j when he carries out this revision. More precisely, the coefficient $a_{ii}$ describes the attachment of individual i to its own opinion and $a_{ij}$, $j \neq i$, is an interpersonal influence weight that individual i accords to individual j. In the DeGroot model, the coefficients $a_{ij}$ satisfy the following constraints: they are nonnegative, that is, $a_{ij} \geq 0$, and, for each individual, the sum of self-weight and accorded weights equals 1, that is, $\sum_{j=1}^{n} a_{ij} = 1$ for all i. In mathematical terms, the matrix

$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}$$
has nonnegative entries and each of its rows has unit sum. Such matrices are said to be row-stochastic. Questions of interest are: (i) Is this model of human opinion dynamics believable at all? (ii) How does one measure the coefficients $a_{ij}$? (iii) Under what conditions do the distributions converge to consensus? What is this value? (iv) What are more realistic, empirically-motivated models, possibly including stubborn individuals or antagonistic interactions?
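One revision step can be sketched in Python with NumPy under an assumption the text does not make: each $F_i$ is a discrete distribution over finitely many candidate parameter values, stored as row i of an array F, so that all revisions together read $F^+ = A F$. The weights and distributions below are hypothetical. Because each row of A forms a convex combination, every revised $F_i^+$ is again a valid probability distribution; for this particular A, repeated revisions also drive all rows to one common distribution, a single instance of the consensus asked about in question (iii).

```python
import numpy as np

# Hypothetical influence weights a_ij for n = 3 individuals; each row sums to 1
A = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.0, 0.4, 0.6],
])

# Row i is individual i's hypothetical distribution F_i over 4 candidate values
F = np.array([
    [0.70, 0.20, 0.10, 0.00],
    [0.10, 0.40, 0.40, 0.10],
    [0.00, 0.10, 0.30, 0.60],
])

def degroot_step(A, F):
    """Apply F_i^+ = sum_j a_ij F_j to every individual at once (a matrix product)."""
    return A @ F

F_next = degroot_step(A, F)   # a single revision

F_k = F.copy()                # many repeated revisions
for _ in range(200):
    F_k = degroot_step(A, F_k)
```

Comparing the rows of `F_k` shows whether the group has reached a common distribution; with a different, reducible choice of A this need not happen.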
1.2 Wireless sensor networks: averaging algorithms
Figure 1.2: A wireless sensor network composed of a collection of spatially-distributed sensors in a field and a gateway node to carry information to an operator. The nodes are meant to measure environmental variables, such as temperature, sound, and pressure, and cooperatively filter and transmit the information to an operator. (Figure labels: sensor node, gateway node.)
A wireless sensor network is a collection of spatially-distributed devices capable of measuring physical and environmental variables (e.g., temperature, vibrations, sound, light, etc.), performing local computations, and transmitting information throughout the network (including, possibly, to an external operator). Suppose that each node in a wireless sensor network has measured a scalar environmental quantity, say $x_i$. Consider the following simplest distributed algorithm, based on the concept of linear averaging: each node repeatedly executes

$$x_i^+ := \operatorname{average}\big(x_i, \{x_j, \text{ for all neighbor nodes } j\}\big), \tag{1.1}$$

where $x_i^+$ denotes the new value of $x_i$. For example, for the graph in Figure 1.3, one can easily write $x_1^+ := (x_1 + x_2)/2$, $x_2^+ := (x_1 + x_2 + x_3 + x_4)/4$, and so forth. In summary, the algorithm's behavior is described by

$$x^+ = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/3 & 1/3 & 1/3 \\ 0 & 1/3 & 1/3 & 1/3 \end{bmatrix} x = A_{\text{wsn}}\, x,$$

where the matrix $A_{\text{wsn}}$ is again row-stochastic.

Figure 1.3: Example graph with nodes 1, 2, 3, 4

Questions of interest are: (i) Does each node converge to a value? Is this value the same for all nodes? (ii) Is this value equal to the average of the initial conditions? (iii) What properties do the graph and the corresponding matrix need to have in order for the algorithm to converge? (iv) How fast is the convergence?
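To see the algorithm at work, the NumPy sketch below simply iterates $x(k+1) = A_{\text{wsn}} x(k)$; the initial measurements are hypothetical. For this graph all four entries approach a common value, but that value is 31/12 ≈ 2.583 rather than the plain average 2.5 of the initial conditions: it is the average weighted by each node's number of neighbors plus one, namely (2, 4, 3, 3). This bears directly on question (ii).

```python
import numpy as np

# Row-stochastic matrix A_wsn of the four-node example graph (Figure 1.3)
A_wsn = np.array([
    [1/2, 1/2, 0,   0  ],
    [1/4, 1/4, 1/4, 1/4],
    [0,   1/3, 1/3, 1/3],
    [0,   1/3, 1/3, 1/3],
])

def run_averaging(A, x0, steps):
    """Iterate x(k+1) = A x(k) for a fixed number of steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = A @ x
    return x

x0 = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical initial measurements
x_final = run_averaging(A_wsn, x0, steps=200)

# Weights proportional to (number of neighbors + 1) at each node
weights = np.array([2, 4, 3, 3]) / 12
weighted_average = weights @ x0      # equals 31/12 for this x0
```

Recovering the exact average would require different weights, a design question taken up later in the notes.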
1.3 Compartmental networks: dynamical flows among compartments
Compartmental systems model dynamical processes characterized by conservation laws (e.g., mass, fluid, energy) and by the flow of material between units known as compartments. The flow of energy and nutrients (water, nitrates, phosphates, etc.) in ecosystems is typically studied using compartmental modelling. For example, Figure 1.4 illustrates a widely-cited water flow model for a desert ecosystem (Noy-Meir 1973).

Figure 1.4: Water flow model for a desert ecosystem. The blue line denotes an inflow from the outside environment. The red lines denote outflows into the outside environment. (Figure labels: precipitation; soil; evaporation, drainage, runoff; uptake; drinking; plants; transpiration; herbivory; animals; evaporation.)
If we let $q_i$ denote the amount of material in compartment i, the mass balance equation for the ith compartment is written as:

$$\dot q_i = \sum_{j \neq i} \big(F_{j\to i} - F_{i\to j}\big) - F_{i\to 0} + u_i,$$

where $u_i$ is the inflow from the environment and $F_{i\to 0}$ is the outflow into the environment. We now assume linear flows, that is, we assume that the flow $F_{i\to j}$ from node i to node j (as well as to the environment) is proportional to the mass quantity at i, that is, $F_{i\to j} = f_{ij} q_i$ for a positive flow rate constant $f_{ij}$. Therefore we can write

$$\dot q_i = \sum_{j \neq i} \big(f_{ji} q_j - f_{ij} q_i\big) - f_{i0} q_i + u_i$$

and so, in vector notation,

$$\dot q = C q + u,$$

where the matrix C has entries $C_{ij} = f_{ji}$ for $i \neq j$ and $C_{ii} = -\sum_{j \neq i} f_{ij} - f_{i0}$.
For example, let us write down the compartmental matrix C for the water flow model in Figure 1.4. We let $q_1, q_2, q_3$ denote the water mass in soil, plants and animals, respectively. Moreover, as in the figure, we let $f_{\text{e-d-r}}$, $f_{\text{trnsp}}$, $f_{\text{evap}}$, $f_{\text{drnk}}$, $f_{\text{uptk}}$, $f_{\text{herb}}$ denote respectively the evaporation-drainage-runoff, transpiration, evaporation, drinking, uptake, and herbivory rates. With these notations, we can write

$$C = \begin{bmatrix} -f_{\text{e-d-r}} - f_{\text{uptk}} - f_{\text{drnk}} & 0 & 0 \\ f_{\text{uptk}} & -f_{\text{trnsp}} - f_{\text{herb}} & 0 \\ f_{\text{drnk}} & f_{\text{herb}} & -f_{\text{evap}} \end{bmatrix}.$$
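As a sketch, the water-flow matrix C can be assembled and explored numerically with NumPy. The rate values are hypothetical, and so is the assumption that a constant precipitation inflow enters the soil only. Each internal flow rate appears with a minus sign on the diagonal of its source column and with a plus sign below it, so the column sums of C equal minus each compartment's outflow rate to the environment. Since C is invertible here, the equilibrium of $\dot q = Cq + u$ is $q^* = -C^{-1}u$, and a forward-Euler simulation from an empty system approaches it. This numerically answers questions like those listed next, but only for these particular numbers.

```python
import numpy as np

def water_flow_matrix(f_edr, f_trnsp, f_evap, f_drnk, f_uptk, f_herb):
    """Compartmental matrix C for the desert water-flow model.

    State ordering q = (soil, plants, animals). Entry C[i, j] (i != j) is the
    flow rate from compartment j to compartment i; each diagonal entry is
    minus the total outflow rate of that compartment.
    """
    return np.array([
        [-f_edr - f_uptk - f_drnk, 0.0,               0.0    ],
        [f_uptk,                   -f_trnsp - f_herb, 0.0    ],
        [f_drnk,                   f_herb,            -f_evap],
    ])

# Hypothetical rate constants, chosen only for illustration
rates = dict(f_edr=0.3, f_trnsp=0.5, f_evap=0.4, f_drnk=0.1, f_uptk=0.6, f_herb=0.2)
C = water_flow_matrix(**rates)

# Column sums: minus the outflow rate of each compartment to the environment
col_sums = C.sum(axis=0)

# Hypothetical constant precipitation entering the soil only
u = np.array([1.0, 0.0, 0.0])

# Equilibrium of q' = C q + u solves C q* = -u
q_star = np.linalg.solve(C, -u)

# Forward-Euler simulation starting from an empty system
dt, q = 0.01, np.zeros(3)
for _ in range(5000):
    q = q + dt * (C @ q + u)
```

Forward Euler is used only because it is short; for stiff or large compartmental models an adaptive ODE solver would be the more usual choice.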
Questions of interest are:
(i) For constant inflows u, does the total mass in the system remain bounded? (ii) Is there an asymptotic equilibrium? Do all evolutions converge to it? (iii) Which compartments become empty asymptotically?
1.4 Appendix: Robotic networks in cyclic pursuit and balancing
In this section we consider two simple examples of coordinated motion in robotic networks. The standing assumption is that n robots, amicably referred to as “bugs,” are placed on, and restricted to move along, a circle of unit radius. Because of this bio-inspiration and because this language is common in the literature (Bruckstein et al. 1991; Klamkin and Newman 1971; Marshall et al. 2004), we refer to the following two problems as n-bugs problems. On this unit circle the bugs' positions are angles measured counterclockwise from the positive horizontal axis. We let angles take values in [0, 2π), that is, an arbitrary position θ satisfies 0 ≤ θ < 2π. The bugs are numbered counterclockwise with identities i ∈ {1, ..., n} and are at positions $\theta_1, \dots, \theta_n$. It is convenient to identify n+1 with 1. We assume the bugs move in discrete time k in a counterclockwise direction by a controllable amount $u_i$ (i.e., a control signal), that is:

$$\theta_i(k+1) = \operatorname{mod}\big(\theta_i(k) + u_i(k), 2\pi\big),$$

where mod(ϑ, 2π) is the remainder of the division of ϑ by 2π; it is required to ensure that $\theta_i(k+1)$ remains inside [0, 2π). The n-bugs problem is related to the study of pursuit curves and asks which paths n bugs, not initially aligned, follow when they chase one another. We refer to (Bruckstein et al. 1991; Marshall et al. 2004; Smith et al. 2005; Watton and Kydon 1969) for surveys and recent results.
Objective: optimal patrolling of a perimeter. Approach: cyclic pursuit

We now suppose that each bug feels an attraction and moves towards the closest counterclockwise neighbor, as illustrated in Figure 1.5. Recall that the counterclockwise distance between $\theta_i$ and $\theta_{i+1}$ is the length of the counterclockwise arc from $\theta_i$ to $\theta_{i+1}$ and satisfies

$$\operatorname{dist}_{cc}(\theta_i, \theta_{i+1}) = \operatorname{mod}(\theta_{i+1} - \theta_i, 2\pi).$$

In short, given a control gain κ ∈ [0, 1], we assume that the ith bug sets its control signal to

$$u_{\text{pursuit},i}(k) = \kappa \operatorname{dist}_{cc}\big(\theta_i(k), \theta_{i+1}(k)\big).$$
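A minimal NumPy sketch of the cyclic pursuit law in the original angle coordinates follows; the five initial positions and the gain κ = 0.5 are hypothetical, and `np.roll` implements the identification of bug n+1 with bug 1. After many steps the counterclockwise gaps all approach 2π/5, which suggests, for this one run only, that the bugs approach a rotating equally-spaced configuration.

```python
import numpy as np

TWO_PI = 2 * np.pi

def dist_cc(a, b):
    """Counterclockwise arc length from angle a to angle b, in [0, 2*pi)."""
    return np.mod(b - a, TWO_PI)

def pursuit_step(theta, kappa):
    """One cyclic-pursuit step: bug i moves kappa times its ccw gap to bug i+1."""
    theta_next = np.roll(theta, -1)          # theta_{i+1}; bug n+1 is bug 1
    u = kappa * dist_cc(theta, theta_next)   # u_pursuit,i(k)
    return np.mod(theta + u, TWO_PI)

# Hypothetical initial positions, numbered counterclockwise
theta = np.array([0.0, 0.3, 1.0, 2.5, 4.0])
kappa = 0.5
for _ in range(500):
    theta = pursuit_step(theta, kappa)

# Relative distances d_i = dist_cc(theta_i, theta_{i+1})
gaps = dist_cc(theta, np.roll(theta, -1))
```

Once the gaps are equal, every bug moves by the same amount κ·2π/n per step, so the equally-spaced pattern keeps rotating rather than coming to rest.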
Figure 1.5: Cyclic pursuit and balancing – prototypical n-bug problems (panel labels: $\theta_{i-1}$, $\theta_i$, $\theta_{i+1}$, $\operatorname{dist}_{cc}(\theta_i, \theta_{i+1})$, $\operatorname{dist}_{c}(\theta_i, \theta_{i-1})$)
Questions of interest are: (i) Does this system have any equilibrium? (ii) Is a rotating equally-spaced configuration a solution? Here an equally-spaced configuration is one for which $\operatorname{mod}(\theta_{i+1} - \theta_i, 2\pi) = \operatorname{mod}(\theta_i - \theta_{i-1}, 2\pi)$ for all i ∈ {1, ..., n}. (iii) For which values of κ do the bugs converge to an equally-spaced configuration, and with what pairwise distance?
Objective: optimal sensor placement. Approach: cyclic balancing

Next, we suppose that each bug feels an attraction towards both the closest counterclockwise and the closest clockwise neighbor, as illustrated in Figure 1.5. Given a “control gain” κ ∈ [0, 1/2] and the natural notion of clockwise distance, the ith bug sets its control signal to

$$u_{\text{balancing},i}(k) = \kappa \operatorname{dist}_{cc}\big(\theta_i(k), \theta_{i+1}(k)\big) - \kappa \operatorname{dist}_{c}\big(\theta_i(k), \theta_{i-1}(k)\big),$$

where $\operatorname{dist}_{c}(\theta_i(k), \theta_{i-1}(k)) = \operatorname{dist}_{cc}(\theta_{i-1}(k), \theta_i(k))$. Questions of interest are: (i) Is a static equally-spaced configuration a solution? (ii) For which values of κ do the bugs converge to a static equally-spaced configuration? (iii) Is it true that the bugs will approach an equally-spaced configuration and that each of them will converge to a stationary position on the circle?
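The balancing law can be sketched the same way; again the initial positions and κ = 0.25 are hypothetical. The clockwise distance to bug i−1 is computed as the counterclockwise distance from bug i−1 to bug i, exactly as in the definition above. In this run the gaps approach 2π/5 and the control signals shrink to zero, so the bugs appear to settle in a static equally-spaced configuration, in contrast with the rotating one produced by pursuit.

```python
import numpy as np

TWO_PI = 2 * np.pi

def dist_cc(a, b):
    """Counterclockwise arc length from angle a to angle b, in [0, 2*pi)."""
    return np.mod(b - a, TWO_PI)

def balancing_control(theta, kappa):
    """u_i = kappa*dist_cc(theta_i, theta_{i+1}) - kappa*dist_c(theta_i, theta_{i-1})."""
    theta_next = np.roll(theta, -1)   # theta_{i+1}
    theta_prev = np.roll(theta, 1)    # theta_{i-1}
    # dist_c(theta_i, theta_{i-1}) equals dist_cc(theta_{i-1}, theta_i)
    return kappa * dist_cc(theta, theta_next) - kappa * dist_cc(theta_prev, theta)

def balancing_step(theta, kappa):
    """One cyclic-balancing step, wrapping positions back into [0, 2*pi)."""
    return np.mod(theta + balancing_control(theta, kappa), TWO_PI)

# Hypothetical initial positions, numbered counterclockwise
theta = np.array([0.5, 0.9, 1.0, 2.5, 4.0])
kappa = 0.25
for _ in range(1000):
    theta = balancing_step(theta, kappa)

gaps = dist_cc(theta, np.roll(theta, -1))
```

Checking the control signal, rather than comparing successive positions, avoids false alarms when a stationary bug sits right at the 0/2π wrap-around point.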
A preliminary analysis

It is unrealistic (among other aspects of this setup) to assume that the bugs know their own absolute positions and those of their neighbors. Therefore, it is interesting to rewrite the dynamical system in terms of pairwise distances between nearby bugs. For i ∈ {1, ..., n}, we define the relative angular distances (the lengths of the counterclockwise arcs) $d_i = \operatorname{dist}_{cc}(\theta_i, \theta_{i+1}) \geq 0$. (We also adopt the usual convention that $d_{n+1} = d_1$ and $d_0 = d_n$.) The change of coordinates from $(\theta_1, \dots, \theta_n)$ to $(d_1, \dots, d_n)$ leads us to rewrite the cyclic pursuit and the cyclic balancing laws as

$$u_{\text{pursuit},i}(k) = \kappa\, d_i(k), \qquad u_{\text{balancing},i}(k) = \kappa\, d_i(k) - \kappa\, d_{i-1}(k).$$

In this new set of coordinates, one can show that the cyclic pursuit and cyclic balancing systems are, respectively,

$$d_i(k+1) = (1-\kappa)\, d_i(k) + \kappa\, d_{i+1}(k), \tag{1.2}$$

$$d_i(k+1) = \kappa\, d_{i+1}(k) + (1-2\kappa)\, d_i(k) + \kappa\, d_{i-1}(k). \tag{1.3}$$
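The step behind (1.2) and (1.3) can be filled in as follows. This is a sketch that assumes, as in the first of the closing remarks of this section, that the counterclockwise order of the bugs is preserved, so that the result of the mod operation already lies in [0, 2π) and the operation can be dropped:

```latex
\begin{aligned}
d_i(k+1) &= \operatorname{mod}\big(\theta_{i+1}(k+1) - \theta_i(k+1),\, 2\pi\big)
          = d_i(k) + u_{i+1}(k) - u_i(k), \\
\text{pursuit:}\quad d_i(k+1) &= d_i(k) + \kappa\, d_{i+1}(k) - \kappa\, d_i(k)
          = (1-\kappa)\, d_i(k) + \kappa\, d_{i+1}(k), \\
\text{balancing:}\quad d_i(k+1) &= d_i(k) + \big(\kappa\, d_{i+1}(k) - \kappa\, d_i(k)\big)
          - \big(\kappa\, d_i(k) - \kappa\, d_{i-1}(k)\big) \\
 &= \kappa\, d_{i+1}(k) + (1-2\kappa)\, d_i(k) + \kappa\, d_{i-1}(k).
\end{aligned}
```

Each bug's control depends only on the gaps adjacent to it, which is why the update of $d_i$ involves at most $d_{i-1}$, $d_i$, and $d_{i+1}$.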
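These are two linear time-invariant dynamical systems with state $d = (d_1, \dots, d_n)$ and governing equations described by the two n × n matrices:

$$A_{\text{pursuit}} = \begin{bmatrix} 1-\kappa & \kappa & 0 & \cdots & 0 \\ 0 & 1-\kappa & \kappa & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1-\kappa & \kappa \\ \kappa & 0 & \cdots & 0 & 1-\kappa \end{bmatrix}, \qquad A_{\text{balancing}} = \begin{bmatrix} 1-2\kappa & \kappa & 0 & \cdots & \kappa \\ \kappa & 1-2\kappa & \kappa & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \ddots & \kappa & 1-2\kappa & \kappa \\ \kappa & \cdots & 0 & \kappa & 1-2\kappa \end{bmatrix}.$$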
We conclude with the following remarks.
(i) Equations (1.2) and (1.3) are correct if the counterclockwise order of the bugs is never violated. One can show that this is true for κ < 1 in the pursuit case and κ < 1/2 in the balancing case; we leave this proof to the reader in Exercise E1.2.
(ii) The matrices $A_{\text{pursuit}}$ and $A_{\text{balancing}}$, for varying n and κ, are Toeplitz and circulant. Moreover, they have nonnegative entries for the stated ranges of κ and are row-stochastic.
(iii) Each point of the agreement space $\{(\alpha, \alpha, \dots, \alpha) \in \mathbb{R}^n \mid \alpha \in \mathbb{R}\}$ is an equilibrium for both systems.
(iv) It must be true for all times that $(d_1, \dots, d_n) \in \{x \in \mathbb{R}^n \mid x_i \geq 0, \sum_{i=1}^n x_i = 2\pi\}$. This property is indeed a consequence of the nonnegative matrices $A_{\text{pursuit}}$ and $A_{\text{balancing}}$ being doubly-stochastic, i.e., each row-sum and each column-sum is equal to 1: nonnegativity preserves $x_i \geq 0$, and unit column sums give $\mathbb{1}_n^\top d(k+1) = \mathbb{1}_n^\top d(k)$.
(v) We will later study for which values of κ the system converges to the agreement space.
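The circulant structure of the two matrices can be checked numerically. The sketch below (hypothetical helpers `cyclic_matrices` and `is_doubly_stochastic`, with the particular choice κ = 0.3 and n = 6) builds both matrices from a cyclic shift, verifies that they are doubly-stochastic, and iterates (1.3) from random arc lengths summing to 2π. It illustrates remarks (ii) to (iv) for one value of κ only; the general convergence question remains as stated in remark (v).

```python
import numpy as np

def cyclic_matrices(n, kappa):
    """Return (A_pursuit, A_balancing) of (1.2) and (1.3), acting on d = (d_1, ..., d_n)."""
    I = np.eye(n)
    S = np.roll(I, 1, axis=1)             # (S @ d)[i] = d[i+1], indices mod n
    A_pursuit = (1 - kappa) * I + kappa * S
    A_balancing = kappa * S + (1 - 2 * kappa) * I + kappa * S.T   # S.T picks d[i-1]
    return A_pursuit, A_balancing

def is_doubly_stochastic(A, tol=1e-12):
    """Nonnegative entries, unit row sums and unit column sums."""
    ones = np.ones(A.shape[0])
    return bool((A >= -tol).all() and np.allclose(A @ ones, ones)
                and np.allclose(ones @ A, ones))

n, kappa = 6, 0.3
A_p, A_b = cyclic_matrices(n, kappa)

rng = np.random.default_rng(1)
d = rng.uniform(size=n)
d *= 2 * np.pi / d.sum()                  # nonnegative arc lengths summing to 2*pi
for _ in range(200):
    d = A_b @ d                           # cyclic balancing (1.3) in d-coordinates

print(is_doubly_stochastic(A_p), is_doubly_stochastic(A_b))   # True True
print(d.sum(), d)                         # sum stays 2*pi; entries approach 2*pi/n
```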
1.5 Appendix: Design problems in wireless sensor networks
In this appendix we show how averaging algorithms can be used to tackle realistic wireless sensor network problems.
1.5.1 Wireless sensor networks: distributed parameter estimation
The next two examples are also drawn from the field of wireless sensor networks, but they feature a more advanced setup and require a basic background in estimation and detection theory, respectively. The key lesson to be learnt from these examples is that it is useful to have algorithms that compute the average of distributed quantities. Following ideas from (Garin and Schenato 2010; Xiao et al. 2005), we aim to estimate an unknown parameter $\theta \in \mathbb{R}^m$ via the measurements taken by a sensor network. Each node $i \in \{1, \dots, n\}$ measures $y_i = B_i \theta + v_i$, where $y_i \in \mathbb{R}^{m_i}$, $B_i$ is a known matrix and $v_i$ is random measurement noise. We assume that
(A1) the noise vectors $v_1, \dots, v_n$ are independent jointly-Gaussian variables with zero mean $\mathbb{E}[v_i] = 0_{m_i}$ and positive-definite covariance $\mathbb{E}[v_i v_i^\top] = \Sigma_i = \Sigma_i^\top$, for $i \in \{1, \dots, n\}$; and
Chapter 1. Motivating Problems and Systems 
(A2) the measurement parameters satisfy the following two properties: $\sum_i m_i \geq m$ and the stacked matrix $\begin{pmatrix} B_1 \\ \vdots \\ B_n \end{pmatrix}$ is full rank.
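Given the measurements $y_1, \dots, y_n$, it is of interest to compute a least-squares estimate of θ, that is, an estimate of θ that minimizes a least-squares error. Specifically, we aim to minimize the following weighted least-squares error:
$$\min_{\hat\theta} \sum_{i=1}^n \big\| y_i - B_i \hat\theta \big\|^2_{\Sigma_i^{-1}}, \qquad \text{where} \qquad \big\| y_i - B_i \hat\theta \big\|^2_{\Sigma_i^{-1}} = \big( y_i - B_i \hat\theta \big)^\top \Sigma_i^{-1} \big( y_i - B_i \hat\theta \big).$$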
In this weighted least-squares error, individual errors are weighted by their corresponding inverse covariance matrices so that an accurate (respectively, inaccurate) measurement corresponds to a high (respectively, low) error weight. With this particular choice of weights, the least-squares estimate coincides with the so-called maximum-likelihood estimate; see (Poor 1994) for more details. Under assumptions (A1) and (A2), the optimal solution is
$$\hat\theta^* = \Big( \sum_{i=1}^n B_i^\top \Sigma_i^{-1} B_i \Big)^{-1} \sum_{i=1}^n B_i^\top \Sigma_i^{-1} y_i.$$
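This formula is easy to implement by a single processor with all the information about the problem, i.e., the parameters and the measurements. To compute $\hat\theta^*$ in the sensor (and processor) network, we perform two steps:
[Step 1:] we run two distributed algorithms in parallel to compute the average of the quantities $B_i^\top \Sigma_i^{-1} B_i$ and $B_i^\top \Sigma_i^{-1} y_i$.
[Step 2:] we compute the optimal estimate via
$$\hat\theta^* = \operatorname{average}\big( B_1^\top \Sigma_1^{-1} B_1, \dots, B_n^\top \Sigma_n^{-1} B_n \big)^{-1} \operatorname{average}\big( B_1^\top \Sigma_1^{-1} y_1, \dots, B_n^\top \Sigma_n^{-1} y_n \big).$$
(The factors $1/n$ in the two averages cancel, so this expression coincides with the centralized formula.)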
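The two-step procedure can be sanity-checked on synthetic data. In this sketch the averages of Step 1 are computed exactly with `np.mean`, standing in for a distributed averaging algorithm; the dimensions, the random matrices $B_i$ and the covariances $\Sigma_i$ are hypothetical choices that satisfy (A1) and (A2) with probability one.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, m_i = 3, 5, 2      # dim(theta), number of sensors, measurements per sensor (hypothetical)
theta_true = rng.normal(size=m)

B, Sigma, y = [], [], []
for _ in range(n):
    B_i = rng.normal(size=(m_i, m))                     # known measurement matrix
    L = rng.normal(size=(m_i, m_i))
    Sigma_i = L @ L.T + np.eye(m_i)                     # positive-definite covariance
    v_i = rng.multivariate_normal(np.zeros(m_i), Sigma_i)
    B.append(B_i)
    Sigma.append(Sigma_i)
    y.append(B_i @ theta_true + v_i)                    # y_i = B_i theta + v_i

# Quantities that sensor i can compute locally.
P = [B_i.T @ np.linalg.solve(S_i, B_i) for B_i, S_i in zip(B, Sigma)]          # B_i^T Sigma_i^{-1} B_i
q = [B_i.T @ np.linalg.solve(S_i, y_i) for B_i, S_i, y_i in zip(B, Sigma, y)]   # B_i^T Sigma_i^{-1} y_i

theta_central = np.linalg.solve(sum(P), sum(q))                           # centralized formula
theta_two_step = np.linalg.solve(np.mean(P, axis=0), np.mean(q, axis=0))  # Step 2, from the averages
print(np.allclose(theta_central, theta_two_step))                         # True
```

Using `np.linalg.solve` avoids forming $\Sigma_i^{-1}$ explicitly; the two estimates agree because the factors 1/n in the two averages cancel.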
Questions of interest are:
(i) How do we design algorithms to compute the average of distributed quantities? (ii) What properties does the graph need to have in order for such an algorithm to exist? (iii) How do we design an algorithm with fastest convergence?
1.5.2 Wireless sensor networks: distributed hypothesis testing
We consider a distributed hypothesis testing problem; these ideas appeared in (Olfati-Saber et al. 2006; Rao and Durrant-Whyte 1993). Let $h_\gamma$, for γ in a finite set Γ, be a set of two or more hypotheses about an uncertain event. For example, given a certain area of interest, we could have $h_0$ = “no target is present”, $h_1$ = “one target is present” and $h_2$ = “two or more targets are present”.
Suppose that we know the a priori probabilities $p(h_\gamma)$ of the hypotheses and that n nodes of a sensor network take measurements $y_i$, for $i \in \{1, \dots, n\}$, related to the event. Independently of the type of measurements, assume you can compute $p(y_i \mid h_\gamma)$, the probability of measuring $y_i$ given that $h_\gamma$ is the true hypothesis. Also, assume that each observation is conditionally independent of all other observations, given any hypothesis.
(i) We wish to compute the maximum a posteriori estimate, that is, we want to identify which one is the most likely hypothesis, given the measurements. Note that, under the independence assumption, Bayes' Theorem implies that the a posteriori probabilities satisfy
$$p(h_\gamma \mid y_1, \dots, y_n) = \frac{p(h_\gamma)}{p(y_1, \dots, y_n)} \prod_{i=1}^n p(y_i \mid h_\gamma).$$
(ii) Observe that $p(h_\gamma)$ is known, and $p(y_1, \dots, y_n)$ is a constant normalization factor scaling all a posteriori probabilities equally. Therefore, for each hypothesis γ ∈ Γ, we need to compute
$$\prod_{i=1}^n p(y_i \mid h_\gamma),$$
or equivalently, we aim to exchange data among the sensors in order to compute:
$$\exp\Big( \sum_{i=1}^n \log p(y_i \mid h_\gamma) \Big) = \exp\Big( n \operatorname{average}\big( \log p(y_1 \mid h_\gamma), \dots, \log p(y_n \mid h_\gamma) \big) \Big).$$
(iii) In summary, even in this hypothesis testing problem, we need algorithms to compute the average of the n numbers log p(y1 |hγ ), . . . , log p(yn |hγ ), for each hypothesis γ. Questions of interest here are the same as in the previous section.
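A small sketch of the computation in (ii) and (iii), with hypothetical priors and likelihood values $p(y_i \mid h_\gamma)$ drawn at random. Each hypothesis requires one average of log-likelihoods across sensors, computed here exactly with `mean` in place of a distributed algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
n, num_hyp = 8, 3                                  # sensors and hypotheses h_0, h_1, h_2
prior = np.array([0.5, 0.3, 0.2])                  # hypothetical a priori probabilities p(h_gamma)
lik = rng.uniform(0.05, 1.0, size=(n, num_hyp))    # hypothetical values of p(y_i | h_gamma)

# The quantity the network must agree on: one average per hypothesis.
avg_log = np.log(lik).mean(axis=0)                 # average over i of log p(y_i | h_gamma)
prod_from_avg = np.exp(n * avg_log)                # equals prod_i p(y_i | h_gamma)

# Unnormalized posteriors; p(y_1, ..., y_n) is a common factor and is dropped.
score = prior * prod_from_avg
map_hypothesis = int(np.argmax(score))
posterior = score / score.sum()                    # assumes the hypotheses are exhaustive
print(map_hypothesis, posterior)
```

Working with logarithms also avoids numerical underflow when n is large and each $p(y_i \mid h_\gamma)$ is small.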
1.6 Exercises
E1.1 Simulating the averaging dynamics. Simulate in your favorite programming language and software package the linear averaging algorithm in equation (1.1). Set n = 5, select the initial state equal to (1, −1, 1, −1, 1), and use the following undirected unweighted graphs, depicted in Figure E1.1: (i) the complete graph, (ii) the ring graph, and (iii) the star graph with node 1 as center.
Which value do all nodes converge to? Is it equal to the average of the initial values? Turn in your code, a few printouts (as few as possible), and your written responses.
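One possible sketch for E1.1, assuming that the averaging algorithm (1.1) updates each node to the plain average of its own value and its neighbors' values (the rule that produces the rows of $A_{\text{wsn}}$ for the network of Section 1.2). The helper `averaging_matrix` is hypothetical, and 200 iterations is an arbitrary horizon.

```python
import numpy as np

def averaging_matrix(adjacency):
    """Row-stochastic matrix giving equal weight to node i and to each of its neighbors.

    Assumption: equation (1.1) is x_i(k+1) = average(x_i(k), {x_j(k), j neighbor of i}).
    """
    A = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))   # add self-loops
    return A / A.sum(axis=1, keepdims=True)

n = 5
complete = np.ones((n, n)) - np.eye(n)
ring = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
star = np.zeros((n, n))
star[0, 1:] = 1                       # node 1 (index 0) is the center
star[1:, 0] = 1

x0 = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
results = {}
for name, adj in [("complete", complete), ("ring", ring), ("star", star)]:
    W = averaging_matrix(adj)
    results[name] = np.linalg.matrix_power(W, 200) @ x0   # x(200) = W^200 x(0)
    print(name, results[name].round(4), "average of x0:", x0.mean())
```

Comparing each limit with `x0.mean()` addresses the last question; note that for the star graph the resulting matrix is row-stochastic but not column-stochastic.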
Figure E1.1: Complete graph, ring graph and star graph with 5 nodes
E1.2 Computing the bugs' dynamics. Consider the cyclic pursuit and balancing dynamics described in Section 1.4. Verify (i) the cyclic pursuit closed-loop equation (1.2), (ii) the cyclic balancing closed-loop equation (1.3), and (iii) that the counterclockwise order of the bugs is never violated. Hint: Recall the distributive property of modular addition: mod(a ± b, n) = mod(mod(a, n) ± mod(b, n), n).
Chapter 2
Elements of Matrix Theory
We review here basic concepts from matrix theory. These concepts will be useful when analyzing graphs and averaging algorithms defined over graphs. In particular, we are interested in understanding the convergence of the linear dynamical systems discussed in Chapter 1. Some of those systems are described by matrices that have nonnegative entries and have row-sums equal to 1.
Notation. It is useful to start with some basic notation from matrix theory and linear algebra. We let $f: X \to Y$ denote a function from set X to set Y. We let $\mathbb{R}$, $\mathbb{N}$ and $\mathbb{Z}$ denote respectively the sets of real, natural and integer numbers; also $\mathbb{R}_{\geq 0}$ and $\mathbb{Z}_{\geq 0}$ are the sets of nonnegative real numbers and nonnegative integer numbers. For real numbers a < b, we let
$$[a, b] = \{x \in \mathbb{R} \mid a \leq x \leq b\}, \quad ]a, b] = \{x \in \mathbb{R} \mid a < x \leq b\}, \quad [a, b[ = \{x \in \mathbb{R} \mid a \leq x < b\}, \quad ]a, b[ = \{x \in \mathbb{R} \mid a < x < b\}.$$
Given a complex number $z \in \mathbb{C}$, its norm (sometimes referred to as complex modulus) is denoted by |z|, its real part by $\Re(z)$ and its imaginary part by $\Im(z)$. We let $\mathbb{1}_n \in \mathbb{R}^n$ (respectively $0_n \in \mathbb{R}^n$) be the column vector with all entries equal to +1 (respectively 0). Let $e_1, \dots, e_n$ be the standard basis vectors of $\mathbb{R}^n$, that is, $e_i$ has all entries equal to zero except for the ith entry equal to 1. We let $I_n$ denote the n-dimensional identity matrix and $A \in \mathbb{R}^{n \times n}$ denote a square n × n matrix with real entries $\{a_{ij}\}$, $i, j \in \{1, \dots, n\}$. The matrix A is symmetric if $A^\top = A$. A symmetric matrix is positive definite (resp. positive semidefinite) if all its eigenvalues are positive (resp. nonnegative). The kernel of A is the subspace $\operatorname{kernel}(A) = \{x \in \mathbb{R}^n \mid Ax = 0_n\}$, the image of A is $\operatorname{image}(A) = \{y \in \mathbb{R}^n \mid Ax = y \text{ for some } x \in \mathbb{R}^n\}$, and the rank of A is the dimension of its image. Given vectors $v_1, \dots, v_j \in \mathbb{R}^n$, their span is $\operatorname{span}(v_1, \dots, v_j) = \{a_1 v_1 + \dots + a_j v_j \mid a_1, \dots, a_j \in \mathbb{R}\} \subset \mathbb{R}^n$.
2.1 Linear systems and the Jordan normal form
In this section we introduce a prototypical model for dynamical systems and study its stability properties via the so-called Jordan normal form, a key tool from matrix theory.
2.1.1 Discrete-time linear systems
We start with a basic definition.
Definition 2.1 (Discrete-time linear system). A square matrix A defines a discrete-time linear system by
$$x(k+1) = A x(k), \qquad x(0) = x_0, \tag{2.1}$$
or, equivalently, by $x(k) = A^k x_0$, where the sequence $\{x(k)\}_{k \in \mathbb{Z}_{\geq 0}}$ is called the solution, trajectory or evolution of the system.
We are interested in understanding when a solution from an arbitrary initial condition has an asymptotic limit as time diverges and to what value the solution converges. We formally define this property as follows.
Definition 2.2 (Semi-convergent and convergent matrices). A matrix $A \in \mathbb{R}^{n \times n}$ is
(i) semi-convergent if $\lim_{k \to +\infty} A^k$ exists, and
(ii) convergent if it is semi-convergent and $\lim_{k \to +\infty} A^k = 0_{n \times n}$.
It is immediate to see that, if A is semi-convergent with limiting matrix $A_\infty = \lim_{k \to +\infty} A^k$, then
$$\lim_{k \to +\infty} x(k) = A_\infty x_0.$$
In what follows we characterize the sets of semi-convergent and convergent matrices.
Remark 2.3 (Modal decomposition for symmetric matrices). Before treating the general analysis method, we present the self-contained and instructive case of symmetric matrices. Recall that a symmetric matrix A has real eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$ and corresponding orthonormal (i.e., orthogonal and unit-length) eigenvectors $v_1, \dots, v_n$. Because the eigenvectors are an orthonormal basis for $\mathbb{R}^n$, we can write the modal decomposition
$$x(k) = y_1(k) v_1 + \dots + y_n(k) v_n,$$
where the ith normal mode is defined by $y_i(k) = v_i^\top x(k)$. We then left-multiply the two equalities (2.1) by $v_i^\top$ and exploit $v_i^\top A = \lambda_i v_i^\top$ (which follows from $A v_i = \lambda_i v_i$ and the symmetry of A) to obtain
$$y_i(k+1) = \lambda_i y_i(k), \quad y_i(0) = v_i^\top x_0 \quad \Longrightarrow \quad y_i(k) = \lambda_i^k (v_i^\top x_0).$$
In short, the evolution of the linear system (2.1) is
$$x(k) = \lambda_1^k (v_1^\top x_0) v_1 + \dots + \lambda_n^k (v_n^\top x_0) v_n.$$
Therefore, each evolution starting from an arbitrary initial condition satisfies
(i) $\lim_{k \to \infty} x(k) = 0_n$ if and only if $|\lambda_i| < 1$ for all $i \in \{1, \dots, n\}$, and
(ii) $\lim_{k \to \infty} x(k) = (v_1^\top x_0) v_1 + \dots + (v_m^\top x_0) v_m$ if and only if $\lambda_1 = \dots = \lambda_m = 1$ and $|\lambda_i| < 1$ for all $i \in \{m+1, \dots, n\}$. •
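The modal formula can be checked numerically. This sketch uses a random symmetric matrix (a hypothetical example) and `np.linalg.eigh`, which returns eigenvalues in ascending rather than descending order; the order does not affect the sum.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 4, 7
M = rng.normal(size=(n, n))
A = (M + M.T) / 4                       # a symmetric example matrix
x0 = rng.normal(size=n)

lam, V = np.linalg.eigh(A)              # real eigenvalues and orthonormal eigenvectors (columns)
y0 = V.T @ x0                           # normal modes at time 0: y_i(0) = v_i^T x0
x_modal = V @ (lam**k * y0)             # sum_i lam_i^k (v_i^T x0) v_i
x_direct = np.linalg.matrix_power(A, k) @ x0
print(np.allclose(x_modal, x_direct))   # True
```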
2.1.2 The Jordan normal form
In this section we review a very useful canonical decomposition of a square matrix. Recall that two n × n matrices A and B are similar if $B = T A T^{-1}$ for some invertible matrix T. Here and in what follows, matrices are allowed to have complex entries.
Theorem 2.4 (Jordan normal form). Each n × n matrix A is similar to a block diagonal matrix J, called the Jordan normal form of A, given by
$$J = \begin{pmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & J_m \end{pmatrix} \in \mathbb{C}^{n \times n},$$
where each block $J_i$, called a Jordan block, is a square matrix of size $j_i$ and of the form
$$J_i = \begin{pmatrix} \lambda_i & 1 & \cdots & 0 \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & \ddots & \ddots & 1 \\ 0 & \cdots & 0 & \lambda_i \end{pmatrix} \in \mathbb{C}^{j_i \times j_i}. \tag{2.2}$$
Clearly, m ≤ n and $j_1 + \dots + j_m = n$. We refer to (Horn and Johnson 1985) for a standard proof of this theorem. In other words, Theorem 2.4 implies there exists an invertible matrix T such that
$$A = T J T^{-1} \tag{2.3}$$
$$\iff A T = T J \tag{2.4}$$
$$\iff T^{-1} A = J T^{-1}. \tag{2.5}$$
The matrix J is unique, modulo a re-ordering of the Jordan blocks. The eigenvalues of J are the (not necessarily distinct) numbers $\lambda_1, \dots, \lambda_m$; these numbers are also the eigenvalues of A (since a similarity transform does not change the eigenvalues of a matrix). Given an eigenvalue λ,
(i) the algebraic multiplicity of λ is the sum of the sizes of all Jordan blocks with eigenvalue λ (or, equivalently, the multiplicity of λ as a root of the characteristic polynomial of A), and
(ii) the geometric multiplicity of λ is the number of Jordan blocks with eigenvalue λ (or, equivalently, the number of linearly-independent eigenvectors associated to λ).
An eigenvalue is simple if it has algebraic and geometric multiplicity equal precisely to 1, that is, a single Jordan block of size 1. An eigenvalue is semisimple if all its Jordan blocks have size 1, so that its algebraic and geometric multiplicity are equal.
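Multiplicities can be estimated numerically, with a caveat: eigenvalues of non-diagonalizable matrices are sensitive to round-off, so the tolerance-based counts below are a heuristic sketch, not a robust Jordan-form algorithm. The helper `multiplicities` is hypothetical; it counts computed eigenvalues near λ (algebraic multiplicity) and evaluates $n - \operatorname{rank}(A - \lambda I_n)$ (geometric multiplicity).

```python
import numpy as np

def multiplicities(A, lam, tol=1e-6):
    """Numerically estimate (algebraic, geometric) multiplicity of the eigenvalue lam.

    algebraic: number of computed eigenvalues within tol of lam;
    geometric: dim kernel(A - lam*I) = n - rank(A - lam*I).
    """
    n = A.shape[0]
    algebraic = int(np.sum(np.abs(np.linalg.eigvals(A) - lam) < tol))
    geometric = n - int(np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol))
    return algebraic, geometric

# Jordan form with one block of size 2 and one block of size 1, both with eigenvalue 2.
J = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
print(multiplicities(J, 2.0))           # (3, 2): eigenvalue 2 is not semisimple
print(multiplicities(np.eye(3), 1.0))   # (3, 3): eigenvalue 1 is semisimple
```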
Let $t_1, \dots, t_n$ and $r_1, \dots, r_n$ denote the columns and rows of T and $T^{-1}$ respectively. If all eigenvalues of A are semisimple, then the equations (2.4) and (2.5) imply, for all $i \in \{1, \dots, n\}$,
$$A t_i = \lambda_i t_i \quad \text{and} \quad r_i A = \lambda_i r_i.$$
In other words, the ith column of T is the right eigenvector (or simply eigenvector) of A corresponding to the eigenvalue $\lambda_i$, and the ith row of $T^{-1}$ is the corresponding left eigenvector of A. Matrices with only semisimple eigenvalues are called diagonalizable (because J is diagonal). Finally, it is possible to have eigenvalues with larger algebraic than geometric multiplicity; in this case, the columns of the matrix T are the generalized right eigenvectors of A and the rows of $T^{-1}$ are the generalized left eigenvectors of A. For more details we refer the reader to (Horn and Johnson 1985).
Example 2.5 (Revisiting the wireless sensor network example). Next, as a numerical example, let us reconsider the wireless sensor network discussed in Section 1.2 and the 4-dimensional row-stochastic matrix $A_{\text{wsn}}$, which we report here for convenience:
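$$A_{\text{wsn}} = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/3 & 1/3 & 1/3 \\ 0 & 1/3 & 1/3 & 1/3 \end{pmatrix}.$$
With the aid of a symbolic mathematics program, we compute $A_{\text{wsn}} = T J T^{-1}$ where
$$J = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \frac{1}{24}(5 - \sqrt{73}) & 0 \\ 0 & 0 & 0 & \frac{1}{24}(5 + \sqrt{73}) \end{pmatrix}, \qquad T = \begin{pmatrix} 1 & 0 & -2 + 2\sqrt{73} & -2 - 2\sqrt{73} \\ 1 & 0 & -11 - \sqrt{73} & -11 + \sqrt{73} \\ 1 & -1 & 8 & 8 \\ 1 & 1 & 8 & 8 \end{pmatrix},$$
and
$$T^{-1} = \begin{pmatrix} \frac16 & \frac13 & \frac14 & \frac14 \\ 0 & 0 & -\frac12 & \frac12 \\ -\frac{1}{96} + \frac{19}{96\sqrt{73}} & -\frac{1}{48} - \frac{5}{48\sqrt{73}} & \frac{1}{64} - \frac{3}{64\sqrt{73}} & \frac{1}{64} - \frac{3}{64\sqrt{73}} \\ -\frac{1}{96} - \frac{19}{96\sqrt{73}} & -\frac{1}{48} + \frac{5}{48\sqrt{73}} & \frac{1}{64} + \frac{3}{64\sqrt{73}} & \frac{1}{64} + \frac{3}{64\sqrt{73}} \end{pmatrix}.$$
Therefore, the eigenvalues of $A_{\text{wsn}}$ are 1, 0, $\frac{1}{24}(5 - \sqrt{73}) \approx -0.148$ and $\frac{1}{24}(5 + \sqrt{73}) \approx 0.564$. Corresponding to the eigenvalue 1, the right and left eigenvector equations are:
$$A_{\text{wsn}} \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1/6 \\ 1/3 \\ 1/4 \\ 1/4 \end{pmatrix}^{\!\top} A_{\text{wsn}} = \begin{pmatrix} 1/6 \\ 1/3 \\ 1/4 \\ 1/4 \end{pmatrix}^{\!\top}.$$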
•
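The decomposition in Example 2.5 can be double-checked in floating point. The sketch below enters $A_{\text{wsn}}$, J and T as listed in the example, inverts T numerically, and compares the first row of $T^{-1}$ with the left eigenvector for the eigenvalue 1.

```python
import numpy as np

s = np.sqrt(73.0)
A_wsn = np.array([[1/2, 1/2, 0, 0],
                  [1/4, 1/4, 1/4, 1/4],
                  [0, 1/3, 1/3, 1/3],
                  [0, 1/3, 1/3, 1/3]])
J = np.diag([1.0, 0.0, (5 - s) / 24, (5 + s) / 24])
T = np.array([[1, 0, -2 + 2 * s, -2 - 2 * s],
              [1, 0, -11 - s, -11 + s],
              [1, -1, 8, 8],
              [1, 1, 8, 8]])
T_inv = np.linalg.inv(T)

print(np.allclose(A_wsn, T @ J @ T_inv))   # True
print(T_inv[0])                            # approximately [1/6, 1/3, 1/4, 1/4]
```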
2.1.3 Semi-convergence and convergence for discrete-time linear systems
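We can now use the Jordan normal form to study the powers of the matrix A:
$$A^k = \underbrace{T J T^{-1} \cdot T J T^{-1} \cdots T J T^{-1}}_{k \text{ times}} = T J^k T^{-1} = T \begin{pmatrix} J_1^k & 0 & \cdots & 0 \\ 0 & J_2^k & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & J_m^k \end{pmatrix} T^{-1},$$
where the kth power of the generic Jordan block $J_i$, as a function of block size $1, 2, 3, \dots, j_i$, is respectively:
$$J_i^k = \begin{pmatrix} \lambda_i^k \end{pmatrix}, \quad \begin{pmatrix} \lambda_i^k & k\lambda_i^{k-1} \\ 0 & \lambda_i^k \end{pmatrix}, \quad \begin{pmatrix} \lambda_i^k & k\lambda_i^{k-1} & \frac{k!}{(k-2)!\,2!}\lambda_i^{k-2} \\ 0 & \lambda_i^k & k\lambda_i^{k-1} \\ 0 & 0 & \lambda_i^k \end{pmatrix}, \quad \dots, \quad \begin{pmatrix} \lambda_i^k & k\lambda_i^{k-1} & \cdots & \frac{k!}{(k-j_i+1)!\,(j_i-1)!}\lambda_i^{k-j_i+1} \\ 0 & \lambda_i^k & \ddots & \vdots \\ \vdots & \ddots & \ddots & k\lambda_i^{k-1} \\ 0 & \cdots & 0 & \lambda_i^k \end{pmatrix}.$$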
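A sketch of the closed form above: entry (p, q) of $J_i^k$ with q ≥ p equals $\binom{k}{q-p}\lambda_i^{k-(q-p)}$, and it is zero when q − p > k. The helpers `jordan_block` and `jordan_block_power` are hypothetical names; the check compares the formula with repeated matrix multiplication.

```python
import numpy as np
from math import comb

def jordan_block(lam, size):
    """Jordan block with lam on the diagonal and ones on the superdiagonal."""
    return lam * np.eye(size) + np.eye(size, k=1)

def jordan_block_power(lam, size, k):
    """Closed form of J^k: entry (p, q), q >= p, equals C(k, q-p) * lam**(k-(q-p))."""
    P = np.zeros((size, size))
    for p in range(size):
        for q in range(p, size):
            j = q - p
            P[p, q] = comb(k, j) * lam ** (k - j) if j <= k else 0.0
    return P

J = jordan_block(0.9, 4)
print(np.allclose(np.linalg.matrix_power(J, 10), jordan_block_power(0.9, 4, 10)))  # True
# For lam = 1 and block size 2, the (1, 2) entry equals k: the powers are unbounded.
print([float(jordan_block_power(1.0, 2, k)[0, 1]) for k in (1, 10, 100)])  # [1.0, 10.0, 100.0]
```

The second print illustrates the unbounded case of the limit (2.6) with j = 1 and λ = 1.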
We can now derive necessary and sufficient conditions for semi-convergence and convergence of an arbitrary square matrix. The proof of the following result is an immediate consequence of the Jordan normal form and of the following equality
$$\lim_{k \to \infty} k^j \lambda^k = \begin{cases} 0, & \text{if } |\lambda| < 1, \\ 1, & \text{if } j = 0 \text{ and } \lambda = 1, \\ \text{non-existent or unbounded}, & \text{otherwise}, \end{cases} \tag{2.6}$$
for any nonnegative integer j; see also Exercise E2.3.
Theorem 2.6 (Semi-convergent and convergent matrices). For a square matrix A with Jordan normal form J and Jordan blocks $J_i$, $i \in \{1, \dots, m\}$, the following statements are equivalent:
(i) A is semi-convergent (resp. convergent),
(ii) J is semi-convergent (resp. convergent), and
(iii) each block $J_i$ is semi-convergent (resp. convergent).
Moreover, the following statements hold for each block $J_i$ with eigenvalue $\lambda_i$:
(i) for $J_i$ of size 1, $J_i$ is convergent if and only if $|\lambda_i| < 1$,
(ii) for $J_i$ of size 1, $J_i$ is semi-convergent and not convergent if and only if $\lambda_i = 1$, and
(iii) for $J_i$ of size larger than 1, $J_i$ is semi-convergent if and only if it is convergent, which holds if and only if $|\lambda_i| < 1$.
We complete this discussion with two useful definitions and an equivalent reformulation of Theorem 2.6.
Figure 2.1: Eigenvalues and convergence properties of discrete-time linear systems. Panels: (a) the spectrum of a convergent matrix; (b) the spectrum of a semi-convergent matrix, provided the eigenvalue 1 is semisimple; (c) the spectrum of a matrix that is not semi-convergent.
Definition 2.7 (Spectrum and spectral radius of a matrix). Given a square matrix A,
(i) the spectrum of A, denoted spec(A), is the set of eigenvalues of A; and
(ii) the spectral radius of A is the maximum norm of the eigenvalues of A, that is,
$$\rho(A) = \max\{|\lambda| \mid \lambda \in \operatorname{spec}(A)\},$$
or, equivalently, the radius of the smallest disk in $\mathbb{C}$ centered at the origin and containing the spectrum of A.
Theorem 2.8 (Convergence and spectral radius). For a square matrix A, the following statements hold:
(i) A is convergent if and only if ρ(A) < 1,
(ii) A is semi-convergent if and only if ρ(A) ≤ 1, no eigenvalue has unit norm other than possibly the number 1, and if 1 is an eigenvalue, then it is semisimple.
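Theorem 2.8 suggests a simple numerical classification. The sketch below (hypothetical helper `convergence_class`) compares eigenvalues with a tolerance and tests semisimplicity of the eigenvalue 1 by comparing the number of computed eigenvalues at 1 with $n - \operatorname{rank}(A - I_n)$; as with any floating-point test on defective eigenvalues, it is a heuristic.

```python
import numpy as np

def convergence_class(A, tol=1e-9):
    """Classify A as 'convergent', 'semi-convergent' or 'not semi-convergent' (Theorem 2.8)."""
    n = A.shape[0]
    eig = np.linalg.eigvals(A)
    rho = np.max(np.abs(eig))                          # spectral radius
    if rho < 1 - tol:
        return "convergent"
    if rho > 1 + tol:
        return "not semi-convergent"
    on_circle = eig[np.abs(np.abs(eig) - 1) <= tol]    # eigenvalues of unit norm
    if np.any(np.abs(on_circle - 1) > tol):            # a unit-norm eigenvalue other than 1
        return "not semi-convergent"
    algebraic = len(on_circle)
    geometric = n - np.linalg.matrix_rank(A - np.eye(n), tol=tol)
    return "semi-convergent" if algebraic == geometric else "not semi-convergent"

A_wsn = np.array([[1/2, 1/2, 0, 0],
                  [1/4, 1/4, 1/4, 1/4],
                  [0, 1/3, 1/3, 1/3],
                  [0, 1/3, 1/3, 1/3]])
for M in (np.diag([0.5, 0.2]), A_wsn, np.array([[0.0, 1.0], [1.0, 0.0]]),
          np.array([[1.0, 1.0], [0.0, 1.0]])):
    print(convergence_class(M))
```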
2.2 Row-stochastic matrices and their spectral radius
Motivated by the example systems in Chapter 1, we are now interested in discrete-time linear systems defined by matrices with special properties. Specifically, we are interested in matrices with nonnegative entries and whose row-sums are all equal to 1. The square matrix $A \in \mathbb{R}^{n \times n}$ is
(i) nonnegative (respectively positive) if $a_{ij} \geq 0$ (respectively $a_{ij} > 0$) for all i and j in $\{1, \dots, n\}$;
(ii) row-stochastic if nonnegative and $A \mathbb{1}_n = \mathbb{1}_n$;
(iii) column-stochastic if nonnegative and $A^\top \mathbb{1}_n = \mathbb{1}_n$; and
(iv) doubly-stochastic if it is row- and column-stochastic.
In the following, we write A > 0 and v > 0 (respectively A ≥ 0 and v ≥ 0) for a positive (respectively nonnegative) matrix A and vector v.
Given a finite number of points $p_1, p_2, \dots, p_n$ in $\mathbb{R}^n$, a convex combination of $p_1, p_2, \dots, p_n$ is a point of the form
$$\eta_1 p_1 + \eta_2 p_2 + \dots + \eta_n p_n,$$
where the real numbers $\eta_1, \dots, \eta_n$ satisfy $\eta_1 + \dots + \eta_n = 1$ and $\eta_i \geq 0$ for all $i \in \{1, \dots, n\}$. For example, on the plane $\mathbb{R}^2$, the set of convex combinations of two distinct points is the segment connecting them and the set of convex combinations of three non-collinear points is the triangle (including its interior) defined by them; see Figure 2.2. The numbers $\eta_1, \dots, \eta_n$ are called convex combination coefficients and each row of a row-stochastic matrix consists of convex combination coefficients.
Figure 2.2: Convex combination: q is inside the triangle if and only if q is a convex combination of $p_1, p_2, p_3$.
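Because each row of a row-stochastic matrix holds convex combination coefficients, each entry of Ax lies between the smallest and the largest entries of x. The sketch below uses a random row-stochastic matrix (a hypothetical example) to show one consequence: the spread max − min of x cannot grow under repeated averaging.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
A = rng.uniform(size=(n, n))
A /= A.sum(axis=1, keepdims=True)     # nonnegative with unit row sums: row-stochastic
x = rng.normal(size=n)

# Each entry (A x)_i = sum_j a_ij x_j is a convex combination of x_1, ..., x_n,
# so it lies between min_j x_j and max_j x_j.
spreads = [x.max() - x.min()]
for _ in range(5):
    x = A @ x
    spreads.append(x.max() - x.min())
print(spreads)                        # a nonincreasing sequence
```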
2.2.1 The spectral radius for row-stochastic matrices
To characterize the spectral radius of a row-stochastic matrix, we introduce a useful general method to localize the spectrum of a matrix.
Theorem 2.9 (Geršgorin Disks Theorem). For any square matrix $A \in \mathbb{R}^{n \times n}$,
$$\operatorname{spec}(A) \subset \bigcup_{i \in \{1, \dots, n\}} \underbrace{\Big\{ z \in \mathbb{C} \;\Big|\; |z - a_{ii}| \leq \sum_{j=1, j \neq i}^n |a_{ij}| \Big\}}_{\text{disk in the complex plane centered at } a_{ii} \text{ with radius } \sum_{j=1, j \neq i}^n |a_{ij}|}.$$
Proof. Consider the eigenvalue equation $Ax = \lambda x$ for the eigenpair (λ, x), where λ and $x \neq 0_n$ are complex. Choose the index $i \in \{1, \dots, n\}$ so that $|x_i| = \max_{j \in \{1, \dots, n\}} |x_j| > 0$. The ith component of the eigenvalue equation can be rewritten as $\lambda - a_{ii} = \sum_{j=1, j \neq i}^n a_{ij} x_j / x_i$. Now, take the complex magnitude of this equality and upper-bound its right-hand side:
$$|\lambda - a_{ii}| = \Big| \sum_{j=1, j \neq i}^n a_{ij} \frac{x_j}{x_i} \Big| \leq \sum_{j=1, j \neq i}^n |a_{ij}| \frac{|x_j|}{|x_i|} \leq \sum_{j=1, j \neq i}^n |a_{ij}|.$$
The theorem statement follows by interpreting this inequality as a bound on the possible location of each arbitrary eigenvalue λ of A.
Each disk in the theorem statement is referred to as a Geršgorin disk, or more accurately, as a Geršgorin row disk; an analogous disk theorem can be stated for Geršgorin column disks. Exercise E2.15 showcases an instructive application to distributed computing of numerous topics covered so far, including convergence notions and the Geršgorin Disks Theorem.
Lemma 2.10 (Spectral properties of a row-stochastic matrix). For a row-stochastic matrix A,
(i) 1 is an eigenvalue, and
(ii) spec(A) is a subset of the unit disk and ρ(A) = 1.
Proof. First, recall that A being row-stochastic is equivalent to two facts: $a_{ij} \geq 0$, $i, j \in \{1, \dots, n\}$, and $A \mathbb{1}_n = \mathbb{1}_n$. The second fact implies that $\mathbb{1}_n$ is an eigenvector with eigenvalue 1. Therefore, by definition of spectral radius, ρ(A) ≥ 1. Next, we prove that ρ(A) ≤ 1 by invoking the Geršgorin Disks Theorem 2.9 to show that spec(A) is contained in the unit disk centered at the origin. The Geršgorin disks of a row-stochastic matrix are illustrated in Figure 2.3.
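Figure 2.3: All Geršgorin disks of a row-stochastic matrix are contained in the unit disk. (The ith disk is centered at $a_{ii}$ with radius $\sum_{j \neq i} a_{ij}$; the point 1 is marked on the real axis.)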
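Note that A being row-stochastic implies $a_{ii} \in [0, 1]$ and $a_{ii} + \sum_{j \neq i} a_{ij} = 1$. Hence, the center of the ith Geršgorin disk lies on the real axis between 0 and 1 and its radius is $1 - a_{ii}$, so that the right-most point in the disk is at 1 and every point z in the disk satisfies $|z| \leq a_{ii} + (1 - a_{ii}) = 1$. Therefore, every Geršgorin disk, and hence spec(A), is contained in the unit disk.
Note: because 1 is an eigenvalue of each row-stochastic matrix A, clearly A is not convergent. But it is possible for A to be semi-convergent.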
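A numerical look at Theorem 2.9 and Lemma 2.10 for the matrix $A_{\text{wsn}}$ of Example 2.5. The helpers `gershgorin_disks` and `in_some_disk` are hypothetical names; the tolerance absorbs round-off in the computed eigenvalues.

```python
import numpy as np

def gershgorin_disks(A):
    """Centers a_ii and radii sum_{j != i} |a_ij| of the Gershgorin row disks."""
    A = np.asarray(A, dtype=float)
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)
    return centers, radii

def in_some_disk(z, centers, radii, tol=1e-12):
    """True if the complex number z lies in the union of the disks."""
    return bool(np.any(np.abs(z - centers) <= radii + tol))

A_wsn = np.array([[1/2, 1/2, 0, 0],
                  [1/4, 1/4, 1/4, 1/4],
                  [0, 1/3, 1/3, 1/3],
                  [0, 1/3, 1/3, 1/3]])
centers, radii = gershgorin_disks(A_wsn)
eigs = np.linalg.eigvals(A_wsn)

print(all(in_some_disk(z, centers, radii) for z in eigs))   # True (Theorem 2.9)
print(centers + radii)        # right-most points of the disks, all 1 up to round-off
print(np.max(np.abs(eigs)))   # spectral radius, 1 up to round-off (Lemma 2.10)
```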
2.3 Perron–Frobenius theory
We have seen how row-stochastic matrices are not convergent; we now focus on characterizing those that are semi-convergent. To establish whether a row-stochastic matrix is semi-convergent, we introduce the widely-established Perron–Frobenius theory for nonnegative matrices.
2.3.1 Classification of nonnegative matrices
In the previous section we already defined nonnegative and positive matrices. Here we study two sets of nonnegative matrices with certain characteristic properties.
Definition 2.11 (Irreducible and primitive matrices). For n ≥ 2, an n × n nonnegative matrix A is
(i) irreducible if, for all partitions {I, J} of the index set {1, . . . , n}, there exist i ∈ I and j ∈ J such that $a_{ij} \neq 0$,
(ii) primitive if there exists k ∈ ℕ such that $A^k$ is a positive matrix.
Here {I, J} is a partition of {1, . . . , n} if I ∪ J = {1, . . . , n} and I ∩ J = ∅. A matrix that is not irreducible is said to be reducible.
Note: a positive matrix is clearly primitive. Also note that, if there is k ∈ ℕ such that $A^k$ is positive, then (one can show that) all subsequent powers $A^{k+1}, A^{k+2}, \dots$ are necessarily positive as well; see Exercise E2.6. We postpone the proof of the following result until Section 2.3.4.
Lemma 2.12. If a square nonnegative matrix is primitive, then it is irreducible.
As a consequence of this lemma we can draw the set diagram in Figure 2.4 describing the set of nonnegative square matrices and its subsets of irreducible, primitive and positive matrices. Note that the inclusions in the diagram are strict in the sense that:
(i) the matrix $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ is nonnegative but not irreducible;
(ii) the matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ is irreducible but not primitive;
(iii) the matrix $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ is primitive but not positive.
Figure 2.4: The set of nonnegative square matrices and its subsets of irreducible, primitive and positive matrices. Nested sets, from outermost to innermost: nonnegative (A ≥ 0); irreducible (no permutation brings A into block upper triangular form); primitive (there exists k such that $A^k > 0$); positive (A > 0).
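Definition 2.11 can be tested on the zero pattern of a nonnegative matrix. This sketch relies on two standard characterizations that are not stated in the text above: a nonnegative n × n matrix (n ≥ 2) is irreducible if and only if $(I_n + A)^{n-1} > 0$, and it is primitive if and only if $A^{n^2 - 2n + 2} > 0$ (Wielandt's bound). The helper names are hypothetical.

```python
import numpy as np

def _power_is_positive(A, k):
    """True if A^k > 0 entrywise, for nonnegative A, computed on the 0/1 pattern of A."""
    B = (np.asarray(A) > 0).astype(int)
    P = np.eye(len(B), dtype=int)
    for _ in range(k):
        P = np.minimum(P @ B, 1)   # keep only the zero/nonzero pattern, avoiding overflow
    return bool((P > 0).all())

def is_irreducible(A):
    """Nonnegative n x n A (n >= 2) is irreducible iff (I_n + A)^(n-1) > 0."""
    n = len(A)
    return _power_is_positive(np.eye(n) + np.asarray(A), n - 1)

def is_primitive(A):
    """Nonnegative n x n A is primitive iff A^(n^2 - 2n + 2) > 0 (Wielandt's bound)."""
    n = len(A)
    return _power_is_positive(A, n * n - 2 * n + 2)

examples = {
    "nonnegative, reducible": np.array([[0, 1], [0, 0]]),      # expected: False False
    "irreducible, not primitive": np.array([[0, 1], [1, 0]]),  # expected: True False
    "primitive, not positive": np.array([[1, 1], [1, 0]]),     # expected: True True
}
for label, A in examples.items():
    print(label, is_irreducible(A), is_primitive(A))
```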
Permutation characterization of irreducibility. It is useful to elaborate on Definition 2.11. First, we note that the notion of irreducibility is applicable to matrices that are not necessarily nonnegative. Second, we provide a formal version of the following intuition: a matrix is irreducible if it is not similar via a permutation to a block upper triangular matrix. We start with a useful definition. A square matrix P is binary if all its entries are equal to 0 or 1; accordingly, we write $P \in \{0, 1\}^{n \times n}$. A permutation matrix is a square binary matrix with precisely one entry equal to 1 in every row and every column. (In other words, the columns of a permutation matrix are $e_1, \dots, e_n$, modulo reordering.) A permutation matrix acts on a vector by permuting its entries.
Lemma 2.13. For n ≥ 2, the n × n matrix $A \in \mathbb{R}^{n \times n}$ is reducible if and only if there exist a permutation matrix $P \in \{0, 1\}^{n \times n}$ and a number $r \in \{1, \dots, n-1\}$ such that
$$P^\top A P = \begin{pmatrix} B_{r \times r} & C_{r \times (n-r)} \\ 0_{(n-r) \times r} & D_{(n-r) \times (n-r)} \end{pmatrix},$$
where B, C and D are arbitrary.
We leave the proof of this lemma to the reader as Exercise E2.9. Note that $P^\top A P$ is the similarity transformation of A defined by P because the permutation matrix P satisfies $P^{-1} = P^\top$; see Exercise E2.13. Moreover, note that $P^\top A P$ is simply a reordering of rows and columns. For example, consider
$$P = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad \text{note} \quad P \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \\ 2 \end{pmatrix},$$
and compute
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \;\to\; P^\top A P = \begin{pmatrix} a_{22} & a_{23} & a_{21} \\ a_{32} & a_{33} & a_{31} \\ a_{12} & a_{13} & a_{11} \end{pmatrix},$$
so that the entries of the 1st, 2nd and 3rd rows of A are mapped respectively to the 3rd, 1st and 2nd rows of $P^\top A P$ and, at the same time, the entries of the 1st, 2nd and 3rd columns of A are mapped respectively to the 3rd, 1st and 2nd columns of $P^\top A P$.
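The permutation example can be reproduced with integer entries that encode their own indices; here the entry 10i + j stands for $a_{ij}$, a hypothetical labeling chosen only for readability.

```python
import numpy as np

P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
A = np.array([[11, 12, 13],       # entry 10*i + j stands for a_ij
              [21, 22, 23],
              [31, 32, 33]])

print(P @ np.array([1, 2, 3]))    # [3 1 2]
print(P.T @ P)                    # identity matrix, so P^{-1} = P^T
print(P.T @ A @ P)                # rows (22, 23, 21), (32, 33, 31), (12, 13, 11)
```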
2.3.2 Main results
We are now ready to state the main results in Perron–Frobenius theory and characterize the properties of the spectral radius of a nonnegative matrix as a function of the matrix properties. We state the results in three related theorems.
Theorem 2.14 (Perron–Frobenius for nonnegative matrices). If A is a nonnegative matrix, then
(i) there exists a real eigenvalue λ ≥ 0 such that λ ≥ |µ| for all other eigenvalues µ,
(ii) the right and left eigenvectors v and w of λ can be selected nonnegative.
Theorem 2.15 (Perron–Frobenius for irreducible matrices). If A is nonnegative and irreducible, then
(i) there exists a real simple eigenvalue λ > 0 such that λ ≥ |µ| for all other eigenvalues µ,
(ii) the right and left eigenvectors v and w of λ can be selected positive and unique, up to rescaling.
Theorem 2.16 (Perron–Frobenius for primitive matrices). If A is nonnegative, irreducible and primitive, then
(i) there exists a real simple eigenvalue λ > 0 such that λ > |µ| for all other eigenvalues µ,
(ii) the right and left eigenvectors v and w of λ can be selected positive and unique, up to rescaling.
We refer to Theorem 5.2 in Section 5.2 for a version of the Perron–Frobenius Theorem for reducible matrices. Some remarks and some additional statements are in order.
Remark 2.17 (Dominant eigenvalue and eigenvectors). In all three cases, the real nonnegative eigenvalue λ is the spectral radius ρ(A) of A. We refer to λ as the dominant eigenvalue; it is sometimes also referred to as the Perron root. The dominant eigenvalue is equivalently defined by
$$\rho(A) = \inf\{\lambda \in \mathbb{R} \mid A u \leq \lambda u \text{ for some } u > 0\},$$
and it satisfies the following bound (see Exercise E2.8):
$$\min(A \mathbb{1}_n) \leq \rho(A) \leq \max(A \mathbb{1}_n).$$
Associated with the dominant eigenvalue, the right and left eigenvectors v and w (unique up to rescaling) are called the right and left dominant eigenvectors. The right dominant eigenvector together with its positive multiples are the only positive right eigenvectors of a primitive matrix A (a similar statement holds for the left dominant eigenvector).
Remark 2.18 (Counterexamples). The characterizations in the three theorems are sharp in the following sense:
(i) the matrix $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ is nonnegative and reducible, and, indeed, its dominant eigenvalue is 0;
(ii) the matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ is irreducible but not primitive and, indeed, its dominant eigenvalue +1 is not strictly larger, in magnitude, than the other eigenvalue −1.
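A numerical illustration of Remark 2.17 for a random positive (hence primitive) matrix. The helper `dominant_eigen` is hypothetical; it assumes a unique eigenvalue of largest magnitude, which holds for primitive matrices by Theorem 2.16, and removes the arbitrary sign returned by `np.linalg.eig` by taking absolute values.

```python
import numpy as np

def dominant_eigen(A):
    """Dominant eigenvalue and right/left dominant eigenvectors, each scaled to unit sum."""
    eig, V = np.linalg.eig(A)
    i = np.argmax(np.abs(eig))
    lam = eig[i].real
    v = np.abs(V[:, i].real)                 # Perron vector: entries of a single sign
    eig_l, W = np.linalg.eig(A.T)            # left eigenvectors of A = right eigenvectors of A^T
    w = np.abs(W[:, np.argmax(np.abs(eig_l))].real)
    return lam, v / v.sum(), w / w.sum()

rng = np.random.default_rng(6)
A = rng.uniform(0.0, 1.0, size=(5, 5))       # positive, hence primitive
lam, v, w = dominant_eigen(A)

row_sums = A.sum(axis=1)
print(row_sums.min() <= lam <= row_sums.max())            # True: bound of Remark 2.17
u = rng.uniform(0.5, 2.0, size=5)                         # an arbitrary positive vector
print(np.max(A @ u / u) >= lam, np.isclose(np.max(A @ v / v), lam))   # True True
```

The last line checks the infimum characterization: every positive u gives $\max_i (Au)_i / u_i \geq \rho(A)$, with equality at u = v.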
2.3.3 Applications to dynamical systems
The Perron–Frobenius Theorem for a primitive matrix A has immediate consequences for the behavior of $A^k$ as k → ∞ and, therefore, for the asymptotic behavior of the dynamical system x(k + 1) = Ax(k).
Proposition 2.19 (Powers of primitive matrices). For a primitive matrix A with dominant eigenvalue λ and with dominant right and left eigenvectors v and w normalized so that $v^\top w = 1$, we have
$$\lim_{k \to \infty} \frac{A^k}{\lambda^k} = v w^\top.$$
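Proposition 2.19 can be observed numerically. In this sketch the random positive matrix is a hypothetical example, and k = 200 is an arbitrary horizon that is large enough for this example.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.uniform(0.1, 1.0, size=(4, 4))         # positive, hence primitive

eig, V = np.linalg.eig(A)
i = np.argmax(np.abs(eig))
lam, v = eig[i].real, V[:, i].real             # dominant eigenvalue and right eigenvector
eig_l, W = np.linalg.eig(A.T)
w = W[:, np.argmax(np.abs(eig_l))].real        # left dominant eigenvector
w = w / (v @ w)                                # normalize so that v^T w = 1

limit = np.linalg.matrix_power(A / lam, 200)   # A^k / lam^k for k = 200
print(np.allclose(limit, np.outer(v, w)))      # True
```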
We now apply this result to row-stochastic matrices. Recall that A ≥ 0 is row-stochastic if $A \mathbb{1}_n = \mathbb{1}_n$. Therefore, the right eigenvector of the eigenvalue 1 can be selected as $\mathbb{1}_n$.
Corollary 2.20 (Consensus for primitive row-stochastic matrices). For a primitive row-stochastic matrix A,
(i) the simple eigenvalue ρ(A) = 1 is strictly larger than the magnitude of all other eigenvalues, hence A is semi-convergent;
(ii) $\lim_{k \to \infty} A^k = \mathbb{1}_n w^\top$, where w is the left positive eigenvector of A with eigenvalue 1 satisfying $w_1 + \dots + w_n = 1$;
(iii) the solution to x(k + 1) = Ax(k) satisfies
$$\lim_{k \to \infty} x(k) = \big( w^\top x(0) \big) \mathbb{1}_n;$$
(iv) if additionally A is doubly-stochastic, then $w = \frac{1}{n} \mathbb{1}_n$ (because $A^\top \mathbb{1}_n = \mathbb{1}_n$ and $\frac{1}{n} \mathbb{1}_n^\top \mathbb{1}_n = 1$) so that
$$\lim_{k \to \infty} x(k) = \frac{\mathbb{1}_n^\top x(0)}{n} \mathbb{1}_n = \operatorname{average}(x(0))\, \mathbb{1}_n.$$
In this case we say that the dynamical system achieves average consensus.
Note: $\mathbb{1}_n w^\top = \begin{pmatrix} w^\top \\ \vdots \\ w^\top \end{pmatrix} = \begin{pmatrix} w_1 & w_2 & \cdots & w_n \\ \vdots & \vdots & & \vdots \\ w_1 & w_2 & \cdots & w_n \end{pmatrix}$, and $(\mathbb{1}_n w^\top) x(0) = (w^\top x(0)) \mathbb{1}_n = \begin{pmatrix} w^\top x(0) \\ \vdots \\ w^\top x(0) \end{pmatrix}$.
Note: the limiting vector is therefore a weighted average of the initial conditions. The relative weights of the initial conditions are the convex combination coefficients $w_1, \dots, w_n$. In a social influence network, the coefficient $w_i$ is regarded as the “social influence” of agent i. An early reference to average consensus is (Harary 1959).
Example 2.21 (Revisiting the wireless sensor network example). Finally, as a numerical example, let us reconsider the wireless sensor network discussed in Section 1.2 and the 4-dimensional row-stochastic matrix $A_{\text{wsn}}$. First, note that $A_{\text{wsn}}$ is primitive because $A_{\text{wsn}}^2$ is positive:
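$$A_{\text{wsn}} = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/3 & 1/3 & 1/3 \\ 0 & 1/3 & 1/3 & 1/3 \end{bmatrix} \quad\implies\quad A_{\text{wsn}}^2 = \begin{bmatrix} 3/8 & 3/8 & 1/8 & 1/8 \\ 3/16 & 17/48 & 11/48 & 11/48 \\ 1/12 & 11/36 & 11/36 & 11/36 \\ 1/12 & 11/36 & 11/36 & 11/36 \end{bmatrix}.$$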
Therefore, the Perron–Frobenius Theorem 2.16 for primitive matrices applies to $A_{\text{wsn}}$. The four pairs of eigenvalues and right eigenvectors of $A_{\text{wsn}}$ (as computed in Example 2.5) are:
$$\big(1,\, 1_4\big), \qquad \left(\tfrac{1}{24}\big(5+\sqrt{73}\big), \begin{bmatrix} -2-2\sqrt{73} \\ -11+\sqrt{73} \\ 8 \\ 8 \end{bmatrix}\right), \qquad \left(\tfrac{1}{24}\big(5-\sqrt{73}\big), \begin{bmatrix} 2(-1+\sqrt{73}) \\ -11-\sqrt{73} \\ 8 \\ 8 \end{bmatrix}\right), \qquad \left(0, \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \end{bmatrix}\right).$$
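As a sanity check on the four eigenpairs listed above, the sketch below evaluates the residual $\max_i |(A_{\text{wsn}} v - \lambda v)_i|$ for each pair in floating point. Residuals at the level of rounding error indicate that the pairs are consistent with the matrix; the helper name `residual` is ours.

```python
import math

A_wsn = [[1/2, 1/2, 0, 0],
         [1/4, 1/4, 1/4, 1/4],
         [0, 1/3, 1/3, 1/3],
         [0, 1/3, 1/3, 1/3]]
s = math.sqrt(73)
eigenpairs = [
    (1.0, [1, 1, 1, 1]),
    ((5 + s) / 24, [-2 - 2 * s, -11 + s, 8, 8]),
    ((5 - s) / 24, [2 * (-1 + s), -11 - s, 8, 8]),
    (0.0, [0, 0, 1, -1]),
]

def residual(A, lam, v):
    """Largest entry of |A v - lam v|; zero up to rounding for an eigenpair."""
    Av = [sum(a * vj for a, vj in zip(row, v)) for row in A]
    return max(abs(Av_i - lam * v_i) for Av_i, v_i in zip(Av, v))

for lam, v in eigenpairs:
    print(f"lambda = {lam:+.4f}, residual = {residual(A_wsn, lam, v):.1e}")
```

The printed eigenvalues other than 1 (about 0.564, -0.148 and 0) all have magnitude below 1, as Corollary 2.20(i) predicts.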
 2.3. Perron–Frobenius theory
Moreover, by Corollary 2.20(i), $A_{\text{wsn}}$ is semi-convergent. To apply the convergence results in Corollary 2.20, we numerically compute its dominant left eigenvector, normalized to have unit sum, to be $w = [1/6, 1/3, 1/4, 1/4]^\top$, so that we have:
$$\lim_{k\to\infty} A_{\text{wsn}}^k = 1_4 w^\top = \begin{bmatrix} 1/6 & 1/3 & 1/4 & 1/4 \\ 1/6 & 1/3 & 1/4 & 1/4 \\ 1/6 & 1/3 & 1/4 & 1/4 \\ 1/6 & 1/3 & 1/4 & 1/4 \end{bmatrix}.$$
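The left eigenvector and the limit $1_4 w^\top$ can be checked in two ways using only the matrix above: exactly, by verifying $w^\top A_{\text{wsn}} = w^\top$ with rational arithmetic, and approximately, by observing that every row of $A_{\text{wsn}}^{60}$ is close to $w^\top$. The exponent 60 is an arbitrary choice for this sketch.

```python
from fractions import Fraction as F

A_wsn = [[F(1, 2), F(1, 2), F(0), F(0)],
         [F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
         [F(0), F(1, 3), F(1, 3), F(1, 3)],
         [F(0), F(1, 3), F(1, 3), F(1, 3)]]
w = [F(1, 6), F(1, 3), F(1, 4), F(1, 4)]

# Exact check of w^T A_wsn = w^T, computed column by column, and of unit sum.
wA = [sum(w[i] * A_wsn[i][j] for i in range(4)) for j in range(4)]
print("w^T A_wsn == w^T:", wA == w, "| sum(w) =", sum(w))

# Floating-point check: every row of A_wsn^60 is close to w^T, i.e. A_wsn^k -> 1_4 w^T.
A = [[float(a) for a in row] for row in A_wsn]
P = [row[:] for row in A]
for _ in range(59):                                  # after the loop, P = A_wsn^60
    P = [[sum(P[i][k] * A[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
gap = max(abs(P[i][j] - float(w[j])) for i in range(4) for j in range(4))
print(f"max |A_wsn^60 - 1_4 w^T| = {gap:.1e}")
```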
Therefore, each solution to the averaging system $x(k+1) = A_{\text{wsn}} x(k)$ converges to a consensus vector $(w^\top x(0))\, 1_4$, that is, the value at each node of the wireless sensor network converges to $w^\top x(0) = \frac{1}{6}x_1(0) + \frac{1}{3}x_2(0) + \frac{1}{4}x_3(0) + \frac{1}{4}x_4(0)$. Note that $A_{\text{wsn}}$ is not doubly-stochastic; therefore, the averaging algorithm does not achieve average consensus, and node 2 has more influence than the other nodes. •

Note: If A is reducible, then it is not primitive (by Lemma 2.12, every primitive matrix is irreducible). Yet, it is possible for an averaging algorithm described by a reducible matrix to converge to consensus. In other words, Corollary 2.20 provides only a sufficient condition for consensus. Here is a simple example of an averaging algorithm described by a reducible matrix that converges to consensus: $x_1(k+1) = x_1(k)$, $x_2(k+1) = x_1(k)$. To fully understand which phenomena are possible and which properties of A are necessary and sufficient for convergence to consensus, we will study graph theory in the next two chapters.
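The sketch below simulates both averaging systems discussed above. For $A_{\text{wsn}}$ it starts from the hypothetical initial condition $x(0) = [1, 0, 0, 0]^\top$, whose weighted limit $w^\top x(0) = 1/6$ differs from the plain average 1/4; for the reducible example it shows consensus on $x_1(0)$ after a single step.

```python
def simulate(A, x, steps):
    """Run the averaging system x(k+1) = A x(k) for the given number of steps."""
    for _ in range(steps):
        x = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return x

A_wsn = [[1/2, 1/2, 0, 0],
         [1/4, 1/4, 1/4, 1/4],
         [0, 1/3, 1/3, 1/3],
         [0, 1/3, 1/3, 1/3]]
x_wsn = simulate(A_wsn, [1.0, 0.0, 0.0, 0.0], 100)
print(x_wsn)   # every entry close to w^T x(0) = 1/6, not to average(x(0)) = 1/4

# Reducible example from the note: x1(k+1) = x1(k), x2(k+1) = x1(k).
A_red = [[1.0, 0.0],
         [1.0, 0.0]]
x_red = simulate(A_red, [5.0, -3.0], 1)
print(x_red)   # [5.0, 5.0]: consensus on x1(0) after one step
```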
2.3.4 Selected proofs
We conclude this section with the proofs of some selected statements.

Proof of Lemma 2.12. We aim to show that a primitive matrix $A \in \mathbb{R}^{n\times n}$ is irreducible. By contradiction, we assume that A is reducible. In other words, we assume that, after appropriately permuting rows and columns according to a permutation matrix P, the matrix A is block upper triangular:
$$A = P^\top \begin{bmatrix} \star & \star \\ 0 & \star \end{bmatrix} P,$$
where $\star$ denotes a generic block. Since $P P^\top = I_n$, a simple calculation shows that $A^2$ has the same sparsity pattern:
$$A^2 = P^\top \begin{bmatrix} \star & \star \\ 0 & \star \end{bmatrix} P \cdot P^\top \begin{bmatrix} \star & \star \\ 0 & \star \end{bmatrix} P = P^\top \begin{bmatrix} \star & \star \\ 0 & \star \end{bmatrix} P.$$
Thus, also $A^k$ for any $k \in \{1, 2, \dots\}$ has the sparsity pattern
$$A^k = P^\top \begin{bmatrix} \star & \star \\ 0 & \star \end{bmatrix} P,$$
that is, $A^k$ is never positive for any $k \in \{1, 2, \dots\}$. Equivalently, A is not primitive. This contradiction concludes the proof of Lemma 2.12.
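The sparsity argument in this proof can be observed numerically. The sketch below assumes the permutation P is the identity (for illustration only), builds a hypothetical 5×5 block upper triangular nonnegative matrix whose lower-left 3×2 block is zero, and checks that this zero block survives in every power up to $A^{20}$.

```python
import random

def matmul(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

n, r = 5, 2     # hypothetical sizes: zero block in rows r..n-1 and columns 0..r-1
random.seed(0)
A = [[0.0 if (i >= r and j < r) else random.random() for j in range(n)] for i in range(n)]

P = A
zero_block_kept = True
for k in range(2, 21):          # after iteration k, P = A^k
    P = matmul(P, A)
    zero_block_kept &= all(P[i][j] == 0.0 for i in range(r, n) for j in range(r))
print("zero block preserved in A^2, ..., A^20:", zero_block_kept)
```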
Proof of Theorem 2.16. We start by establishing that a primitive matrix A satisfies $\rho(A) > 0$. By contradiction, if $\operatorname{spec}(A) = \{0\}$, then the Jordan normal form J of A is nilpotent, that is, there is $k^* \in \mathbb{N}$ so that $J^k = 0$, and hence $A^k = 0$, for all $k \geq k^*$. But this is a contradiction because A being primitive implies that there is $k^* \in \mathbb{N}$ so that $A^k > 0$ for all $k \geq k^*$.

Next, we prove that $\rho(A)$ is a real positive eigenvalue with a positive right eigenvector $v > 0$. We first focus on the case that A is a positive matrix, and later show how to generalize the proof to the case of primitive matrices. Without loss of generality, assume $\rho(A) = 1$. If $(\lambda, x)$ is an eigenpair for A such that $|\lambda| = \rho(A) = 1$, then
$$|x| = |\lambda||x| = |\lambda x| = |Ax| \leq |A||x| = A|x| \quad\implies\quad |x| \leq A|x|. \tag{2.7}$$
Here, we use the notation $|x| = (|x_i|)_{i\in\{1,\dots,n\}}$, $|A| = (|a_{ij}|)_{i,j\in\{1,\dots,n\}}$, and vector inequalities are understood component-wise. In what follows, we show $|x| = A|x|$. With the shorthands $z = A|x|$ and $y = z - |x|$, equation (2.7) reads $y \geq 0$ and we aim to show $y = 0$. By contradiction, assume y has a non-zero component. Therefore, $Ay > 0$. Independently, we also know $z = A|x| > 0$. Thus, there must exist $\varepsilon > 0$ such that $Ay > \varepsilon z$. Since $Ay = Az - z$, eliminating the variable y in the latter inequality gives $Az > (1+\varepsilon) z$, that is, $A_\varepsilon z > z$, where we define $A_\varepsilon = A/(1+\varepsilon)$. Because $A_\varepsilon$ is positive, the inequality $A_\varepsilon z > z$ implies, by induction, $A_\varepsilon^k z > z$ for all $k > 0$. Now, observe that $\rho(A_\varepsilon) = 1/(1+\varepsilon) < 1$, so that $\lim_{k\to\infty} A_\varepsilon^k = 0_{n\times n}$ and therefore $0 \geq z$. Since we also know $z > 0$, we have a contradiction. Therefore, $y = 0$.

So far, we have established that $|x| = A|x|$, so that $(1, |x|)$ is an eigenpair for A. Also note that $A > 0$ and $x \neq 0$ together imply $A|x| > 0$. Therefore we have established that 1 is an eigenvalue of A with eigenvector $|x| > 0$. Next, observe that the above reasoning also holds for primitive matrices if one replaces the first equality in (2.7) by $|x| = |\lambda^k||x|$ and carries the exponent k throughout the proof. In summary, we have established that there exists a real eigenvalue $\lambda > 0$ such that $\lambda \geq |\mu|$ for all other eigenvalues $\mu$, and that each right (and therefore also left) eigenvector of $\lambda$ can be selected positive up to rescaling. It remains to prove that $\lambda$ is simple and strictly greater than the magnitude of all other eigenvalues. For these two proofs, we refer to (Meyer 2001, Chapter 8). □
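Theorem 2.16 also suggests a practical way to compute the dominant eigenpair of a primitive matrix: because $\rho(A)$ strictly dominates the other eigenvalues, power iteration started from a positive vector converges to the positive eigenvector. The sketch below is our own illustration (power iteration is not discussed in this proof) and uses a hypothetical primitive matrix with zero entries, whose characteristic polynomial is $\lambda^3 - \lambda - 1$ and whose fifth power is positive.

```python
def perron_power_iteration(A, iters=200):
    """Power iteration with unit-sum normalization, started from a positive vector."""
    n = len(A)
    x = [1.0 / n] * n                    # positive starting vector with unit sum
    lam = 0.0
    for _ in range(iters):
        y = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        lam = sum(y)                     # equals (1^T A x) / (1^T x) because sum(x) == 1
        x = [yi / lam for yi in y]       # renormalize to unit sum
    return lam, x

# Primitive but not positive: its digraph has cycles of length 2 and 3.
A = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
lam, v = perron_power_iteration(A)
print(f"rho(A) ~ {lam:.6f}, Perron vector ~ {[round(vi, 6) for vi in v]}")
```

Normalizing by the entry sum (rather than a norm) keeps the iterate a positive vector with unit sum, which matches how the Perron vector is usually reported.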
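Proof of Proposition 2.19. Without loss of generality, assume $\lambda = 1$ (otherwise, apply the argument to $A/\lambda$, which is primitive with dominant eigenvalue 1 and the same dominant eigenvectors). Since the dominant eigenvalue is simple (Theorem 2.16), we write the Jordan normal form of A as
$$A = T \begin{bmatrix} 1 & 0 \\ 0 & B \end{bmatrix} T^{-1}, \qquad\text{with}\qquad T = \begin{bmatrix} v_1 & v_2 & v_3 & \cdots & v_n \end{bmatrix} \quad\text{and}\quad T^{-1} = \begin{bmatrix} w_1^\top \\ w_2^\top \\ w_3^\top \\ \vdots \\ w_n^\top \end{bmatrix},$$
where $v_1, \dots, v_n$ (respectively $w_1^\top, \dots, w_n^\top$) are the columns of T (respectively the rows of $T^{-1}$). Equivalently, we have
$$A \underbrace{\begin{bmatrix} v_1 & v_2 & v_3 & \cdots & v_n \end{bmatrix}}_{=T} = \underbrace{\begin{bmatrix} v_1 & v_2 & v_3 & \cdots & v_n \end{bmatrix}}_{=T} \begin{bmatrix} 1 & 0 \\ 0 & B \end{bmatrix}.$$
The first column of the above matrix equation is $Av_1 = v_1$, that is, $v_1$ is the dominant right eigenvector of A. By analogous arguments applied to the first row of $T^{-1}A = \begin{bmatrix} 1 & 0 \\ 0 & B \end{bmatrix} T^{-1}$, we find that $w_1$ is the dominant left eigenvector of A. Next we recall
$$A^k = T \begin{bmatrix} 1 & 0 \\ 0 & B \end{bmatrix}^k T^{-1},$$
so that
$$\lim_{k\to+\infty} A^k = T \lim_{k\to+\infty} \begin{bmatrix} 1^k & 0 \\ 0 & B^k \end{bmatrix} T^{-1} = T \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} T^{-1},$$
since Theorem 2.16 implies $\rho(B) < 1$, which in turn implies $\lim_{k\to+\infty} B^k = 0_{(n-1)\times(n-1)}$ by Theorem 2.8. Moreover,
$$\lim_{k\to+\infty} A^k = \begin{bmatrix} v_1 & v_2 & v_3 & \cdots & v_n \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} w_1^\top \\ w_2^\top \\ w_3^\top \\ \vdots \\ w_n^\top \end{bmatrix} = v_1 w_1^\top.$$
Finally, the (1,1) entry of the matrix equality $T^{-1} T = I_n$ gives precisely the normalization $w_1^\top v_1 = 1$. This concludes the proof of Proposition 2.19.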
2.4 Exercises
E2.1 Simple properties of stochastic matrices. Let $A_1, A_2, \dots, A_k$ be $n \times n$ matrices, let $A_1 A_2 \cdots A_k$ be their product and let $\eta_1 A_1 + \dots + \eta_k A_k$ be their convex combination with arbitrary convex combination coefficients. Show that (i) if $A_1, A_2, \dots, A_k$ are nonnegative, then their product and all their convex combinations are nonnegative, (ii) if $A_1, A_2, \dots, A_k$ are row-stochastic, then their product and all their convex combinations are row-stochastic, and (iii) if $A_1, A_2, \dots, A_k$ are doubly-stochastic, then their product and all their convex combinations are doubly-stochastic.
E2.2 Semi-convergence and Jordan block decomposition. Consider a square matrix A with $\rho(A) = 1$. Show that the following statements are equivalent: (i) A is semi-convergent, (ii) there exists a nonsingular matrix T and a number $m \in \{1, \dots, n\}$ such that
$$A = T \begin{bmatrix} I_m & 0_{m\times(n-m)} \\ 0_{(n-m)\times m} & B \end{bmatrix} T^{-1},$$
where $B \in \mathbb{R}^{(n-m)\times(n-m)}$ is convergent, that is, $\rho(B) < 1$.
E2.3 Semi-convergent and convergent matrices. Prove equation (2.6) and Theorem 2.6.
E2.4 Row-stochastic matrices after pairwise-difference similarity transform. Let $A \in \mathbb{R}^{n\times n}$ be row-stochastic. Define $T \in \mathbb{R}^{n\times n}$ by
$$T = \begin{bmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \\ 1/n & 1/n & \cdots & 1/n & 1/n \end{bmatrix}.$$
Perform the following tasks:
(i) for $x = [x_1, \dots, x_n]^\top$, write $Tx$ in components and show T is invertible,
(ii) show $T A T^{-1} = \begin{bmatrix} A_{\text{stable}} & 0_{n-1} \\ c & 1 \end{bmatrix}$ for some $A_{\text{stable}} \in \mathbb{R}^{(n-1)\times(n-1)}$ and $c \in \mathbb{R}^{1\times(n-1)}$,
(iii) show that A primitive implies $\rho(A_{\text{stable}}) < 1$, and
(iv) compute $T A T^{-1}$ for $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.

E2.5 Substochastic matrices. A (row) substochastic matrix is a nonnegative matrix with all row-sums at most 1 and at least one row-sum strictly less than 1. Given a substochastic matrix A, show that (i) if the jth row sum of A is strictly less than 1, then the jth row sum of $A^2$ is strictly less than 1; and (ii) if the jth row sum of A is strictly less than 1 and $A_{ij} > 0$, then the ith row sum of $A^2$ is strictly less than 1.
E2.6 Powers of primitive matrices. Let $A \in \mathbb{R}^{n\times n}$ be nonnegative. Show that $A^k > 0$, for some $k \in \mathbb{N}$, implies $A^m > 0$ for all $m \geq k$.

E2.7 Symmetric doubly-stochastic matrix. Let $A \in \mathbb{R}^{n\times n}$ be doubly-stochastic. Show that: (i) the matrix $A^\top A$ is doubly-stochastic and symmetric,
 Exercises for Chapter 2
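(ii) $\operatorname{spec}(A^\top A) \subset [0, 1]$, (iii) the eigenvalue 1 of $A^\top A$ is not necessarily simple even if A is irreducible.

E2.8 Bounds on spectral radius of primitive matrices. Consider a primitive matrix $A \in \mathbb{R}^{n\times n}$ and show the following upper and lower bounds: $\min(A 1_n) \leq \rho(A) \leq \max(A 1_n)$, where the minimum and maximum are taken over the entries of $A 1_n$, that is, over the row sums of A.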
E2.9 Equivalent definitions of irreducibility. Prove Lemma 2.13.
E2.10 Discrete-time affine systems. Given $A \in \mathbb{R}^{n\times n}$ and $b \in \mathbb{R}^n$, consider the discrete-time affine system $x(k+1) = Ax(k) + b$. Assume A is convergent and show that (i) the matrix $(I_n - A)$ is invertible, (ii) the only equilibrium point of the system is $(I_n - A)^{-1} b$, and (iii) $\lim_{k\to\infty} x(k) = (I_n - A)^{-1} b$ for all initial conditions $x(0) \in \mathbb{R}^n$.
E2.11 An affine averaging system. Given a primitive doubly-stochastic matrix A and a vector b satisfying $1_n^\top b = 0$, consider the dynamical system $x(k+1) = Ax(k) + b$. Show that (i) the quantity $k \mapsto 1_n^\top x(k)$ is constant, (ii) for each $\alpha \in \mathbb{R}$, there exists a unique equilibrium point $x^*_\alpha$ satisfying $1_n^\top x^*_\alpha = \alpha$, and (iii) all solutions with initial condition $x(0)$ satisfying $1_n^\top x(0) = \alpha$ converge to $x^*_\alpha$. Hint: Use Exercises E2.2 and E2.10.

E2.12 The Neumann series. For $A \in \mathbb{C}^{n\times n}$, the following statements are equivalent: (i) $\rho(A) < 1$, (ii) $\lim_{k\to\infty} A^k = 0_{n\times n}$, and (iii) the Neumann series $\sum_{k=0}^{\infty} A^k$ converges. If any and hence all of these conditions hold, then the matrix $(I - A)$ is invertible and $\sum_{k=0}^{\infty} A^k = (I - A)^{-1}$. Hint: This statement, written in the style of (Meyer 2001, Section 7.10), is an extension of Theorem 2.8 and a generalization of the classic geometric series $\frac{1}{1-x} = \sum_{k=0}^{\infty} x^k$, convergent for all $|x| < 1$. For the proof, the hint is to use the Jordan normal form.
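A numerical sketch of the statement in E2.12, assuming the hypothetical symmetric matrix $A = \begin{bmatrix} 0.5 & 0.25 \\ 0.25 & 0.5 \end{bmatrix}$ with eigenvalues 0.75 and 0.25, so that $\rho(A) = 0.75 < 1$. The partial sums of the Neumann series approach $(I - A)^{-1}$, which is computed here with the closed-form 2×2 inverse; the helper names are ours.

```python
def matmul(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Closed-form inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[0.5, 0.25], [0.25, 0.5]]       # rho(A) = 0.75 < 1
partial_sum = [[0.0, 0.0], [0.0, 0.0]]
term = [[1.0, 0.0], [0.0, 1.0]]      # A^0 = I
for _ in range(200):                 # partial_sum = I + A + ... + A^199
    partial_sum = [[partial_sum[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    term = matmul(term, A)
inv = inverse_2x2([[1 - A[0][0], -A[0][1]], [-A[1][0], 1 - A[1][1]]])
print(partial_sum)
print(inv)    # approximately [[8/3, 4/3], [4/3, 8/3]]
```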
E2.13 Orthogonal and permutation matrices. A set G with a binary operation mapping two elements of G into another element of G, denoted by $(a, b) \mapsto a \star b$, is a group if:
• $a \star (b \star c) = (a \star b) \star c$ for all $a, b, c \in G$ (associativity property);
• there exists $e \in G$ such that $a \star e = e \star a = a$ for all $a \in G$ (existence of an identity element); and
• for each $a \in G$ there exists $a^{-1} \in G$ such that $a \star a^{-1} = a^{-1} \star a = e$ (existence of inverse elements).

Recall that: an orthogonal matrix R is a square matrix whose columns and rows are orthonormal vectors, i.e., $RR^\top = I_n$; an orthogonal matrix acts on a vector like a rotation and/or reflection; let O(n) denote the set of orthogonal matrices. Similarly, recall that: a permutation matrix is a square binary (i.e., entries equal to 0 and 1) matrix with precisely one entry equal to 1 in every row and every column; a permutation matrix acts on a vector by permuting its entries; let $\mathcal{P}_n$ denote the set of permutation matrices. Prove that
(i) the set of orthogonal matrices O(n) with the operation of matrix multiplication is a group;
(ii) the set of permutation matrices $\mathcal{P}_n$ with the operation of matrix multiplication is a group; and
(iii) each permutation matrix is orthogonal.
E2.14 On doubly-stochastic and permutation matrices. The following result is known as the Birkhoff–von Neumann Theorem. For a matrix $A \in \mathbb{R}^{n\times n}$, the following statements are equivalent: (i) A is doubly-stochastic; and (ii) A is a convex combination of permutation matrices.

Do the following:
• show that the set of doubly-stochastic matrices is convex (i.e., given any two doubly-stochastic matrices $A_1$ and $A_2$, any matrix of the form $\lambda A_1 + (1-\lambda) A_2$, for $\lambda \in [0, 1]$, is again doubly-stochastic);
• show that (ii) ⟹ (i);
• find in the literature a proof of (i) ⟹ (ii) and sketch it in one or two paragraphs.

E2.15 The Jacobi relaxation in parallel computation. Consider n distributed processors that aim to collectively solve the linear equation $Ax = b$, where $b \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n\times n}$ is invertible and its diagonal elements $a_{ii}$ are nonzero. Each processor stores a variable $x_i(k)$ as the discrete-time variable k evolves and applies the following iterative strategy termed Jacobi relaxation. At time step $k \in \mathbb{N}$ each processor performs the local computation
$$x_i(k+1) = \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1,\, j\neq i}^{n} a_{ij}\, x_j(k)\Big), \qquad i \in \{1, \dots, n\}.$$
Next, each processor $i \in \{1, \dots, n\}$ sends its value $x_i(k+1)$ to all other processors $j \in \{1, \dots, n\}$ with $a_{ji} \neq 0$, and they iteratively repeat the previous computation. The initial values of the processors are arbitrary.
(i) Assume the Jacobi relaxation converges, i.e., assume $\lim_{k\to\infty} x(k) = x^*$. Show that $Ax^* = b$.
(ii) Give a necessary and sufficient condition for the Jacobi relaxation to converge.
(iii) Use the Geršgorin Disks Theorem 2.9 to show that the Jacobi relaxation converges if A is strictly row diagonally dominant, that is, if $|a_{ii}| > \sum_{j=1,\, j\neq i}^{n} |a_{ij}|$ for all $i \in \{1, \dots, n\}$.
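The Jacobi relaxation of E2.15 can be simulated centrally by computing every processor's update from the same vector x(k), which mimics the synchronous exchange described above. The 3×3 system below is a hypothetical, strictly row diagonally dominant example with solution $x^* = (1, 1, 1)$; the function name `jacobi` is ours.

```python
def jacobi(A, b, x0, iters=100):
    """Synchronous Jacobi relaxation: all x_i(k+1) are computed from x(k)."""
    n = len(A)
    x = list(x0)
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# Hypothetical strictly row diagonally dominant system with solution x* = (1, 1, 1).
A = [[4.0, 1.0, 1.0],
     [1.0, 5.0, 2.0],
     [1.0, 1.0, 3.0]]
b = [6.0, 8.0, 5.0]
x = jacobi(A, b, x0=[10.0, -7.0, 3.0])
print(x)
```

Because the whole new vector is built before it replaces x, no processor uses a partially updated state within a step.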
E2.16 The Jacobi over-relaxation in parallel computation. We now consider a more sophisticated version of the Jacobi relaxation presented in Exercise E2.15. Consider again n distributed processors that aim to collectively solve the linear equation $Ax = b$, where $b \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n\times n}$ is invertible and its diagonal elements $a_{ii}$ are nonzero. Each processor stores a variable $x_i(k)$ as the discrete-time variable k evolves and applies the following iterative strategy termed Jacobi over-relaxation. At time step $k \in \mathbb{N}$ each processor performs the local computation
$$x_i(k+1) = (1-\omega)\, x_i(k) + \frac{\omega}{a_{ii}}\Big(b_i - \sum_{j=1,\, j\neq i}^{n} a_{ij}\, x_j(k)\Big), \qquad i \in \{1, \dots, n\},$$
where $\omega \in \mathbb{R}$ is an adjustable parameter. Next, each processor $i \in \{1, \dots, n\}$ sends its value $x_i(k+1)$ to all other processors $j \neq i$ with $a_{ji} \neq 0$, and they iteratively repeat the previous computation. The initial values of the processors are arbitrary.
(i) Assume the Jacobi over-relaxation converges to $x^\star$ and show that $Ax^\star = b$ if $\omega \neq 0$.
(ii) Find the expression governing the dynamics of the error variable $e(k) := x(k) - x^\star$.
(iii) Suppose that A is strictly row diagonally dominant, that is, $|a_{ii}| > \sum_{j\neq i} |a_{ij}|$. Use the Geršgorin Disks Theorem 2.9 to discuss the convergence properties of the algorithm for all possible values of $\omega \in \mathbb{R}$. Hint: Consider different thresholds for ω.

E2.17 Solutions of partial differential equations. This exercise is taken from (Luenberger 1979, Chapter 6). A partial differential equation (PDE) is a differential equation that contains unknown functions and their partial derivatives; PDEs are very common models for physical phenomena in fluids, electromagnetic fields, temperature distributions and other spatially-distributed quantities. For example, the electric potential V within a two-dimensional rectangular enclosure is governed by Laplace’s equation:
$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} = 0, \tag{E2.1}$$
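combined with the value of V along the boundary of the enclosure; see the left image in Figure E2.1.

[Figure E2.1. Left: the continuous enclosure, with axes x and y, in which ∂²V/∂x² + ∂²V/∂y² = 0. Right: a discrete grid with interior nodes V1, …, V8 (two rows of four) and boundary nodes b1, …, b12.]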
Figure E2.1: Laplace’s equation over a continuous and discrete grid. For illustration’s sake, the grid is low-dimensional.

For arbitrary enclosures and boundary conditions, it is not possible to solve Laplace’s equation in closed form. An approximate solution is computed as follows. A finite regular Cartesian grid of points is placed inside the enclosure, see the right image in Figure E2.1, and the second-order derivatives are approximated by second-order finite differences (with unit grid spacing). Specifically, at node 2 of the grid, we have along the x direction
$$\frac{\partial^2 V}{\partial x^2}(V_2) \approx (V_3 - V_2) - (V_2 - V_1) = V_3 + V_1 - 2V_2,$$
so that
$$0 = \frac{\partial^2 V}{\partial x^2}(V_2) + \frac{\partial^2 V}{\partial y^2}(V_2) \approx V_1 + V_3 + V_6 + b_2 - 4V_2 \quad\implies\quad 4V_2 = V_1 + V_3 + V_6 + b_2.$$
Thus, the discretized Laplace’s equation requires that the electric potential at each interior grid node be equal to the average of its four neighboring nodes. In summary, this specification translates into the matrix equation:
$$4V = A_{\text{grid}} V + C_{\text{grid-boundary}}\, b, \tag{E2.2}$$
where $V \in \mathbb{R}^n$ is the vector of unknown potentials, $b \in \mathbb{R}^m$ is the vector of boundary conditions, $A_{\text{grid}} \in \{0,1\}^{n\times n}$ is the adjacency matrix of the interior grid (that is, $(A_{\text{grid}})_{ij} = 1$ if and only if the interior nodes i and j are connected), and $C_{\text{grid-boundary}} \in \{0,1\}^{n\times m}$ is the connection matrix between interior and boundary nodes (that is, $(C_{\text{grid-boundary}})_{i\alpha} = 1$ if and only if grid interior node i is connected with boundary node α). Show that
(i) $\rho(A_{\text{grid}}) < 4$ and $A_{\text{grid}}$ is irreducible,
(ii) there exists a unique solution $V^*$ to equation (E2.2) and, if $b \geq 0_m$, then $V^* \geq 0_n$, and
(iii) each solution to the following iteration converges to $V^*$:
$$4V(k+1) = A_{\text{grid}} V(k) + C_{\text{grid-boundary}}\, b,$$
whereby, at each step, the value of V at each node is updated to be equal to the average of its neighboring nodes.
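The iteration in (iii) can be sketched on the 2×4 interior grid of Figure E2.1. Rather than building $A_{\text{grid}}$ and $C_{\text{grid-boundary}}$ explicitly, this sketch indexes interior nodes by (row, column) and reads boundary values from a user-supplied function `boundary(i, j)` evaluated at positions just outside the grid. This indexing, and the boundary values used in the example (top side at 1, other sides at 0), are assumptions that do not follow the labels $b_1, \dots, b_{12}$ of the figure.

```python
def laplace_jacobi(rows, cols, boundary, iters=500):
    """Iterate 4 V(k+1) = A_grid V(k) + C b on a rows x cols interior grid.

    boundary(i, j) returns the fixed potential at a boundary position just
    outside the interior grid (i in {-1, rows} or j in {-1, cols}).
    """
    V = [[0.0] * cols for _ in range(rows)]

    def value(grid, i, j):
        if 0 <= i < rows and 0 <= j < cols:
            return grid[i][j]          # interior node: current iterate
        return boundary(i, j)          # boundary node: fixed value

    for _ in range(iters):
        # Each interior node becomes the average of its four neighbors in V(k).
        V = [[(value(V, i - 1, j) + value(V, i + 1, j)
               + value(V, i, j - 1) + value(V, i, j + 1)) / 4
              for j in range(cols)] for i in range(rows)]
    return V

# 2 x 4 interior grid; hypothetical boundary values: top side at 1, other sides at 0.
V = laplace_jacobi(2, 4, lambda i, j: 1.0 if i == -1 else 0.0)
for row in V:
    print([round(val, 4) for val in row])
```

Each update is an average of values in [0, 1], so the iterates stay in [0, 1], consistent with the nonnegativity claim in (ii).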
E2.18 Robotic coordination and geometric optimization on the real line. Consider $n \geq 3$ robots with dynamics $\dot p_i = u_i$, where $i \in \{1, \dots, n\}$ is an index labeling each robot, $p_i \in \mathbb{R}$ is the position of robot i, and $u_i \in \mathbb{R}$ is a steering control input. For simplicity, assume that the robots are indexed according to their initial position: $p_1(0) \leq p_2(0) \leq p_3(0) \leq \dots \leq p_n(0)$. We consider the following distributed control laws to achieve some geometric configuration:
(i) Move towards the centroid of your neighbors: The robots $i \in \{2, \dots, n-1\}$ (each having two neighbors) move to the centroid of the local subset $\{p_{i-1}, p_i, p_{i+1}\}$:
$$\dot p_i = \frac{1}{3}(p_{i-1} + p_i + p_{i+1}) - p_i, \qquad i \in \{2, \dots, n-1\}.$$
The robots $\{1, n\}$ (each having one neighbor) move to the centroid of the local subsets $\{p_1, p_2\}$ and $\{p_{n-1}, p_n\}$, respectively:
$$\dot p_1 = \frac{1}{2}(p_1 + p_2) - p_1 \qquad\text{and}\qquad \dot p_n = \frac{1}{2}(p_{n-1} + p_n) - p_n.$$
By using these coordination laws, the robots asymptotically rendezvous.
(ii) Move towards the centroid of your neighbors or walls: Consider two walls at the positions $p_0 \leq p_1$ and $p_{n+1} \geq p_n$ so that all robots are contained between the walls. The walls are stationary, that is, $\dot p_0 = 0$ and $\dot p_{n+1} = 0$. Again, the robots $i \in \{2, \dots, n-1\}$ (each having two neighbors) move to the centroid of the local subset $\{p_{i-1}, p_i, p_{i+1}\}$. The robots $\{1, n\}$ (each having one robotic neighbor and one neighboring wall) move to the centroid of the local subsets $\{p_0, p_1, p_2\}$ and $\{p_{n-1}, p_n, p_{n+1}\}$, respectively. Hence, the closed-loop robot dynamics are
$$\dot p_i = \frac{1}{3}(p_{i-1} + p_i + p_{i+1}) - p_i, \qquad i \in \{1, \dots, n\}.$$
By using these coordination laws, the robots become uniformly spaced on the interval $[p_0, p_{n+1}]$.
(iii) Move away from the centroid of your neighbors or walls: Again consider two stationary walls at $p_0 \leq p_1$ and $p_{n+1} \geq p_n$ containing the positions of all robots. We partition the interval $[p_0, p_{n+1}]$ into areas of interest, where each robot is assigned the territory of points that are closer to itself than to other robots. Hence, robot $i \in \{2, \dots, n-1\}$ (having two neighbors) obtains the partition $V_i = [(p_i + p_{i-1})/2, (p_{i+1} + p_i)/2]$, robot 1 obtains the partition $V_1 = [p_0, (p_1 + p_2)/2]$, and robot n obtains the partition $V_n = [(p_{n-1} + p_n)/2, p_{n+1}]$. We want to design a distributed algorithm such that the robots have equally sized partitions. We consider a simple coordination law, where each robot i heads for the midpoint $c_i(V_i(p))$ of its partition $V_i$: $\dot p_i = c_i(V_i(p)) - p_i$. By using these coordination laws, the robots’ partitions asymptotically become equally large.
(iv) Discrete-time update rules: If the robots move in discrete time according to $p_i^+ = u_i$, then the above coordination laws are easily modified via an Euler discretization as follows: replace $\dot p_i = f(p)$ by $p_i^+ - p_i = \varepsilon \cdot f(p)$ in each coordination law, where $\varepsilon > 0$ is sufficiently small so that the matrices involved in the discrete iterations are nonnegative.
Consider n = 3 robots, take your favorite problem from above, and show that both the continuous-time and discrete-time dynamics asymptotically lead to the desired geometric configurations.

E2.19 Continuous-time cyclic pursuit. Consider four mobile robotic vehicles, indexed by $i \in \{1, 2, 3, 4\}$. We model each robot as a fully-actuated kinematic point mass, that is, we write $\dot p_i = u_i$, where $p_i \in \mathbb{C}$ is the position of robot i in the plane and $u_i \in \mathbb{C}$ is its velocity command. The robots are equipped with onboard cameras as sensors. The task of the robots is to rendezvous at a common point (while using only onboard sensors). A simple strategy to achieve rendezvous is cyclic pursuit: each robot i picks another robot, say i + 1, and pursues it. This gives rise to the control $u_i = p_{i+1} - p_i$ (with $p_5 = p_1$) and the closed-loop system
$$\begin{bmatrix} \dot p_1 \\ \dot p_2 \\ \dot p_3 \\ \dot p_4 \end{bmatrix} = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ 1 & 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \end{bmatrix}.$$
A simulation of the cyclic-pursuit dynamics is shown in Figure E2.2.
Figure E2.2: Four robots, starting from their initial positions, perform a cyclic pursuit to rendezvous at a common point (plot axes: x and y).

Your tasks are as follows.
(i) Prove that the center of mass
$$\operatorname{average}(p(t)) = \frac{1}{4}\sum_{i=1}^{4} p_i(t)$$
is constant for all $t \geq 0$. Notice that this is equivalent to saying $\frac{d}{dt}\operatorname{average}(p(t)) = 0$.
(ii) Prove that the robots asymptotically rendezvous at the initial center of mass, that is,
$$\lim_{t\to\infty} p_i(t) = \operatorname{average}(p(0)) \qquad \text{for } i \in \{1, \dots, 4\}.$$
(iii) Prove that if the robots are initially arranged in a square formation, they remain in a square formation under cyclic pursuit.
Hint: Recall that for a matrix A with semisimple eigenvalues, the solution to the equation $\dot x = Ax$ is given by the modal expansion $x(t) = \sum_i e^{\lambda_i t} v_i w_i^\top x(0)$, where $\lambda_i$ is an eigenvalue, and $v_i$ and $w_i$ are the associated right and left eigenvectors pairwise normalized to $w_i^\top v_i = 1$.
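A forward-Euler simulation of the cyclic-pursuit dynamics, with positions stored as Python complex numbers. The step size, the horizon, and the initial positions are hypothetical choices, not those used for Figure E2.2; the second run starts from the square with vertices 1, i, −1, −i, listed in pursuit order. The simulation illustrates the claims in (i) to (iii) but does not prove them.

```python
def cyclic_pursuit(p0, dt=0.01, t_final=30.0):
    """Forward-Euler simulation of p_i' = p_{i+1} - p_i (indices mod n), positions in C."""
    p = list(p0)
    n = len(p)
    for _ in range(round(t_final / dt)):
        u = [p[(i + 1) % n] - p[i] for i in range(n)]   # u_i = p_{i+1} - p_i
        p = [p[i] + dt * u[i] for i in range(n)]
    return p

p0 = [0.3 + 0.1j, -0.2 + 0.3j, -0.4 - 0.2j, 0.2 - 0.3j]   # hypothetical initial positions
center0 = sum(p0) / len(p0)
p_final = cyclic_pursuit(p0)
print("initial center of mass:", center0)
print("final positions:", [complex(round(z.real, 6), round(z.imag, 6)) for z in p_final])

# Square formation: each vertex is the previous one rotated by 90 degrees (multiplied by 1j).
square = [1, 1j, -1, -1j]
p_sq = cyclic_pursuit(square, t_final=1.0)
ratios = [p_sq[(i + 1) % 4] / p_sq[i] for i in range(4)]   # all close to 1j
print("neighbor ratios after t = 1:", ratios)
```

Since the velocity commands sum to zero, each Euler step leaves the sum of positions unchanged up to rounding, which mirrors task (i).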
E2.20 Simulation (cont’d). This is a follow-up to Exercise E1.1. Consider the linear averaging algorithm in equation (1.1): set n = 5, select the initial state equal to (1, −1, 1, −1, 1), and use (a) the complete graph, (b) a ring graph, and (c) a star graph with node 1 as center. (i) To which value do all nodes converge? (ii) Compute the dominant left eigenvector of the averaging matrix associated to each of the three graphs and verify that the result in Corollary 2.20(iii) is correct.

E2.21 Continuous- and discrete-time control of mobile robots. Consider n robots moving on the line with positions $z_1, z_2, \dots, z_n \in \mathbb{R}$. In order to gather at a common location (i.e., reach rendezvous), each robot heads for the centroid of its neighbors, that is,
$$\dot z_i = \Big(\frac{1}{n-1}\sum_{j=1,\, j\neq i}^{n} z_j\Big) - z_i.$$
(i) Will the robots asymptotically rendezvous at a common location?
(ii) Consider the Euler discretization of the above closed-loop dynamics with sampling period $T > 0$:
$$z_i(k+1) = z_i(k) + T\Big(\Big(\frac{1}{n-1}\sum_{j=1,\, j\neq i}^{n} z_j(k)\Big) - z_i(k)\Big).$$
For which values of the sampling period T will the robots rendezvous? Hint: Use the modal decomposition in Remark 2.3.
Chapter 3
Elements of Graph Theory

In this chapter we review some basic concepts from graph theory as exposed in standard books, e.g., see (Bollobás 1998; Diestel 2000). Graph theory provides key concepts to model, analyze and design network systems and distributed algorithms; the language of graphs pervades modern science and technology and is therefore essential.
3.1 Graphs and digraphs
[Graphs] An undirected graph (in short, graph) consists of a vertex set V and of a set E of unordered pairs of vertices. For u, v ∈ V and u ≠ v, the set {u, v} denotes an unordered edge. We define and visualize some basic example graphs in Figure 3.1.
Figure 3.1: Example graphs. First row: the ring graph with 6 nodes, a star graph with 7 nodes, a tree (see definition below), the complete graph with 6 nodes (usually denoted by K(6)). Second row: the complete bipartite graph with 3 + 3 nodes (usually denoted by K(3, 3)), a grid graph, and the Petersen graph. The ring is 2-regular, while the complete bipartite graph K(3, 3) and the Petersen graph are 3-regular.
Chapter 3. Elements of Graph Theory
[Neighbors and degrees in graphs] In a graph G, the vertices u and v are neighbors if {u, v} is an undirected edge. Given a graph G, we let N_G(v) denote the set of neighbors of v. The degree of v is the cardinality of N_G(v). A graph is regular if all the nodes have the same degree. [Digraphs and self-loops] A directed graph (in short, digraph) of order n is a pair G = (V, E), where V is a set with n elements called vertices (or nodes) and E is a set of ordered pairs of vertices called edges. In other words, E ⊆ V × V. We call V and E the vertex set and edge set, respectively. For u, v ∈ V, the ordered pair (u, v) denotes an edge from u to v. A digraph is undirected if (v, u) ∈ E whenever (u, v) ∈ E. In a digraph, a self-loop is an edge from a node to itself; as customary, self-loops are not allowed in graphs. We define and visualize some basic example digraphs in Figure 3.2.
Figure 3.2: Example digraphs: the ring digraph with 6 nodes, the complete graph with 6 nodes, and a directed acyclic graph, i.e., a digraph with no directed cycles.
[Subgraphs] A digraph (V′, E′) is a subgraph of a digraph (V, E) if V′ ⊆ V and E′ ⊆ E. A digraph (V′, E′) is a spanning subgraph if it is a subgraph and V′ = V. The subgraph of (V, E) induced by V′ ⊆ V is the digraph (V′, E′), where E′ contains all edges in E between two vertices in V′. [In- and out-neighbors] In a digraph G with an edge (u, v) ∈ E, u is called an in-neighbor of v, and v is called an out-neighbor of u. We let N^in(v) (resp., N^out(v)) denote the set of in-neighbors (resp., the set of out-neighbors) of v. Given a digraph G = (V, E), an in-neighbor of a nonempty set of nodes U is a node v ∈ V \ U for which there exists an edge (v, u) ∈ E for some u ∈ U. [In- and out-degree] The in-degree d^in(v) and out-degree d^out(v) of v are the number of in-neighbors and out-neighbors of v, respectively. Note that a self-loop at a node v makes v both an in-neighbor and an out-neighbor of itself. A digraph is topologically balanced if each vertex has equal in- and out-degree (even if distinct vertices have distinct degrees).
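A small sketch of these degree definitions for a digraph stored as a set of ordered pairs (u, v); the function names and the example digraphs are hypothetical. A self-loop (v, v) increments both the in- and out-degree of v, as noted above.

```python
def degrees(V, E):
    """In- and out-degrees of a digraph with vertex set V and edges (u, v) from u to v."""
    d_in = {v: 0 for v in V}
    d_out = {v: 0 for v in V}
    for u, v in E:
        d_out[u] += 1
        d_in[v] += 1               # a self-loop (v, v) counts in both degrees
    return d_in, d_out

def in_neighbors_of_set(E, U):
    """Nodes outside U with an edge into some node of U."""
    return {v for (v, u) in E if u in U and v not in U}

def is_topologically_balanced(V, E):
    d_in, d_out = degrees(V, E)
    return all(d_in[v] == d_out[v] for v in V)

V = range(6)
ring = {(i, (i + 1) % 6) for i in V}          # ring digraph with 6 nodes
path = {(i, i + 1) for i in range(5)}         # directed path 0 -> 1 -> ... -> 5
print(is_topologically_balanced(V, ring))     # True
print(is_topologically_balanced(V, path))     # False
print(in_neighbors_of_set(ring, {0, 1}))      # {5}
```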
3.2 Paths and connectivity in undirected graphs
[Paths] A path in a graph is an ordered sequence of vertices such that any pair of consecutive vertices in the sequence is an edge of the graph. A path is simple if no vertex appears more than once in it, except possibly for the initial and final vertex.
 3.3. Paths and connectivity in digraphs
[Connectivity and connected components] A graph is connected if there exists a path between any two vertices. If a graph is not connected, then it is composed of multiple connected components, that is, multiple maximal connected subgraphs. [Cycles] A cycle is a simple path that starts and ends at the same vertex and has at least three distinct vertices. A graph is acyclic if it contains no cycles. A connected acyclic graph is a tree. Lemma 3.1 (Tree properties). For a graph G = (V, E) without self-loops, the following statements are equivalent: (i) G = (V, E) is a tree; (ii) G is connected and |E| = |V| − 1; and
(iii) G is acyclic and |E| = |V | − 1.
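Lemma 3.1 yields a cheap tree test: by statement (ii), it suffices to check connectivity, here by breadth-first search, and to count edges. The sketch assumes a graph without self-loops in which each undirected edge is listed once as a pair; the example graphs are hypothetical.

```python
from collections import deque

def is_connected(V, E):
    """Breadth-first search on an undirected graph; E lists each edge once as a pair."""
    V = list(V)
    if not V:
        return True
    adj = {v: set() for v in V}
    for u, v in E:
        adj[u].add(v)
        adj[v].add(u)
    seen = {V[0]}
    queue = deque([V[0]])
    while queue:
        u = queue.popleft()
        for x in adj[u] - seen:        # unvisited neighbors of u
            seen.add(x)
            queue.append(x)
    return len(seen) == len(V)

def is_tree(V, E):
    """Statement (ii) of Lemma 3.1: connected and |E| = |V| - 1."""
    return is_connected(V, E) and len(E) == len(V) - 1

star = [(0, i) for i in range(1, 7)]               # star graph with 7 nodes
ring = [(i, (i + 1) % 6) for i in range(6)]        # ring graph with 6 nodes
print(is_tree(range(7), star), is_tree(range(6), ring))   # True False
```

Both conditions are needed: a triangle plus a separate edge on five nodes has |E| = |V| − 1 but is neither connected nor a tree.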
Figure 3.3: This graph has two connected components. The leftmost connected component is a tree, while the rightmost connected component is a cycle.
3.3 Paths and connectivity in digraphs
[Directed paths] A directed path in a digraph is an ordered sequence of vertices such that any pair of consecutive vertices in the sequence is a directed edge of the digraph. A directed path is simple if no vertex appears more than once in it, except possibly for the initial and final vertex. [Cycles in digraphs] A cycle in a digraph is a simple directed path that starts and ends at the same vertex. In digraphs, it is customary to also accept cycles of length 1 (i.e., self-loops) and cycles of length 2 (i.e., cycles composed of just 2 nodes). The set of cycles of a directed graph is finite. A digraph is acyclic if it contains no cycles. In a digraph, every vertex of in-degree 0 is called a source, and every vertex of out-degree 0 is called a sink. Every acyclic digraph has at least one source and at least one sink; see Exercise E3.1.
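Sources and sinks can be read off the degrees. To test acyclicity, the sketch below uses Kahn's algorithm, a standard technique not described in these notes: repeatedly delete a source; the digraph is acyclic if and only if every vertex is eventually deleted. Self-loops count as cycles, consistent with the convention above. The example digraphs are hypothetical stand-ins for Figures 3.4 and 3.5.

```python
from collections import deque

def sources_and_sinks(V, E):
    """Return (sources, sinks): vertices of in-degree 0 and of out-degree 0."""
    d_in = {v: 0 for v in V}
    d_out = {v: 0 for v in V}
    for u, v in E:
        d_out[u] += 1
        d_in[v] += 1
    return {v for v in V if d_in[v] == 0}, {v for v in V if d_out[v] == 0}

def is_acyclic(V, E):
    """Kahn's algorithm: acyclic iff repeatedly deleting sources removes every vertex."""
    d_in = {v: 0 for v in V}
    succ = {v: [] for v in V}
    for u, v in E:
        succ[u].append(v)
        d_in[v] += 1
    queue = deque(v for v in V if d_in[v] == 0)    # current sources
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in succ[u]:          # deleting u lowers the in-degree of its out-neighbors
            d_in[v] -= 1
            if d_in[v] == 0:
                queue.append(v)
    return removed == len(V)

dag = {(1, 3), (2, 3), (3, 4)}          # two sources (1, 2) and one sink (4)
cycle = {(1, 2), (2, 3), (3, 1)}        # a directed cycle
print(sources_and_sinks({1, 2, 3, 4}, dag), is_acyclic({1, 2, 3, 4}, dag))
print(sources_and_sinks({1, 2, 3}, cycle), is_acyclic({1, 2, 3}, cycle))
```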
Figure 3.4: Acyclic digraph with one sink and two sources.
Figure 3.5: Directed cycle.
[Directed trees] A directed tree (sometimes called a rooted tree) is an acyclic digraph with the following property: there exists a vertex, called the root, such that any other vertex of the digraph can be reached by one and only one directed path starting at the root. A directed spanning tree of a digraph is a spanning subgraph that is a directed tree.
3.3.1 Connectivity properties of digraphs
Next, we present four useful connectivity notions for a digraph G:
(i) G is strongly connected if there exists a directed path from any node to any other node;
(ii) G is weakly connected if the undirected version of the digraph is connected;
(iii) G possesses a globally reachable node if one of its nodes can be reached from any other node by traversing a directed path; and
(iv) G possesses a directed spanning tree if one of its nodes is the root of directed paths to every other node.
An example of a strongly connected digraph is shown in Figure 3.6, and a weakly connected digraph with a globally reachable node is illustrated in Figure 3.7.

Figure 3.6: A strongly connected digraph.

Figure 3.7: A weakly connected digraph with a globally reachable node, node #2.
For a digraph G = (V, E), the reverse digraph G(rev) has vertex set V and edge set E(rev) composed of all edges in E with reversed direction. Clearly, a digraph contains a directed spanning tree if and only if the reverse digraph contains a globally reachable node.
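The four connectivity notions reduce to reachability computations. The sketch below, with hypothetical function names, uses depth-first search together with the reverse digraph: a node is globally reachable if it reaches every node in $G^{(\text{rev})}$, and G has a directed spanning tree if some node reaches every node in G. The three-node example is a small hypothetical digraph in the spirit of Figure 3.7.

```python
def reachable_from(E, root):
    """Nodes reachable from root along directed edges (root included), by depth-first search."""
    succ = {}
    for u, v in E:
        succ.setdefault(u, set()).add(v)
    seen, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def reverse(E):
    """Edge set of the reverse digraph."""
    return {(v, u) for (u, v) in E}

def globally_reachable_nodes(V, E):
    return {v for v in V if reachable_from(reverse(E), v) >= set(V)}

def has_directed_spanning_tree(V, E):
    return any(reachable_from(E, r) >= set(V) for r in V)

def is_strongly_connected(V, E):
    return globally_reachable_nodes(V, E) == set(V)

V = {1, 2, 3}
E = {(1, 2), (3, 2)}                      # weakly connected; node 2 is globally reachable
print(globally_reachable_nodes(V, E))     # {2}
print(has_directed_spanning_tree(V, E), has_directed_spanning_tree(V, reverse(E)))  # False True
```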
3.3.2 Periodicity of strongly-connected digraphs
[Periodic and aperiodic digraphs] A strongly-connected directed graph is periodic if there exists a k > 1 that divides the length of every cycle of the graph; the largest such k, called the period, is the greatest common divisor of the lengths of all its cycles. In other words, a digraph is periodic if the greatest common divisor of the lengths of all its cycles is larger than one. A digraph is aperiodic if it is not periodic. Note: the definition of periodic digraph is well-posed because a digraph has only a finite number of cycles (because nodes are not repeated in simple paths). The notions of periodicity and aperiodicity only apply to digraphs and not to undirected graphs (where the notion of a cycle is defined differently). Any strongly-connected digraph with a self-loop is aperiodic, since a self-loop is a cycle of length 1.
Figure 3.8: (a) A periodic digraph with period 2. (b) An aperiodic digraph with cycles of length 1 and 2. (c) An aperiodic digraph with cycles of length 2 and 3.
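The period can be computed without enumerating cycles. The sketch below relies on a standard characterization that is not proved in these notes, so treat it as an assumption. Label every node of a strongly connected digraph with its breadth-first-search level from some root. The period then equals the greatest common divisor, over all edges (u, v), of level(u) + 1 − level(v). The example digraphs are hypothetical and mirror the three cases of Figure 3.8.

```python
from collections import deque
from math import gcd

def period(adj):
    """Period of a strongly connected digraph {node: set of out-neighbors}; 1 means aperiodic."""
    # Breadth-first-search levels from an arbitrary root.
    root = next(iter(adj))
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    if len(level) != len(adj):
        raise ValueError("some node is unreachable from the root: not strongly connected")
    # Summed along any closed path, level[u] + 1 - level[v] adds up to the path length.
    g = 0
    for u, outs in adj.items():
        for v in outs:
            g = gcd(g, level[u] + 1 - level[v])
    return g

print(period({0: {1}, 1: {0}}))             # 2: a single cycle of length 2
print(period({0: {0, 1}, 1: {0}}))          # 1: cycles of length 1 and 2
print(period({0: {1}, 1: {0, 2}, 2: {0}}))  # 1: cycles of length 2 and 3
```

Because each term is computed from one edge, the cost is linear in the number of edges, whereas the number of cycles can grow exponentially with the size of the digraph.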
3.3.3 Condensation digraphs
[Strongly connected components] A subgraph H is a strongly connected component of G if H is strongly connected and any other subgraph of G strictly containing H is not strongly connected. [Condensation digraph] The condensation digraph of a digraph G, denoted by C(G), is defined as follows: the nodes of C(G) are the strongly connected components of G, and there exists a directed edge in C(G) from node H1 to node H2, with H1 ≠ H2, if and only if there exists a directed edge in G from a node of H1 to a node of H2.
Figure 3.9: An example digraph, its strongly connected components and its condensation.
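Strongly connected components, and hence the condensation digraph, can be computed in linear time. The sketch below uses Kosaraju's algorithm, a standard technique swapped in here for illustration, since the notes do not prescribe an algorithm. It works on a hypothetical dictionary-of-sets representation with every node present as a key. Libraries such as NetworkX provide the same functionality, for example `networkx.condensation`.

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm; returns (list of frozensets, map node -> component index)."""
    # Pass 1: iterative depth-first search on G, recording nodes in finishing order.
    order, visited = [], set()
    for start in adj:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(adj[start]))]
        while stack:
            u, neighbors = stack[-1]
            for v in neighbors:
                if v not in visited:
                    visited.add(v)
                    stack.append((v, iter(adj[v])))
                    break
            else:  # all out-neighbors of u have been explored
                stack.pop()
                order.append(u)
    # Pass 2: search the reverse digraph in decreasing finishing order.
    rev = {u: set() for u in adj}
    for u, outs in adj.items():
        for v in outs:
            rev[v].add(u)
    comp, components = {}, []
    for start in reversed(order):
        if start in comp:
            continue
        idx = len(components)
        comp[start] = idx
        members, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in rev[u]:
                if v not in comp:
                    comp[v] = idx
                    members.add(v)
                    stack.append(v)
        components.append(frozenset(members))
    return components, comp

def condensation(adj):
    """C(G): one node per component; edge H1 -> H2 (H1 != H2) iff G has an edge from H1 to H2."""
    components, comp = strongly_connected_components(adj)
    cadj = {H: set() for H in components}
    for u, outs in adj.items():
        for v in outs:
            if comp[u] != comp[v]:
                cadj[components[comp[u]]].add(components[comp[v]])
    return cadj

# Hypothetical example with components {1, 2} -> {3, 4} -> {5}.
G = {1: {2}, 2: {1, 3}, 3: {4}, 4: {3, 5}, 5: set()}
for H, successors in condensation(G).items():
    print(sorted(H), [sorted(K) for K in successors])
```

Excluding edges inside a component is what keeps C(G) free of self-loops, consistent with statement (i) of the lemma below.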
Lemma 3.2 (Properties of the condensation digraph). For a digraph G and its condensation C(G), (i) C(G) is acyclic; (ii) G contains a globally reachable node if and only if C(G) contains a globally reachable node; (iii) G contains a directed spanning tree if and only if C(G) contains a directed spanning tree; and (iv) G is weakly connected if and only if C(G) is weakly connected.
Proof. Regarding statement (i): by contradiction, if there exists a cycle (H1, H2, . . . , Hm, H1) in C(G), then the union of H1, . . . , Hm is a strongly connected subgraph of G. However, the Hi are strongly connected components of G and, by the definition of strongly connected component, no subgraph of G strictly containing one of them is strongly connected; this is a contradiction. Regarding the implication ⇒ in statement (ii): suppose G contains a globally reachable node v. Let Hv denote the node in C(G) containing v. Since v is globally reachable, for all u ∈ V(G) there exists a path from u to v, and due to the strong connectivity of Hv and Hu (the node in C(G) containing u), there is a path from every node of Hu to every node of Hv, which shows that Hv is a globally reachable node in C(G). Regarding the implication ⇐ in statement (ii): suppose C(G) contains a globally reachable node Hv that can be reached from every node Hu ∈ V(C(G)). Again, by the strong connectivity of Hv and Hu, for all v ∈ Hv and u ∈ Hu, there exists a path from u to v. In other words, every node of Hv is a globally reachable node in digraph G. Regarding statement (iii), a digraph contains a directed spanning tree if and only if the reverse digraph contains a globally reachable node. Thus, the proof of statement (iii) is analogous to that of statement (ii). Both implications in statement (iv) are simple to prove via induction; we leave this task to the reader. ∎
3.4 Weighted digraphs
A weighted digraph is a triplet G = (V, E, {ae }e∈E ), where the pair (V, E) is a digraph with nodes V = {v1 , . . . , vn }, and where {ae }e∈E is a collection of strictly positive weights for the edges E. Note: for simplicity we let V = {1, . . . , n}. It is therefore equivalent to write {ae }e∈E or {aij }(i,j)∈E .
[Figure: a weighted digraph with nodes 1 to 5.]
The collection of weights for this weighted digraph is
a12 = 3.7, a13 = 3.7, a21 = 8.9, a24 = 1.2, a34 = 3.7, a35 = 2.3, a51 = 4.4, a54 = 2.3, a55 = 4.4.
A digraph G = (V = {v1, . . . , vn}, E) can be regarded as a weighted digraph by setting all its weights equal to 1, that is, ae = 1 for all e ∈ E. A weighted digraph is undirected if aij = aji for all i, j ∈ {1, . . . , n}. The notions of connectivity and definitions of in- and out-neighbors, introduced for digraphs, remain equally valid for weighted digraphs. The notions of in- and out-degree are generalized to weighted digraphs as follows. In a weighted digraph with V = {v1, . . . , vn}, the weighted out-degree and the weighted in-degree of vertex vi are defined by, respectively,
$$d_{\mathrm{out}}(v_i) = \sum_{j=1}^{n} a_{ij} \quad \text{(the sum of the weights of all the out-edges of } v_i\text{)},$$
$$d_{\mathrm{in}}(v_i) = \sum_{j=1}^{n} a_{ji} \quad \text{(the sum of the weights of all the in-edges of } v_i\text{)},$$
with the convention that aij = 0 whenever (i, j) ∉ E.
The weighted digraph G is weight-balanced if dout (vi ) = din (vi ) for all vi ∈ V .
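As a small numerical check of these definitions, the sketch below computes the weighted out- and in-degrees of the example digraph from the weights listed earlier in this section and then tests weight balance. The dictionary encoding and the helper names are hypothetical choices for illustration, and floating-point sums are compared with a tolerance.

```python
# Edge weights a_ij of the example weighted digraph above, keyed by (i, j).
WEIGHTS = {(1, 2): 3.7, (1, 3): 3.7, (2, 1): 8.9, (2, 4): 1.2, (3, 4): 3.7,
           (3, 5): 2.3, (5, 1): 4.4, (5, 4): 2.3, (5, 5): 4.4}

def weighted_degrees(weights, nodes):
    """Return dicts d_out, d_in with d_out(i) = sum_j a_ij and d_in(i) = sum_j a_ji."""
    d_out = {i: 0.0 for i in nodes}
    d_in = {i: 0.0 for i in nodes}
    for (i, j), a in weights.items():
        d_out[i] += a  # a_ij is an out-edge weight of i ...
        d_in[j] += a   # ... and an in-edge weight of j (a self-loop counts for both)
    return d_out, d_in

def is_weight_balanced(weights, nodes, tol=1e-9):
    d_out, d_in = weighted_degrees(weights, nodes)
    return all(abs(d_out[i] - d_in[i]) <= tol for i in nodes)

d_out, d_in = weighted_degrees(WEIGHTS, range(1, 6))
print(d_out[4])                                  # 0.0: node 4 has no out-edges
print(is_weight_balanced(WEIGHTS, range(1, 6)))  # False
```

For instance, node 1 has weighted out-degree 3.7 + 3.7 = 7.4 but weighted in-degree 8.9 + 4.4 = 13.3, so the example is not weight-balanced.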
3.5 Database collections and software libraries
Useful collections of example networks are freely available online; here are some examples: (i) The Koblenz Network Collection, available at http://konect.uni-koblenz.de and described in (Kunegis 2013), contains model graphs in easily accessible MATLAB format (as well as a MATLAB toolbox for network analysis and a compact overview of the various computed statistics and plots for the networks in the collection). (ii) A broad range of example networks is available online at the Stanford Large Network Dataset Collection, see http://snap.stanford.edu/data. (iii) The University of Florida Sparse Matrix Collection, available at http://www.cise.ufl.edu/research/sparse/matrices and described in (Davis and Hu 2011), contains a large and growing set of sparse matrices and complex graphs arising in a broad range of applications; e.g., see Figure 3.10. (iv) The UCI Network Data Repository, available at http://networkdata.ics.uci.edu, is an effort to facilitate the scientific study of networks; see also (DuBois 2008).
Useful software libraries for network analysis and visualization are freely available online; here are some examples: (i) Gephi, available at https://gephi.org, is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs. Datasets are available at https://wiki.gephi.org/index.php?title=Datasets. (ii) NetworkX, available at http://networkx.github.io, is a Python library for network analysis. For example, one feature is the ability to compute condensation digraphs. A second interesting feature is the ability to generate numerous well-known model graphs, see http://networkx.lanl.gov/reference/generators.html. (iii) Cytoscape, available at http://www.cytoscape.org, is an open-source software platform for visualizing complex networks and integrating them with attribute data. (iv) Mathematica provides functionality for modeling, analyzing, synthesizing, and visualizing graphs and networks, besides the ability to simulate dynamical systems; see the description at http://reference.wolfram.com/mathematica/guide/GraphsAndNetworks.html. (v) Graphviz, available at http://www.graphviz.org/, is open-source graph visualization software which is also compatible with MATLAB: http://www.mathworks.com/matlabcentral/fileexchange/4518-matlab-graphviz-interface.
Figure 3.10: Example networks from distinct domains: Figure 3.10(a) shows the standard IEEE 118 power grid testbed (118 nodes); Figure 3.10(b) shows the Klavzar bibliography network (86 nodes); Figure 3.10(c) shows the GD99c Pajek network (105 nodes). Network parameters are available at http://www.cise.ufl.edu/research/sparse/matrices, and their layout is obtained via the graph drawing algorithm proposed by Hu (2005).
3.6 Exercises
E3.1
Acyclic digraphs. Let G be an acyclic digraph with n nodes. Show that: (i) G contains at least one sink, i.e., a vertex without out-neighbors, and at least one source, i.e., a vertex without in-neighbors; (ii) the vertices of G can be given labels in the set {1, . . . , n} in such a way that if (u, v) is an edge, then label(u) > label(v). This labeling is called a topological sort of G. Provide an algorithm to define this labeling; and (iii) after topologically sorting its vertices, the adjacency matrix of the digraph is lower-triangular, i.e., all its entries above the main diagonal are equal to zero.
E3.2
Condensation digraphs. Draw the condensation for each of the following digraphs.
E3.3
A simple proof. Prove Lemma 3.1 on the properties of a tree in a graph.
E3.4
Connectivity in topologically balanced digraphs. Prove the following statement: If a digraph G is topologically balanced and contains either a globally reachable vertex or a directed spanning tree, then G is strongly connected.
E3.5
Globally reachable nodes and disjoint closed subsets (Lin et al. 2005; Moreau 2005). Consider a digraph G = (V, E) with at least two nodes. Prove that the following statements are equivalent: (i) G has a globally reachable node, and (ii) for every pair S1 , S2 of non-empty disjoint subsets of V , there exists a node that is an out-neighbor of S1 or S2 .
E3.6
Swiss railroads. Consider the fictitious railroad map of Switzerland given in Figure E3.1. (i) Can a passenger go from any station to any other? (ii) Is the graph acyclic? Is it aperiodic? If not, what is its period?
Figure E3.1: Fictitious railroad map connections in Switzerland (stations: Basel, St. Gallen, Zurich, Bern, Lausanne, Zermatt, Interlaken, Chur, Lugano).
Chapter 4
The Adjacency Matrix
We review here basic concepts from algebraic graph theory. Standard books on algebraic graph theory are (Biggs 1994; Godsil and Royle 2001). One objective is to relate matrix properties with graph theoretical properties. A second objective is to understand when a row-stochastic matrix is primitive.
4.1 The adjacency matrix
Given a weighted digraph G = (V, E, {ae }e∈E ), with V = {1, . . . , n}, the weighted adjacency matrix of G is the n × n nonnegative matrix A defined as follows: for each edge (i, j) ∈ E, the entry (i, j) of A is equal to the weight a(i,j) of the edge (i, j), and all other entries of A are equal to zero. In other words, aij > 0 if and only if (i, j) is an edge of G, and aij = 0 otherwise.
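[Figure: a weighted digraph with nodes 1 to 5; its edge weights coincide with those of the example in Section 3.4.]
The adjacency matrix of this weighted directed graph is
$$A = \begin{bmatrix} 0 & 3.7 & 3.7 & 0 & 0\\ 8.9 & 0 & 0 & 1.2 & 0\\ 0 & 0 & 0 & 3.7 & 2.3\\ 0 & 0 & 0 & 0 & 0\\ 4.4 & 0 & 0 & 2.3 & 4.4 \end{bmatrix}.$$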
The binary adjacency matrix $A \in \{0, 1\}^{n\times n}$ of a digraph G = (V = {1, . . . , n}, E) or of a weighted digraph is defined by
$$a_{ij} = \begin{cases} 1, & \text{if } (i, j) \in E,\\ 0, & \text{otherwise.} \end{cases} \tag{4.1}$$
Finally, in a weighted digraph, the weighted out-degree matrix Dout and the weighted in-degree matrix Din are the diagonal matrices defined by
$$D_{\mathrm{out}} = \operatorname{diag}(A\mathbb{1}_n) = \begin{bmatrix} d_{\mathrm{out}}(1) & 0 & 0\\ 0 & \ddots & 0\\ 0 & 0 & d_{\mathrm{out}}(n) \end{bmatrix}, \quad \text{and} \quad D_{\mathrm{in}} = \operatorname{diag}(A^\top\mathbb{1}_n),$$
where diag(z1, . . . , zn) is the diagonal matrix with diagonal entries equal to z1, . . . , zn.
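The example weights can also be assembled into the weighted adjacency matrix, the binary adjacency matrix (4.1), and the degree matrices Dout = diag(A1n) and Din = diag(A⊤1n). This is a minimal pure-Python sketch, with matrices stored as lists of rows and node i stored at index i − 1; the helper names are hypothetical.

```python
# Edge weights a_ij of the example weighted digraph, keyed by (i, j).
WEIGHTS = {(1, 2): 3.7, (1, 3): 3.7, (2, 1): 8.9, (2, 4): 1.2, (3, 4): 3.7,
           (3, 5): 2.3, (5, 1): 4.4, (5, 4): 2.3, (5, 5): 4.4}

def adjacency_matrix(weights, n):
    """Weighted adjacency matrix (list of rows): entry (i, j) equals a_ij, nodes 1..n."""
    A = [[0.0] * n for _ in range(n)]
    for (i, j), a in weights.items():
        A[i - 1][j - 1] = a
    return A

def binary_adjacency_matrix(A):
    """Binary adjacency matrix (4.1): 1 where a_ij > 0, else 0."""
    return [[1 if a > 0 else 0 for a in row] for row in A]

def diag(z):
    """diag(z_1, ..., z_n) as a list of rows."""
    return [[z[i] if i == j else 0.0 for j in range(len(z))] for i in range(len(z))]

def degree_matrices(A):
    """D_out = diag(A 1_n) from row sums; D_in = diag(A^T 1_n) from column sums."""
    row_sums = [sum(row) for row in A]
    col_sums = [sum(col) for col in zip(*A)]
    return diag(row_sums), diag(col_sums)

A = adjacency_matrix(WEIGHTS, 5)
D_out, D_in = degree_matrices(A)
print(binary_adjacency_matrix(A)[0])  # [0, 1, 1, 0, 0]
```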
4.2 Algebraic graph theory: basic and prototypical results
In this section we review some basic and prototypical results that involve correspondences between graphs and adjacency matrices. We start with some straightforward statements. For a weighted digraph G with adjacency matrix A, the following statements hold: (i) the graph G is undirected if and only if A is symmetric and its diagonal entries are equal to 0; (ii) the digraph G is weight-balanced if and only if $A\mathbb{1}_n = A^\top\mathbb{1}_n$; (iii) the node i is a sink if and only if the ith row-sum of A is zero; and (iv) the node i is a source if and only if the ith column-sum of A is zero. Lemma 4.1 (Digraph associated to a nonnegative matrix). Given a nonnegative n × n matrix A, its associated weighted digraph is the weighted digraph with nodes {1, . . . , n} and weighted adjacency matrix A. The weighted adjacency matrix A is (i) row-stochastic if and only if each node of its associated digraph has weighted out-degree equal to 1 (so that In is the weighted out-degree matrix); and (ii) doubly-stochastic if and only if each node of its associated weighted digraph has weighted out-degree and weighted in-degree equal to 1 (so that G is weight-balanced and, additionally, both in-degree and out-degree matrices are equal to In).
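The straightforward statements above and Lemma 4.1 translate directly into row-sum and column-sum tests. The following is a minimal sketch with matrices stored as lists of rows; the helper names and both example matrices are hypothetical.

```python
def row_sums(A):
    return [sum(row) for row in A]

def col_sums(A):
    return [sum(col) for col in zip(*A)]

def is_row_stochastic(A, tol=1e-9):
    """Lemma 4.1(i): nonnegative, and every weighted out-degree (row sum) equals 1."""
    nonneg = all(a >= 0 for row in A for a in row)
    return nonneg and all(abs(s - 1) <= tol for s in row_sums(A))

def is_doubly_stochastic(A, tol=1e-9):
    """Lemma 4.1(ii): additionally, every weighted in-degree (column sum) equals 1."""
    return is_row_stochastic(A, tol) and all(abs(s - 1) <= tol for s in col_sums(A))

def is_weight_balanced(A, tol=1e-9):
    """A 1_n = A^T 1_n, i.e., each row sum equals the matching column sum."""
    return all(abs(r - c) <= tol for r, c in zip(row_sums(A), col_sums(A)))

def sinks(A):
    """Nodes (numbered from 1) whose row sum is zero."""
    return [i + 1 for i, s in enumerate(row_sums(A)) if s == 0]

def sources(A):
    """Nodes (numbered from 1) whose column sum is zero."""
    return [j + 1 for j, s in enumerate(col_sums(A)) if s == 0]

C = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]  # hypothetical example
B = [[0, 1], [0, 1]]                               # hypothetical example
print(is_doubly_stochastic(C), is_row_stochastic(B), is_doubly_stochastic(B))  # True True False
print(sources(B))  # [1]
```

Matrix B is row-stochastic but not doubly-stochastic: node 1 has no in-neighbors, so it is a source and the first column sum is zero.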
4.3 Powers of the adjacency matrix, paths and connectivity
Lemma 4.2 (Directed paths and powers of the adjacency matrix). Let G be a weighted digraph with n nodes, with weighted adjacency matrix A, with unweighted adjacency matrix $A_{0,1} \in \{0, 1\}^{n\times n}$, and possibly with self-loops. For all i, j ∈ {1, . . . , n} and k ∈ N, (i) the (i, j) entry of $A_{0,1}^k$ equals the number of directed paths of length k (including paths with self-loops) from node i to node j; and (ii) the (i, j) entry of $A^k$ is positive if and only if there exists a directed path of length k (including paths with self-loops) from node i to node j. Proof. The first statement is proved by induction; for simplicity of notation, let A be binary. The statement is clearly true for k = 1. Next, we assume the statement is true for some k ≥ 1 and we prove it for k + 1. By the induction hypothesis, the entry $(A^k)_{ij}$ equals the number of directed paths from i to j of length k. Now, each path from i to j of length k + 1 identifies (1) a unique node h such that (i, h) is an edge of G and (2) a unique path from h to j of length k. We write $A^{k+1} = AA^k$ in components as
$$(A^{k+1})_{ij} = \sum_{h=1}^{n} A_{ih}(A^k)_{hj} = \sum_{h \in N^{\mathrm{out}}(i)} (A^k)_{hj},$$
where $N^{\mathrm{out}}(i)$ denotes the set of out-neighbors of i. Therefore, the entry $(A^{k+1})_{ij}$ equals the number of directed paths from i to j of length k + 1. This concludes the induction argument. The second statement, for the case where A is not binary, is a direct consequence of the first, because the nonnegative matrices A and $A_{0,1}$ have the same pattern of zero entries, and so do their powers. ∎ Proposition 4.3 (Connectivity properties of the digraph and positive powers of the adjacency matrix). Let G be a weighted digraph with n nodes and weighted adjacency matrix A. The following statements are equivalent: (i) G is strongly connected; (ii) A is irreducible; and (iii) $\sum_{k=0}^{n-1} A^k$ is positive.
For any i, j ∈ {1, . . . , n}, the following statements hold: (iv) the jth node of G is globally reachable if and only if the jth column of $\sum_{k=0}^{n-1} A^k$ is positive; and (v) the ith node of G is the root of a directed spanning tree if and only if the ith row of $\sum_{k=0}^{n-1} A^k$ is positive.
[Figure: an unweighted digraph with nodes 1, 2 and 3.]
The adjacency matrix of this unweighted directed graph is
$$\begin{bmatrix} 1 & 1 & 1\\ 0 & 1 & 1\\ 0 & 1 & 1 \end{bmatrix}.$$
Even though vertices 2 and 3 are globally reachable, the digraph is not strongly connected because vertex 1 has no in-neighbor other than itself. Therefore, as is easy to observe, the associated adjacency matrix is reducible.
Proof of Proposition 4.3. (ii) ⟹ (i) We assume A is irreducible and aim to show that there exist directed paths from any node to any other node. Fix i ∈ {1, . . . , n} and let Ri ⊂ {1, . . . , n} be the set of nodes that belong to directed paths originating from node i. Denote the unreachable nodes by Ui = {1, . . . , n} \ Ri. By contradiction, assume Ui is not empty. Then Ri ∪ Ui is a nontrivial partition of the index set {1, . . . , n} and irreducibility implies the existence of a non-zero entry ajh with j ∈ Ri and h ∈ Ui. But then the node h is reachable from i, contradicting h ∈ Ui. Therefore, Ui = ∅, and all nodes are reachable from i. The converse statement (i) ⟹ (ii) is proved similarly. (i) ⟹ (iii): If G is strongly connected, then for all i and j there exists a directed path of some length k ≤ n − 1 connecting node i to node j (with k = 0 when i = j). Hence, by Lemma 4.2(ii), the entry $(A^k)_{ij}$ is strictly positive. This implies (iii). (iii) ⟹ (i): If $\sum_{k=0}^{n-1} A^k$ is positive, then for all i and j there must exist h such that $(A^h)_{ij} > 0$. This implies the existence of a path of length h from i to j. ∎
Notice that if node j is reachable from node i via a path of length k and at least one node along that path has a self-loop, then node j is reachable from node i via paths of length k, k + 1, k + 2, and so on. This observation and statement (iv) in Proposition 4.3 lead to the following corollary.
Corollary 4.4 (Connectivity properties of the digraph and positive powers of the adjacency matrix: cont’d). Let G be a weighted digraph with n nodes, weighted adjacency matrix A and a self-loop at each node. The following statements are equivalent: (i) G is strongly connected; and (ii) $A^{n-1}$ is positive, so that A is primitive. For any j ∈ {1, . . . , n}, the following two statements are equivalent: (i) the jth node of G is globally reachable; and (ii) all entries of the jth column of $A^{n-1}$ are positive.
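Lemma 4.2, Proposition 4.3, and Corollary 4.4 can be checked numerically on the 3-node example above, whose adjacency matrix happens to have a self-loop at every node. The following pure-Python sketch is illustrative only: nodes are stored 0-based and reported 1-based, and the helper names are hypothetical. It forms the sum of A^k for k = 0, . . . , n − 1, reads off statements (iii) to (v) from its entries, and compares the entries of A^2 with a brute-force count of paths of length 2.

```python
def mat_mul(X, Y):
    """Product of square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][h] * Y[h][j] for h in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_pow(A, k):
    P = identity(len(A))
    for _ in range(k):
        P = mat_mul(P, A)
    return P

def power_sum(A):
    """S = I_n + A + A^2 + ... + A^(n-1)."""
    n = len(A)
    S, P = [[0] * n for _ in range(n)], identity(n)
    for _ in range(n):
        S = [[S[i][j] + P[i][j] for j in range(n)] for i in range(n)]
        P = mat_mul(P, A)
    return S

def count_walks(A, i, j, k):
    """Brute-force count of directed paths of length k from i to j (nodes may repeat)."""
    if k == 0:
        return 1 if i == j else 0
    return sum(count_walks(A, h, j, k - 1) for h in range(len(A)) if A[i][h] > 0)

A = [[1, 1, 1], [0, 1, 1], [0, 1, 1]]  # the 3-node example above
n = len(A)
S = power_sum(A)
strongly_connected = all(S[i][j] > 0 for i in range(n) for j in range(n))              # (iii)
globally_reachable = [j + 1 for j in range(n) if all(S[i][j] > 0 for i in range(n))]  # (iv)
roots = [i + 1 for i in range(n) if all(S[i][j] > 0 for j in range(n))]               # (v)
print(strongly_connected, globally_reachable, roots)  # False [2, 3] [1]
print(mat_pow(A, 2)[0][1], count_walks(A, 0, 1, 2))   # 3 3
```

Since every node has a self-loop, Corollary 4.4 also predicts that exactly columns 2 and 3 of A^(n−1) = A^2 are positive, which the same helpers confirm.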
4.4 Graph theoretical properties of primitive matrices
In this section we present the main result of this chapter, an immediate corollary and its proof. Proposition 4.5 (Strongly connected and aperiodic digraph and primitive adjacency matrix). Let G be a weighted digraph with weighted adjacency matrix A. The following two statements are equivalent: (i) G is strongly connected and aperiodic; and (ii) A is primitive, that is, there exists k ∈ N such that $A^k$ is positive. Corollary 4.6 (Strongly connected digraph with self-loops and primitive adjacency matrix). Let G be a weighted digraph with weighted adjacency matrix A. If G is strongly connected and has at least one self-loop, then A is primitive. Before proving Proposition 4.5, we introduce a useful fact from number theory, whose proof we leave as Exercise E4.6. Loosely, the following lemma states that coprime numbers (i.e., numbers whose greatest common divisor is 1) generate, via linear combinations with nonnegative integer coefficients, all numbers larger than a given threshold. Lemma 4.7 (Frobenius number). Given a finite set A = {a1, a2, . . . , an} of positive integers, an integer M is said to be representable by A if there exist nonnegative integers α1, α2, . . . , αn such that M = α1a1 + · · · + αnan. The following statements are equivalent: (i) there exists a finite largest unrepresentable integer, called the Frobenius number of A, and (ii) the greatest common divisor of A is 1. Finally, we provide a proof for Proposition 4.5 taken from (Bullo et al. 2009). Proof of Proposition 4.5. (i) ⟹ (ii) Pick any ordered pair (i, j). We claim that there exists a number k(i, j) with the property that, for all m > k(i, j), we have $(A^m)_{ij} > 0$, that is, there exists a directed path from i to j of length m for all m > k(i, j). If this claim is correct, then the statement (ii) is proved with k = max{k(i, j) | i, j ∈ {1, . . . , n}} + 1. To show this claim, let {c1, . . . , cN} be the set of the cycles of G and let {k1, . . . , kN} be their lengths.
Because G is aperiodic, the lengths {k1, . . . , kN} are coprime and Lemma 4.7 implies the existence of a number h(k1, . . . , kN) such that any number larger than h(k1, . . . , kN) is a linear combination of k1, . . . , kN with nonnegative integer coefficients. Because G is strongly connected, there exists a path γ of some length Γ(i, j) that starts at i, contains a vertex of each of the cycles c1, . . . , cN, and terminates at j. Now, we claim that k(i, j) = Γ(i, j) + h(k1, . . . , kN) has the desired property. Indeed, pick any number m > k(i, j) and write it as m = Γ(i, j) + β1k1 + · · · + βNkN for appropriate numbers β1, . . . , βN ∈ N. A directed path from i to j of length m is constructed by attaching to the path γ the following cycles: β1 times the cycle c1, β2 times the cycle c2, . . . , βN times the cycle cN. (ii) ⟹ (i) From Lemma 4.2 we know that $A^k > 0$ means that there are paths of length k from every node to every other node. Hence, the digraph G is strongly connected. Next, we prove aperiodicity. Because G is strongly connected, each node of G has at least one outgoing edge, that is, for all i, there exists at least one index j such that aij > 0. This fact implies that the matrix $A^{k+1} = AA^k$ is positive via the following simple calculation:
$$(A^{k+1})_{il} = \sum_{h=1}^{n} a_{ih}(A^k)_{hl} \ge a_{ij}(A^k)_{jl} > 0.$$
In summary, if $A^k$ is positive for some k, then $A^m$ is positive for all subsequent m > k (see also Exercise E2.6). Therefore, there are closed paths in G of every sufficiently large length. This fact implies that G is aperiodic; indeed, by contradiction, if the cycle lengths had a common divisor d > 1, then every closed path, being composed of cycles, would have length divisible by d, and G would not possess closed paths of every sufficiently large length.
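Proposition 4.5 suggests a finite test for primitivity: compute successive powers of the zero pattern of A until one is positive. To make the search terminate, the sketch below uses a classical bound due to Wielandt, which is not stated in these notes: if an n × n nonnegative matrix is primitive, then A^k is positive for k = (n − 1)^2 + 1. The helper names and the example matrices are hypothetical, and only the zero pattern is used, in line with Lemma 4.2.

```python
def bool_mul(X, Y):
    """Boolean matrix product: entry (i, j) is True iff some h has X[i][h] and Y[h][j]."""
    n = len(X)
    return [[any(X[i][h] and Y[h][j] for h in range(n)) for j in range(n)] for i in range(n)]

def primitivity_exponent(A):
    """Smallest k >= 1 with A^k positive, or None if A is not primitive."""
    n = len(A)
    B = [[a > 0 for a in row] for row in A]  # zero pattern of A
    P = B                                    # zero pattern of A^k, starting at k = 1
    for k in range(1, (n - 1) ** 2 + 2):     # Wielandt bound: k <= (n - 1)^2 + 1 suffices
        if all(all(row) for row in P):
            return k
        P = bool_mul(P, B)
    return None

print(primitivity_exponent([[0, 1], [1, 0]]))                   # None: a 2-cycle has period 2
print(primitivity_exponent([[1, 1], [1, 0]]))                   # 2: cycles of length 1 and 2
print(primitivity_exponent([[0, 1, 0], [1, 0, 1], [1, 0, 0]]))  # 5: cycles of length 2 and 3
```

The last example is strongly connected and aperiodic, so it is primitive as Proposition 4.5 predicts, and its exponent 5 attains the bound (n − 1)^2 + 1 for n = 3.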
4.5 Exercises
E4.1
Edges and triangles in an undirected graph. Let A be the binary adjacency matrix for an undirected graph G without self-loops. Recall that the trace of A is $\operatorname{trace}(A) = \sum_{i=1}^{n} a_{ii}$. Show that (i) trace(A) = 0, (ii) trace(A^2) = 2|E|, where |E| is the number of edges of G, and (iii) trace(A^3) = 6|T|, where |T| is the number of triangles of G. (A triangle is a complete subgraph with three vertices.) (iv) Give the formula relating trace(A^k) to the number of closed walks on the graph of length k. (v) Verify results (i)-(iii) on the matrix
$$A = \begin{bmatrix} 0 & 1 & 1\\ 1 & 0 & 1\\ 1 & 1 & 0 \end{bmatrix}.$$
E4.2
A sufficient condition for primitivity. Assume the square matrix A is nonnegative and irreducible. Show that (i) if A has a positive diagonal element, then A is primitive, and (ii) the converse is false: a primitive matrix A need not have a positive diagonal element.
E4.3
Example row-stochastic matrices and associated digraph. Consider the row-stochastic matrices
$$A_1 = \frac{1}{2}\begin{bmatrix} 0 & 0 & 1 & 1\\ 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1\\ 1 & 1 & 0 & 0 \end{bmatrix}, \quad A_2 = \frac{1}{2}\begin{bmatrix} 1 & 0 & 1 & 0\\ 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1\\ 0 & 1 & 0 & 1 \end{bmatrix}, \quad \text{and} \quad A_3 = \frac{1}{2}\begin{bmatrix} 1 & 0 & 1 & 0\\ 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1\\ 0 & 1 & 0 & 1 \end{bmatrix}.$$
Draw the digraphs G1, G2 and G3 associated with these three matrices. Using only the original definitions and without relying on the characterizations in Propositions 4.3 and 4.5, show that: (i) the matrices A1, A2 and A3 are irreducible and primitive, (ii) the digraphs G1, G2 and G3 are strongly connected and aperiodic, and (iii) the averaging algorithm defined by A2 converges in a finite number of steps.
E4.4
Convergent substochastic matrices. Let A be a nonnegative matrix with associated digraph G. Let dout(i) denote the weighted out-degree of node i. Show that (i) A is substochastic (as defined in Exercise E2.5) if and only if dout(i) ≤ 1 for all i and dout(j) < 1 for at least one j. Next, suppose that for each node i with dout(i) = 1 there exists a directed path from i to a node j(i) with dout(j(i)) < 1. Show that (ii) there exists k such that $A^k\mathbb{1}_n < \mathbb{1}_n$ (entrywise); and (iii) ρ(A) < 1, that is, A is convergent.
E4.5
Normalization of nonnegative irreducible matrices. Consider a strongly connected weighted digraph G with n nodes and with an irreducible adjacency matrix $A \in \mathbb{R}^{n\times n}$. The matrix A is not necessarily row-stochastic. Find a positive vector $v \in \mathbb{R}^n$ so that the normalized matrix
$$A_{\mathrm{normalized}} = \frac{1}{\rho(A)}\big(\operatorname{diag}(v)\big)^{-1} A \operatorname{diag}(v)$$
is nonnegative, irreducible, and row-stochastic.
E4.6
The Frobenius number. Prove Lemma 4.7. Hint: Read up on the Frobenius number in (Owens 2003).
E4.7
Leslie population model. The Leslie model is used in population ecology to model the changes in a population of organisms over a period of time; see the original reference (Leslie 1945) and a comprehensive text (Caswell 2001). In this model, the population is divided into n groups based on age classes; the indices i are ordered increasingly with the age, so that i = 1 is the class of the newborns. The variable xi(k), i ∈ {1, . . . , n}, denotes the number of individuals in the age class i at time k; at every time step k the xi(k) individuals
• produce a number αi xi(k) of offspring (i.e., individuals belonging to the first age class), where αi ≥ 0 is a fecundity rate, and
• progress to the next age class with a survival rate βi ∈ [0, 1].
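If x(k) denotes the vector of individuals at time k, the Leslie population model reads
$$x(k+1) = Ax(k) = \begin{bmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_{n-1} & \alpha_n\\ \beta_1 & 0 & \cdots & 0 & 0\\ 0 & \beta_2 & \ddots & \vdots & \vdots\\ \vdots & \ddots & \ddots & 0 & 0\\ 0 & \cdots & 0 & \beta_{n-1} & 0 \end{bmatrix} x(k), \tag{E4.1}$$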
where A is referred to as the Leslie matrix. Consider the following two independent sets of questions. First, assume αi > 0 for all i ∈ {1, . . . , n} and 0 < βi ≤ 1 for all i ∈ {1, . . . , n − 1}. (i) Prove that the matrix A is primitive.
(ii) Let $p_i(k) = x_i(k) / \sum_{j=1}^{n} x_j(k)$ denote the percentage of the total population in class i at time k; accordingly, let p(k) be the population distribution at time k. Compute the asymptotic population distribution when k → +∞, expressing it in terms of the spectral radius ρ(A) and the parameters (αi, βi), i ∈ {1, . . . , n}. Hint: The quantity $\lim_{k\to\infty} p(k)$ is independent of x(0) and of the left dominant eigenvector of A. (iii) Assume βi = β > 0 and αi = nβ for i ∈ {1, . . . , n}. What percentage of the total population belongs to the eldest class n asymptotically? (iv) Find a sufficient condition on the parameters (αi, βi), i ∈ {1, . . . , n}, so that the population will eventually become extinct. Second, assume αi ≥ 0 for i ∈ {1, . . . , n} and 0 ≤ βi ≤ 1 for all i ∈ {1, . . . , n − 1}. (v) Find a necessary and sufficient condition on the parameters (αi, βi), i ∈ {1, . . . , n}, so that the Leslie matrix A is irreducible. (vi) For an irreducible Leslie matrix (as in the previous point (v)), find a sufficient condition on the parameters (αi, βi), i ∈ {1, . . . , n}, that ensures that the population will not go extinct.
E4.8
Swiss railroads: continued. From Exercise E3.6, consider the fictitious railroad map of Switzerland given in Figure E3.1. Write the unweighted adjacency matrix A of this transportation network and, relying upon A and its powers, answer the following questions:
(i) what is the number of links of the shortest path connecting St. Gallen to Zermatt? (ii) is it possible to go from Bern to Chur using 4 links? And 5? (iii) how many different routes, with strictly fewer than 9 links and possibly visiting the same station more than once, start from Zürich and end in Lausanne?
Chapter 5
Discrete-time Averaging Systems
After our discussions about matrix and graph theory, we are finally ready to go back to the examples discussed in Chapter 1. Recall from Chapter 1: (i) “Averaging Algorithm” = given an undirected graph, associate a variable to each node and iteratively compute local averages; (ii) “Distributed Hypothesis Testing” and “Distributed Parameter Estimation” = design an algorithm to compute the average of n numbers; (iii) “Reaching a Consensus in a Social Influence Network” = given an arbitrary row-stochastic matrix, what does it converge to? (iv) “Cyclic pursuit and balancing” = given a specific matrix (cyclic=sparse, doubly-stochastic), does it converge?
Figure 5.1: Interactions in a social influence network.
This chapter discusses two topics. First, we present some analysis results and, specifically, some convergence results for averaging algorithms defined by row-stochastic matrices; we discuss primitive matrices and reducible matrices with a single or multiple sinks. Our treatment is related to the discussion in (Jackson 2010, Chapter 8) and (DeMarzo et al. 2003, Appendix C and, specifically, Theorem 10). Second, we show some design results and, specifically, how to design optimal matrices; we discuss the equal-neighbor model and the Metropolis–Hastings model. The computation of optimal averaging algorithms (doubly-stochastic matrices) is discussed in Boyd et al. (2004).
5.1 Averaging with primitive row-stochastic matrices
From Chapter 2 on matrix theory, we can now restate the main convergence result in Corollary 2.20 in a more explicit way, using the main graph-theory result in Proposition 4.5.
Corollary 5.1 (Consensus for row-stochastic matrices with strongly connected and aperiodic graph). If a row-stochastic matrix A has an associated digraph that is strongly connected and aperiodic (hence A is primitive), then (i) $\lim_{k\to\infty} A^k = \mathbb{1}_n w^\top$, where w > 0 is the left eigenvector of A with eigenvalue 1 satisfying w1 + · · · + wn = 1; (ii) the solution to x(k + 1) = Ax(k) satisfies
$$\lim_{k\to\infty} x(k) = \big(w^\top x(0)\big)\mathbb{1}_n;$$
(iii) if additionally A is doubly-stochastic, then $w = \frac{1}{n}\mathbb{1}_n$ (because $A^\top\mathbb{1}_n = \mathbb{1}_n$ and $\frac{1}{n}\mathbb{1}_n^\top\mathbb{1}_n = 1$) so that
$$\lim_{k\to\infty} x(k) = \frac{\mathbb{1}_n^\top x(0)}{n}\,\mathbb{1}_n = \operatorname{average}\big(x(0)\big)\mathbb{1}_n.$$
5.2 Averaging with reducible matrices
Next, consider a reducible row-stochastic matrix A, i.e., a row-stochastic matrix whose associated digraph G is not strongly connected. We wish to give sufficient conditions for semi-convergence of A. We first recall a useful property from Lemma 3.2: G has a globally reachable node if and only if its condensation digraph has a globally reachable node (that is, a single sink). Along these same lines one can show that the set of globally reachable nodes induces a strongly connected component of G. A digraph with a globally reachable node and its condensation digraph are illustrated in Figure 5.2.
Figure 5.2: First panel: An example digraph with a set of globally reachable nodes. Second panel: its strongly connected components (in red and blue). Third panel: its condensation digraph with a sink. For this digraph, the subgraph induced by the globally reachable nodes is aperiodic.
We are now ready to establish the semiconvergence of adjacency matrices of digraphs with globally reachable nodes. Theorem 5.2 (Consensus for row-stochastic matrices with a globally-reachable aperiodic strongly-connected component). Let A be a row-stochastic matrix and let G be its associated digraph. Assume that G has a globally reachable node and the subgraph induced by the set of globally reachable nodes is aperiodic. Then (i) the simple eigenvalue ρ(A) = 1 is strictly larger than the magnitude of all other eigenvalues, hence A is semi-convergent; (ii) $\lim_{k\to\infty} A^k = \mathbb{1}_n w^\top$, where w ≥ 0 is the left eigenvector of A with eigenvalue 1 satisfying w1 + · · · + wn = 1; (iii) the eigenvector w ≥ 0 has positive entries corresponding to each globally reachable node and has zero entries for all other nodes; (iv) the solution to x(k + 1) = Ax(k) satisfies
$$\lim_{k\to\infty} x(k) = \big(w^\top x(0)\big)\mathbb{1}_n.$$
Note that, for all nodes j that are not globally reachable, the initial values xj(0) have no effect on the final convergence value. Note: as we discussed in Section 2.3, the limiting vector is a weighted average of the initial conditions. The relative weights of the initial conditions are the convex combination coefficients w1, . . . , wn. In a social influence network, the coefficient wi is regarded as the “social influence” of agent i. We illustrate this concept by computing the social influence coefficients for Krackhardt’s well-known advice network (Krackhardt 1987); see Figure 5.3. Note: adjacency matrices of digraphs with globally reachable nodes are sometimes called indecomposable; see (Wolfowitz 1963).
Figure 5.3: Krackhardt’s advice network with 21 nodes. The social influence of each node is illustrated by its gray level.
Proof of Theorem 5.2. By assumption the condensation digraph of $A$ contains a sink that is globally reachable, hence it is unique. Therefore, after a permutation of rows and columns (see Exercise E3.1),
$$A = \begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix} \quad \text{(block lower-triangular matrix)}. \tag{5.1}$$
The state vector $x$ is correspondingly partitioned into $x_1 \in \mathbb{R}^{n_1}$ and $x_2 \in \mathbb{R}^{n_2}$ so that
$$x_1(k+1) = A_{11} x_1(k), \tag{5.2}$$
$$x_2(k+1) = A_{21} x_1(k) + A_{22} x_2(k). \tag{5.3}$$
Here $x_1$ and $A_{11}$ are the variables and the matrix corresponding to the sink. Because the sink, as a subgraph of $G$, is strongly connected and aperiodic, $A_{11}$ is primitive and row-stochastic and, by Corollary 5.1,
$$\lim_{k\to\infty} A_{11}^k = \mathbb{1}_{n_1} w_1^\top,$$
where $w_1 > 0$ is the left eigenvector with eigenvalue 1 for $A_{11}$ normalized so that $\mathbb{1}_{n_1}^\top w_1 = 1$.
The matrix $A_{22}$ is analyzed as follows. Recall from Exercise E2.5 the notion of substochastic matrix and note, from Exercise E4.4, that an irreducible substochastic matrix has spectral radius less than 1. Now, because $A_{21}$ cannot be zero (otherwise the sink would not be globally reachable), the matrix $A_{22}$ is substochastic. Moreover, (after appropriately permuting rows and columns of $A_{22}$) it can be observed that $A_{22}$ is a block lower-triangular matrix such that each diagonal block is row substochastic and irreducible (corresponding to each node in the condensation digraph). Therefore, we know $\rho(A_{22}) < 1$ and, in turn, $I_{n_2} - A_{22}$ is invertible. Because $A_{11}$ is primitive and $\rho(A_{22}) < 1$, $A$ is semiconvergent and $\lim_{k\to\infty} x_2(k)$ exists. Taking the limit as $k \to \infty$ in equation (5.3), some straightforward algebra shows that
$$\lim_{k\to\infty} x_2(k) = (I_{n_2} - A_{22})^{-1} A_{21} \lim_{k\to\infty} x_1(k) = (I_{n_2} - A_{22})^{-1} A_{21} \big(\mathbb{1}_{n_1} w_1^\top\big) x_1(0).$$
From the row-stochasticity of $A$, we know $A_{21}\mathbb{1}_{n_1} + A_{22}\mathbb{1}_{n_2} = \mathbb{1}_{n_2}$ and hence $(I_{n_2} - A_{22})^{-1} A_{21}\mathbb{1}_{n_1} = \mathbb{1}_{n_2}$. Collecting these results, we write
$$\lim_{k\to\infty} \begin{bmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{bmatrix}^k = \begin{bmatrix} \mathbb{1}_{n_1} w_1^\top & 0 \\ \mathbb{1}_{n_2} w_1^\top & 0 \end{bmatrix} = \mathbb{1}_n \begin{bmatrix} w_1^\top & 0 \end{bmatrix}.$$ ∎
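A quick numerical check of Theorem 5.2 may help. The following is a minimal sketch in plain Python (no external packages), using a hypothetical 4-node row-stochastic matrix chosen only for illustration: with 0-based labels, nodes 0 and 1 form an aperiodic sink (both have self-loops) and nodes 2 and 3 are not globally reachable. Ten repeated squarings produce $A^{1024}$ as a numerical stand-in for $\lim_{k\to\infty} A^k$. By statements (ii) and (iii), every row should approach $w^\top = [2/7, 5/7, 0, 0]$, where $[2/7, 5/7]$ is the normalized left eigenvector of the sink block $A_{11}$; by statement (iv), every entry of $x(k)$ should approach $w^\top x(0)$.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical row-stochastic matrix (0-based node labels):
# nodes 0 and 1 form a strongly connected, aperiodic sink (self-loops);
# node 3 reaches the sink only through node 2.
A = [
    [0.5, 0.5, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.3, 0.0, 0.4, 0.3],
    [0.0, 0.0, 0.5, 0.5],
]

# Ten squarings give A^(2**10) = A^1024, a numerical stand-in for lim A^k.
P = A
for _ in range(10):
    P = matmul(P, P)

w = P[0]  # statement (ii): every row of the limit equals w^T
print([round(v, 6) for v in w])

# Statement (iv): x(k) -> (w^T x(0)) 1_n; x_2(0) and x_3(0) play no role.
x0 = [1.0, 2.0, 3.0, 4.0]
x_lim = [sum(P[i][j] * x0[j] for j in range(4)) for i in range(4)]
print([round(v, 6) for v in x_lim])
```

Repeated squaring is used only because it reaches a high power in a handful of products; any sufficiently large power would do for an example this small.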
5.3 Averaging with reducible matrices and multiple sinks
In this section we now consider the general case of digraphs that do not contain globally reachable nodes, that is, digraphs whose condensation digraph has multiple sinks. In the following statement we say that a node is connected with a sink of a digraph if there exists a directed path from the node to any node in the sink.

Theorem 5.3 (Convergence for row-stochastic matrices with multiple aperiodic sinks). Let $A$ be a row-stochastic matrix and let $G$ be its associated digraph. Assume the condensation digraph $C(G)$ contains $M \geq 2$ sinks and assume all of them are aperiodic. Then
(i) the semi-simple eigenvalue $\rho(A) = 1$ has multiplicity equal to $M$ and is strictly larger than the magnitude of all other eigenvalues, hence $A$ is semi-convergent;
(ii) there exist $M$ left eigenvectors of $A$, denoted by $w^m \in \mathbb{R}^n$, for $m \in \{1, \dots, M\}$, with the properties that $w^m \geq 0$, $w^m_1 + \dots + w^m_n = 1$, and $w^m_i$ is positive if and only if node $i$ belongs to the $m$-th sink;
(iii) the solution to $x(k+1) = A x(k)$ with initial condition $x(0)$ satisfies
$$\lim_{k\to\infty} x_i(k) = \begin{cases} (w^m)^\top x(0), & \text{if node } i \text{ belongs to the } m\text{-th sink},\\ (w^m)^\top x(0), & \text{if node } i \text{ is connected with the } m\text{-th sink and no other sink},\\ \sum_{m=1}^{M} z_{i,m}\,(w^m)^\top x(0), & \text{if node } i \text{ is connected to more than one sink}, \end{cases}$$
where, for each node $i$ connected to more than one sink, the coefficients $z_{i,m}$, $m \in \{1, \dots, M\}$, are convex combination coefficients and are strictly positive if and only if there exists a directed path from node $i$ to the sink $m$.
Proof. Rather than treating the general case with heavy notation, we work out an example and refer the reader to (DeMarzo et al. 2003, Theorem 10) for the general proof. Assume the condensation digraph of $A$ is composed of three nodes, two of which are sinks, as in the side figure.

[Side figure: condensation digraph with nodes $x_1$, $x_2$, $x_3$; nodes $x_1$ and $x_2$ are the sinks.]
Therefore, after a permutation of rows and columns (see Exercise E3.1), $A$ can be written as
$$A = \begin{bmatrix} A_{11} & 0 & 0 \\ 0 & A_{22} & 0 \\ A_{31} & A_{32} & A_{33} \end{bmatrix}$$
and the state vector $x$ is correspondingly partitioned into the vectors $x_1$, $x_2$ and $x_3$. The state equations are:
$$x_1(k+1) = A_{11} x_1(k), \tag{5.4}$$
$$x_2(k+1) = A_{22} x_2(k), \tag{5.5}$$
$$x_3(k+1) = A_{31} x_1(k) + A_{32} x_2(k) + A_{33} x_3(k). \tag{5.6}$$
By the properties of the condensation digraph and the assumption of aperiodicity of the sinks, the digraphs associated to the row-stochastic matrices $A_{11}$ and $A_{22}$ are strongly connected and aperiodic. Therefore we immediately conclude that
$$\lim_{k\to\infty} x_1(k) = \big(w_1^\top x_1(0)\big)\mathbb{1}_{n_1} \quad\text{and}\quad \lim_{k\to\infty} x_2(k) = \big(w_2^\top x_2(0)\big)\mathbb{1}_{n_2},$$
where $w_1$ (resp. $w_2$) is the left eigenvector of the eigenvalue 1 for matrix $A_{11}$ (resp. $A_{22}$) with the usual normalization $\mathbb{1}_{n_1}^\top w_1 = \mathbb{1}_{n_2}^\top w_2 = 1$. Regarding the matrix $A_{33}$, the same discussion as in the previous proof leads to $\rho(A_{33}) < 1$ and, in turn, to the statement that $I_{n_3} - A_{33}$ is nonsingular. By taking the limit as $k \to \infty$ in equation (5.6), some straightforward algebra shows that
$$\begin{aligned} \lim_{k\to\infty} x_3(k) &= (I_{n_3} - A_{33})^{-1}\Big(A_{31}\lim_{k\to\infty} x_1(k) + A_{32}\lim_{k\to\infty} x_2(k)\Big) \\ &= \big(w_1^\top x_1(0)\big)(I_{n_3} - A_{33})^{-1} A_{31}\mathbb{1}_{n_1} + \big(w_2^\top x_2(0)\big)(I_{n_3} - A_{33})^{-1} A_{32}\mathbb{1}_{n_2}. \end{aligned}$$
Moreover, because $A$ is row-stochastic, we know
$$A_{31}\mathbb{1}_{n_1} + A_{32}\mathbb{1}_{n_2} + A_{33}\mathbb{1}_{n_3} = \mathbb{1}_{n_3},$$
and, using again the fact that $I_{n_3} - A_{33}$ is nonsingular,
$$\mathbb{1}_{n_3} = (I_{n_3} - A_{33})^{-1} A_{31}\mathbb{1}_{n_1} + (I_{n_3} - A_{33})^{-1} A_{32}\mathbb{1}_{n_2}.$$
Since $(I_{n_3} - A_{33})^{-1} = \sum_{k=0}^{\infty} A_{33}^k$ has nonnegative entries, the two vectors on the right-hand side are nonnegative; hence each entry of $\lim_{k\to\infty} x_3(k)$ is a convex combination of $w_1^\top x_1(0)$ and $w_2^\top x_2(0)$. This concludes our proof of Theorem 5.3 for the simplified case of $C(G)$ having three nodes and two sinks. ∎

Note that the state does not, in general, converge to consensus (not all components of the state are equal), and the final value of all nodes is independent of the initial values at nodes which are not in the sinks of the condensation digraph.
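The two-sink situation of Theorem 5.3 can be simulated just as easily. The sketch below uses a hypothetical 4-node matrix (0-based labels) whose condensation digraph has the structure used in the proof: sink 1 is $\{0, 1\}$, sink 2 is $\{2\}$, and node 3 has directed paths to both. Here $w^1 = [1/2, 1/2]^\top$ on the first sink, $w^2 = [1]$ on the second, and for node 3 the coefficients $(I_{n_3} - A_{33})^{-1}A_{31}\mathbb{1}_{n_1}$ and $(I_{n_3} - A_{33})^{-1}A_{32}\mathbb{1}_{n_2}$ both equal $0.2/(1 - 0.6) = 0.5$. With $x(0) = [0, 2, 4, 100]$ the limit should therefore be 1 on the first sink, 4 on the second, and $0.5 \cdot 1 + 0.5 \cdot 4 = 2.5$ at node 3, regardless of $x_3(0)$.

```python
def step(A, x):
    """One averaging step x(k+1) = A x(k)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Hypothetical example (0-based labels): sink 1 = {0, 1}, sink 2 = {2},
# and node 3 has directed paths to both sinks.
A = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.2, 0.0, 0.2, 0.6],
]

x = [0.0, 2.0, 4.0, 100.0]  # x_3(0) = 100 should not affect the limit
for _ in range(200):
    x = step(A, x)
print([round(v, 6) for v in x])  # -> [1.0, 1.0, 4.0, 2.5]
```

The two sinks keep different values, so the network does not reach consensus, as noted after the proof.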
Figure 5.4: The equal-neighbor model (the undirected graph on nodes 1, 2, 3, 4 and the corresponding equal-neighbor weights 1/2, 1/3, 1/4).
5.4 Design of weights for undirected graphs: the equal-neighbor model
From Section 1.2 let us consider an undirected graph as in Figure 5.4 and the following simplest distributed algorithm, based on the concept of linear averaging. Each node contains a value $x_i$ and repeatedly executes:
$$x_i^+ := \operatorname{average}\big(x_i, \{x_j, \text{ for all neighbor nodes } j\}\big). \tag{5.7}$$
Let us make a few simple observations. The algorithm (5.7) can be written in matrix format as:
$$x(k+1) = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/3 & 1/3 & 1/3 \\ 0 & 1/3 & 1/3 & 1/3 \end{bmatrix} x(k) =: A_{\text{wsn}}\, x(k).$$
The binary symmetric adjacency matrix and the degree matrix of the undirected graph are
$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}, \qquad D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix},$$
and so one can verify that
$$A_{\text{wsn}} = (D + I_4)^{-1}(A + I_4), \qquad x_i(k+1) = \frac{1}{1 + d(i)}\Big(x_i(k) + \sum_{j \in N(i)} x_j(k)\Big).$$
Recall that $A + I_4$ is the adjacency matrix of a graph that is equal to the graph in the figure with the addition of a self-loop at each node; this new graph has degree matrix $D + I_4$. Now, it is also quite easy to verify (see also Exercise E5.1) that
$$A_{\text{wsn}}\mathbb{1}_4 = \mathbb{1}_4,$$
but unfortunately
$$\mathbb{1}_4^\top A_{\text{wsn}} \neq \mathbb{1}_4^\top.$$
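The matrix $A_{\text{wsn}}$ and the two observations above can be reproduced with exact rational arithmetic. This is a minimal sketch, assuming the edge set $\{1,2\}, \{2,3\}, \{2,4\}, \{3,4\}$ read off from the adjacency matrix above (stored with 0-based indices); the helper name `equal_neighbor` is hypothetical. Row sums equal one while column sums do not, which is why the equal-neighbor iteration need not converge to the exact average of the initial values.

```python
from fractions import Fraction

# Binary adjacency matrix of the undirected graph in Figure 5.4
# (edges {1,2}, {2,3}, {2,4}, {3,4}; rows/columns use 0-based indices).
A01 = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]


def equal_neighbor(A01):
    """Return (D + I)^{-1} (A + I) with exact rational entries."""
    n = len(A01)
    result = []
    for i in range(n):
        row = [A01[i][j] + (1 if i == j else 0) for j in range(n)]  # row of A + I
        one_plus_degree = sum(row)  # (D + I)_{ii} = 1 + d(i)
        result.append([Fraction(v, one_plus_degree) for v in row])
    return result


A_wsn = equal_neighbor(A01)
row_sums = [sum(row) for row in A_wsn]
col_sums = [sum(A_wsn[i][j] for i in range(4)) for j in range(4)]
print([str(s) for s in row_sums])  # -> ['1', '1', '1', '1']
print([str(s) for s in col_sums])  # -> ['3/4', '17/12', '11/12', '11/12']
```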
We summarize this discussion and state a more general result, in arbitrary dimensions and for arbitrary graphs.
Lemma 5.4 (The equal-neighbor row-stochastic matrix). Let $G$ be a weighted digraph with $n$ nodes, weighted adjacency matrix $A$ and weighted out-degree matrix $D_{\text{out}}$. Define
$$A_{\text{equal-neighbor}} = (I_n + D_{\text{out}})^{-1}(I_n + A).$$
Note that the weighted digraph associated to $(A + I_n)$ is $G$ with the addition of a self-loop at each node with unit weight. Then
(i) $A_{\text{equal-neighbor}}$ is row-stochastic;
(ii) $A_{\text{equal-neighbor}}$ is primitive if and only if $G$ is strongly connected; and
(iii) $A_{\text{equal-neighbor}}$ is doubly-stochastic if $G$ is weight-balanced and the weighted degree is constant for all nodes (i.e., $D_{\text{out}} = D_{\text{in}} = d I_n$ for some $d \in \mathbb{R}_{>0}$).

Proof. First, for any $v \in \mathbb{R}^n$ with non-zero entries, it is easy to see $\operatorname{diag}(v)^{-1} v = \mathbb{1}_n$. Recalling the definition $D_{\text{out}} + I_n = \operatorname{diag}((A + I_n)\mathbb{1}_n)$,
$$\big((D_{\text{out}} + I_n)^{-1}(A + I_n)\big)\mathbb{1}_n = \operatorname{diag}((A + I_n)\mathbb{1}_n)^{-1}(A + I_n)\mathbb{1}_n = \mathbb{1}_n,$$
which proves statement (i). To prove statement (ii), note that, besides self-loops, $G$ and the weighted digraph associated with $A_{\text{equal-neighbor}}$ have the same edges, so one is strongly connected if and only if the other is. Also note that the weighted digraph associated with $A_{\text{equal-neighbor}}$ is aperiodic by design, since every node has a self-loop; hence it is primitive exactly when it is strongly connected. Finally, if $D_{\text{out}} = D_{\text{in}} = d I_n$ for some $d \in \mathbb{R}_{>0}$, then statement (iii) follows from
$$\big((D_{\text{out}} + I_n)^{-1}(A + I_n)\big)^\top \mathbb{1}_n = \frac{1}{d+1}(A + I_n)^\top \mathbb{1}_n = (D_{\text{in}} + I_n)^{-1}(A + I_n)^\top \mathbb{1}_n = \operatorname{diag}\big((A + I_n)^\top \mathbb{1}_n\big)^{-1}(A + I_n)^\top \mathbb{1}_n = \mathbb{1}_n.$$ ∎

5.5 Design of weights for undirected graphs: the Metropolis–Hastings model
Next, we suggest a second way of assigning weights to a graph for the purpose of designing an averaging algorithm. Given an undirected unweighted graph $G$ with $n$ nodes, edge set $E$ and degrees $d(1), \dots, d(n)$, define the weighted adjacency matrix $A_{\text{Metropolis-Hastings}}$ by
$$(A_{\text{Metropolis-Hastings}})_{ij} = \begin{cases} \dfrac{1}{1 + \max\{d(i), d(j)\}}, & \text{if } \{i,j\} \in E \text{ and } i \neq j,\\ 1 - \sum_{\{i,h\} \in E} (A_{\text{Metropolis-Hastings}})_{ih}, & \text{if } i = j,\\ 0, & \text{otherwise.} \end{cases}$$

Figure 5.5: The Metropolis–Hastings model (the graph of Figure 5.4 with Metropolis–Hastings weights 1/4, 1/3, 5/12, 3/4).
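In our example,
$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} \quad\Longrightarrow\quad A_{\text{Metropolis-Hastings}} = \begin{bmatrix} 3/4 & 1/4 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/4 & 5/12 & 1/3 \\ 0 & 1/4 & 1/3 & 5/12 \end{bmatrix}.$$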
One can verify that the Metropolis–Hastings weights have the following properties:
(i) $(A_{\text{Metropolis-Hastings}})_{ij} > 0$ if $\{i,j\} \in E$, $(A_{\text{Metropolis-Hastings}})_{ii} > 0$ for all $i \in \{1, \dots, n\}$, and $(A_{\text{Metropolis-Hastings}})_{ij} = 0$ otherwise;
(ii) $A_{\text{Metropolis-Hastings}}$ is symmetric and doubly-stochastic; and
(iii) $A_{\text{Metropolis-Hastings}}$ is primitive if and only if $G$ is connected.
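The Metropolis–Hastings definition translates almost line by line into code. The sketch below uses `fractions.Fraction` for exact arithmetic; the function name `metropolis_hastings` is hypothetical, and it assumes a symmetric 0/1 adjacency matrix without self-loops. Applied to the example above, it should reproduce the $4 \times 4$ matrix and exhibit properties (i) and (ii).

```python
from fractions import Fraction


def metropolis_hastings(A01):
    """Metropolis-Hastings weights for an undirected, unweighted graph given
    by a symmetric 0/1 adjacency matrix without self-loops."""
    n = len(A01)
    deg = [sum(row) for row in A01]
    W = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and A01[i][j]:
                W[i][j] = Fraction(1, 1 + max(deg[i], deg[j]))
        # The diagonal entry absorbs the remaining mass so the row sums to one.
        W[i][i] = 1 - sum(W[i][j] for j in range(n) if j != i)
    return W


A01 = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]
W = metropolis_hastings(A01)
for row in W:
    print([str(v) for v in row])
# ['3/4', '1/4', '0', '0']
# ['1/4', '1/4', '1/4', '1/4']
# ['0', '1/4', '5/12', '1/3']
# ['0', '1/4', '1/3', '5/12']
```

Using the larger of the two degrees in each off-diagonal weight is what makes the matrix symmetric, and symmetry plus unit row sums gives unit column sums.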
5.6 Centrality measures
In network science it is of interest to determine the relative importance of a node in a network. There are many ways to do so and they are referred to as centrality measures or centrality scores. Part of the treatment in this section is inspired by (Newman 2010). We refer to (Brandes and Erlebach 2005) for a comprehensive review of network analysis metrics and related computational algorithms and to (Gleich 2015) for a comprehensive review of Pagerank and its multiple extensions and applications. We start by presenting four centrality notions based on the adjacency matrix. We treat the general case of a weighted digraph $G$ with weighted adjacency matrix $A$ (warning: many articles in the literature deal with undirected graphs only). The matrix $A$ is nonnegative, but not necessarily row stochastic. From the Perron–Frobenius theory, recall the following facts:
(i) if $G$ is strongly connected, then the spectral radius $\rho(A)$ is an eigenvalue of maximum magnitude and its corresponding left eigenvector can be selected to be strictly positive and with unit sum (see Theorem 2.15); and
(ii) if $G$ contains a globally reachable node, then the spectral radius $\rho(A)$ is an eigenvalue of maximum magnitude and its corresponding left eigenvector is nonnegative and has positive entries corresponding to each globally reachable node (see Theorem 5.2).
Degree centrality
For an arbitrary weighted digraph $G$, the degree centrality $c_{\text{degree}}(i)$ of node $i$ is its in-degree:
$$c_{\text{degree}}(i) = d_{\text{in}}(i) = \sum_{j=1}^{n} a_{ji}, \tag{5.8}$$
that is, the number of in-neighbors (if $G$ is unweighted) or the sum of the weights of the incoming edges. Degree centrality is relevant, for example, in (typically unweighted) citation networks whereby articles are ranked on the basis of their citation records. (Warning: the notion that a high citation count is an indicator of quality is clearly a fallacy.)

Eigenvector centrality
One problem with degree centrality is that each in-edge has unit count, even if the in-neighbor has negligible importance. To remedy this potential drawback, one could define the importance of a node to be proportional to the weighted sum of the importance of its in-neighbors (see (Bonacich 1972) for an early reference). This line of reasoning leads to the following definition. For a weighted digraph $G$ with globally reachable nodes (or for an undirected graph that is connected), define the eigenvector centrality vector, denoted by $c_{\text{ev}}$, to be the left dominant eigenvector of the adjacency matrix $A$ associated with the dominant eigenvalue and normalized to satisfy $\mathbb{1}_n^\top c_{\text{ev}} = 1$. Note that the eigenvector centrality satisfies
$$A^\top c_{\text{ev}} = \frac{1}{\alpha}\, c_{\text{ev}} \quad\iff\quad c_{\text{ev}}(i) = \alpha \sum_{j=1}^{n} a_{ji}\, c_{\text{ev}}(j), \tag{5.9}$$
where $\alpha = 1/\rho(A)$ is the only possible choice of scalar coefficient in equation (5.9) ensuring that there exists a unique solution and that the solution, denoted $c_{\text{ev}}$, is strictly positive in a strongly connected digraph and nonnegative in a digraph with globally reachable nodes. Note that this connectivity property may be restrictive in some cases.
Figure 5.6: Comparing degree centrality versus eigenvector centrality: the node with maximum in-degree has zero eigenvector centrality in this graph
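Degree centrality (5.8) is a column sum of $A$, while eigenvector centrality (5.9) can be approximated by power iteration. The sketch below is one possible implementation, not the text's own: it iterates with $A^\top + I_n$ instead of $A^\top$, a standard shift that leaves the dominant eigenvector unchanged but makes the iteration converge even when the digraph is periodic (for example, a bipartite undirected graph). It assumes a strongly connected digraph, and the function names are hypothetical. The example is the undirected graph of Figure 5.4 with 0-based labels.

```python
def degree_centrality(A):
    """In-degree centrality (5.8): c_degree(i) = sum_j a_ji."""
    n = len(A)
    return [sum(A[j][i] for j in range(n)) for i in range(n)]


def eigenvector_centrality(A, iters=10_000, tol=1e-12):
    """Dominant left eigenvector of A with unit sum, as in (5.9).

    Power iteration is run on A^T + I_n rather than A^T: the shift keeps the
    Perron eigenvector but makes the iteration converge even if A is periodic.
    Assumption: the digraph of A is strongly connected."""
    n = len(A)
    c = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(A[j][i] * c[j] for j in range(n)) + c[i] for i in range(n)]
        total = sum(new)
        new = [v / total for v in new]
        if max(abs(a - b) for a, b in zip(new, c)) < tol:
            return new
        c = new
    raise RuntimeError("power iteration did not converge")


# Undirected graph of Figure 5.4 (0-based labels), viewed as a digraph.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]
print(degree_centrality(A))  # -> [1, 3, 2, 2]
c_ev = eigenvector_centrality(A)
print([round(v, 4) for v in c_ev])
```

For this small graph node 1 (node 2 in the 1-based labels of Figure 5.4) has both the largest degree and the largest eigenvector centrality; Figure 5.6 shows that the two rankings can disagree in other graphs.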
Katz centrality
For a weighted digraph $G$, pick an attenuation factor $\alpha < 1/\rho(A)$ and define the Katz centrality vector (see (Katz 1953)), denoted by $c_K$, by the following equivalent formulations:
$$c_K(i) = \alpha \sum_{j=1}^{n} a_{ji}\,\big(c_K(j) + 1\big), \tag{5.10}$$
or
$$c_K(i) = \sum_{k=1}^{\infty} \sum_{j=1}^{n} \alpha^k (A^k)_{ji}. \tag{5.11}$$
Katz centrality has therefore two interpretations: (i) the importance of a node is an attenuated sum of the importance and of the number of the in-neighbors (note indeed how equation (5.10) is a combination of equations (5.8) and (5.9)), and (ii) the importance of a node is $\alpha$ times the number of length-1 paths into $i$ (i.e., the in-degree) plus $\alpha^2$ times the number of length-2 paths into $i$, etc. (From Lemma 4.2, recall that, for an unweighted digraph, $(A^k)_{ji}$ is equal to the number of directed paths of length $k$ from $j$ to $i$.) Note how, for $\alpha < 1/\rho(A)$, equation (5.10) is well-posed and equivalent to
$$\begin{aligned} c_K = \alpha A^\top (c_K + \mathbb{1}_n) &\iff c_K + \mathbb{1}_n = \alpha A^\top (c_K + \mathbb{1}_n) + \mathbb{1}_n \\ &\iff (I_n - \alpha A^\top)(c_K + \mathbb{1}_n) = \mathbb{1}_n \\ &\iff c_K = (I_n - \alpha A^\top)^{-1}\mathbb{1}_n - \mathbb{1}_n \\ &\iff c_K = \sum_{k=1}^{\infty} \alpha^k (A^\top)^k \mathbb{1}_n, \end{aligned} \tag{5.12}$$
where we used the identity $(I_n - A)^{-1} = \sum_{k=0}^{\infty} A^k$, valid for any matrix $A$ with $\rho(A) < 1$ (here applied to $\alpha A^\top$); see Exercise E2.12. There are two simple ways to compute the Katz centrality. According to equation (5.12), for limited size problems, one can invert the matrix $(I_n - \alpha A^\top)$. Alternatively, one can show that the following iteration converges to the correct value: $c_K^+ := \alpha A^\top (c_K + \mathbb{1}_n)$.
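The iteration $c_K^+ := \alpha A^\top(c_K + \mathbb{1}_n)$ takes only a few lines of code. The sketch below (plain Python; the function name is hypothetical) stops when successive iterates agree to a tolerance and raises an error otherwise; it does not compute $\rho(A)$, so choosing $\alpha < 1/\rho(A)$ is left to the caller. Two hand-checkable examples are included: on a directed 3-cycle ($\rho(A) = 1$) with $\alpha = 0.5$, equation (5.10) reduces by symmetry to $c = 0.5(c + 1)$, so every node should score 1; on a digraph where node 0 points to nodes 1 and 2, $A^2 = 0$, so the series (5.11) stops after $k = 1$ and $c_K = [0, \alpha, \alpha]$.

```python
def katz_centrality(A, alpha, iters=10_000, tol=1e-12):
    """Katz centrality via the iteration c <- alpha A^T (c + 1_n).

    The caller must choose 0 < alpha < 1/rho(A); otherwise the iteration
    diverges and a RuntimeError is raised."""
    n = len(A)
    c = [0.0] * n
    for _ in range(iters):
        new = [alpha * sum(A[j][i] * (c[j] + 1.0) for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, c)) < tol:
            return new
        c = new
    raise RuntimeError("no convergence: is alpha < 1/rho(A)?")


# Directed 3-cycle 0 -> 1 -> 2 -> 0: rho(A) = 1, so alpha = 0.5 is admissible.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(katz_centrality(cycle, alpha=0.5))  # every entry close to 1

# Node 0 points to nodes 1 and 2; A is nilpotent, so any alpha > 0 works.
fork = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
print(katz_centrality(fork, alpha=0.5))  # -> [0.0, 0.5, 0.5]
```

For larger problems the closed form $(I_n - \alpha A^\top)^{-1}\mathbb{1}_n - \mathbb{1}_n$ from (5.12) could be evaluated with a linear solver instead; the iteration is shown because it needs nothing beyond matrix-vector products.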
Figure 5.7: Image taken without permission from (Ishii and Tempo 2014). The pattern in the figure displays the so-called hyperlink matrix, i.e., the transpose of the adjacency matrix, for a collection of websites at the Lincoln University in New Zealand from the year 2006. Each empty column corresponds to a webpage without any outgoing link, that is, to a so-called dangling node. This Web has 3756 nodes with 31,718 links. A fairly large portion of the nodes are dangling nodes: in this example, there are 3255 dangling nodes, which is over 85% of the total.
Pagerank centrality
For a weighted digraph $G$ with row-stochastic adjacency matrix (i.e., unit out-degree for each node), pick a convex combination coefficient $\alpha \in \,]0, 1[$ and define the pagerank centrality vector, denoted by $c_{\text{pr}}$, as the unique positive solution to
$$c_{\text{pr}}(i) = \alpha \sum_{j=1}^{n} a_{ji}\, c_{\text{pr}}(j) + \frac{1-\alpha}{n}, \tag{5.13}$$
or, equivalently, to
$$c_{\text{pr}} = M c_{\text{pr}}, \quad \mathbb{1}_n^\top c_{\text{pr}} = 1, \quad \text{where } M = \alpha A^\top + \frac{1-\alpha}{n}\mathbb{1}_n \mathbb{1}_n^\top. \tag{5.14}$$
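Because $M$ in (5.14) has positive entries and unit column sums, the pagerank vector can be computed by simple fixed-point iteration. The sketch below is one such implementation under stated assumptions: $A$ is row-stochastic (built here as $D_{\text{out}}^{-1}A_{0,1}$, which requires every node to have at least one out-edge), and the 3-node digraph is a hypothetical example. Each step evaluates the right-hand side of (5.13); for iterates with unit sum this coincides with multiplication by $M$.

```python
def row_normalize(A01):
    """A = D_out^{-1} A_{0,1}. Assumes every node has at least one out-edge."""
    return [[v / sum(row) for v in row] for row in A01]


def pagerank(A, alpha=0.85, iters=10_000, tol=1e-12):
    """Pagerank vector of a row-stochastic A by fixed-point iteration.

    Each step evaluates the right-hand side of (5.13). Starting from a vector
    with unit sum, every iterate keeps unit sum, so the step equals M c with
    M from (5.14); the map contracts with factor alpha in the 1-norm."""
    n = len(A)
    c = [1.0 / n] * n
    for _ in range(iters):
        new = [alpha * sum(A[j][i] * c[j] for j in range(n)) + (1 - alpha) / n
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, c)) < tol:
            return new
        c = new
    raise RuntimeError("pagerank iteration did not converge")


# Hypothetical 3-node digraph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
A01 = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
c_pr = pagerank(row_normalize(A01))
print([round(v, 4) for v in c_pr])
```

This sketch does not handle dangling nodes (nodes without out-edges, as in Figure 5.7), for which $D_{\text{out}}^{-1}$ is undefined.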
(To establish the equivalence between these two definitions, the only non-trivial step is to notice that if $c_{\text{pr}}$ solves equation (5.13), then it must satisfy $\mathbb{1}_n^\top c_{\text{pr}} = 1$: summing (5.13) over $i$ and using $A\mathbb{1}_n = \mathbb{1}_n$ gives $(1-\alpha)\,\mathbb{1}_n^\top c_{\text{pr}} = 1 - \alpha$.) Note that, for arbitrary unweighted digraphs and binary adjacency matrices $A_{0,1}$, it is natural to compute the pagerank vector with $A = D_{\text{out}}^{-1} A_{0,1}$. We refer to (Brin and Page 1998; Ishii and Tempo 2014; Page 2001) for the important interpretation of the pagerank score as the stationary distribution of the so-called random surfer of a hyperlinked document network; it is under this disguise that the pagerank score was conceived by the Google co-founders, and a corresponding algorithm led to the establishment of the Google search engine. In the Google problem it is customary to set $\alpha \approx 0.85$.

Closeness and betweenness centrality (based on shortest paths)
Degree, eigenvector, Katz and pagerank centrality are presented using the adjacency matrix. Next we present two centrality measures based on the notions of shortest path and geodesic distance; these two notions belong to the class of radial and medial centrality measures (Borgatti and Everett 2006). We start by introducing some additional graph theory. For a weighted digraph with $n$ nodes, the length of a directed path is the sum of the weights of edges in the directed path. For $i, j \in \{1, \dots, n\}$, a shortest path from a node $i$ to a node $j$ is a directed path of smallest length. Note: it is easy to construct examples with multiple shortest paths, so that the shortest path is not unique. The geodesic distance $d_{i\to j}$ from node $i$ to node $j$ is the length of a shortest path from node $i$ to node $j$; we also stipulate that the geodesic distance $d_{i\to j}$ takes the value zero if $i = j$ and is infinite if there is no path from $i$ to $j$. Note: in general $d_{i\to j} \neq d_{j\to i}$. Finally, for $i, j, k \in \{1, \dots, n\}$, we let $g_{i\to k\to j}$ denote the number of shortest paths from a node $i$ to a node $j$ that pass through node $k$.
For a strongly-connected weighted digraph, the closeness of node $i \in \{1, \dots, n\}$ is the inverse sum over the geodesic distances $d_{i\to j}$ from node $i$ to all other nodes $j \in \{1, \dots, n\}$, that is:
$$c_{\text{closeness}}(i) = \frac{1}{\sum_{j=1}^{n} d_{i\to j}}. \tag{5.15}$$
For a strongly-connected weighted digraph, the betweenness of node $i \in \{1, \dots, n\}$ is the fraction of all shortest paths, from any node $k$ to any other node $j$, that pass through node $i$, that is:
$$c_{\text{betweenness}}(i) = \frac{\sum_{j,k=1}^{n} g_{k\to i\to j}}{\sum_{h=1}^{n}\sum_{j,k=1}^{n} g_{k\to h\to j}}. \tag{5.16}$$
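For unweighted digraphs the geodesic distances and the path counts in (5.16) can be obtained from breadth-first search. The sketch below is an illustration under two assumptions that the text leaves open: all edge weights equal 1 (weighted lengths would need Dijkstra's algorithm instead), and endpoints are not counted as "passing through", i.e., the sums in (5.16) run over distinct $i$, $j$, $k$. It relies on the standard fact that the number of shortest $k$-to-$j$ paths through $i$ equals (number of shortest $k$-to-$i$ paths) times (number of shortest $i$-to-$j$ paths) whenever $d_{k\to i} + d_{i\to j} = d_{k\to j}$. Function names are hypothetical; the example is a path graph on four nodes.

```python
from collections import deque


def bfs_counts(adj, s):
    """Hop distances and numbers of shortest paths from node s.

    adj[u] lists the out-neighbors of u; every edge has length 1."""
    n = len(adj)
    dist = [None] * n
    count = [0] * n
    dist[s], count[s] = 0, 1
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:           # v reached for the first time
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:    # u precedes v on a shortest path
                count[v] += count[u]
    return dist, count


def closeness_betweenness(adj):
    """Closeness (5.15) and betweenness (5.16) of a strongly connected,
    unweighted digraph. Endpoints are not counted as intermediate nodes."""
    n = len(adj)
    d, g = zip(*(bfs_counts(adj, k) for k in range(n)))  # d[k][j], g[k][j]
    closeness = [1.0 / sum(d[i]) for i in range(n)]
    through = [0] * n
    for i in range(n):
        for k in range(n):
            for j in range(n):
                # Shortest k -> j paths via i: g[k][i] * g[i][j] if i lies on one.
                if len({i, j, k}) == 3 and d[k][i] + d[i][j] == d[k][j]:
                    through[i] += g[k][i] * g[i][j]
    total = sum(through)  # zero if no shortest path has an intermediate node
    betweenness = [t / total for t in through]
    return closeness, betweenness


# Path graph 0 - 1 - 2 - 3; each undirected edge is listed in both directions.
adj = [[1], [0, 2], [1, 3], [2]]
closeness, betweenness = closeness_betweenness(adj)
print(closeness)    # -> [0.16666666666666666, 0.25, 0.25, 0.16666666666666666]
print(betweenness)  # -> [0.0, 0.5, 0.5, 0.0]
```

The triple loop costs $O(n^3)$ operations, which is fine for small examples; Brandes' algorithm is the usual choice for large graphs.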
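Summary
To conclude this section, in Table 5.1, we summarize the various centrality definitions for a weighted directed graph.

| Measure | Definition | Assumptions |
| --- | --- | --- |
| degree centrality | $c_{\text{degree}} = A^\top \mathbb{1}_n$ | |
| eigenvector centrality | $c_{\text{ev}} = \alpha A^\top c_{\text{ev}}$ | $\alpha = 1/\rho(A)$, $G$ has a globally reachable node |
| pagerank centrality | $c_{\text{pr}} = \alpha A^\top c_{\text{pr}} + \frac{1-\alpha}{n}\mathbb{1}_n$ | $\alpha < 1$, $A\mathbb{1}_n = \mathbb{1}_n$ |
| Katz centrality | $c_K = \alpha A^\top (c_K + \mathbb{1}_n)$ | $\alpha < 1/\rho(A)$ |
| closeness centrality | $c_{\text{closeness}}(i) = 1 \big/ \sum_{j=1}^{n} d_{i\to j}$ | $G$ strongly connected |
| betweenness centrality | $c_{\text{betweenness}}(i) = \sum_{j,k=1}^{n} g_{k\to i\to j} \big/ \sum_{h=1}^{n}\sum_{j,k=1}^{n} g_{k\to h\to j}$ | $G$ strongly connected |

Table 5.1: Definitions of centrality measures for a weighted digraph $G$ with adjacency matrix $A$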
Figure 5.8 illustrates some centrality notions on a small instructive example due to Brandes (2006). Note that a different node is the most central one in each metric; this variability is naturally expected and highlights the need to select a centrality notion relevant to the specific application of interest.
(a) degree centrality
(b) eigenvector centrality
(c) closeness centrality
(d) betweenness centrality
Figure 5.8: Degree, eigenvector, closeness, and betweenness centrality for an undirected unweighted graph. The dark node is the most central node in the respective metric; a different node is the most central one in each metric.
5.7 Exercises
E5.1
Left eigenvector for “equal-neighbor” row-stochastic matrices. Let $A_{01}$ be the binary (i.e., each entry is either 0 or 1) adjacency matrix for an unweighted undirected graph. Assume the associated graph is connected. Let $D = \operatorname{diag}(d_1, \dots, d_n)$ be the degree matrix, let $|E|$ be the number of edges of the graph, and define $A = D^{-1} A_{01}$. Show that
(i) the definition of $A$ is well-posed and $A$ is row-stochastic, and
(ii) the left eigenvector of $A$ associated to the eigenvalue 1 and normalized so that $\mathbb{1}_n^\top w = 1$ is
$$w = \frac{1}{2|E|}\begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix}.$$
Next, consider the equal-neighbor averaging algorithm in equation (5.7) with associated row-stochastic matrix $A_{\text{equal-neighbor}} = (D + I_n)^{-1}(A_{01} + I_n)$.
(iii) Show that
$$\lim_{k\to\infty} x(k) = \Big(\frac{1}{2|E| + n}\sum_{i=1}^{n} (1 + d_i)\, x_i(0)\Big)\mathbb{1}_n.$$
(iv) Verify that the left dominant eigenvector of the matrix $A_{\text{wsn}} = A_{\text{equal-neighbor}}$ defined in Section 1.2 is $[1/6, 1/3, 1/4, 1/4]^\top$, as seen in Example 2.5.
E5.2
A stubborn agent. Pick $\alpha \in \,]0, 1[$, and consider the discrete-time consensus algorithm
$$x_1(k+1) = x_1(k), \qquad x_2(k+1) = \alpha x_1(k) + (1 - \alpha) x_2(k).$$
Perform the following tasks:
(i) compute the matrix $A$ representing this algorithm and verify it is row-stochastic,
(ii) compute the eigenvalues and eigenvectors of $A$,
(iii) draw the directed graph $G$ representing this algorithm and discuss its connectivity properties,
(iv) compute the condensation digraph of $G$,
(v) compute the final value of this algorithm as a function of the initial values in two alternate ways: invoking and without invoking Theorem 5.2.
E5.3
Agents with self-confidence levels. Consider 2 agents, labeled $+1$ and $-1$, described by the self-confidence levels $s_{+1}$ and $s_{-1}$. Assume $s_{+1} \geq 0$, $s_{-1} \geq 0$, and $s_{+1} + s_{-1} = 1$. For $i \in \{+1, -1\}$, define
$$x_i^+ := s_i x_i + (1 - s_i) x_{-i}.$$
Perform the following tasks:
(i) compute the matrix $A$ representing this algorithm and verify it is row-stochastic,
(ii) compute $A^2$,
(iii) compute the eigenvalues, the right eigenvectors, and the left eigenvectors of $A$,
(iv) compute the final value of this algorithm as a function of the initial values and of the self-confidence levels. Is it true that an agent with higher self-confidence makes a larger contribution to the final value?
E5.4
Persistent disagreement and the Friedkin-Johnsen model of opinion dynamics (Friedkin and Johnsen 1999). Let $W$ be a row-stochastic matrix describing a network of interpersonal influences; assume $W$ is irreducible. Let $\lambda_i \in [0, 1]$, $i \in \{1, \dots, n\}$, be a parameter describing how open an individual is to changing her initial opinion about a subject; set $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$. Consider the Friedkin-Johnsen model of opinion dynamics
$$x(k+1) = \Lambda W x(k) + (I_n - \Lambda) x(0).$$
Assume at least one individual is not completely open to changing her opinion, that is, assume $\lambda_i < 1$ for some $i$. Perform the following tasks:
(i) show that the matrix $\Lambda W$ is convergent,
(ii) show that the matrix $V = (I_n - \Lambda W)^{-1}(I_n - \Lambda)$ is well-defined and row-stochastic (Hint: review Exercises E2.10 and E2.12),
(iii) show that the limiting opinions are $\lim_{k\to+\infty} x(k) = V x(0)$,
(iv) compute the matrix $V$ and state whether two agents will achieve consensus or maintain persistent disagreement for the following pairs of matrices:
$$W_1 = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix} \text{ and } \Lambda_1 = \operatorname{diag}(1/2, 1), \qquad W_2 = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix} \text{ and } \Lambda_2 = \operatorname{diag}(1/4, 3/4).$$
(Note: Friedkin and Johnsen (1999) make the additional assumption that $\Lambda + \operatorname{diag}(W) = I_n$; this assumption is not needed here. This model is sometimes referred to as the opinion dynamics model with stubborn agents.)
E5.5
Necessary and sufficient conditions for consensus. Let $A$ be a row-stochastic matrix. Prove that the following statements are equivalent:
(i) the eigenvalue 1 is simple and all other eigenvalues have magnitude strictly smaller than 1,
(ii) $\lim_{k\to\infty} A^k = \mathbb{1}_n w^\top$, for some $w \in \mathbb{R}^n$, $w \geq 0$, and $\mathbb{1}_n^\top w = 1$,
(iii) the digraph associated to $A$ contains a globally reachable node and the subgraph of globally reachable nodes is aperiodic.
Hint: Use the Jordan normal form to show that (i) $\implies$ (ii).
E5.6
Computing centrality. Write in your favorite programming language algorithms to compute degree, eigenvector, Katz and pagerank centralities. Compute these four centralities for the following undirected unweighted graphs (without self-loops):
(i) the ring graph with 5 nodes;
(ii) the star graph with 5 nodes;
(iii) the line graph with 5 nodes; and
(iv) the Zachary karate club network dataset. This dataset can be downloaded for example from: http://konect.uni-koblenz.de/networks/ucidata-zachary
To compute Katz centrality of a matrix $A$, select $\alpha = 1/(2\rho(A))$. For pagerank, use $\alpha = 1/2$. Hint: Recall that pagerank centrality is well-defined for a row-stochastic matrix.
E5.7
Iterative computation of Katz centrality. Given a graph with adjacency matrix $A$, show that the solution to the iteration $x(k+1) := \alpha A^\top (x(k) + \mathbb{1}_n)$ with $\alpha < 1/\rho(A)$ converges to the Katz centrality vector $c_K$, for all initial conditions $x(0)$.
E5.8
A sample DeGroot panel. A conversation between 5 panelists is modeled according to the DeGroot model by an averaging algorithm $x^+ = A_{\text{panel}}\, x$, where
$$A_{\text{panel}} = \begin{bmatrix} 0.15 & 0.15 & 0.1 & 0.2 & 0.4 \\ 0 & 0.55 & 0 & 0 & 0.45 \\ 0.3 & 0.05 & 0.05 & 0 & 0.6 \\ 0 & 0.4 & 0.1 & 0.5 & 0 \\ 0 & 0.3 & 0 & 0 & 0.7 \end{bmatrix}.$$
Assuming that the panel has had sufficiently long deliberations, answer the following:
(i) Based on the associated digraph, do the panelists finally agree on a common decision?
(ii) In the event of agreement, does the initial opinion of any panelist get rejected? If so, which ones?
(iii) If the panelists’ initial opinions are their self-appraisals (i.e., the self-weights $a_{ii}$, $i \in \{1, \dots, 5\}$), what is the final opinion?
E5.9
Three DeGroot panels. Recall the DeGroot model introduced in Chapter 1. Denote by $x_i(0)$ the initial opinion of each individual, and $x_i(k)$ its updated opinion after $k$ communications with its neighbors. Then the vector of opinions evolves over time according to $x(k+1) = A x(k)$, where the coefficient $a_{ij} \in [0, 1]$ is the influence of the opinion of individual $j$ on the update of the opinion of agent $i$, subject to the constraint $\sum_j a_{ij} = 1$. Consider the following three scenarios:
(i) Everybody gives the same weight to the opinion of everybody else.
(ii) There is a distinct agent (suppose the agent with index $i = 1$) that weights equally the opinion of all the others, and the remaining agents compute the mean between their opinion and the one of the first agent.
(iii) All the agents compute the mean between their opinion and the one of the first agent. Agent 1 does not change her opinion.
In each case, derive the averaging matrix $A$, show that the opinions converge asymptotically to a final opinion vector, and characterize this final opinion vector.
E5.10
Move away from your nearest neighbor and reducible averaging. Consider $n \geq 3$ robots with positions $p_i \in \mathbb{R}$, $i \in \{1, \dots, n\}$, and dynamics $p_i(t+1) = u_i(t)$, where $u_i \in \mathbb{R}$ is a steering control input. For simplicity, assume that the robots are indexed according to their initial position: $p_1(0) \leq p_2(0) \leq p_3(0) \leq \dots \leq p_n(0)$. Consider two walls at the positions $p_0 \leq p_1(0)$ and $p_{n+1} \geq p_n(0)$ so that all robots are contained between the walls. The walls are stationary, that is, $p_0(t+1) = p_0(t) = p_0$ and $p_{n+1}(t+1) = p_{n+1}(t) = p_{n+1}$. Consider the following coordination law: robots $i \in \{2, \dots, n-1\}$ (each having two neighbors) move to the centroid of the local subset $\{p_{i-1}, p_i, p_{i+1}\}$. The robots $\{1, n\}$ (each having one robotic neighbor and one neighboring wall) move to the centroid of the local subsets $\{p_0, p_1, p_2\}$ and $\{p_{n-1}, p_n, p_{n+1}\}$, respectively. Hence, the closed-loop robot dynamics are
$$p_i(t+1) = \frac{1}{3}\big(p_{i-1}(t) + p_i(t) + p_{i+1}(t)\big), \qquad i \in \{1, \dots, n\}.$$
Show that the robots become uniformly spaced on the interval $[p_0, p_{n+1}]$ using Theorem 5.3. (Note: This exercise is a discrete-time version of E2.18(ii) based on averaging with multiple sinks.)
E5.11
Central nodes in example graph. For the unweighted undirected graph in Figure 5.8, verify (possibly with the aid of a computational package) that the dark nodes have indeed the largest degree, eigenvector, closeness and betweenness centrality as stated in the figure caption.
Chapter 6
The Laplacian Matrix
So far, we have studied adjacency matrices. In this chapter, we study a second relevant matrix associated to a digraph, called the Laplacian matrix. More information on adjacency and Laplacian matrices can be found in standard books on algebraic graph theory such as (Biggs 1994) and (Godsil and Royle 2001). Two surveys about Laplacian matrices are (Merris 1994; Mohar 1991).
6.1 The Laplacian matrix
The Laplacian matrix of the weighted digraph $G$ is $L = D_{\text{out}} - A$. In components $L = (\ell_{ij})_{i,j \in \{1, \dots, n\}}$,
$$\ell_{ij} = \begin{cases} -a_{ij}, & \text{if } i \neq j,\\ \sum_{h=1, h\neq i}^{n} a_{ih}, & \text{if } i = j, \end{cases}$$
or, for an unweighted undirected graph,
$$\ell_{ij} = \begin{cases} -1, & \text{if } \{i,j\} \text{ is an edge, not self-loop},\\ d(i), & \text{if } i = j,\\ 0, & \text{otherwise}. \end{cases}$$
Note:
(i) the sign pattern of $L$ is important: diagonal elements are nonnegative (positive whenever node $i$ has an out-neighbor other than itself) and off-diagonal elements are nonpositive (zero or negative);
(ii) the matrix $L$ does not depend upon the existence and values of self-loops (or lack thereof); and
(iii) the graph $G$ is undirected (i.e., symmetric adjacency matrix) if and only if $L$ is symmetric. In this case, $D_{\text{out}} = D_{\text{in}} = D$ and $A = A^\top$.
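The definition $L = D_{\text{out}} - A$ and remark (ii) can be checked directly. This minimal sketch (plain Python; the function name and the weights are hypothetical) builds $L$ from a weighted adjacency matrix, verifies that the rows sum to zero, and confirms that deleting a self-loop leaves $L$ unchanged.

```python
def laplacian(A):
    """Return L = D_out - A for a weighted adjacency matrix A (list of rows).

    The diagonal entry is the out-degree computed without a_ii: a self-loop
    contributes a_ii to D_out and -a_ii to -A, so it cancels."""
    n = len(A)
    return [[sum(A[i][h] for h in range(n) if h != i) if i == j else -A[i][j]
             for j in range(n)]
            for i in range(n)]


# Hypothetical weighted digraph on 3 nodes with a self-loop of weight 0.5 at node 0.
A = [
    [0.5, 2.0, 0.0],
    [0.0, 0.0, 1.0],
    [3.0, 0.0, 0.0],
]
L = laplacian(A)
print([sum(row) for row in L])      # -> [0.0, 0.0, 0.0]  (L 1_n = 0_n)

# Remark (ii): removing the self-loop does not change L.
A_no_loop = [row[:] for row in A]
A_no_loop[0][0] = 0.0
print(laplacian(A_no_loop) == L)    # -> True
```

Because the digraph is not symmetric, the resulting $L$ is not symmetric either, consistent with remark (iii).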
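We now present some useful equalities. By the way, obviously
$$(Ax)_i = \sum_{j=1}^{n} a_{ij} x_j. \tag{6.1}$$
First, for $x \in \mathbb{R}^n$,
$$\begin{aligned} (Lx)_i &= \sum_{j=1}^{n} \ell_{ij} x_j = \ell_{ii} x_i + \sum_{j=1, j\neq i}^{n} \ell_{ij} x_j = \sum_{j=1, j\neq i}^{n} a_{ij} x_i + \sum_{j=1, j\neq i}^{n} (-a_{ij}) x_j \\ &= \sum_{j=1, j\neq i}^{n} a_{ij}(x_i - x_j) = \sum_{j \in N^{\text{out}}(i)} a_{ij}(x_i - x_j) \\ &\overset{\text{for unit weights}}{=} d_{\text{out}}(i)\,\big(x_i - \operatorname{average}(\{x_j, \text{ for all out-neighbors } j\})\big). \end{aligned} \tag{6.2}$$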
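Second, assume $L = L^\top$ (i.e., $a_{ij} = a_{ji}$) and compute:
$$\begin{aligned} x^\top L x &= \sum_{i=1}^{n} x_i (Lx)_i = \sum_{i=1}^{n} x_i \sum_{j=1, j\neq i}^{n} a_{ij}(x_i - x_j) = \sum_{i,j=1}^{n} a_{ij}\, x_i (x_i - x_j) \\ &= \frac{1}{2}\sum_{i,j=1}^{n} a_{ij} x_i^2 + \frac{1}{2}\sum_{i,j=1}^{n} a_{ij} x_i^2 - \sum_{i,j=1}^{n} a_{ij} x_i x_j \\ &\overset{\text{by symmetry}}{=} \frac{1}{2}\sum_{i,j=1}^{n} a_{ij} x_i^2 + \frac{1}{2}\sum_{i,j=1}^{n} a_{ij} x_j^2 - \sum_{i,j=1}^{n} a_{ij} x_i x_j \\ &= \frac{1}{2}\sum_{i,j=1}^{n} a_{ij}(x_i - x_j)^2 \qquad \text{(6.3)} \\ &= \sum_{\{i,j\} \in E} a_{ij}(x_i - x_j)^2. \qquad \text{(6.4)} \end{aligned}$$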
These equalities are useful because it is common to encounter the “array of differences” $Lx$ and the quadratic “error” or “disagreement” function $x^\top L x$. They provide the correct intuition for the definition of the Laplacian matrix. In the following, we will refer to $x \mapsto x^\top L x$ as the Laplacian potential function; this name is justified based on the energy and power interpretation we present in the next two examples.
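Identities (6.2) and (6.4) are easy to confirm numerically. The sketch below assigns hypothetical weights to the edges of the 4-node graph of Figure 5.4 (0-based labels) and compares both sides of each identity for one arbitrary vector $x$. The numbers are chosen as exact binary fractions, so the two sides should agree exactly rather than just up to rounding.

```python
def laplacian(A):
    """L = D_out - A; self-loops cancel and are ignored."""
    n = len(A)
    return [[sum(A[i][h] for h in range(n) if h != i) if i == j else -A[i][j]
             for j in range(n)]
            for i in range(n)]


# Hypothetical symmetric weights on the edges of the graph of Figure 5.4.
edges = {(0, 1): 2.0, (1, 2): 1.0, (1, 3): 0.5, (2, 3): 3.0}
n = 4
A = [[0.0] * n for _ in range(n)]
for (i, j), a in edges.items():
    A[i][j] = A[j][i] = a

L = laplacian(A)
x = [1.0, -2.0, 0.5, 4.0]
Lx = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]

# (6.2): (Lx)_i = sum_j a_ij (x_i - x_j), the "array of differences".
diffs = [sum(A[i][j] * (x[i] - x[j]) for j in range(n)) for i in range(n)]
print(Lx)
print(diffs)        # same list: [6.0, -11.5, -8.0, 13.5]

# (6.4): x^T L x equals the sum over edges of a_ij (x_i - x_j)^2.
quad = sum(xi * yi for xi, yi in zip(x, Lx))
disagreement = sum(a * (x[i] - x[j]) ** 2 for (i, j), a in edges.items())
print(quad, disagreement)  # -> 79.0 79.0
```

Since every term in (6.4) is nonnegative, the same computation also illustrates that $x^\top L x \geq 0$ for symmetric $L$ with nonnegative weights.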
6.2 The Laplacian in mechanical networks of springs
Let $x_i \in \mathbb{R}$ denote the displacement of the $i$th rigid body. Assume that each spring is ideal linear-elastic and let $a_{ij}$ be the spring constant for the spring connecting the $i$th and $j$th bodies.
Define a graph as follows: the nodes are the rigid bodies $\{1, \dots, n\}$ with locations $x_1, \dots, x_n$, and the edges are the springs with weights $a_{ij}$. Each node $i$ is subject to a force
$$F_i = \sum_{j \neq i} a_{ij}(x_j - x_i) = -(Lx)_i,$$
where $L$ is the Laplacian for the network of springs (modeled as an undirected weighted graph). Moreover, recalling that the spring $\{i,j\}$ stores the quadratic energy $\frac{1}{2} a_{ij}(x_i - x_j)^2$, the total elastic energy is
$$E_{\text{elastic}} = \frac{1}{2}\sum_{\{i,j\} \in E} a_{ij}(x_i - x_j)^2 = \frac{1}{2} x^\top L x.$$
In this role, the Laplacian matrix is referred to as the stiffness matrix. Stiffness matrices can be defined for spring networks in arbitrary dimensions (not only on the line) and with arbitrary topology (not only a chain graph, or line graph, as in the figure). More complex spring networks can be found, for example, in finite-element discretizations of flexible bodies and finite-difference discretizations of diffusive media.
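For a concrete instance, consider a hypothetical chain of three bodies joined by two springs with constants $a_{12} = 2$ and $a_{23} = 1$ (values chosen only for illustration; the code indexes nodes from 0). The sketch below, in Python with NumPy, computes the forces $F = -Lx$ and the elastic energy $\frac{1}{2}x^\top L x$.

```python
import numpy as np

# Hypothetical chain: body 0 -- body 1 -- body 2, spring constants 2 and 1.
A = np.array([[0.0, 2.0, 0.0],
              [2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = np.diag(A.sum(axis=1)) - A    # stiffness matrix

x = np.array([0.0, 0.5, 2.0])     # displacements of the three bodies

F = -L @ x                         # force on each body: F_i = sum_j a_ij (x_j - x_i)
E_elastic = 0.5 * x @ L @ x        # total elastic energy

print(F, E_elastic)
```

For these displacements the forces are $(1, 0.5, -1.5)$ and the stored energy is $1.375$. The forces sum to zero, consistent with $1_n^\top L = 0_n^\top$ for a symmetric Laplacian: the springs only exert internal forces.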
6.3
The Laplacian in electrical networks of resistors
Suppose the graph is an electrical network with only pure resistors and ideal voltage sources: (i) each graph vertex $i \in \{1, \dots, n\}$ is possibly connected to an ideal voltage source, (ii) each edge is a resistor, say with resistance $r_{ij}$ between nodes $i$ and $j$. (This is an undirected weighted graph.) Ohm's law along each edge $\{i,j\}$ gives the current flowing from $i$ to $j$ as $c_{i\to j} = (v_i - v_j)/r_{ij} = a_{ij}(v_i - v_j)$, where $a_{ij}$ is the inverse resistance, called conductance. We set $a_{ij} = 0$ whenever two nodes are not connected by a resistor. Kirchhoff's current law says that at each node $i$:
$$c_{\text{injected at } i} = \sum_{j=1,\,j\neq i}^n c_{i\to j} = \sum_{j=1,\,j\neq i}^n a_{ij}(v_i - v_j).$$
Hence, the vector of injected currents $c_{\text{injected}}$ and the vector of voltages at the nodes $v$ satisfy $c_{\text{injected}} = L v$.
Moreover, the power dissipated on resistor $\{i,j\}$ is $c_{i\to j}(v_i - v_j)$, so that the total dissipated power is
$$P_{\text{dissipated}} = \sum_{\{i,j\}\in E} a_{ij}(v_i - v_j)^2 = v^\top L v.$$
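The same computation can be done in the electrical setting. Here is a minimal Python/NumPy sketch with hypothetical resistances and node voltages on a triangle network (nodes indexed from 0). Given these, $c_{\text{injected}} = Lv$ returns the currents the sources must inject, and $v^\top L v$ returns the total dissipated power.

```python
import numpy as np

# Hypothetical triangle network: resistances r_ij in ohms (np.inf means no resistor).
r = np.array([[np.inf, 1.0,    2.0],
              [1.0,    np.inf, 2.0],
              [2.0,    2.0,    np.inf]])
A = 1.0 / r                        # conductances a_ij = 1 / r_ij (1 / inf = 0)
L = np.diag(A.sum(axis=1)) - A

v = np.array([1.0, 0.0, 0.0])      # node voltages in volts

c_injected = L @ v                 # Kirchhoff + Ohm: c_injected = L v
P_dissipated = v @ L @ v           # total power dissipated in the resistors

print(c_injected, P_dissipated)
```

The injected currents are $(1.5, -1, -0.5)$ and sum to zero, because $1_n^\top L v = (L 1_n)^\top v = 0$ for a symmetric $L$. The dissipated power, $1.5$, matches the per-resistor sum $1^2/1 + 1^2/2 + 0^2/2$.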
Historical Note: Kirchhoff (1847) is a founder of graph theory in that he was an early adopter of graph models to analyze electrical circuits.
6.4
Properties of the Laplacian matrix
Lemma 6.1 (Zero row-sums). Let $G$ be a weighted digraph with Laplacian $L$ and $n$ nodes. Then $L 1_n = 0_n$. Equivalently, 0 is an eigenvalue of $L$ with eigenvector $1_n$.
Proof. For all rows $i$, the $i$th row-sum is zero:
$$\sum_{j=1}^n \ell_{ij} = \ell_{ii} + \sum_{j=1,\,j\neq i}^n \ell_{ij} = \sum_{j=1,\,j\neq i}^n a_{ij} + \sum_{j=1,\,j\neq i}^n (-a_{ij}) = 0.$$
Equivalently, in vector format (remembering that the weighted out-degree matrix $D_{\text{out}}$ is diagonal and contains the row-sums of $A$):
$$L 1_n = D_{\text{out}} 1_n - A 1_n = \begin{bmatrix} d_{\text{out}}(1) \\ \vdots \\ d_{\text{out}}(n) \end{bmatrix} - \begin{bmatrix} d_{\text{out}}(1) \\ \vdots \\ d_{\text{out}}(n) \end{bmatrix} = 0_n. \qquad \blacksquare$$
Note: Each graph has a Laplacian matrix. Conversely, a square matrix is called a Laplacian if (i) its row-sums are zero, (ii) its diagonal entries are nonnegative, and (iii) its off-diagonal entries are nonpositive. Such a matrix uniquely induces a weighted digraph, up to self-loops (which do not affect the Laplacian).
Lemma 6.2 (Zero column-sums). Let $G$ be a weighted digraph with Laplacian $L$ and $n$ nodes. The following statements are equivalent: (i) $G$ is weight-balanced; and (ii) $1_n^\top L = 0_n^\top$.
Proof. Pick $j \in \{1, \dots, n\}$ and compute
$$(1_n^\top L)_j = (L^\top 1_n)_j = \sum_{i=1}^n \ell_{ij} = \ell_{jj} + \sum_{i=1,\,i\neq j}^n \ell_{ij} = d_{\text{out}}(j) - d_{\text{in}}(j),$$
where the last equality follows from
$$\ell_{jj} = d_{\text{out}}(j) - a_{jj} \quad \text{and} \quad \sum_{i=1,\,i\neq j}^n \ell_{ij} = -\big(d_{\text{in}}(j) - a_{jj}\big).$$
In summary, we know that $1_n^\top L = 0_n^\top$ if and only if $D_{\text{out}} = D_{\text{in}}$. $\blacksquare$
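Lemmas 6.1 and 6.2 can be checked on two small digraphs chosen purely for illustration. This is a minimal Python/NumPy sketch with nodes indexed from 0, and `laplacian` is our own helper. The row-sums of $L$ always vanish, while the column-sums equal $d_{\text{out}} - d_{\text{in}}$ and therefore vanish exactly for the weight-balanced digraph.

```python
import numpy as np

def laplacian(A):
    """Laplacian L = D_out - A of a weighted digraph with adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A

# Directed 3-cycle 0 -> 1 -> 2 -> 0 with unit weights: weight-balanced.
A_cycle = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [1.0, 0.0, 0.0]])

# Same cycle plus the edge 0 -> 2: node 0 has out-degree 2 but in-degree 1.
A_unbal = A_cycle.copy()
A_unbal[0, 2] = 1.0

results = {}
for name, A in (("cycle", A_cycle), ("unbalanced", A_unbal)):
    L = laplacian(A)
    row_sums = L @ np.ones(3)                               # Lemma 6.1: always zero
    col_sums = np.ones(3) @ L                               # equals d_out - d_in
    balanced = np.allclose(A.sum(axis=1), A.sum(axis=0))    # is D_out == D_in?
    results[name] = (row_sums, col_sums, balanced)
    print(name, row_sums, col_sums, balanced)
```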
Lemma 6.3 (Spectrum of the Laplacian matrix). Given a weighted digraph $G$ with Laplacian $L$, the eigenvalues of $L$ different from 0 have strictly positive real part.
Proof. Recall $\ell_{ii} = \sum_{j=1,\,j\neq i}^n a_{ij} \ge 0$ and $\ell_{ij} = -a_{ij} \le 0$ for $i \neq j$. By the Geršgorin Disks Theorem 2.9, we know that each eigenvalue of $L$ belongs to at least one of the disks
$$\Big\{ z \in \mathbb{C} \;\Big|\; |z - \ell_{ii}| \le \sum_{j=1,\,j\neq i}^n |\ell_{ij}| \Big\} = \big\{ z \in \mathbb{C} \;\big|\; |z - \ell_{ii}| \le \ell_{ii} \big\}.$$
Each of these disks is centered at the nonnegative real number $\ell_{ii}$ and has radius $\ell_{ii}$. Hence it contains the origin on its boundary, and all its other points have strictly positive real part. $\blacksquare$
For an undirected graph with symmetric adjacency matrix $A = A^\top$, the matrix $L$ is therefore symmetric and positive semidefinite, that is, all eigenvalues of $L$ are real and nonnegative. By convention we write these eigenvalues as $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$. The second smallest eigenvalue $\lambda_2$ is called the Fiedler eigenvalue or the algebraic connectivity (Fiedler 1973). Note that the proof of the lemma also implies $\lambda_n \le 2\max\{d_{\text{out}}(1), \dots, d_{\text{out}}(n)\}$.
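The conclusions of Lemma 6.3 can be observed numerically. The following Python/NumPy sketch uses a random weighted digraph, made up for illustration, and then its symmetrized version. In the directed case every eigenvalue has nonnegative real part and modulus at most $2\max_i d_{\text{out}}(i)$. In the undirected case the eigenvalues are real, nonnegative, and bounded by the same quantity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

# Random weighted digraph (A need not be symmetric), no self-loops.
A = rng.random((n, n)) * (rng.random((n, n)) < 0.4)
np.fill_diagonal(A, 0.0)
L = np.diag(A.sum(axis=1)) - A
d_max = A.sum(axis=1).max()

# Each eigenvalue lies in a disk centered at l_ii >= 0 with radius l_ii,
# so its real part is >= 0 and its modulus is at most 2 * d_max.
eig = np.linalg.eigvals(L)
print(eig.real.min(), np.abs(eig).max(), 2 * d_max)

# Undirected case: symmetric L, real eigenvalues 0 = lambda_1 <= ... <= lambda_n.
A_sym = np.triu(A, 1) + np.triu(A, 1).T
L_sym = np.diag(A_sym.sum(axis=1)) - A_sym
lam = np.linalg.eigvalsh(L_sym)          # returned in increasing order
print(lam[0], lam[1], lam[-1], 2 * A_sym.sum(axis=1).max())
```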
6.5
Graph connectivity and the rank of the Laplacian
Theorem 6.4 (Rank of the Laplacian). Let $L$ be the Laplacian matrix of a weighted digraph $G$ with $n$ nodes. Let $d$ be the number of sinks in the condensation digraph of $G$. Then $\operatorname{rank}(L) = n - d$.
Early references for this theorem include (Agaev and Chebotarev 2000; Foster and Jacquez 1975), even though the proof here is independent. This theorem has the following immediate consequences: (i) a digraph $G$ contains a globally reachable vertex if and only if $\operatorname{rank}(L) = n - 1$ (also recall the properties of $C(G)$ from Lemma 3.2); (ii) for the case of undirected graphs, we have the following two results: the rank of $L$ is equal to $n$ minus the number of connected components of $G$, and an undirected graph $G$ is connected if and only if $\lambda_2 > 0$.
Proof. We start by simplifying the problem. Define a new weighted digraph $\bar G$ by modifying $G$ as follows: at each node, add a self-loop with unit weight if no self-loop is present, or increase the weight of the self-loop by 1 if a self-loop is present. Also, define another weighted digraph $\bar{\bar G}$ by modifying $\bar G$ as follows: for each node, divide the weights of its outgoing edges by its out-degree, so that the out-degree of each node is 1. In other words, define $\bar A = A + I_n$ and $\bar L = L$, and define $\bar{\bar A} = \bar D_{\text{out}}^{-1}\bar A$ and $\bar{\bar L} = \bar D_{\text{out}}^{-1}\bar L = I_n - \bar{\bar A}$. Clearly, because $\bar D_{\text{out}}$ is invertible, the rank of $\bar{\bar L}$ is equal to the rank of $L$. Therefore, without loss of generality, we consider in what follows only digraphs with row-stochastic adjacency matrices.
Because the condensation digraph $C(G)$ has $d$ sinks, after a renumbering of the nodes, that is, a permutation of rows and columns (see Exercise E3.1), the adjacency matrix $A$ can be written in block lower triangular form as
$$A = \begin{bmatrix}
A_{11} & 0 & \cdots & 0 & 0 \\
0 & A_{22} & \ddots & \vdots & \vdots \\
\vdots & \ddots & \ddots & 0 & \vdots \\
0 & \cdots & 0 & A_{dd} & 0 \\
A_{1o} & A_{2o} & \cdots & A_{do} & A_{\text{others}}
\end{bmatrix} \in \mathbb{R}^{n\times n},$$
where any vector $x \in \mathbb{R}^n$ is correspondingly partitioned into the vectors $x_1, \dots, x_d$ and $x_{\text{others}}$ of dimensions $n_1, \dots, n_d$ and $n - (n_1 + \dots + n_d)$, respectively, corresponding to the $d$ sinks and all other nodes. Each sink of $C(G)$ is a strongly connected and aperiodic digraph (aperiodicity follows from the self-loops). Therefore, the square matrices $A_{11}, \dots, A_{dd}$ are nonnegative, irreducible, and primitive. By the Perron–Frobenius Theorem for primitive matrices 2.16, we know that the number 1 is a simple eigenvalue for each of them. The square matrix $A_{\text{others}}$ is nonnegative and it can itself be written as a block lower triangular matrix, whose diagonal block matrices, say $(A_{\text{others}})_1, \dots, (A_{\text{others}})_N$, are nonnegative and irreducible. Moreover, each of these diagonal block matrices must be row-substochastic because (1) each row-sum of each of these matrices is at most 1, and (2) at least one of the row-sums of each of these matrices must be smaller than 1, otherwise that matrix would correspond to a sink of $C(G)$. In summary, because the matrices $(A_{\text{others}})_1, \dots, (A_{\text{others}})_N$ are irreducible and row-substochastic, the matrix $A_{\text{others}}$ has spectral radius $\rho(A_{\text{others}}) < 1$.
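We now write the Laplacian matrix $L = I_n - A$ with the same block lower triangular structure:
$$L = \begin{bmatrix}
L_{11} & 0 & \cdots & 0 & 0 \\
0 & L_{22} & \ddots & \vdots & \vdots \\
\vdots & \ddots & \ddots & 0 & \vdots \\
0 & \cdots & 0 & L_{dd} & 0 \\
-A_{1o} & -A_{2o} & \cdots & -A_{do} & L_{\text{others}}
\end{bmatrix}, \tag{6.5}$$
where, for example, $L_{11} = I_{n_1} - A_{11}$. Because the number 1 is a simple eigenvalue of $A_{11}$, the number 0 is a simple eigenvalue of $L_{11}$. Therefore, $\operatorname{rank}(L_{11}) = n_1 - 1$. The same argument shows that $L_{22}, \dots, L_{dd}$ have rank $n_2 - 1, \dots, n_d - 1$, respectively. Next, we note that the rank of $L_{\text{others}}$ is maximal, because $L_{\text{others}} = I - A_{\text{others}}$ and $\rho(A_{\text{others}}) < 1$ together imply that 0 is not an eigenvalue of $L_{\text{others}}$. Finally, because $L$ is block lower triangular with invertible last diagonal block $L_{\text{others}}$, its rank is $\operatorname{rank}(L_{11}) + \dots + \operatorname{rank}(L_{dd}) + \operatorname{rank}(L_{\text{others}}) = (n_1 - 1) + \dots + (n_d - 1) + \big(n - (n_1 + \dots + n_d)\big) = n - d$. $\blacksquare$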
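Theorem 6.4 can also be checked numerically. The sketch below assumes Python with NumPy and indexes nodes from 0. The helpers `reachability` and `number_of_sinks` are hypothetical, written only for this illustration: they count the sinks of the condensation digraph with a boolean reachability matrix, and the script then compares $\operatorname{rank}(L)$ with $n - d$.

```python
import numpy as np

def reachability(A):
    """Boolean R with R[i, j] = True iff node j is reachable from node i (or j == i)."""
    n = A.shape[0]
    R = (A > 0) | np.eye(n, dtype=bool)
    while True:
        R_next = (R.astype(int) @ R.astype(int)) > 0   # paths of up to twice the length
        if np.array_equal(R_next, R):
            return R
        R = R_next

def number_of_sinks(A):
    """Number of sinks of the condensation digraph (sink strongly connected components)."""
    R = reachability(A)
    mutual = R & R.T                                   # i, j in the same SCC
    n = A.shape[0]
    # i lies in a sink SCC iff every node reachable from i can also reach i
    sinks = {tuple(np.flatnonzero(mutual[i])) for i in range(n)
             if np.array_equal(R[i], mutual[i])}
    return len(sinks)

# Hypothetical digraph: {0, 1} is one sink, {2} is another sink,
# and node 3 has edges into both sinks.
A = np.zeros((4, 4))
A[0, 1] = A[1, 0] = 1.0
A[3, 0] = A[3, 2] = 1.0
L = np.diag(A.sum(axis=1)) - A

n, d = A.shape[0], number_of_sinks(A)
print(np.linalg.matrix_rank(L), n - d)
```

Here $d = 2$, so both printed numbers equal 2. For a directed cycle (a single strongly connected component) the same check gives rank $n - 1$.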
6.6
The algebraic connectivity, its eigenvector, and graph partitioning
As shown before, the algebraic connectivity $\lambda_2$ of an undirected and weighted graph $G$ is positive if and only if $G$ is connected. We build on this insight and show that the algebraic connectivity not only provides a binary connectivity measure, but also quantifies the "bottleneck" of the graph. To develop this intuition, we study the problem of community detection in a large-scale undirected graph. This problem arises, for example, when identifying groups of friends in a social network by means of the interaction graph. We consider the specific problem of partitioning the vertices $V$ of an undirected connected graph $G$ into two sets $V_1$ and $V_2$ so that $V_1 \cup V_2 = V$, $V_1 \cap V_2 = \emptyset$, and $V_1, V_2 \neq \emptyset$. Of course, there are many such partitions. We measure the quality of a partition by the sum of the weights of all edges that need to be cut to separate the vertices $V_1$ and $V_2$ into two disconnected components. Formally, the size of the cut separating $V_1$ and $V_2$ is
$$J = \sum_{i\in V_1,\, j\in V_2} a_{ij}.$$
We are interested in finding the cut with minimal size, which identifies the two groups of nodes that are most loosely connected. The problem of minimizing the cut size $J$ is combinatorial: a brute-force search must consider all $2^{n-1} - 1$ partitions of the vertex set $V$ into two nonempty sets. We present here a tractable approach based on a "relaxation" step. First, define a vector $x \in \{-1,+1\}^n$ with entries $x_i = 1$ for $i \in V_1$ and $x_i = -1$ for $i \in V_2$. Since $(x_i - x_j)^2$ equals 4 if $i$ and $j$ belong to different groups and 0 otherwise, and since each cut edge appears twice in a sum over ordered pairs $(i,j)$, the cut size $J$ can be rewritten via the Laplacian potential as
$$J = \frac{1}{8}\sum_{i,j=1}^n a_{ij}(x_i - x_j)^2 = \frac{1}{4}\, x^\top L x$$
and the minimum cut size problem is:
$$\underset{x \in \{-1,1\}^n \setminus \{-1_n,\, 1_n\}}{\text{minimize}} \quad x^\top L x.$$
(Here we exclude the cases $x = \pm 1_n$ because they correspond to one of the two groups being empty.) Second, since this problem is still combinatorial, we relax the problem from binary decision variables $x_i \in \{-1,+1\}$ to continuous decision variables $x_i \in [-1,1]$ (or $\|x\|_\infty \le 1$), where we exclude $x \in \operatorname{span}(1_n)$ (corresponding to one of the two groups being empty). Then the minimization problem becomes
$$\underset{y \in \mathbb{R}^n,\ y \perp 1_n,\ \|y\|_\infty = 1}{\text{minimize}} \quad y^\top L y.$$
As a third and final step, we consider a 2-norm constraint $\|y\|_2 = 1$ instead of an $\infty$-norm constraint $\|y\|_\infty = 1$ (recall that $\|y\|_\infty \le \|y\|_2 \le \sqrt{n}\,\|y\|_\infty$) to obtain the following heuristic:
$$\underset{y \in \mathbb{R}^n,\ y \perp 1_n,\ \|y\|_2 = 1}{\text{minimize}} \quad y^\top L y.$$
Notice that $y^\top L y \ge \lambda_2 \|y\|_2^2$ for every $y \perp 1_n$, and this inequality holds with equality when $y = v_2$, the normalized eigenvector associated to $\lambda_2$. Thus, the minimum value of the relaxed optimization problem is $\lambda_2$ and it is attained at $y = v_2$ (the minimizer is unique up to sign when $\lambda_2$ is a simple eigenvalue). We can then use $x = \operatorname{sign}(v_2)$ as a heuristic to find the desired partition $\{V_1, V_2\}$. Hence, the algebraic connectivity $\lambda_2$ is an estimate for the size of the minimum cut, and the signs of the entries of $v_2$ identify the associated partition in the graph. For these reasons $\lambda_2$ and $v_2$ can be interpreted as the size and the location of a "bottleneck" in a graph.
To illustrate the above concepts, we construct a randomly generated graph as follows. First, we partition $n = 1000$ nodes into two groups $V_1$ and $V_2$ of sizes 450 and 550 nodes, respectively. Second, we connect any pair of nodes in the set $V_1$ (respectively $V_2$) with probability 0.3 (respectively 0.2). Third and finally, any two nodes in distinct groups, $i \in V_1$ and $j \in V_2$, are connected with a probability of 0.1. The sparsity pattern of the associated adjacency matrix is shown in the left panel of Figure 6.1. No obvious partition is visible at first glance since the indices are not sorted, that is, $V_1$ is not necessarily $\{1, \dots, 450\}$. The second panel displays the entries of the eigenvector $v_2$ sorted in increasing order, showing a sharp transition between positive and negative entries. Finally, the third panel displays the correspondingly sorted adjacency matrix $\tilde A$, clearly indicating the partition $V = V_1 \cup V_2$. The Matlab code to generate Figure 6.1 can be found below.
```matlab
% choose a graph size
n = 1000;

% randomly assign the nodes to two groups
x = randperm(n);
group_size = 450;
group1 = x(1:group_size);
group2 = x(group_size+1:end);

% assign probabilities of connecting nodes
p_group1 = 0.3;
p_group2 = 0.2;
p_between_groups = 0.1;

% construct adjacency matrix (preallocated, double precision)
A = zeros(n);
A(group1, group1) = rand(group_size, group_size) < p_group1;
A(group2, group2) = rand(n-group_size, n-group_size) < p_group2;
A(group1, group2) = rand(group_size, n-group_size) < p_between_groups;
% mirror the between-group block so that triu keeps every cross-group edge
A(group2, group1) = A(group1, group2)';
% keep the strict upper triangle and symmetrize (no self-loops)
A = triu(A, 1);
A = A + A';

% can you see the groups?
subplot(1,3,1);
spy(A);
xlabel('$A$', 'Interpreter','latex','FontSize',28);

% construct the Laplacian and its two smallest eigenpairs
L = diag(sum(A)) - A;
[V, D] = eigs(L, 2, 'SA');
[~, idx] = sort(diag(D));   % order the two eigenvalues increasingly
v2 = V(:, idx(2));          % eigenvector of the algebraic connectivity

% plot the components of v2 sorted in increasing order
subplot(1,3,2);
plot(sort(v2), '.-');
xlabel('$\tilde v_2$', 'Interpreter','latex','FontSize',28);

% partition the matrix accordingly and spot the communities
[~, p] = sort(v2);
subplot(1,3,3);
spy(A(p,p));
xlabel('$\tilde A$', 'Interpreter','latex','FontSize',28);
```

Figure 6.1: The first panel shows a randomly generated sparse adjacency matrix $A$ for a graph with 1000 nodes. The second panel displays the eigenvector $\tilde v_2$, which is the normalized eigenvector $v_2$ with its entries sorted in increasing order, and the third panel displays the correspondingly sorted adjacency matrix $\tilde A$.
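A small deterministic analog of this experiment makes the mechanics easy to follow. This is a Python/NumPy sketch, not a translation of the Matlab script. It uses a hypothetical graph of two complete graphs on four nodes joined by a single bridge edge (nodes indexed from 0). The script computes $\lambda_2$ and $v_2$, applies the heuristic $x = \operatorname{sign}(v_2)$, and compares the resulting cut size with $\frac{1}{4}x^\top L x$. That identity holds because, for $x \in \{-1,+1\}^n$, each cut edge contributes $(x_i - x_j)^2 = 4$.

```python
import numpy as np

# Hypothetical graph: two complete graphs on {0,1,2,3} and {4,5,6,7},
# joined by the single bridge edge {3, 4}.
n = 8
A = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0
L = np.diag(A.sum(axis=1)) - A

# eigh returns eigenvalues in increasing order; column 1 is the eigenvector v2.
lam, V = np.linalg.eigh(L)
v2 = V[:, 1]
x = np.where(v2 >= 0, 1.0, -1.0)             # heuristic partition x = sign(v2)

V1 = [int(i) for i in np.flatnonzero(x > 0)]
V2 = [int(i) for i in np.flatnonzero(x < 0)]
J = A[np.ix_(V1, V2)].sum()                  # cut size: total weight of cut edges

print(lam[1], V1, V2, J, x @ L @ x / 4)
```

For this graph $\lambda_2 = 3 - \sqrt{7} \approx 0.354$, the sign pattern of $v_2$ separates the two complete subgraphs, and the only cut edge is the bridge, so $J = 1$. Which group is labeled $V_1$ depends on the arbitrary sign of the computed eigenvector.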
6.7
Exercises
E6.1
The adjacency and Laplacian matrices for the complete graph. For any number n ∈ N, the complete graph with n nodes, denoted by K(n), is the undirected and unweighted graph in which any two distinct nodes are connected. For example, see K(6) in the figure. Compute, for arbitrary n, (i) the adjacency matrix of K(n) and its eigenvalues; and (ii) the Laplacian matrix of K(n) and its eigenvalues.
E6.2
The adjacency and Laplacian matrices for the complete bipartite graph. A bipartite graph is a graph whose vertices can be divided into two disjoint sets U and V with the property that every edge connects a vertex in U to one in V. A complete bipartite graph is a bipartite graph in which every vertex of U is connected with every vertex of V. If U has n vertices and V has m vertices, for arbitrary n, m ∈ N, the resulting complete bipartite graph is denoted by K(n, m). For example, see K(1, 6) and K(3, 3) in the figure. Compute, for arbitrary n and m, (i) the adjacency matrix of K(n, m) and its eigenvalues; and (ii) the Laplacian matrix of K(n, m) and its eigenvalues.
E6.3
The Laplacian matrix of an undirected graph is positive semidefinite. Give an alternative proof, without relying on the Geršgorin Disks Theorem 2.9, that the Laplacian matrix L of an undirected weighted graph is symmetric positive semidefinite. (Note that the proof of Lemma 6.3 relies on Geršgorin Disks Theorem 2.9).
E6.4
The Laplacian matrix of a weight-balanced digraph. Prove that the following statements are equivalent: (i) the digraph $G$ is weight-balanced, (ii) $L + L^\top$ is positive semidefinite. Hint: Recall the proof of Lemma 6.3.
E6.5
A property of weight-balanced Laplacian matrices. Let $L$ be the Laplacian matrix of a strongly connected weight-balanced digraph $G$. Show that, for $x \in \mathbb{R}^n$,
$$\lambda_2(L + L^\top)\, \Big\| x - \frac{1}{n}(1_n^\top x)\, 1_n \Big\|_2^2 \le x^\top (L + L^\top)\, x,$$
where $\lambda_2(L + L^\top)$ is the smallest non-zero eigenvalue of $L + L^\top$.
E6.6
The disagreement function in a directed graph (Gao et al. 2008). Recall that the quadratic form associated with a symmetric matrix $B \in \mathbb{R}^{n\times n}$ is the function $x \mapsto x^\top B x$. Let $G$ be a weighted digraph with $n$ nodes and define the quadratic disagreement function $\Phi_G : \mathbb{R}^n \to \mathbb{R}$ by
$$\Phi_G(x) = \frac{1}{2}\sum_{i,j=1}^n a_{ij}(x_j - x_i)^2.$$
Show that:
(i) $\Phi_G$ is the quadratic form associated with the symmetric positive-semidefinite matrix
$$P = \frac{1}{2}\big(D_{\text{out}} + D_{\text{in}} - A - A^\top\big),$$
(ii) $P = \frac{1}{2}\big(L + L^{(\text{rev})}\big)$, where the Laplacian of the reverse digraph is $L^{(\text{rev})} = D_{\text{in}} - A^\top$.
E6.7
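The pseudoinverse Laplacian matrix. The Moore-Penrose pseudoinverse of an $n \times m$ matrix $M$ is the unique $m \times n$ matrix $M^\dagger$ with the following properties: (i) $M M^\dagger M = M$, (ii) $M^\dagger M M^\dagger = M^\dagger$, and (iii) $M M^\dagger$ is symmetric and $M^\dagger M$ is symmetric.
Assume $L$ is the Laplacian matrix of a weighted connected undirected graph with $n$ nodes. Let $U \in \mathbb{R}^{n\times n}$ be an orthonormal matrix of eigenvectors of $L$ such that
$$L = U \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} U^\top.$$
Show that
(i) $L^\dagger = U \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 1/\lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\lambda_n \end{bmatrix} U^\top$,
(ii) $L L^\dagger = L^\dagger L = I_n - \frac{1}{n} 1_n 1_n^\top$, and
(iii) $L^\dagger 1_n = 0_n$.
E6.8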
The Green matrix of a Laplacian matrix. Assume $L$ is the Laplacian matrix of a weighted connected undirected graph with $n$ nodes. Show that
(i) the matrix $L + \frac{1}{n} 1_n 1_n^\top$ is positive definite,
(ii) the so-called Green matrix
$$X = \Big(L + \frac{1}{n} 1_n 1_n^\top\Big)^{-1} - \frac{1}{n} 1_n 1_n^\top \tag{E6.1}$$
is the unique solution to the system of equations
$$\begin{cases} L X = I_n - \frac{1}{n} 1_n 1_n^\top, \\ 1_n^\top X = 0_n^\top, \end{cases}$$
(iii) $X = L^\dagger$, where $L^\dagger$ is defined in Exercise E6.7. In other words, the Green matrix formula (E6.1) is an alternative definition of the pseudoinverse Laplacian matrix.
E6.9
Monotonicity of Laplacian eigenvalues. Consider a symmetric Laplacian matrix $L \in \mathbb{R}^{n\times n}$ associated to a weighted and undirected graph $G = (V, E, A)$. Assume $G$ is connected and let $\lambda_2(G) > 0$ be its algebraic connectivity, i.e., the second-smallest eigenvalue of $L$. Show that
(i) $\lambda_2(G)$ is a monotonically non-decreasing function of each weight $a_{ij}$, $\{i,j\} \in E$; and
(ii) $\lambda_2(G)$ is a monotonically non-decreasing function of the edge set in the following sense: $\lambda_2(G) \le \lambda_2(G')$ for any graph $G' = (V, E', A')$ with $E \subset E'$ and $a_{ij} = a'_{ij}$ for all $\{i,j\} \in E$.
Hint: Use the disagreement function.
E6.10 Invertibility of principal minors of the Laplacian matrix. Consider a connected and undirected graph and an arbitrary partition of the node set $V = V_1 \cup V_2$. The associated symmetric and irreducible Laplacian matrix $L \in \mathbb{R}^{n\times n}$ is partitioned accordingly as
$$L = \begin{bmatrix} L_{11} & L_{12} \\ L_{12}^\top & L_{22} \end{bmatrix}.$$
Show that the submatrices $L_{11} \in \mathbb{R}^{|V_1|\times|V_1|}$ and $L_{22} \in \mathbb{R}^{|V_2|\times|V_2|}$ are nonsingular.
Hint: Look up the concept of irreducible diagonally-dominant matrices.
E6.11 Gaussian elimination and Laplacian matrices. Consider an undirected and connected graph and its associated Laplacian matrix $L \in \mathbb{R}^{n\times n}$. Consider the associated linear Laplacian equation $y = Lx$, where $x \in \mathbb{R}^n$ is unknown and $y \in \mathbb{R}^n$ is a given vector. Verify that an elimination of $x_n$ from the last row of this equation yields the following reduced set of equations:
$$\begin{bmatrix} y_1 \\ \vdots \\ y_{n-1} \end{bmatrix} + \underbrace{\begin{bmatrix} -L_{1n}/L_{nn} \\ \vdots \\ -L_{n-1,n}/L_{nn} \end{bmatrix}}_{=A} y_n = \underbrace{\begin{bmatrix} & \vdots & \\ \cdots & L_{ij} - \dfrac{L_{in} L_{jn}}{L_{nn}} & \cdots \\ & \vdots & \end{bmatrix}}_{=L_{\text{red}}} \begin{bmatrix} x_1 \\ \vdots \\ x_{n-1} \end{bmatrix},$$
where the $(i,j)$-element of $L_{\text{red}}$ is given by $L_{ij} - L_{in} L_{jn}/L_{nn}$. Show that the matrices $A \in \mathbb{R}^{(n-1)\times 1}$ and $L_{\text{red}} \in \mathbb{R}^{(n-1)\times(n-1)}$ obtained after Gaussian elimination have the following properties:
(i) $A$ is a nonnegative and column-stochastic matrix with at least one strictly positive element; and
(ii) $L_{\text{red}}$ is a symmetric and irreducible Laplacian matrix.
Hint: To show the irreducibility of Lred , verify the following property regarding the fill-in of the matrix Lred : The graph associated to the Laplacian Lred has an edge between nodes i and j if and only if (i) either {i, j} was an edge in the original graph associated to L, (ii) or {i, n} and {j, n} were edges in the original graph associated to L.
E6.12 The spectra of Laplacian and row-stochastic adjacency matrices. Consider a row-stochastic matrix $A \in \mathbb{R}^{n\times n}$. Let $L$ be the Laplacian matrix of the digraph associated to $A$. Compute the spectrum of $L$ as a function of the spectrum $\operatorname{spec}(A)$ of $A$.
E6.13 Thomson's principle and energy routing. Consider a connected and undirected resistive electrical network with $n$ nodes, with external nodal current injections $c \in \mathbb{R}^n$ satisfying the balance condition $1_n^\top c = 0$, and with resistances $R_{ij} > 0$ for every undirected edge $\{i,j\} \in E$. For simplicity, we set $R_{ij} = \infty$ if there is no edge connecting $i$ and $j$. As shown earlier in this chapter, Kirchhoff's and Ohm's laws lead to the network equations
$$c_{\text{injected at } i} = \sum_{j \in N(i)} c_{i\to j} = \sum_{j \in N(i)} \frac{1}{R_{ij}}(v_i - v_j),$$
where $v_i$ is the potential at node $i$ and $c_{i\to j} = (v_i - v_j)/R_{ij}$ is the current flow from node $i$ to node $j$. Consider now a more general set of current flows $f_{i\to j}$ (for all $i, j \in \{1, \dots, n\}$) "routing energy through the network" and compatible with the following basic assumptions:
(i) Skew-symmetry: $f_{i\to j} = -f_{j\to i}$ for all $i, j \in \{1, \dots, n\}$;
(ii) Consistency: $f_{i\to j} = 0$ if $\{i,j\} \notin E$;
(iii) Conservation: $c_{\text{injected at } i} = \sum_{j\in N(i)} f_{i\to j}$ for all $i \in \{1, \dots, n\}$.
Show that among all possible current flows $f_{i\to j}$, the physical current flow $f_{i\to j} = c_{i\to j} = (v_i - v_j)/R_{ij}$ uniquely minimizes the energy dissipation:
$$\begin{aligned}
\underset{f_{i\to j},\ i,j \in \{1,\dots,n\}}{\text{minimize}} \quad & J = \frac{1}{2}\sum_{i,j=1}^n R_{ij} f_{i\to j}^2 \\
\text{subject to} \quad & f_{i\to j} = -f_{j\to i} \quad \forall i, j \in \{1, \dots, n\}, \\
& f_{i\to j} = 0 \quad \forall \{i,j\} \notin E, \\
& c_{\text{injected at } i} = \sum_{j\in N(i)} f_{i\to j} \quad \forall i \in \{1, \dots, n\}.
\end{aligned}$$
Hint: The solution requires knowledge of the Karush-Kuhn-Tucker (KKT) conditions for optimality; this is a classic topic in nonlinear constrained optimization discussed in numerous textbooks, e.g., in (Luenberger 1984).
E6.14 Linear spring networks with loads. Consider the two (connected) spring networks with $n$ moving masses in the figure. For the right network, assume one of the masses is connected with a single stationary object with a spring. Refer to the left spring network as free and to the right network as grounded. Let $F_{\text{load}}$ be a load force applied to the $n$ moving masses.
For the left network, let $L_{\text{free},n}$ be the $n \times n$ Laplacian matrix describing the free spring network among the $n$ moving masses, as defined in Section 6.2. For the right network, let $L_{\text{free},n+1}$ be the $(n+1) \times (n+1)$ Laplacian matrix for the spring network among the $n$ masses and the stationary object. Let $L_{\text{grounded}}$ be the $n \times n$ grounded Laplacian of the $n$ masses, constructed by removing the row and column of $L_{\text{free},n+1}$ corresponding to the stationary object. For the free spring network subject to $F_{\text{load}}$,
(i) do equilibrium displacements exist for arbitrary loads?
(ii) if the load force $F_{\text{load}}$ is balanced in the sense that $1_n^\top F_{\text{load}} = 0$, is the resulting equilibrium displacement unique?
(iii) compute the equilibrium displacement if unique, or the set of equilibrium displacements otherwise, assuming a balanced force profile is applied.
For the grounded spring network,
(iv) derive an expression relating $L_{\text{grounded}}$ to $L_{\text{free},n}$,
(v) show that $L_{\text{grounded}}$ is invertible,
(vi) compute the displacement for the "grounded" spring network for arbitrary load forces.
 Chapter 7
Continuous-time Averaging Systems
In this chapter we consider averaging algorithms in which the variables evolve in continuous time, instead of discrete time. Accordingly, the models we study are differential equations rather than difference equations. We borrow ideas from (Mesbahi and Egerstedt 2010; Ren et al. 2007).
7.1
Example #1: Flocking behavior for a group of animals
We are interested in a continuous-time agreement phenomenon based on the simple “alignment rule” for each agent to steer towards the average heading of its neighbors; see Figure 7.1.
Figure 7.1: Alignment rule: the left fish rotates clockwise to align itself with the average heading of its neighbors.
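This alignment rule amounts to a "spring-like" force, described as follows:
$$\dot\theta_i = \begin{cases} (\theta_j - \theta_i), & \text{if the } i\text{th agent has one neighbor}, \\ \frac{1}{2}(\theta_{j_1} - \theta_i) + \frac{1}{2}(\theta_{j_2} - \theta_i), & \text{if the } i\text{th agent has two neighbors}, \\ \frac{1}{m}(\theta_{j_1} - \theta_i) + \dots + \frac{1}{m}(\theta_{j_m} - \theta_i), & \text{if the } i\text{th agent has } m \text{ neighbors} \end{cases}$$
$$\phantom{\dot\theta_i} = \operatorname{average}(\{\theta_j, \text{ for all neighbors } j\}) - \theta_i.$$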
This interaction law can be written as
$$\dot\theta = -L\theta,$$
where $L$ is the Laplacian of an appropriate weighted digraph $G$: each animal is a node and each directed edge $(i,j)$ has weight $1/d_{\text{out}}(i)$, where $d_{\text{out}}(i)$ here denotes the number of neighbors of agent $i$. Here it is useful to recall the interpretation of $-(Lx)_i$ as a force perceived by node $i$ in a network of springs.
Note: it is weird (i.e., mathematically ill-posed) to compute averages on a circle, but let us not worry about it for now. Note: this incomplete model does not concern itself with positions in any way. Hence, (1) there is no discussion about collision avoidance and formation/cohesion maintenance. Moreover, (2) the graph $G$ should really be state dependent. For example, we may assume that two birds see each other and interact if and only if their pairwise Euclidean distance is below a certain threshold.
Figure 7.2: Many animal species exhibit flocking behaviors that arise from decentralized interactions. On the left: Pacific threadfins (Polydactylus sexfilis); public domain image from the U.S. National Oceanic and Atmospheric Administration. On the right: flock of snow geese (Chen caerulescens); public domain image from the U.S. Fish and Wildlife Service.
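The alignment rule can be simulated directly. The following is a minimal sketch assuming Python with NumPy, a hypothetical line of four agents (indexed from 0), small made-up initial headings (which sidestep the circle issue), and a forward-Euler discretization chosen only for illustration. Each edge $(i,j)$ gets weight $1/d_{\text{out}}(i)$, as in the text.

```python
import numpy as np

# Hypothetical interaction digraph: four agents on a line, 0 - 1 - 2 - 3,
# each looking at its immediate neighbors.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
n = len(neighbors)

# Edge (i, j) has weight 1/d_out(i), so each row of A sums to 1.
A = np.zeros((n, n))
for i, nbrs in neighbors.items():
    for j in nbrs:
        A[i, j] = 1.0 / len(nbrs)
L = np.diag(A.sum(axis=1)) - A

theta = np.array([0.0, 0.4, -0.2, 1.0])    # initial headings (radians), kept small
dt, steps = 0.01, 4000                      # forward-Euler step, horizon T = 40

for _ in range(steps):
    theta = theta - dt * (L @ theta)        # theta_dot = -L theta

print(theta)
```

All headings approach the same value. In this example it is the degree-weighted average $\sum_i d_i\,\theta_i(0)/\sum_i d_i = 1.4/6 \approx 0.233$ (with degrees $d = (1,2,2,1)$), because $w = d/\sum_i d_i$ satisfies $w^\top L = 0^\top$, so $w^\top\theta$ is conserved along the flow.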
7.2
Example #2: A simple RC circuit
Consider an electrical network with only pure resistors and with pure capacitors connecting each node to ground; this example is taken from (Mesbahi and Egerstedt 2010; Ren et al. 2007). From the previous chapter, we know that the vector of injected currents $c_{\text{injected}}$ and the vector of voltages at the nodes $v$ satisfy $c_{\text{injected}} = L v$, where $L$ is the Laplacian for the graph with coefficients $a_{ij} = 1/r_{ij}$. Additionally, assuming $C_i$ is the capacitance at node $i$, and keeping proper track of the current into each capacitor, we have
$$C_i \frac{d}{dt} v_i = -c_{\text{injected at } i},$$
so that, with the shorthand $C = \operatorname{diag}(C_1, \dots, C_n)$,
$$\frac{d}{dt} v = -C^{-1} L v.$$
Note: $C^{-1}L$ is again a Laplacian matrix (for a directed weighted graph). Note: it is physically intuitive that after some transient all nodes will have the same potential. This intuition will be proved later in the chapter.
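The intuition can be checked with a short simulation. The following Python/NumPy sketch uses a hypothetical three-node RC network (node indices from 0, resistances and capacitances made up) and a forward-Euler discretization chosen only for illustration. The total charge $\sum_i C_i v_i$ is conserved, since $1_n^\top C \dot v = -1_n^\top L v = 0$ for the symmetric $L$. Hence the common final voltage in this sketch is the capacitance-weighted average $\sum_i C_i v_i(0) / \sum_i C_i$.

```python
import numpy as np

# Hypothetical RC network on a path 0 - 1 - 2 with r_01 = 1 and r_12 = 2 (ohms)
# and grounded capacitors C = (1, 2, 1) (farads).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.5],
              [0.0, 0.5, 0.0]])           # conductances a_ij = 1 / r_ij
L = np.diag(A.sum(axis=1)) - A
C = np.diag([1.0, 2.0, 1.0])
M = np.linalg.solve(C, L)                 # C^{-1} L without forming C^{-1} explicitly

v = np.array([3.0, 0.0, 0.0])             # initial capacitor voltages
total_charge = np.diag(C) @ v             # sum_i C_i v_i, conserved by the dynamics

dt, steps = 0.01, 5000                    # forward Euler up to T = 50
for _ in range(steps):
    v = v - dt * (M @ v)                  # dv/dt = -C^{-1} L v

print(v, total_charge / np.trace(C))
```

All three voltages settle at $3 \cdot 1 / (1 + 2 + 1) = 0.75$.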
7.3
Continuous-time linear systems and their convergence properties
In Section 2.1 we presented discrete-time linear systems and their convergence properties; here we present their continuous-time analog. A continuous-time linear system is
$$\dot x(t) = A x(t). \tag{7.1}$$
Its solution $t \mapsto x(t)$, $t \in \mathbb{R}_{\ge 0}$, from an initial condition $x(0)$ satisfies $x(t) = e^{At} x(0)$, where the matrix exponential of a square matrix $A$ is defined by
$$e^{A} = \sum_{k=0}^{\infty} \frac{1}{k!} A^k.$$
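As a numerical illustration, here is a minimal Python/NumPy sketch. The helper names are our own, and the two matrices are made-up examples. `expm_series` truncates the defining power series, which is adequate only for small $\|At\|$. `expm_diagonalizable` uses $A = P\,\operatorname{diag}(\lambda)\,P^{-1}$ and is valid only for diagonalizable $A$; a general-purpose routine such as `scipy.linalg.expm` would be preferable in practice. The two examples preview the dichotomy formalized in Theorem 7.1: the first matrix has all eigenvalues with negative real part, while the second is minus a Laplacian and has the eigenvalue 0.

```python
import numpy as np

def max_real_part(A):
    """Largest real part among the eigenvalues of A."""
    return np.max(np.linalg.eigvals(A).real)

def expm_series(A, t, terms=30):
    """Truncated power series sum_{k < terms} (At)^k / k!; fine for small ||At||."""
    term = np.eye(A.shape[0])
    total = term.copy()
    for k in range(1, terms):
        term = term @ (A * t) / k
        total = total + term
    return total

def expm_diagonalizable(A, t):
    """e^{At} = P diag(e^{lambda t}) P^{-1}; valid only when A is diagonalizable."""
    lam, P = np.linalg.eig(A)
    return (P @ np.diag(np.exp(lam * t)) @ np.linalg.inv(P)).real

A_hurwitz = np.array([[-1.0, 2.0],
                      [0.0, -3.0]])      # eigenvalues -1 and -3
L = np.array([[1.0, -1.0],
              [-1.0, 1.0]])              # Laplacian of one edge: eigenvalues 0 and 2

for M in (A_hurwitz, -L):
    print(max_real_part(M))
    print(expm_diagonalizable(M, 20.0))
```

For large $t$, $e^{A_{\text{hurwitz}}t}$ approaches the zero matrix, while $e^{-Lt}$ approaches $\frac{1}{2}1_2 1_2^\top$, a nonzero limit.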
The matrix exponential is a remarkable operation with numerous properties; we ask the reader to review a few basic ones in Exercise E7.1. A matrix $A \in \mathbb{R}^{n\times n}$ is
(i) continuous-time semi-convergent if $\lim_{t\to+\infty} e^{At}$ exists, and
(ii) continuous-time convergent (Hurwitz) if it is continuous-time semi-convergent and $\lim_{t\to+\infty} e^{At} = 0_{n\times n}$.
The spectral abscissa of a square matrix $A$ is the maximum of the real parts of the eigenvalues of $A$, that is,
$$\mu(A) = \max\{\Re(\lambda) \mid \lambda \in \operatorname{spec}(A)\}.$$
Theorem 7.1 (Convergence and spectral abscissa). For a square matrix $A$, the following statements hold:
(i) $A$ is continuous-time convergent (Hurwitz) $\iff \mu(A) < 0$,
(ii) $A$ is continuous-time semi-convergent $\iff \mu(A) \le 0$, no eigenvalue has zero real part other than possibly the number 0, and if 0 is an eigenvalue, then it is semisimple.
We leave the proof of this theorem to the reader and mention that most required steps are similar to the discussion in Section 2.1 and are discussed later in this chapter.
7.4
The Laplacian flow
Let $G$ be a weighted directed graph with $n$ nodes and Laplacian matrix $L$. The Laplacian flow on $\mathbb{R}^n$ is the dynamics
$$\dot x = -Lx, \tag{7.2}$$
or, equivalently in components,
$$\dot x_i = \sum_{j=1}^n a_{ij}(x_j - x_i) = \sum_{j\in N^{\text{out}}(i)} a_{ij}(x_j - x_i).$$
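For an undirected graph, the Laplacian flow can be solved exactly from the eigendecomposition of $L$, which is the modal decomposition described next. A minimal Python/NumPy sketch on a hypothetical unit-weight path graph with four nodes (indexed from 0) and a made-up initial condition:

```python
import numpy as np

# Hypothetical undirected path graph 0 - 1 - 2 - 3 with unit weights.
n = 4
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Symmetric L: real eigenvalues in increasing order, orthonormal eigenvectors.
lam, V = np.linalg.eigh(L)
x0 = np.array([4.0, 0.0, 0.0, 0.0])

def laplacian_flow(t):
    """Solution of x_dot = -L x: sum_k exp(-lam_k t) (v_k^T x0) v_k."""
    return V @ (np.exp(-lam * t) * (V.T @ x0))

for t in (0.0, 1.0, 5.0, 20.0):
    delta = laplacian_flow(t) - x0.mean()          # disagreement vector
    print(t, np.linalg.norm(delta))
```

Here $\operatorname{average}(x(0)) = 1$ and $\lambda_2 = 2 - \sqrt{2} \approx 0.586$. The state approaches $1_4$, and for large $t$ the norm of the disagreement vector shrinks by a factor of about $e^{-\lambda_2}$ per unit of time.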
Lemma 7.2 (Equilibrium points). If $G$ contains a globally reachable node, then the only equilibrium points of the Laplacian flow (7.2) are $x = \alpha 1_n$, for some $\alpha \in \mathbb{R}$.
Proof. A point $x$ is an equilibrium for the Laplacian flow if $Lx = 0_n$. Hence, any point in the kernel of the matrix $L$ is an equilibrium. From Theorem 6.4, if $G$ contains a globally reachable node, then $\operatorname{rank}(L) = n - 1$. Hence, the dimension of the kernel space is 1. The lemma follows by recalling that $L 1_n = 0_n$. $\blacksquare$
In what follows, we are interested in characterizing the evolution of the Laplacian flow (7.2). To build some intuition, let us first consider an undirected graph $G$ and write the modal decomposition of the solution as we did in Section 2.1 for a discrete-time linear system. We proceed in two steps. First, because $G$ is undirected, the matrix $L$ is symmetric and has real eigenvalues $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ with corresponding orthonormal (i.e., orthogonal and unit-length) eigenvectors $v_1, \dots, v_n$. Define $x_i(t) = v_i^\top x(t)$ and left-multiply $\dot x = -Lx$ by $v_i^\top$:
$$\frac{d}{dt} x_i(t) = -\lambda_i x_i(t), \qquad x_i(0) = v_i^\top x(0).$$
These $n$ decoupled ordinary differential equations are immediately solved to give
$$x(t) = x_1(t) v_1 + x_2(t) v_2 + \dots + x_n(t) v_n = e^{-\lambda_1 t}(v_1^\top x(0)) v_1 + e^{-\lambda_2 t}(v_2^\top x(0)) v_2 + \dots + e^{-\lambda_n t}(v_n^\top x(0)) v_n.$$
Second, recall that $\lambda_1 = 0$ and $v_1 = 1_n/\sqrt{n}$ because $L$ is a Laplacian matrix ($L 1_n = 0_n$). Therefore, we compute $(v_1^\top x(0)) v_1 = \operatorname{average}(x(0)) 1_n$ and substitute
$$x(t) = \operatorname{average}(x(0)) 1_n + e^{-\lambda_2 t}(v_2^\top x(0)) v_2 + \dots + e^{-\lambda_n t}(v_n^\top x(0)) v_n.$$
Now, let us assume that $G$ is connected so that its second smallest eigenvalue $\lambda_2$ is strictly positive. In this case, we can infer that
$$\lim_{t\to\infty} x(t) = \operatorname{average}(x(0)) 1_n,$$
or, defining a disagreement vector $\delta(t) = x(t) - \operatorname{average}(x(0)) 1_n$, we infer
$$\delta(t) = e^{-\lambda_2 t}(v_2^\top x(0)) v_2 + \dots + e^{-\lambda_n t}(v_n^\top x(0)) v_n.$$
In summary, we discovered that, for a connected undirected graph, the disagreement vector converges to zero exponentially fast, with rate $\lambda_2$. In what follows, we state a more general convergence-to-consensus result for the continuous-time Laplacian flow. This result is parallel to Theorem 5.2.
Theorem 7.3 (Consensus for Laplacian matrices with globally reachable node). If a Laplacian matrix $L$ has associated digraph $G$ with a globally reachable node, then
(i) the eigenvalue 0 of $-L$ is simple and all other eigenvalues of $-L$ have negative real part,
(ii) $\lim_{t\to\infty} e^{-Lt} = 1_n w^\top$, where $w \ge 0$ is the left eigenvector of $L$ with eigenvalue 0 satisfying $w_1 + \dots + w_n = 1$,
(iii) $w_i > 0$ if and only if node $i$ is globally reachable. Accordingly, $w_i = 0$ if and only if node $i$ is not globally reachable,
(iv) the solution to $\frac{d}{dt} x(t) = -L x(t)$ satisfies
$$\lim_{t\to\infty} x(t) = \big(w^\top x(0)\big) 1_n,$$
(v) if additionally $G$ is weight-balanced, then $G$ is strongly connected, $1_n^\top L = 0_n^\top$, and $w = \frac{1}{n} 1_n$ (because $\frac{1}{n} 1_n^\top 1_n = 1$), so that
$$\lim_{t\to\infty} x(t) = \frac{1_n^\top x(0)}{n} 1_n = \operatorname{average}(x(0)) 1_n.$$
Note: as a corollary to statement (iii), the left eigenvector $w \in \mathbb{R}^n$ associated to the eigenvalue 0 has strictly positive entries if and only if $G$ is strongly connected.

Proof. Because the associated digraph has a globally reachable node, the previous chapter gives a few properties of $L$: $L\mathbb{1}_n = \mathbb{0}_n$, $L$ has rank $n-1$, and all eigenvalues of $L$ have nonnegative real part. Therefore, 0 is a simple eigenvalue of $L$ with right eigenvector $\mathbb{1}_n$, and all other eigenvalues of $L$ have positive real part. Equivalently, all other eigenvalues of $-L$ have negative real part. This concludes the proof of (i). In what follows we let $w$ denote the left eigenvector associated to the eigenvalue 0, that is, $w^\top L = \mathbb{0}_n^\top$, normalized so that $\mathbb{1}_n^\top w = 1$.

To prove statement (ii), we proceed in three steps. First, we write the Laplacian matrix in its Jordan normal form:
$$L = PJP^{-1} = P\begin{bmatrix} 0 & 0 & \cdots & 0\\ 0 & J_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & J_m\end{bmatrix}P^{-1}, \qquad (7.3)$$
where:
- $m \leq n$ is the number of Jordan blocks;
- the first block is the scalar 0 (the only eigenvalue we know explicitly);
- the other Jordan blocks $J_2, \dots, J_m$ (unique up to re-ordering) are associated with eigenvalues with strictly positive real part;
- the columns of $P$ are the generalized eigenvectors of $L$ (unique up to rescaling).

Second, using some properties from Exercise E7.1, we compute the limit as $t\to\infty$ of $e^{-Lt} = Pe^{-Jt}P^{-1}$ as
$$\lim_{t\to\infty} e^{-Lt} = P\Big(\lim_{t\to\infty} e^{-Jt}\Big)P^{-1} = P\begin{bmatrix} 1 & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 0\end{bmatrix}P^{-1} = (Pe_1)(e_1^\top P^{-1}) = c_1 r_1,$$
where $c_1$ is the first column of $P$ and $r_1$ is the first row of $P^{-1}$. The contributions of the Jordan blocks $J_2, \dots, J_m$ vanish because the eigenvalues of $-J_2, \dots, -J_m$ have negative real part; for more details see, e.g., (Hespanha 2009).

Third and final, we characterize $c_1$ and $r_1$. By definition, the first column of $P$ (unique up to rescaling) is a right eigenvector of the eigenvalue 0 for the matrix $L$. Since $L\mathbb{1}_n = \mathbb{0}_n$, we have $c_1 = \alpha\mathbb{1}_n$ for some scalar $\alpha$, and it is convenient to define $c_1 = \mathbb{1}_n$. Next, equation (7.3) can be rewritten as $P^{-1}L = JP^{-1}$, whose first row is $r_1 L = \mathbb{0}_n^\top$. This equality implies $r_1 = \beta w^\top$ for some scalar $\beta$. Finally, $P^{-1}P = I_n$ implies $r_1 c_1 = 1$, that is, $\beta w^\top\mathbb{1}_n = 1$. Since $w^\top\mathbb{1}_n = 1$, we infer that $\beta = 1$ and $r_1 = w^\top$. This concludes the proof of statement (ii).

Next, we prove statement (iii). Pick a positive constant $\varepsilon < 1/d_{\max}$, where the maximum out-degree is $d_{\max} = \max\{d_{\text{out}}(1), \dots, d_{\text{out}}(n)\}$, and define $B = I_n - \varepsilon L$. It is easy to show that $B$ is nonnegative, row-stochastic, and has strictly positive diagonal elements. Moreover, $w^\top L = \mathbb{0}_n^\top$ implies $w^\top B = w^\top$, so $w$ is the left eigenvector of $B$ with unit eigenvalue.

The digraph $G(L)$ associated to $L$ (without self-loops) is identical to the digraph $G(B)$ associated to $B$, except that $B$ has self-loops at each node. By assumption $G(L)$ has a globally reachable node, and therefore so does $G(B)$. Moreover, in $G(B)$ the subgraph induced by the set of globally reachable nodes is aperiodic, due to the self-loops. Statement (iii) is now an immediate transcription of the same statement for row-stochastic matrices established in Theorem 5.2 (statement (iii)). Statements (iv) and (v) are straightforward. □
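The following NumPy sketch checks both results numerically. It uses two small example graphs of our own choosing, which do not appear in the text.

- **Undirected path graph.** The sketch evaluates the modal decomposition with `numpy.linalg.eigh` and checks that the disagreement decays at least as fast as $e^{-\lambda_2 t}$.
- **Digraph.** Only nodes 1 and 2 are globally reachable. The sketch computes $w$ as a left null vector of $L$ and compares a forward-Euler simulation of $\dot x = -Lx$ with $(w^\top x(0))\mathbb{1}_n$.

The helper names, step size and horizon are illustrative assumptions.

```python
import numpy as np

def modal_solution(L, x0, t):
    """x(t) for xdot = -L x with symmetric L, via the modal decomposition
    x(t) = sum_i exp(-lambda_i t) (v_i^T x(0)) v_i."""
    lam, V = np.linalg.eigh(L)  # ascending eigenvalues, orthonormal eigenvectors
    return V @ (np.exp(-lam * t) * (V.T @ x0))

def left_null_vector(L):
    """Left eigenvector w of L for eigenvalue 0, normalized so that sum(w) = 1
    (assumes 0 is a simple eigenvalue, e.g., a globally reachable node exists)."""
    _, _, Vt = np.linalg.svd(L.T)
    w = Vt[-1]  # right singular vector of L^T for its smallest singular value
    return w / w.sum()

def euler_laplacian_flow(L, x0, h=0.01, T=30.0):
    """Forward-Euler approximation of xdot = -L x on [0, T] with step h."""
    x = np.array(x0, dtype=float)
    for _ in range(int(round(T / h))):
        x = x - h * (L @ x)
    return x

x0 = np.array([1.0, 4.0, 7.0, 10.0])

# Undirected path graph 1-2-3-4 with unit weights.
L_und = np.array([[1, -1, 0, 0],
                  [-1, 2, -1, 0],
                  [0, -1, 2, -1],
                  [0, 0, -1, 1]], dtype=float)
lam2 = np.linalg.eigvalsh(L_und)[1]  # second smallest eigenvalue
t = 2.0
delta_0 = x0 - x0.mean()
delta_t = modal_solution(L_und, x0, t) - x0.mean()
print(np.linalg.norm(delta_t) <= np.exp(-lam2 * t) * np.linalg.norm(delta_0) + 1e-12)

# Digraph with edges 1->2 (a12 = 2), 2->1 (a21 = 1), 3->1 and 4->3 (unit weights);
# L = D_out - A, so only nodes 1 and 2 are globally reachable.
L_dir = np.array([[2, -2, 0, 0],
                  [-1, 1, 0, 0],
                  [-1, 0, 1, 0],
                  [0, 0, -1, 1]], dtype=float)
w = left_null_vector(L_dir)
print(w)                                # approximately [1/3, 2/3, 0, 0]
print(euler_laplacian_flow(L_dir, x0))  # approximately w @ x0 = 3 in every entry
```

In the directed example the non-globally-reachable nodes 3 and 4 receive zero weight, as statement (iii) predicts. The final value $3 = \frac{1}{3}\cdot 1 + \frac{2}{3}\cdot 4$ depends only on the initial states of nodes 1 and 2.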
7.5
Design of weight-balanced digraphs from strongly-connected digraphs
Problem: Given a directed graph $G$ that is strongly connected but not weight-balanced, how do we choose the weights to obtain a weight-balanced digraph, that is, a Laplacian satisfying $\mathbb{1}_n^\top L = \mathbb{0}_n^\top$? (Note that an undirected graph is automatically weight-balanced.)

Answer: As usual, let $w > 0$ be the left eigenvector of $L$ with eigenvalue 0 satisfying $w_1 + \dots + w_n = 1$. In other words, $w$ is a vector of convex combination coefficients, and the Laplacian $L$ satisfies
$$L\mathbb{1}_n = \mathbb{0}_n \quad\text{and}\quad w^\top L = \mathbb{0}_n^\top.$$
Following (Ren et al. 2007), define a new matrix:
$$L_{\text{rescaled}} = \operatorname{diag}(w)L.$$
It is immediate to see that
$$L_{\text{rescaled}}\mathbb{1}_n = \operatorname{diag}(w)L\mathbb{1}_n = \mathbb{0}_n, \qquad \mathbb{1}_n^\top L_{\text{rescaled}} = \mathbb{1}_n^\top\operatorname{diag}(w)L = w^\top L = \mathbb{0}_n^\top.$$
Note that:
• $L_{\text{rescaled}}$ is again a Laplacian matrix because (i) its row-sums are zero, (ii) its diagonal entries are positive, and (iii) its non-diagonal entries are nonpositive;
• $L_{\text{rescaled}}$ is the Laplacian matrix for a new digraph $G_{\text{rescaled}}$ with the same nodes and directed edges as $G$, but whose weights are rescaled as follows: $a_{ij} \mapsto w_i a_{ij}$. In other words, the weight of each out-edge of node $i$ is rescaled by $w_i$.
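A minimal NumPy sketch of this rescaling follows. It assumes the digraph is strongly connected, so that $w$ is unique and positive. The three-node example digraph and the function names are our own illustrative choices.

```python
import numpy as np

def laplacian(A):
    """Laplacian L = D_out - A, with D_out the diagonal matrix of weighted out-degrees."""
    return np.diag(A.sum(axis=1)) - A

def weight_balance(A):
    """Rescale each out-edge weight a_ij of node i by w_i, where w^T L = 0 and sum(w) = 1.
    Assumes the digraph is strongly connected, so that w > 0 is unique."""
    L = laplacian(A)
    _, _, Vt = np.linalg.svd(L.T)
    w = Vt[-1] / Vt[-1].sum()      # left null vector of L, normalized
    A_rescaled = np.diag(w) @ A    # a_ij -> w_i a_ij
    return w, A_rescaled

# Strongly connected digraph: 1->2 (weight 1), 2->3 (weight 2), 3->1 (weight 3), 3->2 (weight 1).
A = np.array([[0, 1, 0],
              [0, 0, 2],
              [3, 1, 0]], dtype=float)
w, A_resc = weight_balance(A)
L_resc = laplacian(A_resc)         # equals diag(w) @ laplacian(A)
print(w)                           # approximately [1/2, 1/3, 1/6]
print(L_resc.sum(axis=0))          # column sums approximately zero: weight-balanced
```

Building the rescaled adjacency matrix first and then taking its Laplacian makes the edge-level interpretation $a_{ij} \mapsto w_i a_{ij}$ explicit. The result coincides with $\operatorname{diag}(w)L$.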
7.6
Distributed optimization using the Laplacian flow
In the following, we present a computational application of the Laplacian flow in distributed optimization. The material in this section is inspired by (Cherukuri and Cortés 2015; Droge et al. 2013; Gharesifard and Cortes 2014; Wang and Elia 2010), and we present it here in a self-contained way. The only preliminary notions we need are the following two definitions:
- A function $f: \mathbb{R}^n \to \mathbb{R}$ is said to be convex if $f(\alpha x + \beta y) \leq \alpha f(x) + \beta f(y)$ for all $x$ and $y$ in $\mathbb{R}^n$ and for all convex combination coefficients $\alpha$ and $\beta$, i.e., coefficients satisfying $\alpha, \beta \geq 0$ and $\alpha + \beta = 1$.
- A function is said to be strictly convex if the previous inequality holds strictly whenever $x \neq y$ and $\alpha, \beta > 0$.

Consider a network of $n$ processors that can perform local computation and communicate with one another. The communication architecture is modeled by an undirected, connected, and weighted graph with $n$ nodes and symmetric Laplacian $L = L^\top \in \mathbb{R}^{n\times n}$. The objective of the processor network is to solve the optimization problem
$$\underset{x\in\mathbb{R}}{\text{minimize}} \quad f(x) = \sum_{i=1}^n f_i(x), \qquad (7.4)$$
where $f_i: \mathbb{R} \to \mathbb{R}$ is a strictly convex and twice continuously differentiable cost function known only to processor $i \in \{1, \dots, n\}$. In a centralized setup, the decision variable $x$ is globally available, and the minimizer $x^* \in \mathbb{R}$ of the optimization problem (7.4) can be found by solving for the critical points of $f(x)$:
$$0 = \frac{\partial}{\partial x}f(x) = \sum_{i=1}^n \frac{\partial}{\partial x}f_i(x).$$
A centralized continuous-time algorithm converging to the set of critical points is the negative gradient flow
$$\dot x = -\frac{\partial}{\partial x}f(x).$$
To find a distributed approach to solving the optimization problem (7.4), we associate a local estimate $y_i \in \mathbb{R}$ of the global variable $x \in \mathbb{R}$ to every processor and solve the equivalent problem
$$\underset{y\in\mathbb{R}^n}{\text{minimize}} \quad \tilde f(y) + \frac{1}{2}y^\top Ly \qquad \text{subject to} \quad Ly = \mathbb{0}_n, \qquad (7.5)$$
where $\tilde f(y) = \sum_{i=1}^n f_i(y_i)$.
Here the consistency constraint $Ly = \mathbb{0}_n$ ensures that $y_i = y_j$ for all $i, j \in \{1, \dots, n\}$, because the graph is connected and hence the kernel of $L$ is $\operatorname{span}(\mathbb{1}_n)$. In other words, the local estimates of all processors coincide. We also augmented the cost function with the term $\frac{1}{2}y^\top Ly$. This term has no effect on the minimizers of (7.5), because it vanishes under the consistency constraint, but it provides supplementary damping and favorable convergence properties for our algorithm. The minimizers of the optimization problems (7.4) and (7.5) are then related by $y^* = x^*\mathbb{1}_n$.

Without any further motivation, consider the function $\mathcal{L}: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ given by
$$\mathcal{L}(y,z) = \tilde f(y) + \frac{1}{2}y^\top Ly + z^\top Ly.$$
In the literature on convex optimization this function is known as the (augmented) Lagrangian function, and $z \in \mathbb{R}^n$ is referred to as the Lagrange multiplier. What is important for us is that the Lagrangian function is strictly convex in $y$ and linear (and hence concave) in $z$. A saddle point $(y^*, z^*) \in \mathbb{R}^n \times \mathbb{R}^n$ of $\mathcal{L}$ satisfies
$$\mathcal{L}(y^*, z) \leq \mathcal{L}(y^*, z^*) \leq \mathcal{L}(y, z^*) \quad \text{for all } y, z \in \mathbb{R}^n.$$
Since $\mathcal{L}(y,z)$ is differentiable in $y$ and $z$, the saddle points can be obtained as solutions to the equations
$$\mathbb{0}_n = \frac{\partial}{\partial y}\mathcal{L}(y,z) = \frac{\partial}{\partial y}\tilde f(y) + Ly + Lz, \qquad \mathbb{0}_n = \frac{\partial}{\partial z}\mathcal{L}(y,z) = Ly.$$
Our motivation for introducing the Lagrangian is the following lemma.

Lemma 7.4 (Properties of saddle points). Let $L = L^\top \in \mathbb{R}^{n\times n}$ be a symmetric Laplacian associated to an undirected, connected, and weighted graph, and consider the Lagrangian function $\mathcal{L}$, where each $f_i$ is strictly convex and twice continuously differentiable for all $i \in \{1, \dots, n\}$. Then
(i) if $(y^*, z^*) \in \mathbb{R}^n \times \mathbb{R}^n$ is a saddle point of $\mathcal{L}$, then so is $(y^*, z^* + \alpha\mathbb{1}_n)$ for any $\alpha \in \mathbb{R}$;
(ii) if $(y^*, z^*) \in \mathbb{R}^n \times \mathbb{R}^n$ is a saddle point of $\mathcal{L}$, then $y^* = x^*\mathbb{1}_n$, where $x^* \in \mathbb{R}$ is a solution of the original optimization problem (7.4); and
(iii) if $x^* \in \mathbb{R}$ is a solution of the original optimization problem (7.4), then there are $z^* \in \mathbb{R}^n$ and $y^* = x^*\mathbb{1}_n$ satisfying $Lz^* + \frac{\partial}{\partial y}\tilde f(y^*) = \mathbb{0}_n$, so that $(y^*, z^*)$ is a saddle point of $\mathcal{L}$.

We leave the proof to the reader in Exercise E7.10. Since the Lagrangian function is convex in $y$ and concave in $z$, we can compute its saddle points by following the so-called saddle-point dynamics. These consist of a negative gradient (descent) in $y$ and a positive gradient (ascent) in $z$:
$$\begin{aligned} \dot y &= -\frac{\partial}{\partial y}\mathcal{L}(y,z) = -\frac{\partial}{\partial y}\tilde f(y) - Ly - Lz, && (7.6a)\\ \dot z &= +\frac{\partial}{\partial z}\mathcal{L}(y,z) = Ly. && (7.6b) \end{aligned}$$
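For processor $i \in \{1, \dots, n\}$, the saddle-point dynamics (7.6) read component-wise as
$$\begin{aligned} \dot y_i &= -\frac{\partial}{\partial y_i}f_i(y_i) - \sum_{j=1}^n a_{ij}(y_i - y_j) - \sum_{j=1}^n a_{ij}(z_i - z_j),\\ \dot z_i &= \frac{\partial}{\partial z_i}\mathcal{L}(y,z) = \sum_{j=1}^n a_{ij}(y_i - y_j). \end{aligned}$$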
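The sketch below discretizes these component-wise equations with forward Euler. As an example only, it assumes quadratic local costs $f_i(x) = \frac{1}{2}P_i(x - x_i^*)^2$, for which the minimizer of (7.4) is the weighted average $x^* = \sum_i P_i x_i^* / \sum_i P_i$.

The graph, costs, step size `h`, number of steps, and the function name `saddle_point_euler` are hypothetical illustrative choices. Entry $i$ of each Laplacian product only involves row $i$ of the adjacency matrix, i.e., values held by the neighbors of processor $i$.

```python
import numpy as np

def saddle_point_euler(A, P, x_loc, h=0.01, steps=40000):
    """Forward-Euler discretization of the saddle-point dynamics (7.6), for the
    illustrative costs f_i(x) = 0.5 * P_i * (x - x_loc_i)**2."""
    L = np.diag(A.sum(axis=1)) - A  # (L @ v)[i] = sum_j a_ij (v_i - v_j): neighbor data only
    y = np.zeros(len(P))            # local estimates y_i
    z = np.zeros(len(P))            # local multipliers z_i
    for _ in range(steps):
        grad = P * (y - x_loc)      # d f_i / d y_i, computed locally by processor i
        Ly, Lz = L @ y, L @ z
        y, z = y + h * (-grad - Ly - Lz), z + h * Ly
    return y, z

# Path graph 1-2-3 with unit weights and heterogeneous local costs.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
P = np.array([1.0, 2.0, 3.0])
x_loc = np.array([1.0, 2.0, 4.0])
y, z = saddle_point_euler(A, P, x_loc)
x_star = P @ x_loc / P.sum()        # minimizer of sum_i f_i (here 17/6)
print(y, x_star)                    # every y_i is close to x_star
```

Because $\mathbb{1}_n^\top L = \mathbb{0}_n^\top$, the discrete update preserves $\sum_i z_i$ exactly (up to rounding). With $z(0) = \mathbb{0}_n$, the multipliers therefore converge to the saddle point with zero average.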
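Hence, the saddle-point dynamics can be implemented in a distributed processor network using only local knowledge of $f_i(y_i)$, local computation, and nearest-neighbor communication, after (of course) discretizing the continuous-time dynamics. As shown in (Cherukuri and Cortés 2015; Droge et al. 2013; Gharesifard and Cortes 2014; Wang and Elia 2010), this distributed optimization setup is very versatile and robust, and it extends to directed graphs and non-differentiable convex objective functions. Later we will use a powerful tool, the LaSalle Invariance Principle, to show that the saddle-point dynamics (7.6) always converge to the set of saddle points; see Exercise E13.2.

For now we restrict our analysis to the case of quadratic cost functions $f_i(x) = \frac{1}{2}P_i(x - x_i^*)^2$, where $P_i > 0$ and $x_i^* \in \mathbb{R}$. Let $(y^*, z^*)$ be a saddle point; it exists by Lemma 7.4(iii) and satisfies $y^* = x^*\mathbb{1}_n$. In the error coordinates $\tilde y = y - x^*\mathbb{1}_n$ and $\tilde z = z - z^*$, the saddle-point dynamics (7.6) are the linear system
$$\begin{bmatrix}\dot{\tilde y}\\ \dot{\tilde z}\end{bmatrix} = \underbrace{\begin{bmatrix} -P - L & -L\\ L & 0\end{bmatrix}}_{=A}\begin{bmatrix}\tilde y\\ \tilde z\end{bmatrix}, \qquad (7.7)$$
where $P = \operatorname{diag}(\{P_i\}_{i\in\{1,\dots,n\}})$. The matrix $A$ is a so-called saddle matrix (Benzi et al. 2005). In what follows we establish the convergence of the dynamics (7.7) to the set of saddle points.

First, observe that 0 is an eigenvalue of $A$ whose eigenspace is spanned by $\begin{bmatrix}\mathbb{0}_n^\top & \mathbb{1}_n^\top\end{bmatrix}^\top$. This eigenspace corresponds to the set of saddle points $(y^*, z^* + \alpha\mathbb{1}_n)$ of Lemma 7.4(i). Indeed,
$$\begin{bmatrix} -P - L & -L\\ L & 0\end{bmatrix}\begin{bmatrix}\tilde y\\ \tilde z\end{bmatrix} = \mathbb{0}_{2n} \implies (P+L)\tilde y + L\tilde z = \mathbb{0}_n \text{ and } L\tilde y = \mathbb{0}_n \implies \tilde y \in \operatorname{span}(\mathbb{1}_n).$$
Left-multiplying $(P+L)\tilde y + L\tilde z = \mathbb{0}_n$ by $\tilde y^\top$ and using $\tilde y^\top L = \mathbb{0}_n^\top$ gives $\tilde y^\top P\tilde y = 0$. Hence $\tilde y = \mathbb{0}_n$, and in turn $L\tilde z = \mathbb{0}_n$, i.e., $\tilde z \in \operatorname{span}(\mathbb{1}_n)$.

The eigenvalue 0 is moreover simple. A generalized eigenvector $u = (u_1, u_2)$ with $Au = \begin{bmatrix}\mathbb{0}_n^\top & \mathbb{1}_n^\top\end{bmatrix}^\top$ would require $Lu_1 = \mathbb{1}_n$, which is impossible because $\mathbb{1}_n \notin \operatorname{image}(L)$.

Next, note that $\frac{1}{2}(A + A^\top) = \begin{bmatrix} -P - L & 0\\ 0 & 0\end{bmatrix}$ is negative semidefinite. By a Lyapunov argument or a standard linear algebra result (Bernstein 2009, Fact 5.10.28), all eigenvalues of $A$ have real part less than or equal to zero. Since there is a unique zero eigenvalue associated with the set of saddle points, it remains to show that $A$ has no nonzero purely imaginary eigenvalues. This is established in the following lemma, whose proof is left to the reader in Exercise E7.11.

Lemma 7.5 (Absence of sustained oscillations in saddle matrices). Consider a negative semidefinite matrix $B \in \mathbb{R}^{n\times n}$ and a not necessarily square matrix $C \in \mathbb{R}^{n\times m}$. If $\operatorname{kernel}(B) \cap \operatorname{image}(C) = \{\mathbb{0}_n\}$, then the composite block-matrix
$$A = \begin{bmatrix} B & C\\ -C^\top & 0\end{bmatrix}$$
has no eigenvalues on the imaginary axis except for 0.

The lemma applies to (7.7) with $C = -L$ and $B = -P - L$. Since $B$ is negative definite, $\operatorname{kernel}(B) = \{\mathbb{0}_n\}$ and the hypothesis holds. It follows that the saddle-point dynamics (7.7) converge to the set of saddle points, i.e., $\begin{bmatrix}\tilde y^\top & \tilde z^\top\end{bmatrix}^\top$ converges to $\operatorname{span}(\begin{bmatrix}\mathbb{0}_n^\top & \mathbb{1}_n^\top\end{bmatrix}^\top)$.

Since $\mathbb{1}_n^\top\dot z = \mathbb{1}_n^\top Ly = 0$, we have $\operatorname{average}(z(t)) = \operatorname{average}(z(0))$ for all $t$. We conclude that the dynamics converge to a unique saddle point satisfying
$$\lim_{t\to\infty} y(t) = x^*\mathbb{1}_n, \qquad \lim_{t\to\infty} z(t) = z^* + \big(\operatorname{average}(z(0)) - \operatorname{average}(z^*)\big)\mathbb{1}_n.$$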
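As a numerical sanity check of this argument on a single instance (not a proof), the sketch below builds the saddle matrix $A$ of (7.7). It uses an illustrative path graph and $P = \operatorname{diag}(1, 2, 3)$, both our own choices, and checks with `numpy.linalg.eigvals` two facts:
- exactly one eigenvalue is zero, with eigenvector $[\mathbb{0}_n^\top\ \mathbb{1}_n^\top]^\top$;
- all other eigenvalues have strictly negative real part.

```python
import numpy as np

def saddle_matrix(L, P_diag):
    """Saddle matrix A = [[-P - L, -L], [L, 0]] from (7.7), with P = diag(P_diag)."""
    n = L.shape[0]
    P = np.diag(P_diag)
    return np.block([[-P - L, -L], [L, np.zeros((n, n))]])

L = np.array([[1, -1, 0],
              [-1, 2, -1],
              [0, -1, 1]], dtype=float)   # path graph 1-2-3, unit weights
A = saddle_matrix(L, [1.0, 2.0, 3.0])
eigs = np.linalg.eigvals(A)
k0 = np.argmin(np.abs(eigs))              # index of the (numerically) zero eigenvalue
others = np.delete(eigs, k0)
kernel_vec = np.r_[np.zeros(3), np.ones(3)]
print(abs(eigs[k0]))                      # approximately 0
print(others.real.max())                  # strictly negative
print(A @ kernel_vec)                     # zero vector: [0_n; 1_n] spans the kernel
```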
7.7
Exercises

E7.1 Properties of the matrix exponential. Recall the definition $e^A = \sum_{k=0}^\infty \frac{1}{k!}A^k$ for any square matrix $A$. Complete the following tasks:
(i) show that $\sum_{k=0}^\infty \frac{1}{k!}A^k$ converges absolutely for all square matrices $A$,
Hint: Recall that a matrix series $\sum_{k=1}^\infty A_k$ is said to converge absolutely if $\sum_{k=1}^\infty \|A_k\|$ converges, where $\|\cdot\|$ is a matrix norm. Introduce a sub-multiplicative matrix norm $\|\cdot\|$ and show $\|e^A\| \leq e^{\|A\|}$.
(ii) show that, if $A = \operatorname{diag}(a_1, \dots, a_n)$, then $e^A = \operatorname{diag}(e^{a_1}, \dots, e^{a_n})$,
(iii) show that $e^{TAT^{-1}} = Te^AT^{-1}$ for any invertible $T$,
(iv) give an example of matrices $A$ and $B$ such that $e^{AB} \neq e^{BA}$, and
(v) compute the matrix exponential $e^{tJ}$, where $J$ is a Jordan block of arbitrary size and $t \in \mathbb{R}$.

E7.2 Continuous-time affine systems. Given $A \in \mathbb{R}^{n\times n}$ and $b \in \mathbb{R}^n$, consider the continuous-time affine system $\dot x(t) = Ax(t) + b$. Assume $A$ is Hurwitz and, similarly to Exercise E2.10, show that
(i) the matrix $A$ is invertible,
(ii) the only equilibrium point of the system is $-A^{-1}b$, and
(iii) $\lim_{t\to\infty} x(t) = -A^{-1}b$ for all initial conditions $x(0) \in \mathbb{R}^n$.

E7.3 The matrix exponential of a Laplacian matrix. Let $L$ be a Laplacian matrix. Show that, for all $t > 0$,
(i) the matrix $\exp(-Lt)$ has unit row-sums;
(ii) the matrix $\exp(-Lt)$ is nonnegative and has strictly-positive diagonal entries;
Hint: Recall that $\exp(A + B) = \exp(A)\exp(B)$ if $AB = BA$ and that $\exp(aI_n) = e^a I_n$.
(iii) each solution to the Laplacian flow $\dot x = -Lx$ is bounded; and
(iv) in the weight-balanced case when $L$ has zero column sums, the matrix $\exp(-Lt)$ has unit column-sums.

E7.4 Euler discretization of the Laplacian. Let $G$ be a weighted digraph with Laplacian matrix $L$ and maximum out-degree $d_{\max} = \max\{d_{\text{out}}(1), \dots, d_{\text{out}}(n)\}$. Show that:
(i) if $\varepsilon < 1/d_{\max}$, then the matrix $I_n - \varepsilon L$ is row-stochastic,
(ii) if $\varepsilon < 1/d_{\max}$ and $G$ is weight-balanced, then the matrix $I_n - \varepsilon L$ is doubly-stochastic, and
(iii) if $\varepsilon < 1/d_{\max}$ and $G$ is strongly connected, then $I_n - \varepsilon L$ is primitive.
Given these results, note that (no additional assignment in what follows)
• $I_n - \varepsilon L$ is the one-step Euler discretization of the continuous-time Laplacian flow and is a discrete-time consensus algorithm; and
• $I_n - \varepsilon L$ is a possible choice of weights for an undirected unweighted graph (which is therefore also weight-balanced) in the design of a doubly-stochastic matrix (as we did in the discussion about Metropolis-Hastings).

E7.5 Doubly-stochastic matrices on strongly-connected digraphs. Given a strongly-connected unweighted digraph $G$, design weights along the edges of $G$ (and possibly add self-loops) so that the weighted adjacency matrix is doubly-stochastic.
E7.6 Constants of motion. In the study of mechanics, energy and momentum are two constants of motion, that is, these quantities are constant along each evolution of the mechanical system. Show that
(i) if $A$ is a row-stochastic matrix with $w^\top A = w^\top$, then $w^\top x(k) = w^\top x(0)$ for all times $k \in \mathbb{Z}_{\geq 0}$, where $x(k+1) = Ax(k)$;
(ii) if $L$ is a Laplacian matrix with $w^\top L = \mathbb{0}_n^\top$, then $w^\top x(t) = w^\top x(0)$ for all times $t \in \mathbb{R}_{\geq 0}$, where $\dot x(t) = -Lx(t)$.

E7.7 Weight-balanced digraphs with a globally reachable node. Given a weighted directed graph $G$, show that if $G$ is weight-balanced and has a globally reachable node, then $G$ is strongly connected.
E7.8 The Lyapunov equation for the Laplacian matrix of a strongly-connected digraph. Let $L$ be the Laplacian matrix of a strongly-connected weighted digraph. Find a positive-definite matrix $P$ such that
(i) $PL + L^\top P$ is positive semidefinite, and
(ii) $(PL + L^\top P)\mathbb{1}_n = \mathbb{0}_n$.

E7.9 $H_2$ performance of balanced averaging in continuous time. Consider the continuous-time averaging dynamics with disturbance
$$\dot x(t) = -Lx(t) + w(t),$$
where $L = L^\top$ is the Laplacian matrix of an undirected and connected graph and $w(t)$ is an exogenous disturbance input signal. Pick a matrix $Q \in \mathbb{R}^{p\times n}$ satisfying $Q\mathbb{1}_n = \mathbb{0}_p$, and define the output signal $y(t) = Qx(t) \in \mathbb{R}^p$ along the solution from zero initial conditions $x(0) = \mathbb{0}_n$. Define the system $H_2$ norm from $w$ to $y$ by
$$\|H\|_2^2 = \int_0^\infty y(t)^\top y(t)\,dt = \int_0^\infty x(t)^\top Q^\top Qx(t)\,dt = \operatorname{trace}\Big(\int_0^\infty H(t)^\top H(t)\,dt\Big),$$
where $H(t) = Qe^{-Lt}$ is the so-called impulse response matrix.
(i) Show $\|H\|_2 = \sqrt{\operatorname{trace}(P)}$, where $P$ is the solution to the Lyapunov equation
$$LP + PL = Q^\top Q. \qquad \text{(E7.1)}$$
(ii) Show $\|H\|_2 = \sqrt{\operatorname{trace}(L^\dagger Q^\top Q)/2}$, where $L^\dagger$ is the pseudoinverse of $L$.
(iii) Define short-range and long-range output matrices $Q_{\text{sr}}$ and $Q_{\text{lr}}$ by $Q_{\text{sr}}^\top Q_{\text{sr}} = L$ and $Q_{\text{lr}}^\top Q_{\text{lr}} = I_n - \frac{1}{n}\mathbb{1}_n\mathbb{1}_n^\top$, respectively. Show:
$$\|H\|_2^2 = \begin{cases} \dfrac{n-1}{2}, & \text{for } Q = Q_{\text{sr}},\\[2mm] \dfrac{1}{2}\displaystyle\sum_{i=2}^n \frac{1}{\lambda_i(L)}, & \text{for } Q = Q_{\text{lr}}. \end{cases}$$
Hint: The $H_2$ norm has several interesting interpretations, including the total output signal energy in response to a unit impulse input, or the root mean square of the output signal in response to a white noise input with identity covariance. You may find Theorem 7.3 and Exercise E6.7 useful.

E7.10 Properties of saddle points. Prove Lemma 7.4.

E7.11 Absence of sustained oscillations in saddle matrices. Prove Lemma 7.5.
 Chapter 8
The Incidence Matrix and Relative Measurements

After studying adjacency and Laplacian matrices, in this chapter we introduce one final matrix associated with a graph: the incidence matrix. We study the properties of incidence matrices and their application to a class of estimation problems with relative measurements. For simplicity we restrict our attention to undirected graphs. We borrow ideas from (Barooah 2007; Barooah and Hespanha 2007; Bolognani et al. 2010; Piovan et al. 2013) and refer to Biggs (1994); Foulds (1995); Godsil and Royle (2001) for more information.
8.1
The incidence matrix
Let $G$ be an undirected unweighted graph with $n$ nodes and $m$ edges. Number the edges of $G$ with a unique $e \in \{1, \dots, m\}$ and assign an arbitrary direction to each edge. The (oriented) incidence matrix $B \in \mathbb{R}^{n\times m}$ of the graph $G$ is defined component-wise by
$$B_{ie} = \begin{cases} +1, & \text{if node } i \text{ is the source node of edge } e,\\ -1, & \text{if node } i \text{ is the sink node of edge } e,\\ 0, & \text{otherwise}. \end{cases} \qquad (8.1)$$
Note: $\mathbb{1}_n^\top B = \mathbb{0}_m^\top$, since each column of $B$ contains precisely one element equal to $+1$, one element equal to $-1$, and all other entries zero.
Note: if edge $e \in \{1, \dots, m\}$ is oriented from $i$ to $j$, then for any $x \in \mathbb{R}^n$,
$$(B^\top x)_e = x_i - x_j.$$
Example 8.1. Consider the graph depicted in the following figure.
[Figure: an undirected graph on nodes 1, 2, 3, 4 (left), and the same graph with its edges oriented and labeled e1, e2, e3, e4 (right).]
As depicted on the right, we add an orientation to all edges, order them, and label them as follows: $e_1 = (1,2)$, $e_2 = (2,3)$, $e_3 = (4,2)$, and $e_4 = (3,4)$. Accordingly, the incidence matrix is
$$B = \begin{bmatrix} +1 & 0 & 0 & 0\\ -1 & +1 & -1 & 0\\ 0 & -1 & 0 & +1\\ 0 & 0 & +1 & -1 \end{bmatrix}.$$
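Definition (8.1) translates directly into code. The following minimal sketch (the function name `incidence_matrix` is ours) builds $B$ from the list of oriented edges of Example 8.1. It then checks the two notes above: $(B^\top x)_e = x_{\text{source}} - x_{\text{sink}}$ and $\mathbb{1}_n^\top B = \mathbb{0}_m^\top$. The test vector `x` is arbitrary.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Oriented incidence matrix (8.1) for nodes 1..n and edges given as
    (source, sink) pairs: +1 in the source row, -1 in the sink row."""
    B = np.zeros((n, len(edges)))
    for e, (src, snk) in enumerate(edges):
        B[src - 1, e] = +1.0
        B[snk - 1, e] = -1.0
    return B

edges = [(1, 2), (2, 3), (4, 2), (3, 4)]  # e1, e2, e3, e4 of Example 8.1
B = incidence_matrix(4, edges)
print(B)
x = np.array([0.5, 1.0, 3.0, 2.0])
print(B.T @ x)          # (B^T x)_e = x_source - x_sink: [-0.5, -2.0, 1.0, 1.0]
print(np.ones(4) @ B)   # 1_n^T B = 0_m^T
```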
8.2
Properties of the incidence matrix
Given an undirected weighted graph G with edge set E and adjacency matrix A, recall L = D − A,
where D is the degree matrix.
Lemma 8.2 (From the incidence to the Laplacian matrix). If $\operatorname{diag}(\{a_e\}_{e\in E})$ is the diagonal matrix of edge weights, then $L = B\operatorname{diag}(\{a_e\}_{e\in E})B^\top$.

Proof. Recall that, for matrices $O$, $P$ and $Q$ of appropriate dimensions, we have $(OPQ)_{ij} = \sum_{k,h} O_{ik}P_{kh}Q_{hj}$. Moreover, if the matrix $P$ is diagonal, then $(OPQ)_{ij} = \sum_k O_{ik}P_{kk}Q_{kj}$. For $i \neq j$, we compute
$$\begin{aligned} (B\operatorname{diag}(\{a_e\}_{e\in E})B^\top)_{ij} &= \sum_{e=1}^m B_{ie}a_e(B^\top)_{ej} = \sum_{e=1}^m B_{ie}B_{je}a_e \qquad (\text{the } e\text{-th term is } 0 \text{ unless edge } e \text{ joins } i \text{ and } j)\\ &= (+1)\cdot(-1)\cdot a_{ij} = \ell_{ij}, \end{aligned}$$
where $L = \{\ell_{ij}\}_{i,j\in\{1,\dots,n\}}$; if no edge joins $i$ and $j$, both sides are zero. Along the diagonal we compute
$$(B\operatorname{diag}(\{a_e\}_{e\in E})B^\top)_{ii} = \sum_{e=1}^m B_{ie}^2 a_e = \sum_{\substack{e=1,\\ e=(i,*)\text{ or } e=(*,i)}}^m a_e = \sum_{j=1}^n a_{ij} = \ell_{ii}. \qquad □$$

Lemma 8.3 (Rank of the incidence matrix). Let $B$ be the incidence matrix of an undirected graph $G$ with $n$ nodes. Let $d$ be the number of connected components of $G$. Then $\operatorname{rank}(B) = n - d$.

Proof. We prove this result for a connected graph with $d = 1$; the proof strategy easily extends to $d > 1$. Recall that the rank of the Laplacian matrix $L$ equals $n - d = n - 1$. The Laplacian matrix can be factorized as $L = B\operatorname{diag}(\{a_e\}_{e\in E})B^\top$, where $\operatorname{diag}(\{a_e\}_{e\in E})$ has full rank $m$ (and $m \geq n - 1$ due to connectivity). Hence necessarily $\operatorname{rank}(B) \geq \operatorname{rank}(L) = n - 1$. On the other hand, $\operatorname{rank}(B) \leq n - 1$ since $B^\top\mathbb{1}_n = \mathbb{0}_m$. It follows that $B$ has rank $n - 1$. □
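Both lemmas can be checked numerically on the graph of Example 8.1. In this minimal sketch the edge weights $a_e = 1, 2, 3, 4$ are an illustrative assumption, and the helper names are ours. Appending an isolated fifth node creates $d = 2$ connected components, so Lemma 8.3 predicts rank $5 - 2 = 3$.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Oriented incidence matrix (8.1); edges are (source, sink) pairs on nodes 1..n."""
    B = np.zeros((n, len(edges)))
    for e, (src, snk) in enumerate(edges):
        B[src - 1, e], B[snk - 1, e] = 1.0, -1.0
    return B

def laplacian_from_edges(n, edges, weights):
    """L = D - A for the undirected weighted graph with the given edges and weights."""
    A = np.zeros((n, n))
    for (i, j), a in zip(edges, weights):
        A[i - 1, j - 1] = A[j - 1, i - 1] = a
    return np.diag(A.sum(axis=1)) - A

edges = [(1, 2), (2, 3), (4, 2), (3, 4)]   # Example 8.1
weights = [1.0, 2.0, 3.0, 4.0]             # illustrative edge weights a_e
B = incidence_matrix(4, edges)
L = laplacian_from_edges(4, edges, weights)
print(np.allclose(B @ np.diag(weights) @ B.T, L))  # Lemma 8.2: True
print(np.linalg.matrix_rank(B))                    # Lemma 8.3: n - d = 4 - 1 = 3

# An isolated fifth node gives d = 2 components, hence rank n - d = 5 - 2 = 3.
B5 = incidence_matrix(5, edges)
print(np.linalg.matrix_rank(B5))
```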
The factorization of the Laplacian matrix as $L = B\operatorname{diag}(\{a_e\}_{e\in E})B^\top$ plays an important role in relative sensing networks. For example, we can decompose the Laplacian flow $\dot x = -Lx$ into
• open-loop plant: $\dot x_i = u_i$, $i \in \{1, \dots, n\}$, or $\dot x = u$,
• measurements: $y_{ij} = x_i - x_j$, $\{i,j\} \in E$, or $y = B^\top x$,
• control gains: $z_{ij} = a_{ij}y_{ij}$, $\{i,j\} \in E$, or $z = \operatorname{diag}(\{a_e\}_{e\in E})y$,
• control inputs: $u_i = -\sum_{\{i,j\}\in E} z_{ij}$, $i \in \{1, \dots, n\}$, or $u = -Bz$.
Substituting gives $\dot x = -B\operatorname{diag}(\{a_e\}_{e\in E})B^\top x = -Lx$.
Indeed, this control structure, illustrated as a block-diagram in Figure 8.1, is required to implement flocking-type behavior as in Example 7.1. The control structure in Figure 8.1 has emerged as a canonical control structure in many relative sensing and flow network problems also for more complicated open-loop dynamics and possibly nonlinear control gains (Bai et al. 2011).
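[Block diagram: the plant $\dot x_i = u_i$ maps $u$ to $x$; $B^\top$ maps $x$ to the measurements $y$; the gains $a_{ij}$ map $y$ to $z$; $B$ maps $z$ back to the input $u$ through a negative summation.]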
Figure 8.1: Illustration of the canonical control structure for a relative sensing network.
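To make the sign convention of the loop explicit, the following minimal sketch implements Figure 8.1 on the graph of Example 8.1. The edge weights, initial condition, Euler step and function name are illustrative assumptions. The sketch does two things:
- it checks numerically that $u = -Bz = -B\operatorname{diag}(\{a_e\})B^\top x = -Lx$;
- it closes the loop with the plant $\dot x = u$ via forward-Euler steps, which drives all states to the average of the initial condition.

```python
import numpy as np

# Graph of Example 8.1; edges are (source, sink) pairs, weights a_e are illustrative.
edges = [(1, 2), (2, 3), (4, 2), (3, 4)]
a = np.array([1.0, 2.0, 3.0, 4.0])
n = 4
B = np.zeros((n, len(edges)))
for e, (src, snk) in enumerate(edges):
    B[src - 1, e], B[snk - 1, e] = 1.0, -1.0

def feedback_input(x):
    """One pass around the loop of Figure 8.1."""
    y = B.T @ x     # measurements: y_e = x_source - x_sink
    z = a * y       # control gains: z = diag(a) y
    return -B @ z   # control inputs: u = -B z (negative feedback)

x0 = np.array([1.0, 4.0, 7.0, 10.0])
L = B @ np.diag(a) @ B.T
print(np.allclose(feedback_input(x0), -L @ x0))   # the loop realizes u = -L x

# Close the loop with the plant xdot = u (forward Euler, illustrative step size).
x, h = x0.copy(), 0.01
for _ in range(5000):
    x = x + h * feedback_input(x)
print(x)   # approximately 5.5 in every entry, the average of x0
```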
8.3
Distributed estimation from relative measurements
In Chapter 1 we considered estimation problems for wireless sensor networks in which each node measures a scalar “absolute” quantity (expressing some environmental variable such as temperature, vibrations, etc.). In this section, we consider a second class of examples in which measurements are “relative,” i.e., pairs of nodes measure the difference between their corresponding variables. Estimation problems involving relative measurements are numerous. For example, imagine a group of robots (or sensors) where no robot can sense its position in an absolute reference frame, but each robot can measure the relative positions of other robots by means of on-board sensors. Similar problems arise in the study of clock synchronization in networks of processors.
8.3.1
Problem statement
The problem of optimal estimation based on relative measurements is stated as follows. As illustrated in Figure 8.2, we are given an undirected graph $G = (\{1, \dots, n\}, E)$ with the following properties.
First, each node $i \in \{1, \dots, n\}$ of the network is associated with an unknown scalar quantity $x_i$ (the x-coordinate of node $i$ in the figure). Second, the $m$ undirected edges are given an orientation and, for each edge $e = (i,j)$, $e \in E$, the following scalar measurement is available:
$$y_{(i,j)} = x_i - x_j + v_{(i,j)} = (B^\top x)_e + v_{(i,j)}.$$
Here $B$ is the graph incidence matrix. The measurement noises $v_{(i,j)}$, $(i,j) \in E$, are independent jointly-Gaussian variables with zero mean $\mathbb{E}[v_{(i,j)}] = 0$ and variance $\mathbb{E}[v_{(i,j)}^2] = \sigma_{(i,j)}^2 > 0$. The joint covariance matrix is the diagonal matrix $\Sigma = \operatorname{diag}(\{\sigma_{(i,j)}^2\}_{(i,j)\in E}) \in \mathbb{R}^{m\times m}$. The optimal estimate $\hat x^*$ of the unknown vector $x \in \mathbb{R}^n$ via the relative measurements $y \in \mathbb{R}^m$ is the solution to
$$\min_{\hat x} \|B^\top\hat x - y\|_{\Sigma^{-1}}^2,$$
where $\|v\|_{\Sigma^{-1}}^2 = v^\top\Sigma^{-1}v$.

[Figure: sensor nodes $i$ and $j$ in an absolute reference frame, with x-coordinates $x_i$ and $x_j$.]
Figure 8.2: A wireless sensor network in which sensors can measure each other’s relative distance and bearing. We assume that, for each link between node $i$ and node $j$, the relative distance along the x-axis $x_i - x_j$ is available, where $x_i$ is the x-coordinate of node $i$.
No absolute information is available about $x$: the cost is unchanged if the same constant is added to every entry of $\hat x$, because $B^\top\mathbb{1}_n = \mathbb{0}_m$. We therefore add the constraint that the optimal estimate should have zero mean, and summarize this discussion as follows.

Definition 8.4 (Optimal estimation based on relative measurements). Given an incidence matrix $B$ and a set of relative measurements $y$ with covariance $\Sigma$, find $\hat x$ solving
$$\min_{\hat x \perp \mathbb{1}_n} \|B^\top\hat x - y\|_{\Sigma^{-1}}^2. \qquad (8.2)$$

8.3.2
Optimal estimation via centralized computation
From the theory of least-squares estimation, the optimal solution to problem (8.2) is obtained by differentiating the quadratic cost function with respect to the unknown variable $\hat x$ and setting the derivative to zero. Specifically:
$$\mathbb{0}_n = \frac{\partial}{\partial\hat x}\|B^\top\hat x - y\|_{\Sigma^{-1}}^2 = 2B\Sigma^{-1}B^\top\hat x^* - 2B\Sigma^{-1}y.$$
The optimal solution is therefore obtained as the unique vector $\hat x^* \in \mathbb{R}^n$ satisfying
$$B\Sigma^{-1}B^\top\hat x^* = B\Sigma^{-1}y \iff L\hat x^* = B\Sigma^{-1}y, \qquad \mathbb{1}_n^\top\hat x^* = 0, \qquad (8.3)$$
where the Laplacian matrix $L$ is defined by $L = B\Sigma^{-1}B^\top$. This matrix is the Laplacian of the weighted graph whose edge weights are the inverse noise variances $1/\sigma_{(i,j)}^2$ associated to each relative measurement edge.

Before proceeding we review the definition and properties of the pseudoinverse Laplacian matrix given in Exercise E6.7. Recall that the Moore-Penrose pseudoinverse of an $n \times m$ matrix $M$ is the unique $m \times n$ matrix $M^\dagger$ with the following properties:
(i) $MM^\dagger M = M$,
(ii) $M^\dagger MM^\dagger = M^\dagger$, and
(iii) $MM^\dagger$ is symmetric and $M^\dagger M$ is symmetric.
For our Laplacian matrix $L$ of a connected graph, let $U \in \mathbb{R}^{n\times n}$ be an orthonormal matrix of eigenvectors of $L$. It is known that
$$L = U\begin{bmatrix} 0 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}U^\top \implies L^\dagger = U\begin{bmatrix} 0 & 0 & \cdots & 0\\ 0 & 1/\lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1/\lambda_n \end{bmatrix}U^\top.$$
Moreover, it is known that $LL^\dagger = L^\dagger L = I_n - \frac{1}{n}\mathbb{1}_n\mathbb{1}_n^\top$ and $L^\dagger\mathbb{1}_n = \mathbb{0}_n$.
Lemma 8.5 (Unique optimal estimate). If the undirected graph $G$ is connected, then
(i) there exists a unique solution to equations (8.3) solving the optimization problem in equation (8.2); and
(ii) this unique solution is given by $\hat x^* = L^\dagger B\Sigma^{-1}y$.
Proof. We first show that equation (8.3) has a unique solution.
- Since $G$ is connected, the rank of $L$ is $n - 1$.
- Since $L$ is symmetric and $L\mathbb{1}_n = \mathbb{0}_n$, the image of $L$ is the $(n-1)$-dimensional vector subspace orthogonal to the subspace spanned by $\mathbb{1}_n$.
- The vector $B\Sigma^{-1}y$ belongs to the image of $L$ because the column-sums of $B$ are zero: $\mathbb{1}_n^\top B = \mathbb{0}_m^\top$, so $\mathbb{1}_n^\top B\Sigma^{-1}y = 0$.
- Hence the solutions of $L\hat x = B\Sigma^{-1}y$ form a line parallel to $\mathbb{1}_n$. The requirement $\mathbb{1}_n^\top\hat x^* = 0$ makes $\hat x^*$ perpendicular to the kernel of $L$ and selects exactly one point of this line.

The expression $\hat x^* = L^\dagger B\Sigma^{-1}y$ follows by left-multiplying both sides of equation (8.3) by the pseudoinverse Laplacian matrix $L^\dagger$ and using $L^\dagger L = I_n - \frac{1}{n}\mathbb{1}_n\mathbb{1}_n^\top$ together with $\mathbb{1}_n^\top\hat x^* = 0$. One can also verify that $\mathbb{1}_n^\top L^\dagger B\Sigma^{-1}y = 0$, because $L^\dagger\mathbb{1}_n = \mathbb{0}_n$. □
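The centralized formula of Lemma 8.5 can be evaluated with `numpy.linalg.pinv`. In the minimal sketch below, the true values, noise variances and random seed are illustrative assumptions on the graph of Example 8.1, and `optimal_relative_estimate` is our own helper name.

As a cross-check, the sketch also uses `numpy.linalg.lstsq` to compute the minimum-norm solution of the weighted least-squares problem. That solution lies in $\operatorname{image}(B) = \operatorname{span}(\mathbb{1}_n)^\perp$, so it should coincide with $\hat x^*$.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Oriented incidence matrix (8.1); edges are (source, sink) pairs on nodes 1..n."""
    B = np.zeros((n, len(edges)))
    for e, (src, snk) in enumerate(edges):
        B[src - 1, e], B[snk - 1, e] = 1.0, -1.0
    return B

def optimal_relative_estimate(B, y, sigma2):
    """x_hat* = L^dagger B Sigma^{-1} y with L = B Sigma^{-1} B^T (Lemma 8.5)."""
    Sigma_inv = np.diag(1.0 / sigma2)
    L = B @ Sigma_inv @ B.T
    return np.linalg.pinv(L) @ B @ Sigma_inv @ y

rng = np.random.default_rng(0)
edges = [(1, 2), (2, 3), (4, 2), (3, 4)]            # graph of Example 8.1
B = incidence_matrix(4, edges)
x_true = np.array([1.0, -2.0, 0.5, 3.0])            # illustrative unknowns
sigma2 = np.array([0.1, 0.2, 0.1, 0.3])             # illustrative noise variances
y = B.T @ x_true + rng.normal(0.0, np.sqrt(sigma2))

x_hat = optimal_relative_estimate(B, y, sigma2)
print(x_hat, x_true - x_true.mean())  # close, up to measurement noise
print(x_hat.sum())                    # approximately 0: the constraint 1_n^T x_hat = 0

# Cross-check with the minimum-norm weighted least-squares solution.
W = np.diag(1.0 / np.sqrt(sigma2))
x_ls = np.linalg.lstsq(W @ B.T, W @ y, rcond=None)[0]
print(np.allclose(x_hat, x_ls))
```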
8.3.3
Optimal estimation via decentralized computation
To compute $\hat x^*$ in a distributed way, we propose the following distributed algorithm; see (Bolognani et al. 2010, Theorem 4). Pick a small $\alpha > 0$ and let each node implement the affine averaging algorithm:
$$\begin{aligned} \hat x_i(k+1) &= \hat x_i(k) - \alpha\sum_{j\in\mathcal{N}(i)}\frac{1}{\sigma_{(i,j)}^2}\big(\hat x_i(k) - \hat x_j(k) - y_{(i,j)}\big),\\ \hat x_i(0) &= 0. \end{aligned} \qquad (8.4)$$
There are two interpretations of this algorithm. First, the estimate at node $i$ is adjusted at each iteration as a function of edge errors. Each edge error (the difference between the estimated and the measured edge difference) contributes a small weighted correction to the node value. Second, the affine Laplacian flow
$$\dot{\hat x} = -L\hat x + B\Sigma^{-1}y \qquad (8.5)$$
results in a steady state satisfying $L\hat x = B\Sigma^{-1}y$, which readily delivers the optimal estimate $\hat x^* = L^\dagger B\Sigma^{-1}y$ for appropriately chosen initial conditions. The algorithm (8.4) results from an Euler discretization of the affine Laplacian flow (8.5) with step size $\alpha$.

Lemma 8.6. Consider a graph $G$ describing a relative measurement problem for the unknown variables $x \in \mathbb{R}^n$, with measurements $y \in \mathbb{R}^m$ and measurement covariance matrix $\Sigma = \operatorname{diag}(\{\sigma_{(i,j)}^2\}_{(i,j)\in E}) \in \mathbb{R}^{m\times m}$. The following statements hold:
(i) the affine averaging algorithm can be written as
$$\hat x(k+1) = (I_n - \alpha L)\hat x(k) + \alpha B\Sigma^{-1}y, \qquad \hat x(0) = \mathbb{0}_n; \qquad (8.6)$$
(ii) if $G$ is connected and $\alpha < 1/d_{\max}$, where $d_{\max}$ is the maximum weighted out-degree of $G$, then the solution $k \mapsto \hat x(k)$ of the affine averaging algorithm (8.4) converges to the unique solution $\hat x^*$ of the optimization problem (8.2).
Proof. To show fact (i), note that the algorithm can be written in vector form as
$$\hat x(k+1) = \hat x(k) - \alpha B\Sigma^{-1}(B^\top\hat x(k) - y),$$
and, using $L = B\Sigma^{-1}B^\top$, as equation (8.6).

To show fact (ii), define the error signal $\eta(k) = \hat x^* - \hat x(k)$. Note that $\eta(0) = \hat x^*$ and that $\operatorname{average}(\eta(0)) = 0$ because $\mathbb{1}_n^\top\hat x^* = 0$. Compute
$$\begin{aligned} \eta(k+1) &= (I_n - \alpha L + \alpha L)\hat x^* - (I_n - \alpha L)\hat x(k) - \alpha B\Sigma^{-1}y\\ &= (I_n - \alpha L)\eta(k) + \alpha(L\hat x^* - B\Sigma^{-1}y)\\ &= (I_n - \alpha L)\eta(k). \end{aligned}$$
Since $\alpha < 1/d_{\max}$, Exercise E7.4 implies that $I_n - \alpha L$ is nonnegative with strictly positive diagonal entries. Moreover, $I_n - \alpha L$ is symmetric and row-stochastic, hence doubly-stochastic, and its corresponding undirected graph is connected and aperiodic (because of the self-loops). Therefore, Corollary 5.1 implies that $\eta(k) \to \operatorname{average}(\eta(0))\mathbb{1}_n = \mathbb{0}_n$. □
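The node-wise update (8.4) can be sketched as follows. The measurement data `y`, the variances and the choice $\alpha = 0.9/d_{\max}$ are made-up illustrative values on the graph of Example 8.1, and `distributed_estimate` is our own helper name. Each edge error is applied with opposite signs at its two endpoints. This is exactly (8.4), because node $j$ sees the measurement $y_{(j,i)} = -y_{(i,j)}$. The result is compared with the centralized estimate of Lemma 8.5.

```python
import numpy as np

# Relative-measurement setup on the graph of Example 8.1 (illustrative numbers).
edges = [(1, 2), (2, 3), (4, 2), (3, 4)]   # (source i, sink j) pairs
n, m = 4, len(edges)
B = np.zeros((n, m))
for e, (i, j) in enumerate(edges):
    B[i - 1, e], B[j - 1, e] = 1.0, -1.0
sigma2 = np.array([0.1, 0.2, 0.1, 0.3])    # noise variances sigma^2_(i,j)
y = np.array([2.9, -2.6, 4.8, -2.4])       # measured x_i - x_j (made-up data)

def distributed_estimate(iters=2000):
    """Node-wise affine averaging algorithm (8.4), starting from x_hat(0) = 0."""
    L = B @ np.diag(1.0 / sigma2) @ B.T
    alpha = 0.9 / np.max(np.diag(L))       # alpha < 1/d_max (weighted degrees on the diagonal)
    x_hat = np.zeros(n)
    for _ in range(iters):
        new = x_hat.copy()
        for e, (i, j) in enumerate(edges):
            # weighted edge error: estimated minus measured difference
            err = (x_hat[i - 1] - x_hat[j - 1] - y[e]) / sigma2[e]
            new[i - 1] -= alpha * err      # correction at node i
            new[j - 1] += alpha * err      # correction at node j (reversed orientation)
        x_hat = new
    return x_hat, L

x_hat, L = distributed_estimate()
x_star = np.linalg.pinv(L) @ B @ np.diag(1.0 / sigma2) @ y  # centralized optimum
print(np.allclose(x_hat, x_star))
```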
8.4
Cycle and cutset spaces
As stated in the factorization in Lemma 8.2, the incidence matrix contains at least as much information as the Laplacian matrix. The following example shows that it contains additional information and subtleties. Recall the distributed estimation in Example 8.3, defined over an undirected ring graph. Introduce an arbitrary orientation for the edges and, for simplicity, assume all edges are oriented counterclockwise. Then, in the absence of noise, a summation of all measurements $y_{(i,j)}$ in this ring graph yields
$$\sum_{(i,j)\in E} y_{(i,j)} = \sum_{(i,j)\in E}(x_i - x_j) = 0 \qquad\text{or}\qquad \mathbb{1}_n^\top y = \mathbb{1}_n^\top B^\top x = 0,$$
that is, all relative measurements around the ring cancel out. (A ring with $n$ nodes has $m = n$ edges, so $y \in \mathbb{R}^n$.) Equivalently, $\mathbb{1}_n \in \operatorname{kernel}(B)$. This consistency check can be used as additional information to process corrupted measurements.

These insights generalize to arbitrary graphs: the nullspace of $B$ and its orthogonal complement, the image of $B^\top$, can be related to cycles and cutsets in the graph. In what follows, we present some of these generalizations; the presentation in this section is inspired by (Biggs 1994; Zelazo 2009). As a running example in this section we use the graph and the incidence matrix illustrated in Figure 8.3.
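[Figure: an undirected graph with six nodes and seven oriented, numbered edges.]
$$B = \begin{bmatrix} +1 & +1 & 0 & 0 & 0 & 0 & 0\\ -1 & 0 & +1 & 0 & 0 & 0 & 0\\ 0 & -1 & 0 & +1 & +1 & +1 & 0\\ 0 & 0 & -1 & -1 & 0 & 0 & +1\\ 0 & 0 & 0 & 0 & -1 & 0 & -1\\ 0 & 0 & 0 & 0 & 0 & -1 & 0 \end{bmatrix}$$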
Figure 8.3: An undirected graph with arbitrary edge orientation and its associated incidence matrix $B \in \mathbb{R}^{6\times 7}$.
Definition 8.7 (Signed path vector). Given an undirected graph $G$, consider an arbitrary orientation of its $m$ edges and a simple path $\gamma$. The signed path vector $v \in \mathbb{R}^m$ corresponding to the path $\gamma$ is defined by
$$v_i = \begin{cases} +1, & \text{if edge } i \text{ is traversed positively by } \gamma,\\ -1, & \text{if edge } i \text{ is traversed negatively by } \gamma,\\ 0, & \text{otherwise}. \end{cases}$$
Proposition 8.8 (Cycle space). Given an undirected graph G, consider an arbitrary orientation of its edges and the incidence matrix B ∈ Rn×m . The null space of B, called the cycle space, is spanned by the signed path vectors corresponding to all the cycles in G. The proposition follows from the following lemma. Lemma 8.9. Given an undirected graph G, consider an arbitrary orientation of its edges, its incidence matrix B ∈ Rn×m , and a simple path γ with distinct initial and final nodes described by a signed path vector v ∈ Rm . Lectures on Network Systems, F. Bullo Version v0.81 (4 Jan 2016). Draft not for circulation. Copyright © 2012-16.
Chapter 8. The Incidence Matrix and Relative Measurements
The vector $y = Bv$ has components

$$y_i = \begin{cases} +1, & \text{if node } i \text{ is the initial node of } \gamma,\\ -1, & \text{if node } i \text{ is the final node of } \gamma,\\ 0, & \text{otherwise.}\end{cases}$$
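Proof. We write $y = B\,\mathrm{diag}(v)1_m$. The $(i,e)$ element of the matrix $B\,\mathrm{diag}(v)$ takes the value −1 (respectively +1) if edge e is used by the path γ to enter (respectively leave) node i. Now, if node i is not the initial or final node of the path γ, then γ either does not visit node i or enters it exactly once and leaves it exactly once, so that the ith row sum of $B\,\mathrm{diag}(v)$, namely $(B\,\mathrm{diag}(v)1_m)_i$, is zero. For the initial node $(B\,\mathrm{diag}(v)1_m)_i = 1$, and for the final node $(B\,\mathrm{diag}(v)1_m)_i = -1$. ∎ If instead γ is a cycle, its initial and final nodes coincide, these two contributions cancel, and $Bv = 0_n$; hence the signed path vector of every cycle belongs to kernel(B). For the example in Figure 8.3, two cycles and their signed path vectors are illustrated in Figure 8.4. Observe that $v_1, v_2 \in \mathrm{kernel}(B)$ and the cycle traversing the edges (1, 3, 7, 5, 2) in counter-clockwise orientation has a signed path vector given by the linear combination $v_1 + v_2$.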
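[Figure 8.4 diagram: the graph of Figure 8.3 with two highlighted cycles, $v_1$ through nodes 1, 2, 4, 3 and $v_2$ through nodes 3, 4, 5]

$$\mathrm{kernel}(B) = \mathrm{span}\left(\begin{bmatrix}+1\\-1\\+1\\-1\\0\\0\\0\end{bmatrix}, \begin{bmatrix}0\\0\\0\\+1\\-1\\0\\+1\end{bmatrix}\right) = \mathrm{span}(v_1, v_2)$$

Figure 8.4: Two cycles and their respective signed path vectors in kernel(B).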
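The claims about Figure 8.3 and Figure 8.4 can be checked numerically. The sketch below assumes NumPy and writes out the incidence matrix with rows indexed by nodes 1 to 6 and columns by edges 1 to 7 (edge orientations 1→2, 1→3, 2→4, 3→4, 3→5, 3→6, 4→5, as read from the matrix in Figure 8.3); indices in the code are 0-based, and the open path used for Lemma 8.9 is an illustrative choice.

```python
import numpy as np

# Incidence matrix of Figure 8.3: rows = nodes 1..6, columns = edges 1..7,
# +1 at the source and -1 at the sink of each oriented edge.
B = np.array([
    [+1, +1,  0,  0,  0,  0,  0],
    [-1,  0, +1,  0,  0,  0,  0],
    [ 0, -1,  0, +1, +1, +1,  0],
    [ 0,  0, -1, -1,  0,  0, +1],
    [ 0,  0,  0,  0, -1,  0, -1],
    [ 0,  0,  0,  0,  0, -1,  0],
], dtype=float)
n, m = B.shape

# Signed path vectors of the two cycles in Figure 8.4
v1 = np.array([+1, -1, +1, -1, 0, 0, 0], dtype=float)
v2 = np.array([0, 0, 0, +1, -1, 0, +1], dtype=float)

rank = np.linalg.matrix_rank(B)
print(rank, m - rank)                # connected graph: rank n-1, so dim kernel(B) = m-n+1
print(np.allclose(B @ v1, 0), np.allclose(B @ v2, 0), np.allclose(B @ (v1 + v2), 0))

# Lemma 8.9 on an open path: 1 -> 2 -> 4 uses edges 1 and 3, both traversed positively
path = np.array([+1, 0, +1, 0, 0, 0, 0], dtype=float)
print(B @ path)                      # +1 at node 1 (initial), -1 at node 4 (final)
```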
Definition 8.10 (Cutset orientation vector). Given an undirected graph G, consider an arbitrary orientation of its edges and a partition of its vertices V into two non-empty and disjoint sets $V_1$ and $V_2$. The cutset orientation vector $v \in \mathbb{R}^m$ corresponding to the partition $V = V_1 \cup V_2$ has components

$$v_e = \begin{cases} +1, & \text{if edge } e \text{ has its source node in } V_1 \text{ and sink node in } V_2,\\ -1, & \text{if edge } e \text{ has its sink node in } V_1 \text{ and source node in } V_2,\\ 0, & \text{otherwise.}\end{cases}$$
Proposition 8.11 (Cutset space). Given an undirected graph G, consider an arbitrary orientation of its edges and its incidence matrix $B \in \mathbb{R}^{n\times m}$. The image of $B^\top$, called the cutset space, is spanned by all cutset orientation vectors corresponding to all partitions of the graph. Proof. For a cutset orientation vector $v \in \mathbb{R}^m$ associated to the partition $V = V_1 \cup V_2$, we have

$$v^\top = \frac{1}{2}\Big(\sum_{i\in V_1} b_i^\top - \sum_{i\in V_2} b_i^\top\Big),$$

where $b_i^\top$ is the ith row of the incidence matrix. If $Bx = 0_n$ for some $x \in \mathbb{R}^m$, then $b_i^\top x = 0$ for all $i \in \{1,\dots,n\}$. It follows that $v^\top x = 0$, or equivalently, v belongs to the orthogonal complement of
 8.4. Cycle and cutset spaces
kernel(B), which is the image of $B^\top$. Finally, notice that the image of $B^\top$ can be constructed this way: the kth column of $B^\top$ is obtained by choosing the partition $V_1 = \{k\}$ and $V_2 = V \setminus \{k\}$. Thus, the cutset orientation vectors span the image of $B^\top$. ∎ If G is connected, then $\mathrm{rank}(B) = n-1$ and any $n-1$ columns of the matrix $B^\top$ form a basis for the cutset space. For instance, the ith column corresponds to the cut isolating node i as $V = \{i\} \cup (V \setminus \{i\})$. For the example in Figure 8.3, five cuts and their cutset orientation vectors are illustrated in Figure 8.5. Observe that $v_i \in \mathrm{image}(B^\top)$, for $i \in \{1,\dots,5\}$, and the cut isolating node 6 has a cutset orientation vector given by the linear combination $-(v_1+v_2+v_3+v_4+v_5)$. Likewise, the cut separating nodes {1, 2, 3} from {4, 5, 6} has the cutset vector $v_1+v_2+v_3$, corresponding to the sum of the first three columns of $B^\top$.
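[Figure 8.5 diagram: the graph of Figure 8.3 with five cuts $v_1, \dots, v_5$, each isolating one of the nodes 1 to 5]

$$\mathrm{image}(B^\top) = \mathrm{span}\left(\begin{bmatrix}+1\\+1\\0\\0\\0\\0\\0\end{bmatrix}, \begin{bmatrix}-1\\0\\+1\\0\\0\\0\\0\end{bmatrix}, \begin{bmatrix}0\\-1\\0\\+1\\+1\\+1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\-1\\-1\\0\\0\\+1\end{bmatrix}, \begin{bmatrix}0\\0\\0\\0\\-1\\0\\-1\end{bmatrix}\right) = \mathrm{span}(v_1, v_2, v_3, v_4, v_5)$$

Figure 8.5: Five cuts and their cutset orientation vectors in $\mathrm{image}(B^\top)$.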
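A sketch of Definition 8.10 on the same example, assuming NumPy and the incidence matrix of Figure 8.3 written out row by row; the function name `cutset_vector` is hypothetical and node indices in the code are 0-based. It builds each cutset vector directly from edge endpoints and compares it with sums of rows of B (that is, columns of $B^\top$) and with the cycle vectors of Figure 8.4.

```python
import numpy as np

B = np.array([
    [+1, +1,  0,  0,  0,  0,  0],
    [-1,  0, +1,  0,  0,  0,  0],
    [ 0, -1,  0, +1, +1, +1,  0],
    [ 0,  0, -1, -1,  0,  0, +1],
    [ 0,  0,  0,  0, -1,  0, -1],
    [ 0,  0,  0,  0,  0, -1,  0],
], dtype=float)

def cutset_vector(B, V1):
    """Cutset orientation vector (Definition 8.10) of the partition V1 vs. the remaining nodes."""
    n, m = B.shape
    in_V1 = np.zeros(n, dtype=bool)
    in_V1[list(V1)] = True
    v = np.zeros(m)
    for e in range(m):
        src = int(np.flatnonzero(B[:, e] > 0)[0])   # source node of edge e
        snk = int(np.flatnonzero(B[:, e] < 0)[0])   # sink node of edge e
        if in_V1[src] and not in_V1[snk]:
            v[e] = +1.0
        elif in_V1[snk] and not in_V1[src]:
            v[e] = -1.0
    return v

v123 = cutset_vector(B, {0, 1, 2})                      # V1 = {1, 2, 3}, V2 = {4, 5, 6}
print(v123)                                             # edges 3, 4, 5, 6 cross the cut
print(np.allclose(v123, B[0] + B[1] + B[2]))            # equals v1 + v2 + v3
v6 = cutset_vector(B, {5})                              # cut isolating node 6
print(np.allclose(v6, -(B[0] + B[1] + B[2] + B[3] + B[4])))

# Cutset vectors are orthogonal to the cycle space kernel(B)
v1_cycle = np.array([+1, -1, +1, -1, 0, 0, 0], dtype=float)
v2_cycle = np.array([0, 0, 0, +1, -1, 0, +1], dtype=float)
print(v123 @ v1_cycle, v123 @ v2_cycle)                 # both zero
```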
Example 8.12 (Nonlinear network flow problem). Consider a network flow problem where a commodity (e.g., power or water) is transported through a network (e.g., a power grid or a piping system). We model this scenario with an undirected and connected graph with n nodes. With each node we associate an external supply/demand variable $y_i$ (positive for a source and negative for a sink) and assume that the overall network is balanced: $\sum_{i=1}^n y_i = 0$. We also associate a potential variable $x_i$ with every node (e.g., voltage or pressure), and assume the flow of commodity between two connected nodes i and j depends on the potential difference as $f_{ij}(x_i - x_j)$, where $f_{ij}$ is a strictly increasing function satisfying $f_{ij}(0) = 0$. For example, for piping systems and power grids these functions $f_{ij}$ are given by the rational Hazen-Williams flow and the trigonometric power flow, which are both monotone in the region of interest. By balancing the flow at each node (akin to Kirchhoff's current law), we obtain at node i

$$y_i = \sum_{j=1}^n a_{ij} f_{ij}(x_i - x_j), \quad i \in \{1,\dots,n\},$$

where $a_{ij} \in \{0,1\}$ is the (i, j) element of the network adjacency matrix. In vector notation, the flow balance is

$$y = B f(B^\top x),$$

where $f: \mathbb{R}^{|E|} \to \mathbb{R}^{|E|}$ is the vector-valued function with components $f_{ij}$. Consider also the associated linearized problem $y = BB^\top x = Lx$, where L is the network Laplacian matrix and where we implicitly assumed $f_{ij}'(0) = 1$. The flows in the linear problem are obtained as $B^\top x^\star = B^\top L^\dagger y$, where $L^\dagger$ is the Moore–Penrose inverse of L; see Exercises E6.7 and E6.8. In the following we restrict ourselves to an acyclic network and show that the nonlinear solution can be obtained from the solution of the linear problem.

We formally replace the flow $f(B^\top x)$ by a new variable $v := f(B^\top x)$ and arrive at

$$y = Bv, \qquad (8.7a)$$
$$v = f(B^\top x). \qquad (8.7b)$$

In the acyclic case, $\mathrm{kernel}(B) = \{0_m\}$ and necessarily $v \in \mathrm{image}(B^\top)$, or $v = B^\top w$ for some $w \in \mathbb{R}^n$. Thus, equation (8.7a) reads as $y = Bv = BB^\top w = Lw$ and a solution is $w = L^\dagger y$ (any other solution differs by a multiple of $1_n$, which does not change $B^\top w$). Equation (8.7b) then reads as $f(B^\top x) = v = B^\top w = B^\top L^\dagger y$, and its unique solution (due to monotonicity) is $B^\top x^\star = f^{-1}(B^\top L^\dagger y)$. •
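A sketch of this construction on a small tree, assuming NumPy. The edge list, the balanced supplies y and the choice $f_{ij} = \sinh$ on every edge are hypothetical: sinh is strictly increasing, satisfies $f_{ij}(0) = 0$ and $f_{ij}'(0) = 1$, and has the closed-form inverse arcsinh, but it is not the Hazen-Williams or power-flow law.

```python
import numpy as np

# Hypothetical tree on 5 nodes (0-based); edge (i, j) is oriented from i to j.
edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
n, m = 5, len(edges)
B = np.zeros((n, m))
for e, (i, j) in enumerate(edges):
    B[i, e], B[j, e] = 1.0, -1.0

y = np.array([1.0, 0.5, -0.2, -0.8, -0.5])   # balanced supplies: sum(y) = 0
L = B @ B.T

# Linear problem: w = L^+ y and flows v = B^T w (v is determined since kernel(B) = {0}).
w = np.linalg.pinv(L) @ y
v = B.T @ w

# Nonlinear problem with f = sinh: potential differences B^T x* = f^{-1}(v).
dx = np.arcsinh(v)
x_star, *_ = np.linalg.lstsq(B.T, dx, rcond=None)   # potentials, defined up to a constant

print(v)                                              # flows of the linear problem
print(np.max(np.abs(B @ np.sinh(B.T @ x_star) - y)))  # residual of y = B f(B^T x), near zero
```

Any other strictly increasing flow law with a computable inverse can be substituted for `np.sinh` and `np.arcsinh`; the acyclic assumption is what guarantees that the linear flows v already lie in $\mathrm{image}(B^\top)$.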
8.5 Exercises
E8.1
Continuous distributed estimation from relative measurements. Consider the continuous distributed estimation algorithm given by the affine Laplacian flow (8.5). Show that for an undirected and connected graph G and appropriate initial conditions $\hat{x}(0) = 0_n$, the affine Laplacian flow (8.5) converges to the unique solution $\hat{x}^*$ of the estimation problem given in Lemma 8.5.
E8.2
The edge Laplacian matrix (Zelazo and Mesbahi 2011). For an unweighted undirected graph with n nodes and m edges, introduce an arbitrary orientation for the edges and recall the notions of incidence matrix $B \in \mathbb{R}^{n\times m}$ and Laplacian matrix $L = BB^\top \in \mathbb{R}^{n\times n}$. Next, define the edge Laplacian matrix $L_{\text{edge}} = B^\top B \in \mathbb{R}^{m\times m}$ and show that
(i) $\mathrm{kernel}(L_{\text{edge}}) = \mathrm{kernel}(B)$;
(ii) for an acyclic graph $L_{\text{edge}}$ is nonsingular;
(iii) the non-zero eigenvalues of $L_{\text{edge}}$ are equal to the non-zero eigenvalues of L, and
(iv) $\mathrm{rank}(L) = \mathrm{rank}(L_{\text{edge}})$.
E8.3
Evolution of the local disagreement error. Consider the Laplacian flow $\dot x = -Lx$, $x(0) = x_0$, defined over an undirected and connected graph with n nodes and m edges. Besides the absolute disagreement error $\delta = x - \mathrm{average}(x_0)1_n \in \mathbb{R}^n$ considered thus far, we can also analyze the relative disagreement error $e_{ij} = x_i - x_j$, for $\{i,j\} \in E$. (i) Write a differential equation for $e \in \mathbb{R}^m$, and (ii) based on Exercise E8.2, show that the relative disagreement errors converge to zero with exponential convergence rate given by the algebraic connectivity $\lambda_2(L)$.
E8.4
Averaging with distributed integral control. Consider a Laplacian flow implemented as a relative sensing network over a connected and undirected graph with incidence matrix $B \in \mathbb{R}^{n\times |E|}$ and weights $a_{ij} > 0$ for $\{i,j\} \in E$, and subject to a constant disturbance term $\eta \in \mathbb{R}^{|E|}$, as shown in Figure E8.1.
[Figure E8.1 block diagram; recoverable signal and block labels: u, $\dot x_i = u_i$, x, $B^\top$, η, z, $a_{ij}$, y, B]
Figure E8.1: A relative sensing network with a constant disturbance input $\eta \in \mathbb{R}^{|E|}$.
(i) Derive the dynamic closed-loop equations describing the model in Figure E8.1.
(ii) Show that asymptotically all states x(t) converge to some constant vector $x^* \in \mathbb{R}^n$ depending on the value of the disturbance η, i.e., $x^*$ is not necessarily a consensus state.
Consider the system in Figure E8.1 with a distributed integral controller forcing convergence to consensus, as shown in Figure E8.2. Recall that $1/s$ is the Laplace symbol for the integrator.
[Figure E8.2 block diagram; recoverable signal and block labels: u, $\dot x_i = u_i$, x, $B^\top$, η, p, z, $a_{ij}$, y, $1/s$, B]
Figure E8.2: Relative sensing network with a disturbance $\eta \in \mathbb{R}^{|E|}$ and distributed integral action.
(iii) Derive the dynamic closed-loop equations describing the model in Figure E8.2.
(iv) Show that the distributed integral controller in Figure E8.2 asymptotically stabilizes the set of steady states $(x^*, p^*)$, where $x^* \in \mathrm{span}(1_n)$ corresponds to consensus.
Hint: To show stability, use Lemma 7.5.
Chapter 9
Compartmental and Positive Systems

This chapter is inspired by the excellent text (Walter and Contreras 1999) and the tutorial treatment in (Jacquez and Simon 1993); see also the texts (Farina and Rinaldi 2000; Haddad et al. 2010; Luenberger 1979). Before discussing compartmental systems, it is convenient to introduce a useful definition. A square matrix A is said to be Metzler if all its off-diagonal elements are nonnegative. In other words, A is Metzler if and only if there exists a scalar a > 0 such that $A + aI_n$ is nonnegative. For example, if G is a weighted digraph and L is its Laplacian matrix, then −L is a Metzler matrix with zero row-sums.
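A small sketch of the Metzler test and of the example −L, assuming NumPy; the helper `is_metzler` and the digraph weights in W are hypothetical. The shift a is one valid choice of the scalar in the definition: one unit larger than the largest magnitude of a negative diagonal entry.

```python
import numpy as np

def is_metzler(A):
    """True if every off-diagonal entry of the square matrix A is nonnegative."""
    A = np.asarray(A, dtype=float)
    off_diagonal = A[~np.eye(A.shape[0], dtype=bool)]
    return bool(np.all(off_diagonal >= 0))

# Hypothetical weighted digraph with adjacency matrix W and Laplacian L = diag(W 1_n) - W
W = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0],
              [3.0, 0.0, 0.0]])
L = np.diag(W.sum(axis=1)) - W
M = -L

print(is_metzler(M), is_metzler(L))       # -L is Metzler; L is not (W has positive entries)
print(np.allclose(M.sum(axis=1), 0.0))    # zero row sums
a = 1.0 + max(0.0, -np.diag(M).min())     # a > 0 large enough to offset the diagonal
print(np.all(M + a * np.eye(3) >= 0))     # M + a I_n is nonnegative
```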
9.1 Introduction and example systems
Compartmental systems model dynamical processes characterized by conservation laws (e.g., mass, fluid, energy) and by the flow of material between units known as compartments. Example compartmental systems are transportation networks, queueing networks, communication networks, epidemic propagation models in social contact networks, as well as ecological and biological networks. We review some examples in what follows.

Ecological and environmental systems
The flow of energy and nutrients (water, nitrates, phosphates, etc.) in ecosystems is typically studied using compartmental modelling. For example, Figure 9.1 illustrates a widely-cited water flow model for a desert ecosystem (Noy-Meir 1973). Other classic ecological network systems include models for dissolved oxygen in streams, nutrient flow in forest growth and biomass flow in fisheries (Walter and Contreras 1999).

Epidemiology of infectious diseases
To study the propagation of infectious diseases, the population at risk is typically divided into compartments consisting of individuals who are susceptible (S), infected (I), and, possibly, recovered and no longer susceptible (R). As illustrated in Figure 9.2, the three basic epidemiological models (Hethcote 2000) are called SI, SIS, and SIR, depending upon how the disease spreads. A detailed discussion is postponed until Chapter 16.
[Figure 9.1 diagram: compartments soil, plants and animals; flows labeled precipitation; evaporation, drainage, runoff; uptake; transpiration; drinking; herbivory; evaporation]
Figure 9.1: Water flow model for a desert ecosystem. The blue line denotes an inflow from the outside environment. The red lines denote outflows into the outside environment.

[Figure 9.2 diagram: compartments Susceptible and Infected (SI and SIS models) and Susceptible, Infected and Recovered (SIR model)]
Figure 9.2: The three basic models SI, SIS and SIR for the propagation of an infectious disease
Drug and chemical kinetics in biomedical systems
Compartmental models are also widely adopted to characterize the kinetics of drugs and chemicals in biomedical systems. Here is a classic example (Charkes et al. 1978) from nuclear medicine: bone scintigraphy (also called bone scan) is a medical test in which the patient is injected with a small amount of radioactive material and then scanned with an appropriate radiation camera.
[Figure 9.3 diagram; recoverable labels: radioactive material, blood, rest of the body, kidneys, urine, bone ECF, bone]
Figure 9.3: The kinetics of a radioactive isotope through the human body (ECF = extra-cellular fluid).
9.2 Compartmental systems
A compartmental system is a system in which material is stored at individual locations and is transferred along the edges of a directed graph, called the compartmental digraph; see Figure 9.4(b). The “storage” nodes are referred to as compartments; each compartment contains a time-varying quantity $q_i(t)$. Each directed arc (i, j) represents a mass flow (or flux), denoted $F_{i\to j}$, from compartment i to compartment j. The compartmental system also interacts with its surrounding environment via appropriate input and output arcs, denoted in the figure by blue and red arcs, respectively: the inflow from the environment into compartment i is denoted by $u_i$ and the outflow out of compartment i into the environment is denoted by $F_{i\to 0}$.

[Figure 9.4 diagram: (a) a single compartment $q_i$ with inflow $u_i$, flows $F_{i\to j}$ and $F_{j\to i}$, and outflow $F_{i\to 0}$; (b) a system with compartments $q_1, \dots, q_4$, inflows $u_1$ and $u_3$, flows $F_{1\to 2}$, $F_{2\to 3}$, $F_{2\to 4}$, $F_{3\to 2}$, $F_{4\to 3}$, and outflows $F_{2\to 0}$ and $F_{4\to 0}$]
Figure 9.4: A compartment and a compartmental system

The dynamic equations describing the system evolution are obtained from the instantaneous flow balance at each compartment, that is, the rate of accumulation at each compartment equals the net inflow rate:

$$\dot q_i(t) = \sum_{j=1, j\neq i}^n \big(F_{j\to i} - F_{i\to j}\big) - F_{i\to 0} + u_i. \qquad (9.1)$$
In general, the flow along (i, j) is a function of the entire system state $q = (q_1,\dots,q_n)$ and of time t, so that $F_{i\to j} = F_{i\to j}(q,t)$. Note that the mass in each of the compartments as well as the mass flowing along each of the edges must be nonnegative at all times. Specifically, we require the mass flow functions to satisfy

$$F_{i\to j}(q,t) \geq 0 \text{ for all } (q,t), \quad\text{and}\quad F_{i\to j}(q,t) = 0 \text{ for all } (q,t) \text{ such that } q_i = 0. \qquad (9.2)$$
Under these conditions (and with nonnegative inflows $u_i \geq 0$), if at some time $t_0$ one of the compartments has no mass, that is, $q_i(t_0) = 0$ and $q(t_0) \in \mathbb{R}^n_{\geq 0}$, it follows that $\dot q_i(t_0) = \sum_{j=1,j\neq i}^n F_{j\to i}(q(t_0),t_0) + u_i \geq 0$, so that $q_i$ does not become negative. In summary, the compartmental system (9.1) is called positive in the sense that $q(t) \in \mathbb{R}^n_{\geq 0}$ for all $t \geq 0$, provided $q(0) \in \mathbb{R}^n_{\geq 0}$.

If $M(q) = \sum_{i=1}^n q_i = 1_n^\top q$ denotes the total mass in the system, then along the solutions of (9.1)

$$\frac{d}{dt} M(q(t)) = 1_n^\top \dot q(t) = -\underbrace{\sum_{i=1}^n F_{i\to 0}(q(t),t)}_{\text{outflow into environment}} + \underbrace{\sum_{i=1}^n u_i}_{\text{inflow from environment}}. \qquad (9.3)$$
This equality implies that the total mass $t \mapsto M(q(t))$ is constant in systems without inflows and outflows.
Linear compartmental systems
Intuitively speaking, a compartmental system is linear if it has constant nonnegative inflow from the environment, and nonnegative flows (between compartments and from compartments into the environment) that depend linearly upon the mass in the originating compartment. Definition 9.1 (Linear compartmental system and digraph). A linear compartmental system with n compartments is a triplet $(F, f_0, u)$ consisting of (i) a nonnegative $n \times n$ matrix $F = (f_{ij})_{i,j\in\{1,\dots,n\}}$ with zero diagonal, called the flow rate matrix,
(ii) a vector f0 ≥ 0n , called the outflow rates vector, and
(iii) a vector u ≥ 0n , called the inflow vector.
The weighted digraph associated to F, called the compartmental digraph, encodes the following information: the nodes are the compartments {1, …, n}, there is an edge (i, j) if there is a flow from compartment i to compartment j, and the weight $f_{ij}$ of the (i, j) edge is the corresponding flow rate constant. The compartmental digraph has no self-loops. In a linear compartmental system,

$$F_{i\to j}(q,t) = f_{ij}\, q_i, \quad F_{i\to 0}(q,t) = f_{0i}\, q_i, \quad \text{for } j \in \{1,\dots,n\}, \quad\text{and}\quad u_i(q,t) = u_i.$$

This model is also referred to as donor-controlled flow. Note that this model satisfies the physically meaningful relationships (9.2). The affine dynamics describing the linear compartmental system are

$$\dot q_i(t) = -\Big(f_{0i} + \sum_{j=1,j\neq i}^n f_{ij}\Big) q_i(t) + \sum_{j=1,j\neq i}^n f_{ji}\, q_j(t) + u_i. \qquad (9.4)$$
Definition 9.2 (Compartmental matrix). The compartmental matrix $C = (c_{ij})_{i,j\in\{1,\dots,n\}}$ of a compartmental system $(F, f_0, u)$ is defined by

$$c_{ij} = \begin{cases} f_{ji}, & \text{if } i \neq j,\\ -f_{0i} - \sum_{h=1,h\neq i}^n f_{ih}, & \text{if } i = j.\end{cases}$$

Equivalently, if $L_F = \mathrm{diag}(F1_n) - F$ is the Laplacian matrix of the compartmental digraph,

$$C = -L_F^\top - \mathrm{diag}(f_0) = F^\top - \mathrm{diag}(F1_n + f_0). \qquad (9.5)$$
In what follows it is convenient to call compartmental any matrix C with the following properties: (i) C is Metzler, i.e., its off-diagonal elements are nonnegative: cij ≥ 0,
(ii) C has nonpositive diagonal entries: cii ≤ 0, and
(iii) C is column diagonally dominant in the sense that $|c_{ii}| \geq \sum_{j=1,j\neq i}^n c_{ji}$.
With the notion of compartmental matrix, the dynamics of the linear compartmental system (9.4) can be written as

$$\dot q(t) = Cq(t) + u. \qquad (9.6)$$

Moreover, since $L_F 1_n = 0_n$, we know $1_n^\top C = -f_0^\top$ and, consistently with equation (9.3), $\frac{d}{dt} M(q(t)) = -f_0^\top q(t) + 1_n^\top u$.
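A sketch of equation (9.5), assuming NumPy; the function name `compartmental_matrix` and the three-compartment data (flows 1→2, 2→3, 3→1, outflow only from compartment 3, inflow only into compartment 1) are hypothetical. The checks correspond to $1_n^\top C = -f_0^\top$, the three defining properties of a compartmental matrix, and the spectral property of Lemma 9.3 stated next.

```python
import numpy as np

def compartmental_matrix(F, f0):
    """C = F^T - diag(F 1_n + f0), following equation (9.5)."""
    F = np.asarray(F, dtype=float)
    return F.T - np.diag(F.sum(axis=1) + np.asarray(f0, dtype=float))

F = np.array([[0.0, 2.0, 0.0],     # f_12 = 2
              [0.0, 0.0, 1.0],     # f_23 = 1
              [0.5, 0.0, 0.0]])    # f_31 = 0.5
f0 = np.array([0.0, 0.0, 0.3])     # outflow rate from compartment 3
u = np.array([1.0, 0.0, 0.0])      # inflow into compartment 1

C = compartmental_matrix(F, f0)
off = C - np.diag(np.diag(C))
print(np.allclose(np.ones(3) @ C, -f0))               # 1_n^T C = -f0^T
print(np.all(off >= 0), np.all(np.diag(C) <= 0))      # Metzler, nonpositive diagonal
print(np.all(np.abs(np.diag(C)) >= off.sum(axis=0)))  # column diagonal dominance
print(np.linalg.eigvals(C).real.max())                # negative here, so C is Hurwitz
```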
Algebraic and graphical properties of linear compartmental systems
In this section we present useful properties of compartmental matrices, which parallel those enjoyed by Laplacian matrices. Lemma 9.3 (Spectral properties of compartmental matrices). For a compartmental system $(F, f_0, u)$ with compartmental matrix C, (i) if $\lambda \in \mathrm{spec}(C)$, then either $\lambda = 0$ or $\Re(\lambda) < 0$, and
(ii) C is invertible if and only if C is Hurwitz (i.e., $\Re(\lambda) < 0$ for all $\lambda \in \mathrm{spec}(C)$). Proof. We here sketch the proof and invite the reader to fill out the details in Exercise E9.1. Statement (i) is akin to the result in Lemma 6.3 and can be proved by an application of the Geršgorin Disks Theorem 2.9. Statement (i) immediately implies statement (ii). ∎ Next, we introduce some useful graph-theoretical notions, illustrated in Figure 9.5. In the compartmental digraph, a set of compartments S is (i) outflow-connected if there exists a directed path from every compartment in S to the environment, that is, to a compartment j with a positive flow rate constant $f_{0j} > 0$, (ii) a trap if there is no directed path from any of the compartments in S to the environment or to any compartment outside S, and (iii) a simple trap if it is a trap that contains no smaller trap. It is immediate to realize the following equivalence: the system is outflow-connected (i.e., all compartments are outflow-connected) if and only if the system contains no trap. (An outflow-connected compartmental matrix is referred to as “weakly chained diagonally dominant” in (Shivakumar et al. 1996) and related references.) Theorem 9.4 (Algebraic graph theory of compartmental systems). Consider the linear compartmental system $(F, f_0, u)$ with dynamics (9.6), compartmental matrix C and compartmental digraph $G_F$. The following statements are equivalent: (i) the system is outflow-connected, (ii) each sink of the condensation of $G_F$ is outflow-connected, and (iii) the compartmental matrix C is invertible. Moreover, the sinks of the condensation of $G_F$ that are not outflow-connected are precisely the simple traps of the system and their number equals the multiplicity of 0 as a semisimple eigenvalue of C.
(a) An example compartmental system and its strongly connected components: this system is outflow-connected because its two sinks in the condensation digraph are outflow-connected.
(b) This compartmental system is not outflow-connected because one of its sink strongly-connected components is a trap.
Figure 9.5: Outflow-connectivity and traps in compartmental systems
The proof of the equivalence between (i) and (iii) is similar to the proof in Theorem 6.4. Proof. The equivalence between statements (i) and (ii) is immediate. To establish the equivalence between (ii) and (iii), we first consider the case in which $G_F$ is strongly connected and at least one compartment has a strictly positive outflow rate. Therefore, the Laplacian matrix $L_F$ of $G_F$ and the compartmental matrix $C = -L_F^\top - \mathrm{diag}(f_0)$ are irreducible. Pick $0 < \varepsilon < 1/\max_i |c_{ii}|$, and define $A = I_n + \varepsilon C^\top$. Because of the definition of ε, the matrix A is nonnegative and irreducible. We compute its row sums as follows:

$$A1_n = 1_n + \varepsilon(-L_F - \mathrm{diag}(f_0))1_n = 1_n - \varepsilon f_0.$$

Therefore, A is row-substochastic as defined in Exercise E2.5, that is, all its row-sums are at most 1 and one row-sum is strictly less than 1. Moreover, because A is irreducible, the results in Exercise E4.4 imply that $\rho(A) < 1$. Now, let $\lambda_1,\dots,\lambda_n$ denote the eigenvalues of A. Because $A = I_n + \varepsilon C^\top$, we know that the eigenvalues $\eta_1,\dots,\eta_n$ of C satisfy $\lambda_i = 1 + \varepsilon\eta_i$, so that $\max_i \Re(\lambda_i) = 1 + \varepsilon\max_i \Re(\eta_i)$. Finally, we note that $\rho(A) < 1$ implies $\max_i \Re(\lambda_i) < 1$, so that

$$\max_i \Re(\eta_i) = \frac{1}{\varepsilon}\Big(\max_i \Re(\lambda_i) - 1\Big) < 0.$$

This concludes the proof that if $G_F$ is strongly connected (and $f_0 \neq 0_n$), then C has eigenvalues with strictly negative real part. The converse is easy to prove by contradiction: if $f_0 = 0_n$, then $1_n^\top C = -f_0^\top = 0_n^\top$, so that C is singular, which contradicts the assumption that C is invertible. Next, to prove the equivalence between (ii) and (iii) for a graph $G_F$ whose condensation digraph has an arbitrary number of sinks, we proceed as in the proof of Theorem 6.4: we reorder the compartments as described in Exercise E3.1 so that the Laplacian matrix $L_F$ is block lower-triangular as in equation (6.5). We then define an appropriately small ε and the matrix $A = I_n + \varepsilon C^\top$ as above. We leave the remaining details to the reader. ∎
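Outflow-connectivity can be tested with a graph search. The sketch below (assuming NumPy; `is_outflow_connected`, `compartmental_matrix` and both example systems are hypothetical) searches backwards along the compartmental digraph from the compartments with positive outflow rate and compares the outcome with invertibility of C, as in Theorem 9.4. The second example adds a fourth compartment fed by compartment 3 with no way out, which forms a trap.

```python
import numpy as np
from collections import deque

def compartmental_matrix(F, f0):
    """C = F^T - diag(F 1_n + f0), following equation (9.5)."""
    F = np.asarray(F, dtype=float)
    return F.T - np.diag(F.sum(axis=1) + np.asarray(f0, dtype=float))

def is_outflow_connected(F, f0):
    """True if every compartment has a directed path to some compartment with f0 > 0."""
    F = np.asarray(F, dtype=float)
    reached = {int(k) for k in np.flatnonzero(np.asarray(f0) > 0)}
    queue = deque(reached)
    while queue:                                         # backward breadth-first search
        j = queue.popleft()
        for i in map(int, np.flatnonzero(F[:, j] > 0)):  # edges i -> j
            if i not in reached:
                reached.add(i)
                queue.append(i)
    return len(reached) == F.shape[0]

F1 = np.array([[0.0, 2.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.0, 0.0]])
f01 = np.array([0.0, 0.0, 0.3])

F2 = np.zeros((4, 4))
F2[:3, :3] = F1
F2[2, 3] = 0.4                       # compartment 3 -> compartment 4, which has no exit
f02 = np.array([0.0, 0.0, 0.3, 0.0])

for F, f0 in [(F1, f01), (F2, f02)]:
    C = compartmental_matrix(F, f0)
    print(is_outflow_connected(F, f0), np.linalg.matrix_rank(C) == F.shape[0])
```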
Dynamic properties of linear compartmental systems
We now state our main result about the asymptotic behavior of linear compartmental systems. Theorem 9.5 (Asymptotic behavior of compartmental systems). The linear compartmental system $(F, f_0, u)$ with compartmental matrix C and compartmental digraph $G_F$ has the following possible asymptotic behaviors: (i) if the system is outflow-connected, then the compartmental matrix C is invertible, every solution tends exponentially to the unique equilibrium $q^* = -C^{-1}u \geq 0_n$, and in the ith compartment $q_i^* > 0$ if and only if the ith compartment is inflow-connected to a strictly positive inflow; (ii) if the system contains one or more simple traps, then: (a) define the reduced compartmental system $(F_{rd}, f_{0,rd}, u_{rd})$ as follows: remove all traps from $G_F$ and regard the edges into the trapping compartments as outflow edges into the environment. The reduced compartmental system $(F_{rd}, f_{0,rd}, u_{rd})$ is outflow-connected and all its solutions converge exponentially fast to the unique nonnegative equilibrium $-C_{rd}^{-1}u_{rd}$, for $C_{rd} = F_{rd}^\top - \mathrm{diag}(F_{rd}1_n + f_{0,rd})$; (b) any simple trap H contains non-decreasing mass along time. If H is inflow-connected to a positive inflow, then the mass inside H goes to infinity. Otherwise, the mass inside H converges to a scalar multiple of the right eigenvector corresponding to the eigenvalue 0 of the compartmental submatrix for H.
Proof. Regarding statement (i), note that the system $\dot q = Cq + u$ is an affine continuous-time system with a Hurwitz matrix C and, by Exercise E7.2, the system has a unique equilibrium point $q^* = -C^{-1}u$, which is globally exponentially stable. The fact that $-C^{-1}u \geq 0_n$ follows from a property of Hurwitz Metzler matrices, which we study in Theorem 9.8 in the next section. We leave the proof of statement (ii) to the reader. ∎
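A sketch of Theorem 9.5(i) on a hypothetical three-compartment system (flows 1→2, 2→3, 3→1, outflow from compartment 3, inflow into compartment 1), assuming NumPy. It computes $q^* = -C^{-1}u$ with a linear solve and integrates $\dot q = Cq + u$ with forward Euler; the step size is chosen below $1/\max_i |c_{ii}|$, so that $I_n + \Delta t\, C$ is nonnegative and the discrete iterates also stay nonnegative. Step size and horizon are illustrative.

```python
import numpy as np

F = np.array([[0.0, 2.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.0, 0.0]])
f0 = np.array([0.0, 0.0, 0.3])
u = np.array([1.0, 0.0, 0.0])
C = F.T - np.diag(F.sum(axis=1) + f0)       # compartmental matrix, equation (9.5)

q_star = -np.linalg.solve(C, u)             # unique equilibrium q* = -C^{-1} u

dt = 0.01                                   # dt < 1 / max|c_ii| = 0.5
q = np.zeros(3)
min_seen = 0.0
for _ in range(20000):                      # forward Euler up to t = 200
    q = q + dt * (C @ q + u)
    min_seen = min(min_seen, q.min())

print(q_star)                   # [4/3, 8/3, 10/3]: all positive, every compartment reached from the inflow
print(np.abs(q - q_star).max()) # tiny: the trajectory has converged
print(f0 @ q_star, u.sum())     # outflow equals inflow at equilibrium
```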
9.3 Positive systems
In this section we generalize the class of compartmental systems and study more general network systems called positive systems. Definition 9.6 (Positive systems). A dynamical system $\dot x(t) = f(x(t),t)$, $x \in \mathbb{R}^n$, is positive if $x(0) \geq 0_n$ implies $x(t) \geq 0_n$ for all $t \geq 0$. We are especially interested in linear and affine systems, described by

$$\dot x(t) = Ax(t) \quad\text{and}\quad \dot x(t) = Ax(t) + b.$$
Note that the set of affine systems includes the set of linear systems (each linear system is affine with $b = 0_n$). The following result classifies which affine systems are positive. Theorem 9.7 (Positive affine systems and Metzler matrices). For the affine system $\dot x(t) = Ax(t) + b$, the following statements are equivalent:
(i) the system is positive, that is, x(t) ≥ 0n for all t and all x(0) ≥ 0n ,
(ii) A is Metzler and b ≥ 0n .
Proof. We start by showing that statement (i) implies statement (ii). If $x(0) = 0_n$, then $\dot x(0) = b$ cannot have any negative components, hence $b \geq 0_n$. If any off-diagonal entry $a_{ij}$, $i \neq j$, of A is strictly negative, then consider an initial condition x(0) with all zero entries except for $x_j(0) > b_i/|a_{ij}|$. It is easy to see that $\dot x_i(0) = a_{ij}x_j(0) + b_i < 0$, which is a contradiction. Finally, to show that statement (ii) implies statement (i), it suffices to note that, anytime there exists i such that $x_i(t) = 0$, the condition $x(t) \geq 0_n$, A Metzler and $b \geq 0_n$ together imply $\dot x_i(t) = \sum_{j\neq i} a_{ij}x_j(t) + b_i \geq 0$. ∎ Note: as expected, compartmental systems are positive affine systems. Specifically, compartmental systems are positive affine systems with additional properties (the compartmental matrix has nonpositive diagonal entries and it is column diagonally dominant).
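The first half of the proof of Theorem 9.7 is constructive, and a sketch of it follows, assuming NumPy; `positivity_witness` and the matrices are hypothetical. Given (A, b), it either confirms statement (ii) or returns an initial condition $x(0) \geq 0_n$ at which some component of $\dot x(0) = Ax(0) + b$ is negative, so the trajectory immediately leaves the nonnegative orthant.

```python
import numpy as np

def positivity_witness(A, b):
    """Return None if A is Metzler and b >= 0 (so the system is positive by Theorem 9.7);
    otherwise return x0 >= 0 such that A @ x0 + b has a negative entry."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = b.size
    if np.any(b < 0):
        return np.zeros(n)                        # x(0) = 0_n gives dx/dt(0) = b
    for i in range(n):
        for j in range(n):
            if i != j and A[i, j] < 0:
                x0 = np.zeros(n)
                x0[j] = b[i] / abs(A[i, j]) + 1.0  # any x_j(0) > b_i / |a_ij| works
                return x0
    return None

b = np.array([0.1, 0.0])
A_good = np.array([[-1.0, 0.5], [0.2, -0.3]])    # Metzler
A_bad = np.array([[-1.0, -0.5], [0.2, -0.3]])    # negative off-diagonal entry a_12

print(positivity_witness(A_good, b))             # None
x0 = positivity_witness(A_bad, b)
print(x0, A_bad @ x0 + b)                        # x0 >= 0, but dx_1/dt(0) < 0
```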
Theorem 9.8 (Properties of Hurwitz Metzler matrices). For a Metzler matrix A, the following statements are equivalent: (i) A is Hurwitz, (ii) A is invertible and $-A^{-1} \geq 0$, and
(iii) for all b ≥ 0n , there exists x∗ ≥ 0n solving Ax∗ + b = 0n .
Moreover, if A is Metzler, Hurwitz and irreducible, then $-A^{-1} > 0$. Proof. We start by showing that (i) implies (ii). Clearly, if A is Hurwitz, then it is also invertible. So it suffices to show that $-A^{-1}$ is nonnegative. Pick ε > 0 and define $A_{\varepsilon,A} = I_n + \varepsilon A$, that is, $(-\varepsilon A) = (I_n - A_{\varepsilon,A})$. Because A is Metzler, ε can be selected small enough so that $A_{\varepsilon,A} \geq 0$. Moreover, because the spectrum of A is strictly in the left half plane, one can verify that, for ε small enough, $\mathrm{spec}(\varepsilon A)$ is inside the disk of unit radius centered at the point −1. In turn, this last property implies that $\mathrm{spec}(I_n + \varepsilon A)$ is strictly inside the disk of unit radius centered at the origin, that is, $\rho(A_{\varepsilon,A}) < 1$. We now adopt the Neumann series as defined in Exercise E2.12: because $\rho(A_{\varepsilon,A}) < 1$, we know that $(I_n - A_{\varepsilon,A}) = (-\varepsilon A)$ is invertible and that

$$(-\varepsilon A)^{-1} = (I_n - A_{\varepsilon,A})^{-1} = \sum_{k=0}^{\infty} A_{\varepsilon,A}^k. \qquad (9.7)$$
Note now that the right-hand side is nonnegative because it is the sum of nonnegative matrices. In summary, we have shown that A is invertible and that $-A^{-1} \geq 0$. This statement proves that (i) implies (ii). Next we show that (ii) implies (i). We know A is Metzler, invertible and satisfies $-A^{-1} \geq 0$. By the Perron-Frobenius Theorem for Metzler matrices in Exercise E9.4, we know there exists $v \geq 0_n$, $v \neq 0_n$, satisfying $Av = \lambda_{\text{Metzler}}v$, where $\lambda_{\text{Metzler}} = \max\{\Re(\lambda) \mid \lambda \in \mathrm{spec}(A)\}$. Clearly, A invertible implies $\lambda_{\text{Metzler}} \neq 0$ and, moreover, $v = \lambda_{\text{Metzler}}A^{-1}v$. Now, we know v is nonnegative and nonzero, and $A^{-1}v$ is nonpositive. Hence, $\lambda_{\text{Metzler}}$ must be negative and, in turn, A is Hurwitz. This establishes that (ii) implies (i).
Finally, regarding the equivalence between statement (ii) and statement (iii), note that, if $-A^{-1} \geq 0$ and $b \geq 0_n$, then clearly $x^* = -A^{-1}b \geq 0_n$ solves $Ax^* + b = 0_n$. This proves that (ii) implies (iii). Vice versa, if statement (iii) holds, then let $x_i^*$ be a nonnegative solution of $Ax_i^* = -e_i$ (that is, statement (iii) with $b = e_i$) and let X be the nonnegative matrix with columns $x_1^*,\dots,x_n^*$. Therefore, we know $AX = -I_n$, so that A is invertible, −X is its inverse, and $-A^{-1} = -(-X) = X$ is nonnegative. This statement proves that (iii) implies (ii). Finally, the statement that $-A^{-1} > 0$ for each Metzler, Hurwitz and irreducible matrix A is proved as follows. Because A is irreducible, the matrix $A_{\varepsilon,A} = I_n + \varepsilon A$ is nonnegative (for ε sufficiently small) and primitive. Therefore, the right-hand side of equation (9.7) is strictly positive. ∎ This theorem about Metzler matrices immediately leads to some useful properties of positive affine systems. Corollary 9.9 (Existence, positivity and stability of equilibria for positive affine systems). Consider a continuous-time positive affine system $\dot x = Ax + b$, where A is Metzler and b is nonnegative. If the matrix A is Hurwitz, then (i) the system has a unique equilibrium point $x^* \in \mathbb{R}^n$, that is, a unique solution to $Ax^* + b = 0_n$,
(ii) the equilibrium point x∗ is nonnegative, and
(iii) all trajectories converge asymptotically to $x^*$. Several other properties of positive affine systems and Metzler matrices are reviewed in (Berman and Plemmons 1994), albeit with a slightly different language.
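A numerical sketch of the Neumann-series argument (9.7) and of Corollary 9.9, assuming NumPy; the matrix A (Metzler, Hurwitz, irreducible), the input b, the value ε = 0.25 and the truncation at 500 terms are hypothetical choices. It checks that $A_{\varepsilon,A} = I_n + \varepsilon A$ is nonnegative with spectral radius below 1, that ε times the truncated series reproduces $-A^{-1}$, and that this matrix is entrywise positive.

```python
import numpy as np

A = np.array([[-2.0, 1.0, 0.0],
              [0.5, -1.5, 0.5],
              [0.0, 1.0, -1.0]])          # Metzler, irreducible, Hurwitz
b = np.array([1.0, 0.0, 0.0])             # nonnegative input

eps = 0.25                                # small enough that I + eps*A >= 0
A_eps = np.eye(3) + eps * A
rho = np.abs(np.linalg.eigvals(A_eps)).max()

# (-eps*A)^{-1} = sum_k A_eps^k, hence -A^{-1} = eps * sum_k A_eps^k
S = np.zeros((3, 3))
P = np.eye(3)
for _ in range(500):
    S += P
    P = P @ A_eps
neg_inv = eps * S

print(np.all(A_eps >= 0), rho)                   # nonnegative, rho < 1
print(np.allclose(neg_inv, -np.linalg.inv(A)))   # series matches -A^{-1}
print(np.all(neg_inv > 0))                       # positive since A is irreducible
x_star = neg_inv @ b                             # equilibrium of dx/dt = A x + b
print(x_star, np.allclose(A @ x_star + b, 0.0))  # nonnegative equilibrium
```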
9.4 Table of asymptotic behaviors for averaging and positive systems

| Dynamics | Assumptions & asymptotic behavior | References |
|---|---|---|
| averaging system: $x(k+1) = Ax(k)$, A row-stochastic | the associated digraph has a globally reachable node $\implies \lim_{k\to\infty} x(k) = (w^\top x(0))1_n$, where $w \geq 0$ is the left eigenvector of A with eigenvalue 1 satisfying $1_n^\top w = 1$ | Theorem 5.2; averaging example systems in Chapter 1 |
| affine system: $x(k+1) = Ax(k) + b$ | A convergent (i.e., its spectral radius is less than 1) $\implies \lim_{k\to\infty} x(k) = (I_n - A)^{-1}b$ | Exercise E2.10; Friedkin-Johnsen system in Exercise E5.4 |
| positive affine system: $x(k+1) = Ax(k) + b$, $A \geq 0$, $b \geq 0_n$ | $x(0) \geq 0_n \implies x(k) \geq 0_n$ for all k; and A convergent (i.e., $\lvert\lambda\rvert < 1$ for all $\lambda \in \mathrm{spec}(A)$) $\implies \lim_{k\to\infty} x(k) = (I_n - A)^{-1}b \geq 0_n$ | Exercise E2.12 |

Table 9.1: Discrete-time systems

| Dynamics | Assumptions & asymptotic behavior | References |
|---|---|---|
| averaging system: $\dot x(t) = -Lx(t)$, L Laplacian matrix | the associated digraph has a globally reachable node $\implies \lim_{t\to\infty} x(t) = (w^\top x(0))1_n$, where $w \geq 0$ is the left eigenvector of L with eigenvalue 0 satisfying $1_n^\top w = 1$ | Theorem 7.3; flocking example system in Section 7.1 |
| affine system: $\dot x(t) = Ax(t) + b$ | A Hurwitz (i.e., its spectral abscissa is negative) $\implies \lim_{t\to\infty} x(t) = -A^{-1}b$ | Exercise E7.2 |
| positive affine system: $\dot x(t) = Ax(t) + b$, A Metzler, $b \geq 0_n$ | $x(0) \geq 0_n \implies x(t) \geq 0_n$ for all t; and A Hurwitz (i.e., $\Re(\lambda) < 0$ for all $\lambda \in \mathrm{spec}(A)$) $\implies \lim_{t\to\infty} x(t) = -A^{-1}b \geq 0_n$ | Theorem 9.7 and Corollary 9.9; compartmental systems in Section 9.1 |

Table 9.2: Continuous-time systems
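A sketch of the positive affine row of Table 9.1, assuming NumPy; the nonnegative matrix A and vector b are hypothetical. The eigenvalues of A are 0.7 and 0.4, so A is convergent, and the iteration $x(k+1) = Ax(k) + b$ started at $x(0) = 0_n$ stays nonnegative and converges to $(I_n - A)^{-1}b$.

```python
import numpy as np

A = np.array([[0.5, 0.2],
              [0.1, 0.6]])                 # nonnegative, eigenvalues 0.7 and 0.4
b = np.array([1.0, 0.5])                   # nonnegative

rho = np.abs(np.linalg.eigvals(A)).max()   # spectral radius, 0.7 here
x = np.zeros(2)
for _ in range(200):
    x = A @ x + b                          # each iterate remains nonnegative
x_limit = np.linalg.solve(np.eye(2) - A, b)

print(rho, x, x_limit)                     # x agrees with (I - A)^{-1} b = [25/9, 35/18]
```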
9.5 Exercises
E9.1
A simple proof. Write out in detail the proof of Lemma 9.3.
E9.2
Simple traps and strong connectivity. Show that a compartmental system that has no outflows and that is a simple trap is strongly connected.
E9.3
The matrix exponential of a Metzler matrix. Recall Exercise E7.3 about Laplacian matrices. For $M \in \mathbb{R}^{n\times n}$, show that (i) $\exp(Mt) \geq 0$ for all $t \geq 0$ if and only if M is Metzler, and (ii) if M is Metzler, then for all $t \geq 0$, there exists $p > 0$ such that $\exp(Mt) \geq pI_n$.
E9.4
Perron-Frobenius Theorem for Metzler matrices. Let M be a Metzler matrix and spec(M) be its spectrum. Show that: (i) $\lambda_{\text{Metzler}}(M) = \max\{\Re(\lambda) \mid \lambda \in \mathrm{spec}(M)\}$ is an eigenvalue of M, and (ii) there exists $v \in \mathbb{R}^n$, $v \geq 0_n$, $v \neq 0_n$, satisfying $Mv = \lambda_{\text{Metzler}}(M)v$.
Hint: Recall the Perron-Frobenius Theorem 2.14 for nonnegative matrices.
E9.5
Perron-Frobenius Theorem for irreducible Metzler matrices. Let $M$ be an irreducible Metzler matrix and $\operatorname{spec}(M)$ be its spectrum. Show that: (i) $\lambda_{\mathrm{Metzler}}(M) = \max\{\Re(\lambda) \mid \lambda \in \operatorname{spec}(M)\}$ is a simple eigenvalue of $M$, and (ii) if $v \in \mathbb{R}^n$ satisfies $Mv = \lambda_{\mathrm{Metzler}}(M)v$, then $v$ is unique up to a scalar multiple and can be chosen positive, $v > 0_n$.
Hint: Recall the Perron-Frobenius Theorem 2.15 for irreducible matrices.
E9.6
On Metzler matrices and compartmental systems with growth and decay. Let $M$ be an $n \times n$ symmetric Metzler matrix. Recall Lemma 9.3 and define $v \in \mathbb{R}^n$ by $M = -L + \operatorname{diag}(v)$, where $L$ is a symmetric Laplacian matrix. Show that:
(i) if $M$ is Hurwitz, then $1_n^\top v < 0$.
Next, assume $n = 2$ and assume $v$ has both nonnegative and nonpositive entries. (If $v$ is nonnegative, lack of stability can be established from statement (i); if $v$ is nonpositive, stability can be established via Theorem 9.4.) Show that
(ii) there exist nonnegative numbers $f$, $d$ and $g$ such that, modulo a permutation, $M$ can be written in the form
$$M = -f\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} + \begin{bmatrix} g & 0 \\ 0 & -d \end{bmatrix} = \begin{bmatrix} g - f & f \\ f & -d - f \end{bmatrix},$$
(iii) $M$ is Hurwitz if and only if
$$d > g \quad\text{and}\quad f > \frac{gd}{d - g}.$$
Note: The inequality $d > g$ (for $n = 2$) is equivalent to the inequality $1_n^\top v < 0$ in statement (i). In the interpretation of compartmental systems with growth and decay rates, $f$ is a flow rate, $d$ is a decay rate and $g$ is a growth rate. Statement (iii) then says that $M$ is Hurwitz if and only if the decay rate is larger than the growth rate and the flow rate is sufficiently large.
E9.7
Nonnegative inverses for nonnegative matrices. Let $A$ be a nonnegative square matrix and show that the following statements are equivalent:
(i) $\lambda > \rho(A)$, and
(ii) the matrix $(\lambda I_n - A)$ is invertible and its inverse $(\lambda I_n - A)^{-1}$ is nonnegative.
Moreover, show that
(iii) if $A$ is irreducible and $\lambda > \rho(A)$, then $(\lambda I_n - A)^{-1}$ is positive.
(Given a square matrix $A$, the map $\lambda \mapsto (\lambda I_n - A)^{-1}$ is sometimes referred to as the resolvent of $A$.)
E9.8
Equilibrium points for positive systems. Consider two continuous-time positive affine systems
$$\dot x = Ax + b, \qquad \dot{\hat x} = \hat A \hat x + \hat b.$$
Assume that $A$ and $\hat A$ are Hurwitz and, by Corollary 9.9, let $x^*$ and $\hat x^*$ denote the equilibrium points of the two systems. Show that the inequalities $A \geq \hat A$ and $b \geq \hat b$ imply $x^* \geq \hat x^*$.
 Chapter 10
Convergence Rates, Scalability and Optimization
In this chapter we discuss the convergence rate of averaging algorithms. We borrow ideas from (Xiao and Boyd 2004). We focus on discrete-time systems and their convergence factors. The study of continuous-time systems is analogous. Before proceeding, we recall a few basic facts. Given a square matrix $A$,
(i) the spectral radius of $A$ is $\rho(A) = \max\{|\lambda| \mid \lambda \in \operatorname{spec}(A)\}$,
(ii) the $p$-induced norm of $A$, for $p \in \mathbb{N} \cup \{\infty\}$, is $\|A\|_p = \max\{\|Ax\|_p \mid x \in \mathbb{R}^n \text{ and } \|x\|_p = 1\}$,
(iii) the induced 2-norm of $A$ is $\|A\|_2 = \max\{\sqrt{\lambda} \mid \lambda \in \operatorname{spec}(A^\top A)\}$,
(iv) if $A = A^\top$, then $\|A\|_2 = \rho(A)$, and
(v) in general, $\rho(A) \leq \|A\|_p$ and even $\rho(A) \leq \|A^\ell\|_p^{1/\ell}$ for any $p \in \mathbb{N} \cup \{\infty\}$ and $\ell \in \mathbb{N}$.
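Fact (v) can be illustrated with the induced $\infty$-norm, which is simply the maximum absolute row sum. The sketch below uses an upper-triangular $2 \times 2$ matrix, chosen for illustration, whose eigenvalues can be read off the diagonal, so $\rho(A) = 0.1$. It evaluates $\|A^\ell\|_\infty^{1/\ell}$ for $\ell = 1, \dots, 200$. Every value stays above $\rho(A)$, as fact (v) predicts. The values also approach $\rho(A)$ as $\ell$ grows; this is Gelfand's formula, a standard fact that the text does not state.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def norm_inf(X):
    """Induced infinity-norm: the maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in X)


# Upper triangular, so spec(A) = {0.1, 0.1} and rho(A) = 0.1.
A = [[0.1, 10.0], [0.0, 0.1]]
rho = 0.1

bounds = {}  # bounds[ell] = ||A^ell||_inf ** (1/ell)
P = A        # P holds the power A^ell
for ell in range(1, 201):
    if ell > 1:
        P = matmul(P, A)
    bounds[ell] = norm_inf(P) ** (1.0 / ell)

print(bounds[1], bounds[10], bounds[200])  # 10.1, about 0.2, about 0.105
```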
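Definition 10.1 (Essential spectral radius of a row-stochastic matrix). The essential spectral radius of a row-stochastic matrix $A$ is
$$\rho_{\mathrm{ess}}(A) = \begin{cases} 0, & \text{if } \operatorname{spec}(A) = \{1, \dots, 1\}, \\ \max\{|\lambda| \mid \lambda \in \operatorname{spec}(A) \setminus \{1\}\}, & \text{otherwise.} \end{cases}$$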
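Definition 10.1 can be evaluated directly for $2 \times 2$ row-stochastic matrices, whose eigenvalues are the roots of $\lambda^2 - \operatorname{trace}(A)\lambda + \det(A)$. The sketch below is a minimal illustration under that assumption and is not a general eigenvalue solver. The tolerance `tol`, which decides when an eigenvalue equals 1, is an arbitrary choice. The examples cover the three situations in the definition: a primitive matrix, a periodic permutation matrix, and the identity, whose spectrum is $\{1, 1\}$.

```python
import cmath


def eigenvalues_2x2(A):
    """Roots of the characteristic polynomial lambda^2 - tr(A) lambda + det(A)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + disc) / 2, (tr - disc) / 2]


def essential_spectral_radius(eigs, tol=1e-9):
    """Definition 10.1: largest |lambda| over eigenvalues different from 1 (0 if none)."""
    others = [abs(lam) for lam in eigs if abs(lam - 1) > tol]
    return max(others) if others else 0.0


examples = {
    "primitive": [[0.7, 0.3], [0.2, 0.8]],  # eigenvalues 1 and 0.5
    "periodic": [[0.0, 1.0], [1.0, 0.0]],   # eigenvalues 1 and -1
    "identity": [[1.0, 0.0], [0.0, 1.0]],   # spec = {1, 1}
}
rho_ess = {name: essential_spectral_radius(eigenvalues_2x2(A)) for name, A in examples.items()}
print(rho_ess)  # approximately {'primitive': 0.5, 'periodic': 1.0, 'identity': 0.0}
```

The periodic case shows why primitivity matters: its essential spectral radius equals 1, so it cannot serve as a convergence factor.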
10.1
Some preliminary calculations and observations
The convergence factor for symmetric row-stochastic matrices
To build some intuition about the general case, we start with a weighted undirected graph $G$ with adjacency matrix $A$ that is row-stochastic and primitive (i.e., the graph $G$, viewed as a digraph, is strongly connected and aperiodic). We consider the corresponding discrete-time averaging algorithm $x(k+1) = Ax(k)$. Note that $G$ undirected implies that $A$ is symmetric. Therefore, $A$ has real eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$ and corresponding orthonormal eigenvectors $v_1, \dots, v_n$. Because $A$ is row-stochastic, $\lambda_1 = 1$ and $v_1 = 1_n/\sqrt{n}$. Next, following the modal decomposition given in Section 2.1, the solution can be decoupled into $n$ independent evolution equations as
$$x(k) = \operatorname{average}(x(0))1_n + \lambda_2^k (v_2^\top x(0)) v_2 + \dots + \lambda_n^k (v_n^\top x(0)) v_n.$$
Moreover, $A$ being primitive implies that $\max\{|\lambda_2|, \dots, |\lambda_n|\} < 1$. Specifically, for a symmetric and primitive $A$, we have $\rho_{\mathrm{ess}}(A) = \max\{|\lambda_2|, |\lambda_n|\} < 1$. Therefore
$$\lim_{k\to\infty} x(k) = 1_n 1_n^\top x(0)/n = \operatorname{average}(x(0)) 1_n.$$
To upper bound the error, since the vectors $v_1, \dots, v_n$ are orthonormal, we compute
$$\begin{aligned} \big\|x(k) - \operatorname{average}(x(0))1_n\big\|_2 &= \Big\|\sum_{j=2}^n \lambda_j^k (v_j^\top x(0)) v_j\Big\|_2 = \sqrt{\sum_{j=2}^n |\lambda_j|^{2k} (v_j^\top x(0))^2} \\ &\leq \rho_{\mathrm{ess}}(A)^k \sqrt{\sum_{j=2}^n (v_j^\top x(0))^2} = \rho_{\mathrm{ess}}(A)^k \big\|x(0) - \operatorname{average}(x(0))1_n\big\|_2, \end{aligned} \qquad (10.1)$$
where the second and last equalities follow from the Pythagorean theorem. In summary, we have learned that, for symmetric matrices, the essential spectral radius $\rho_{\mathrm{ess}}(A) < 1$ is the convergence factor to average consensus. (The wording "convergence factor" is for discrete-time systems, whereas the wording "convergence rate" is for continuous-time systems.)
A note on convergence factors for asymmetric matrices
Consider now the asymmetric matrix
$$A_{\text{large-gain}} = \begin{bmatrix} 0.1 & 10^{10} \\ 0 & 0.1 \end{bmatrix}.$$
Because $A_{\text{large-gain}}$ is upper triangular, both of its eigenvalues equal 0.1, and so does its spectral radius. This is therefore a convergent matrix. It is however false that the evolution of the system $x(k+1) = A_{\text{large-gain}}x(k)$ from an initial condition with non-zero second entry satisfies a bound of the form in equation (10.1), namely $\|x(k)\|_2 \leq 0.1^k \|x(0)\|_2$. It is still true, of course, that the solution eventually converges to zero exponentially fast. The problem is that the eigenvalues alone of a non-symmetric matrix do not fully describe the state amplification that may take place during a transient period. (Note that the 2-norm of $A_{\text{large-gain}}$ is of order $10^{10}$.)
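The contrast drawn in this section can be reproduced numerically. The sketch below assumes a specific symmetric example: the 4-node ring matrix with weight 1/2 on the diagonal and 1/4 on the two ring neighbors. Its eigenvalues are 1, 0.5, 0, 0.5, so $\rho_{\mathrm{ess}} = 0.5$. The sketch checks the bound (10.1) along one trajectory. It then iterates $A_{\text{large-gain}}$ from $x(0) = (0, 1)$ to show the transient amplification, even though both eigenvalues equal 0.1. The initial conditions are arbitrary choices.

```python
import math


def step(A, x):
    """One step of x(k+1) = A x(k)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dist_to_average(x):
    """Euclidean norm of x - average(x) 1_n."""
    avg = sum(x) / len(x)
    return math.sqrt(sum((xi - avg) ** 2 for xi in x))


# Symmetric, doubly-stochastic, primitive example (illustrative choice):
# eigenvalues 1, 0.5, 0, 0.5, so rho_ess = 0.5.
A_sym = [[0.5, 0.25, 0.0, 0.25],
         [0.25, 0.5, 0.25, 0.0],
         [0.0, 0.25, 0.5, 0.25],
         [0.25, 0.0, 0.25, 0.5]]
rho_ess = 0.5
x = [4.0, -1.0, 2.0, 5.0]
err0 = dist_to_average(x)
bound_holds = True
for k in range(1, 31):
    x = step(A_sym, x)
    bound_holds = bound_holds and dist_to_average(x) <= rho_ess ** k * err0 + 1e-12
print(bound_holds)  # True, as predicted by the bound (10.1)

# Non-symmetric convergent matrix A_large-gain: both eigenvalues equal 0.1.
A_lg = [[0.1, 1e10], [0.0, 0.1]]
y = [0.0, 1.0]
norms = []
for k in range(1, 31):
    y = step(A_lg, y)
    norms.append(math.hypot(y[0], y[1]))
print(norms[0])   # about 1e10: far above 0.1 * ||y(0)|| = 0.1
print(norms[-1])  # about 3e-18: eventual exponential decay
```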
10.2
Convergence factors for row-stochastic matrices
Consider a discrete-time averaging algorithm (distributed linear averaging)
$$x(k+1) = Ax(k),$$
where $A$ is doubly-stochastic and not necessarily symmetric. If $A$ is primitive (i.e., the associated digraph is aperiodic and strongly connected), we know
$$\lim_{k\to\infty} x(k) = \operatorname{average}(x(0)) 1_n = \big(1_n 1_n^\top/n\big) x(0).$$
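We now define two possible notions of convergence factors. With the shorthand $x_{\mathrm{final}} = \operatorname{average}(x(0))1_n$, the per-step convergence factor is
$$r_{\mathrm{step}}(A) = \sup_{x(k) \neq x_{\mathrm{final}}} \frac{\|x(k+1) - x_{\mathrm{final}}\|_2}{\|x(k) - x_{\mathrm{final}}\|_2}$$
and the asymptotic convergence factor is
$$r_{\mathrm{asym}}(A) = \sup_{x(0) \neq x_{\mathrm{final}}} \lim_{k\to\infty} \left(\frac{\|x(k) - x_{\mathrm{final}}\|_2}{\|x(0) - x_{\mathrm{final}}\|_2}\right)^{1/k}.$$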
Given these definitions and the preliminary calculations in Section 10.1, we can now state our main results.
Theorem 10.2 (Convergence factor and solution bounds). Let $A$ be doubly-stochastic and primitive.
(i) The convergence factors of $A$ satisfy
$$r_{\mathrm{step}}(A) = \|A - 1_n1_n^\top/n\|_2, \qquad r_{\mathrm{asym}}(A) = \rho_{\mathrm{ess}}(A) = \rho(A - 1_n1_n^\top/n) < 1. \qquad (10.2)$$
Moreover, $r_{\mathrm{asym}}(A) \leq r_{\mathrm{step}}(A)$, and $r_{\mathrm{step}}(A) = r_{\mathrm{asym}}(A)$ if $A$ is symmetric.
(ii) For any initial condition $x(0)$ with corresponding $x_{\mathrm{final}} = \operatorname{average}(x(0))1_n$,
$$\|x(k) - x_{\mathrm{final}}\|_2 \leq r_{\mathrm{step}}(A)^k \|x(0) - x_{\mathrm{final}}\|_2, \qquad (10.3)$$
$$\|x(k) - x_{\mathrm{final}}\|_2 \leq C_\varepsilon \big(r_{\mathrm{asym}}(A) + \varepsilon\big)^k \|x(0) - x_{\mathrm{final}}\|_2, \qquad (10.4)$$
where $\varepsilon > 0$ is an arbitrarily small constant and $C_\varepsilon$ is a sufficiently large constant independent of $x(0)$.
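Theorem 10.2 can be checked on a small non-symmetric example. The sketch below uses an illustrative $3 \times 3$ doubly-stochastic, primitive matrix that is not from the text. Its trace is 1 and its determinant is $-0.04$, so its spectrum is $\{1, 0.2, -0.2\}$. The sketch computes $r_{\mathrm{step}}$ as the largest singular value of $A - 1_n1_n^\top/n$ by power iteration. It estimates $r_{\mathrm{asym}}$ with Gelfand's formula $\|M^k\|^{1/k}$, a standard fact, using the Frobenius norm for convenience. It then checks the bound (10.3) along one trajectory. For this non-symmetric matrix the two factors differ markedly, consistent with $r_{\mathrm{asym}}(A) \leq r_{\mathrm{step}}(A)$.

```python
import math


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(X, v):
    return [sum(a * vj for a, vj in zip(row, v)) for row in X]


def two_norm(M, iters=500):
    """Largest singular value of M, via power iteration on M^T M."""
    MtM = matmul([list(col) for col in zip(*M)], M)
    v = [1.0 + i for i in range(len(M))]  # generic (non-constant) starting vector
    for _ in range(iters):
        w = matvec(MtM, v)
        s = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / s for wi in w]
    return math.sqrt(sum(t * t for t in matvec(M, v)))


def frobenius(X):
    return math.sqrt(sum(a * a for row in X for a in row))


# Illustrative doubly-stochastic, primitive, non-symmetric matrix.
# spec(A) = {1, 0.2, -0.2} (trace 1, determinant -0.04), so the exact r_asym is 0.2.
A = [[0.2, 0.8, 0.0],
     [0.4, 0.2, 0.4],
     [0.4, 0.0, 0.6]]
n = len(A)
M = [[A[i][j] - 1.0 / n for j in range(n)] for i in range(n)]  # A - 1_n 1_n^T / n

r_step = two_norm(M)  # first formula in (10.2)

# r_asym = rho(M): estimate it through Gelfand's formula ||M^k||^(1/k) with a large k.
k = 200
Mk = M
for _ in range(k - 1):
    Mk = matmul(Mk, M)
r_asym = frobenius(Mk) ** (1.0 / k)

# Check the solution bound (10.3) along one trajectory.
x = [1.0, 0.0, 0.0]
avg = sum(x) / n
err0 = math.sqrt(sum((xi - avg) ** 2 for xi in x))
bound_103_holds = True
for t in range(1, 21):
    x = matvec(A, x)
    err = math.sqrt(sum((xi - avg) ** 2 for xi in x))
    bound_103_holds = bound_103_holds and err <= r_step ** t * err0 * (1 + 1e-9)

print(round(r_step, 4), round(r_asym, 3), bound_103_holds)  # 0.7464 0.2 True
```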
Note: A sufficient condition for $r_{\mathrm{step}}(A) < 1$ is given in Exercise E10.1.
Before proving Theorem 10.2, we introduce an interesting intermediate result. For $x_{\mathrm{final}} = \operatorname{average}(x(0))1_n$, the disagreement vector is the error signal
$$\delta(k) = x(k) - x_{\mathrm{final}}. \qquad (10.5)$$
Lemma 10.3 (Disagreement or error dynamics). Given a doubly-stochastic matrix $A$, the disagreement vector $\delta(k)$ satisfies
(i) $\delta(k) \perp 1_n$ for all $k$,
(ii) $\delta(k+1) = \big(A - 1_n1_n^\top/n\big)\delta(k)$,
(iii) the following properties are equivalent:
(a) $\lim_{k\to\infty} A^k = 1_n1_n^\top/n$ (that is, the averaging algorithm achieves average consensus),
(b) $A$ is primitive (that is, the digraph is aperiodic and strongly connected),
(c) $\rho(A - 1_n1_n^\top/n) < 1$ (that is, the error dynamics is convergent).
Proof. To study the error dynamics, note that $1_n^\top x(k+1) = 1_n^\top A x(k) = 1_n^\top x(k)$ and, in turn, that $1_n^\top x(k) = 1_n^\top x(0)$; see also Exercise E7.6. Therefore, $\operatorname{average}(x(0)) = \operatorname{average}(x(k))$ and $\delta(k) \perp 1_n$ for all $k$. This completes the proof of statement (i). To prove statement (ii), we compute
$$\delta(k+1) = Ax(k) - x_{\mathrm{final}} = Ax(k) - (1_n1_n^\top/n)x(k) = \big(A - 1_n1_n^\top/n\big)x(k),$$
and the equation in statement (ii) follows from $\big(A - 1_n1_n^\top/n\big)1_n = 0_n$. Next, let us prove the equivalence among the three properties. From the Perron–Frobenius Theorem 2.16 for primitive matrices in Chapter 2 and from Corollary 2.20, we know that $A$ primitive (statement (iii)b) implies average consensus (statement (iii)a). The converse is true because $1_n1_n^\top/n$ is a positive matrix and, by the definition of limit, there must exist $k$ such that each entry of $A^k$ becomes positive. Finally, we prove the equivalence between statements (iii)a and (iii)c. First, note that $P = I_n - 1_n1_n^\top/n$ is a projection matrix, that is, $P^2 = P$. This can be easily verified by expanding the matrix power $P^2$. Second, let us prove a useful identity:
$$\begin{aligned} A^k - 1_n1_n^\top/n &= A^k\big(I_n - 1_n1_n^\top/n\big) && \text{(because $A$ is row-stochastic)} \\ &= A^k\big(I_n - 1_n1_n^\top/n\big)^k && \text{(because $I_n - 1_n1_n^\top/n$ is a projection)} \\ &= \big(A(I_n - 1_n1_n^\top/n)\big)^k && \text{(because $A$ is doubly-stochastic, so $A$ commutes with $1_n1_n^\top/n$)} \\ &= \big(A - 1_n1_n^\top/n\big)^k. \end{aligned}$$
The statement follows from taking the limit as $k \to \infty$ in this identity and by recalling that a matrix is convergent if and only if its spectral radius is less than one. ∎
We are now ready to prove the main theorem in this section.
Proof of Theorem 10.2. Regarding the equalities (10.2), the formula for $r_{\mathrm{step}}$ is an immediate consequence of the definition of induced 2-norm:
$$r_{\mathrm{step}}(A) = \sup_{\delta(k) \neq 0_n} \frac{\|\delta(k+1)\|_2}{\|\delta(k)\|_2} = \sup_{\delta(k) \neq 0_n} \frac{\|(A - 1_n1_n^\top/n)\delta(k)\|_2}{\|\delta(k)\|_2}.$$
The equality $r_{\mathrm{asym}}(A) = \rho(A - 1_n1_n^\top/n)$ is a consequence of the error dynamics in Lemma 10.3, statement (ii). Next, note that $\rho(A) = 1$ is a simple eigenvalue and $A$ is semiconvergent. Hence, by Exercise E2.2 on the Jordan normal form of $A$, there exists a nonsingular $T$ such that
$$A = T\begin{bmatrix} 1 & 0_{n-1}^\top \\ 0_{n-1} & B \end{bmatrix}T^{-1},$$
where $B \in \mathbb{R}^{(n-1)\times(n-1)}$ is convergent, that is, $\rho(B) < 1$. Moreover, we know $\rho_{\mathrm{ess}}(A) = \rho(B)$.
Usual properties of similarity transformations imply
$$A^k = T\begin{bmatrix} 1 & 0_{n-1}^\top \\ 0_{n-1} & B^k \end{bmatrix}T^{-1} \quad\implies\quad \lim_{k\to\infty} A^k = T\begin{bmatrix} 1 & 0_{n-1}^\top \\ 0_{n-1} & 0_{(n-1)\times(n-1)} \end{bmatrix}T^{-1}.$$
Because $A$ is doubly-stochastic and primitive, we know $\lim_{k\to\infty} A^k = 1_n1_n^\top/n$, so that $A$ can be decomposed as
$$A = 1_n1_n^\top/n + T\begin{bmatrix} 0 & 0_{n-1}^\top \\ 0_{n-1} & B \end{bmatrix}T^{-1},$$
and we conclude that $\rho_{\mathrm{ess}}(A) = \rho(B) = \rho(A - 1_n1_n^\top/n)$. This concludes the proof of the equalities (10.2). The bound (10.3) is an immediate consequence of the definition of induced norm. Finally, we leave to the reader the proof of the bound (10.4), which, once again, relies upon the Jordan block decomposition. Note that the arbitrarily small positive parameter $\varepsilon$ is required because the eigenvalue corresponding to the essential spectral radius may have an algebraic multiplicity strictly larger than its geometric multiplicity. ∎
10.3
Cumulative quadratic index for symmetric matrices
The previous convergence metrics (per-step convergence factor and asymptotic convergence factor) are worst-case metrics. Both are defined with a supremum operation and are achieved only for particular initial conditions. For example, the performance predicted by the asymptotic metric $r_{\mathrm{asym}}(A)$ is achieved when $x(0) - x_{\mathrm{final}}$ is aligned with the eigenvector associated to $\rho_{\mathrm{ess}}(A) = \rho(A - 1_n1_n^\top/n)$. However, the average and transient performance may be much better. To study an appropriate average performance, we follow the treatment in (Carli et al. 2009). We consider an averaging algorithm $x(k+1) = Ax(k)$, defined by a doubly-stochastic matrix $A$ and subject to random initial conditions $x_0$ satisfying
$$\mathbb{E}[x_0] = 0_n \quad\text{and}\quad \mathbb{E}[x_0x_0^\top] = I_n.$$
Recall the disagreement vector $\delta(k)$ defined in (10.5) and the associated disagreement dynamics
$$\delta(k+1) = \big(A - 1_n1_n^\top/n\big)\delta(k),$$
and observe that the initial disagreement vector $\delta(0)$ satisfies
$$\mathbb{E}[\delta(0)] = 0_n \quad\text{and}\quad \mathbb{E}[\delta(0)\delta(0)^\top] = I_n - 1_n1_n^\top/n.$$
To define an average transient and asymptotic performance of this averaging algorithm, we define the cumulative quadratic index of the matrix $A$ by
$$J_{\mathrm{cum}}(A) = \lim_{K\to\infty} \frac{1}{n}\sum_{k=0}^K \mathbb{E}\big[\|\delta(k)\|_2^2\big]. \qquad (10.6)$$
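Theorem 10.4 (Cumulative quadratic index for symmetric matrices). The cumulative quadratic index (10.6) of a row-stochastic, primitive, and symmetric matrix $A$ satisfies
$$J_{\mathrm{cum}}(A) = \frac{1}{n}\sum_{\lambda \in \operatorname{spec}(A)\setminus\{1\}} \frac{1}{1 - \lambda^2},$$
where the eigenvalues are counted with multiplicity.
Proof. Pick a terminal time $K \in \mathbb{N}$ and define $J_K(A) = \frac{1}{n}\sum_{k=0}^K \mathbb{E}\big[\|\delta(k)\|_2^2\big]$. From the definition (10.6) and the disagreement dynamics, we compute
$$\begin{aligned} J_K(A) &= \frac{1}{n}\sum_{k=0}^K \operatorname{trace}\big(\mathbb{E}[\delta(k)\delta(k)^\top]\big) \\ &= \frac{1}{n}\sum_{k=0}^K \operatorname{trace}\Big(\big(A - 1_n1_n^\top/n\big)^k\, \mathbb{E}[\delta(0)\delta(0)^\top]\, \big((A - 1_n1_n^\top/n)^k\big)^\top\Big) \\ &= \frac{1}{n}\sum_{k=0}^K \operatorname{trace}\Big(\big(A - 1_n1_n^\top/n\big)^k \big(I_n - 1_n1_n^\top/n\big) \big((A - 1_n1_n^\top/n)^k\big)^\top\Big). \end{aligned}$$
Because $A$ is symmetric, the matrix $A - 1_n1_n^\top/n$ is also symmetric and can be diagonalized as $A - 1_n1_n^\top/n = Q\Lambda Q^\top$. Here $Q$ is orthogonal and $\Lambda$ is a diagonal matrix whose diagonal entries are the elements of $\operatorname{spec}(A - 1_n1_n^\top/n) = \{0\} \cup (\operatorname{spec}(A)\setminus\{1\})$. Since $(A - 1_n1_n^\top/n)1_n = 0_n$, we may take the first column of $Q$ to be $1_n/\sqrt{n}$, associated with the eigenvalue 0 listed first in $\Lambda$. Then $I_n - 1_n1_n^\top/n = QEQ^\top$ with $E = \operatorname{diag}(0, 1, \dots, 1)$. It follows that
$$\begin{aligned} J_K(A) &= \frac{1}{n}\sum_{k=0}^K \operatorname{trace}\big(Q\Lambda^kQ^\top\, QEQ^\top\, Q\Lambda^kQ^\top\big) \\ &= \frac{1}{n}\sum_{k=0}^K \operatorname{trace}\big(\Lambda^k E \Lambda^k\big) && \text{(because $Q^\top Q = I_n$ and $\operatorname{trace}(AB) = \operatorname{trace}(BA)$)} \\ &= \frac{1}{n}\sum_{k=0}^K \sum_{\lambda \in \operatorname{spec}(A)\setminus\{1\}} \lambda^{2k} \\ &= \frac{1}{n}\sum_{\lambda \in \operatorname{spec}(A)\setminus\{1\}} \frac{1 - \lambda^{2(K+1)}}{1 - \lambda^2} && \text{(because of the geometric series).} \end{aligned}$$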
The formula for $J_{\mathrm{cum}}$ follows from taking the limit as $K \to \infty$ and recalling that $A$ primitive implies $\rho_{\mathrm{ess}}(A) < 1$. ∎
Note: All eigenvalues of $A$ appear in the computation of the cumulative quadratic index (10.6), not only the dominant eigenvalue as in the asymptotic convergence factor. Similar results can be obtained for normal matrices, as opposed to symmetric ones, as illustrated in (Carli et al. 2009). It is not known how to compute the cumulative quadratic index for arbitrary doubly-stochastic primitive matrices.
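The proof of Theorem 10.4 can be mirrored numerically. The sketch sums $\mathbb{E}\|\delta(k)\|_2^2 = \operatorname{trace}\big((A - 1_n1_n^\top/n)^k (I_n - 1_n1_n^\top/n)((A - 1_n1_n^\top/n)^k)^\top\big)$ for $k = 0, \dots, K$ and compares the total with the closed form. It uses the circulant ring matrices $A_{n,\kappa}$ of Section 10.4 as test cases, since their eigenvalues are given by (10.7). The helper names such as `ring_matrix` are hypothetical, and the truncation $K = 300$ is an arbitrary choice. For $n = 4$ and $\kappa = 1/4$ the eigenvalues other than 1 are $0.5, 0, 0.5$, so both computations should give $(4/3 + 1 + 4/3)/4 = 11/12$.

```python
import math


def ring_matrix(n, kappa):
    """The circulant matrix A_{n,kappa} of Section 10.4 (hypothetical helper)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] += 1 - 2 * kappa
        A[i][(i + 1) % n] += kappa
        A[i][(i - 1) % n] += kappa
    return A


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def j_cum_by_summation(A, K=300):
    """(1/n) * sum_{k=0}^{K} E||delta(k)||^2 for E[x0 x0^T] = I_n (truncated at K)."""
    n = len(A)
    M = [[A[i][j] - 1.0 / n for j in range(n)] for i in range(n)]                  # A - 1 1^T/n
    S = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]  # P = I - 1 1^T/n
    total = 0.0
    for _ in range(K + 1):
        # S = M^k P and E||delta(k)||^2 = trace(M^k P (M^k)^T) = ||M^k P||_F^2, since P = P P^T.
        total += sum(v * v for row in S for v in row)
        S = matmul(M, S)
    return total / n


def j_cum_closed_form(n, kappa):
    """Theorem 10.4 with the eigenvalues (10.7) of A_{n,kappa}, i = 2, ..., n."""
    eigs = [2 * kappa * math.cos(2 * math.pi * (i - 1) / n) + 1 - 2 * kappa
            for i in range(2, n + 1)]
    return sum(1.0 / (1.0 - lam ** 2) for lam in eigs) / n


for n, kappa in [(4, 0.25), (6, 0.2)]:
    print(n, kappa, j_cum_by_summation(ring_matrix(n, kappa)), j_cum_closed_form(n, kappa))
```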
10.4
Circulant network examples and scalability analysis
In general, it is difficult to compute explicitly the second largest eigenvalue magnitude of an arbitrary matrix. Some graphs have a constant essential spectral radius, independent of the network size $n$. For example, a complete graph with identical weights and doubly-stochastic adjacency matrix $A = 1_n1_n^\top/n$ has $\rho_{\mathrm{ess}}(A) = 0$. In this case, the associated averaging algorithm converges in a single step.
Next, we present an interesting family of examples where all eigenvalues are known. Recall the cyclic balancing problem from Section 1.4, where each bug feels an attraction towards the closest counterclockwise and clockwise neighbors. Given the angular distances between bugs $d_i = \theta_{i+1} - \theta_i$, for $i \in \{1, \dots, n\}$ (with the usual convention that $d_{n+1} = d_1$ and $d_0 = d_n$), the closed-loop system is $d(k+1) = A_{n,\kappa}d(k)$, where $\kappa \in [0, 1/2[$ and
$$A_{n,\kappa} = \begin{bmatrix} 1-2\kappa & \kappa & 0 & \cdots & 0 & \kappa \\ \kappa & 1-2\kappa & \kappa & 0 & \cdots & 0 \\ 0 & \kappa & 1-2\kappa & \kappa & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \kappa & 1-2\kappa & \kappa \\ \kappa & 0 & \cdots & 0 & \kappa & 1-2\kappa \end{bmatrix}.$$
This matrix is circulant, that is, each row-vector is equal to the preceding row-vector rotated one element to the right. Circulant matrices have remarkable properties (Davis 1994). For example, from Exercise E10.2, the eigenvalues of $A_{n,\kappa}$ can be computed to be (not ordered in magnitude)
$$\lambda_i = 2\kappa\cos\Big(\frac{2\pi(i-1)}{n}\Big) + (1 - 2\kappa), \qquad i \in \{1, \dots, n\}. \qquad (10.7)$$
An illustration is given in Figure 10.1. For $n$ even (similar results hold for $n$ odd), plotting the eigenvalues on the segment $[-1, 1]$ shows that
$$\rho_{\mathrm{ess}}(A_{n,\kappa}) = \max\{|\lambda_2|, |\lambda_{n/2+1}|\}, \quad\text{where } \lambda_2 = 2\kappa\cos\Big(\frac{2\pi}{n}\Big) + (1 - 2\kappa) \text{ and } \lambda_{n/2+1} = 1 - 4\kappa.$$
[Figure 10.1. Left panel: the curves $f_\kappa(x) = 2\kappa\cos(2\pi x) + (1 - 2\kappa)$, $x \in [0, 1]$, for $\kappa \in \{.1, .2, .3, .4, .5\}$. Right panel: the eigenvalues $\lambda_i = f_\kappa((i-1)/n)$, $i \in \{1, \dots, n\}$, for $n = 5$, with $\lambda_1 = 1$.]
Figure 10.1: The eigenvalues of $A_{n,\kappa}$ as given in equation (10.7). The left figure also illustrates the case $\kappa = .5$, even though that value is outside the allowed range $\kappa \in [0, .5[$.
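The eigenvalue formula (10.7) and the expression for $\rho_{\mathrm{ess}}(A_{n,\kappa})$ can be checked numerically. The sketch below takes $n = 10$ and $\kappa = 0.3$ as arbitrary test values. It multiplies $A_{n,\kappa}$ by the Fourier vectors of Exercise E10.2 and measures the residual $\|A_{n,\kappa}v - \lambda_i v\|$. It then compares the largest $|\lambda_i|$, $i \geq 2$, with $\max\{|\lambda_2|, |\lambda_{n/2+1}|\}$. The names `ring_matrix` and `eigenvalue_10_7` are hypothetical helpers.

```python
import cmath
import math


def ring_matrix(n, kappa):
    """The circulant matrix A_{n,kappa} (hypothetical helper)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] += 1 - 2 * kappa
        A[i][(i + 1) % n] += kappa
        A[i][(i - 1) % n] += kappa
    return A


def eigenvalue_10_7(i, n, kappa):
    """Equation (10.7) for i in {1, ..., n}."""
    return 2 * kappa * math.cos(2 * math.pi * (i - 1) / n) + (1 - 2 * kappa)


n, kappa = 10, 0.3
A = ring_matrix(n, kappa)

max_residual = 0.0
for i in range(1, n + 1):
    omega = cmath.exp(2j * math.pi * (i - 1) / n)  # root of unity omega_{i-1}
    v = [omega ** m for m in range(n)]             # eigenvector from Exercise E10.2
    Av = [sum(A[r][c] * v[c] for c in range(n)) for r in range(n)]
    lam = eigenvalue_10_7(i, n, kappa)
    max_residual = max(max_residual, max(abs(Av[r] - lam * v[r]) for r in range(n)))

rho_ess_all = max(abs(eigenvalue_10_7(i, n, kappa)) for i in range(2, n + 1))
rho_ess_formula = max(abs(eigenvalue_10_7(2, n, kappa)), abs(1 - 4 * kappa))
print(max_residual)                  # round-off level
print(rho_ess_all, rho_ess_formula)  # both about 0.8854
```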
If we fix $\kappa \in\, ]0, 1/2[$ and consider sufficiently large values of $n$, then $|\lambda_2| > |\lambda_{n/2+1}|$. In the limit of large graphs $n \to \infty$, the Taylor expansion $\cos(x) = 1 - x^2/2 + O(x^4)$ leads to
$$\rho_{\mathrm{ess}}(A_{n,\kappa}) = 1 - 4\pi^2\kappa\frac{1}{n^2} + O\Big(\frac{1}{n^4}\Big).$$
Note that $\rho_{\mathrm{ess}}(A_{n,\kappa}) < 1$ for any $n$, but the separation from $\rho_{\mathrm{ess}}(A_{n,\kappa})$ to 1, called the spectral gap, shrinks like $1/n^2$. In summary, this discussion supports the broad statement that certain large-scale graphs have slow convergence factors. For more results along these lines (specifically, the elegant study of Cayley graphs), we refer to (Carli et al. 2008). These results can also be easily mapped to the eigenvalues of the associated Laplacian matrices; e.g., see Exercise E6.12.
We conclude this section by computing the cumulative quadratic index introduced in Section 10.3. For the circulant network example, one can compute (Carli et al. 2009)
$$C_1 \cdot n \leq J_{\mathrm{cum}}(A_{n,\kappa}) \leq C_2 \cdot n,$$
where $C_1$ and $C_2$ are positive constants. This scaling can also be derived from Theorem 10.4 and the eigenvalues (10.7). It is instructive to compare this result with the worst-case asymptotic and per-step convergence factors, which scale as $\rho_{\mathrm{ess}}(A_{n,\kappa}) = 1 - 4\pi^2\kappa/n^2 + O(1/n^4)$.
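The scaling statements in this section can be explored directly from the eigenvalue formula (10.7) and Theorem 10.4, without forming any matrix. The sketch below takes $\kappa = 1/4$ and $n \in \{10, 100, 1000\}$ as illustrative choices. It prints $n^2$ times the spectral gap, which should approach $4\pi^2\kappa$, and $J_{\mathrm{cum}}(A_{n,\kappa})/n$. The second quantity levels off as $n$ grows, consistent with $J_{\mathrm{cum}}$ growing proportionally to $n$ while the spectral gap shrinks like $1/n^2$.

```python
import math


def ring_eigenvalues(n, kappa):
    """Eigenvalues (10.7) of A_{n,kappa} for i = 2, ..., n (the eigenvalue 1 is left out)."""
    return [2 * kappa * math.cos(2 * math.pi * (i - 1) / n) + 1 - 2 * kappa
            for i in range(2, n + 1)]


kappa = 0.25
results = {}
for n in (10, 100, 1000):
    eigs = ring_eigenvalues(n, kappa)
    gap = 1 - max(abs(lam) for lam in eigs)              # spectral gap 1 - rho_ess
    j_cum = sum(1 / (1 - lam ** 2) for lam in eigs) / n  # Theorem 10.4
    results[n] = (gap * n ** 2, j_cum / n)
    print(n, gap * n ** 2, j_cum / n)
# gap * n^2 approaches 4 * pi^2 * kappa (about 9.87); j_cum / n levels off
```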
10.5
Design of fastest distributed averaging
We are interested in optimization problems of the form:
$$\begin{aligned} &\text{minimize} && r_{\mathrm{asym}}(A) \text{ or } r_{\mathrm{step}}(A) \\ &\text{subject to} && A \text{ compatible with a digraph } G, \text{ doubly-stochastic and primitive,} \end{aligned}$$
where $A$ is compatible with $G$ if its only non-zero entries correspond to the edges $E$ of the graph. In other words, if $E_{ij} = e_ie_j^\top$ is the matrix with entry $(i,j)$ equal to one and all other entries equal to zero, then $A = \sum_{(i,j)\in E} a_{ij}E_{ij}$ for arbitrary weights $a_{ij} \in \mathbb{R}$. We refer to such problems as fastest distributed averaging (FDA) problems.
Note: In what follows, we remove the constraint $A \geq 0$ to widen the set of matrices of interest. Accordingly, we remove the constraint that $A$ be primitive. Convergence to average consensus is then guaranteed by (1) achieving convergence factors less than 1, (2) subject to row-sums and column-sums equal to 1.
Problem 1: Asymmetric FDA with asymptotic convergence factor
$$\begin{aligned} &\text{minimize} && \rho\big(A - 1_n1_n^\top/n\big) \\ &\text{subject to} && A = \textstyle\sum_{(i,j)\in E} a_{ij}E_{ij}, \quad A1_n = 1_n, \quad 1_n^\top A = 1_n^\top \end{aligned}$$
The asymmetric FDA is a hard optimization problem. Even though the constraints are linear, the objective function, i.e., the spectral radius of a matrix, is not convex (and, additionally, not even Lipschitz continuous).
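Problem 2: Asymmetric FDA with per-step convergence factor
$$\begin{aligned} &\text{minimize} && \big\|A - 1_n1_n^\top/n\big\|_2 \\ &\text{subject to} && A = \textstyle\sum_{(i,j)\in E} a_{ij}E_{ij}, \quad A1_n = 1_n, \quad 1_n^\top A = 1_n^\top \end{aligned}$$
Problem 3: Symmetric FDA problem (recall $A = A^\top$ implies $\rho(A) = \|A\|_2$):
$$\begin{aligned} &\text{minimize} && \rho\big(A - 1_n1_n^\top/n\big) \\ &\text{subject to} && A = \textstyle\sum_{(i,j)\in E} a_{ij}E_{ij}, \quad A = A^\top, \quad A1_n = 1_n \end{aligned}$$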
Both Problems 2 and 3 are convex and can be rewritten as so-called semi-definite programs (SDPs); see (Xiao and Boyd 2004). An SDP is an optimization problem where (1) the variable is a positive semidefinite matrix, (2) the objective function is linear, and (3) the constraints are affine equations. SDPs can be efficiently solved by software tools such as CVX; see (Grant and Boyd 2014).
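Problems 2 and 3 are normally solved with an SDP solver. The sketch below is a much smaller illustration that needs no solver. It restricts Problem 3 to a one-parameter family: the ring $A_{n,\kappa} = I_n - \kappa L$ of Section 10.4, where $L$ is the Laplacian of the unweighted ring and self-loops are assumed to be allowed edges. Then $\rho(A - 1_n1_n^\top/n) = \max_i |1 - \kappa\mu_i|$ over the nonzero Laplacian eigenvalues $\mu_i = 2 - 2\cos(2\pi i/n)$. The minimizer balances the two extreme terms, $\kappa^* = 2/(\mu_{\min} + \mu_{\max})$, and a grid search over $\kappa \in [0, 1/2[$ confirms this for $n = 10$. This is a toy instance under these assumptions, not the general SDP formulation of (Xiao and Boyd 2004).

```python
import math


def ring_laplacian_eigenvalues(n):
    """Nonzero eigenvalues 2 - 2cos(2 pi i / n), i = 1, ..., n-1, of the unweighted ring Laplacian."""
    return [2 - 2 * math.cos(2 * math.pi * i / n) for i in range(1, n)]


def rho_ess_constant_weight(mu, kappa):
    """rho(A - 1_n 1_n^T / n) for A = I_n - kappa * L, given the nonzero Laplacian eigenvalues mu."""
    return max(abs(1 - kappa * m) for m in mu)


n = 10
mu = ring_laplacian_eigenvalues(n)

# Brute-force search over the allowed range kappa in [0, 1/2[.
grid = [i * 1e-4 for i in range(5000)]
kappa_grid = min(grid, key=lambda kap: rho_ess_constant_weight(mu, kap))

# Closed form: balance |1 - kappa * mu_min| against |1 - kappa * mu_max|.
kappa_star = 2 / (min(mu) + max(mu))

print(kappa_grid, kappa_star)                   # both about 0.4564
print(rho_ess_constant_weight(mu, 0.25),        # about 0.9045
      rho_ess_constant_weight(mu, kappa_star))  # about 0.8257
```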
10.6
Exercises
E10.1
Induced norm. Assume $A$ is doubly-stochastic, primitive and has a strictly positive diagonal. Show that $r_{\mathrm{step}}(A) = \|A - 1_n1_n^\top/n\|_2 < 1$.
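E10.2
Eigenpairs for circulant matrices. Let $C \in \mathbb{C}^{n\times n}$ be circulant, that is, assume there exists a vector $(c_0, \dots, c_{n-1})$ such that
$$C = \begin{bmatrix} c_0 & c_1 & \cdots & c_{n-1} \\ c_{n-1} & c_0 & \cdots & c_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ c_1 & c_2 & \cdots & c_0 \end{bmatrix}.$$
Show that
(i) the complex eigenvectors and eigenvalues of $C$ are, for $j \in \{0, \dots, n-1\}$,
$$v_j = \begin{bmatrix} 1 & \omega_j & \omega_j^2 & \cdots & \omega_j^{n-1} \end{bmatrix}^\top, \qquad \lambda_j = c_0 + c_1\omega_j + c_2\omega_j^2 + \dots + c_{n-1}\omega_j^{n-1},$$
where $\omega_j = \exp\big(2j\pi\sqrt{-1}/n\big)$, $j \in \{0, \dots, n-1\}$, are the $n$th roots of unity; and
(ii) for $n$ even and $(c_0, c_1, \dots, c_{n-1}) = (1 - 2\kappa, \kappa, 0, \dots, 0, \kappa)$, the eigenvalues are $\lambda_i = 2\kappa\cos\big(2\pi(i-1)/n\big) + (1 - 2\kappa)$ for $i \in \{1, \dots, n\}$.
E10.3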
Spectral gap of regular ring graphs. A $k$-regular ring graph is an undirected ring graph with $n$ nodes, each connected to itself and its $2k$ nearest neighbors with a uniform weight equal to $1/(2k+1)$. The associated doubly-stochastic adjacency matrix $A_{n,k}$ is a circulant matrix with first row given by
$$A_{n,k}(1,:) = \begin{bmatrix} \frac{1}{2k+1} & \cdots & \frac{1}{2k+1} & 0 & \cdots & 0 & \frac{1}{2k+1} & \cdots & \frac{1}{2k+1} \end{bmatrix},$$
where the first $k+1$ entries and the last $k$ entries are nonzero. Using the results in Exercise E10.2, compute
(i) the eigenvalues of $A_{n,k}$ as a function of $n$ and $k$;
(ii) the limit of the spectral gap for fixed $k$ as $n \to \infty$; and
(iii) the limit of the spectral gap for $2k = n - 1$ as $n \to \infty$.
 Chapter 11
Time-varying Averaging Algorithms
In this chapter we discuss time-varying consensus algorithms. We borrow ideas from (Bullo et al. 2009; Hendrickx 2008).
11.1
Examples and models of time-varying discrete-time algorithms
In time-varying algorithms, the averaging row-stochastic matrix is not constant throughout time. Instead, it changes values and, possibly, switches among a finite number of values. Here are examples of discrete-time averaging algorithms with switching matrices.
11.1.1
Shared Communication Channel
Given a communication digraph $G_{\text{shared-comm}}$, at each communication round only one node can transmit to all its out-neighbors over a common bus, and every receiving node implements a single averaging step. For example, if agent $j$ receives the message from agent $i$, then agent $j$ will implement:
$$x_j^+ := \frac{1}{2}(x_i + x_j). \qquad (11.1)$$
Each node is allocated a communication slot in a periodic deterministic fashion, e.g., with round-robin scheduling, where the $n$ agents are numbered and, for each $i$, agent $i$ talks only at times $i, n+i, 2n+i, \dots, kn+i$ for $k \in \mathbb{Z}_{\geq 0}$. For example, Figure 11.1 illustrates the communication digraph and Figure 11.2 the resulting round-robin communication protocol.
[Figure: the communication digraph $G_{\text{shared-comm}}$ on the nodes 1, 2, 3, 4.]
Figure 11.1: Example communication digraph
[Figure: four panels of the digraph on nodes 1, 2, 3, 4, labeled time = 1, 5, 9, …; time = 2, 6, 10, …; time = 3, 7, 11, …; and time = 4, 8, 12, ….]
Figure 11.2: Round-robin communication protocol.
Formally, let $A_i$ denote the averaging matrix corresponding to the transmission by agent $i$ to its out-neighbors. With round-robin scheduling, we have $x(n+1) = A_nA_{n-1}\cdots A_1x(1)$.
11.1.2
Asynchronous Execution
Imagine each node has a different clock, so that there is no common time schedule. Suppose that messages are safely delivered even if transmitting and receiving agents are not synchronized. Each time an agent wakes up, the available information from its neighbors varies. At an iteration instant for agent $i$, assuming agent $i$ has new messages/information from agents $i_1, \dots, i_m$, agent $i$ will implement:
$$x_i^+ := \frac{1}{m+1}x_i + \frac{1}{m+1}(x_{i_1} + \dots + x_{i_m}).$$
Given arbitrary clocks, one can consider the set of times at which one of the $n$ agents performs an iteration. With respect to this set of times, the system is a discrete-time averaging algorithm. It is possible to carefully characterize all possible sequences of events (who transmitted to agent $i$ when it wakes up).
11.1.3
Models of time-varying averaging algorithms
Consider a sequence of row-stochastic matrices $\{A(k)\}_{k\in\mathbb{Z}_{\geq 0}}$, or equivalently a time-varying row-stochastic matrix $k \mapsto A(k)$. The associated time-varying averaging algorithm is the discrete-time dynamical system
$$x(k+1) = A(k)x(k), \qquad k \in \mathbb{Z}_{\geq 0}. \qquad (11.2)$$
We let $\{G(k)\}_{k\in\mathbb{Z}_{\geq 0}}$ be the sequence of weighted digraphs associated to the matrices $\{A(k)\}_{k\in\mathbb{Z}_{\geq 0}}$.
Note that $(1, 1_n)$ is an eigenpair for each matrix $A(k)$. Hence, all points in the consensus set $\{\alpha 1_n \mid \alpha \in \mathbb{R}\}$ are equilibria for the algorithm. We aim to provide conditions under which each solution converges to consensus. We start with a useful definition: for two digraphs $G = (V, E)$ and $G' = (V', E')$, the union of $G$ and $G'$ is defined by $G \cup G' = (V \cup V', E \cup E')$. (In what follows, we will only need the union of digraphs with the same set of vertices; in that case, the graph union is essentially the union of the edge sets.) Some useful properties of the product of multiple row-stochastic matrices and of the unions of multiple digraphs are presented in Exercise E11.1.
11.2
Convergence over time-varying connected graphs
Let us first consider the case when $A(k)$ induces an undirected, connected, and aperiodic graph $G(k)$ at each time $k$.
Theorem 11.1 (Convergence under point-wise connectivity). Let $\{A(k)\}_{k\in\mathbb{Z}_{\geq 0}}$ be a sequence of symmetric and doubly-stochastic matrices with associated graphs $\{G(k)\}_{k\in\mathbb{Z}_{\geq 0}}$ so that
(A1) each non-zero edge weight $a_{ij}(k)$, including the self-loop weights $a_{ii}(k)$, is larger than a constant $\varepsilon > 0$; and
(A2) each graph $G(k)$ is connected and aperiodic point-wise in time.
Then the solution to $x(k+1) = A(k)x(k)$ converges exponentially fast to $\operatorname{average}(x(0))1_n$.
The first assumption in Theorem 11.1 prevents the weights from becoming arbitrarily close to zero as $k \to \infty$. Together with the second assumption, it ensures that $\rho_{\mathrm{ess}}(A(k))$ is upper bounded by a number strictly less than 1 at every time $k \in \mathbb{Z}_{\geq 0}$. To gain some intuition into this non-degeneracy assumption, consider a sequence of symmetric and doubly-stochastic averaging matrices $\{A(k)\}_{k\in\mathbb{Z}_{\geq 0}}$ with entries given by
$$A(k) = \begin{bmatrix} \exp(-1/(k+1)^\alpha) & 1 - \exp(-1/(k+1)^\alpha) \\ 1 - \exp(-1/(k+1)^\alpha) & \exp(-1/(k+1)^\alpha) \end{bmatrix}$$
for $k \in \mathbb{Z}_{\geq 0}$ and exponent $\alpha \geq 1$. Clearly, for $k \to \infty$ and for any $\alpha \geq 1$, this matrix converges to $A_\infty = I_2$. Its off-diagonal weights vanish, and its eigenvalue other than 1, namely $2\exp(-1/(k+1)^\alpha) - 1$, tends to 1, so that $\rho_{\mathrm{ess}}(A(k)) \to 1$. One can show that, for $\alpha = 1$, the convergence of $A(k)$ to $A_\infty$ is sufficiently slow so that $\{x(k)\}_k$ converges to $\operatorname{average}(x(0))1_n$. This property is not satisfied for faster convergence rates $\alpha \geq 2$, where the disagreement between the two states never vanishes.¹
Proof of Theorem 11.1. Under the assumptions of the theorem, there exists $c \in [0, 1[$ such that $\rho_{\mathrm{ess}}(A(k)) \leq c < 1$ for all $k \in \mathbb{Z}_{\geq 0}$. Recall the notion of the disagreement vector $\delta(k) = x(k) - \operatorname{average}(x(0))1_n$ and define $V(\delta) = \|\delta\|_2^2$. Since $\delta(k) \perp 1_n$ and $A(k)$ is symmetric, it is immediate to compute
$$V(\delta(k+1)) = V(A(k)\delta(k)) = \|A(k)\delta(k)\|_2^2 \leq \rho_{\mathrm{ess}}(A(k))^2\|\delta(k)\|_2^2 \leq c^2V(\delta(k)).$$
It follows that $V(\delta(k)) \leq c^{2k}V(\delta(0))$, or $\|\delta(k)\|_2 \leq c^k\|\delta(0)\|_2$, that is, $\delta(k)$ converges to zero exponentially fast. Equivalently, as $k \to \infty$, $x(k)$ converges exponentially fast to $\operatorname{average}(x(0))1_n$. ∎
The proof of Theorem 11.1 is based on the disagreement vector and a so-called common Lyapunov function, that is, a positive function that decreases along the system's evolutions (we postpone the general definition of Lyapunov function to Chapter 13). The quadratic function $V$ proposed above is also useful for sequences of irreducible and primitive row-stochastic matrices $\{A(k)\}_{k\in\mathbb{Z}_{\geq 0}}$ with a common positive left eigenvector associated to the eigenvalue $\rho(A(k)) = 1$; see Exercise E11.5.
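The two-agent example following Theorem 11.1 can be simulated directly. The sketch below iterates $x(k+1) = A(k)x(k)$ with the matrices $A(k)$ of that example, whose diagonal entries are $\exp(-1/(k+1)^\alpha)$. It starts from $x(0) = (1, 0)$ and runs for 10,000 steps, both arbitrary choices. For $\alpha = 1$ and $\alpha = 2$ it reports the disagreement $x_1 - x_2$ and the sum $x_1 + x_2$, which is preserved because each $A(k)$ is doubly-stochastic. The function name `run_example` is hypothetical.

```python
import math


def run_example(alpha, steps=10000, x0=(1.0, 0.0)):
    """Iterate x(k+1) = A(k) x(k) for the 2x2 matrices A(k) of the example.

    Returns (x1 - x2, x1 + x2) after `steps` iterations.
    """
    x1, x2 = x0
    for k in range(steps):
        p = math.exp(-1.0 / (k + 1) ** alpha)  # diagonal entry of A(k)
        x1, x2 = p * x1 + (1 - p) * x2, (1 - p) * x1 + p * x2
    return x1 - x2, x1 + x2


d1, s1 = run_example(alpha=1)
d2, s2 = run_example(alpha=2)
print(d1, s1)  # disagreement essentially zero; x1 + x2 stays 1
print(d2, s2)  # disagreement stays bounded away from zero; x1 + x2 stays 1
```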
If the matrices $\{A(k)\}_{k\in\mathbb{Z}_{\geq 0}}$ do not share a common left eigenvector associated to the eigenvalue $\rho(A(k)) = 1$, then in general there exists no common quadratic Lyapunov function of the form $V(\delta) = \delta^\top P\delta$ with $P$ a positive definite matrix; e.g., see (Olshevsky and Tsitsiklis 2008). Likewise, if a sequence of symmetric matrices $\{A(k)\}_{k\in\mathbb{Z}_{\geq 0}}$ does not induce a connected and aperiodic graph point-wise in time, then the above analysis fails, and we need to search for non-quadratic common Lyapunov functions.
¹ To understand the essence of this example, consider the scalar iteration $x(k+1) = \exp(-1/(k+1)^\alpha)x(k)$ with $x(0) > 0$. In logarithmic coordinates the solution is given by $\log(x(k)) = \log(x(0)) - \sum_{h=0}^{k-1} \frac{1}{(h+1)^\alpha}$. For $\alpha = 1$, $\log(x(k))$ diverges to $-\infty$ as $k \to \infty$, and $x(k)$ converges to zero. In contrast, for $\alpha > 1$, $\log(x(k))$ has a finite limit, and thus $x(k)$ does not converge to zero.
11.3
Convergence over digraphs connected over time
We are now ready to state the main result in this chapter, originally due to Moreau (2005).
Theorem 11.2 (Consensus for time-varying algorithms). Let $\{A(k)\}_{k\in\mathbb{Z}_{\geq 0}}$ be a sequence of row-stochastic matrices with associated digraphs $\{G(k)\}_{k\in\mathbb{Z}_{\geq 0}}$. Assume that
(A1) each digraph $G(k)$ has a self-loop at each node;
(A2) each non-zero edge weight $a_{ij}(k)$, including the self-loop weights $a_{ii}(k)$, is larger than a constant $\varepsilon > 0$; and
(A3) there exists a duration $\delta \in \mathbb{N}$ such that, for all times $k \in \mathbb{Z}_{\geq 0}$, the digraph $G(k) \cup \dots \cup G(k + \delta - 1)$ contains a globally reachable node.
Then
(i) there exists a nonnegative $w \in \mathbb{R}^n$ normalized to $w_1 + \dots + w_n = 1$ such that $\lim_{k\to\infty} A(k)\cdot A(k-1)\cdots A(0) = 1_nw^\top$;
(ii) the solution to $x(k+1) = A(k)x(k)$ converges exponentially fast to $(w^\top x(0))1_n$;
(iii) if additionally each matrix in the sequence is doubly-stochastic, then $w = \frac{1}{n}1_n$ so that
$$\lim_{k\to\infty} x(k) = \operatorname{average}(x(0))1_n.$$
Note: In a sequence with property (A2), edges can appear and disappear, but the weight of each edge that appears infinitely many times does not go to zero as $k \to \infty$.
Note: This result is analogous to the time-invariant result that we saw in Chapter 5. The existence of a globally reachable node is the connectivity requirement in both cases.
Note: Assumption (A3) is a uniform connectivity requirement, that is, any interval of length $\delta$ must have the connectivity property. Equivalently, the connectivity property holds for any contiguous interval of duration $\delta$.
Note: The theorem provides only a sufficient condition. For necessary and sufficient conditions we refer the reader to the recent works (Blondel and Olshevsky 2014; Xia and Cao 2014) and references therein.
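Theorem 11.2 does not require any single graph $G(k)$ to be connected. The sketch below uses an illustrative 3-agent sequence that alternates between two symmetric, hence doubly-stochastic, matrices. At even times agents 1 and 2 average; at odd times agents 2 and 3 average. Neither graph alone has a globally reachable node. The union over any window of length $\delta = 2$, however, is the path 1-2-3 with self-loops, so (A1)-(A3) hold. Statement (iii) then predicts convergence to $\operatorname{average}(x(0))1_n = 4 \cdot 1_3$ for $x(0) = (0, 3, 9)$.

```python
def step(A, x):
    """One step of x(k+1) = A(k) x(k)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Two symmetric, doubly-stochastic matrices (illustrative choice):
# A_even averages agents 1 and 2, A_odd averages agents 2 and 3.
A_even = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
A_odd = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]

x = [0.0, 3.0, 9.0]
for k in range(200):
    x = step(A_even if k % 2 == 0 else A_odd, x)
print(x)  # all entries approximately 4.0 = average of the initial condition
```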
11.3.1
Shared communication channel with round robin scheduling
Consider the shared communication channel model with round-robin scheduling. Assume the algorithm is implemented over a communication graph $G_{\text{shared-comm}}$ that is strongly connected.
Consider now the assumptions in Theorem 11.2. Assumption (A1) is satisfied because, in equation (11.1), the self-loop weight of each receiving node is equal to 1/2 (and equal to 1 for the other nodes). Similarly, Assumption (A2) is satisfied because each edge weight is equal to 1/2. Finally, Assumption (A3) is satisfied with duration $\delta$ equal to $n$. In any $n$ consecutive time steps each node transmits exactly once, so all edges of the communication graph $G_{\text{shared-comm}}$ are present in the union graph, and strong connectivity makes every node globally reachable. Therefore, the algorithm converges to consensus. However, the algorithm does not in general converge to average consensus, because the averaging matrices are not doubly-stochastic. Note: round robin is not the only scheduling protocol with convergence guarantees. Indeed, consensus is achieved so long as each node is guaranteed a transmission slot once every bounded period of time.
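The shared-channel protocol can be simulated to see consensus without average consensus. The edges of Figure 11.1 are not recoverable here, so the sketch below assumes a hypothetical strongly connected digraph: the directed ring 1 → 2 → 3 → 4 → 1, 0-indexed in the code. It builds the matrices $A_i$ of equation (11.1) and applies them in round-robin order for 100 full rounds from $x(0) = (1, 2, 3, 4)$. For this particular digraph, one can check by hand that $w = (2/5, 1/5, 1/5, 1/5)$ is a left eigenvector, with eigenvalue 1, of the one-round product $A_4A_3A_2A_1$. The consensus value should therefore be $w^\top x(0) = 2.2$ rather than the average 2.5.

```python
def transmission_matrix(i, out_neighbors, n):
    """A_i for equation (11.1): each out-neighbor j of agent i sets x_j := (x_i + x_j)/2."""
    A = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for j in out_neighbors[i]:
        A[j][j] = 0.5
        A[j][i] = 0.5
    return A


def step(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Hypothetical strongly connected digraph: the directed ring 0 -> 1 -> 2 -> 3 -> 0.
n = 4
out_neighbors = {0: [1], 1: [2], 2: [3], 3: [0]}
mats = [transmission_matrix(i, out_neighbors, n) for i in range(n)]

x = [1.0, 2.0, 3.0, 4.0]
for k in range(400):  # round robin: agent k mod n transmits at time k
    x = step(mats[k % n], x)

print(x)           # consensus: all entries approximately 2.2
print(sum(x) / n)  # approximately 2.2, not the initial average 2.5
```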
11.3.2
Convergence theorems for symmetric time-varying algorithms
Theorem 11.3 (Consensus for symmetric time-varying algorithms). Let {A(k)}k∈Z≥0 be a sequence of symmetric row-stochastic matrices with associated undirected graphs {G(k)}k∈Z≥0. Let the matrix sequence {A(k)}k∈Z≥0 satisfy Assumptions (A1), (A2) in Theorem 11.2 as well as
(A4) for all k ∈ Z≥0, the graph $\bigcup_{\tau \geq k} G(\tau)$ is connected.
Then
(i) $\lim_{k\to\infty} A(k)\,A(k-1)\cdots A(0) = \frac{1}{n}\mathbb{1}_n\mathbb{1}_n^\top$;
(ii) each solution to $x(k+1) = A(k)x(k)$ converges to $\operatorname{average}(x(0))\,\mathbb{1}_n$.
Note: This result is analogous to the time-invariant result that we saw in Chapter 5. For symmetric row-stochastic matrices and undirected graphs, the connectivity of an appropriate graph is the requirement in both cases. Note: Assumption (A3) in Theorem 11.2 requires the existence of a finite time-interval of duration δ so that the union graph $\bigcup_{k \leq \tau \leq k+\delta-1} G(\tau)$ contains a globally reachable node for all times k ≥ 0. This assumption is weakened in the symmetric case in Theorem 11.3 to Assumption (A4), which requires that the union graph $\bigcup_{\tau \geq k} G(\tau)$ is connected for all times k ≥ 0.
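As a hypothetical illustration of Assumption (A4) without uniform connectivity, the sketch below runs three agents on a path: at every time k the pair {0, 1} performs (1/2, 1/2) averaging, except when k is a power of two, in which case the pair {1, 2} averages instead. Every matrix is symmetric and doubly stochastic, and both edges recur forever, so (A4) holds, while the gaps between uses of edge {1, 2} grow without bound, so no fixed δ satisfies (A3). The schedule and helper names are assumptions chosen for illustration.

```python
def pairwise_average(x, i, j):
    """Symmetric (1/2, 1/2) averaging of agents i and j; others unchanged."""
    x = list(x)
    x[i] = x[j] = (x[i] + x[j]) / 2
    return x

def is_power_of_two(k):
    return k > 0 and (k & (k - 1)) == 0

def run(x0, steps):
    """Edge {0, 1} at ordinary times, edge {1, 2} at times 1, 2, 4, 8, ..."""
    x = list(x0)
    for k in range(steps):
        if is_power_of_two(k):
            x = pairwise_average(x, 1, 2)  # rare edge {1, 2}
        else:
            x = pairwise_average(x, 0, 1)  # frequent edge {0, 1}
    return x

x = run([0.0, 3.0, 6.0], steps=2**16)
print(x, max(x) - min(x))  # entries approach average(x(0)) = 3
```

In this run the spread shrinks only by a factor of 4 around each power-of-two time, so after 2^16 steps it is still of order 1e-9, much slower than the geometric decay obtained under uniform connectivity.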
11.3.3
Uniform connectivity is required for non-symmetric matrices
We have learned that for asymmetric matrices a uniform connectivity property (A3) is required, whereas for symmetric matrices, uniform connectivity is not required (see (A4)). Here is a counter-example from (Hendrickx 2008) showing that Assumption (A3) cannot be relaxed for asymmetric graphs. Initialize a group of n = 3 agents to x1 < −1, x2 < −1, x3 > +1.
Step 1: Perform $x_1^+ := (x_1 + x_3)/2$, $x_2^+ := x_2$, $x_3^+ := x_3$ a number of times δ1 until
x1 > +1, x2 < −1, x3 > +1.
Step 2: Perform $x_1^+ := x_1$, $x_2^+ := x_2$, $x_3^+ := (x_2 + x_3)/2$ a number of times δ2 until
x1 > +1, x2 < −1, x3 < −1.
Step 3: Perform $x_1^+ := x_1$, $x_2^+ := (x_1 + x_2)/2$, $x_3^+ := x_3$ a number of times δ3 until
x1 > +1, x2 > +1, x3 < −1.
And repeat this process.
[Figure: the digraphs used in Step 1, Step 2 and Step 3 on nodes 1, 2, 3, and their union.]
Observe that on steps 1, 7, 13, . . . , the variable x1 is made to become larger than +1 by computing averages with x3 > +1. Note that every time this happens the variable x3 > +1 is smaller and closer to +1. Hence, δ1 < δ7 < δ13 < . . . , that is, it takes more and more steps for x1 to become larger than +1. Indeed, one can formally show the following: (i) The agents do not converge to consensus. (ii) Hence, one of the assumptions of Theorem 11.2 must be violated. (iii) It is easy to see that (A1) and (A2) are satisfied. (iv) Regarding connectivity, note that, for all k ∈ Z≥0, the digraph $\bigcup_{\tau \geq k} G(\tau)$ contains a globally reachable node. However, this property is not quite (A3). (v) Assumption (A3) in Theorem 11.2 must be violated: there does not exist a duration δ ∈ N such that, for all k ∈ Z≥0, the digraph G(k) ∪ · · · ∪ G(k + δ − 1) contains a globally reachable node. (vi) Indeed, one can show that $\lim_{k\to\infty} \delta_k = \infty$ so that, as we keep iterating Steps 1+2+3, their duration grows unbounded.
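The construction can be reproduced with exact rational arithmetic. The sketch below encodes one reading of "repeat this process" (an assumption): after Step 3 the sign pattern is the mirror image of the initial one, so the three updates x1 ← (x1 + x3)/2, x3 ← (x2 + x3)/2, x2 ← (x1 + x2)/2 are applied cyclically, and each phase repeats its update until the updated agent lies beyond +1 or −1 on the same side as the agent it listens to. The function name `hendrickx_counterexample` is hypothetical, agents are 0-indexed, and entry p − 1 of `durations` is the length of phase p.

```python
from fractions import Fraction

def hendrickx_counterexample(x0, phases):
    """Cyclic version of Steps 1-3 (0-based agents).

    Phase p uses the pair (a, b) = schedule[p % 3]: agent a repeats
    x_a := (x_a + x_b) / 2 until it lies beyond +1 or -1 on the same side as
    x_b (this terminates because |x_b| > 1). Exact rational arithmetic avoids
    floating-point rounding stalls near +1 and -1.
    """
    x = [Fraction(v) for v in x0]
    schedule = [(0, 2), (2, 1), (1, 0)]  # Step 1: x1<-x3, Step 2: x3<-x2, Step 3: x2<-x1
    durations, spreads = [], []
    for p in range(phases):
        a, b = schedule[p % 3]
        side = 1 if x[b] > 0 else -1
        steps = 0
        while side * x[a] <= 1:
            x[a] = (x[a] + x[b]) / 2
            steps += 1
        durations.append(steps)
        spreads.append(max(x) - min(x))
    return x, durations, spreads

x, durations, spreads = hendrickx_counterexample([-2, -2, 2], phases=12)
print(durations[:7])     # [3, 3, 3, 3, 6, 5, 7]
print(min(spreads) > 2)  # True: the max-min spread never drops below 2
```

Phases 1 and 7 (`durations[0]` and `durations[6]`) are the ones in which x1 is pushed above +1, and the second takes more steps than the first, as described above.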
11.4
Analysis methods and proofs
It is well known that, for time-varying systems, the analysis of eigenvalues is not sufficient anymore! In the following example, two matrices with spectral radius equal to 1/2 are multiplied to obtain a spectral radius larger than 1:
$$\begin{bmatrix} 1/2 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1/2 & 0 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 5/4 & 0 \\ 0 & 0 \end{bmatrix}.$$
Hence, it is not possible to predict the convergence of arbitrary products of matrices just based on their spectral radii, and we need to work harder and with sharper tools.
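A quick numerical check of the example, using NumPy's `numpy.linalg.eigvals` on the two 2 × 2 matrices [[1/2, 1], [0, 0]] and [[1/2, 0], [1, 0]]:

```python
import numpy as np

def spectral_radius(M):
    """Largest absolute value among the eigenvalues of M."""
    return max(abs(np.linalg.eigvals(M)))

A1 = np.array([[0.5, 1.0], [0.0, 0.0]])
A2 = np.array([[0.5, 0.0], [1.0, 0.0]])
print(spectral_radius(A1), spectral_radius(A2))  # both (numerically) 0.5
print(A1 @ A2)                                   # [[5/4, 0], [0, 0]]
print(spectral_radius(A1 @ A2))                  # (numerically) 1.25 > 1
```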
11.4.1
Bounded solutions and non-increasing max-min function
In what follows, we propose a so-called contraction analysis based on a common Lyapunov function (which is not quadratic). We start by defining the max-min function Vmax-min : Rn → R≥0 by
$$V_{\text{max-min}}(x) = \max(x_1, \dots, x_n) - \min(x_1, \dots, x_n) = \max_{i\in\{1,\dots,n\}} x_i - \min_{i\in\{1,\dots,n\}} x_i.$$
Note that: (i) Vmax-min(x) ≥ 0, and (ii) Vmax-min(x) = 0 if and only if x = α1n for some α ∈ R.
Lemma 11.4 (Monotonicity and bounded evolutions). If A is row-stochastic, then for all x ∈ Rn
$$V_{\text{max-min}}(Ax) \leq V_{\text{max-min}}(x).$$
For any sequence of row-stochastic matrices, the solution x(k) of the corresponding time-varying averaging algorithm satisfies, from any initial condition x(0) and at any time k,
$$V_{\text{max-min}}(x(k)) \leq V_{\text{max-min}}(x(0)),$$
and
$$\min x(0) \leq \min x(k) \leq \min x(k+1) \leq \max x(k+1) \leq \max x(k) \leq \max x(0).$$
Proof. For the maximum, let us compute:
$$\max_i (Ax)_i = \max_i \sum_{j=1}^n a_{ij} x_j \leq \max_i \sum_{j=1}^n a_{ij} \max_j x_j = \Big(\max_i \sum_{j=1}^n a_{ij}\Big) \max_j x_j = 1 \cdot \max_j x_j.$$
Similarly, for the minimum,
$$\min_i (Ax)_i = \min_i \sum_{j=1}^n a_{ij} x_j \geq \min_i \sum_{j=1}^n a_{ij} \min_j x_j = \Big(\min_i \sum_{j=1}^n a_{ij}\Big) \min_j x_j = 1 \cdot \min_j x_j.$$
Subtracting the second inequality from the first gives Vmax-min(Ax) ≤ Vmax-min(x). Applying both inequalities with A = A(k) at each time k yields the chain of inequalities for the time-varying algorithm.
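A numerical sanity check of Lemma 11.4 (an illustration, not a proof): random row-stochastic matrices, drawn here by normalizing uniformly random nonnegative entries (an arbitrary choice made for the sketch), never increase the maximum, never decrease the minimum, and hence never increase Vmax-min.

```python
import numpy as np

def v_max_min(x):
    """Max-min function: max_i x_i - min_i x_i."""
    return float(np.max(x) - np.min(x))

def random_row_stochastic(n, rng):
    """Nonnegative n x n matrix with unit row sums (illustrative random choice)."""
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=5)
lo, hi = x.min(), x.max()  # min x(0) and max x(0)
history = [v_max_min(x)]
for k in range(50):
    x = random_row_stochastic(5, rng) @ x  # one step of x(k+1) = A(k) x(k)
    history.append(v_max_min(x))
print(history[0], history[-1])  # V is non-increasing along the run
```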
Connectivity over time
Lemma 11.5 (Global reachability over time). Given a sequence of digraphs {G(k)}k∈Z≥0 such that each digraph G(k) has a self-loop at each node, the following two properties are equivalent:
(i) there exists a duration δ ∈ N such that, for all times k ∈ Z≥0, the digraph G(k) ∪ · · · ∪ G(k + δ − 1) contains a directed spanning tree;
(ii) there exists a duration Δ ∈ N such that, for all times k ∈ Z≥0, there exists a node j = j(k) that reaches all nodes i ∈ {1, . . . , n} over the interval {k, . . . , k + Δ − 1} in the following sense: there exists a sequence of nodes {j, h1, . . . , hΔ−1, i} such that (j, h1) is an edge at time k, (h1, h2) is an edge at time k + 1, . . . , (hΔ−2, hΔ−1) is an edge at time k + Δ − 2, and (hΔ−1, i) is an edge at time k + Δ − 1;
or, equivalently, for the reverse digraph,
(iii) there exists a duration δ ∈ N such that, for all times k ∈ Z≥0, the digraph G(k) ∪ · · · ∪ G(k + δ − 1) contains a globally reachable node;
(iv) there exists a duration Δ ∈ N such that, for all times k ∈ Z≥0, there exists a node j reachable from all nodes i ∈ {1, . . . , n} over the interval {k, . . . , k + Δ − 1} in the following sense: there exists a sequence of nodes {j, h1, . . . , hΔ−1, i} such that (h1, j) is an edge at time k, (h2, h1) is an edge at time k + 1, . . . , (hΔ−1, hΔ−2) is an edge at time k + Δ − 2, and (i, hΔ−1) is an edge at time k + Δ − 1.
Note: It is sometimes easy to see if a sequence of digraphs satisfies properties (i) and (iii). Property (iv) is directly useful in the analysis later in the chapter. Regarding the proof of the lemma, it is easy to check that (ii) implies (i) and that (iv) implies (iii) with δ = Δ. The converse is left as Exercise E11.3.
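Property (iii) is the easiest one to test on a finite stretch of a sequence. The sketch below checks it on one window, assuming (as in the proof of Theorem 11.2) that a positive entry aij encodes the edge (i, j), so that node j is globally reachable when every node has a directed path to j. The helper names are hypothetical, and only finitely many windows can ever be checked this way.

```python
import numpy as np

def globally_reachable_nodes(adj):
    """Nodes j such that every node i has a directed path to j.

    Convention: adj[i, j] > 0 means the digraph has edge (i, j)."""
    n = adj.shape[0]
    step = ((adj > 0) | np.eye(n, dtype=bool)).astype(int)
    reach = step.copy()
    for _ in range(n - 1):
        reach = np.minimum(reach @ step, 1)  # extend paths by one edge, keep 0/1 entries
    return [j for j in range(n) if reach[:, j].all()]

def window_has_globally_reachable_node(mats, k, delta):
    """Check property (iii) of Lemma 11.5 on the window k, ..., k + delta - 1."""
    union = sum((A > 0).astype(int) for A in mats[k:k + delta])
    return len(globally_reachable_nodes(union)) > 0

# Hypothetical periodic sequence on 3 nodes with self-loops.
A0 = np.array([[1.0, 0, 0], [0.5, 0.5, 0], [0, 0, 1.0]])  # edge (1, 0)
A1 = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0.5, 0.5]])  # edge (2, 1)
mats = [A0, A1] * 5
print(window_has_globally_reachable_node(mats, 0, 2))  # True: node 0
print(window_has_globally_reachable_node(mats, 0, 1))  # False
```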
11.4.2
Proof of Theorem 11.2: the max-min function is exponentially decreasing
This proof is inspired by the presentation in (Hendrickx 2008, Theorem 9.2). We start by noting that Assumptions (A1) and (A3) imply property (iv) of Lemma 11.5 about the existence of a duration Δ with certain properties. Next, without loss of generality, we assume that at some time hΔ, for some h ∈ N, the solution x(hΔ) is not equal to a multiple of 1n and, therefore, satisfies Vmax-min(x(hΔ)) > 0. (If instead x(hΔ) is a multiple of 1n, then x(k) = x(hΔ) for all k ≥ hΔ, because each A(k) is row-stochastic.) Clearly,
$$x((h+1)\Delta) = A((h+1)\Delta - 1) \cdots A(h\Delta + 1)\, A(h\Delta)\, x(h\Delta).$$
By Lemma 11.5(iv), we know that there exists a node j reachable from all nodes i over the interval {hΔ, . . . , (h + 1)Δ − 1} in the following sense: there exists a sequence of nodes {j, h1, . . . , hΔ−1, i} such that all the following edges exist in the sequence of digraphs: (h1, j) at time hΔ, (h2, h1) at time hΔ + 1, . . . , (i, hΔ−1) at time (h + 1)Δ − 1. Therefore, Assumption (A2) implies
$$a_{h_1,j}(h\Delta) \geq \varepsilon, \quad a_{h_2,h_1}(h\Delta + 1) \geq \varepsilon, \quad \dots, \quad a_{i,h_{\Delta-1}}((h+1)\Delta - 1) \geq \varepsilon,$$
and therefore their product satisfies
$$a_{i,h_{\Delta-1}}((h+1)\Delta - 1)\, a_{h_{\Delta-1},h_{\Delta-2}}((h+1)\Delta - 2) \cdots a_{h_2,h_1}(h\Delta + 1)\, a_{h_1,j}(h\Delta) \geq \varepsilon^{\Delta}.$$
Remarkably, this product is one term in the (i, j) entry of the row-stochastic matrix $\mathcal{A} := A((h+1)\Delta - 1) \cdots A(h\Delta)$. Since all other terms in that entry are nonnegative, Assumption (A3) implies $\mathcal{A}_{ij} \geq \varepsilon^\Delta$. Hence, for all nodes i, given the globally reachable node j during the interval {hΔ, . . . , (h + 1)Δ − 1}, we compute
$$\begin{aligned}
x_i((h+1)\Delta) &= \mathcal{A}_{ij} x_j(h\Delta) + \sum_{p=1,\,p\neq j}^{n} \mathcal{A}_{ip} x_p(h\Delta) && \text{(by definition)}\\
&\leq \mathcal{A}_{ij} x_j(h\Delta) + (1 - \mathcal{A}_{ij}) \max x(h\Delta) && \text{(because } x_p(h\Delta) \leq \max x(h\Delta))\\
&= \max x(h\Delta) + \mathcal{A}_{ij}\big(x_j(h\Delta) - \max x(h\Delta)\big)\\
&\leq \varepsilon^\Delta x_j(h\Delta) + (1 - \varepsilon^\Delta) \max x(h\Delta). && \text{(because } x_j(h\Delta) \leq \max x(h\Delta) \text{ and } \mathcal{A}_{ij} \geq \varepsilon^\Delta)
\end{aligned}$$
A similar argument leads to
$$x_i((h+1)\Delta) \geq \varepsilon^\Delta x_j(h\Delta) + (1 - \varepsilon^\Delta) \min x(h\Delta),$$
so that
$$V_{\text{max-min}}\big(x((h+1)\Delta)\big) = \max_i x_i((h+1)\Delta) - \min_i x_i((h+1)\Delta) \leq (1 - \varepsilon^\Delta)\, V_{\text{max-min}}\big(x(h\Delta)\big).$$
This final inequality, together with Lemma 11.4, proves exponential convergence of the cost function k ↦ Vmax-min(x(k)) to zero and convergence of x(k) to a multiple of 1n. We leave the other statements in Theorem 11.2 to the reader and refer to (Hendrickx 2008; Moreau 2005) for further details.
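The mechanism of the proof can be observed on a toy sequence (a hypothetical example) that alternates two row-stochastic matrices on three agents, with ε = 1/2. Over each window of length Δ = 2, node 0 is reached by every node, the corresponding column of the window product is at least ε^Δ entrywise, and the max-min function shrinks by at least the factor 1 − ε^Δ = 3/4 per window.

```python
import numpy as np

def v_max_min(x):
    return float(np.max(x) - np.min(x))

# Hypothetical periodic sequence: A(2h) = A0, A(2h + 1) = A1.
A0 = np.array([[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])  # agent 1 averages with agent 0
A1 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 0.5]])  # agent 2 averages with agent 1
eps, Delta = 0.5, 2

window = A1 @ A0  # product A((h+1)Delta - 1) ... A(h Delta) over one window
print(window[:, 0])  # column of the reachable node 0: every entry >= eps**Delta

x = np.array([0.0, 1.0, 2.0])
ratios = []
for h in range(20):
    x_next = window @ x
    ratios.append(v_max_min(x_next) / v_max_min(x))
    x = x_next
print(max(ratios), 1 - eps**Delta)  # observed worst ratio vs. the bound 1 - eps^Delta
```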
11.5
Time-varying algorithms in continuous-time
We now consider the continuous-time linear time-varying system $\dot x(t) = -L(t)x(t)$. We associate a time-varying graph G(t) (without self-loops) to the time-varying Laplacian L(t) in the usual manner. For example, in Chapter 7, we discussed how the heading in some flocking models is described by the continuous-time Laplacian flow:
$$\dot\theta = -L\theta,$$
where each θi is the heading of bird i, and where L is the Laplacian of an appropriate weighted digraph G: each bird is a node and each directed edge (i, j) has weight 1/dout(i). We also discussed the need to consider time-varying graphs: birds average their heading only with other birds within sensing range, but this sensing relationship may change with time. Recall that the solution to a continuous-time time-varying system can be given in terms of the state transition matrix:
$$x(t) = \Phi(t, 0)\,x(0).$$
We refer to (Hespanha 2009) for the proper definition and study of the state transition matrix.
11.5.1
Undirected graphs
We first consider the case when L(t) induces an undirected and connected graph G(t) for all t ∈ R≥0.
Theorem 11.6 (Convergence under point-wise connectivity). Let t ↦ L(t) = L(t)^⊤ be a time-varying Laplacian matrix with associated time-varying digraph t ↦ G(t), t ∈ R≥0. Assume
(A1) each non-zero edge weight aij(t) is larger than a constant ε > 0,
(A2) for all t ∈ R≥0, the digraph associated to the symmetric Laplacian matrix L(t) is undirected and connected.
Then
(i) the state transition matrix Φ(t, 0) associated to −L(t) satisfies $\lim_{t\to\infty} \Phi(t, 0) = \mathbb{1}_n\mathbb{1}_n^\top/n$,
(ii) the solution to $\dot x(t) = -L(t)x(t)$ converges exponentially fast to $\lim_{t\to\infty} x(t) = \operatorname{average}(x(0))\,\mathbb{1}_n$.
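A minimal sketch of Theorem 11.6 using a forward-Euler discretization (an approximation; the step dt = 0.01 is an assumption). The hypothetical switching signal alternates, every unit of time, between a path and a star graph on four nodes; both are connected with unit weights, so (A1) holds with ε = 1 and (A2) holds at every time.

```python
import numpy as np

def laplacian(n, edges):
    """Laplacian of an undirected graph with unit edge weights."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L

n = 4
L_path = laplacian(n, [(0, 1), (1, 2), (2, 3)])  # connected
L_star = laplacian(n, [(0, 1), (0, 2), (0, 3)])  # connected

dt, T = 0.01, 30.0
steps_per_unit = int(round(1 / dt))
x = np.array([1.0, 2.0, 3.0, 4.0])
for k in range(int(round(T / dt))):
    # Hypothetical switching signal: path on [2m, 2m+1), star on [2m+1, 2m+2).
    L = L_path if (k // steps_per_unit) % 2 == 0 else L_star
    x = x - dt * (L @ x)  # forward-Euler step of x' = -L(t) x
print(x)  # every entry close to average(x(0)) = 2.5
```

Because each Euler matrix I − dt·L is symmetric and doubly stochastic for this small step, the average is preserved exactly (up to rounding), mirroring statement (ii).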
The first assumption in Theorem 11.6 prevents the weights from becoming arbitrarily close to zero as t → ∞; together with the second assumption, it ensures that λ2(L(t)) is strictly positive, and bounded away from zero, for all t ∈ R≥0. To see the necessity of this non-degeneracy assumption, consider the time-varying Laplacian
$$L(t) = a(t)\,\tilde L, \tag{11.3}$$
where a : R≥0 → R≥0 is piece-wise continuous, and $\tilde L = \tilde L^\top$ is a symmetric time-invariant Laplacian matrix. It can be verified that the solution to $\dot x(t) = -L(t)x(t)$ is given by
$$x(t) = \exp\Big(-\tilde L \int_0^t a(\tau)\,d\tau\Big)\,x_0 = \exp\big(-\tilde L\,(A(t) - A(0))\big)\,x_0,$$
where $\frac{d}{dt}A(t) = a(t)$. If a(t) is integrable on [0, ∞[, then x(t) converges to $\exp\big(-\tilde L \int_0^\infty a(\tau)\,d\tau\big)\,x_0$, which in general is not a consensus vector.
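The closed-form solution makes the role of the integral easy to check. The sketch below evaluates exp(−L̃c)x0 through the eigendecomposition of the symmetric matrix L̃ (via `numpy.linalg.eigh`), where c stands for the integral of a over [0, t]. The choices a(t) = e^(−t) (integral equal to 1) and a(t) = 1/(1 + t) (integral ln(1 + t), which diverges), the path graph, and the initial condition are assumptions for illustration.

```python
import numpy as np

def flow_solution(L_tilde, x0, integral_of_a):
    """x(t) = exp(-L_tilde * c) x0 with c = integral of a over [0, t]."""
    lam, V = np.linalg.eigh(L_tilde)  # L_tilde is symmetric
    return V @ (np.exp(-lam * integral_of_a) * (V.T @ x0))

L_tilde = np.array([[1.0, -1.0, 0.0],
                    [-1.0, 2.0, -1.0],
                    [0.0, -1.0, 1.0]])  # Laplacian of the path on 3 nodes
x0 = np.array([0.0, 0.0, 1.0])

# a(t) = exp(-t): the integral over [0, infinity) equals 1, so x(t) freezes short of consensus.
x_inf_integrable = flow_solution(L_tilde, x0, 1.0)
# a(t) = 1/(1+t): ln(1+t) diverges; evaluate at a (very large) t with ln(1+t) = 50.
x_late_divergent = flow_solution(L_tilde, x0, 50.0)
print(np.ptp(x_inf_integrable), np.ptp(x_late_divergent))  # clear gap vs. essentially zero
```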
In analogy to Theorem 11.1, Theorem 11.6 can be proved by considering the norm of the disagreement vector V(δ) = ‖δ‖² as a common Lyapunov function. As in the discrete-time case, this quadratic Lyapunov function has some fundamental limitations pointed out by Moreau (2004). We review these limitations in the following theorem, which extends Lemma 6.2.
Theorem 11.7 (Limitations of quadratic Lyapunov functions). Let L be a Laplacian matrix associated with a weighted digraph G. The following statements are equivalent:
(i) L + L^⊤ is positive semi-definite;
(ii) L has zero column sums, that is, G is weight-balanced;
(iii) the sum-of-squares function V(δ) = ‖δ‖² is non-increasing along trajectories of the Laplacian flow $\dot x = -Lx$; and
(iv) every convex function V(x) invariant under coordinate permutations is non-increasing along the trajectories of $\dot x = -Lx$.
Proof sketch. The equivalence of statements (i) and (ii) has been shown in Lemma 6.2. The equivalence of (i) and (iii) can be proved with a Lyapunov argument similar to the discrete-time case; see Theorem 11.1. The implication (iv) ⟹ (iii) is trivial. To complete the proof, we show that (ii) ⟹ (iv). Recall that the matrix exponential exp(−Lt) of a weight-balanced Laplacian matrix is a nonnegative doubly-stochastic matrix (see Exercise E7.3) that can be decomposed into a convex combination of finitely many permutation matrices by the Birkhoff–von Neumann theorem (see Exercise E2.14). In particular, $\exp(-Lt) = \sum_i \lambda_i(t) P_i$, where the Pi are permutation matrices and the λi(t) are convex coefficients for every t ≥ 0. By convexity of V(x) and invariance under coordinate permutations we have, for any initial condition x0 ∈ Rn and for any t ≥ 0,
$$V(\exp(-Lt)x_0) = V\Big(\sum_i \lambda_i(t) P_i x_0\Big) \leq \sum_i \lambda_i(t) V(P_i x_0) = \sum_i \lambda_i(t) V(x_0) = V(x_0).$$
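A small numerical illustration of the equivalence (i) ⟺ (ii), assuming the convention L = diag(A1n) − A with aij the weight of edge (i, j), so that rows of L sum to zero. The directed 3-cycle is weight-balanced and L + L^⊤ is positive semi-definite; the directed 3-path is neither.

```python
import numpy as np

def laplacian_from_adjacency(A):
    """L = diag(A 1_n) - A (out-degree Laplacian; rows sum to zero)."""
    return np.diag(A.sum(axis=1)) - A

def is_weight_balanced(L, tol=1e-12):
    """Zero column sums."""
    return np.allclose(L.sum(axis=0), 0.0, atol=tol)

def symmetric_part_is_psd(L, tol=1e-12):
    """Smallest eigenvalue of L + L^T is (numerically) nonnegative."""
    return bool(np.linalg.eigvalsh(L + L.T).min() >= -tol)

cycle = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)  # 0 -> 1 -> 2 -> 0
path = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)   # 0 -> 1 -> 2
for name, A in [("cycle", cycle), ("path", path)]:
    L = laplacian_from_adjacency(A)
    print(name, is_weight_balanced(L), symmetric_part_is_psd(L))
# cycle True True
# path False False
```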
It follows that V(δ) = ‖δ‖² serves as a common Lyapunov function for the time-varying Laplacian flow $\dot x(t) = -L(t)x(t)$ only if L(t) is weight-balanced and connected point-wise in time. To partially relax these strong assumptions, consider now the case when L(t) induces an undirected graph at any point in time t ≥ 0 and an integral connectivity condition holds, similar to the discrete-time case. To motivate the general case, recall the example in (11.3) with a single time-varying parameter a(t). In this simple example, a necessary and sufficient condition for convergence to consensus is that the integral $\int_0^\infty a(\tau)\,d\tau$ is divergent. The following result from (Hendrickx and Tsitsiklis 2013) generalizes this case.
Theorem 11.8 (Convergence under integral connectivity). Let t ↦ A(t) = A(t)^⊤ be a time-varying symmetric adjacency matrix. Consider an associated undirected graph G = (V, E) that has an edge (i, j) ∈ E if $\int_0^\infty a_{ij}(\tau)\,d\tau$ is divergent. Assume
(A1) each non-zero edge weight aij(t) is larger than a constant ε > 0,
(A2) the graph G is connected.
Then
(i) the state transition matrix Φ(t, 0) associated to −L(t) satisfies $\lim_{t\to\infty} \Phi(t, 0) = \mathbb{1}_n\mathbb{1}_n^\top/n$,
(ii) the solution to $\dot x(t) = -L(t)x(t)$ converges to $\lim_{t\to\infty} x(t) = \operatorname{average}(x(0))\,\mathbb{1}_n$.
Theorem 11.8 is the continuous-time analog of Theorem 11.3. We remark that the original statement in (Hendrickx and Tsitsiklis 2013) does not require Assumption (A1) thus allowing for weights such as aij = 1/t which lead to non-uniform convergence, i.e., the convergence rate depends on the time t0 when the system is initialized. The proof method of Theorem 11.8 is based on the fact that the minimal (respectively maximal) element of x(t), the sum of the two smallest (respectively two largest) elements, the sum of the three smallest (respectively three largest) elements, etc., are all bounded and non-decreasing (respectively non-increasing). A continuity argument can then be used to show average consensus.
11.5.2
Directed graphs
The proof method of Theorem 11.8 does not extend to general non-symmetric Laplacian matrices. If we use the max-min function $V_{\text{max-min}}(x) = \max_{i\in\{1,\dots,n\}} x_i - \min_{i\in\{1,\dots,n\}} x_i$ as a common Lyapunov function candidate, then we arrive at the following general result (Lin et al. 2007; Moreau 2004).
Theorem 11.9 (Consensus for time-varying algorithms in continuous time). Let t ↦ A(t) be a time-varying adjacency matrix with associated time-varying digraph t ↦ G(t), t ∈ R≥0. Assume
(A1) each non-zero edge weight aij(t) is larger than a constant ε > 0,
(A2) there exists a duration T > 0 such that, for all t ∈ R≥0, the digraph associated to the adjacency matrix
$$\int_t^{t+T} A(\tau)\,d\tau$$
contains a globally reachable node.
Then
(i) there exists a nonnegative w ∈ Rn normalized to w1 + · · · + wn = 1 such that the state transition matrix Φ(t, 0) associated to −L(t) satisfies $\lim_{t\to\infty} \Phi(t, 0) = \mathbb{1}_n w^\top$,
(ii) the solution to $\dot x(t) = -L(t)x(t)$ converges exponentially fast to $(w^\top x(0))\,\mathbb{1}_n$,
(iii) if additionally $\mathbb{1}_n^\top L(t) = \mathbb{0}_n^\top$ for almost all times t (that is, the digraph is weight-balanced at all times, except a set of measure zero), then $w = \frac{1}{n}\mathbb{1}_n$ so that $\lim_{t\to\infty} x(t) = \operatorname{average}(x(0))\,\mathbb{1}_n$.
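A forward-Euler sketch of Theorem 11.9 (the step dt = 0.01 and the switching signal are assumptions). No single digraph has a globally reachable node here: on [2m, 2m + 1) only the edge (1, 0) is active (agent 1 follows agent 0) and on [2m + 1, 2m + 2) only the edge (2, 1) is active. Over any window of length T = 2 the integrated adjacency matrix contains both edges, so node 0 is globally reachable and (A2) holds. The agents reach consensus on x_0(0), so w puts all its weight on agent 0 (0-based indexing) rather than being the uniform vector.

```python
import numpy as np

def laplacian(A):
    """L = diag(A 1_n) - A."""
    return np.diag(A.sum(axis=1)) - A

n, dt, T = 3, 0.01, 40.0
A_first = np.zeros((n, n))
A_first[1, 0] = 1.0   # edge (1, 0): agent 1 follows agent 0
A_second = np.zeros((n, n))
A_second[2, 1] = 1.0  # edge (2, 1): agent 2 follows agent 1
L_first, L_second = laplacian(A_first), laplacian(A_second)

x = np.array([0.0, 1.0, 2.0])
steps_per_unit = int(round(1 / dt))
for k in range(int(round(T / dt))):
    # Hypothetical switching: A_first on [2m, 2m+1), A_second on [2m+1, 2m+2).
    L = L_first if (k // steps_per_unit) % 2 == 0 else L_second
    x = x - dt * (L @ x)  # forward-Euler step of x' = -L(t) x
print(x)  # consensus on x_0(0) = 0, not on the average 1
```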
11.6
Exercises
E11.1 On the product of stochastic matrices (Jadbabaie et al. 2003). Let k ≥ 2 and A1, A2, . . . , Ak be nonnegative n × n matrices with positive diagonal entries. Let amin (resp. amax) be the smallest (resp. largest) diagonal entry of A1, A2, . . . , Ak and let G1, . . . , Gk be the digraphs associated with A1, . . . , Ak. Show that
(i) $A_1 A_2 \cdots A_k \geq \Big(\frac{a_{\min}^2}{2a_{\max}}\Big)^{k-1}(A_1 + A_2 + \cdots + A_k)$, and
(ii) if the digraph G1 ∪ . . . ∪ Gk is strongly connected, then the matrix A1 · · · Ak is irreducible.
Hint: Set Ai = amin In + Bi for a nonnegative Bi, and show statement (i) by induction on k.
E11.2 Products of primitive matrices with positive diagonal. Let A1, A2, . . . , An−1 be primitive n × n matrices with positive diagonal entries. Show that A1 A2 · · · An−1 > 0.
E11.3 A simple proof. Prove Lemma 11.5. Hint: You will want to use Exercise E3.5.
E11.4 Alternative sufficient condition. As in Theorem 11.2, let {A(k)}k∈Z≥0 be a sequence of row-stochastic matrices with associated digraphs {G(k)}k∈Z≥0. Prove that the same asymptotic properties in Theorem 11.2 hold true under the following Assumption (A5), instead of Assumptions (A1), (A2) and (A3):
(A5) there exists a node j such that, for all times k ∈ Z≥0, each edge weight aij(k), i ∈ {1, . . . , n}, is larger than a constant ε > 0.
In other words, Assumption (A5) requires that all digraphs G(k) contain all edges (i, j), i ∈ {1, . . . , n}, and that all these edges have weights larger than a strictly positive constant. Hint: Modify the proof of Theorem 11.2.
E11.5 Convergence for strongly-connected graphs point-wise in time: discrete time. Consider a sequence {A(k)}k∈Z≥0 of row-stochastic matrices with associated graphs {G(k)}k∈Z≥0 so that
(A1) each non-zero edge weight aij(k), including the self-loop weights aii(k), is larger than a constant ε > 0;
(A2) each graph G(k) is strongly connected and aperiodic point-wise in time; and
(A3) there is a positive vector w ∈ Rn satisfying $w^\top \mathbb{1}_n = 1$ and $w^\top A(k) = w^\top$ for all k ∈ Z≥0.
Without relying on Theorem 11.2, show that the solution to x(k + 1) = A(k)x(k) converges to $\lim_{k\to\infty} x(k) = (w^\top x(0))\,\mathbb{1}_n$. Hint: Search for a common quadratic Lyapunov function.
E11.6 Convergence for strongly-connected graphs point-wise in time: continuous time. Let t ↦ L(t) be a time-varying Laplacian matrix with associated time-varying digraph t ↦ G(t), t ∈ R≥0 so that
(A1) each non-zero edge weight aij(t) is larger than a constant ε > 0;
(A2) each graph G(t) is strongly connected point-wise in time; and
(A3) there is a positive vector w ∈ Rn satisfying $\mathbb{1}_n^\top w = 1$ and $w^\top L(t) = \mathbb{0}_n^\top$ for all t ∈ R≥0.
Without relying on Theorem 11.9, show that the solution to $\dot x(t) = -L(t)x(t)$ satisfies $\lim_{t\to\infty} x(t) = (w^\top x(0))\,\mathbb{1}_n$. Hint: Search for a common quadratic Lyapunov function.
Chapter 12
Randomized Averaging Algorithms
In this chapter we discuss averaging algorithms defined by sequences of random stochastic matrices. In other words, we imagine that at each discrete instant, the averaging matrix is selected randomly according to some stochastic model. We refer to such algorithms as randomized averaging algorithms. Randomized averaging algorithms are well behaved and easy to study in the sense that much information can be learned simply from the expectation of the averaging matrix. Also, as compared with time-varying algorithms, it is possible to study convergence rates for randomized algorithms. In this chapter we present results from (Fagnani and Zampieri 2008; Frasca 2012; Garin and Schenato 2010; Tahbaz-Salehi and Jadbabaie 2008).
12.1
Examples of randomized averaging algorithms
Consider the following models of randomized averaging algorithms.
Uniform Symmetric Gossip. Given an undirected graph G, at each iteration, select one of the graph edges uniformly at random, say the edge connecting agents i and j; these two agents talk and both perform (1/2, 1/2) averaging, that is:
$$x_i(k+1) = x_j(k+1) := \frac{1}{2}\big(x_i(k) + x_j(k)\big).$$
A detailed analysis of this model is given by Boyd et al. (2006).
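A minimal simulation sketch of uniform symmetric gossip; the ring graph, the initial values, the random seed and the helper name `uniform_symmetric_gossip` are illustrative assumptions. Each update preserves the sum of the two values involved, so the average is preserved and the states approach it.

```python
import numpy as np

def uniform_symmetric_gossip(x0, edges, iterations, seed=0):
    """At each iteration pick one edge uniformly at random; its two endpoints
    both replace their values with their (1/2, 1/2) average."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iterations):
        i, j = edges[rng.integers(len(edges))]
        x[i] = x[j] = 0.5 * (x[i] + x[j])
    return x

edges = [(k, (k + 1) % 5) for k in range(5)]  # ring graph on 5 nodes (connected)
x = uniform_symmetric_gossip([1.0, 5.0, 2.0, 8.0, 4.0], edges, iterations=5000)
print(x)  # every entry close to the average 4.0
```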
Packet Loss in Communication Network. Given a strongly connected and aperiodic digraph, at each communication round, packets travel over directed edges and, with some likelihood, each edge may drop the packet. (If information is not received, then the receiving node can either do no update whatsoever, or adjust its averaging weights to compensate for the packet loss.)
Broadcast Wireless Communication. Given a digraph, at each communication round, a randomly selected node transmits to all its out-neighbors. (Here we imagine that simultaneous transmissions are prohibited by wireless interference.)
Opinion Dynamics with Stochastic Interactions and Prominent Agents. (Somewhat similar to uniform gossip.) Given an undirected graph and a probability 0 < p < 1, at each iteration, select one of the graph edges uniformly at random and perform: with probability p both agents perform the (1/2, 1/2) update, and with probability (1 − p) only one agent performs the update and the “prominent agent” does not. A detailed analysis of this model is given by (Acemoglu and Ozdaglar 2011).
Note that, in the second, third and fourth example models, the row-stochastic matrices at each iteration are not symmetric in general, even if the original digraph was undirected.
12.2
A brief review of probability theory
We briefly review a few basic concepts from probability theory and refer the reader for example to (Breiman 1992).
• Loosely speaking, a random variable X : Ω → E is a measurable function from the set of possible outcomes Ω to some set E which is typically a subset of R.
• The probability of an event (i.e., a subset of possible outcomes) is the measure of the likelihood that the event will occur. An event occurs almost surely if it occurs with probability equal to 1.
• The random variable X is called discrete if its image is finite or countably infinite. In this case, X is described by a probability mass function assigning a probability to each value in the image of X. Specifically, if X takes value in {x1, . . . , xM} ⊂ R, then the probability mass function pX : {x1, . . . , xM} → [0, 1] satisfies pX(xi) ≥ 0 and $\sum_{i=1}^{M} p_X(x_i) = 1$, and determines the probability of X being equal to xi by P[X = xi] = pX(xi).
• The random variable X is called continuous if its image is uncountably infinite. If X is absolutely continuous, X is described by a probability density function assigning a probability to intervals in the image of X. Specifically, if X takes value in R, then the probability density function fX : R → R≥0 satisfies fX(x) ≥ 0 and $\int_{\mathbb{R}} f_X(x)\,dx = 1$, and determines the probability of X taking value in the interval [a, b] by $\mathbb{P}[a \leq X \leq b] = \int_a^b f_X(x)\,dx$.
• The expected value of a discrete variable is $\mathbb{E}[X] = \sum_{i=1}^{M} x_i\, p_X(x_i)$. The expected value of a continuous variable is $\mathbb{E}[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$.
• A (finite or infinite) sequence of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability mass/distribution as the others and all are mutually independent.
12.3
Randomized averaging algorithms
In this section we consider random sequences of row-stochastic matrices. Accordingly, let A(k) be the row-stochastic averaging matrix occurring randomly at time k and G(k) be its associated graph. We then consider the stochastic linear system x(k + 1) = A(k)x(k).
We now present the main result of this chapter; for its proof we refer to (Tahbaz-Salehi and Jadbabaie 2008), see also (Fagnani and Zampieri 2008).
Theorem 12.1 (Consensus for randomized algorithms). Let {A(k)}k∈Z≥0 be a sequence of random row-stochastic matrices with associated digraphs {G(k)}k∈Z≥0. Assume
(A1) the sequence of variables {A(k)}k∈Z≥0 is i.i.d.,
(A2) at each time k, the random matrix A(k) has strictly positive diagonal so that each digraph in the sequence {G(k)}k∈Z≥0 has a self-loop at each node almost surely, and
(A3) the digraph associated to the expected matrix E[A(k)], for any k, has a globally reachable node.
Then the following statements hold almost surely:
(i) there exists a random nonnegative vector w ∈ Rn with w1 + · · · + wn = 1 such that
$$\lim_{k\to\infty} A(k)\,A(k-1)\cdots A(0) = \mathbb{1}_n w^\top \quad \text{almost surely},$$
(ii) as k → ∞, each solution x(k) of x(k + 1) = A(k)x(k) satisfies
$$\lim_{k\to\infty} x(k) = (w^\top x(0))\,\mathbb{1}_n \quad \text{almost surely},$$
(iii) if additionally each random matrix is doubly-stochastic, then $w = \frac{1}{n}\mathbb{1}_n$ so that $\lim_{k\to\infty} x(k) = \operatorname{average}(x(0))\,\mathbb{1}_n$.
Note: if each random matrix is doubly-stochastic, then E[A(k)] is doubly-stochastic. The converse is easily seen to be false.
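A hypothetical two-agent example makes the failure of the converse concrete: with probability 1/2 agent 0 moves halfway toward agent 1, and with probability 1/2 agent 1 moves halfway toward agent 0. Neither realization is doubly stochastic, yet their expectation is.

```python
import numpy as np

# Two equally likely realizations of A(k) on two agents (illustrative model).
A_a = np.array([[0.5, 0.5], [0.0, 1.0]])  # agent 0 averages, agent 1 keeps its value
A_b = np.array([[1.0, 0.0], [0.5, 0.5]])  # agent 1 averages, agent 0 keeps its value
expected = 0.5 * A_a + 0.5 * A_b

def is_doubly_stochastic(M):
    return np.allclose(M.sum(axis=0), 1.0) and np.allclose(M.sum(axis=1), 1.0)

print(expected)                      # 0.75 on the diagonal, 0.25 off the diagonal
print(is_doubly_stochastic(expected))                          # True
print(is_doubly_stochastic(A_a), is_doubly_stochastic(A_b))    # False False
```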
12.3.1
Additional results on uniform symmetric gossip algorithms
Recall: given an undirected graph G with edge set E, at each iteration, select one of the graph edges uniformly at random, say the edge connecting agents i and j; these two agents talk and both perform (1/2, 1/2) averaging, that is:
$$x_i(k+1) = x_j(k+1) := \frac{1}{2}\big(x_i(k) + x_j(k)\big).$$
Corollary 12.2 (Convergence for uniform symmetric gossip). If the graph G is connected, then each solution to the uniform symmetric gossip converges to average consensus with probability 1.
Proof based on Theorem 12.1. The corollary can be established by verifying that Assumptions (A1)–(A3) in Theorem 12.1 are satisfied. Regarding (A3), note that the graph associated to the expected averaging matrix is G.
We here provide a simple and interesting alternative proof by (Frasca 2012).
Proof based on Theorem 11.3. For any time k0 ≥ 0 and any edge (i, j) ∈ E, consider the event “the edge (i, j) is not selected for update at any time larger than k0.” Since the probability that (i, j) is not selected at any given time k is 1 − 1/|E|, and selections at different times are independent, the probability that (i, j) is not selected at any time after k0 is
$$\lim_{k\to\infty} \Big(1 - \frac{1}{|E|}\Big)^{k-k_0} = 0.$$
With this fact one can verify that all assumptions in Theorem 11.3 are satisfied by the random sequence of matrices almost surely. Hence, almost sure convergence follows. Finally, since each matrix is doubly stochastic, average(x(k)) is preserved, and the solution converges to average(x(0))1n.
12.3.2
Additional results on the mean-square convergence factor
Given a sequence of stochastic averaging matrices {A(k)}k∈Z≥0 and corresponding solutions x(k) to x(k + 1) = A(k)x(k), we define the mean-square convergence factor by
$$r_{\text{mean-square}}\big(\{A(k)\}_{k\in\mathbb{Z}_{\geq 0}}\big) = \sup_{x(0) \neq x_{\text{final}}} \limsup_{k\to\infty} \Big(\mathbb{E}\big[\|x(k) - \operatorname{average}(x(k))\,\mathbb{1}_n\|_2^2\big]\Big)^{1/k}.$$
Theorem 12.3 (Upper and lower bounds on the mean-square convergence factor). Under the same assumptions as in Theorem 12.1, the mean-square convergence factor satisfies
$$\rho_{\text{ess}}\big(\mathbb{E}[A(k)]\big)^2 \leq r_{\text{mean-square}} \leq \rho\Big(\mathbb{E}\big[A(k)^\top (I_n - \mathbb{1}_n\mathbb{1}_n^\top/n)\, A(k)\big]\Big).$$
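Proof. For a comprehensive proof we refer to (Fagnani and Zampieri 2008, Proposition 4.4). Here we prove the upper bound for symmetric matrices. Consider the disagreement vector δ(k) = x(k) − average(x(0))1n = x(k) − average(x(k))1n, which follows the dynamics δ(k + 1) = (A(k) − 1n1n^⊤/n)δ(k). We have that
$$\begin{aligned}
r_{\text{mean-square}}\big(\{A(k)\}_{k\in\mathbb{Z}_{\geq 0}}\big)
&= \sup_{\delta(0) \neq \mathbb{0}_n} \limsup_{k\to\infty} \Big(\mathbb{E}\big[\|\delta(k)\|_2^2\big]\Big)^{1/k}\\
&= \sup_{\delta(0) \neq \mathbb{0}_n} \limsup_{k\to\infty} \Big(\mathbb{E}\big[\|(A(k-1) - \mathbb{1}_n\mathbb{1}_n^\top/n) \cdots (A(0) - \mathbb{1}_n\mathbb{1}_n^\top/n)\,\delta(0)\|_2^2\big]\Big)^{1/k}\\
&\leq \limsup_{k\to\infty} \Big(\mathbb{E}\big[\|A(k-1) - \mathbb{1}_n\mathbb{1}_n^\top/n\|_2^2\big] \cdots \mathbb{E}\big[\|A(0) - \mathbb{1}_n\mathbb{1}_n^\top/n\|_2^2\big]\Big)^{1/k}\\
&= \mathbb{E}\big[\|A(k) - \mathbb{1}_n\mathbb{1}_n^\top/n\|_2^2\big], \quad \text{for any } k,
\end{aligned}$$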
where we used sub-multiplicativity of the matrix norm and the fact that the matrices A(k) are i.i.d. The upper bound follows from
$$\|A(k) - \mathbb{1}_n\mathbb{1}_n^\top/n\|_2^2 = \rho\Big((A(k) - \mathbb{1}_n\mathbb{1}_n^\top/n)^\top (A(k) - \mathbb{1}_n\mathbb{1}_n^\top/n)\Big) = \rho\Big(A(k)^\top (I_n - \mathbb{1}_n\mathbb{1}_n^\top/n)\, A(k)\Big),$$
where we used the properties (see Lemma 10.3 and its proof) of the projector matrix (In − 1n1n^⊤/n).
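For uniform symmetric gossip both bounds of Theorem 12.3 can be evaluated by enumerating the equally likely edge matrices. The sketch assumes an unweighted connected graph; since E[A(k)] is then symmetric and doubly stochastic, ρess(E[A(k)]) is computed as ρ(E[A(k)] − 1n1n^⊤/n). For gossip each realization satisfies A = A^⊤ = A², so E[A^⊤(In − 1n1n^⊤/n)A] = E[A] − 1n1n^⊤/n and the two bounds become ρess² and ρess. The function name is hypothetical.

```python
import numpy as np

def gossip_mean_square_bounds(n, edges):
    """Lower and upper bounds of Theorem 12.3 for uniform symmetric gossip."""
    J = np.ones((n, n)) / n
    m = len(edges)
    EA = np.zeros((n, n))  # E[A(k)]
    EM = np.zeros((n, n))  # E[A(k)^T (I_n - J) A(k)]
    for i, j in edges:
        b = np.zeros(n)
        b[i], b[j] = 1.0, -1.0
        A = np.eye(n) - 0.5 * np.outer(b, b)  # agents i, j average; others keep their value
        EA += A / m
        EM += A.T @ (np.eye(n) - J) @ A / m
    # E[A] is symmetric and doubly stochastic, so rho_ess(E[A]) = rho(E[A] - J).
    rho_ess = np.max(np.abs(np.linalg.eigvalsh(EA - J)))
    upper = np.max(np.abs(np.linalg.eigvalsh(EM)))
    return rho_ess ** 2, upper

n = 5
edges = [(k, (k + 1) % n) for k in range(n)]  # ring graph
lower, upper = gossip_mean_square_bounds(n, edges)
print(lower, upper)
```

On the ring, the upper bound equals 1 − λ2(L)/(2|E|), where L is the Laplacian of the ring and λ2(L) = 2 − 2cos(2π/5).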
12.4
Table of asymptotic behaviors for averaging systems

Dynamics: discrete-time, $x(k+1) = Ax(k)$, A row-stochastic adjacency matrix of digraph G
Assumptions & asymptotic behavior: G has a globally reachable node ⟹ $\lim_{k\to\infty} x(k) = (w^\top x(0))\,\mathbb{1}_n$, where $w \geq 0$, $w^\top A = w^\top$, and $\mathbb{1}_n^\top w = 1$
Reference: Thm 5.2

Dynamics: continuous-time, $\dot x(t) = -Lx(t)$, L Laplacian matrix of digraph G
Assumptions & asymptotic behavior: G has a globally reachable node ⟹ $\lim_{t\to\infty} x(t) = (w^\top x(0))\,\mathbb{1}_n$, where $w \geq 0$, $w^\top L = \mathbb{0}_n^\top$, and $\mathbb{1}_n^\top w = 1$
Reference: Thm 7.3

Dynamics: time-varying discrete-time, $x(k+1) = A(k)x(k)$, A(k) row-stochastic adjacency matrix of digraph G(k), k ∈ Z≥0
Assumptions & asymptotic behavior: (i) at each time k, G(k) has a self-loop at each node, (ii) each non-zero aij(k) is larger than ε > 0, (iii) there exists a duration δ s.t., for all times k, G(k) ∪ · · · ∪ G(k + δ − 1) has a globally reachable node ⟹ $\lim_{k\to\infty} x(k) = (w^\top x(0))\,\mathbb{1}_n$, where $w \geq 0$ and $\mathbb{1}_n^\top w = 1$
Reference: Thm 11.2

Dynamics: time-varying symmetric discrete-time, $x(k+1) = A(k)x(k)$, A(k) symmetric stochastic adjacency matrix of G(k), k ∈ Z≥0
Assumptions & asymptotic behavior: (i) at each time k, G(k) has a self-loop at each node, (ii) each non-zero aij(k) is larger than ε > 0, (iii) for all times k, $\bigcup_{\tau \geq k} G(\tau)$ is connected ⟹ $\lim_{k\to\infty} x(k) = \operatorname{average}(x(0))\,\mathbb{1}_n$
Reference: Thm 11.3

Dynamics: time-varying continuous-time, $\dot x(t) = -L(t)x(t)$, L(t) Laplacian matrix of digraph G(t), t ∈ R≥0
Assumptions & asymptotic behavior: (i) each non-zero aij(t) is larger than ε > 0, (ii) there exists a duration T s.t., for all times t, the digraph associated to $\int_t^{t+T} A(\tau)\,d\tau$ has a globally reachable node ⟹ $\lim_{t\to\infty} x(t) = (w^\top x(0))\,\mathbb{1}_n$, where $w \geq 0$ and $\mathbb{1}_n^\top w = 1$
Reference: Thm 11.9

Dynamics: randomized discrete-time, $x(k+1) = A(k)x(k)$, A(k) random row-stochastic adjacency matrix of digraph G(k), k ∈ Z≥0
Assumptions & asymptotic behavior: (i) {A(k)}k∈Z≥0 is i.i.d., (ii) each matrix has strictly positive diagonal, (iii) the digraph associated to E[A(k)] has a globally reachable node ⟹ $\lim_{k\to\infty} x(k) = (w^\top x(0))\,\mathbb{1}_n$ almost surely, where $w \geq 0$ is a random vector with $\mathbb{1}_n^\top w = 1$
Reference: Thm 12.1

Table 12.1: Averaging systems: definitions, assumptions, asymptotic behavior, and reference
 Part II
Nonlinear Systems
Chapter 13
Nonlinear Systems and Robotic Coordination
Coordination in relative sensing networks: rendezvous, flocking, and formations
The material in this section is self-contained. Further information on flocking can be found in (Olfati-Saber 2006; Tanner et al. 2007), and further material on formation control and graph rigidity can be found in (Anderson et al. 2008; Dörfler and Francis 2010; Krick et al. 2009; Oh et al. 2015).
13.1 Coordination in relative sensing networks
We consider the following setup for the coordination of $n$ autonomous mobile robots (referred to as agents) in a planar environment:
(i) Agent dynamics: We consider a simple, fully actuated agent model $\dot p_i = u_i$, where $p_i \in \mathbb{R}^2$ and $u_i \in \mathbb{R}^2$ are the position and steering control input of agent $i$.
(ii) Relative sensing model: We consider the following sensing model.
• Each agent is equipped with onboard sensors only and has no communication devices.
• The sensing topology is encoded by an undirected and connected graph $G = (V, E)$.
• Each agent $i$ can measure the relative position of neighboring agents: $p_i - p_j$ for $\{i, j\} \in E$.

To formalize the relative sensing model, we introduce an arbitrary orientation and labeling $k \in \{1, \dots, |E|\}$ for each undirected edge $\{i, j\} \in E$. Recall the incidence matrix $B \in \mathbb{R}^{n \times |E|}$ of the associated oriented graph and define the $2n \times 2|E|$ matrix $\hat B = B \otimes I_2$ via the Kronecker product. The Kronecker product $A \otimes B$ is the block matrix obtained by replacing each scalar entry $A_{ij}$ of $A$ with the block $A_{ij} \cdot B$. For example, if
$$B = \begin{bmatrix} +1 & 0 & 0 & 0 \\ -1 & +1 & -1 & 0 \\ 0 & -1 & 0 & +1 \\ 0 & 0 & +1 & -1 \end{bmatrix}, \quad \text{then} \quad \hat B = B \otimes I_2 = \begin{bmatrix} +I_2 & 0 & 0 & 0 \\ -I_2 & +I_2 & -I_2 & 0 \\ 0 & -I_2 & 0 & +I_2 \\ 0 & 0 & +I_2 & -I_2 \end{bmatrix}.$$
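To make the construction of $\hat B$ and of the relative positions concrete, here is a minimal pure-Python sketch (an illustration, not code from the notes). The helper names `kron`, `transpose`, and `mat_vec` and the sample positions `p` (a unit square) are hypothetical; the incidence matrix `B` is the one from the example above. With the stacking $p = (p_1, \dots, p_4) \in \mathbb{R}^8$, each consecutive pair of entries of $e = \hat B^\top p$ is the relative position along one oriented edge.

```python
def kron(A, B):
    """Kronecker product of two dense matrices given as lists of lists."""
    rows_b, cols_b = len(B), len(B[0])
    return [
        [A[i // rows_b][j // cols_b] * B[i % rows_b][j % cols_b]
         for j in range(len(A[0]) * cols_b)]
        for i in range(len(A) * rows_b)
    ]


def transpose(M):
    return [list(col) for col in zip(*M)]


def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Incidence matrix of the example above (4 nodes, 4 oriented edges).
B = [[+1, 0, 0, 0],
     [-1, +1, -1, 0],
     [0, -1, 0, +1],
     [0, 0, +1, -1]]
I2 = [[1, 0], [0, 1]]
B_hat = kron(B, I2)  # 8 x 8 block matrix

# Hypothetical stacked positions p = (p1, p2, p3, p4) of a unit square.
p = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
e = mat_vec(transpose(B_hat), p)  # two entries per edge
```

For instance, the first pair of `e` equals $p_1 - p_2 = (-1, 0)$, because the first column of `B` has $+1$ at node 1 and $-1$ at node 2.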
Figure 13.1: A ring graph with three agents. The first panel shows the agents embedded in the plane $\mathbb{R}^2$ with positions $p_i$ and relative positions $e_i$. The second panel shows the artificial potentials as springs connecting the robots, and the third panel shows the resulting forces.
With this notation the vector of relative positions is given by $e = \hat B^\top p$.
(iii) Geometric objective: The objective is to achieve a desired geometric configuration, which can be expressed as a function of the relative distances $\|p_i - p_j\|$ for each $\{i, j\} \in E$. Examples include rendezvous ($\|p_i - p_j\| = 0$), collision avoidance ($\|p_i - p_j\| > 0$), and desired relative spacings ($\|p_i - p_j\| = d_{ij} > 0$).
(iv) Potential-based control: We specify the geometric objective for each edge $\{i, j\} \in E$ as the minimum of an artificial potential function $V_{ij}: D_{ij} \subset \mathbb{R} \to \mathbb{R}_{\geq 0}$. We require the potential functions to be twice continuously differentiable on their domain $D_{ij}$.
It is instructive to think of $V_{ij}(\|p_i - p_j\|)$ as a spring coupling neighboring agents $\{i, j\} \in E$. The resulting spring forces acting on agents $i$ and $j$ are
$$f_{ij}(p_i - p_j) = -\frac{\partial}{\partial p_i} V_{ij}(\|p_i - p_j\|) \quad \text{and} \quad f_{ji}(p_i - p_j) = -f_{ij}(p_i - p_j) = -\frac{\partial}{\partial p_j} V_{ij}(\|p_i - p_j\|);$$
see Figure 13.1 for an illustration. The overall network potential function is then
$$V(p) = \sum_{\{i,j\} \in E} V_{ij}(\|p_i - p_j\|).$$
We design the associated gradient descent control law as
$$\dot p_i = u_i = -\frac{\partial V(p)}{\partial p_i} = -\sum_{\{i,j\} \in E} \frac{\partial}{\partial p_i} V_{ij}(\|p_i - p_j\|) = \sum_{\{i,j\} \in E} f_{ij}(p_i - p_j), \qquad i \in \{1, \dots, n\}.$$
In vector form the control reads as the gradient flow
$$\dot p = u = -\left(\frac{\partial V(p)}{\partial p}\right)^{\top} = \hat B \cdot \operatorname{diag}(\{f_{ij}\}_{\{i,j\} \in E}) \circ \hat B^\top p. \qquad (13.1)$$
The closed-loop relative sensing network (13.1) is illustrated in Figure 13.2. Controllers based on artificial potential functions induce a lot of structure in the closed-loop system. Recall the set of $2 \times 2$ orthogonal matrices $O(2) = \{R \in \mathbb{R}^{2 \times 2} \mid RR^\top = I_2\}$, introduced in Exercise E2.13 as the set of 2-dimensional rotations and reflections.
Lemma 13.1 (Symmetries of relative sensing networks). Consider the closed-loop relative sensing network (13.1) with an undirected and connected graph $G = (V, E)$. For every initial condition $p_0 \in \mathbb{R}^{2n}$, we have that
Figure 13.2: Closed-loop diagram of the relative sensing network (13.1).
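(i) the center of mass is stationary: $\operatorname{average}(p(t)) = \operatorname{average}(p_0)$ for all $t \geq 0$; and
(ii) the closed loop $\dot p = -\left(\frac{\partial V(p)}{\partial p}\right)^{\top}$ is invariant under rigid body transformations: if $\xi_i = R p_i + q$, where $R \in O(2)$ and $q \in \mathbb{R}^2$ is a translation vector, then $\dot \xi = -\left(\frac{\partial V(\xi)}{\partial \xi}\right)^{\top}$.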
Proof. Regarding statement (i), since $f_{ji} = -f_{ij}$ for every edge, the forces cancel pairwise and $\sum_{i=1}^n \dot p_i = 0$; it follows that $\sum_{i=1}^n p_i(t) = \sum_{i=1}^n p_{i0}$. Regarding statement (ii), first, notice that the potential function is invariant under translations since $V(p) = V(p + \mathbb{1}_n \otimes q)$ for any translation $q \in \mathbb{R}^2$. Second, notice that the potential function is invariant under rotations and reflections since $V_{ij}(\|R(p_i - p_j)\|) = V_{ij}(\|p_i - p_j\|)$ and thus $V(\hat R p) = V(p)$, where $\hat R = I_n \otimes R$. From the chain rule we obtain $\frac{\partial V}{\partial p}(\hat R p)\,\hat R = \frac{\partial V}{\partial p}(p)$, or $\frac{\partial V}{\partial p}(\hat R p) = \frac{\partial V}{\partial p}(p)\,\hat R^\top$. By combining these insights when changing coordinates via $\xi_i = R p_i + q$ (or $\xi = \hat R p + \mathbb{1}_n \otimes q$), we find that
$$\dot \xi = \hat R \dot p = -\hat R \left(\frac{\partial V}{\partial p}(p)\right)^{\top} = -\left(\frac{\partial V}{\partial p}(p)\,\hat R^\top\right)^{\top} = -\left(\frac{\partial V}{\partial p}(\hat R p)\right)^{\top} = -\left(\frac{\partial V(\xi)}{\partial \xi}\right)^{\top}. \qquad ∎$$
Example 13.2 (The linear-quadratic rendezvous problem). An undirected consensus system is a relative sensing network coordination problem where the objective is rendezvous: $p_i = p_j$ for all $\{i, j\} \in E$. For each edge $\{i, j\} \in E$ consider an artificial potential $V_{ij}: \mathbb{R}^2 \to \mathbb{R}_{\geq 0}$ of the relative position $p_i - p_j$ which has a minimum at the desired objective. For example, for the quadratic potential function
$$V_{ij}(p_i - p_j) = \frac{1}{2} a_{ij} \|p_i - p_j\|_2^2,$$
the overall potential function is obtained as the Laplacian potential $V(p) = \frac{1}{2} p^\top \hat L p$, where $\hat L = L \otimes I_2$. The resulting gradient descent control law gives rise to the linear Laplacian flow
$$\dot p_i = u_i = -\frac{\partial}{\partial p_i} V(p) = -\sum_{\{i,j\} \in E} a_{ij}(p_i - p_j). \qquad (13.2)$$
So far, we analyzed the consensus problem (13.2) using matrix theory and exploiting the linearity of the problem. In the following, we introduce numerous tools that will allow us to analyze nonlinear consensus-type interactions and more general nonlinear dynamical systems. •
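As a quick numerical illustration of Example 13.2 and of Lemma 13.1(i), the following sketch integrates the Laplacian flow (13.2) with forward Euler. Everything specific here is an assumption made for illustration: the 4-cycle graph, the unit weights $a_{ij} = 1$, the initial positions, and the step size $0.01$ (the notes do not discretize the flow). The center of mass should stay fixed and all agents should gather at it.

```python
def laplacian_flow_step(p, edges, weights, dt):
    """One forward-Euler step of dp_i/dt = -sum_{ {i,j} in E } a_ij (p_i - p_j)."""
    u = [[0.0, 0.0] for _ in p]
    for (i, j), a in zip(edges, weights):
        for c in range(2):
            diff = p[i][c] - p[j][c]
            u[i][c] -= a * diff  # spring force on agent i
            u[j][c] += a * diff  # equal and opposite force on agent j
    return [[pi[c] + dt * ui[c] for c in range(2)] for pi, ui in zip(p, u)]


def center_of_mass(p):
    return [sum(q[c] for q in p) / len(p) for c in range(2)]


edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical 4-cycle
weights = [1.0, 1.0, 1.0, 1.0]
p0 = [[0.0, 0.0], [4.0, 0.0], [4.0, 2.0], [-1.0, 3.0]]

p = p0
for _ in range(2000):  # simulate up to t = 20
    p = laplacian_flow_step(p, edges, weights, dt=0.01)
```

Because every edge contributes equal and opposite terms, the sum of the updates is zero at each step, which is the discrete counterpart of the pairwise-cancellation argument in the proof of Lemma 13.1(i).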
13.2 Stability theory for dynamical systems
Dynamical systems and equilibrium points. A (continuous-time) dynamical system is a pair $(X, f)$ where $X$, called the state space, is a subset of $\mathbb{R}^n$ and $f$, called the vector field, is a map from $X$ to $\mathbb{R}^n$. Given an initial state $x_0 \in X$, the solution (also called trajectory or evolution) of the dynamical system is a curve $t \mapsto x(t)$ satisfying the differential equation
$$\dot x(t) = f(x(t)), \qquad x(0) = x_0.$$
A dynamical system $(X, f)$ is linear if $x \mapsto f(x) = Ax$ for some square matrix $A$. Typically, the map $f$ is assumed to have some continuity properties so that the solution exists and is unique for at least small times; we do not discuss this topic here and refer, for example, to (Khalil 2002). Examples of continuous-time dynamical systems include the (linear) Laplacian flow $\dot x = -Lx$ (see equation (7.2) in Section 7.4) and the (nonlinear) Kuramoto coupled oscillator model $\dot \theta_i = \omega_i - \frac{K}{n} \sum_{j=1}^n \sin(\theta_i - \theta_j)$ (which we discuss in Chapter 14).
An equilibrium point for the dynamical system $(X, f)$ is a point $x^* \in X$ such that $f(x^*) = 0_n$. If the initial state is $x(0) = x^*$, then the solution exists, is unique for all time, and is constant: $x(t) = x^*$ for all $t \in \mathbb{R}_{\geq 0}$.
Convergence and invariant sets. A curve $t \mapsto x(t)$ approaches a set $S \subset \mathbb{R}^n$ as $t \to +\infty$ if the distance from $x(t)$ to the set $S$ converges to 0 as $t \to +\infty$. If the set $S$ consists of a single point $s$, then $x(t)$ converges to $s$ in the usual sense: $\lim_{t\to+\infty} x(t) = s$. Given a dynamical system $(X, f)$, a set $W \subset X$ is invariant if each solution starting in $W$ remains in $W$, that is, if $x(0) \in W$ implies $x(t) \in W$ for all $t \geq 0$. We also need the following general properties: a set $W \subset \mathbb{R}^n$ is
(i) bounded if there exists a constant $K$ such that each $w \in W$ satisfies $\|w\| \leq K$,
(ii) closed if it contains its boundary (or, equivalently, if it contains all its limit points), and
(iii) compact if it is bounded and closed.
Stability. An equilibrium point $x^*$ for the system $(X, f)$ is said to be
(i) stable (or Lyapunov stable) if, for each $\varepsilon > 0$, there exists $\delta = \delta(\varepsilon) > 0$ so that if $\|x(0) - x^*\| < \delta$, then $\|x(t) - x^*\| < \varepsilon$ for all $t \geq 0$,
(ii) unstable if it is not stable,
(iii) locally asymptotically stable if it is stable and if there exists $\delta > 0$ such that $\lim_{t\to\infty} x(t) = x^*$ for all trajectories satisfying $\|x(0) - x^*\| < \delta$.
Moreover, given a locally asymptotically stable equilibrium point $x^*$,
(i) the set of initial conditions $x_0 \in X$ whose corresponding solution $x(t)$ converges to $x^*$ is an open set termed the region of attraction of $x^*$,
(ii) $x^*$ is said to be globally asymptotically stable if its region of attraction is the whole space $X$,
(iii) $x^*$ is said to be globally (respectively, locally) exponentially stable if it is globally (respectively, locally) asymptotically stable and all trajectories starting in the region of attraction satisfy
$$\|x(t) - x^*\| \leq c_1 \|x(0) - x^*\| e^{-c_2 t},$$
for some constants $c_1, c_2 > 0$.
Some of these concepts are illustrated in Figure 13.3.
Figure 13.3: Illustrations of a stable, an unstable and an asymptotically stable equilibrium.
Energy functions: non-increasing functions, sublevel sets and critical points. In order to establish the stability and convergence properties of a dynamical system, we will use the concept of an energy function that is non-increasing along the system's solutions. The Lie derivative of a function $V: \mathbb{R}^n \to \mathbb{R}$ with respect to a vector field $f: \mathbb{R}^n \to \mathbb{R}^n$ is the function $\mathcal{L}_f V: \mathbb{R}^n \to \mathbb{R}$ defined by
$$\mathcal{L}_f V(x) = \frac{\partial V}{\partial x}(x)\, f(x). \qquad (13.3)$$
A differentiable function $V: \mathbb{R}^n \to \mathbb{R}$ is said to be non-increasing along every trajectory of the system if each solution $x: \mathbb{R}_{\geq 0} \to X$ satisfies $\dot V(x(t)) = \mathcal{L}_f V(x(t)) \leq 0$, or, equivalently, if each point $x \in X$ satisfies $\mathcal{L}_f V(x) \leq 0$.
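A Lie derivative can also be approximated numerically. The following minimal sketch is illustrative only: the vector field `f`, the function `V`, and the helper names are assumptions, not taken from the notes. It compares a central-difference approximation of $\frac{\partial V}{\partial x}(x) f(x)$ with the exact expression for $f(x) = (-x_1 + x_2, -x_2)$ and $V(x) = x_1^2 + x_2^2$. In this example $\mathcal{L}_f V(x) = -2x_1^2 + 2x_1x_2 - 2x_2^2 = -(x_1^2 + x_2^2) - (x_1 - x_2)^2 \leq 0$, so $V$ is non-increasing along every trajectory.

```python
def lie_derivative(V, f, x, h=1e-6):
    """Approximate L_f V(x) = (dV/dx)(x) f(x), using central differences for dV/dx."""
    fx = f(x)
    total = 0.0
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        dV_dxk = (V(x_plus) - V(x_minus)) / (2 * h)  # k-th partial derivative of V
        total += dV_dxk * fx[k]
    return total


def f(x):
    """Illustrative linear vector field f(x) = (-x1 + x2, -x2)."""
    return [-x[0] + x[1], -x[1]]


def V(x):
    """Illustrative energy function V(x) = x1^2 + x2^2."""
    return x[0] ** 2 + x[1] ** 2


def lie_derivative_exact(x):
    # (2 x1, 2 x2) . (-x1 + x2, -x2)
    return -2 * x[0] ** 2 + 2 * x[0] * x[1] - 2 * x[1] ** 2
```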
A critical point for a differentiable function $V: \mathbb{R}^n \to \mathbb{R}$ is a point $\bar x \in X$ satisfying $\frac{\partial V}{\partial x}(\bar x) = 0_n$. Every critical point of a differentiable function is either a local minimum, a local maximum, or a saddle point (a critical point that is neither a local minimum nor a local maximum). Given a function $V: \mathbb{R}^n \to \mathbb{R}$ and a constant $k \in \mathbb{R}$, the $k$-level set of $V$ is $\{y \in \mathbb{R}^n \mid V(y) = k\}$, and the $k$-sublevel set of $V$ is $\{y \in \mathbb{R}^n \mid V(y) \leq k\}$. These concepts are illustrated in Figure 13.4.
Figure 13.4: A differentiable function, its sublevel sets and its critical points. The sublevel set $\{x \mid V(x) \leq k_1\}$ is unbounded. The sublevel set $\{x \mid V(x) \leq k_2\} = [x_1, x_5]$ is compact and contains three critical points ($x_2$ and $x_4$ are local minima and $x_3$ is a local maximum). Finally, the sublevel set $\{x \mid V(x) \leq k_3\}$ is compact and contains a single critical point $x_4$.
13.2.1 Main convergence tool: the LaSalle Invariance Principle
We now present a powerful tool for the convergence analysis of nonlinear systems, namely the LaSalle Invariance Principle. We refer to (Khalil 2002, Theorem 4.4) for a complete proof, many examples and much related material. Also, we refer to (Bullo et al. 2009; Mesbahi and Egerstedt 2010) for various extensions and applications to robotic coordination.
Theorem 13.3 (LaSalle Invariance Principle). Consider a dynamical system $(X, f)$ with differentiable $f$. Assume
(i) there exists a compact set $W \subset X$ that is invariant for $(X, f)$,
(ii) there exists a continuously-differentiable function $V: X \to \mathbb{R}$ satisfying $\mathcal{L}_f V(x) \leq 0$ for all $x \in X$.
Then each solution $t \mapsto x(t)$ starting in $W$, that is, $x(0) \in W$, converges to the largest invariant set contained in
$$\{x \in W \mid \mathcal{L}_f V(x) = 0\}.$$
Note: Let $S$ denote this largest invariant set. If $S$ is composed of finitely many disconnected components and $t \mapsto x(t)$ approaches $S$, then it must approach one of its disconnected components. Specifically, if $S$ is composed of a finite number of points, then $t \mapsto x(t)$ must converge to one of the points.
13.2.2 Application #1: Linear and linearized systems
It is interesting to study the convergence properties of a linear system. Recall that a symmetric matrix is positive definite if all its eigenvalues are strictly positive.
Theorem 13.4 (Convergence of linear systems). For a matrix $A \in \mathbb{R}^{n \times n}$, the following properties are equivalent:
(i) each solution to the differential equation $\dot x = Ax$ satisfies $\lim_{t\to+\infty} x(t) = 0_n$,
(ii) all the eigenvalues of $A$ have strictly-negative real parts, and
(iii) for every positive-definite matrix $Q$, there exists a unique positive-definite solution $P$ to the so-called Lyapunov equation:
$$A^\top P + PA = -Q.$$
One can show that statement (iii) implies statement (i) using the LaSalle Invariance Principle with the function $V(x) = x^\top P x$, whose derivative along the system's solutions is $\dot V = x^\top (A^\top P + PA) x = -x^\top Q x \leq 0$. Indeed, the sublevel sets of $V$ are compact and invariant, and since $Q$ is positive definite, $\dot V = 0$ only at $x = 0_n$.
The linearization at the equilibrium point $x^*$ of the dynamical system $(X, f)$ is the linear dynamical system defined by the differential equation $\dot x = Ax$, where
$$A = \frac{\partial f}{\partial x}(x^*).$$
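To see statement (iii) of Theorem 13.4 on a concrete matrix, the sketch below solves the Lyapunov equation $A^\top P + PA = -Q$ by treating the $n^2$ entries of $P$ as unknowns and applying Gaussian elimination. This is a hedged illustration, not a method from the notes: the helpers `solve_linear` and `lyapunov` are hypothetical names, and the test matrix $A = \begin{bmatrix} 0 & 1 \\ -2 & -3 \end{bmatrix}$ (eigenvalues $-1$ and $-2$) with $Q = I_2$ is an assumed example. Solving by hand gives $P = \begin{bmatrix} 1.25 & 0.25 \\ 0.25 & 0.25 \end{bmatrix}$, which is positive definite.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular linear system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def lyapunov(A, Q):
    """Return P with A^T P + P A = -Q; unknown P[k][j] is stored at index k*n + j."""
    n = len(A)
    M = [[0.0] * (n * n) for _ in range(n * n)]
    b = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            row = i * n + j  # equation for entry (i, j)
            b[row] = -Q[i][j]
            for k in range(n):
                M[row][k * n + j] += A[k][i]  # (A^T P)_ij = sum_k A_ki P_kj
                M[row][i * n + k] += A[k][j]  # (P A)_ij   = sum_k P_ik A_kj
    sol = solve_linear(M, b)
    return [sol[i * n:(i + 1) * n] for i in range(n)]


A = [[0.0, 1.0], [-2.0, -3.0]]  # illustrative Hurwitz matrix
Q = [[1.0, 0.0], [0.0, 1.0]]
P = lyapunov(A, Q)              # should be close to [[1.25, 0.25], [0.25, 0.25]]
```

The $n^2 \times n^2$ linear system is nonsingular exactly when no two eigenvalues of $A$ sum to zero, which holds for any matrix satisfying statement (ii).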
Theorem 13.5 (Convergence of nonlinear systems via linearization). Consider a dynamical system $(X, f)$ with an equilibrium point $x^*$, with twice differentiable vector field $f$, and with linearization $A$ at $x^*$. The following statements hold:
(i) the equilibrium point $x^*$ is locally exponentially stable if and only if all the eigenvalues of $A$ have strictly-negative real parts; and
(ii) the equilibrium point $x^*$ is unstable if at least one eigenvalue of $A$ has strictly-positive real part.
Theorem 13.5 can often be invoked to analyze local stability of a nonlinear system. For example, for $\theta \in \mathbb{R}$, consider the dynamical system
$$\dot \theta = f(\theta) = \omega - \sin(\theta),$$
which we will study extensively in Chapters 14 and 15. If $\omega \in [0, 1[$, then two equilibrium points are $\theta_1^* = \arcsin(\omega) \in [0, \pi/2[$ and $\theta_2^* = \pi - \arcsin(\omega) \in\, ]\pi/2, \pi]$. Moreover, the complete set of equilibria consists of their $2\pi$-periodic copies $\{\theta_1^* + 2k\pi \mid k \in \mathbb{Z}\}$ and $\{\theta_2^* + 2k\pi \mid k \in \mathbb{Z}\}$. The linearization $A(\theta_i^*) = \frac{\partial f}{\partial \theta}(\theta_i^*) = -\cos(\theta_i^*)$, $i \in \{1, 2\}$, is strictly negative at $\theta_1^*$ and strictly positive at $\theta_2^*$; hence $\theta_1^*$ is locally exponentially stable and $\theta_2^*$ is unstable.
On the other hand, pick a scalar $c$ and, for $x \in \mathbb{R}$, consider the dynamical system $\dot x = f(x) = c \cdot x^3$. The linearization at the equilibrium $x^* = 0$ vanishes: $A(x^*) = 0$. Thus, Theorem 13.5 offers no conclusion other than that the equilibrium cannot be exponentially stable. However, the LaSalle Invariance Principle shows that for $c < 0$ every trajectory converges to $x^* = 0$. Here, a non-increasing and differentiable function is given by $V(x) = x^2$ with Lie derivative $\mathcal{L}_f V(x) = 2cx^4 \leq 0$. Since $V(x(t))$ is non-increasing along the solutions of the dynamical system, a compact invariant set is readily given by any sublevel set $\{x \mid V(x) \leq k\}$ for $k \geq 0$; moreover, $\mathcal{L}_f V(x) = 0$ only at $x = 0$.
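Both scalar examples above can be checked with a short simulation. This is a sketch under stated assumptions: $\omega = 0.5$, forward-Euler integration with step $10^{-3}$, and the hypothetical helper names `equilibria`, `linearization`, and `simulate_scalar`. Trajectories starting on either side of the unstable equilibrium $\theta_2^*$ should settle at a stable copy of $\theta_1^*$. For the cubic system with $c = -1$ and $x(0) = 1$, the exact solution $x(t) = 1/\sqrt{1 + 2t}$ decays only polynomially, consistent with the equilibrium not being exponentially stable.

```python
import math


def equilibria(omega):
    """Equilibria of d(theta)/dt = omega - sin(theta) in [0, pi], for 0 <= omega < 1."""
    return math.asin(omega), math.pi - math.asin(omega)


def linearization(theta):
    """A(theta) = d/dtheta (omega - sin(theta)) = -cos(theta)."""
    return -math.cos(theta)


def simulate_scalar(f, x0, dt=1e-3, t_final=30.0):
    """Forward-Euler integration of dx/dt = f(x); returns x(t_final)."""
    x = x0
    for _ in range(int(round(t_final / dt))):
        x += dt * f(x)
    return x


omega = 0.5
th1, th2 = equilibria(omega)
# Slightly below the unstable equilibrium th2, the solution moves down to th1.
theta_a = simulate_scalar(lambda th: omega - math.sin(th), th2 - 0.1)
# Slightly above th2, it moves up to the next stable copy th1 + 2*pi.
theta_b = simulate_scalar(lambda th: omega - math.sin(th), th2 + 0.1)
# Cubic example with c = -1: slow, non-exponential convergence to 0.
x_cubic = simulate_scalar(lambda x: -x ** 3, 1.0, t_final=50.0)
```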
13.2.3 Application #2: Negative gradient systems
Given a twice differentiable function $U: \mathbb{R}^n \to \mathbb{R}$, the negative gradient flow defined by $U$ is the dynamical system
$$\dot x(t) = -\frac{\partial U}{\partial x}(x(t)). \qquad (13.4)$$
For $x \in \mathbb{R}^n$, the Hessian matrix $\operatorname{Hess} U(x)$ is the symmetric matrix of second-order partial derivatives: $(\operatorname{Hess} U)_{ij}(x) = \partial^2 U / \partial x_i \partial x_j$.
Theorem 13.6 (Convergence of negative gradient flows). Let $U: \mathbb{R}^n \to \mathbb{R}$ be twice differentiable and assume its sublevel set $\{x \mid U(x) \leq k\}$ is compact for some $k \in \mathbb{R}$. Then the negative gradient flow (13.4) has the following properties:
(i) the sublevel set $\{x \mid U(x) \leq k\}$ is invariant,
(ii) each solution $t \mapsto x(t)$ with $U(x(0)) \leq k$ satisfies $\lim_{t\to+\infty} U(x(t)) = c \leq k$ and approaches the set of critical points of $U$:
$$\left\{x \in \mathbb{R}^n \;\middle|\; \frac{\partial U}{\partial x}(x) = 0_n\right\},$$
(iii) each local minimum point $x^*$ that is an isolated critical point of $U$ is locally asymptotically stable, and it is locally exponentially stable if and only if $\operatorname{Hess} U(x^*)$ is positive definite,
(iv) a critical point $x^*$ is unstable if at least one eigenvalue of $\operatorname{Hess} U(x^*)$ is strictly negative.
Proof. To show statements (i) and (ii), we verify that the assumptions of the LaSalle Invariance Principle are satisfied as follows. First, as set $W$ we adopt the sublevel set $\{x \mid U(x) \leq k\}$, which is compact by assumption and is invariant because, as we show next, the value of $t \mapsto U(x(t))$ is non-increasing. Second, the derivative of the function $U$ along its negative gradient flow is
$$\dot U(x) = -\left\|\frac{\partial U}{\partial x}(x)\right\|^2 \leq 0.$$
The first two facts are now an immediate consequence of the LaSalle Invariance Principle, since $\dot U(x) = 0$ exactly at the critical points of $U$. Statements (iii) and (iv) follow from observing that the linearization of the negative gradient system at the equilibrium $x^*$ is the negative Hessian matrix $-\operatorname{Hess} U(x^*)$ and from applying Theorem 13.5. ∎
Note: If the function $U$ has isolated critical points, then the negative gradient flow evolving in a compact set must converge to a single critical point. In such circumstances, it is also true that from almost all initial conditions the solution will converge to a local minimum rather than a local maximum or a saddle point.
Note: Given a critical point $x^*$, a positive definite Hessian matrix $\operatorname{Hess} U(x^*)$ is a sufficient but not a necessary condition for $x^*$ to be a local minimum. As a counterexample, consider the function $U(x) = x^4$ and the critical point $x^* = 0$.
Note: If the function $U$ is radially unbounded, that is, $\lim_{\|x\|\to\infty} U(x) = \infty$ (where the limit is taken along any path resulting in $\|x\| \to \infty$), then all its sublevel sets are compact.
Note from (Łojasiewicz 1984): if the function $U$ is analytic, then every solution starting in a compact sublevel set has finite length and converges to a single equilibrium point.
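The sketch below illustrates Theorem 13.6 on an assumed double-well potential $U(x) = (x^2 - 1)^2$, which is not an example from the notes. Its critical points are $-1$, $0$, $1$, with $\operatorname{Hess} U(\pm 1) = 8 > 0$ and $\operatorname{Hess} U(0) = -4 < 0$, so the minima should attract nearby solutions while the origin is unstable. The forward-Euler discretization (step $10^{-3}$) and the helper names are illustrative assumptions; for such a small step the discrete values of $U$ still decrease monotonically.

```python
def U(x):
    """Illustrative double-well potential U(x) = (x^2 - 1)^2."""
    return (x * x - 1.0) ** 2


def grad_U(x):
    return 4.0 * x * (x * x - 1.0)


def hess_U(x):
    return 12.0 * x * x - 4.0


def negative_gradient_flow(x0, dt=1e-3, steps=20000):
    """Forward-Euler discretization of dx/dt = -U'(x); returns the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - dt * grad_U(xs[-1]))
    return xs


traj_right = negative_gradient_flow(0.3)   # should approach the local minimum +1
traj_left = negative_gradient_flow(-0.2)   # should approach the local minimum -1
```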
Example 13.7 (Dissipative mechanical system). Consider a dissipative mechanical system of the form
$$\dot p = v, \qquad m \dot v = -dv - \frac{\partial}{\partial p} U(p),$$
where $(p, v) \in \mathbb{R}^2$ are the position and velocity coordinates, $m$ and $d$ are the positive inertia and damping coefficients, and $U: \mathbb{R} \to \mathbb{R}$ is a twice differentiable potential energy function. We assume that $U$ is strictly convex with a unique global minimum at $p^*$. Consider the mechanical energy $E: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ given by the sum of kinetic and potential energy:
$$E(p, v) = \frac{1}{2} m v^2 + U(p).$$
We compute its derivative along trajectories of the mechanical system as follows:
$$\dot E(p, v) = m v \dot v + \frac{\partial}{\partial p} U(p)\, \dot p = -dv^2 \leq 0.$$
Notice that the assumptions of the LaSalle Invariance Principle in Theorem 13.3 are satisfied: the function $E$ and the vector field (the right-hand side of the mechanical system) are continuously differentiable; the derivative $\dot E$ is nonpositive; and for any initial condition $(p_0, v_0) \in \mathbb{R}^2$ the sublevel set $\{(p, v) \in \mathbb{R}^2 \mid E(p, v) \leq E(p_0, v_0)\}$ is compact due to the strict convexity of $U$. It follows that $(p(t), v(t))$ converges to the largest invariant set contained in $\{(p, v) \in \mathbb{R}^2 \mid E(p, v) \leq E(p_0, v_0),\ v = 0\}$. On this invariant set $v$ remains identically zero, so $\dot v = 0$ and hence $\frac{\partial}{\partial p} U(p) = 0$; that is, the largest invariant set is $\{(p, v) \in \mathbb{R}^2 \mid E(p, v) \leq E(p_0, v_0),\ v = 0,\ \frac{\partial}{\partial p} U(p) = 0\}$. Because $U$ is strictly convex and twice differentiable, $\frac{\partial}{\partial p} U(p) = 0$ if and only if $p = p^*$. Therefore, we conclude
$$\lim_{t\to+\infty} (p(t), v(t)) = (p^*, 0).$$
•
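A minimal simulation sketch of Example 13.7, under assumptions that the notes do not make: $m = 1$, $d = 0.5$, the strictly convex potential $U(p) = (p - 1)^2$ (so $p^* = 1$), and a semi-implicit Euler discretization with step $10^{-3}$. The energy is sampled once per simulated second; the samples should decrease, and the state should approach $(p^*, 0)$.

```python
def simulate_damped(p0, v0, m=1.0, d=0.5, dt=1e-3, t_final=40.0, sample_every=1000):
    """Semi-implicit Euler for p' = v, m v' = -d v - U'(p), with U(p) = (p - 1)^2 (illustrative)."""

    def dU(p):
        return 2.0 * (p - 1.0)  # derivative of U(p) = (p - 1)^2

    def energy(p, v):
        return 0.5 * m * v * v + (p - 1.0) ** 2  # kinetic plus potential energy E(p, v)

    p, v = p0, v0
    samples = [energy(p, v)]
    for k in range(1, int(round(t_final / dt)) + 1):
        v += dt * (-d * v - dU(p)) / m  # update the velocity first,
        p += dt * v                     # then the position with the new velocity
        if k % sample_every == 0:
            samples.append(energy(p, v))
    return p, v, samples


p_end, v_end, energies = simulate_damped(p0=3.0, v0=0.0)
```

The semi-implicit update is used because explicit Euler tends to inject energy into lightly damped oscillations, which would blur the monotone energy decay that the LaSalle argument relies on.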
13.3 A nonlinear rendezvous problem
Consider the nonlinear rendezvous system
$$\dot p_i = f_i(p) = -\sum_{\{i,j\} \in E} g_{ij}(p_i - p_j), \qquad (13.5)$$
where (for each $\{i, j\} \in E$) $g_{ij} = g_{ji}$ is a continuously differentiable and anti-symmetric function (that is, $g_{ij}(-e) = -g_{ij}(e)$) satisfying $g_{ij}(e) = 0$ if and only if $e = 0$. Notice that the linearization of the system around the consensus subspace may be zero and thus not very informative, for example, when $g_{ij}(e) = \|e\|^2 e$. The nonlinear rendezvous system (13.5) can be written as a gradient flow:
$$\dot p_i = -\frac{\partial}{\partial p_i} V(p) = -\sum_{\{i,j\} \in E} \frac{\partial}{\partial p_i} V_{ij}(\|p_i - p_j\|),$$
with the associated edge potential function $V_{ij}(\|p_i - p_j\|) = \int_0^{\|p_i - p_j\|} g_{ij}(\chi)\, d\chi$.
Theorem 13.8 (Nonlinear rendezvous). Consider the nonlinear rendezvous system (13.5) with an undirected and connected graph $G = (V, E)$. Assume that the associated edge potential functions $V_{ij}(\|p_i - p_j\|) = \int_0^{\|p_i - p_j\|} g_{ij}(\chi)\, d\chi$ are radially unbounded. For every initial condition $p_0 \in \mathbb{R}^{2n}$, we have that
• the center of mass is stationary: $\operatorname{average}(p(t)) = \operatorname{average}(p_0)$ for all $t \geq 0$; and
• $\lim_{t\to\infty} p(t) = \mathbb{1}_n \otimes \operatorname{average}(p_0)$.
Proof. Note that the nonlinear rendezvous system (13.5) is the negative gradient system defined by the network potential function
$$V(p) = \sum_{\{i,j\} \in E} V_{ij}(\|p_i - p_j\|).$$
Recall from Lemma 13.1 that the center of mass is stationary, and observe that the function $V(p)$ is radially unbounded except along the translation directions $\{\mathbb{1}_n \otimes q \mid q \in \mathbb{R}^2\}$, which correspond to a displacement of the stationary center of mass. Thus, for every initial condition $p_0 \in \mathbb{R}^{2n}$, the set of points (with fixed center of mass) $\{p \in \mathbb{R}^{2n} \mid \operatorname{average}(p) = \operatorname{average}(p_0),\ V(p) \leq V(p_0)\}$ is compact. By the LaSalle Invariance Principle in Theorem 13.3, each solution converges to the largest invariant set contained in
$$\left\{p \in \mathbb{R}^{2n} \;\middle|\; \operatorname{average}(p) = \operatorname{average}(p_0),\ V(p) \leq V(p_0),\ \frac{\partial V(p)}{\partial p} = 0_{2n}^\top\right\}.$$
It follows that the positive limit set consists of equilibria with center of mass $\operatorname{average}(p_0)$, that is, of the consensus point, and hence $\lim_{t\to\infty} p(t) = \mathbb{1}_n \otimes \operatorname{average}(p_0)$. ∎
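The following sketch simulates the nonlinear rendezvous system (13.5) with the cubic coupling $g_{ij}(e) = \|e\|^2 e$ mentioned above, whose linearization at consensus is zero. The path graph on three agents, the initial positions, and the forward-Euler step $0.01$ are hypothetical choices made for illustration. The edge lengths should shrink (only polynomially fast, because the coupling is cubic) while the center of mass stays fixed.

```python
import math


def cubic_rendezvous_step(p, edges, dt):
    """One forward-Euler step of (13.5) with the illustrative choice g(e) = ||e||^2 e."""
    u = [[0.0, 0.0] for _ in p]
    for i, j in edges:
        ex, ey = p[i][0] - p[j][0], p[i][1] - p[j][1]
        s = ex * ex + ey * ey
        u[i][0] -= s * ex
        u[i][1] -= s * ey
        u[j][0] += s * ex  # g is odd, so agent j receives the opposite term
        u[j][1] += s * ey
    return [[pi[0] + dt * ui[0], pi[1] + dt * ui[1]] for pi, ui in zip(p, u)]


def max_edge_length(p, edges):
    return max(math.hypot(p[i][0] - p[j][0], p[i][1] - p[j][1]) for i, j in edges)


def center_of_mass(p):
    return [sum(q[0] for q in p) / len(p), sum(q[1] for q in p) / len(p)]


edges = [(0, 1), (1, 2)]  # hypothetical path graph on three agents
p0 = [[0.0, 0.0], [1.0, 0.5], [2.0, -0.5]]
p = p0
for _ in range(20000):  # simulate up to t = 200 with dt = 0.01
    p = cubic_rendezvous_step(p, edges, dt=0.01)
```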
13.4 Flocking and Formation Control
In flocking control, the objective is that the robots should mimic the behavior of fish schools and bird flocks and attain a prescribed formation defined by a set of distance constraints. Given an undirected graph $G(V, E)$ and a distance constraint $d_{ij}$ for every edge $\{i, j\} \in E$, a formation is defined by the set
$$\mathcal{F} = \{p \in \mathbb{R}^{2n} \mid \|p_i - p_j\|_2 = d_{ij} \ \ \forall \{i, j\} \in E\}.$$
We embed the graph $G$ into the plane $\mathbb{R}^2$ by assigning to each node $i$ a location $p_i \in \mathbb{R}^2$. We refer to the pair $(G, p)$ as a framework, and we denote the set of frameworks $(G, \mathcal{F})$ as the target formation. A target formation is a realization of $\mathcal{F}$ in the configuration space $\mathbb{R}^2$. A triangular example is shown in Figure 13.5. We make the following three observations on the geometry of the target formation:
• To be non-empty, the formation $\mathcal{F}$ has to be realizable in the plane. For example, for the triangular formation in Figure 13.5 the distance constraints $d_{ij}$ need to satisfy the triangle inequalities:
$$d_{12} \leq d_{13} + d_{23}, \qquad d_{23} \leq d_{12} + d_{13}, \qquad d_{13} \leq d_{12} + d_{23}.$$
• A framework $(G, p)$ with $p \in \mathcal{F}$ is invariant under rigid body transformations, that is, rotations, reflections and translations, as seen in Figure 13.5. Hence, the formation $\mathcal{F}$ is a set of at least "dimension 3" (two translational and one rotational degree of freedom).
Figure 13.5: A triangular formation specified by the distance constraints $d_{12}$, $d_{13}$, and $d_{23}$. The left subfigure shows one possible target formation, the middle subfigure shows a rotation of this target formation, and the right subfigure shows a “flip” of the left target formation. All of these triangles satisfy the specified distance constraints and are elements of $\mathcal{F}$.
• The formation $\mathcal{F}$ may consist of multiple disconnected components. For instance, for the triangular example in Figure 13.5 there is no continuous deformation from the left framework to the right “flipped” framework, even though both are target formations. In the state space $\mathbb{R}^6$, this absence of a continuous deformation corresponds to two disconnected components of the set $\mathcal{F}$.
To steer the agents towards the target formation, consider an artificial potential function for each edge $\{i, j\} \in E$ which mimics the Hookean potential of a spring with rest length $d_{ij}$:
$$V_{ij}(\|p_i - p_j\|) = \frac{1}{2}\big(\|p_i - p_j\|_2 - d_{ij}\big)^2.$$
Since this potential function is not differentiable at $p_i = p_j$, we choose the modified potential function
$$V_{ij}(\|p_i - p_j\|) = \frac{1}{4}\big(\|p_i - p_j\|_2^2 - d_{ij}^2\big)^2. \qquad (13.6)$$
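The resulting closed loop under the gradient control law $u = -\left(\frac{\partial}{\partial p} V(p)\right)^{\top}$ is given by
$$\dot p_i = u_i = -\frac{\partial}{\partial p_i} V(p) = -\sum_{\{i,j\} \in E} \big(\|p_i - p_j\|_2^2 - d_{ij}^2\big) \cdot (p_i - p_j). \qquad (13.7)$$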
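The closed loop (13.7) is easy to simulate. The sketch below is an illustration under assumptions not taken from the notes: a triangle graph, an equilateral target with $d_{ij} = 1$, initial positions chosen so that $V(p_0) \approx 0.156$, and forward Euler with step $0.01$. Theorem 13.9 below only guarantees convergence to critical points. For this triangle, the collinear or collapsed critical configurations all have potential at least $0.25$, so the non-increasing potential should steer this particular run to the formation.

```python
import math


def formation_step(p, edges, dist, dt):
    """One forward-Euler step of (13.7):
    dp_i/dt = -sum_{ {i,j} in E } (||p_i - p_j||^2 - d_ij^2) (p_i - p_j)."""
    u = [[0.0, 0.0] for _ in p]
    for (i, j), dij in zip(edges, dist):
        ex, ey = p[i][0] - p[j][0], p[i][1] - p[j][1]
        w = ex * ex + ey * ey - dij * dij  # squared-length error on edge {i, j}
        u[i][0] -= w * ex
        u[i][1] -= w * ey
        u[j][0] += w * ex
        u[j][1] += w * ey
    return [[pi[0] + dt * ui[0], pi[1] + dt * ui[1]] for pi, ui in zip(p, u)]


edges = [(0, 1), (1, 2), (2, 0)]  # hypothetical triangle graph
dist = [1.0, 1.0, 1.0]            # equilateral target formation
p0 = [[0.0, 0.0], [1.3, 0.2], [0.4, 0.9]]
p = p0
for _ in range(3000):  # simulate up to t = 30 with dt = 0.01
    p = formation_step(p, edges, dist, dt=0.01)

final_lengths = [math.hypot(p[i][0] - p[j][0], p[i][1] - p[j][1]) for i, j in edges]
```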
Theorem 13.9 (Flocking). Consider the nonlinear flocking system (13.7) with an undirected and connected graph $G = (V, E)$ and a realizable formation $\mathcal{F}$. For every initial condition $p_0 \in \mathbb{R}^{2n}$, we have that
• the center of mass is stationary: $\operatorname{average}(p(t)) = \operatorname{average}(p_0)$ for all $t \geq 0$; and
• the agents asymptotically converge to the set of critical points of the potential function.
Proof. As in the proof of Theorem 13.8, the center of mass is stationary and the potential is non-increasing:
$$\dot V(p) = -\left\|\left(\frac{\partial V(p)}{\partial p}\right)^{\top}\right\|^2 \leq 0.$$
Observe further that, for a fixed initial center of mass, the sublevel sets of $V(p)$ are compact, since the graph is connected and hence some edge length grows unbounded whenever $\|p\| \to \infty$ with fixed center of mass. By the LaSalle Invariance Principle in Theorem 13.3, $p(t)$ converges to the largest invariant set contained in
$$\left\{p \in \mathbb{R}^{2n} \;\middle|\; \operatorname{average}(p) = \operatorname{average}(p_0),\ V(p) \leq V(p_0),\ \frac{\partial V(p)}{\partial p} = 0_{2n}^\top\right\}.$$
It follows that the positive limit set is contained in the set of critical points of the potential function. ∎
The above results also hold true for non-smooth potential functions $V_{ij}: \,]-d_i^2, \infty[\, \to \mathbb{R}$ that satisfy
(P1) regularity: $V_{ij}(\chi)$ is defined and twice continuously-differentiable on $]0, \infty[$;
(P2) distance specification: $f_{ij}(\xi) = \frac{\partial}{\partial \chi} V_{ij}(\chi)\big|_{\chi = \xi} = 0$ if and only if $\xi = d_{ij}$;
(P3) mutual attractivity: $f_{ij}(\xi) = \frac{\partial}{\partial \chi} V_{ij}(\chi)\big|_{\chi = \xi}$ is strictly monotone increasing; and
(P4) collision avoidance: $\lim_{\chi \to 0} V_{ij}(\chi) = \infty$.
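As a hedged worked example (one possible choice, not necessarily the barrier potential plotted in Figure 13.6), take $\chi$ to be the inter-agent distance $\|p_i - p_j\|$, as (P2) suggests, and consider the logarithmic barrier potential below; each property can be checked directly:
$$V_{ij}(\chi) = \chi - d_{ij} - d_{ij} \ln\frac{\chi}{d_{ij}}, \qquad \chi \in\, ]0, \infty[,$$
$$f_{ij}(\chi) = \frac{\partial V_{ij}}{\partial \chi}(\chi) = 1 - \frac{d_{ij}}{\chi} = 0 \iff \chi = d_{ij} \quad \text{(P2)},$$
$$\frac{\partial f_{ij}}{\partial \chi}(\chi) = \frac{d_{ij}}{\chi^2} > 0 \quad \text{(P3)}, \qquad \lim_{\chi \to 0^+} V_{ij}(\chi) = +\infty \quad \text{(P4)}.$$
This $V_{ij}$ is smooth on $]0, \infty[$, so (P1) holds, and by (P3) it is convex with minimum value $V_{ij}(d_{ij}) = 0$.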
An illustration of possible potential functions can be found in Figure 13.6.
Figure 13.6: Illustration of the quadratic potential function (13.6) (blue solid plot) and a logarithmic barrier potential function (red dashed plot) that approaches $\infty$ as two neighboring agents become collocated. (a) Artificial potential functions $V_{ij}$ versus $\|p_i - p_j\|$; (b) induced artificial spring forces $\|f_{ij}\|$ versus $\|p_i - p_j\|$.
Theorem 13.10 (Flocking with collision avoidance). Consider the gradient flow (13.1) with an undirected and connected graph $G = (V, E)$, a realizable formation $\mathcal{F}$, and artificial potential functions satisfying (P1) through (P4). For every initial condition $p_0 \in \mathbb{R}^{2n}$ satisfying $p_i(0) \neq p_j(0)$ for all $\{i, j\} \in E$, we have that
• the solution to the non-smooth dynamical system exists for all times $t \geq 0$;
• the center of mass $\operatorname{average}(p(t)) = \operatorname{average}(p(0))$ is stationary for all $t \geq 0$;
• neighboring robots will not collide, that is, $p_i(t) \neq p_j(t)$ for all $\{i, j\} \in E$ and for all $t \geq 0$; and
• the agents asymptotically converge to the set of critical points of the potential function.
Proof. The proof of Theorem 13.10 is identical to that of Theorem 13.9 after realizing that, for initial conditions satisfying $p_i(0) \neq p_j(0)$ for all $\{i, j\} \in E$, the dynamics are confined to the compact and forward invariant set
$$\{p \in \mathbb{R}^{2n} \mid \operatorname{average}(p) = \operatorname{average}(p_0),\ V(p) \leq V(p_0)\}.$$
Within this set the potential $V$ is finite, so by (P4) no collisions between neighbors occur, and by (P1) the closed-loop vector field is continuously differentiable. ∎
At this point we should ask ourselves the following three questions:
(i) Do the agents actually stop, that is, does there exist a $p_\infty \in \mathbb{R}^{2n}$ so that $\lim_{t\to\infty} p(t) = p_\infty$?
(ii) The formation $\mathcal{F}$ is a subset of the set of critical points of the potential function. How can we render this particular subset stable (amongst possible other critical points)? What are the other critical points?
(iii) Does our specification of the target formation make sense? For example, the target formation in Figure 13.7 can be continuously deformed such that the resulting geometric configurations are not congruent.
Figure 13.7: A rectangular target formation among four robots, which is specified by four distance constraints. The initial geometric configuration (solid circles) can be continuously deformed such that the resulting geometric configuration is not congruent anymore. All of the displayed configurations are part of the target formation set and satisfy the distance constraints, even the case when the agents are collinear.
The answers to all these questions are tied to a graph-theoretic concept called rigidity.
13.5 Rigidity and stability of the target formation
To introduce the notion of graph rigidity, we view the undirected graph $G = (V, E)$ as a framework $(G, p)$ embedded in the plane $\mathbb{R}^2$. Given a framework $(G, p)$, we define the rigidity function $r_G(p)$ as
$$r_G: \mathbb{R}^{2n} \to \mathbb{R}^{|E|}, \qquad r_G(p) \triangleq \frac{1}{2}\big(\dots,\ \|p_i - p_j\|_2^2,\ \dots\big)^\top,$$
where each component in $r_G(p)$ corresponds to (half) the squared length of the relative position $p_i - p_j$ for $\{i, j\} \in E$.
Definition 13.11 (Rigidity). Given an undirected graph $G(V, E)$ and $p \in \mathbb{R}^{2n}$, the framework $(G, p)$ is said to be rigid if there is an open neighbourhood $U$ of $p$ such that if $q \in U$ and $r_G(p) = r_G(q)$, then $(G, p)$ is congruent to $(G, q)$.
Figure 13.8: (a) A flexible framework; (b) a rigid framework. The framework in Figure 13.8(a) is not rigid since a slight perturbation of the upper two points of the framework results in a framework that is not congruent to the original one although their rigidity functions coincide. If an additional cross link is added to the framework as in Figure 13.8(b), small perturbations that do not change the rigidity function result in a congruent framework. Thus, the framework in Figure 13.8(b) is rigid.
Examples of a rigid and a non-rigid framework are shown in Figure 13.8. Although rigidity is a very intuitive concept, its definition does not provide an easily verifiable condition, especially if one is interested in finding the exact neighbourhood $U$ where the framework is rigid. The following “linearized rigidity concept” offers an easily checkable algebraic condition. The idea is to allow an infinitesimally small perturbation $\delta p$ of the framework $(G, p)$ while keeping the rigidity function constant up to first order. The first-order Taylor approximation of the rigidity function $r_G$ about $p$ is
$$r_G(p + \delta p) = r_G(p) + \frac{\partial r_G(p)}{\partial p}\, \delta p + O(\|\delta p\|^2).$$
The rigidity function then remains constant up to first order if $\delta p \in \operatorname{kernel}\left(\frac{\partial r_G(p)}{\partial p}\right)$. The matrix $\frac{\partial r_G(p)}{\partial p} \in \mathbb{R}^{|E| \times 2n}$ is called the rigidity matrix of the graph $G$. Infinitesimal rigid body motions of the framework (two translations and one rotation) leave all pairwise distances unchanged and hence lie in this kernel. Thus, whenever the points $p_i$ are not all coincident, the dimension of the kernel of the rigidity matrix is at least 3. The idea that rigidity is preserved under infinitesimal perturbations motivates the following definition of infinitesimal rigidity.
Definition 13.12 (Infinitesimal rigidity). Given a graph $G(V, E)$ and $p \in \mathbb{R}^{2n}$, the framework $(G, p)$ is said to be infinitesimally rigid if $\dim \operatorname{kernel}\left(\frac{\partial r_G(p)}{\partial p}\right) = 3$ or, equivalently, if $\operatorname{rank}\left(\frac{\partial r_G(p)}{\partial p}\right) = 2n - 3$.
If a framework is infinitesimally rigid, then it is also rigid, but the converse is not necessarily true (Asimow and Roth 1979). Also note that an infinitesimally rigid framework must have at least $2n - 3$ edges. If it has exactly $2n - 3$ edges, then we call it a minimally rigid framework. Finally, if $(G, p)$ is infinitesimally rigid, so is $(G, p')$ for every $p'$ in an open neighborhood of $p$. Thus, infinitesimal rigidity is a generic property that depends almost only on the graph $G$ and not on the specific point $p \in \mathbb{R}^{2n}$. Throughout the literature, (infinitesimally, minimally) rigid frameworks are often denoted as (infinitesimally, minimally) rigid graphs.
Example 13.13 (Rigidity and infinitesimal rigidity of triangular formation). Consider the triangular framework in Figure 13.9(a) and the collapsed triangular framework in Figure 13.9(b), which are both embeddings
 13.5. Rigidity and stability of the target formation
165
of the same triangular graph. The rigidity function for both frameworks is given by   kp2 − p1 k2 1 rG (p) = kp3 − p2 k2  . 2 kp1 − p3 k2 Both frameworks are rigid but only the left framework is infinitesimally rigid. To see this, consider the rigidity matrix  >  > p − p> p> 0 2 2 − p1 ∂rG (p)  1 > p> − p>  . 0 p> = 2 − p3 3 2 ∂p > > > p1 − p3 0 p> 3 − p1
The rank of the rigidity matrix at a collinear point is $2 < 2n-3 = 3$. Hence, the collapsed triangle in Figure 13.9(b) is not infinitesimally rigid. All non-collinear realizations are infinitesimally and minimally rigid. Hence, the triangular framework in Figure 13.9(a) is generically minimally rigid (for almost every $p \in \mathbb{R}^6$). •

Minimally rigid graphs can be constructed by adding a new node with two undirected edges to an existing minimally rigid graph; see Figure 13.10. This construction is known as a Henneberg sequence.

The flocking result in Theorem 13.9 identifies the critical points of the potential function as the positive limit set. For minimally rigid graphs, we can perform a more insightful stability analysis. To do so, we first reformulate the formation control problem in the coordinates $e = \hat B^\top p$. The rigidity function can be conveniently rewritten in terms of the relative positions $e_{ij} = p_i - p_j$ for every edge $\{i,j\} \in E$:
$$r_G : \hat B^\top\mathbb{R}^{2n} \to \mathbb{R}^{|E|}, \qquad r_G(e) = \frac{1}{2}\big[\dots, \|e_{ij}\|_2^2, \dots\big]^\top.$$
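The rigidity matrix is then obtained in terms of the relative positions as
$$R(e) := \frac{\partial r_G(e)}{\partial p} = \frac{\partial r_G(e)}{\partial e}\cdot\frac{\partial e}{\partial p} = \operatorname{diag}(e^\top)\cdot\hat B^\top.$$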
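The factorization $R(e) = \operatorname{diag}(e^\top)\hat B^\top$ can be assembled directly to reproduce the rank count of Example 13.13. This is a minimal sketch assuming NumPy, edges oriented as $e_{ij} = p_i - p_j$, and $\hat B = B \otimes I_2$ with $B$ the oriented incidence matrix; the helper names `incidence_matrix` and `rigidity_matrix` are hypothetical.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Oriented incidence matrix B (n x |E|): +1 at node i, -1 at node j for edge (i, j)."""
    B = np.zeros((n, len(edges)))
    for k, (i, j) in enumerate(edges):
        B[i, k] = 1.0
        B[j, k] = -1.0
    return B

def rigidity_matrix(p, edges):
    """R(e) = diag(e^T) @ Bhat^T for planar positions p of shape (n, 2)."""
    n, m = p.shape[0], len(edges)
    Bhat = np.kron(incidence_matrix(n, edges), np.eye(2))  # assumed Bhat = B kron I_2, size 2n x 2|E|
    e = Bhat.T @ p.reshape(-1)                             # stacked relative positions e_ij = p_i - p_j
    diag_eT = np.zeros((m, 2 * m))                         # block-diagonal matrix with rows e_ij^T
    for k in range(m):
        diag_eT[k, 2 * k:2 * k + 2] = e[2 * k:2 * k + 2]
    return diag_eT @ Bhat.T                                # size |E| x 2n

triangle = [(0, 1), (1, 2), (2, 0)]
p_generic = np.array([[0.0, 0.0], [1.0, 0.0], [0.3, 0.8]])
p_collinear = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
print(np.linalg.matrix_rank(rigidity_matrix(p_generic, triangle)))    # 3 = 2n - 3
print(np.linalg.matrix_rank(rigidity_matrix(p_collinear, triangle)))  # 2 < 2n - 3
```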
Figure 13.9: Infinitesimal rigidity properties of a framework with three points $p_1, p_2, p_3$. (a) A rigid and infinitesimally rigid framework (triangle inequalities are strict). (b) A rigid but not infinitesimally rigid framework (triangle inequalities are equalities).
Chapter 13. Nonlinear Systems and Robotic Coordination
Figure 13.10: Construction of a minimally rigid graph by means of Henneberg sequence
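A Henneberg sequence of the kind just described, in which each new node is attached by two edges to two distinct existing nodes, can be generated and checked numerically. In this sketch, which assumes NumPy, starting from a single edge, choosing the attachment nodes at random, and the helper names are illustrative choices; the check confirms $|E| = 2n-3$ and, at randomly drawn (hence generic) positions, a rigidity-matrix rank of $2n-3$.

```python
import numpy as np

def rigidity_matrix(p, edges):
    """|E| x 2n matrix: row k has (p_i - p_j)^T in node i's columns and its negative in node j's."""
    R = np.zeros((len(edges), 2 * p.shape[0]))
    for k, (i, j) in enumerate(edges):
        e_ij = p[i] - p[j]
        R[k, 2 * i:2 * i + 2] = e_ij
        R[k, 2 * j:2 * j + 2] = -e_ij
    return R

def henneberg_type1(n_nodes, rng):
    """Grow a graph from one edge by adding nodes joined to two distinct existing nodes."""
    edges = [(0, 1)]                          # two nodes, 2*2 - 3 = 1 edge: minimally rigid
    for new in range(2, n_nodes):
        i, j = rng.choice(new, size=2, replace=False)
        edges += [(new, int(i)), (new, int(j))]
    return edges

rng = np.random.default_rng(0)
n = 8
edges = henneberg_type1(n, rng)
p = rng.standard_normal((n, 2))               # random positions are generic with probability one
print(len(edges), 2 * n - 3)                  # 13 13
print(np.linalg.matrix_rank(rigidity_matrix(p, edges)))
```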
Consider the shorthand $v(e) - d = \big[\dots, \|p_i-p_j\|_2^2 - d_{ij}^2, \dots\big]^\top$. Then the closed-loop formation control equations (13.7) can be reformulated in terms of relative positions as
$$\dot e = \hat B^\top\dot p = \hat B^\top u = -\hat B^\top\hat B\operatorname{diag}(e)\,(v(e)-d) = -\hat B^\top R(e)^\top(v(e)-d). \tag{13.8}$$
The associated initial condition $e_0 = \hat B^\top p_0$ is a vector in $\operatorname{image}(\hat B^\top)$.

Theorem 13.14 (Stability of minimally rigid formations). Consider the nonlinear flocking system (13.7) with an undirected and connected graph $G = (V,E)$ and a realizable and minimally rigid formation $F$. For every initial condition $p_0 \in \mathbb{R}^{2n}$, we have that
• the center of mass is stationary: $\operatorname{average}(p(t)) = \operatorname{average}(p_0)$ for all $t \ge 0$;
• the agents asymptotically converge to the set
$$W_{p_0} = \{p \in \mathbb{R}^{2n} \mid \operatorname{average}(p) = \operatorname{average}(p_0),\ V(p) \le V(p_0),\ \|R(e)^\top[v(e)-d]\|_2 = 0\},$$
where $e = \hat B^\top p$. In particular, the limit set $W_{p_0}$ is the union of the realizations of the target formation $(G,p)$ with $p \in W_{p_0}\cap F$ and the set of points $p \in W_{p_0}$ where the framework $(G,p)$ is not infinitesimally rigid; and
• for every $p_0 \in \mathbb{R}^{2n}$ such that the framework $(G,p)$ is minimally rigid for all $p$ in the set
$$\{p \in \mathbb{R}^{2n} \mid \operatorname{average}(p) = \operatorname{average}(p_0),\ V(p) \le V(p_0)\},$$
the agents converge exponentially fast to a stationary target formation $(G, p_\infty)$ with $p_\infty \in W_{p_0}\cap F$.

Proof. Consider the potential function (13.9), which reads in $e$-coordinates as
$$V(e) = \frac{1}{4}\|v(e)-d\|^2. \tag{13.9}$$
In the space of relative positions, the target formation set $\hat B^\top F$ is compact since the translational invariance is removed. Also, the sublevel sets of $V(e)$ are compact, and the derivative along the trajectories of (13.8) is
$$\frac{\partial V(e)}{\partial e}\,\dot e = -[v(e)-d]^\top\operatorname{diag}(e^\top)\hat B^\top R(e)^\top[v(e)-d] = -[v(e)-d]^\top R(e)R(e)^\top[v(e)-d] \le 0.$$
Notice that $V(e(t))$ is non-increasing, and for every $c \ge 0$ the sublevel set
$$\Omega(c) := \{e \in \operatorname{image}(\hat B^\top) \mid V(e) \le c\}$$
is forward invariant. By the LaSalle Invariance Principle, for every initial condition $e_0 \in \operatorname{image}(\hat B^\top)$ the associated solution of (13.8) converges to the largest invariant set in
$$W_{e_0} = \{e \in \operatorname{image}(\hat B^\top) \mid V(e) \le V(e_0),\ \|R(e)^\top[v(e)-d]\|_2 = 0\}.$$
In particular, the limit set $W_{e_0}$ includes (i) the realizations of the target formation $(G,p)$ with $p \in W_{p_0}\cap F$, $e = \hat B^\top p$, and $v(e)-d = 0_{|E|}$, and (ii) the set of points $e \in W_{e_0}$ where the rigidity matrix $R(e)^\top \in \mathbb{R}^{2n\times|E|}$ loses rank, corresponding to points $p \in W_{p_0}$ where the framework $(G,p)$ is not infinitesimally rigid.

Due to minimal rigidity of the target formation, the matrix $R(e)^\top \in \mathbb{R}^{2n\times|E|}$ has full rank $|E| = 2n-3$ for all $e \in \hat B^\top F$; said differently, $R(e)R(e)^\top$ has no zero eigenvalues for all $e \in \hat B^\top F$. The minimal eigenvalue of $R(e)R(e)^\top$ is therefore positive for all $e \in \hat B^\top F$ and thus, by continuity of eigenvalues with respect to the matrix entries, also in an open neighborhood of $\hat B^\top F$. In particular, for any $\lambda$ with $0 < \lambda < \min_{e\in\hat B^\top F}\lambda_{\min}\big(R(e)R(e)^\top\big)$ (this minimum exists and is positive because $\hat B^\top F$ is compact), we can find $\rho = \rho(\lambda) > 0$ such that everywhere in the sublevel set $\Omega(\rho)$ the matrix $R(e)R(e)^\top$ is positive definite with eigenvalues lower-bounded by $\lambda$. Formally, $\rho$ is obtained as
$$\rho = \max\Big\{\tilde\rho \ge 0 \;\Big|\; \min_{e\in\Omega(\tilde\rho)}\lambda_{\min}\big(R(e)R(e)^\top\big) \ge \lambda\Big\}.$$
Then, for all $e \in \Omega(\rho)$, we can upper-bound the derivative of $V(e)$ along trajectories as
$$\dot V(e) \le -\lambda\|v(e)-d\|^2 = -4\lambda V(e). \tag{13.10}$$
By the Grönwall-Bellman Comparison Lemma in Exercise E13.1, we have that for every $e_0 \in \Omega(\rho)$, $V(e(t)) \le V(e_0)e^{-4\lambda t}$. It follows that the target formation set (parameterized in terms of relative positions) $\hat B^\top F$ is exponentially stable with $\Omega(\rho)$ as guaranteed region of attraction.

Although the $e$-dynamics (13.8) and the $p$-dynamics (13.7) both have the formation $F$ as a limit set, convergence of the $e$-dynamics does not automatically imply convergence to a stationary target formation (but only convergence of the point-to-set distance to $F$). To establish stationarity, we rewrite the $p$-dynamics (13.7) as
$$p(t) = p_0 + \int_0^t f(\tau)\,d\tau, \tag{13.11}$$
where $f(t) = -\hat B\operatorname{diag}(e(t))[v(e(t))-d]$. Since $e(t)$ remains in the compact set $\Omega(\rho)$ and $\|v(e(t))-d\| = 2\sqrt{V(e(t))} \le 2\sqrt{V(e_0)}\,e^{-2\lambda t}$, the function $f(t)$ is exponentially decaying in time and thus an integrable ($L_1$) function. It follows that the integral on the right-hand side of (13.11) exists even in the limit as $t \to \infty$, and thus a solution of the $p$-dynamics converges to a finite point in $F$, that is, the agents converge to a stationary target formation. In conclusion, for every $p_0 \in \mathbb{R}^{2n}$ such that $e_0 = \hat B^\top p_0 \in \Omega(\rho)$, the agents converge exponentially fast to a stationary target formation. ■

Theorem 13.14, formulated for minimally rigid formations, can also be extended to infinitesimally rigid formations with redundant edges; see (Oh et al. 2012).
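The closed-loop $p$-dynamics, written as $\dot p = -R(e)^\top(v(e)-d)$, can be simulated to watch the behavior predicted by Theorem 13.14. This is a minimal forward-Euler sketch assuming NumPy; the triangle with unit target distances, the initial positions, the step size, and the horizon are illustrative choices not taken from the text. It records $V = \tfrac14\|v(e)-d\|^2$ along the trajectory and compares the center of mass before and after.

```python
import numpy as np

def rigidity_matrix(p, edges):
    """R(e): row k has e_ij^T in node i's columns and -e_ij^T in node j's (same as diag(e^T) Bhat^T)."""
    R = np.zeros((len(edges), 2 * p.shape[0]))
    for k, (i, j) in enumerate(edges):
        e_ij = p[i] - p[j]
        R[k, 2 * i:2 * i + 2] = e_ij
        R[k, 2 * j:2 * j + 2] = -e_ij
    return R

def simulate(p0, edges, dist, dt=1e-3, steps=20000):
    """Forward Euler for p' = -R(e)^T (v(e) - d); returns final positions and V at each step."""
    d = np.asarray(dist, dtype=float) ** 2    # entries of d are the squared target distances d_ij^2
    p = np.array(p0, dtype=float)
    V = np.empty(steps)
    for t in range(steps):
        vd = np.array([np.sum((p[i] - p[j]) ** 2) for i, j in edges]) - d   # v(e) - d
        V[t] = 0.25 * vd @ vd                                               # V = (1/4)||v(e) - d||^2
        p -= dt * (rigidity_matrix(p, edges).T @ vd).reshape(-1, 2)
    return p, V

edges = [(0, 1), (1, 2), (2, 0)]              # triangle: minimally rigid
p0 = np.array([[0.0, 0.0], [1.3, 0.1], [0.4, 0.9]])
p_final, V = simulate(p0, edges, [1.0, 1.0, 1.0])
print(V[0], V[-1])                            # V decreases toward 0
print(p0.mean(axis=0), p_final.mean(axis=0))  # center of mass is unchanged
```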
13.6
Exercises
E13.1
Grönwall-Bellman Comparison Lemma. Given a continuous function of time $t \mapsto a(t) \in \mathbb{R}$, suppose the signal $t \mapsto x(t)$ satisfies $\dot x(t) \le a(t)x(t)$. Define a new signal $t \mapsto y(t)$ satisfying $\dot y(t) = a(t)y(t)$ with $y(0) = x(0)$. Show that
(i) $y(t) = y(0)\exp\big(\int_0^t a(\tau)\,d\tau\big)$, and
(ii) $x(t) \le y(t)$ for all $t \ge 0$.
E13.2
Distributed optimization using the Laplacian Flow. Consider the saddle point dynamics (7.6) that solve the optimization problem (7.5) in a fully distributed fashion. Assume that the objective functions are strictly convex and twice differentiable and that the underlying communication graph among the distributed processors is connected and undirected. By using the LaSalle Invariance Principle, show that all solutions of the saddle point dynamics converge to the set of saddle points.
Hint: Use the following global under-estimator property of a strictly convex function: $f(x') - f(x) > \frac{\partial}{\partial x}f(x)(x'-x)$ for all distinct $x$ and $x'$ in the domain of $f$.
E13.3
The Lotka-Volterra predator/prey dynamics. In mathematical ecology (Takeuchi 1996), the Lotka-Volterra equations are frequently used to describe the dynamics of biological systems in which two animal species interact, a predator and a prey. In a simplified model, the animal populations change through time according to
$$\dot x(t) = \alpha x(t) - \beta x(t)y(t), \qquad \dot y(t) = -\gamma y(t) + \delta x(t)y(t), \tag{E13.1}$$
where $x$ is the number of prey individuals, $y$ is the number of predator individuals, and $\alpha, \beta, \gamma, \delta$ are parameters characterizing the interaction between the two species. Both variables are nonnegative and all four parameters are strictly positive.
(i) Compute the unique non-zero equilibrium point $(x^*, y^*)$ of the system.
(ii) Determine, if possible, the stability properties of the equilibrium point $(x^*, y^*)$ via linearization (Theorem 13.5).
(iii) Define the function $V(x,y) = -\delta x - \beta y + \gamma\ln(x) + \alpha\ln(y)$ and note its level sets as illustrated in Figure E13.1.
(a) Compute the Lie derivative of $V(x,y)$ with respect to the Lotka-Volterra vector field.
(b) What can you say about the stability properties of $(x^*, y^*)$?
(c) Sketch the trajectories of the system for some initial conditions in the $x$-$y$ positive orthant.
E13.4
On the gradient flow of a strictly convex function. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a strictly convex and twice differentiable function. Show convergence of the associated negative gradient flow, $\dot x = -\frac{\partial}{\partial x}f(x)$, to the global minimizer $x^*$ of $f$ using the Lyapunov function $V(x) = (x-x^*)^\top(x-x^*)$ and the LaSalle Invariance Principle in Theorem 13.3.
Hint: Use the global underestimate property of a strictly convex function stated as follows: $f(x') - f(x) > \frac{\partial}{\partial x}f(x)(x'-x)$ for all distinct $x$ and $x'$ in the domain of $f$.
E13.5
Consensus with input constraints. Consider a set of $n$ agents each with first-order dynamics $\dot x_i = u_i$.
(i) Design a consensus protocol that respects input constraints $u_i(t) \in [-1, 1]$ for all $t \ge 0$, and prove that your protocol achieves consensus.
Hint: Adopt the hyperbolic tangent function (or the arctangent function) and Theorem 13.8.
(ii) Extend the protocol and the proof to the case of second-order dynamics $\ddot x_i = u_i$ to achieve consensus of position states and convergence of velocity states to zero.
Hint: Recall Example 13.7.

Figure E13.1: Level sets of the function $V(x,y)$ for unit parameter values (axes $x$ and $y$; the equilibrium $(x^*, y^*)$ is marked).
E13.6
Nonlinear distributed optimization using the Laplacian Flow. Consider the saddle point dynamics (7.6) that solve the optimization problem (7.5) in a fully distributed fashion. Assume that the objective functions are strictly convex and twice continuously differentiable and that the underlying communication graph among the distributed processors is connected and undirected. Show via the LaSalle Invariance Principle that all solutions of the saddle point dynamics converge to the set of saddle points.
Hint: Use the following global underestimate property of a strictly convex function: $f(x') - f(x) > \frac{\partial}{\partial x}f(x)(x'-x)$ for all distinct $x$ and $x'$ in the domain of $f$; and the following global overestimate property of a concave function: $g(x') - g(x) \le \frac{\partial}{\partial x}g(x)(x'-x)$ for all distinct $x$ and $x'$ in the domain of $g$. Finally, note that the overestimate property holds with equality, $g(x') - g(x) = \frac{\partial}{\partial x}g(x)(x'-x)$, if $g(x)$ is affine.
 Chapter 14
Coupled Oscillators: Basic Models
In this chapter we discuss networks of coupled oscillators. We borrow ideas from (Dörfler and Bullo 2011, 2014). This chapter focuses on phase-coupled oscillators and does not discuss models of impulse-coupled oscillators. Further information on coupled oscillator models can be found in Acebrón et al. (2005); Arenas et al. (2008); Mauroy et al. (2012); Strogatz (2000).
14.1
History
The scientific interest in synchronization of coupled oscillators can be traced back to the work by Christiaan Huygens on "an odd kind of sympathy" between coupled pendulum clocks (Huygens 1673). The model of coupled oscillators that we study was originally proposed by Arthur Winfree (Winfree 1967). For complete interaction graphs, this model is nowadays known as the Kuramoto model due to the work by Yoshiki Kuramoto (Kuramoto 1975, 1984). Stephen Strogatz provides an excellent historical account in (Strogatz 2000). The Kuramoto model and its variations appear in the study of biological synchronization phenomena such as pacemaker cells in the heart (Michaels et al. 1987), circadian rhythms (Liu et al. 1997), neuroscience (Brown et al. 2003; Crook et al. 1997; Varela et al. 2001), metabolic synchrony in yeast cell populations (Ghosh et al. 1971), flashing fireflies (Buck 1988), chirping crickets (Walker 1969), and rhythmic applause (Néda et al. 2000), among others. The Kuramoto model also appears in physics and chemistry in modeling and analysis of spin glass models (Daido 1992; Jongen et al. 2001), flavor evolutions of neutrinos (Pantaleone 1998), and in the analysis of chemical oscillations (Kiss et al. 2002). Some technological applications include deep brain stimulation (Tass 2003), vehicle coordination (Klein et al. 2008; Paley et al. 2007; Sepulchre et al. 2007), semiconductor lasers (Hoppensteadt and Izhikevich 2000; Kozyreff et al. 2000), microwave oscillators (York and Compton 2002), clock synchronization in wireless networks (Simeone et al. 2008), and droop-controlled inverters in microgrids (Simpson-Porco et al. 2012).
Chapter 14. Coupled Oscillators: Basic Models
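Figure 14.1: Mechanical analog of a coupled oscillator network (four particles subject to torques $\tau_1, \tau_2, \tau_3, \tau_4$ and coupled by springs with stiffnesses $k_{12}, k_{23}, k_{24}, k_{34}$).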
14.2
Examples
14.2.1
Example #1: A spring network on a ring
This coupled-oscillator network consists of particles rotating around a unit-radius circle and assumed to possibly overlap without colliding. Each particle is subject to (1) a non-conservative torque $\tau_i$, (2) a linear damping torque, and (3) a total elastic torque. Pairs of interacting particles $i$ and $j$ are coupled through elastic springs with stiffness $k_{ij} > 0$. The elastic energy stored by the spring between particles at angles $\theta_i$ and $\theta_j$ is
$$\begin{aligned} E_{ij}(\theta_i,\theta_j) &= \frac{k_{ij}}{2}\,\text{distance}^2 = \frac{k_{ij}}{2}\Big((\cos\theta_i-\cos\theta_j)^2 + (\sin\theta_i-\sin\theta_j)^2\Big) \\ &= k_{ij}\big(1 - \cos\theta_i\cos\theta_j - \sin\theta_i\sin\theta_j\big) = k_{ij}\big(1-\cos(\theta_i-\theta_j)\big), \end{aligned}$$
so that the elastic torque on particle $i$ is
$$-\frac{\partial}{\partial\theta_i}E_{ij}(\theta_i,\theta_j) = -k_{ij}\sin(\theta_i-\theta_j).$$
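The energy identity and the resulting torque can be spot-checked numerically. The sketch below assumes NumPy and an arbitrary stiffness value; it compares the chord-length form of $E_{ij}$ with $k_{ij}(1-\cos(\theta_i-\theta_j))$ at random angles, and a central finite difference of $-\partial E_{ij}/\partial\theta_i$ with $-k_{ij}\sin(\theta_i-\theta_j)$.

```python
import numpy as np

def energy_chord(theta_i, theta_j, k):
    """(k/2) times the squared Euclidean distance between the two points on the unit circle."""
    return 0.5 * k * ((np.cos(theta_i) - np.cos(theta_j)) ** 2 + (np.sin(theta_i) - np.sin(theta_j)) ** 2)

def energy_closed_form(theta_i, theta_j, k):
    """k (1 - cos(theta_i - theta_j))."""
    return k * (1.0 - np.cos(theta_i - theta_j))

rng = np.random.default_rng(1)
th_i, th_j = rng.uniform(-np.pi, np.pi, size=(2, 1000))
k = 2.5                                        # arbitrary stiffness for the check

energy_gap = np.max(np.abs(energy_chord(th_i, th_j, k) - energy_closed_form(th_i, th_j, k)))

# central finite difference of -dE/dtheta_i compared with the torque -k sin(theta_i - theta_j)
h = 1e-6
torque_fd = -(energy_closed_form(th_i + h, th_j, k) - energy_closed_form(th_i - h, th_j, k)) / (2 * h)
torque_gap = np.max(np.abs(torque_fd + k * np.sin(th_i - th_j)))
print(energy_gap, torque_gap)                  # both small: rounding and finite-difference error only
```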
In summary, Newton's law applied to this rotating system implies that the network of spring-interconnected particles obeys the dynamics
$$M_i\ddot\theta_i + D_i\dot\theta_i = \tau_i - \sum_{j=1}^n k_{ij}\sin(\theta_i-\theta_j),$$
where $M_i$ and $D_i$ are inertia and damping coefficients. In the limit of small masses $M_i$ and uniformly high viscous damping $D = D_i$, that is, $M_i/D \approx 0$, the model simplifies to
$$\dot\theta_i = \omega_i - \sum_{j=1}^n a_{ij}\sin(\theta_i-\theta_j), \qquad i \in \{1,\dots,n\},$$
with natural rotation frequencies $\omega_i = \tau_i/D$ and coupling strengths $a_{ij} = k_{ij}/D$.
Figure 14.2: Line diagram and graph representation for a simplified model of the New England Power Grid. Generators are represented by • and load buses by ◦. (The line diagram shows the New England test system with 10 synchronous generators and 39 buses.)

14.2.2
Example #2: The "structure-preserving" power network model

We consider an AC power network, visualized in Figure 14.2, with $n$ buses including generators and load buses. We present two simplified models for this network, a static power-balance model and a dynamic continuous-time model.

The transmission network is described by an admittance matrix $Y \in \mathbb{C}^{n\times n}$ that is symmetric and sparse with line impedances $Z_{ij} = Z_{ji}$ for each branch $\{i,j\} \in E$. The network admittance matrix is a sparse matrix with nonzero off-diagonal entries $Y_{ij} = -1/Z_{ij}$ for each branch $\{i,j\} \in E$; the diagonal elements $Y_{ii} = -\sum_{j=1,j\neq i}^n Y_{ij}$ assure zero row sums.

The static model is described by the following two concepts. Firstly, according to Kirchhoff's current law, the current injection at node $i$ is balanced by the current flows from adjacent nodes:
$$I_i = \sum_{j=1}^n \frac{1}{Z_{ij}}(V_i - V_j) = \sum_{j=1}^n Y_{ij}V_j.$$
Here, $I_i$ and $V_i$ are the phasor representations of the nodal current injections and nodal voltages, e.g., $V_i = |V_i|e^{\sqrt{-1}\theta_i}$ corresponds to the signal $|V_i|\cos(\omega_0 t + \theta_i)$. The complex power injection $S_i = V_i\cdot\overline{I_i}$ (where $\bar z$ denotes the complex conjugate of $z \in \mathbb{C}$) then satisfies the power balance equation
$$S_i = V_i\cdot\sum_{j=1}^n \overline{Y_{ij}}\,\overline{V_j} = \sum_{j=1}^n \overline{Y_{ij}}\,|V_i||V_j|\,e^{\sqrt{-1}(\theta_i-\theta_j)}.$$
Secondly, for a lossless network the real part of the power balance equations at each node is
$$\underbrace{P_i}_{\text{active power injection}} = \sum_{j=1}^n \underbrace{a_{ij}\sin(\theta_i-\theta_j)}_{\text{active power flow from } i \text{ to } j}, \qquad i \in \{1,\dots,n\}, \tag{14.1}$$
where $a_{ij} = |V_i||V_j||Y_{ij}|$ denotes the maximum power transfer over the transmission line $\{i,j\}$, and $P_i = \Re(S_i)$ is the active power injection into the network at node $i$, which is positive for generators and negative for loads. The systems of equations (14.1) are the so-called (balanced) active power flow equations.

Next, we discuss a simplified dynamic model. Many appropriate dynamic models have been proposed for each network node: zeroth-order models (for so-called constant power loads), first-order models (for so-called frequency-dependent loads and inverter-based generators), and second- and higher-order models for generators; see (Bergen and Hill 1981). For extreme simplicity here, we assume that every node is described by a first-order integrator with the following intuition: node $i$ speeds up (i.e., $\theta_i$ increases) when the power balance at node $i$ is positive, and slows down (i.e., $\theta_i$ decreases) when the power balance at node $i$ is negative. In other words, we assume
$$\dot\theta_i = P_i - \sum_{j=1}^n a_{ij}\sin(\theta_i-\theta_j). \tag{14.2}$$
The systems of equations (14.2) are a first-order simplified version of the so-called coupled swing equations. Note that, when every node is connected to every other node with identical connections of strength $K/n$, with $K > 0$, our simplified model of power network is identical to the so-called Kuramoto oscillators model:
$$\dot\theta_i = \omega_i - \frac{K}{n}\sum_{j=1}^n \sin(\theta_i-\theta_j). \tag{14.3}$$
(Here $\omega_i = P_i$ and $a_{ij} = K/n$ for all $i, j$.) Let us remark that a more realistic model of power network would necessarily include higher-order dynamics for the generators, uncertain load models, mixed resistive-inductive lines, and the modelling of reactive power.
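The reduction from the complex power balance to the lossless active power flow equations (14.1) can be checked on a small example. The 4-bus network, its line reactances, and the voltage phasors below are hypothetical values chosen only for illustration; the sketch assumes NumPy and purely inductive lines $Z_{ij} = \sqrt{-1}\,X_{ij}$, which makes the network lossless.

```python
import numpy as np

# Hypothetical lossless 4-bus network: purely inductive lines Z_ij = sqrt(-1) * X_ij.
n = 4
reactance = {(0, 1): 0.2, (1, 2): 0.25, (2, 3): 0.1, (0, 3): 0.3, (1, 3): 0.5}

Y = np.zeros((n, n), dtype=complex)
for (i, j), X in reactance.items():
    Y[i, j] = Y[j, i] = -1.0 / (1j * X)       # off-diagonal entries Y_ij = -1/Z_ij
for i in range(n):
    Y[i, i] = -np.sum(Y[i, :])                # diagonal gives zero row sums (Y[i, i] is still 0 here)

rng = np.random.default_rng(2)
Vmag = rng.uniform(0.95, 1.05, n)             # hypothetical voltage magnitudes
theta = rng.uniform(-0.5, 0.5, n)             # hypothetical voltage angles
V = Vmag * np.exp(1j * theta)

S = V * np.conj(Y @ V)                        # complex power S_i = V_i * conj(I_i) with I = Y V

A = np.abs(Y) * np.outer(Vmag, Vmag)          # a_ij = |V_i||V_j||Y_ij|
np.fill_diagonal(A, 0.0)
P = np.sum(A * np.sin(theta[:, None] - theta[None, :]), axis=1)   # right-hand side of (14.1)

print(np.max(np.abs(S.real - P)))             # close to 0: Re(S_i) matches (14.1)
print(np.sum(P))                              # close to 0: injections balance in a lossless network
```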
14.2.3
Example #3: Flocking, schooling, and vehicle coordination
Consider a set of $n$ particles in the plane $\mathbb{R}^2$, which we identify with the complex plane $\mathbb{C}$. Each particle $i \in \{1,\dots,n\}$ is characterized by its position $r_i \in \mathbb{C}$, its heading angle $\theta_i \in \mathbb{S}^1$, and a steering control law $u_i(r,\theta)$ depending on the position and heading of itself and other vehicles; see Figure 14.3(a). For simplicity, we assume that all particles have unit speed. The particle kinematics are then given by
$$\dot r_i = e^{\mathrm{i}\theta_i}, \qquad \dot\theta_i = u_i(r,\theta), \tag{14.4}$$
for $i \in \{1,\dots,n\}$, where $\mathrm{i} = \sqrt{-1}$ is the imaginary unit. If no control is applied, then particle $i$ travels in a straight line with orientation $\theta_i(0)$, and if $u_i = \omega_i \in \mathbb{R}$ is a nonzero constant, then particle $i$ traverses a circle with radius $1/|\omega_i|$. The interaction among the particles is modeled by an interaction graph $G = (\{1,\dots,n\}, E, A)$ determined by communication and sensing patterns. As shown by Vicsek et al. (1995), interesting motion patterns emerge if the controllers use only relative phase information between neighboring particles. As discussed in the previous chapter, we may adopt potential-function-based gradient control strategies (i.e., negative gradient flows) to coordinate the relative heading angles $\theta_i(t) - \theta_j(t)$. As shown in Example #1, an intuitive extension of the quadratic Hookean spring potential to the circle is the function $U_{ij} : \mathbb{S}^1\times\mathbb{S}^1 \to \mathbb{R}$ defined by
$$U_{ij}(\theta_i,\theta_j) = a_{ij}\big(1-\cos(\theta_i-\theta_j)\big),$$
for each edge $\{i,j\} \in E$. Notice that the potential $U_{ij}(\theta_i,\theta_j)$ achieves its unique minimum if the heading angles $\theta_i$ and $\theta_j$ are synchronized, and it achieves its maximum when $\theta_i$ and $\theta_j$ are out of phase by angle $\pi$. These considerations motivate the gradient-based control strategy
$$\dot\theta_i = \omega_0 - K\sum_{\{i,j\}\in E}\frac{\partial}{\partial\theta_i}U_{ij}(\theta_i,\theta_j) = \omega_0 - K\sum_{j=1}^n a_{ij}\sin(\theta_i-\theta_j), \qquad i \in \{1,\dots,n\}, \tag{14.5}$$
to synchronize the heading angles of the particles for $K > 0$ (gradient descent), respectively, to disperse the heading angles for $K < 0$ (gradient ascent). The term $\omega_0$ can induce additional rotations (for $\omega_0 \neq 0$) or translations (for $\omega_0 = 0$). A few representative trajectories are illustrated in Figure 14.3. The controlled phase dynamics (14.5) give rise to elegant and useful coordination patterns that mimic animal flocking behavior and fish schools. Inspired by these biological phenomena, scientists have studied the controlled phase dynamics (14.5) and their variations in the context of tracking and formation controllers in swarms of autonomous vehicles (Paley et al. 2007).
Figure 14.3: Panel (a) illustrates the particle kinematics (14.4), with position $r = (x,y)$, heading angle $\theta$, and velocity $e^{\mathrm{i}\theta}$. Panels (b)-(e) illustrate the controlled dynamics (14.4)-(14.5) with $n = 6$ particles, a complete interaction graph, and identical and constant natural frequencies: $\omega_0(t) = 0$ in panels (b) and (c) and $\omega_0(t) = 1$ in panels (d) and (e). The values of $K$ are $K = +1$ in panels (b) and (d) and $K = -1$ in panels (c) and (e). The arrows depict the orientation, the dashed curves show the long-term position dynamics, and the solid curves show the initial transient position dynamics. As illustrated, the resulting motion displays synchronized or dispersed heading angles for $K = \pm 1$, and translational motion for $\omega_0 = 0$, respectively circular motion for $\omega_0 = 1$.
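The qualitative behavior shown in Figure 14.3 for $\omega_0 = 0$ can be reproduced with a short forward-Euler integration of (14.4)-(14.5). This sketch assumes NumPy, a complete graph with unit weights $a_{ij} = 1$, all particles starting at the origin, and an arbitrary step size; the helper `order_parameter` (the modulus of the average of $e^{\mathrm{i}\theta_j}$) is an illustrative alignment measure that the text does not define.

```python
import numpy as np

def simulate_particles(theta0, K, omega0, dt=0.01, steps=2000):
    """Forward-Euler integration of (14.4)-(14.5) on a complete graph with unit weights."""
    n = theta0.size
    A = np.ones((n, n)) - np.eye(n)               # complete interaction graph, a_ij = 1 (assumption)
    theta = np.array(theta0, dtype=float)
    r = np.zeros(n, dtype=complex)                # all particles start at the origin (assumption)
    for _ in range(steps):
        coupling = np.sum(A * np.sin(theta[:, None] - theta[None, :]), axis=1)
        r = r + dt * np.exp(1j * theta)           # unit-speed motion along the heading
        theta = theta + dt * (omega0 - K * coupling)
    return r, theta

def order_parameter(theta):
    """Modulus of the average of exp(i*theta_j): 1 for aligned headings, 0 for balanced ones."""
    return np.abs(np.mean(np.exp(1j * theta)))

rng = np.random.default_rng(3)
theta0 = rng.uniform(-np.pi, np.pi, 6)            # n = 6 particles, as in the figure
_, theta_sync = simulate_particles(theta0, K=+1.0, omega0=0.0)
_, theta_disp = simulate_particles(theta0, K=-1.0, omega0=0.0)
print(order_parameter(theta0), order_parameter(theta_sync), order_parameter(theta_disp))
```

With $K = +1$ the headings align (order parameter near 1, translational motion), while with $K = -1$ they spread out and the order parameter decreases.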
14.3
Coupled phase oscillator networks
Given a connected, weighted, and undirected graph $G = (\{1,\dots,n\}, E, A)$, consider the coupled oscillator model
$$\dot\theta_i = \omega_i - \sum_{j=1}^n a_{ij}\sin(\theta_i-\theta_j), \qquad i \in \{1,\dots,n\}. \tag{14.6}$$
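The vector field (14.6) is straightforward to vectorize. The sketch below assumes NumPy and uses hypothetical helper names; it checks that identical weights $a_{ij} = K/n$ on the complete graph reproduce the Kuramoto right-hand side, and that adding the same angle to every $\theta_i$ leaves the vector field unchanged, since it depends only on phase differences.

```python
import numpy as np

def coupled_oscillator_rhs(theta, omega, A):
    """Right-hand side of (14.6): omega_i - sum_j a_ij sin(theta_i - theta_j)."""
    diff = theta[:, None] - theta[None, :]        # diff[i, j] = theta_i - theta_j
    return omega - np.sum(A * np.sin(diff), axis=1)

def kuramoto_rhs(theta, omega, K):
    """Kuramoto right-hand side: omega_i - (K/n) sum_j sin(theta_i - theta_j)."""
    n = theta.size
    return omega - (K / n) * np.sum(np.sin(theta[:, None] - theta[None, :]), axis=1)

rng = np.random.default_rng(4)
n, K = 5, 2.0                                     # illustrative size and coupling
theta = rng.uniform(-np.pi, np.pi, n)
omega = rng.normal(size=n)

# Complete homogeneous graph with a_ij = K/n; the diagonal is irrelevant because sin(0) = 0.
A = np.full((n, n), K / n)
same_as_kuramoto = np.allclose(coupled_oscillator_rhs(theta, omega, A), kuramoto_rhs(theta, omega, K))

# Shifting every angle by the same alpha leaves the vector field unchanged.
alpha = 0.7
rotation_invariant = np.allclose(coupled_oscillator_rhs(theta + alpha, omega, A),
                                 coupled_oscillator_rhs(theta, omega, A))
print(same_as_kuramoto, rotation_invariant)       # True True
```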
A special case of the coupled oscillator model (14.6) is the so-called Kuramoto model (Kuramoto 1975) with a complete homogeneous network (i.e., with identical edge weights $a_{ij} = K/n$):
$$\dot\theta_i = \omega_i - \frac{K}{n}\sum_{j=1}^n \sin(\theta_i-\theta_j), \qquad i \in \{1,\dots,n\}. \tag{14.7}$$

14.3.1
The geometry of the circle and the torus
Parametrization. The unit circle is $\mathbb{S}^1$. The torus $\mathbb{T}^n$ is the Cartesian product of $n$ copies of the circle. We parametrize the circle $\mathbb{S}^1$ by assuming (i) angles are measured counterclockwise, (ii) the 0 angle is the intersection of the unit circle with the positive horizontal axis, and (iii) angles take values in $[-\pi,\pi[$.

Geodesic distance. The clockwise arc-length from $\theta_i$ to $\theta_j$ is the length of the clockwise arc from $\theta_i$ to $\theta_j$. The counterclockwise arc-length is defined analogously. The geodesic distance between $\theta_i$ and $\theta_j$ is the minimum between clockwise and counterclockwise arc-lengths and is denoted by $|\theta_i-\theta_j|$. In the parametrization:
$$\operatorname{dist}_{cc}(\theta_1,\theta_2) = \operatorname{mod}(\theta_2-\theta_1, 2\pi), \qquad \operatorname{dist}_{c}(\theta_1,\theta_2) = \operatorname{mod}(\theta_1-\theta_2, 2\pi),$$
$$|\theta_1-\theta_2| = \min\{\operatorname{dist}_c(\theta_1,\theta_2),\ \operatorname{dist}_{cc}(\theta_1,\theta_2)\}.$$
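The parametrization translates directly into code. In this sketch, Python's `%` operator plays the role of $\operatorname{mod}(\cdot, 2\pi)$, since for a positive modulus it returns a value in $[0, 2\pi)$; the function names are illustrative.

```python
import math

def dist_cc(theta1, theta2):
    """Counterclockwise arc-length from theta1 to theta2, in [0, 2*pi)."""
    return (theta2 - theta1) % (2 * math.pi)   # % is non-negative for a positive modulus

def dist_c(theta1, theta2):
    """Clockwise arc-length from theta1 to theta2, in [0, 2*pi)."""
    return (theta1 - theta2) % (2 * math.pi)

def geodesic_distance(theta1, theta2):
    """|theta1 - theta2| on the circle: the shorter of the two arcs, in [0, pi]."""
    return min(dist_c(theta1, theta2), dist_cc(theta1, theta2))

print(geodesic_distance(0.1, 2 * math.pi - 0.1))   # ≈ 0.2 (the short arc passes through 0)
print(geodesic_distance(-3.0, 3.0))                # ≈ 2*pi - 6 ≈ 0.283
```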
Rotations. Given the angle $\alpha \in [-\pi,\pi[$, the rotation of the $n$-tuple $\theta = (\theta_1,\dots,\theta_n) \in \mathbb{T}^n$ by $\alpha$, denoted by $\operatorname{rot}_\alpha(\theta)$, is the counterclockwise rotation of each entry $\theta_1,\dots,\theta_n$ by $\alpha$. For $\theta \in \mathbb{T}^n$, we also define its rotation set to be $[\theta] = \{\operatorname{rot}_\alpha(\theta) \in \mathbb{T}^n \mid \alpha \in [-\pi,\pi[\}$. The coupled oscillator model (14.6) is invariant under rotations, that is, given a solution $\theta : \mathbb{R}_{\ge 0} \to \mathbb{T}^n$ to the coupled oscillator model, the rotated trajectory $\operatorname{rot}_\alpha(\theta(t))$ by any angle $\alpha$ is again a solution.

Arc subsets of the $n$-torus. Given a length $\gamma \in [0,2\pi[$, the arc subset $\bar\Gamma_{\mathrm{arc}}(\gamma) \subset \mathbb{T}^n$ is the set of $n$-tuples $(\theta_1,\dots,\theta_n)$ such that there exists an arc of length $\gamma$ containing all $\theta_1,\dots,\theta_n$. The set $\Gamma_{\mathrm{arc}}(\gamma)$ is the interior of $\bar\Gamma_{\mathrm{arc}}(\gamma)$. For example, $\theta \in \bar\Gamma_{\mathrm{arc}}(\pi)$ implies all angles $\theta_1,\dots,\theta_n$ belong to a closed half circle. Note:
(i) If $(\theta_1,\dots,\theta_n) \in \bar\Gamma_{\mathrm{arc}}(\gamma)$, then $|\theta_i-\theta_j| \le \gamma$ for all $i$ and $j$. The converse is not true in general. For example, $\{\theta \in \mathbb{T}^n \mid |\theta_i-\theta_j| \le \pi \text{ for all } i,j\}$ is equal to the entire $\mathbb{T}^n$. However, the converse statement is true in the following form (see also Exercise E14.2): if $|\theta_i-\theta_j| \le \gamma$ for all $i$ and $j$ and $(\theta_1,\dots,\theta_n) \in \Gamma_{\mathrm{arc}}(\pi)$, then $(\theta_1,\dots,\theta_n) \in \bar\Gamma_{\mathrm{arc}}(\gamma)$.
(ii) If $\theta = (\theta_1,\dots,\theta_n) \in \Gamma_{\mathrm{arc}}(\pi)$, then $\operatorname{average}(\theta)$ is well posed. (The average of $n$ angles is ill-posed in general. For example, there is no reasonable definition of the average of two diametrically opposed points.)
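Membership in the arc subsets and the well-posed average of item (ii) can be computed by sorting the angles: the shortest arc containing all of them has length $2\pi$ minus the largest circular gap between consecutive angles. This gap-based construction is an illustrative choice, not a method from the text; the sketch assumes NumPy and the helper names are hypothetical. The average is returned only when the shortest arc is strictly shorter than $\pi$, which mirrors the open half-circle requirement.

```python
import numpy as np

def smallest_arc(theta):
    """Return (length of shortest arc containing all angles, sorted angles, circular gaps)."""
    s = np.sort(np.mod(theta, 2 * np.pi))
    # gaps[k] = s[k+1] - s[k]; the last entry is the wrap-around gap from s[-1] back to s[0]
    gaps = np.diff(np.append(s, s[0] + 2 * np.pi))
    return 2 * np.pi - np.max(gaps), s, gaps

def in_closed_arc_set(theta, gamma):
    """True if some arc of length gamma contains all angles."""
    return bool(smallest_arc(theta)[0] <= gamma)

def in_open_arc_set(theta, gamma):
    """True if all angles fit in an arc strictly shorter than gamma."""
    return bool(smallest_arc(theta)[0] < gamma)

def arc_average(theta):
    """Average of angles lying in an open half circle, returned in [-pi, pi)."""
    length, s, gaps = smallest_arc(theta)
    if not length < np.pi:
        raise ValueError("average is ill-posed: angles do not lie in an open half circle")
    k = int(np.argmax(gaps))
    start = s[(k + 1) % s.size]                        # the containing arc starts after the largest gap
    unwrapped = start + np.mod(s - start, 2 * np.pi)   # all angles represented on [start, start + length]
    return (np.mean(unwrapped) + np.pi) % (2 * np.pi) - np.pi

theta = np.array([3.0, -2.8])                 # two angles on either side of pi
print(np.mean(theta))                         # ≈ 0.1: the naive mean points the wrong way
print(arc_average(theta))                     # ≈ 0.1 - pi ≈ -3.0416
print(in_closed_arc_set(np.array([0.0, np.pi]), np.pi), in_open_arc_set(np.array([0.0, np.pi]), np.pi))  # True False
```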
14.3.2
Synchronization notions
Consider the following notions of synchronization for a solution $\theta : \mathbb{R}_{\ge 0} \to \mathbb{T}^n$:
Frequency synchrony: A solution $\theta : \mathbb{R}_{\ge 0} \to \mathbb{T}^n$ is frequency synchronized if $\dot\theta_i(t) = \dot\theta_j(t)$ for all time $t$ and for all $i$ and $j$.
Phase synchrony: A solution $\theta : \mathbb{R}_{\ge 0} \to \mathbb{T}^n$ is phase synchronized if $\theta_i(t) = \theta_j(t)$ for all time $t$ and for all $i$ and $j$.
Phase cohesiveness: A solution $\theta : \mathbb{R}_{\ge 0} \to \mathbb{T}^n$ is phase cohesive with respect to $\gamma > 0$ if one of the following conditions holds for all time $t$:
(i) $\theta(t) \in \Gamma_{\mathrm{arc}}(\gamma)$;
(ii) $|\theta_i(t)-\theta_j(t)| \le \gamma$ for all edges $(i,j)$ of a graph of interest; or
(iii) $\sqrt{\sum_{i,j=1}^n |\theta_i(t)-\theta_j(t)|^2/2} < \gamma$.
Asymptotic notions: We will also talk about solutions that asymptotically achieve certain synchronization properties. For example, a solution $\theta : \mathbb{R}_{\ge 0} \to \mathbb{T}^n$ achieves phase synchronization if $\lim_{t\to\infty}|\theta_i(t)-\theta_j(t)| = 0$ for all $i$ and $j$. Analogous definitions can be given for asymptotic frequency synchronization and asymptotic phase cohesiveness. Finally, notice that phase synchrony is the extreme case of all phase cohesiveness notions with $\gamma = 0$.
14.3.3
Preliminary results
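We have the following result on the synchronization frequency.

Lemma 14.1 (Synchronization frequency). If a solution of the coupled oscillator model (14.6) achieves frequency synchronization, then it does so with a constant synchronization frequency equal to
$$\omega_{\mathrm{sync}} := \frac{1}{n}\sum_{i=1}^n \omega_i = \mathrm{average}(\omega).$$

Proof. This fact is obtained by summing all equations (14.6) for $i \in \{1, \dots, n\}$: since the weights are symmetric, $a_{ij} = a_{ji}$, and the sine is odd, the coupling terms cancel pairwise, so that $\sum_{i=1}^n \dot\theta_i = \sum_{i=1}^n \omega_i$. If all $\dot\theta_i$ are equal, each of them must equal $\mathrm{average}(\omega)$.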
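The pairwise cancellation used in the proof can be checked numerically. The sketch below is illustrative only: the weight matrix, the natural frequencies, the random configuration and the helper name `oscillator_rhs` are hypothetical choices, and it assumes symmetric weights $a_{ij} = a_{ji}$ (an undirected graph).

```python
import math
import random


def oscillator_rhs(theta, omega, A):
    """Right-hand side of (14.6): omega_i - sum_j a_ij * sin(theta_i - theta_j)."""
    n = len(theta)
    return [omega[i] - sum(A[i][j] * math.sin(theta[i] - theta[j]) for j in range(n))
            for i in range(n)]


# Hypothetical symmetric weights (undirected graph) and natural frequencies.
A = [[0.0, 1.0, 0.5, 0.0],
     [1.0, 0.0, 2.0, 0.0],
     [0.5, 2.0, 0.0, 1.5],
     [0.0, 0.0, 1.5, 0.0]]
omega = [0.3, -1.2, 0.7, 2.0]

random.seed(0)
theta = [random.uniform(-math.pi, math.pi) for _ in omega]   # arbitrary configuration
velocities = oscillator_rhs(theta, omega, A)

# The coupling terms cancel in pairs, so the mean velocity equals average(omega)
# at every configuration, not only at frequency-synchronized ones.
omega_sync = sum(omega) / len(omega)
mean_velocity = sum(velocities) / len(velocities)
print(omega_sync, mean_velocity)
```

Because the identity holds at every configuration, a frequency-synchronized solution has no choice but to rotate at $\mathrm{average}(\omega)$.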
Lemma 14.1 implies that, by expressing each angle with respect to a rotating frame with frequency $\omega_{\mathrm{sync}}$ and by replacing each $\omega_i$ by $\omega_i - \omega_{\mathrm{sync}}$, we obtain $\omega_{\mathrm{sync}} = 0$ or, equivalently, $\omega \in \mathbb{1}_n^\perp$. In this rotating frame a frequency-synchronized solution is an equilibrium. Due to the rotational invariance of the coupled oscillator model (14.6), it follows that if $\theta^* \in \mathbb{T}^n$ is an equilibrium point, then every point in the rotation set $[\theta^*] = \{\mathrm{rot}_\alpha(\theta^*) \in \mathbb{T}^n \mid \alpha \in [-\pi, \pi[\}$ is also an equilibrium. Notice that the set $[\theta^*]$ is a connected circle in $\mathbb{T}^n$, and we refer to it as an equilibrium set; see Figure 14.4 for the two-dimensional case. We have the following important result on local stability properties of equilibria.
Figure 14.4: Illustration of the state space $\mathbb{T}^2$, the equilibrium set $[\theta^*]$ associated to a phase-synchronized equilibrium $\theta^*$ (dotted blue line), the (meshed red) phase cohesive set $|\theta_2 - \theta_1| < \pi/2$, and the tangent space with translation vector $\mathbb{1}_2$ at $\theta^*$ arising from the rotational symmetry.
Lemma 14.2 (Linearization). Assume the frequencies satisfy $\omega \in \mathbb{1}_n^\perp$ and $G$ is connected with incidence matrix $B$. The following statements hold:
(i) Jacobian: the Jacobian of the coupled oscillator model (14.6) at $\theta \in \mathbb{T}^n$ is
$$J(\theta) = -B\,\mathrm{diag}\big(\{a_{ij}\cos(\theta_i - \theta_j)\}_{\{i,j\}\in E}\big)\,B^\top;$$
(ii) Local stability: if there exists an equilibrium $\theta^*$ such that $|\theta_i^* - \theta_j^*| < \pi/2$ for all $\{i, j\} \in E$, then
(a) $-J(\theta^*)$ is a Laplacian matrix; and
(b) the equilibrium set $[\theta^*]$ is locally exponentially stable.

Proof. We start with statements (i) and (ii)a. Given $\theta \in \mathbb{T}^n$, we define the undirected graph $G_{\mathrm{cosine}}(\theta)$ with the same nodes and edges as $G$ and with edge weights $a_{ij}\cos(\theta_i - \theta_j)$. Next, we compute
$$\frac{\partial}{\partial\theta_i}\Big(\omega_i - \sum_{j=1}^n a_{ij}\sin(\theta_i - \theta_j)\Big) = -\sum_{j=1}^n a_{ij}\cos(\theta_i - \theta_j),$$
and, for $j \neq i$,
$$\frac{\partial}{\partial\theta_j}\Big(\omega_i - \sum_{k=1}^n a_{ik}\sin(\theta_i - \theta_k)\Big) = a_{ij}\cos(\theta_i - \theta_j).$$
Therefore, the Jacobian is equal to minus the Laplacian matrix of the (possibly negatively weighted) graph $G_{\mathrm{cosine}}(\theta)$ and statement (i) follows from Lemma 8.2. Regarding statement (ii)a, if $|\theta_i^* - \theta_j^*| < \pi/2$ for all $\{i, j\} \in E$, then $\cos(\theta_i^* - \theta_j^*) > 0$ for all $\{i, j\} \in E$, so that $G_{\mathrm{cosine}}(\theta^*)$ has strictly positive weights and all usual properties of Laplacian matrices hold. To prove statement (ii)b, notice that $J(\theta^*)$ is negative semidefinite with nullspace spanned by $\mathbb{1}_n$, arising from the rotational symmetry; see Figure 14.4. Since $G$ is connected, all other eigenvectors are orthogonal to $\mathbb{1}_n$ and have negative eigenvalues. We now restrict our analysis to the orthogonal complement of $\mathbb{1}_n$: we define a coordinate transformation matrix $Q \in \mathbb{R}^{(n-1)\times n}$ with orthonormal rows orthogonal to $\mathbb{1}_n$,
$$Q\mathbb{1}_n = \mathbb{0}_{n-1} \quad\text{and}\quad QQ^\top = I_{n-1},$$
and we note that $QJ(\theta^*)Q^\top$ has negative eigenvalues. Therefore, in the original coordinates, the linearized dynamics are exponentially stable in all directions transverse to the zero eigenspace spanned by $\mathbb{1}_n$. By Theorem 13.5, the corresponding equilibrium set $[\theta^*]$ is locally exponentially stable.

Corollary 14.3 (Frequency synchronization). If a solution of the coupled oscillator model (14.6) satisfies the phase cohesiveness properties $|\theta_i(t) - \theta_j(t)| \leq \gamma$ for all $\{i, j\} \in E$, for some $\gamma \in [0, \pi/2[$, and for all $t \geq 0$, then the coupled oscillator model (14.6) achieves exponential frequency synchronization.

Proof. Let $x_i(t) = \dot\theta_i(t)$ be the frequency. Then $\dot x(t) = J(\theta(t))x(t)$ is a time-varying averaging system. The associated undirected graph has time-varying yet strictly positive weights $a_{ij}\cos(\theta_i(t) - \theta_j(t)) \geq a_{ij}\cos(\gamma) > 0$ for each $\{i, j\} \in E$. Hence, the weighted graph is connected for each $t \geq 0$. From the analysis of time-varying averaging systems in Theorem 11.6, the exponential convergence of $x(t)$ to $\mathrm{average}(x(0))\mathbb{1}_n$ follows. Equivalently, the frequencies synchronize.
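The Jacobian formula of Lemma 14.2(i) can be cross-checked against central finite differences. In this sketch the graph weights, the configuration $\theta$ (chosen so that every edge difference is below $\pi/2$) and the helper names are hypothetical; the check also confirms that $J(\theta)\mathbb{1}_n = 0$ and that the diagonal is negative, as statement (ii)a predicts.

```python
import math


def oscillator_rhs(theta, omega, A):
    """dtheta_i/dt = omega_i - sum_j a_ij * sin(theta_i - theta_j)."""
    n = len(theta)
    return [omega[i] - sum(A[i][j] * math.sin(theta[i] - theta[j]) for j in range(n))
            for i in range(n)]


def jacobian_formula(theta, A):
    """J(theta) = -B diag(a_ij cos(theta_i - theta_j)) B^T, i.e. minus the
    Laplacian of the cosine-weighted graph G_cosine(theta)."""
    n = len(theta)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w = A[i][j] * math.cos(theta[i] - theta[j])
                J[i][j] += w   # off-diagonal entry: +a_ij cos(theta_i - theta_j)
                J[i][i] -= w   # diagonal entry: -sum_j a_ij cos(theta_i - theta_j)
    return J


def jacobian_finite_difference(theta, omega, A, h=1e-6):
    """Central finite-difference approximation of the Jacobian, column by column."""
    n = len(theta)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus, minus = list(theta), list(theta)
        plus[j] += h
        minus[j] -= h
        f_plus = oscillator_rhs(plus, omega, A)
        f_minus = oscillator_rhs(minus, omega, A)
        for i in range(n):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return J


# Hypothetical weighted graph and configuration.
A = [[0.0, 1.0, 0.5, 0.0],
     [1.0, 0.0, 2.0, 0.0],
     [0.5, 2.0, 0.0, 1.5],
     [0.0, 0.0, 1.5, 0.0]]
omega = [0.3, -1.2, 0.7, 0.2]    # the Jacobian does not depend on omega
theta = [0.1, -0.4, 0.5, 1.2]    # every edge difference is below pi/2

J = jacobian_formula(theta, A)
J_fd = jacobian_finite_difference(theta, omega, A)
max_error = max(abs(J[i][j] - J_fd[i][j]) for i in range(4) for j in range(4))
row_sums = [sum(row) for row in J]   # J 1_n = 0 reflects the rotational symmetry
print(max_error, row_sums)
```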
14.3.4 The order parameter and the mean field model
An alternative synchronization measure (besides phase cohesiveness) is the magnitude of the order parameter
$$r e^{\mathrm{i}\psi} = \frac{1}{n}\sum_{j=1}^n e^{\mathrm{i}\theta_j}. \qquad (14.8)$$
The order parameter (14.8) is the centroid of all oscillators represented as points on the unit circle in $\mathbb{C}$. The magnitude $r$ of the order parameter is a synchronization measure:
• if the oscillators are phase-synchronized, then $r = 1$;
• if the oscillators are spaced equally on the unit circle, then $r = 0$; and
• for $r \in \,]0, 1[$ and oscillators contained in a semi-circle, the associated configuration of oscillators satisfies a certain level of phase cohesiveness; see Exercise E14.3.

By means of the order parameter $re^{\mathrm{i}\psi}$, the all-to-all Kuramoto model (14.7) can be rewritten in the insightful form
$$\dot\theta_i = \omega_i - Kr\sin(\theta_i - \psi), \quad i \in \{1, \dots, n\}. \qquad (14.9)$$
(We ask the reader to establish this identity in Exercise E14.4.) Equation (14.9) gives the intuition that the oscillators synchronize because of their coupling to a mean field represented by the order parameter reiψ , which itself is a function of θ(t). Intuitively, for small coupling strength K each oscillator rotates with its distinct natural frequency ωi , whereas for large coupling strength K all angles θi (t) will entrain to the mean field reiψ , and the oscillators synchronize. The transition from incoherence to synchrony occurs at a critical threshold value of the coupling strength, denoted by Kcritical .
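Both claims of this paragraph can be explored numerically. The sketch first checks the identity behind (14.9) at one configuration and then integrates (14.7) for a weak and a strong coupling. Everything beyond the model itself is an assumption made for illustration: the helper names, the classical RK4 integrator with step 0.01 and horizon 20, $n = 10$ frequencies uniformly spaced in $[-1, 1]$ (the setting of Exercise E14.1), and deterministic initial phases $0, 0.1, \dots, 0.9$ in place of random ones.

```python
import cmath
import math


def order_parameter(theta):
    """Return (r, psi) with r * exp(i psi) = (1/n) * sum_j exp(i theta_j), eq. (14.8)."""
    z = sum(cmath.exp(1j * t) for t in theta) / len(theta)
    return abs(z), cmath.phase(z)


def kuramoto_rhs(theta, omega, K):
    """All-to-all Kuramoto model (14.7): omega_i - (K/n) * sum_j sin(theta_i - theta_j)."""
    n = len(theta)
    return [omega[i] - (K / n) * sum(math.sin(theta[i] - tj) for tj in theta)
            for i in range(n)]


def mean_field_rhs(theta, omega, K):
    """Mean-field form (14.9): omega_i - K * r * sin(theta_i - psi)."""
    r, psi = order_parameter(theta)
    return [w - K * r * math.sin(t - psi) for t, w in zip(theta, omega)]


def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate(K, omega, theta0, T=20.0, dt=0.01):
    """Integrate (14.7); return the final r and the spread of the final velocities."""
    f = lambda th: kuramoto_rhs(th, omega, K)
    theta = list(theta0)
    for _ in range(int(round(T / dt))):
        theta = rk4_step(f, theta, dt)
    velocities = f(theta)
    return order_parameter(theta)[0], max(velocities) - min(velocities)


n = 10
omega = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]   # uniformly spaced in [-1, 1]
theta0 = [0.1 * i for i in range(n)]                  # assumed (deterministic) initial phases

# Identity (14.9): both right-hand sides agree at an arbitrary configuration.
gap = max(abs(a - b) for a, b in zip(kuramoto_rhs(theta0, omega, 3.0),
                                     mean_field_rhs(theta0, omega, 3.0)))

for K in (10.0, 0.1):
    r, spread = simulate(K, omega, theta0)
    print(f"K = {K}: r = {r:.3f}, spread of frequencies = {spread:.2e}")
```

For this choice of $\omega$, $\max_i \omega_i - \min_i \omega_i = 2$, so $K = 10$ lies well above the sufficient bound (15.12) of Theorem 15.6 and the phases lock with $r$ close to 1. For $K = 0.1$ each coupling term is bounded in magnitude by $K$, so the fastest and slowest instantaneous frequencies stay at least $2 - 2K = 1.8$ apart and no frequency synchronization can occur.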
14.4 Exercises

E14.1 Simulating coupled oscillators. Simulate in your favorite programming language and software package the coupled Kuramoto oscillators in equation (14.3). Set $n = 10$ and define a vector $\omega \in \mathbb{R}^{10}$ with entries deterministically uniformly spaced between $-1$ and $1$. Select random initial phases.
(i) Simulate the resulting differential equations for K = 10 and K = 0.1. (ii) Find the approximate value of K at which the qualitative behavior of the system changes from asynchrony to synchrony.
Turn in your code, a few printouts (as few as possible), and your written responses.

E14.2 Phase cohesiveness and arc length. Pick $\gamma < 2\pi/3$ and $n \geq 3$. Show the following statement: if $\theta \in \mathbb{T}^n$ satisfies $|\theta_i - \theta_j| \leq \gamma$ for all $i, j \in \{1, \dots, n\}$, then there exists an arc of length $\gamma$ containing all angles, that is, $\theta \in \Gamma_{\mathrm{arc}}(\gamma)$.

E14.3 Order parameter and arc length. Given $n \geq 2$ and $\theta \in \mathbb{T}^n$, the shortest arc length $\gamma(\theta)$ is the length of the shortest arc containing all angles, i.e., the smallest $\gamma(\theta)$ such that $\theta \in \Gamma_{\mathrm{arc}}(\gamma(\theta))$. Given $\theta \in \mathbb{T}^n$, the order parameter is the centroid of $(\theta_1, \dots, \theta_n)$ understood as points on the unit circle in the complex plane $\mathbb{C}$:
$$r(\theta)\, e^{\sqrt{-1}\,\psi(\theta)} := \frac{1}{n}\sum_{j=1}^n e^{\sqrt{-1}\,\theta_j}.$$
Prove the following statements:
(i) if γ(θ) ∈ [0, π], then r(θ) ∈ [cos(γ(θ)/2), 1]; and (ii) if θ ∈ Γarc (π), then γ(θ) ∈ [2 arccos(r(θ)), π].
The order parameter magnitude $r$ is known to measure synchronization. Show the following statements:
(iii) if all oscillators are phase-synchronized, then $r = 1$, and
(iv) if all oscillators are spaced equally on the unit circle (the so-called splay state), then $r = 0$.

E14.4 Order parameter and mean-field dynamics. Show that the Kuramoto model (14.7) is equivalent to the so-called mean-field model (14.9) with the order parameter $r$ defined in (14.8).

E14.5 Uniqueness of Kuramoto equilibria. A common misconception in the literature is that the Kuramoto model has a unique equilibrium set in the phase cohesive set $\{\theta \in \mathbb{T}^n \mid |\theta_i - \theta_j| < \pi/2 \text{ for all } \{i, j\} \in E\}$. Consider now the example of a Kuramoto oscillator network defined over a symmetric ring graph with identical unit weights and zero natural frequencies. The equilibria are determined by
$$0 = \sin(\theta_i - \theta_{i-1}) + \sin(\theta_i - \theta_{i+1}),$$
where $i \in \{1, \dots, n\}$ and all indices are evaluated modulo $n$. Show that for $n > 4$ there are at least two disjoint equilibrium sets in the phase cohesive set $\{\theta \in \mathbb{T}^n \mid |\theta_i - \theta_j| < \pi/2 \text{ for all } \{i, j\} \in E\}$.
Chapter 15

Networks of Coupled Oscillators

15.1 Synchronization of identical oscillators
We start our discussion with the following insightful lemma.

Lemma 15.1. Consider the coupled oscillator model (14.6). If $\omega_i \neq \omega_j$ for some distinct $i, j \in \{1, \dots, n\}$, then the oscillators cannot achieve phase synchronization.

Proof. We prove the lemma by contraposition. Assume that all oscillators are in phase synchrony, $\theta_i(t) = \theta_j(t)$ for all $t \geq 0$ and all $i, j \in \{1, \dots, n\}$. Then all coupling terms $\sin(\theta_i - \theta_j)$ vanish, so that $\dot\theta_i(t) = \omega_i$ for all $i$. Equating the dynamics, $\dot\theta_i(t) = \dot\theta_j(t)$, it follows necessarily that $\omega_i = \omega_j$.

Motivated by Lemma 15.1, we consider oscillators with identical natural frequencies, $\omega_i = \omega \in \mathbb{R}$ for all $i \in \{1, \dots, n\}$. By working in a rotating frame with frequency $\omega$, we have $\omega = 0$. Thus, we consider the model
$$\dot\theta_i = -\sum_{j=1}^n a_{ij}\sin(\theta_i - \theta_j), \quad i \in \{1, \dots, n\}. \qquad (15.1)$$
Notice that phase synchronization is an equilibrium of this model. Conversely, phase synchronization cannot be an equilibrium of the original coupled oscillator model (14.6) if $\omega_i \neq \omega_j$.
15.1.1 An averaging-based approach
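Let us first analyze the coupled oscillator model (15.1) with initial conditions restricted to an open semi-circle, $\theta(0) \in \Gamma_{\mathrm{arc}}(\gamma)$ for some $\gamma \in [0, \pi[$. In this case, the oscillators remain in a semi-circle at least for small times $t > 0$. By rotational invariance, we may assume that the arc of length $\gamma$ containing all angles is centered at 0, so that each $\theta_i$ can be identified with a real number in $[-\gamma/2, \gamma/2] \subset \,]-\pi/2, \pi/2[$. Then the two coordinate transformations
$$x_i(t) = \tan(\theta_i(t)) \ (\text{with } x_i \in \mathbb{R}) \quad\text{and}\quad y_i(t) = \theta_i(t) \ (\text{with } y_i \in \mathbb{R})$$
are well-defined and bijective (at least for small times). In the $x_i$-coordinates, the coupled oscillator model reads as the time-varying continuous-time averaging system
$$\dot x_i(t) = -\sum_{j=1}^n b_{ij}(t)\,\big(x_i(t) - x_j(t)\big), \qquad (15.2)$$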
where $b_{ij}(t) = a_{ij}\sqrt{\big(1 + x_i(t)^2\big)/\big(1 + x_j(t)^2\big)}$ and $b_{ij}(t) \geq a_{ij}\cos(\gamma/2)$; see Exercise E15.3 for a derivation. Similarly, in the $y_i$-coordinates, the coupled oscillator model reads as
$$\dot y_i(t) = -\sum_{j=1}^n c_{ij}(t)\,\big(y_i(t) - y_j(t)\big), \qquad (15.3)$$
where $c_{ij}(t) = a_{ij}\,\mathrm{sinc}(y_i(t) - y_j(t))$ and $c_{ij}(t) \geq a_{ij}\,\mathrm{sinc}(\gamma)$. Notice that both averaging formulations (15.2) and (15.3) are well-defined as long as the oscillators remain in a semi-circle $\Gamma_{\mathrm{arc}}(\gamma)$ for some $\gamma \in [0, \pi[$.

Theorem 15.2 (Phase cohesiveness and synchronization in open semicircle). Consider the coupled oscillator model (15.1) with a connected, undirected, and weighted graph $G = (\{1, \dots, n\}, E, A)$. The following statements hold:
(i) phase cohesiveness: for each $\gamma \in [0, \pi[$, each solution originating in $\Gamma_{\mathrm{arc}}(\gamma)$ remains in $\Gamma_{\mathrm{arc}}(\gamma)$ for all times;
(ii) asymptotic phase synchronization: each trajectory originating in $\Gamma_{\mathrm{arc}}(\gamma)$ for $\gamma \in [0, \pi[$ achieves exponential phase synchronization, that is,
$$\|\theta(t) - \mathrm{average}(\theta(0))\mathbb{1}_n\|_2 \leq \|\theta(0) - \mathrm{average}(\theta(0))\mathbb{1}_n\|_2\, e^{\lambda_{\mathrm{ps}} t}, \qquad (15.4)$$
where $\lambda_{\mathrm{ps}} = -\lambda_2(L)\,\mathrm{sinc}(\gamma)$.

Proof. Consider the averaging formulations (15.2) and (15.3) with initial conditions $\theta(0) \in \Gamma_{\mathrm{arc}}(\gamma)$ for some $\gamma \in [0, \pi[$. By continuity, for small positive times $t > 0$, the oscillators remain in a semi-circle, the time-varying weights $b_{ij}(t) \geq a_{ij}\cos(\gamma/2)$ and $c_{ij}(t) \geq a_{ij}\,\mathrm{sinc}(\gamma)$ are strictly positive for each $\{i, j\} \in E$, and the associated time-dependent graph is connected. As one establishes in the proof of Theorem 11.9, the max-min functions
$$V_{\text{max-min}}(x) = \max_{i\in\{1,\dots,n\}} x_i - \min_{i\in\{1,\dots,n\}} x_i, \qquad V_{\text{max-min}}(y) = \max_{i\in\{1,\dots,n\}} y_i - \min_{i\in\{1,\dots,n\}} y_i$$
are strictly decreasing for the time-varying consensus systems (15.2) and (15.3) until consensus is reached. Thus, the oscillators remain in $\Gamma_{\mathrm{arc}}(\gamma)$ and achieve phase synchronization exponentially fast. Since the graph is undirected, we can also conclude convergence to the average phase. Finally, the explicit convergence estimate (15.4) follows by analyzing (15.3), whose weights $c_{ij}(t) = c_{ji}(t)$ are symmetric, with the disagreement Lyapunov function $\|y - \mathrm{average}(y(0))\mathbb{1}_n\|_2^2$ and using $c_{ij}(t) \geq a_{ij}\,\mathrm{sinc}(\gamma)$.
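A numerical sanity check of Theorem 15.2 on one small example. Assumptions made only for this sketch: a unit-weight ring of six oscillators, whose algebraic connectivity is $\lambda_2(L) = 2(1 - \cos(2\pi/n))$, hand-picked initial phases in an arc of length $\gamma = 2.5 < \pi$, and an RK4 integrator. Phases are stored as real numbers (the $y$-coordinates of (15.3)), which is legitimate because they stay in a semicircle. The sketch checks that the arc length never grows and that the disagreement decays at least as fast as $e^{-\lambda_2(L)\,\mathrm{sinc}(\gamma)\,t}$, the rate obtained from the disagreement function of the symmetric system (15.3).

```python
import math


def ring_adjacency(n):
    """Unit-weight ring (cycle) graph on n nodes (illustrative choice)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][(i + 1) % n] = A[(i + 1) % n][i] = 1.0
    return A


def identical_rhs(theta, A):
    """Model (15.1): dtheta_i/dt = -sum_j a_ij * sin(theta_i - theta_j)."""
    n = len(theta)
    return [-sum(A[i][j] * math.sin(theta[i] - theta[j]) for j in range(n))
            for i in range(n)]


def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


n = 6
A = ring_adjacency(n)
lambda2 = 2.0 * (1.0 - math.cos(2.0 * math.pi / n))   # equals 1 for the 6-ring

theta = [0.0, 2.5, 0.5, 2.0, 1.0, 1.5]    # hypothetical phases in an arc of length 2.5
gamma = max(theta) - min(theta)
avg = sum(theta) / n                      # conserved because the graph is undirected
rate = lambda2 * math.sin(gamma) / gamma  # lambda_2(L) * sinc(gamma)


def disagreement(th):
    """Euclidean distance from the average-phase configuration."""
    return math.sqrt(sum((t - avg) ** 2 for t in th))


d0 = disagreement(theta)
dt, steps = 0.005, 2000                   # horizon T = 10
f = lambda th: identical_rhs(th, A)
bound_holds, arc_invariant = True, True
for k in range(1, steps + 1):
    theta = rk4_step(f, theta, dt)
    bound_holds = bound_holds and disagreement(theta) <= d0 * math.exp(-rate * k * dt) * (1 + 1e-9)
    arc_invariant = arc_invariant and max(theta) - min(theta) <= gamma + 1e-9
print(bound_holds, arc_invariant, disagreement(theta))
```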
15.1.2 The potential landscape, convergence and phase synchronization
The consensus analysis in Theorem 15.2 leads to a powerful result but is inherently restricted to a semi-circle. To overcome this limitation, we use potential functions as an analysis tool. Inspired by Examples #1 and #3, define the potential function $U : \mathbb{T}^n \to \mathbb{R}$ by
$$U(\theta) = \sum_{\{i,j\}\in E} a_{ij}\big(1 - \cos(\theta_i - \theta_j)\big). \qquad (15.5)$$
Then the coupled oscillator model (14.6) (with all $\omega_i = 0$) can be formulated as the gradient flow
$$\dot\theta = -\Big(\frac{\partial U(\theta)}{\partial\theta}\Big)^\top. \qquad (15.6)$$
Among the many critical points of the potential function (15.5), the set of phase-synchronized angles is the global minimum. This can be easily seen since each summand in (15.5) takes values in $[0, 2a_{ij}]$ and the lower bound is reached only if neighboring oscillators are phase-synchronized; since $G$ is connected, $U(\theta) = 0$ if and only if all oscillators are phase-synchronized. This global minimum is locally exponentially stable.

Theorem 15.3 (Phase synchronization). Consider the coupled oscillator model (15.1) with a connected, undirected, and weighted graph $G = (\{1, \dots, n\}, E, A)$. Then
(i) Global convergence: for all initial conditions $\theta(0) \in \mathbb{T}^n$, the phases $\theta_i(t)$ converge to the set of critical points $\{\theta \in \mathbb{T}^n \mid \partial U(\theta)/\partial\theta = \mathbb{0}_n^\top\}$; and
(ii) Local stability: phase synchronization is a locally exponentially stable equilibrium set.

Proof. The derivative of the potential function $U(\theta)$ along trajectories of (15.6) is
$$\dot U(\theta) = -\Big\|\Big(\frac{\partial U(\theta)}{\partial\theta}\Big)^\top\Big\|_2^2.$$
Since the potential function and its derivative are smooth and the dynamics evolve in the compact forward invariant set $\mathbb{T}^n$, we can apply the Invariance Principle in Theorem 13.3 to arrive at statement (i). Statement (ii) follows from the Jacobian result in Lemma 14.2 and Theorem 13.5.

Theorem 15.3 together with Theorem 15.2 gives a fairly complete picture of the convergence and phase synchronization properties of the coupled oscillator model (15.1).
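The two ingredients of this argument can be checked on a small example: the gradient of (15.5) has components $\sum_j a_{ij}\sin(\theta_i - \theta_j)$, so the gradient flow (15.6) coincides with (15.1), and $U$ is non-increasing along it. The weighted path graph, the initial phases, the RK4 step and the horizon are hypothetical choices; because the initial phases lie in an arc of length $2.5 < \pi$, Theorem 15.2 additionally predicts phase synchronization for this run.

```python
import math


def potential(theta, A):
    """U(theta) = sum over edges {i, j} of a_ij * (1 - cos(theta_i - theta_j)), eq. (15.5)."""
    n = len(theta)
    return sum(A[i][j] * (1.0 - math.cos(theta[i] - theta[j]))
               for i in range(n) for j in range(i + 1, n))


def potential_gradient(theta, A):
    """Components dU/dtheta_i = sum_j a_ij * sin(theta_i - theta_j)."""
    n = len(theta)
    return [sum(A[i][j] * math.sin(theta[i] - theta[j]) for j in range(n))
            for i in range(n)]


def shifted(theta, i, delta):
    """Copy of theta with entry i shifted by delta."""
    out = list(theta)
    out[i] += delta
    return out


def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


# Hypothetical weighted path graph 1-2-3-4 (connected and undirected).
A = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 2.0, 0.0],
     [0.0, 2.0, 0.0, 0.5],
     [0.0, 0.0, 0.5, 0.0]]
theta = [0.0, 1.0, -0.5, 2.0]   # hypothetical initial phases

# Central finite differences confirm the gradient formula at the initial condition.
h = 1e-6
grad_error = max(abs(g - (potential(shifted(theta, i, h), A)
                          - potential(shifted(theta, i, -h), A)) / (2.0 * h))
                 for i, g in enumerate(potential_gradient(theta, A)))

# Gradient flow (15.6): dtheta/dt = -(dU/dtheta)^T, which is exactly model (15.1).
flow = lambda th: [-g for g in potential_gradient(th, A)]
U_values = [potential(theta, A)]
for _ in range(4000):           # horizon T = 40 with step 0.01
    theta = rk4_step(flow, theta, 0.01)
    U_values.append(potential(theta, A))

U_nonincreasing = all(b <= a + 1e-12 for a, b in zip(U_values, U_values[1:]))
print(grad_error, U_nonincreasing, max(theta) - min(theta))
```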
15.1.3 Phase balancing
Applications in neuroscience, vehicle coordination, and central pattern generators for robotic locomotion motivate the study of coherent behaviors with synchronized frequencies where the phases are not synchronized, but rather dispersed in appropriate patterns. While the phase-synchronized state can be characterized by the order parameter $r$ achieving its maximal (unit) magnitude, we say that a solution $\theta : \mathbb{R}_{\geq 0} \to \mathbb{T}^n$ to the coupled oscillator model (14.6) achieves phase balancing if all phases $\theta_i$ asymptotically converge to the set
$$\Big\{\theta \in \mathbb{T}^n \;\Big|\; r(\theta) = \Big|\frac{1}{n}\sum_{j=1}^n e^{\mathrm{i}\theta_j}\Big| = 0\Big\},$$
that is, asymptotically the oscillators are distributed over the unit circle $\mathbb{S}^1$ so that their centroid converges to the origin. For a complete homogeneous graph with coupling strength $a_{ij} = K/n$, i.e., for the Kuramoto model (14.7), we have a remarkable identity between the magnitude of the order parameter $r$ and the potential function $U(\theta)$:
$$U(\theta) = \frac{Kn}{2}\big(1 - r^2\big). \qquad (15.7)$$
(We ask the reader to establish this identity in Exercise E15.1.) For the complete graph, the correspondence (15.7) shows that the global minimum of the potential function $U(\theta) = 0$ (for $r = 1$) corresponds to phase synchronization and the global maximum $U(\theta) = Kn/2$ (for $r = 0$) corresponds to phase balancing. This motivates the following gradient ascent dynamics to reach phase balancing:
$$\dot\theta = +\Big(\frac{\partial U(\theta)}{\partial\theta}\Big)^\top, \quad\text{that is,}\quad \dot\theta_i = \sum_{j=1}^n a_{ij}\sin(\theta_i - \theta_j), \quad i \in \{1, \dots, n\}. \qquad (15.8)$$
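The identity (15.7) is easy to test numerically before proving it in Exercise E15.1. The sketch evaluates both sides at random configurations; the values $K = 2$, $n = 7$, the random seed and the helper names are arbitrary illustrative choices. The equally spaced (splay) configuration attains the maximum $Kn/2$.

```python
import cmath
import math
import random


def potential_complete(theta, K):
    """U(theta) from (15.5) on the complete graph with uniform weights a_ij = K/n."""
    n = len(theta)
    return sum((K / n) * (1.0 - math.cos(theta[i] - theta[j]))
               for i in range(n) for j in range(i + 1, n))


def order_parameter_magnitude(theta):
    """r = |(1/n) * sum_j exp(i theta_j)|."""
    return abs(sum(cmath.exp(1j * t) for t in theta)) / len(theta)


K, n = 2.0, 7
random.seed(3)
worst_gap = 0.0
for _ in range(100):
    theta = [random.uniform(-math.pi, math.pi) for _ in range(n)]
    r = order_parameter_magnitude(theta)
    worst_gap = max(worst_gap, abs(potential_complete(theta, K) - K * n / 2 * (1 - r ** 2)))

splay = [2 * math.pi * k / n for k in range(n)]   # equally spaced phases: r = 0
print(worst_gap, potential_complete(splay, K), K * n / 2)
```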
Theorem 15.4 (Phase balancing). Consider the coupled oscillator model (15.8) with a connected, undirected, and weighted graph $G = (\{1, \dots, n\}, E, A)$. Then
(i) Global convergence: for all initial conditions $\theta(0) \in \mathbb{T}^n$, the phases $\theta_i(t)$ converge to the set of critical points $\{\theta \in \mathbb{T}^n \mid \partial U(\theta)/\partial\theta = \mathbb{0}_n^\top\}$; and
(ii) Local stability: for a complete graph with uniform weights $a_{ij} = K/n$, phase balancing is the global maximizer of the potential function (15.7) and is a locally asymptotically stable equilibrium set.

Proof. The proof of statement (i) is analogous to the proof of statement (i) in Theorem 15.3. To prove statement (ii), notice that, for a complete graph, the phase-balanced set characterized by $r = 0$ achieves the global maximum of the potential $U(\theta) = \frac{Kn}{2}(1 - r^2)$. By Theorem 13.6, local maxima of the potential are locally asymptotically stable for the gradient ascent dynamics (15.8).
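A short simulation of the gradient ascent (15.8) on a complete graph illustrates Theorem 15.4. The coupling $K = 5$, the five initial phases (chosen close to synchrony but otherwise generic), the RK4 step and the horizon are hypothetical choices. By (15.7), $U$ increases along (15.8), so $r$ should not increase; a small tolerance absorbs integration error.

```python
import cmath
import math


def balancing_rhs(theta, K):
    """Gradient ascent (15.8) on the complete graph with a_ij = K/n."""
    n = len(theta)
    return [(K / n) * sum(math.sin(theta[i] - tj) for tj in theta) for i in range(n)]


def order_parameter_magnitude(theta):
    """r = |(1/n) * sum_j exp(i theta_j)|."""
    return abs(sum(cmath.exp(1j * t) for t in theta)) / len(theta)


def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


K = 5.0                                 # hypothetical coupling strength
theta = [0.0, 0.3, 0.4, 1.0, 1.1]       # hypothetical initial phases, close to synchrony
f = lambda th: balancing_rhs(th, K)
r_history = [order_parameter_magnitude(theta)]
for _ in range(4000):                   # horizon T = 40 with step 0.01
    theta = rk4_step(f, theta, 0.01)
    r_history.append(order_parameter_magnitude(theta))

# U = (Kn/2)(1 - r^2) increases along (15.8), so r should not increase.
r_nonincreasing = all(b <= a + 1e-6 for a, b in zip(r_history, r_history[1:]))
print(r_history[0], r_history[-1], r_nonincreasing)
```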
15.2 Synchronization of heterogeneous oscillators
In this section we analyze non-identical oscillators with $\omega_i \neq \omega_j$. As shown in Lemma 15.1, these oscillator networks cannot achieve phase synchronization. On the other hand, frequency synchronization with a certain degree of phase cohesiveness can be achieved provided that the natural frequencies satisfy certain bounds relative to the network coupling. We start off with the following necessary conditions.

Lemma 15.5 (Necessary synchronization conditions). Consider the coupled oscillator model (14.6) with graph $G = (\{1, \dots, n\}, E, A)$, frequencies $\omega \in \mathbb{1}_n^\perp$, and nodal degree $\deg_i = \sum_{j=1}^n a_{ij}$ for each node $i \in \{1, \dots, n\}$. If there exists a frequency-synchronized solution satisfying the phase cohesiveness $|\theta_i - \theta_j| \leq \gamma$ for all $\{i, j\} \in E$ and for some $\gamma \in [0, \pi/2]$, then the following conditions hold:
(i) Absolute bound: for each node $i \in \{1, \dots, n\}$,
$$\deg_i \sin(\gamma) \geq |\omega_i|. \qquad (15.9)$$
(ii) Incremental bound: for distinct $i, j \in \{1, \dots, n\}$,
$$(\deg_i + \deg_j)\sin(\gamma) \geq |\omega_i - \omega_j|. \qquad (15.10)$$

Proof. Statement (i) follows directly from the fact that, since $\omega \in \mathbb{1}_n^\perp$, synchronized solutions must satisfy the equilibrium equation $\dot\theta_i = 0$, that is, $\omega_i = \sum_{j=1}^n a_{ij}\sin(\theta_i - \theta_j)$. Since $|\theta_i - \theta_j| \leq \gamma \leq \pi/2$ on every edge, each sinusoidal interaction term satisfies $|\sin(\theta_i - \theta_j)| \leq \sin(\gamma)$, so the coupling in equation (14.6) is upper bounded in magnitude by $\deg_i \sin(\gamma)$, and condition (15.9) is necessary for the existence of an equilibrium. Statement (ii) follows from the fact that frequency-synchronized solutions must satisfy $\dot\theta_i - \dot\theta_j = 0$. By analogous arguments, we arrive at the necessary condition (15.10).

15.2.1 Synchronization of heterogeneous oscillators over complete homogeneous graphs
Consider the Kuramoto model over a complete homogeneous graph:
$$\dot\theta_i = \omega_i - \frac{K}{n}\sum_{j=1}^n \sin(\theta_i - \theta_j), \quad i \in \{1, \dots, n\}. \qquad (15.11)$$
As discussed in Subsection 14.3.4, the Kuramoto model synchronizes provided that the coupling gain $K$ is larger than some critical value $K_{\mathrm{critical}}$. The necessary condition (15.10) delivers a lower bound for $K_{\mathrm{critical}}$ given by
$$K \geq \frac{n}{2(n-1)}\Big(\max_i \omega_i - \min_i \omega_i\Big).$$
Here we evaluated the left-hand side of (15.10) for $a_{ij} = K/n$ (so that $\deg_i = K(n-1)/n$), for the maximum $\gamma = \pi/2$, and for all distinct $i, j \in \{1, \dots, n\}$. Perhaps surprisingly, for large $n$ this necessary lower bound is only a factor $1/2$ away from the sufficient upper bound (15.12) in Theorem 15.6 below.
Theorem 15.6 (Synchronization test for all-to-all Kuramoto model). Consider the Kuramoto model (15.11) with natural frequencies $\omega \in \mathbb{1}_n^\perp$ and coupling strength $K$. Assume
$$K > K_{\mathrm{critical}} := \max_i \omega_i - \min_i \omega_i, \qquad (15.12)$$
and define the arc lengths $\gamma_{\min} \in [0, \pi/2[$ and $\gamma_{\max} \in \,]\pi/2, \pi]$ as the unique solutions to $\sin(\gamma_{\min}) = \sin(\gamma_{\max}) = K_{\mathrm{critical}}/K$.

The following statements hold:
(i) phase cohesiveness: each solution starting in $\Gamma_{\mathrm{arc}}(\gamma)$, for $\gamma \in [\gamma_{\min}, \gamma_{\max}]$, remains in $\Gamma_{\mathrm{arc}}(\gamma)$ for all times;
(ii) asymptotic phase cohesiveness: each solution starting in $\Gamma_{\mathrm{arc}}(\gamma_{\max})$ asymptotically reaches the set $\Gamma_{\mathrm{arc}}(\gamma_{\min})$; and
(iii) asymptotic frequency synchronization: each solution starting in $\Gamma_{\mathrm{arc}}(\gamma_{\max})$ achieves frequency synchronization.

Moreover, the following converse statement is true: given an interval $[\omega_{\min}, \omega_{\max}]$, the coupling strength $K$ satisfies $K > \omega_{\max} - \omega_{\min}$ if, for all frequencies $\omega$ supported on $[\omega_{\min}, \omega_{\max}]$ and for the arc length $\gamma_{\max}$ computed as above, the set $\Gamma_{\mathrm{arc}}(\gamma_{\max})$ is positively invariant.
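The quantities in Theorem 15.6 and the necessary bound derived from (15.10) are simple to evaluate. In this sketch the frequency vector and the values of $K$ are hypothetical, and the function names are ours. Since $\sin$ is symmetric about $\pi/2$, the two arc lengths satisfy $\gamma_{\max} = \pi - \gamma_{\min}$.

```python
import math


def all_to_all_test(omega, K):
    """Return (K_critical, gamma_min, gamma_max) from Theorem 15.6,
    or None if the sufficient condition K > K_critical fails."""
    K_critical = max(omega) - min(omega)
    if K <= K_critical:
        return None
    gamma_min = math.asin(K_critical / K)   # solution in [0, pi/2[
    gamma_max = math.pi - gamma_min          # solution in ]pi/2, pi]
    return K_critical, gamma_min, gamma_max


def necessary_lower_bound(omega):
    """Lower bound on K_critical from the incremental condition (15.10)."""
    n = len(omega)
    return n / (2.0 * (n - 1)) * (max(omega) - min(omega))


omega = [-1.0, -0.3, 0.3, 1.0]   # hypothetical frequencies, summing to zero
print(necessary_lower_bound(omega), all_to_all_test(omega, K=4.0))
```

Here the necessary bound is $4/3$ while the sufficient threshold is $K_{\mathrm{critical}} = 2$; couplings in between are not decided by either result.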
Proof. We start with statement (i). Define the function $W : \Gamma_{\mathrm{arc}}(\pi) \to [0, \pi[$ by $W(\psi) = \max\{|\psi_i - \psi_j| \mid i, j \in \{1, \dots, n\}\}$. The arc containing all angles $\psi$ has two boundary points: a counterclockwise maximum and a counterclockwise minimum. If $U_{\max}(\psi)$ (resp. $U_{\min}(\psi)$) denotes the set of indices of the angles $\psi_1, \dots, \psi_n$ that are equal to the counterclockwise maximum (resp. the counterclockwise minimum), then $W(\psi) = |\psi_{m'} - \psi_{k'}|$ for all $m' \in U_{\max}(\psi)$ and $k' \in U_{\min}(\psi)$. We now assume $\theta(0) \in \Gamma_{\mathrm{arc}}(\gamma)$, for $\gamma \in [\gamma_{\min}, \gamma_{\max}]$, and aim to show that $\theta(t) \in \Gamma_{\mathrm{arc}}(\gamma)$ for all times $t > 0$. By continuity, $\Gamma_{\mathrm{arc}}(\gamma)$ is positively invariant if and only if $W(\theta(t))$ does not increase at any time $t$ such that $W(\theta(t)) = \gamma$. In the next equation we compute the maximum possible amount of infinitesimal increase of $W(\theta(t))$ along system (15.11). We do this in a loose way here and refer to (Lin et al. 2007, Lemma 2.2) for a rigorous treatment. The statement is:
$$D^+W(\theta(t)) := \limsup_{\Delta t \to 0^+} \frac{W(\theta(t + \Delta t)) - W(\theta(t))}{\Delta t} = \dot\theta_m(t) - \dot\theta_k(t),$$
where $m \in U_{\max}(\theta(t))$ and $k \in U_{\min}(\theta(t))$ have the property that $\dot\theta_m(t) = \max\{\dot\theta_{m'}(t) \mid m' \in U_{\max}(\theta(t))\}$ and $\dot\theta_k(t) = \min\{\dot\theta_{k'}(t) \mid k' \in U_{\min}(\theta(t))\}$. In components,
$$D^+W(\theta(t)) = \omega_m - \omega_k - \frac{K}{n}\sum_{j=1}^n\Big(\sin\big(\theta_m(t) - \theta_j(t)\big) + \sin\big(\theta_j(t) - \theta_k(t)\big)\Big).$$
The trigonometric identity $\sin(x) + \sin(y) = 2\sin\big(\frac{x+y}{2}\big)\cos\big(\frac{x-y}{2}\big)$ leads to
$$D^+W(\theta(t)) = \omega_m - \omega_k - \frac{K}{n}\sum_{i=1}^n 2\sin\Big(\frac{\theta_m(t) - \theta_k(t)}{2}\Big)\cos\Big(\frac{\theta_m(t) - \theta_i(t)}{2} - \frac{\theta_i(t) - \theta_k(t)}{2}\Big).$$
Measuring angles counterclockwise and modulo $2\pi$, the equality $W(\theta(t)) = \gamma$ implies $\theta_m(t) - \theta_k(t) = \gamma$, $\theta_m(t) - \theta_i(t) \in [0, \gamma]$, and $\theta_i(t) - \theta_k(t) \in [0, \gamma]$. Moreover,
$$\min_\theta \cos\Big(\frac{\theta_m - \theta_i}{2} - \frac{\theta_i - \theta_k}{2}\Big) = \cos\Big(\max_\theta \Big|\frac{\theta_m - \theta_i}{2} - \frac{\theta_i - \theta_k}{2}\Big|\Big) = \cos(\gamma/2),$$
so that
$$D^+W(\theta(t)) \leq \omega_m - \omega_k - \frac{K}{n}\sum_{i=1}^n 2\sin\Big(\frac{\gamma}{2}\Big)\cos\Big(\frac{\gamma}{2}\Big).$$
Applying the reverse identity $2\sin(x)\cos(y) = \sin(x - y) + \sin(x + y)$, we obtain
$$D^+W(\theta(t)) \leq \omega_m - \omega_k - \frac{K}{n}\sum_{i=1}^n \sin(\gamma) \leq \Big(\max_i \omega_i - \min_i \omega_i\Big) - K\sin(\gamma).$$
Hence, $W(\theta(t))$ does not increase at any $t$ such that $W(\theta(t)) = \gamma$ if $K\sin(\gamma) \geq K_{\mathrm{critical}} = \max_i \omega_i - \min_i \omega_i$. Given the structure of the level sets of $\gamma \mapsto K\sin(\gamma)$, there exists an open interval of arc lengths $\gamma \in [0, \pi]$ satisfying $K\sin(\gamma) \geq \max_i \omega_i - \min_i \omega_i$ if and only if this inequality holds strictly at $\gamma^* = \pi/2$, that is, if $K > K_{\mathrm{critical}}$, which is condition (15.12). Additionally, if $K > K_{\mathrm{critical}}$, there exists a unique $\gamma_{\min} \in [0, \pi/2[$ and a unique $\gamma_{\max} \in \,]\pi/2, \pi]$ that satisfy $K\sin(\gamma) = K_{\mathrm{critical}}$. In summary, for every $\gamma \in [\gamma_{\min}, \gamma_{\max}]$, if $W(\theta(t)) = \gamma$, then the arc length $W(\theta(t))$ is non-increasing. This concludes the proof of statement (i).

Moreover, pick $0 < \varepsilon \ll \gamma_{\max} - \gamma_{\min}$. For all $\gamma \in [\gamma_{\min} + \varepsilon, \gamma_{\max} - \varepsilon]$, there exists a positive $\delta(\varepsilon)$ with the property that, if $W(\theta(t)) = \gamma$, then $D^+W(\theta(t)) \leq -\delta(\varepsilon)$. Hence, each solution $\theta : \mathbb{R}_{\geq 0} \to \mathbb{T}^n$ starting in $\Gamma_{\mathrm{arc}}(\gamma_{\max} - \varepsilon)$ must satisfy $W(\theta(t)) \leq \gamma_{\min} + \varepsilon$ after time at most $(\gamma_{\max} - \gamma_{\min})/\delta(\varepsilon)$. Since $\varepsilon$ is arbitrary, this proves statement (ii).

Regarding statement (iii), we just proved that for every $\theta(0) \in \Gamma_{\mathrm{arc}}(\gamma_{\max})$ and for every $\gamma \in \,]\gamma_{\min}, \pi/2[$ there exists a finite time $T \geq 0$ such that $\theta(t) \in \Gamma_{\mathrm{arc}}(\gamma)$ for all $t \geq T$. It follows that $|\theta_i(t) - \theta_j(t)| \leq \gamma < \pi/2$ for all $\{i, j\} \in E$ and for all $t \geq T$. We now invoke Corollary 14.3 to conclude the proof of statement (iii).

The converse statement can be established by noticing that all of the above inequalities and estimates are exact for a bipolar distribution of natural frequencies $\omega_i \in \{\omega_{\min}, \omega_{\max}\}$ for all $i \in \{1, \dots, n\}$. The full proof is in (Dörfler and Bullo 2011).
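The arc-length argument can be observed in simulation. Assumptions for this sketch: four oscillators with hypothetical frequencies $\omega = (-1, -0.3, 0.3, 1)$, so $K_{\mathrm{critical}} = 2$; coupling $K = 4$, giving $\gamma_{\min} = \pi/6$ and $\gamma_{\max} = 5\pi/6$; initial phases spanning an arc of length $2.4 < \gamma_{\max}$; and an RK4 integrator. Phases are kept as real numbers, so $\max - \min$ equals the arc length $W$ as long as they stay in an arc shorter than $\pi$. The run is expected to show $W$ never increasing, $W$ ending below $\gamma_{\min}$, and the frequencies locking.

```python
import math


def kuramoto_rhs(theta, omega, K):
    """Model (15.11): omega_i - (K/n) * sum_j sin(theta_i - theta_j)."""
    n = len(theta)
    return [omega[i] - (K / n) * sum(math.sin(theta[i] - tj) for tj in theta)
            for i in range(n)]


def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(x)."""
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


omega = [-1.0, -0.3, 0.3, 1.0]                 # hypothetical frequencies, sum zero
K = 4.0                                        # hypothetical coupling, above K_critical = 2
K_critical = max(omega) - min(omega)
gamma_min = math.asin(K_critical / K)          # pi/6
gamma_max = math.pi - gamma_min                # 5*pi/6

theta = [0.0, 0.8, 1.6, 2.4]                   # hypothetical initial phases, W = 2.4
W0 = max(theta) - min(theta)
f = lambda th: kuramoto_rhs(th, omega, K)
W_never_grows = True
for _ in range(3000):                          # horizon T = 30 with step 0.01
    theta = rk4_step(f, theta, 0.01)
    W_never_grows = W_never_grows and (max(theta) - min(theta) <= W0 + 1e-9)

W_final = max(theta) - min(theta)
velocities = f(theta)
print(W_never_grows, W_final, gamma_min, max(velocities) - min(velocities))
```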
15.2.2 Synchronization of heterogeneous oscillators over weighted undirected graphs
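Consider the coupled oscillator model over a weighted undirected graph:
$$\dot\theta_i = \omega_i - \sum_{j=1}^n a_{ij}\sin(\theta_i - \theta_j), \quad i \in \{1, \dots, n\}. \qquad (15.13)$$
Adopt the following shorthands:
$$\|\omega\|_{2,\mathrm{pairs}} = \sqrt{\frac{1}{2}\sum_{i,j=1}^n (\omega_i - \omega_j)^2} \quad\text{and}\quad \|\theta\|_{2,\mathrm{pairs}} = \sqrt{\frac{1}{2}\sum_{i,j=1}^n |\theta_i - \theta_j|^2}.$$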
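Theorem 15.7 (Synchronization test I). Consider the coupled oscillator model (15.13) with frequencies $\omega \in \mathbb{1}_n^\perp$ defined over a weighted undirected graph with Laplacian matrix $L$. Assume
$$\lambda_2(L) > \lambda_{\mathrm{critical}} := \|\omega\|_{2,\mathrm{pairs}}, \qquad (15.14)$$
and define $\gamma_{\max} \in \,]\pi/2, \pi]$ and $\gamma_{\min} \in [0, \pi/2[$ as the solutions to $(\pi/2)\cdot\mathrm{sinc}(\gamma_{\max}) = \sin(\gamma_{\min}) = \lambda_{\mathrm{critical}}/\lambda_2(L)$. The following statements hold:
(i) phase cohesiveness: each solution starting in $\{\theta \in \Gamma_{\mathrm{arc}}(\pi) \mid \|\theta\|_{2,\mathrm{pairs}} \leq \gamma\}$, for $\gamma \in [\gamma_{\min}, \gamma_{\max}]$, remains in $\{\theta \in \Gamma_{\mathrm{arc}}(\pi) \mid \|\theta\|_{2,\mathrm{pairs}} \leq \gamma\}$ for all times;
(ii) asymptotic phase cohesiveness: each solution starting in $\{\theta \in \Gamma_{\mathrm{arc}}(\pi) \mid \|\theta\|_{2,\mathrm{pairs}} < \gamma_{\max}\}$ asymptotically reaches the set $\{\theta \in \Gamma_{\mathrm{arc}}(\pi) \mid \|\theta\|_{2,\mathrm{pairs}} \leq \gamma_{\min}\}$; and
(iii) asymptotic frequency synchronization: each solution starting in $\{\theta \in \Gamma_{\mathrm{arc}}(\pi) \mid \|\theta\|_{2,\mathrm{pairs}} < \gamma_{\max}\}$ achieves frequency synchronization.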
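The thresholds of Theorem 15.7 can be computed with elementary tools: $\gamma_{\min}$ in closed form and $\gamma_{\max}$ by bisection, since $(\pi/2)\,\mathrm{sinc}(\gamma)$ decreases from 1 to 0 on $[\pi/2, \pi]$. Assumptions for this sketch: a unit-weight ring of six nodes, for which $\lambda_2(L) = 2(1 - \cos(2\pi/n)) = 1$, hypothetical frequencies summing to zero, and helper names of our own.

```python
import math


def pairs_norm(v):
    """||v||_{2,pairs} = sqrt((1/2) * sum_{i,j} (v_i - v_j)^2)."""
    return math.sqrt(0.5 * sum((a - b) ** 2 for a in v for b in v))


def bisect(g, lo, hi, tol=1e-12):
    """Root of a continuous g on [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def synchronization_test_I(omega, lambda2):
    """(gamma_min, gamma_max) of Theorem 15.7, or None if condition (15.14) fails."""
    ratio = pairs_norm(omega) / lambda2
    if ratio >= 1.0:
        return None
    gamma_min = math.asin(ratio)
    gamma_max = bisect(lambda g: (math.pi / 2) * math.sin(g) / g - ratio,
                       math.pi / 2, math.pi)
    return gamma_min, gamma_max


n = 6
lambda2 = 2.0 * (1.0 - math.cos(2.0 * math.pi / n))   # unit-weight ring (assumption)
omega = [0.1, -0.1, 0.05, -0.05, 0.0, 0.0]            # hypothetical, sums to zero
print(pairs_norm(omega), synchronization_test_I(omega, lambda2))
```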
The proof of Theorem 15.7 follows the reasoning of the proof of Theorem 15.6 using the quadratic Lyapunov function $\|\theta\|_{2,\mathrm{pairs}}^2$. The full proof is in (Dörfler and Bullo 2012, Appendix B).
15.2.3 Appendix: alternative theorem
Notice that the parametric condition (15.14) of the above theorem is very conservative since the left-hand side is at most $n$ (for a complete graph with unit weights), whereas the right-hand side involves a sum of $n^2$ terms. In the following we partially improve upon this conservativeness. Adopt the following shorthands:
$$\|\omega\|_{2,\mathrm{edges}} = \sqrt{\sum_{\{i,j\}\in E} (\omega_i - \omega_j)^2} \quad\text{and}\quad \|\theta\|_{2,\mathrm{edges}} = \sqrt{\sum_{\{i,j\}\in E} |\theta_i - \theta_j|^2}.$$
Theorem 15.8 (Synchronization test II). Consider the coupled oscillator model (15.13) with frequencies $\omega \in \mathbb{1}_n^\perp$ defined over a weighted undirected graph with Laplacian matrix $L$. Assume
$$\lambda_2(L) > \lambda_{\mathrm{critical}} := \|\omega\|_{2,\mathrm{edges}}, \qquad (15.15)$$
and define $\gamma_{\min} \in [0, \pi/2[$ as the solution to $\sin(\gamma_{\min}) = \lambda_{\mathrm{critical}}/\lambda_2(L)$. Then there exists a locally exponentially stable equilibrium set $[\theta^*]$ satisfying $|\theta_i^* - \theta_j^*| \leq \gamma_{\min}$ for all $\{i, j\} \in E$.

Proof. Lemma 14.2 guarantees local exponential stability of an equilibrium set $[\theta^*]$ satisfying $|\theta_i^* - \theta_j^*| \leq \gamma$ for all $\{i, j\} \in E$ and for some $\gamma \in [0, \pi/2[$. In the following we establish conditions for the existence of equilibria in this particular set
$$\Delta(\gamma) = \{\theta \in \mathbb{T}^n \mid |\theta_i - \theta_j| \leq \gamma \text{ for all } \{i, j\} \in E\}.$$
The equilibrium equations can be written as
$$\omega = L(B^\top\theta)\,\theta, \qquad (15.16)$$
where $L(B^\top\theta) = B\,\mathrm{diag}\big(\{a_{ij}\,\mathrm{sinc}(\theta_i - \theta_j)\}_{\{i,j\}\in E}\big)B^\top$ is the Laplacian matrix associated with the graph $\tilde G = (\{1, \dots, n\}, E, \tilde A)$ with positive edge weights $\tilde a_{ij} = a_{ij}\,\mathrm{sinc}(\theta_i - \theta_j) \geq a_{ij}\,\mathrm{sinc}(\gamma) > 0$ for $\{i, j\} \in E$ and $\theta \in \Delta(\gamma)$. Since for any weighted Laplacian matrix $L$ of a connected graph we have $L L^\dagger = L^\dagger L = I_n - (1/n)\mathbb{1}_n\mathbb{1}_n^\top$, a multiplication of equation (15.16) from the left by $B^\top L(B^\top\theta)^\dagger$ yields
$$B^\top L(B^\top\theta)^\dagger\,\omega = B^\top\theta. \qquad (15.17)$$
Note that the left-hand side of equation (15.17) is a continuous¹ function for $\theta \in \Delta(\gamma)$. Consider the formal substitution $x = B^\top\theta$, the compact and convex set $S_\infty(\gamma) = \{x \in \mathrm{Img}(B^\top) \mid \|x\|_\infty \leq \gamma\}$ (corresponding to $\Delta(\gamma)$), and the continuous map $f : S_\infty(\gamma) \to \mathbb{R}^{|E|}$ given by $f(x) = B^\top L(x)^\dagger\omega$. Then equation (15.17) is equivalent to the fixed-point equation $f(x) = x$. We invoke Brouwer's Fixed Point Theorem, which states that every continuous map from a compact and convex set to itself has a fixed point; see for instance (Spanier 1994, Section 7, Corollary 8).

¹ The continuity can be established when re-writing equations (15.16) and (15.17) in the quotient space $\mathbb{1}_n^\perp$, where $L(B^\top\theta)$ is nonsingular, and using the fact that the inverse of a nonsingular matrix is a continuous function of its elements. See also (Rakočević 1997, Theorem 4.2) for necessary and sufficient conditions for continuity of the Moore–Penrose inverse, requiring that $L(B^\top\theta)$ has constant rank for $\theta \in \Delta(\gamma)$.
Since the analysis of the map $f$ in the $\infty$-norm is very hard in the general case, we resort to a 2-norm analysis and restrict ourselves to the set $S_2(\gamma) = \{x \in \mathrm{Img}(B^\top) \mid \|x\|_2 \leq \gamma\} \subseteq S_\infty(\gamma)$. The set $S_2(\gamma)$ corresponds to the set $\{\theta \in \mathbb{T}^n \mid \|\theta\|_{2,\mathrm{edges}} \leq \gamma\}$ in $\theta$-coordinates. The map $f$ sends $S_2(\gamma)$ into itself if and only if $\|f(x)\|_2 \leq \gamma$ for all $x \in S_2(\gamma)$, or equivalently if and only if
$$\max_{x\in S_2(\gamma)} \big\|B^\top L(x)^\dagger\omega\big\|_2 \leq \gamma; \qquad (15.18)$$
in this case, Brouwer's Fixed Point Theorem guarantees a solution $x \in S_2(\gamma)$ to the equation $x = f(x)$. After some bounding (see (Dörfler and Bullo 2012, Appendix C) for details), we arrive at
$$\max_{x\in S_2(\gamma)} \big\|B^\top L(x)^\dagger\omega\big\|_2 \leq \|\omega\|_{2,\mathrm{edges}} \big/ \big(\lambda_2(L)\cdot\mathrm{sinc}(\gamma)\big).$$
It therefore suffices that the right-hand side of the above inequality be less than or equal to $\gamma$. Since $\gamma\,\mathrm{sinc}(\gamma) = \sin(\gamma)$, we conclude that there is a locally exponentially stable synchronization set $[\theta^*] \subseteq \{\theta \in \mathbb{T}^n \mid \|\theta\|_{2,\mathrm{edges}} \leq \gamma\} \subseteq \Delta(\gamma)$ if
$$\lambda_2(L)\sin(\gamma) \geq \|\omega\|_{2,\mathrm{edges}}. \qquad (15.19)$$
Since the left-hand side of (15.19) is a concave and increasing function of $\gamma \in [0, \pi/2[$, there exists an open set of $\gamma \in [0, \pi/2[$ satisfying (15.19) if and only if (15.19) holds with strict inequality at $\gamma^* = \pi/2$, which corresponds to condition (15.15). Additionally, if these two equivalent statements are true, then there exists a unique $\gamma_{\min} \in [0, \pi/2[$ that satisfies (15.19) with the equality sign, namely $\sin(\gamma_{\min}) = \|\omega\|_{2,\mathrm{edges}}/\lambda_2(L)$. This concludes the proof.
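Because the edges form a subset of all unordered pairs, $\|\omega\|_{2,\mathrm{edges}} \leq \|\omega\|_{2,\mathrm{pairs}}$ always holds, which is why condition (15.15) is weaker than (15.14). The sketch compares the two critical values on a hypothetical example (a unit-weight six-node ring with $\lambda_2(L) = 1$ and made-up frequencies) and shows a scaled frequency vector for which test II applies while test I does not.

```python
import math


def pairs_norm(v):
    """||v||_{2,pairs} = sqrt((1/2) * sum_{i,j} (v_i - v_j)^2)."""
    return math.sqrt(0.5 * sum((a - b) ** 2 for a in v for b in v))


def edges_norm(v, edges):
    """||v||_{2,edges} = sqrt(sum over edges {i,j} of (v_i - v_j)^2)."""
    return math.sqrt(sum((v[i] - v[j]) ** 2 for i, j in edges))


n = 6
edges = [(i, (i + 1) % n) for i in range(n)]           # unit-weight ring (assumption)
lambda2 = 2.0 * (1.0 - math.cos(2.0 * math.pi / n))   # equals 1 for the 6-ring
omega = [0.1, -0.1, 0.05, -0.05, 0.0, 0.0]            # hypothetical, sums to zero

crit_I = pairs_norm(omega)            # lambda_critical in (15.14)
crit_II = edges_norm(omega, edges)    # lambda_critical in (15.15)
gamma_min_II = math.asin(crit_II / lambda2) if crit_II < lambda2 else None

omega_big = [3.0 * w for w in omega]  # scaled frequencies: test I fails, test II passes
print(crit_I, crit_II, gamma_min_II)
print(pairs_norm(omega_big) < lambda2, edges_norm(omega_big, edges) < lambda2)
```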
15.3 Exercises

E15.1 Potential and order parameter. Recall $U(\theta) = \sum_{\{i,j\}\in E} a_{ij}\big(1 - \cos(\theta_i - \theta_j)\big)$. Prove $U(\theta) = \frac{Kn}{2}(1 - r^2)$ for a complete homogeneous graph with coupling strength $a_{ij} = K/n$.

E15.2 Analysis of the two-node case. Present a complete analysis of a system of two coupled oscillators:
$$\dot\theta_1 = \omega_1 - a_{12}\sin(\theta_1 - \theta_2), \qquad \dot\theta_2 = \omega_2 - a_{21}\sin(\theta_2 - \theta_1),$$
where $a_{12} = a_{21}$ and $\omega_1 + \omega_2 = 0$. When do equilibria exist? What are their stability properties and their basins of attraction?

E15.3 Averaging analysis of coupled oscillators in a semi-circle. Consider the coupled oscillator model (15.1) with $\theta \in \Gamma_{\mathrm{arc}}(\gamma)$ for some $\gamma < \pi$. Show that the coordinate transformation $x_i = \tan(\theta_i)$ (with $x_i \in \mathbb{R}$) gives the averaging system (15.2) with $b_{ij} \geq a_{ij}\cos(\gamma/2)$.

E15.4 Phase synchronization in spring network. Consider the spring network from Example #1 with identical oscillators and a connected, undirected, and weighted graph:
$$M_i\ddot\theta_i + D_i\dot\theta_i + \sum_{j=1}^n a_{ij}\sin(\theta_i - \theta_j) = 0, \quad i \in \{1, \dots, n\}.$$
Prove the phase synchronization result (in Theorem 15.3) for this spring network.
E15.5 Synchronization on acyclic graphs. Consider the coupled oscillator model $\dot\theta_i = \omega_i - \sum_{j=1}^n a_{ij}\sin(\theta_i - \theta_j)$ with $\sum_{i=1}^n \omega_i = 0$ and defined over an acyclic interaction graph, that is, the adjacency matrix $A$ with elements $a_{ij} = a_{ji} \in \{0, 1\}$ induces an undirected, connected, and acyclic graph. Show that in this case the following exact synchronization condition holds: there exists a locally stable frequency-synchronized solution in the set $\{\theta \in \mathbb{T}^n \mid |\theta_i - \theta_j| < \pi/2 \text{ for all } \{i, j\} \in E\}$ if and only if $\|B^\top L^\dagger\omega\|_\infty < 1$, where $B$ and $L$ are the network incidence and Laplacian matrices. Hint: Follow the derivation in Example 8.12.
Chapter 16

Virus Propagation: Basic Models

In this chapter and the next we present simple models for the diffusion and propagation of infectious diseases. The proposed models may be relevant also in the context of propagation of information/signals in a communication network and diffusion of innovations in competitive economic networks. Other interesting propagation phenomena include failures in power networks and wildfires in forests. In this chapter and the next, we are interested in (1) models (lumped vs network, deterministic vs stochastic), (2) asymptotic behaviors (vanishing infection, steady-state epidemic, full contagion), and (3) the transient propagation of epidemics starting from small initial fractions of infected nodes (possible epidemic outbreak as opposed to monotonically vanishing infection). In the interest of clarity, we begin with “lumped” variables, i.e., variables which represent an entire “well-mixed” population of nodes. The next chapter will discuss “distributed” variable models, i.e., network models. We study three low-dimensional deterministic models in which nodes may be in one of two or three states; see Figure 16.1.
Infected
Susceptible
Susceptible
Infected
Infected
Recovered
Figure 16.1: The three basic models SI, SIS and SIR for the propagation of an infectious desease
We say that an epidemic outbreak takes place if a small initial fraction of infected individuals leads to the contagion of a significant fraction of the population. We say the system displays an epidemic threshold if epidemic outbreaks occur when some combination of parameters and initial conditions exceeds a critical value.
16.1
The SI model
Given a population, let x(t) denote the fraction of infected individuals at time t ∈ R≥0. Similarly, let s(t) denote the fraction of susceptible individuals. Clearly, x(t) + s(t) = 1 at all times. We model propagation via the following first-order differential equation, called the susceptible–infected (SI) model:

$$\dot x(t) = \beta s(t)x(t) = \beta\big(1 - x(t)\big)x(t), \qquad (16.1)$$
where β > 0 is the infection rate. We will see distributed and stochastic versions of this model in the next chapter. A simple qualitative analysis of this equation can be performed by plotting ẋ over x; see Figure 16.2.

Figure 16.2: Phase portrait of the (lumped deterministic) SI model (β = 1).
Remark 16.1 (Heuristic modeling assumptions and derivation). Over the interval (t, t + ∆t), pairwise meetings between individuals in the population take place in the following fashion: assume the population has n individuals, pick a meeting rate βm > 0, and assume that nβm∆t individuals will meet other nβm∆t individuals. Assuming meetings involve uniformly-selected individuals, over the interval (t, t + ∆t) there are s(t)²nβm∆t meetings between a susceptible and another susceptible individual; these meetings, as well as meetings between infected individuals, result in no epidemic propagation. However, there will also be s(t)x(t)nβm∆t + x(t)s(t)nβm∆t meetings between a susceptible and an infected individual. We assume a fraction βi ∈ [0, 1], called the transmission rate, of these meetings results in the successful transmission of the infection:

$$\beta_i\big(s(t)x(t)n\beta_m\Delta t + x(t)s(t)n\beta_m\Delta t\big) = 2\beta_i\beta_m x(t)s(t)n\Delta t.$$

In summary, the fraction of infected individuals satisfies

$$x(t + \Delta t) = x(t) + 2\beta_i\beta_m x(t)s(t)\Delta t,$$

and the SI model (16.1) is the limit as ∆t → 0+, where the infection parameter β is twice the product of the meeting rate βm and the infection transmission fraction βi.

Lemma 16.2 (Dynamical behavior of the SI model). Consider the SI model (16.1). The solution from initial condition x(0) = x0 ∈ [0, 1] is

$$x(t) = \frac{x_0 e^{\beta t}}{1 - x_0 + x_0 e^{\beta t}}. \qquad (16.2)$$

From all positive initial conditions 0 < x0 < 1, the solution x(t) is monotonically increasing and converges to the equilibrium x = 1 as t → ∞. It is easy to see that the SI model (16.1) results in an evolution akin to a logistic curve; see Figure 16.3.
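The heuristic derivation in Remark 16.1 and the closed form (16.2) can be cross-checked numerically. The Python sketch below is only an illustration under assumed values: it iterates the discrete-time update x(t + ∆t) = x(t) + 2βiβm x(t)s(t)∆t for shrinking ∆t and compares the result with (16.2) at β = 2βiβm. The parameter values (βi = 0.25, βm = 2, x0 = 0.01) and the helper names si_closed_form and si_discrete are hypothetical choices, not taken from the text.

```python
import math


def si_closed_form(x0, beta, t):
    """Closed-form SI solution (16.2): x0 e^{beta t} / (1 - x0 + x0 e^{beta t})."""
    growth = math.exp(beta * t)
    return x0 * growth / (1.0 - x0 + x0 * growth)


def si_discrete(x0, beta_i, beta_m, t_final, dt):
    """Iterate the heuristic update of Remark 16.1,
    x(t + dt) = x(t) + 2 * beta_i * beta_m * x(t) * s(t) * dt, with s = 1 - x."""
    x = x0
    for _ in range(round(t_final / dt)):
        x = x + 2.0 * beta_i * beta_m * x * (1.0 - x) * dt
    return x


# Hypothetical parameter values, chosen so that beta = 2 * beta_i * beta_m = 1.
beta_i, beta_m = 0.25, 2.0
beta = 2.0 * beta_i * beta_m
x0, t_final = 0.01, 10.0
exact = si_closed_form(x0, beta, t_final)
errors = {}
for dt in (0.1, 0.01, 0.001):
    errors[dt] = abs(si_discrete(x0, beta_i, beta_m, t_final, dt) - exact)
    print(f"dt = {dt:<6} |discrete - closed form| = {errors[dt]:.2e}")
```

As ∆t decreases, the gap to the closed-form value should shrink roughly in proportion to ∆t, which is consistent with (16.1) being the ∆t → 0+ limit of the discrete update.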
Figure 16.3: Evolution of the (lumped deterministic) SI model (β = 1) from a small initial fraction of infected individuals.
16.2
The SIR model
Next, we study a model in which individuals recover from the infection and are no longer susceptible after one round of infection. In other words, we assume the population is divided into three distinct groups: s(t) denotes the fraction of susceptible individuals, x(t) denotes the fraction of infected individuals, and r(t) denotes the fraction of recovered individuals. Clearly, s(t) + x(t) + r(t) = 1. We model the recovery process via a constant recovery rate γ and write our (susceptible–infected–recovered) SIR model as

$$\begin{aligned}
\dot s(t) &= -\beta s(t)x(t),\\
\dot x(t) &= \beta s(t)x(t) - \gamma x(t),\\
\dot r(t) &= \gamma x(t).
\end{aligned} \qquad (16.3)$$
Heuristic modeling assumptions and derivation. One can show that the constant recovery rate assumption corresponds to assuming a so-called Poisson recovery rate (i.e., exponentially distributed infection durations) for the stochastic version of the SIR model. This is arguably not a very realistic assumption.

Lemma 16.3 (Dynamical behavior of the SIR model). Consider the SIR model (16.3). From each initial condition with x(0) > 0 and s(0) > 0, the resulting trajectory t ↦ (s(t), x(t), r(t)) has the following properties:
(i) if s(0), x(0), r(0) ∈ [0, 1], then s(t), x(t), r(t) ∈ [0, 1] for all t ≥ 0;
(ii) t ↦ s(t) is monotonically decreasing and t ↦ r(t) is monotonically increasing;
(iii) if βs(0)/γ > 1, then t ↦ x(t) first monotonically increases to a maximum value and then decreases to zero as t → ∞ (we describe this case as an epidemic outbreak, that is, an exponential growth of t ↦ x(t) for small times);
(iv) if βs(0)/γ < 1, then t ↦ x(t) monotonically and exponentially decreases to zero as t → ∞;
(v) lim_{t→∞} (s(t), x(t), r(t)) = (s∞, 0, r∞), where r∞ is the unique solution to the equality

$$1 - r_\infty = s(0)\exp\Big(-\frac{\beta}{\gamma}\big(r_\infty - r(0)\big)\Big). \qquad (16.4)$$
Figure 16.4: Left figure: evolution of the (lumped deterministic) SIR model from a small initial fraction of infected individuals (and zero recovered); parameters β = 2, γ = 1/4 (case (iii) in Lemma 16.3). Right figure: intersection between the two curves 1 − r∞ and s(0)e^{−(β/γ)r∞} in equation (16.4), plotted against r∞, with s(0) = 0.95, r(0) = 0 and β/γ ∈ {1/4, 4}.
Proof. We first prove statement (i). For any fixed point in time t ≥ 0 and for any values of s(t), x(t), r(t) ∈ [0, 1], we have that ṙ(t) = γx(t) ≥ 0, so that r(t) is non-decreasing, and ṙ(t) = 0 whenever x(t) = 0 or r(t) = 1 (due to the conserved quantity 1 = s(t) + x(t) + r(t)). Hence, irrespective of s(t), x(t) ∈ [0, 1], it follows that r(t) ∈ [0, 1] for all t ≥ 0. We next eliminate s(t) = 1 − x(t) − r(t) and rewrite the SIR model (16.3) as

$$\begin{aligned}
\dot x(t) &= \beta\big(1 - x(t) - r(t)\big)x(t) - \gamma x(t), && (16.5a)\\
\dot r(t) &= \gamma x(t). && (16.5b)
\end{aligned}$$
For any fixed point in time t ≥ 0 and for any values of x(t), r(t) ∈ [0, 1], we investigate the right-hand side of (16.5a). The vector field (16.5a) has two roots x∗1 = 0 and x∗2 = x∗2(r(t)). For γ sufficiently large we have that x∗2 ≤ x∗1 = 0. In this case, ẋ(t) < 0 for x(t) ∈ ]0, 1]. Otherwise, x∗2 ∈ [0, 1], and a further inspection of the vector field (16.5a) shows that ẋ(t) > 0 for x(t) ∈ ]0, x∗2[ and ẋ(t) < 0 for x(t) ∈ ]x∗2, 1]. Since the vector field (16.5a) at the boundaries does not point outside the interval [0, 1], it follows that x(t) necessarily remains in [0, 1]. This proves statement (i). Statement (ii) is an immediate consequence of ṡ(t) = −βs(t)x(t) ≤ 0 and ṙ(t) = γx(t) ≥ 0. We leave the proofs of statements (iii) and (iv) to the reader.

We next focus on statement (v). For the SIR model (16.3) the signals s(t) and −r(t) are monotonically non-increasing. Because they are also lower bounded, their two limits exist: lim_{t→∞} s(t) = s∞ and lim_{t→∞} r(t) = r∞. Moreover, the equality s(t) + x(t) + r(t) = 1 implies that the third limit also must exist, that is, lim_{t→∞} x(t) = x∞. We now claim that x∞ = 0. By contradiction, assume x∞ > 0. But then also lim_{t→∞} ṙ(t) = γx∞ > 0, and this contradicts the fact that r(t) ≤ 1. Next, consider the SIR model (16.3). If s(0) = 0, then clearly r∞ = 1. If instead s(0) > 0, then s(t) remains strictly positive for sufficiently small time t. Given t sufficiently small, we note a useful equality and integrate it from 0 to t:

$$\frac{\dot s(t)}{s(t)} = -\beta x(t) = -\frac{\beta}{\gamma}\dot r(t) \implies \ln\frac{s(t)}{s(0)} = -\frac{\beta}{\gamma}\big(r(t) - r(0)\big) \implies s(t) = s(0)\exp\Big(-\frac{\beta}{\gamma}\big(r(t) - r(0)\big)\Big).$$

The last equality implies that s(t) is strictly positive for all t. Equation (16.4) follows by taking the limit as t → ∞ and noting that for all time 1 = s(t) + x(t) + r(t); in particular, 1 = s∞ + r∞. The uniqueness of the solution r∞ to equation (16.4) follows from showing there exists a unique intersection between the left- and right-hand sides, as illustrated in Figure 16.4. ∎
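Statement (v) of Lemma 16.3 can be illustrated numerically by integrating (16.3) for a long time and comparing r(t) with the root of (16.4). The sketch below assumes a classical fourth-order Runge–Kutta integrator and a bisection solver (our choices, not the text's), with the hypothetical helpers sir_rhs, rk4_step, simulate_sir and final_size; the parameter pairs (β, γ) = (2, 1/4) and (0.2, 1/4), with s(0) = 0.99, x(0) = 0.01, r(0) = 0, are illustrative.

```python
import math


def sir_rhs(state, beta, gamma):
    """Right-hand side of the lumped SIR model (16.3)."""
    s, x, r = state
    return (-beta * s * x, beta * s * x - gamma * x, gamma * x)


def rk4_step(state, dt, beta, gamma):
    """One classical fourth-order Runge-Kutta step for (16.3)."""
    def shift(k, h):
        return tuple(v + h * kv for v, kv in zip(state, k))

    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(shift(k1, dt / 2), beta, gamma)
    k3 = sir_rhs(shift(k2, dt / 2), beta, gamma)
    k4 = sir_rhs(shift(k3, dt), beta, gamma)
    return tuple(v + dt / 6 * (a + 2 * b + 2 * c + d)
                 for v, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate_sir(s0, x0, r0, beta, gamma, t_final=300.0, dt=0.05):
    """Integrate (16.3) up to t_final and return the final state (s, x, r)."""
    state = (s0, x0, r0)
    for _ in range(round(t_final / dt)):
        state = rk4_step(state, dt, beta, gamma)
    return state


def final_size(s0, r0, beta, gamma, tol=1e-12):
    """Solve (16.4), 1 - r = s0 * exp(-(beta/gamma) * (r - r0)), by bisection on [r0, 1].
    g below is concave, with g(r0) = 1 - s0 - r0 = x(0) > 0 and g(1) < 0 when s0 > 0,
    so it has exactly one root in between."""
    def g(r):
        return 1.0 - r - s0 * math.exp(-(beta / gamma) * (r - r0))

    lo, hi = r0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for beta, gamma in ((2.0, 0.25), (0.2, 0.25)):
    s, x, r = simulate_sir(0.99, 0.01, 0.0, beta, gamma)
    print(f"beta*s(0)/gamma = {beta * 0.99 / gamma:.2f}: simulated r = {r:.6f}, "
          f"root of (16.4) = {final_size(0.99, 0.0, beta, gamma):.6f}")
```

The first pair lies above the threshold of item (iii) and almost the entire population is eventually infected; the second lies below it (item (iv)) and only a small fraction is.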
16.3
The SIS model
As the third and final lumped deterministic model, we study the setting in which individuals recover from the infection but are susceptible to being re-infected. As in the SI model, the population is divided into two fractions with s(t) + x(t) = 1. We model infection, recovery and possible re-infection with the SIS model:

$$\dot x = \beta s x - \gamma x = (\beta - \gamma - \beta x)x, \qquad (16.6)$$
where β is the infection rate and γ is the recovery rate. Note that the first term is the same infection term as in the SI model and the second term is the same recovery term as in the SIR model. A simple qualitative analysis of this equation can be performed by plotting ẋ over x for β < γ, β = γ, and β > γ; see Figure 16.5.

Figure 16.5: Phase portrait of the (lumped deterministic) SIS model for β = 1 < γ = 3/2 and for β = 1 > γ = 1/2.
Lemma 16.4 (Dynamical behavior of the SIS model). For the SIS model (16.6):
(i) the closed-form solution to equation (16.6) from initial condition x(0) = x0 ∈ [0, 1], for β ≠ γ, is

$$x(t) = \frac{(\beta - \gamma)x_0}{\beta x_0 - e^{-(\beta - \gamma)t}\big(\gamma - \beta(1 - x_0)\big)}, \qquad (16.7)$$
(ii) if β ≤ γ, all trajectories converge to the unique equilibrium x = 0 (i.e., the epidemic disappears), and
(iii) if β > γ, then, from all positive initial conditions x(0) > 0, all trajectories converge to the unique exponentially stable equilibrium x = (β − γ)/β < 1 (epidemic outbreak and steady-state epidemic contagion). We illustrate these results in Figure 16.6.
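To check (16.7) against the differential equation (16.6) and to see the two regimes of Lemma 16.4, the following Python sketch evaluates the closed form and compares it with a forward-Euler integration. The initial condition x0 = 0.05, the Euler step size and the helper names sis_closed_form and sis_euler are assumptions made for illustration; the parameter pairs echo Figures 16.5 and 16.6.

```python
import math


def sis_closed_form(x0, beta, gamma, t):
    """Closed-form solution (16.7) of xdot = (beta - gamma - beta*x)*x, for beta != gamma."""
    decay = math.exp(-(beta - gamma) * t)
    return (beta - gamma) * x0 / (beta * x0 - decay * (gamma - beta * (1.0 - x0)))


def sis_euler(x0, beta, gamma, t_final, dt=1e-4):
    """Forward-Euler integration of the SIS model (16.6), used as an independent check."""
    x = x0
    for _ in range(round(t_final / dt)):
        x += dt * (beta - gamma - beta * x) * x
    return x


x0 = 0.05
for beta, gamma in ((1.0, 0.5), (1.0, 1.5)):
    print(f"beta={beta}, gamma={gamma}: "
          f"x(5) closed form {sis_closed_form(x0, beta, gamma, 5.0):.5f}, "
          f"Euler {sis_euler(x0, beta, gamma, 5.0):.5f}; "
          f"x(60) = {sis_closed_form(x0, beta, gamma, 60.0):.3e}")
```

For β = 1 > γ = 1/2 the closed form approaches (β − γ)/β = 1/2, while for β = 1 < γ = 3/2 it decays to zero, as stated in items (ii) and (iii).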
Figure 16.6: Evolution of the (lumped deterministic) SIS model from a small initial fraction of infected individuals; β = 1 > γ = 0.5.
16.4
Exercises
E16.1
Closed-form solutions for SI and SIS models. Verify the correctness of the closed-form solutions for SI and SIS models given in equations (16.2) and (16.7).
E16.2
Dynamical behavior of the SIS model. Prove Lemma 16.4.
 Chapter 17
Virus Propagation in Contact Networks

In this chapter we continue our discussion about the diffusion and propagation of infectious diseases. Starting from the basic lumped models discussed in Chapter 16, we now focus on network models and also discuss some stochastic modeling aspects. We borrow ideas from the lecture notes by Zampieri (2013). A detailed survey about infectious diseases is (Hethcote 2000). A very early work on epidemic models over networks, the spectral radius of the adjacency matrix and the epidemic threshold is Lajmanovich and Yorke (1976). Later works on similar models include (Wang et al. 2003) and Mieghem (2011); Mieghem et al. (2009). Our stochastic analysis is based on the approach in (Mei and Bullo 2014). Recent extensions and general proofs for the deterministic SIS network model are given by Khanafer et al. (2015). A related book chapter is (Newman 2010, Chapter 17).
17.1
The stochastic network SI model
In this section we consider epidemic models that are richer, more general and more complex than the lumped deterministic models considered before. We extend our treatment in two ways: we consider a stochastic model of the propagation phenomenon and we imagine the population is distributed over a network.

The stochastic model
The stochastic network SI model, illustrated in Figure 17.1, is defined as follows:
(i) We consider a group of n individuals. The state of each individual is either S for susceptible or I for infected.
(ii) The n individuals are in pairwise contact, as specified by an undirected graph G with adjacency matrix A (without self-loops). The edge weights represent the frequency of contact between two individuals.
(iii) Each individual in susceptible status can transition to infected as follows: given an infection rate β > 0, if a susceptible individual i is in contact with an infected individual j for time ∆t, the probability of infection is aijβ∆t. Each individual can be infected by any neighboring individual: these random events are independent.
Figure 17.1: In the stochastic network SI model, each susceptible individual (blue) becomes infected by contact with infected individuals (red) in its neighborhood according to an infection rate β.
An approximate deterministic model

We define the infection variable at time t for individual i by

$$Y_i(t) = \begin{cases} 1, & \text{if node } i \text{ is in state I at time } t,\\ 0, & \text{if node } i \text{ is in state S at time } t,\end{cases}$$

and the expected infection of individual i, which turns out to be equal to its probability of infection, by

$$x_i(t) = \mathbb{E}[Y_i(t)] = 1\cdot\mathbb{P}[Y_i(t) = 1] + 0\cdot\mathbb{P}[Y_i(t) = 0] = \mathbb{P}[Y_i(t) = 1].$$
In what follows it will be useful to approximate P[Yi(t) = 0 | Yj(t) = 1] with P[Yi(t) = 0], that is, to require Yi and Yj to be independent for arbitrary i and j. We claim this approximation is acceptable over certain graphs with large numbers n of individuals. The final model, which we obtain below based on the Independence Approximation, is an upper bound on the true model because P[Yi(t) = 0] ≥ P[Yi(t) = 0 | Yj(t) = 1].

Definition 17.1 (Independence Approximation). For any two individuals i and j, the infection variables Yi and Yj are independent.

Theorem 17.2 (From the stochastic to the deterministic network SI model). Consider the stochastic network SI model with infection rate β over a contact graph with adjacency matrix A. The probabilities of infection satisfy

$$\frac{d}{dt}\mathbb{P}[Y_i(t) = 1] = \beta\sum_{j=1}^{n} a_{ij}\,\mathbb{P}[Y_i(t) = 0, Y_j(t) = 1].$$

Moreover, under the Independence Approximation 17.1, the probabilities of infection xi(t) = P[Yi(t) = 1], i ∈ {1, . . . , n}, satisfy the (deterministic) network SI model defined by

$$\dot x_i(t) = \beta\big(1 - x_i(t)\big)\sum_{j=1}^{n} a_{ij}x_j(t).$$
We study the deterministic network SI model in the next section.

Proof. In what follows, we define the random variables Y−i(t) = (Y1(t), . . . , Yi−1(t), Yi+1(t), . . . , Yn(t)) and, similarly, Y−i−j(t), for i, j ∈ {1, . . . , n}. We are interested in the events that a susceptible individual remains susceptible or becomes infected over the interval of time [t, t + ∆t], for small ∆t. We start by computing the probability of non-infection for time ∆t, conditioned upon Y−i(t):

$$\mathbb{P}[Y_i(t+\Delta t) = 0 \mid Y_i(t) = 0, Y_{-i}(t)] = \prod_{j=1}^{n}\big(1 - a_{ij}Y_j(t)\beta\Delta t\big) = 1 - \sum_{j=1}^{n} a_{ij}Y_j(t)\beta\Delta t + O(\Delta t^2),$$

where O(∆t²) is a function upper bounded by a constant times ∆t². The complementary probability, i.e., the probability of infection for time ∆t, is:

$$\mathbb{P}[Y_i(t+\Delta t) = 1 \mid Y_i(t) = 0, Y_{-i}(t)] = \sum_{j=1}^{n} a_{ij}Y_j(t)\beta\Delta t + O(\Delta t^2).$$
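We are now ready to study the random variable Yi(t + ∆t) − Yi(t), given Y−i(t):

$$\begin{aligned}
\mathbb{E}[Y_i(t+\Delta t) - Y_i(t) \mid Y_{-i}(t)]
&= 1\cdot\mathbb{P}[Y_i(t+\Delta t) = 1, Y_i(t) = 0 \mid Y_{-i}(t)]\\
&\quad + 0\cdot\mathbb{P}\big[(Y_i(t+\Delta t) = Y_i(t) = 0) \text{ or } (Y_i(t+\Delta t) = Y_i(t) = 1) \mid Y_{-i}(t)\big] && \text{(by def. expectation)}\\
&= \mathbb{P}[Y_i(t+\Delta t) = 1 \mid Y_i(t) = 0, Y_{-i}(t)]\cdot\mathbb{P}[Y_i(t) = 0 \mid Y_{-i}(t)] && \text{(by conditional prob.)}\\
&= \Big(\sum_{j=1}^{n} a_{ij}Y_j(t)\beta\Delta t + O(\Delta t^2)\Big)\cdot\mathbb{P}[Y_i(t) = 0 \mid Y_{-i}(t)].
\end{aligned}$$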
We now remove the conditioning upon Y−i(t) and study:

$$\mathbb{E}[Y_i(t+\Delta t) - Y_i(t)] = \mathbb{E}\big[\mathbb{E}[Y_i(t+\Delta t) - Y_i(t) \mid Y_{-i}(t)]\big] = \sum_{j=1}^{n} a_{ij}\beta\Delta t\cdot\mathbb{E}\big[Y_j(t)\cdot\mathbb{P}[Y_i(t) = 0 \mid Y_{-i}(t)]\big] + O(\Delta t^2),$$

and therefore we compute (where y is an arbitrary realization of the random variable Y):

$$\begin{aligned}
\mathbb{E}\big[Y_j(t)\cdot\mathbb{P}[Y_i(t) = 0 \mid Y_{-i}(t)]\big]
&= \sum_{y_{-i}} y_j\cdot\mathbb{P}[Y_i(t) = 0 \mid Y_{-i}(t) = y_{-i}]\cdot\mathbb{P}[Y_{-i}(t) = y_{-i}] && \text{(by def. expectation)}\\
&= \sum_{y_{-i-j}} 1\cdot\mathbb{P}[Y_i(t) = 0 \mid Y_{-i-j}(t) = y_{-i-j}, Y_j(t) = 1]\\
&\qquad\qquad\times\mathbb{P}[Y_{-i-j}(t) = y_{-i-j}, Y_j(t) = 1] && \text{(because } y_j \in \{0, 1\})\\
&= \sum_{y_{-i-j}} \mathbb{P}[Y_i(t) = 0, Y_{-i-j}(t) = y_{-i-j}, Y_j(t) = 1] && \text{(by conditional prob.)}\\
&= \mathbb{P}[Y_i(t) = 0, Y_j(t) = 1],
\end{aligned}$$
where, for example, the first summation is taken over all possible values y−i that the variable Y−i(t) takes. In summary, we know

$$\mathbb{E}[Y_i(t+\Delta t) - Y_i(t)] = \sum_{j=1}^{n} a_{ij}\beta\Delta t\cdot\mathbb{P}[Y_i(t) = 0, Y_j(t) = 1] + O(\Delta t^2),$$

so that, also recalling P[Yi(t) = 1] = E[Yi(t)],

$$\frac{d}{dt}\mathbb{P}[Y_i(t) = 1] = \lim_{\Delta t\to 0^+}\frac{\mathbb{E}[Y_i(t+\Delta t) - Y_i(t)]}{\Delta t} = \beta\sum_{j=1}^{n} a_{ij}\,\mathbb{P}[Y_i(t) = 0, Y_j(t) = 1].$$

The final step is an immediate consequence of the Independence Approximation:

$$\mathbb{P}[Y_i(t) = 0, Y_j(t) = 1] = \mathbb{P}[Y_i(t) = 0 \mid Y_j(t) = 1]\cdot\mathbb{P}[Y_j(t) = 1] \approx \big(1 - \mathbb{P}[Y_i(t) = 1]\big)\cdot\mathbb{P}[Y_j(t) = 1].$$

∎
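The stochastic model and its deterministic approximation in Theorem 17.2 can be compared with a small Monte Carlo experiment. The Python sketch below is a rough illustration under several assumptions of ours: time is discretized with a fixed step ∆t, during which a susceptible node escapes each infected neighbor j independently with probability 1 − aijβ∆t (as in rule (iii)); the graph is a path on four nodes; and the network SI model is integrated by forward Euler with the same step. The function names, the random seed and all parameter values are hypothetical.

```python
import random


def stochastic_si_run(adj, beta, y0, dt, steps, rng):
    """One realization of the stochastic network SI model with time step dt."""
    n = len(adj)
    y = list(y0)
    history = [tuple(y)]
    for _ in range(steps):
        new_y = list(y)
        for i in range(n):
            if y[i] == 0:
                # Independent contacts (rule (iii)): escape every infected neighbor.
                p_escape = 1.0
                for j in range(n):
                    p_escape *= 1.0 - adj[i][j] * y[j] * beta * dt
                if rng.random() >= p_escape:   # happens with probability 1 - p_escape
                    new_y[i] = 1
        y = new_y
        history.append(tuple(y))
    return history


def network_si_euler(adj, beta, x0, dt, steps):
    """Forward-Euler integration of xdot_i = beta (1 - x_i) sum_j a_ij x_j."""
    n = len(adj)
    x = list(x0)
    history = [tuple(x)]
    for _ in range(steps):
        x = [x[i] + dt * beta * (1.0 - x[i]) * sum(adj[i][j] * x[j] for j in range(n))
             for i in range(n)]
        history.append(tuple(x))
    return history


adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]            # path graph on 4 nodes (illustrative choice)
beta, dt, steps, runs = 1.0, 0.02, 150, 500
y0 = (1, 0, 0, 0)               # first node infected at t = 0
rng = random.Random(0)          # fixed seed for reproducibility
counts = [[0] * 4 for _ in range(steps + 1)]
for _ in range(runs):
    for k, state in enumerate(stochastic_si_run(adj, beta, y0, dt, steps, rng)):
        for i, yi in enumerate(state):
            counts[k][i] += yi
empirical = [[c / runs for c in row] for row in counts]
mean_field = network_si_euler(adj, beta, [float(v) for v in y0], dt, steps)
print("Monte Carlo P[Y_i(3) = 1]:", [round(p, 3) for p in empirical[-1]])
print("network SI model x_i(3):  ", [round(v, 3) for v in mean_field[-1]])
```

Since no node ever recovers in the SI model, each realization has a non-decreasing number of infected nodes, and on a path a node can only be infected after the nodes between it and the initially infected node; the empirical frequencies inherit both properties. The gap between the two printed rows gives a sense of the error introduced by the Independence Approximation on this small graph.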
17.2
The network SI model
In this and the following sections we consider deterministic network models for the propagation of epidemics. Two interpretations of the provided models are possible: if node i is a population of individuals at location i, then xi can be interpreted as the infected fraction of that population. If node i is a single individual, then xi can be interpreted as the probability that the individual is infected: xi(t) = P[individual i is infected at time t].

Figure 17.2: In the (deterministic) network SI model, each node is described by a probability of infection taking value between 0 (blue) and 1 (red). The rate at which individuals become increasingly infected is parametrized by the infection rate β.
Consider an undirected weighted graph G = (V, E) of order n with adjacency matrix A and degree matrix D = diag(A1n). Let xi(t) ∈ [0, 1] denote the fraction of infected individuals at node i ∈ V at time t ∈ R≥0. The network SI model is

$$\dot x_i(t) = \beta\big(1 - x_i(t)\big)\sum_{j=1}^{n} a_{ij}x_j(t),$$

or, in equivalent vector form,

$$\dot x = \beta\big(I_n - \operatorname{diag}(x)\big)Ax. \qquad (17.1)$$

Alternatively, in terms of the fractions of susceptible individuals s = 1n − x, the network SI model reads

$$\dot s = -\beta\operatorname{diag}(s)A(\mathbf{1}_n - s). \qquad (17.2)$$
Theorem 17.3 (Dynamical behavior of the network SI model). Consider the network SI model (17.1). Assume G is connected so that A is irreducible; let D denote the degree matrix. The following statements hold:
(i) if x(0), s(0) ∈ [0, 1]ⁿ, then x(t), s(t) ∈ [0, 1]ⁿ for all t ≥ 0;
(ii) there are two equilibrium points: 0n (no epidemics), and 1n (full contagion);
(iii) the linearization of model (17.1) about the equilibrium point 0n is ẋ = βAx and it is exponentially unstable;
(iv) the linearization of model (17.2) about the equilibrium 0n is ṡ = −βDs and it is exponentially stable;
(v) each trajectory with initial condition x(0) ≠ 0n converges asymptotically to 1n, that is, the epidemic spreads to the entire network.
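Statement (v) and the function V(x) = 1nᵀ(1n − x) used in its proof can be visualized with a short simulation. The Python sketch below assumes forward-Euler integration, a cycle graph on five nodes and β = 1; these choices, and the helper names network_si_step and lyapunov, are illustrative assumptions rather than part of the text.

```python
def network_si_step(adj, x, beta, dt):
    """One forward-Euler step of the network SI model (17.1)."""
    n = len(x)
    return [x[i] + dt * beta * (1.0 - x[i]) * sum(adj[i][j] * x[j] for j in range(n))
            for i in range(n)]


def lyapunov(x):
    """V(x) = 1_n^T (1_n - x), the function used to prove statement (v)."""
    return sum(1.0 - xi for xi in x)


n = 5
# Cycle graph on 5 nodes: an arbitrary connected example (A is irreducible).
adj = [[1.0 if (j - i) % n in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
beta, dt, steps = 1.0, 0.01, 3000
x = [0.01] + [0.0] * (n - 1)        # a single, slightly infected node
values = [lyapunov(x)]
for _ in range(steps):
    x = network_si_step(adj, x, beta, dt)
    values.append(lyapunov(x))
print("x(30) =", [round(xi, 6) for xi in x])
print("V is non-increasing along the iterates:",
      all(b <= a for a, b in zip(values, values[1:])))
```

Along the Euler iterates every xi is non-decreasing because the right-hand side of (17.1) is nonnegative on [0, 1]ⁿ, so V never increases, mirroring the LaSalle argument in the proof.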
Proof. Statement (i) can be proved by evaluating the vector field (17.1) at the boundaries of the admissible state space, that is, for x ∈ [0, 1]ⁿ such that at least one entry i satisfies xi ∈ {0, 1}. We leave the detailed proof of statement (i) to the reader.

We now prove statement (ii). The point x is an equilibrium point if and only if:

$$\big(I_n - \operatorname{diag}(x)\big)Ax = \mathbf{0}_n \iff Ax = \operatorname{diag}(x)Ax.$$
Clearly, 0n and 1n are equilibrium points. Hence we just need to show that no other points can be equilibria. First, suppose that there exists an equilibrium point x with 0n ≤ x < 1n. But then In − diag(x) has strictly positive diagonal and therefore x must satisfy Ax = 0n. Note that Ax = 0n also implies $\sum_{k=1}^{n-1}A^k x = \mathbf{0}_n$. Recall from Proposition 4.3 that, if A is irreducible, then $\sum_{k=1}^{n-1}A^k$ has all off-diagonal terms strictly positive. Because xi ∈ [0, 1[, the only possible solution to Ax = 0n is therefore x = 0n. This is a contradiction. Next, suppose there exists an equilibrium point x = (x1, x2) with 0n1 ≤ x1 < 1n1, x2 = 1n2, and n1 + n2 = n. The equality Ax = diag(x)Ax implies Ax = diag(x)ᵏAx for all k ∈ N and, in turn,

$$Ax = \lim_{k\to\infty}\operatorname{diag}(x)^k Ax = \begin{bmatrix} 0_{n_1\times n_1} & 0_{n_1\times n_2}\\ 0_{n_2\times n_1} & I_{n_2}\end{bmatrix} Ax.$$

By partitioning A in corresponding blocks, the previous equality implies A11x1 + A12x2 = 0n1. Because x2 = 1n2 we know that A12 = 0n1×n2 and, therefore, that A is reducible. This contradiction concludes the proof of statement (ii).

Statements (iii) and (iv) are straightforward computations:

$$\begin{aligned}
\frac{\dot x}{\beta} &= \big(I_n - \operatorname{diag}(x)\big)Ax = Ax - \operatorname{diag}(x)Ax \approx Ax,\\
\frac{\dot s}{\beta} &= -\operatorname{diag}(s)A(\mathbf{1}_n - s) = -\operatorname{diag}(s)A\mathbf{1}_n + \operatorname{diag}(s)As = -Ds + \operatorname{diag}(s)As \approx -Ds,
\end{aligned}$$
where we used the equality diag(y)z = diag(z)y for y, z ∈ Rⁿ. Exponential stability of the linearization ṡ = −βDs is obvious, and the Perron–Frobenius Theorem 2.15 for irreducible matrices implies the existence of the unstable positive eigenvalue ρ(A) > 0 for the linearization ẋ = βAx.

To show statement (v), consider the function V(x) = 1nᵀ(1n − x); this is a smooth function defined over the compact and forward invariant set [0, 1]ⁿ (see (i)). We compute V̇ = −β1nᵀ(In − diag(x))Ax and note that V̇ ≤ 0 for all x, and V̇(x) = 0 if and only if x ∈ {0n, 1n}. Because of these facts, the LaSalle Invariance Principle in Theorem 13.3 implies that all trajectories with x(0) ∈ [0, 1]ⁿ converge asymptotically to either 1n or 0n. Additionally, note that 0 ≤ V(x) ≤ n for all x ∈ [0, 1]ⁿ, that V(x) = 0 if and only if x = 1n, and that V(x) = n if and only if x = 0n. Therefore, since t ↦ V(x(t)) is non-increasing and V(x(0)) < n whenever x(0) ≠ 0n, all such trajectories converge asymptotically to 1n. ∎

Before proceeding, we review the notion of dominant eigenvector and introduce some notation. Let λmax = ρ(A) be the dominant eigenvalue of the adjacency matrix A and let vmax be the corresponding positive eigenvector normalized to satisfy 1nᵀvmax = 1. (Recall that these definitions are well posed because of the Perron–Frobenius Theorem 2.15 for irreducible matrices.) Additionally, let vmax, v2, . . . , vn denote an orthonormal set of eigenvectors with corresponding eigenvalues λmax > λ2 ≥ · · · ≥ λn for the symmetric adjacency matrix A. Consider now the onset of an epidemic in a large population characterized by a small initial infection x(0) = x0 ≪ 1n. So long as x(t) ≪ 1n, the system evolution is approximated by ẋ = βAx. This “initial-times” linear evolution satisfies

$$x(t) = \big(v_{\max}^\top x_0\big)e^{\beta\lambda_{\max}t}v_{\max} + \sum_{i=2}^{n}\big(v_i^\top x_0\big)e^{\beta\lambda_i t}v_i = e^{\beta\lambda_{\max}t}\Big(\big(v_{\max}^\top x_0\big)v_{\max} + o(t)\Big), \qquad (17.3)$$

where o(t) is a function exponentially vanishing as t → ∞. In other words, the epidemic initially experiences exponential growth with rate βλmax and with distribution among the nodes given by the eigenvector vmax.
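The approximation (17.3) predicts that, from a small initial infection, 1nᵀx(t) grows at rate βλmax and x(t)/1nᵀx(t) aligns with vmax. The Python sketch below checks both claims on a four-node graph. It assumes power iteration (on A + In, to avoid oscillations on bipartite graphs) to compute the dominant eigenpair, forward-Euler integration of the nonlinear model (17.1), and a fitting window t ∈ [4, 6]; the graph, step size and helper name dominant_eigpair are hypothetical choices.

```python
import math


def dominant_eigpair(adj, iters=1000):
    """Power iteration on A + I_n (same eigenvectors as A; the shift avoids oscillation
    on bipartite graphs). Returns (lambda_max, v_max) with 1_n^T v_max = 1."""
    n = len(adj)
    v = [1.0 / n] * n
    for _ in range(iters):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        v = [wi / total for wi in w]
    lam = sum(sum(adj[i][j] * v[j] for j in range(n)) for i in range(n))  # 1^T A v
    return lam, v


# "Paw" graph: triangle on the first three nodes plus a fourth node attached to the first.
adj = [[0, 1, 1, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 0],
       [1, 0, 0, 0]]
n, beta, dt = 4, 1.0, 1e-3
lam_max, v_max = dominant_eigpair(adj)

x = [1e-9] * n                      # small initial infection, x0 << 1_n
log_total = {}
for k in range(1, 6001):            # integrate the nonlinear model (17.1) up to t = 6
    x = [x[i] + dt * beta * (1.0 - x[i]) * sum(adj[i][j] * x[j] for j in range(n))
         for i in range(n)]
    if k in (4000, 6000):
        log_total[k] = math.log(sum(x))
rate = (log_total[6000] - log_total[4000]) / (2000 * dt)
profile = [xi / sum(x) for xi in x]
print(f"beta*lambda_max = {beta * lam_max:.4f}, estimated growth rate = {rate:.4f}")
print("normalized infection profile:", [round(p, 4) for p in profile])
print("v_max:                       ", [round(v, 4) for v in v_max])
```

For this graph the characteristic polynomial is (λ + 1)(λ³ − λ² − 3λ + 1), so λmax ≈ 2.1701; the estimated rate falls slightly below βλmax mainly because of the forward-Euler discretization.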
17.3
The network SIS model
As previously, consider an undirected weighted graph G = (V, E) of order n with adjacency matrix A. Let xi(t) ∈ [0, 1] denote the fraction of infected individuals at node i ∈ V at time t ∈ R≥0. Given an infection rate β and a recovery rate γ, the network SIS model is

$$\dot x_i(t) = \beta\big(1 - x_i(t)\big)\sum_{j=1}^{n} a_{ij}x_j(t) - \gamma x_i(t), \qquad (17.4)$$

or, in equivalent vector form,

$$\dot x = \beta\big(I_n - \operatorname{diag}(x)\big)Ax - \gamma x. \qquad (17.5)$$
We start our analysis with useful preliminary notions. We define the monotonically-increasing functions

$$f_+(y) = \frac{y}{1+y}, \qquad f_-(z) = \frac{z}{1-z},$$

for y ∈ R≥0 and z ∈ [0, 1[. One can easily verify that f+(f−(z)) = z for all z ∈ [0, 1[. For vector variables y ∈ Rⁿ≥0 and z ∈ [0, 1[ⁿ, we write F+(y) = (f+(y1), . . . , f+(yn)) and F−(z) = (f−(z1), . . . , f−(zn)). Denoting Â = βA/γ and assuming x < 1n, the model (17.5) is rewritten as:

$$\dot x = F(x) = \gamma\operatorname{diag}(\mathbf{1}_n - x)\big(\hat A x - F_-(x)\big),$$

so that

$$F(x) \ge \mathbf{0}_n \iff \hat A x \ge F_-(x) \iff F_+(\hat A x) \ge x.$$
Moreover, x∗ is an equilibrium point (F(x∗) = 0n) if and only if Âx∗ = F−(x∗) or, equivalently, if and only if F+(Âx∗) = x∗. We are now ready to present our results in two theorems.

Theorem 17.4 (Dynamical behavior of the network SIS model: below the threshold). Consider the network SIS model (17.4) over an undirected graph G with infection rate β and recovery rate γ. Assume G is connected, and let A be its adjacency matrix with dominant eigenvalue λmax. If βλmax/γ < 1, then
(i) there exists a unique equilibrium point 0n,
(ii) the linearization of model (17.4) about the equilibrium 0n is ẋ = (βA − γIn)x and it is exponentially stable; and
(iii) from any initial condition x(0) ≠ 0n, the weighted average t ↦ vmaxᵀx(t) is monotonically and exponentially decreasing, so that all trajectories converge to 0n.
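The proof of statement (iii) below yields the explicit estimate vmaxᵀx(t) ≤ vmaxᵀx(0)e^{(βλmax − γ)t}. The Python sketch below checks this numerically on a cycle graph, where λmax = 2 and vmax = 1n/n are known in closed form. The forward-Euler scheme, the values β = 0.4, γ = 1 and the initial condition are assumptions chosen only for illustration.

```python
import math

n = 5
# Cycle graph on 5 nodes: A 1_n = 2 * 1_n, so lambda_max = 2 and v_max = 1_n / n.
adj = [[1.0 if (j - i) % n in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
lam_max, v_max = 2.0, [1.0 / n] * n
beta, gamma, dt, steps = 0.4, 1.0, 0.01, 6000     # beta*lam_max/gamma = 0.8 < 1
assert beta * lam_max / gamma < 1.0

x = [0.9, 0.5, 0.1, 0.0, 0.3]                    # arbitrary initial condition
ys = [sum(v * xi for v, xi in zip(v_max, x))]    # y(t) = v_max^T x(t)
for _ in range(steps):
    x = [x[i] + dt * (beta * (1.0 - x[i]) * sum(adj[i][j] * x[j] for j in range(n))
                      - gamma * x[i]) for i in range(n)]
    ys.append(sum(v * xi for v, xi in zip(v_max, x)))

# Bound from the proof: y(t) <= y(0) * exp((beta*lam_max - gamma) * t).
bound_ok = all(y <= ys[0] * math.exp((beta * lam_max - gamma) * k * dt) + 1e-12
               for k, y in enumerate(ys))
print(f"y(0) = {ys[0]:.4f}, y(60) = {ys[-1]:.2e}, bound respected: {bound_ok}")
```

The inequality chain in the proof applies verbatim to each Euler step, so each new value of the weighted average is at most 1 + ∆t(βλmax − γ) times the previous one, and the exponential bound therefore holds for the iterates as well.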
Proof. Regarding statement (i), for x ∈ [0, 1]ⁿ \ {0n}, note 0n ≤ F+(Âx) ≤ Âx because 0 ≤ f+(z) ≤ z for z ≥ 0. Compute

$$\|F_+(\hat A x)\|_2 \le \|\hat A x\|_2 \le \|\hat A\|_2\cdot\|x\|_2 < \|x\|_2,$$

where the last inequality follows from ‖Â‖2 = ρ(Â), because A is symmetric, and from ρ(Â) = βλmax/γ < 1. Therefore, no x ≠ 0n can satisfy F+(Âx) = x. Regarding statement (ii), the linearization of equation (17.5) is verified by dropping the second-order terms. The eigenvalues of βA − γIn are βλi − γ, where λ1 = λmax > λ2 ≥ · · · ≥ λn are the eigenvalues of A. The linearized system is exponentially stable at 0n for βλmax − γ < 0. Finally, regarding statement (iii), define y(t) = vmaxᵀx(t), note (In − diag(z))vmax ≤ vmax for any z ∈ [0, 1]ⁿ, and compute

$$\begin{aligned}
\dot y(t) &= \beta v_{\max}^\top\big(I_n - \operatorname{diag}(x(t))\big)Ax(t) - \gamma v_{\max}^\top x(t)\\
&\le \beta v_{\max}^\top A x(t) - \gamma v_{\max}^\top x(t)\\
&\le (\beta\lambda_{\max} - \gamma)y(t).
\end{aligned}$$
By Grönwall’s lemma, this inequality implies that t ↦ y(t) is monotonically decreasing and satisfies y(t) ≤ y(0)e^{(βλmax − γ)t} from all initial conditions y(0). This concludes our proof of statement (iii) since vmax > 0. ∎

Theorem 17.5 (Dynamical behavior of the network SIS model: above the threshold). Consider the network SIS model (17.4) over an undirected graph G with infection rate β and recovery rate γ. Assume G is connected, and let A be its adjacency matrix with dominant eigenpair (λmax, vmax) and with degree vector d = A1n. Define the shorthand δ := βλmax/γ − 1 and d⁻¹ = (1/d1, . . . , 1/dn)ᵀ. If βλmax/γ > 1, then
(i) 0n is an equilibrium point, the linearization of system (17.5) at 0n is unstable with dominant unstable eigenvalue βλmax − γ and with dominant eigenvector vmax, i.e., there will be an epidemic outbreak;
(ii) besides the equilibrium 0n, there exists a unique other equilibrium point x∗ such that
(a) x∗ > 0,
(b) x∗ = δvmax + O(δ²), as δ → 0+,
(c) x∗ = 1n − (γ/β)d⁻¹ + O(γ²/β²), at fixed A as β/γ → ∞,
(d) x∗ = lim_{k→∞} y(k), where the monotonically-increasing sequence {y(k)}k∈Z≥0 ⊂ [0, 1]ⁿ is defined by

$$y_i(k+1) := f_+\Big(\frac{\beta}{\gamma}\sum_{j=1}^{n} a_{ij}y_j(k)\Big), \qquad y(0) := \frac{\delta}{(1+\delta)^2}v_{\max};$$
(iii) if x(0) ≠ 0n, then x(t) → x∗ as t → ∞. Moreover, if x(0) < x∗ (resp. x(0) > x∗), then t ↦ x(t) is monotonically increasing (resp. decreasing).

Note: statement (i) means that, near the onset of an epidemic outbreak, the exponential growth rate is βλmax − γ and the outbreak tends to align with the dominant eigenvector vmax, as in the discussion leading up to the approximate evolution (17.3).

Proof of selected statements in Theorem 17.5. Statement (i) follows from the same analysis of the linearized system as in the proof of Theorem 17.4(ii).

We next focus on statement (ii). We begin by establishing two properties of the map x ↦ F+(Âx). First, we claim that y > z ≥ 0n implies F+(Ây) > F+(Âz). Indeed, note that G being connected implies that the adjacency matrix A has at least one strictly positive entry in each row. Hence, y − z > 0n implies Â(y − z) > 0n and, since f+ is monotonically increasing, Ây > Âz implies F+(Ây) > F+(Âz). Second, we claim that there exists an x ∈ [0, 1]ⁿ satisfying F+(Âx) > x. Indeed, let λ̂max = λmax(Â) = βλmax(A)/γ > 1 and compute for any η > 0

$$F_+\big(\hat A(\eta v_{\max})\big)_i = f_+\big(\eta\hat\lambda_{\max}v_{\max,i}\big) > \eta v_{\max,i}\big(\hat\lambda_{\max} - \eta\hat\lambda_{\max}^2 v_{\max,i}\big),$$
where we used the inequality y/(1 + y) > y(1 − y), for y > 0. For η = (λ̂max − 1)/λ̂²max and recalling vmax,i < 1 for each i, compute

$$\hat\lambda_{\max} - \eta\hat\lambda_{\max}^2 v_{\max,i} = \hat\lambda_{\max} - (\hat\lambda_{\max} - 1)v_{\max,i} > \hat\lambda_{\max} - (\hat\lambda_{\max} - 1) = 1.$$

This concludes our proof that F+(Âηvmax) > ηvmax, for η = (λ̂max − 1)/λ̂²max. Simple calculations show that η = δ/(1 + δ)², so that ηvmax = y(0).

These two properties allow us to analyze the iteration defined in the theorem statement. We just proved that y(1) = F+(Ây(0)) > y(0) = (δ/(1 + δ)²)vmax. This inequality implies F+(Ây(1)) > F+(Ây(0)) and, by induction, F+(Ây(k + 1)) > y(k + 1) = F+(Ây(k)). Each sequence {yi(k)}k∈N, i ∈ {1, . . . , n}, is monotonically increasing and upper bounded by 1. Hence, the sequence {y(k)}k∈N converges to a point x∗ > 0 such that F+(Âx∗) = x∗. This proves the existence of an equilibrium point x∗ = lim_{k→∞} y(k) > 0, as claimed in statements (ii)d and (ii)a.
Regarding statement (ii)b, we claim there exists a bounded sequence {w(k, δ)}k∈Z≥0 ⊂ Rⁿ such that the sequence {y(k)}k∈Z≥0 satisfies y(k) = δvmax + δ²w(k, δ). The statement x∗ = δvmax + O(δ²) is then an immediate consequence of this claim and of the limit lim_{k→∞} y(k) = x∗. We prove the claim by induction. Because δ/(1 + δ)² = δ − 2δ² + O(δ³), the claim is true for k = 0 with w(0, δ) = −2vmax + O(δ). We now assume the claim is true at k and show it true at k + 1:

$$\begin{aligned}
y(k+1) &= F_+\big(\hat A(\delta v_{\max} + \delta^2 w(k,\delta))\big)\\
&= F_+\big(\delta(1+\delta)v_{\max} + \delta^2\hat A w(k,\delta)\big)\\
&= F_+\big(\delta v_{\max} + \delta^2(\hat A w(k,\delta) + v_{\max})\big)\\
&= \delta v_{\max} + \delta^2\big(\hat A w(k,\delta) + v_{\max}\big)\\
&\quad - \operatorname{diag}\big(\delta v_{\max} + \delta^2(\hat A w(k,\delta) + v_{\max})\big)\big(\delta v_{\max} + \delta^2(\hat A w(k,\delta) + v_{\max})\big) + O(\delta^3)\\
&= \delta v_{\max} + \delta^2\big(\hat A w(k,\delta) + v_{\max} - \operatorname{diag}(v_{\max})v_{\max} + O(\delta)\big),
\end{aligned}$$

where we used the Taylor expansion F+(y) = y − diag(y)y + O(‖y‖³). Hence, the claim is true if the sequence {w(k, δ)}k∈Z≥0 defined by

$$w(k+1,\delta) = \hat A w(k,\delta) + v_{\max} - \operatorname{diag}(v_{\max})v_{\max} + O(\delta)$$

is bounded. But the sequence is bounded because the spectral radius of Â equals βλmax/γ < 1. This concludes the proof of statement (ii)b. The proof of statement (ii)c is analogous: it suffices to show the existence of a bounded sequence {w(k)} such that y(k) = 1n − (γ/β)d⁻¹ + (γ/β)²w(k).

To complete the proof of statement (ii) we establish the uniqueness of the equilibrium x∗ ∈ [0, 1]ⁿ \ {0n}. First, we claim that an equilibrium point with an entry equal to 0 must be 0n. Indeed, assume y∗ is an equilibrium point and assume y∗i = 0 for some i ∈ {1, . . . , n}. The equality y∗i = f+((Ây∗)i) implies that also any node j with aij > 0 must satisfy y∗j = 0. Because G is connected, all entries of y∗ must be zero. Second, by contradiction, we assume there exists another equilibrium point y∗ > 0 distinct from x∗. Without loss of generality, assume there exists i such that y∗i < x∗i. Let α ∈ (0, 1) satisfy y∗ ≥ αx∗ > 0 and y∗i = αx∗i.
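Note:

$$\begin{aligned}
\big(F_+(\hat A y^*) - y^*\big)_i &= f_+\big((\hat A y^*)_i\big) - \alpha x^*_i\\
&\ge f_+\big(\alpha(\hat A x^*)_i\big) - \alpha x^*_i && \text{(because } \hat A \ge 0)\\
&> \alpha f_+\big((\hat A x^*)_i\big) - \alpha x^*_i && \text{(because } f_+(\alpha y) > \alpha f_+(y) \text{ for } \alpha < 1)\\
&= \alpha\big(F_+(\hat A x^*) - x^*\big)_i = 0 && \text{(because } x^* \text{ is an equilibrium)}.
\end{aligned}$$

Therefore (F+(Ây∗) − y∗)i > 0, and this contradicts the fact that y∗ is an equilibrium. Regarding statement (iii), we refer to (Fall et al. 2007; Khanafer et al. 2015; Lajmanovich and Yorke 1976) in the interest of brevity. ∎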
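The constructive characterization (ii)d suggests a simple way to compute x∗ numerically. The Python sketch below runs the monotone iteration y(k + 1) = F+(Ây(k)) from y(0) = δ/(1 + δ)² vmax on a four-node “paw” graph (a triangle plus a pendant node) with β = γ = 1, so that δ ≈ 1.17, checks that the limit annihilates the right-hand side of (17.4), and compares it with a forward-Euler simulation started from x(0) = 0.01·1n, in line with statement (iii). The graph, parameter values, iteration counts and helper names are illustrative assumptions.

```python
def dominant_eigpair(adj, iters=1000):
    """Power iteration on A + I_n; returns (lambda_max, v_max) with 1_n^T v_max = 1."""
    n = len(adj)
    v = [1.0 / n] * n
    for _ in range(iters):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        v = [wi / total for wi in w]
    lam = sum(sum(adj[i][j] * v[j] for j in range(n)) for i in range(n))  # 1^T A v
    return lam, v


def f_plus(y):
    """f_+(y) = y / (1 + y)."""
    return y / (1.0 + y)


def sis_field(adj, x, beta, gamma):
    """Right-hand side of the network SIS model (17.4)."""
    n = len(x)
    return [beta * (1.0 - x[i]) * sum(adj[i][j] * x[j] for j in range(n)) - gamma * x[i]
            for i in range(n)]


adj = [[0, 1, 1, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 0],
       [1, 0, 0, 0]]
n, beta, gamma = 4, 1.0, 1.0
lam_max, v_max = dominant_eigpair(adj)
delta = beta * lam_max / gamma - 1.0           # above the threshold: delta > 0

# Iteration (ii)d: y(k+1) = F_+(A_hat y(k)), y(0) = delta / (1 + delta)^2 * v_max.
y = [delta / (1.0 + delta) ** 2 * v for v in v_max]
iterates = [y]
for _ in range(500):
    y = [f_plus(beta / gamma * sum(adj[i][j] * y[j] for j in range(n))) for i in range(n)]
    iterates.append(y)
x_star = iterates[-1]

# Independent check of statement (iii): forward-Euler simulation from a small x(0).
x, dt = [0.01] * n, 0.01
for _ in range(5000):
    x = [xi + dt * fi for xi, fi in zip(x, sis_field(adj, x, beta, gamma))]
print("x* from the iteration:  ", [round(v, 6) for v in x_star])
print("x(50) from simulation:  ", [round(v, 6) for v in x])
```

Because forward Euler has exactly the same equilibria as (17.4), the simulated state should settle on the same point produced by the iteration.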
17.4 The network SIR model
As before, consider an undirected weighted graph $G = (V, E)$ of order $n$ with adjacency matrix $A$. Let $s_i(t), x_i(t), r_i(t) \in [0,1]$ denote the fractions of susceptible, infected, and recovered individuals at
node $i \in V$ at time $t \in \mathbb{R}_{\ge0}$. The network SIR model is
$$
\begin{aligned}
\dot s_i(t) &= -\beta s_i(t)\sum_{j=1}^n a_{ij}x_j(t), \\
\dot x_i(t) &= \beta s_i(t)\sum_{j=1}^n a_{ij}x_j(t) - \gamma x_i(t), \\
\dot r_i(t) &= \gamma x_i(t),
\end{aligned}
$$
where $\beta > 0$ is the infection rate and $\gamma > 0$ is the recovery rate. Note that the third equation is redundant because of the constraint $s_i(t) + x_i(t) + r_i(t) = 1$; therefore, we regard the dynamical system as described by the first two equations and write it in vector form as
$$
\dot s = -\beta\operatorname{diag}(s)Ax, \qquad \dot x = \beta\operatorname{diag}(s)Ax - \gamma x. \tag{17.6}
$$
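To see the model (17.6) in action, the Python sketch below integrates it with the forward Euler method and recovers $r$ from the constraint $s + x + r = \mathbb{1}_n$. The function name `simulate_network_sir`, the step size, and the 4-node ring graph are illustrative assumptions; forward Euler is used only for transparency, and an adaptive higher-order ODE solver would be preferable for accurate trajectories.

```python
import numpy as np


def simulate_network_sir(A, s0, x0, beta, gamma, t_final=20.0, dt=1e-3):
    """Forward-Euler integration of s' = -beta diag(s) A x, x' = beta diag(s) A x - gamma x."""
    A = np.asarray(A, dtype=float)
    s = np.array(s0, dtype=float)
    x = np.array(x0, dtype=float)
    steps = int(round(t_final / dt))
    S = np.empty((steps + 1, len(s)))
    X = np.empty_like(S)
    S[0], X[0] = s, x
    for k in range(steps):
        infection = beta * s * (A @ x)      # entries of beta * diag(s) A x
        s = s - dt * infection
        x = x + dt * (infection - gamma * x)
        S[k + 1], X[k + 1] = s, x
    R = 1.0 - S - X                          # recovered fractions from s + x + r = 1
    return S, X, R


A_ring = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 0]], dtype=float)
s0 = np.array([0.9, 1.0, 1.0, 1.0])          # node 0 starts with 10% infected
x0 = 1.0 - s0                                # no recovered individuals at t = 0
S, X, R = simulate_network_sir(A_ring, s0, x0, beta=1.0, gamma=0.5)
print(S[-1], X[-1])
```

Each Euler step subtracts a nonnegative quantity from $s$ and adds $\gamma x\,dt \ge 0$ to $r$, so the discrete trajectory inherits the monotonicity described in statement (i) of Theorem 17.6.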
Theorem 17.6 (Dynamical behavior of the network SIR model). Consider the network SIR model (17.6) over an undirected graph $G$ with infection rate $\beta$ and recovery rate $\gamma$. Assume $G$ is connected and let $A$ be its adjacency matrix. Let $(\lambda_{\max,0}, v_{\max,0})$ be the dominant eigenpair of the nonnegative matrix $A\operatorname{diag}(s(0))$. The following statements hold:
(i) $t \mapsto s(t)$ is monotonically decreasing and $t \mapsto r(t)$ is monotonically increasing;
(ii) the set of equilibrium points is the set of pairs $(s^*, 0_n)$, for any $s^* \in [0,1]^n$;
(iii) if $\beta\lambda_{\max,0}/\gamma < 1$ and $x(0) \neq 0_n$, then the weighted average $t \mapsto v_{\max,0}^\top x(t)$ decreases monotonically and exponentially to zero, and each trajectory satisfies $x(t) \to 0_n$ as $t \to \infty$;
(iv) if $\beta\lambda_{\max,0}/\gamma > 1$ and $x(0) \neq 0_n$, then, for small time, the weighted average $t \mapsto v_{\max,0}^\top x(t)$ grows exponentially fast with rate $\beta\lambda_{\max,0} - \gamma$, i.e., an epidemic outbreak develops;
(v) each trajectory with initial condition $(s(0), x(0))$ with $x(0) \neq 0_n$ converges asymptotically to an equilibrium point, that is, the epidemic eventually disappears.
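Proof of selected statements in Theorem 17.6. Regarding statement (ii), a point $(s^*, x^*)$ is an equilibrium if
$$
0_n = -\beta\operatorname{diag}(s^*)Ax^*, \qquad 0_n = \beta\operatorname{diag}(s^*)Ax^* - \gamma x^*.
$$
It is easy to see that each point of the form $(s^*, 0_n)$ is an equilibrium. Conversely, summing the two equalities yields $0_n = -\gamma x^*$; hence $x^*$ must be the zero vector. Regarding statement (iii), note that $\dot s_i \le 0$ for each $i$ implies $s(t) \le s(0)$ and, in turn, $\operatorname{diag}(s(t))v_{\max,0} \le \operatorname{diag}(s(0))v_{\max,0}$ for any $s(0) \in [0,1]^n$, because $v_{\max,0} \ge 0$. As before, define $y(t) = v_{\max,0}^\top x(t)$ and compute
$$
\begin{aligned}
\dot y(t) &= \beta v_{\max,0}^\top\operatorname{diag}(s(t))Ax(t) - \gamma v_{\max,0}^\top x(t) \\
&\le \beta v_{\max,0}^\top\operatorname{diag}(s(0))Ax(t) - \gamma v_{\max,0}^\top x(t) \\
&= (\beta\lambda_{\max,0} - \gamma)y(t),
\end{aligned}
$$
where the inequality uses $Ax(t) \ge 0$, and the last equality uses $A\operatorname{diag}(s(0))v_{\max,0} = \lambda_{\max,0}v_{\max,0}$ together with the symmetry of $A$, which gives $v_{\max,0}^\top\operatorname{diag}(s(0))A = \lambda_{\max,0}v_{\max,0}^\top$. Hence $y(t) \le e^{(\beta\lambda_{\max,0} - \gamma)t}y(0)$, which decays exponentially when $\beta\lambda_{\max,0}/\gamma < 1$.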
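The quantities in statements (iii) and (iv) of Theorem 17.6 are easy to compute numerically. The Python sketch below, under the illustrative assumptions of a small ring graph and forward-Euler integration, computes the dominant eigenpair of $A\operatorname{diag}(s(0))$, the ratio $\beta\lambda_{\max,0}/\gamma$, and the weighted average $y = v_{\max,0}^\top x$ along the discrete trajectory, together with the comparison bound $y(0)e^{(\beta\lambda_{\max,0}-\gamma)t}$ from the proof. The function names `sir_threshold` and `weighted_average_and_bound` are hypothetical. Since $1 + h(\beta\lambda_{\max,0}-\gamma) \le e^{h(\beta\lambda_{\max,0}-\gamma)}$ for any step size $h > 0$, the same comparison argument applies step by step, so the Euler iterates also satisfy this bound.

```python
import numpy as np


def sir_threshold(A, s0, beta, gamma):
    """Dominant eigenpair of A diag(s0) and the ratio beta * lambda_max0 / gamma."""
    M = np.asarray(A, dtype=float) @ np.diag(s0)
    eigvals, eigvecs = np.linalg.eig(M)
    k = int(np.argmax(eigvals.real))
    lam = float(eigvals[k].real)
    v = np.abs(eigvecs[:, k].real)      # Perron vector, sign fixed to be nonnegative
    return lam, v, beta * lam / gamma


def weighted_average_and_bound(A, s0, x0, beta, gamma, t_final=5.0, dt=1e-3):
    """Euler trajectory of y = v^T x and the bound y(0) * exp((beta*lam - gamma) * t)."""
    A = np.asarray(A, dtype=float)
    lam, v, ratio = sir_threshold(A, s0, beta, gamma)
    s, x = np.array(s0, dtype=float), np.array(x0, dtype=float)
    steps = int(round(t_final / dt))
    t = dt * np.arange(steps + 1)
    y = np.empty(steps + 1)
    y[0] = v @ x
    for k in range(steps):
        infection = beta * s * (A @ x)
        s, x = s - dt * infection, x + dt * (infection - gamma * x)
        y[k + 1] = v @ x
    bound = y[0] * np.exp((beta * lam - gamma) * t)
    return ratio, y, bound


A_ring = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 0]], dtype=float)
s0 = np.array([0.9, 1.0, 1.0, 1.0])
x0 = 1.0 - s0
ratio_sub, y_sub, bound_sub = weighted_average_and_bound(A_ring, s0, x0, beta=0.2, gamma=1.0)
ratio_sup, y_sup, bound_sup = weighted_average_and_bound(A_ring, s0, x0, beta=2.0, gamma=1.0)
print(ratio_sub, ratio_sup)
```

In the subthreshold run $y$ decreases at every step, while in the supercritical run it initially grows, since at $t = 0$ the inequality in the proof holds with equality.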
17.5 Exercises
E17.1 Network SI model in digraphs. Generalize Theorem 17.3 to the setting of strongly connected directed graphs: (i) what are the equilibrium points? (ii) what are their convergence properties?
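E17.2 Initial evolution of network SIS model. Consider the network SIS model with initial fraction $x(0) = \varepsilon x_0$, where we take $x_0 \ll \mathbb{1}_n$ and $\varepsilon \ll 1$. Show that in the time scale $t(\varepsilon) = \ln(1/\varepsilon)/(\beta\lambda_{\max})$, the linearized evolution satisfies
$$
\lim_{\varepsilon\to0^+} x\big(t(\varepsilon)\big) = \big(v_{\max}^\top x_0\big)v_{\max}.
$$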
Bibliography
J. A. Acebrón, L. L. Bonilla, C. J. P. Vicente, F. Ritort, and R. Spigler. The Kuramoto model: A simple paradigm for synchronization phenomena. Reviews of Modern Physics, 77(1):137–185, 2005.
D. Acemoglu and A. Ozdaglar. Opinion dynamics and learning in social networks. Dynamic Games and Applications, 1(1):3–49, 2011.
R. P. Agaev and P. Y. Chebotarev. The matrix of maximum out forests and its applications. Automation and Remote Control, 61(9):1424–1450, 2000.
B. D. O. Anderson, C. Yu, B. Fidan, and J. M. Hendrickx. Rigid graph control architectures for autonomous formations. IEEE Control Systems Magazine, 28(6):48–63, 2008.
A. Arenas, A. Díaz-Guilera, J. Kurths, Y. Moreno, and C. Zhou. Synchronization in complex networks. Physics Reports, 469(3):93–153, 2008.
L. Asimow and B. Roth. The rigidity of graphs, II. Journal of Mathematical Analysis and Applications, 68(1):171–190, 1979.
H. Bai, M. Arcak, and J. Wen. Cooperative Control Design, volume 89. Springer, 2011.
P. Barooah. Estimation and Control with Relative Measurements: Algorithms and Scaling Laws. PhD thesis, University of California at Santa Barbara, July 2007.
P. Barooah and J. P. Hespanha. Estimation from relative measurements: Algorithms and scaling laws. IEEE Control Systems Magazine, 27(4):57–74, 2007.
M. Benzi, G. H. Golub, and J. Liesen. Numerical solution of saddle point problems. Acta Numerica, 14:1–137, 2005.
A. R. Bergen and D. J. Hill. A structure preserving model for power system stability analysis. IEEE Transactions on Power Apparatus and Systems, 100(1):25–35, 1981.
A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. SIAM, 1994.
D. S. Bernstein. Matrix Mathematics. Princeton University Press, 2nd edition, 2009.
N. Biggs. Algebraic Graph Theory. Cambridge University Press, 2nd edition, 1994. ISBN 0521458978.
V. D. Blondel and A. Olshevsky. How to decide consensus? A combinatorial necessary and sufficient condition and a proof that consensus is decidable but NP-hard. SIAM Journal on Control and Optimization, 52(5):2707–2726, 2014.
B. Bollobás. Modern Graph Theory. Springer, 1998. ISBN 0387984887.
S. Bolognani, S. Del Favero, L. Schenato, and D. Varagnolo. Consensus-based distributed sensor calibration and least-square parameter identification in WSNs. International Journal of Robust and Nonlinear Control, 20(2):176–193, 2010.
P. Bonacich. Factoring and weighting approaches to status scores and clique identification. Journal of Mathematical Sociology, 2(1):113–120, 1972.
S. P. Borgatti and M. G. Everett. A graph-theoretic perspective on centrality. Social Networks, 28(4):466–484, 2006.
S. Boyd, P. Diaconis, and L. Xiao. Fastest mixing Markov chain on a graph. SIAM Review, 46(4):667–689, 2004.
S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized gossip algorithms. IEEE Transactions on Information Theory, 52(6):2508–2530, 2006.
U. Brandes. Centrality: concepts and methods. Slides, May 2006. The International Workshop/School and Conference on Network Science.
U. Brandes and T. Erlebach. Network Analysis: Methodological Foundations. Springer, 2005.
L. Breiman. Probability, volume 7 of Classics in Applied Mathematics. SIAM, 1992. ISBN 0-89871-296-3. Corrected reprint of the 1968 original.
S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks, 30:107–117, 1998.
E. Brown, P. Holmes, and J. Moehlis. Globally coupled oscillator networks. In E. Kaplan, J. E. Marsden, and K. R. Sreenivasan, editors, Perspectives and Problems in Nonlinear Science: A Celebratory Volume in Honor of Larry Sirovich, pages 183–215. Springer, 2003.
A. M. Bruckstein, N. Cohen, and A. Efrat. Ants, crickets, and frogs in cyclic pursuit. Technical Report CIS 9105, Center for Intelligent Systems, Technion, Haifa, Israel, July 1991. Available at http://www.cs.technion.ac.il/tech-reports.
J. Buck. Synchronous rhythmic flashing of fireflies. II. Quarterly Review of Biology, 63(3):265–289, 1988.
F. Bullo, J. Cortés, and S. Martínez. Distributed Control of Robotic Networks. Princeton University Press, 2009. ISBN 978-0-691-14195-4.
R. Carli, F. Fagnani, A. Speranzon, and S. Zampieri. Communication constraints in the average consensus problem. Automatica, 44(3):671–684, 2008.
R. Carli, F. Garin, and S. Zampieri. Quadratic indices for the analysis of consensus algorithms. In Information Theory and Applications Workshop, pages 96–104, San Diego, CA, USA, Feb. 2009.
H. Caswell. Matrix Population Models. John Wiley & Sons, 2001.
N. D. Charkes, P. T. M. Jr, and C. Philips. Studies of skeletal tracer kinetics. I. Digital-computer solution of a five-compartment model of [18F] fluoride kinetics in humans. Journal of Nuclear Medicine, 19(12):1301–1309, 1978.
A. Cherukuri and J. Cortés. Asymptotic stability of saddle points under the saddle-point dynamics. In American Control Conference, Chicago, IL, USA, July 2015. To appear.
S. M. Crook, G. B. Ermentrout, M. C. Vanier, and J. M. Bower. The role of axonal delay in the synchronization of networks of coupled cortical oscillators. Journal of Computational Neuroscience, 4(2):161–172, 1997.
H. Daido. Quasientrainment and slow relaxation in a population of oscillators with random and frustrated interactions. Physical Review Letters, 68(7):1073–1076, 1992.
P. J. Davis. Circulant Matrices. American Mathematical Society, 2nd edition, 1994. ISBN 0828403384.
T. A. Davis and Y. Hu. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software, 38(1):1–25, 2011.
M. H. DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.
P. M. DeMarzo, D. Vayanos, and J. Zwiebel. Persuasion bias, social influence, and unidimensional opinions. The Quarterly Journal of Economics, 118(3):909–968, 2003.
R. Diestel. Graph Theory, volume 173 of Graduate Texts in Mathematics. Springer, 2nd edition, 2000.
F. Dörfler and F. Bullo. On the critical coupling for Kuramoto oscillators. SIAM Journal on Applied Dynamical Systems, 10(3):1070–1099, 2011.
F. Dörfler and F. Bullo. Exploring synchronization in complex oscillator networks, Sept. 2012. Extended version including proofs. Available at http://arxiv.org/abs/1209.1335.
F. Dörfler and F. Bullo. Synchronization in complex networks of phase oscillators: A survey. Automatica, 50(6):1539–1564, 2014.
F. Dörfler and B. Francis. Geometric analysis of the formation problem for autonomous robots. IEEE Transactions on Automatic Control, 55(10):2379–2384, 2010.
G. Droge, H. Kawashima, and M. Egerstedt. Proportional-integral distributed optimization for networked systems. arXiv preprint arXiv:1309.6613, 2013.
C. L. DuBois. UCI Network Data Repository, 2008. URL http://networkdata.ics.uci.edu.
F. Fagnani and S. Zampieri. Randomized consensus algorithms over large scale networks. IEEE Journal on Selected Areas in Communications, 26(4):634–649, 2008.
A. Fall, A. Iggidr, G. Sallet, and J.-J. Tewa. Epidemiological models and Lyapunov functions. Mathematical Modelling of Natural Phenomena, 2(1):62–68, 2007.
L. Farina and S. Rinaldi. Positive Linear Systems: Theory and Applications. John Wiley & Sons, 2000.
M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(2):298–305, 1973.
D. M. Foster and J. A. Jacquez. Multiple zeros for eigenvalues and the multiplicity of traps of a linear compartmental system. Mathematical Biosciences, 26(1):89–97, 1975.
L. R. Foulds. Graph Theory Applications. Universitext. Springer, 1995. ISBN 0387975993.
P. Frasca. Quick convergence proof for gossip consensus. Personal communication, 2012.
J. R. P. French. A formal theory of social power. Psychological Review, 63(3):181–194, 1956.
N. E. Friedkin and E. C. Johnsen. Social influence networks and opinion change. In E. J. Lawler and M. W. Macy, editors, Advances in Group Processes, volume 16, pages 1–29. JAI Press, 1999.
P. A. Fuhrmann and U. Helmke. The Mathematics of Networks of Linear Systems. Springer, 2015. ISBN 3319166468.
C. Gao, J. Cortés, and F. Bullo. Notes on averaging over acyclic digraphs and discrete coverage control. Automatica, 44(8):2120–2127, 2008.
F. Garin and L. Schenato. A survey on distributed estimation and control applications using linear consensus algorithms. In A. Bemporad, M. Heemels, and M. Johansson, editors, Networked Control Systems, LNCIS, pages 75–107. Springer, 2010.
B. Gharesifard and J. Cortés. Distributed continuous-time convex optimization on weight-balanced digraphs. IEEE Transactions on Automatic Control, 59(3):781–786, 2014.
A. K. Ghosh, B. Chance, and E. K. Pye. Metabolic coupling and synchronization of NADH oscillations in yeast cell populations. Archives of Biochemistry and Biophysics, 145(1):319–331, 1971.
D. F. Gleich. PageRank beyond the Web. SIAM Review, 57(3):321–363, 2015.
C. D. Godsil and G. F. Royle. Algebraic Graph Theory, volume 207 of Graduate Texts in Mathematics. Springer, 2001. ISBN 0387952411.
M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx, Oct. 2014.
W. H. Haddad, V. Chellaboina, and Q. Hui. Nonnegative and Compartmental Dynamical Systems. Princeton University Press, 2010.
F. Harary. A criterion for unanimity in French's theory of social power. In D. Cartwright, editor, Studies in Social Power, pages 168–182. University of Michigan, 1959.
J. M. Hendrickx. Graphs and Networks for the Analysis of Autonomous Agent Systems. PhD thesis, Université Catholique de Louvain, Belgium, Feb. 2008.
J. M. Hendrickx and J. N. Tsitsiklis. Convergence of type-symmetric and cut-balanced consensus seeking systems. IEEE Transactions on Automatic Control, 58(1):214–218, 2013.
J. P. Hespanha. Linear Systems Theory. Princeton University Press, 2009. ISBN 0691140219.
H. W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42(4):599–653, 2000.
F. C. Hoppensteadt and E. M. Izhikevich. Synchronization of laser oscillators, associative memory, and optical neurocomputing. Physical Review E, 62(3):4010–4013, 2000.
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985. ISBN 0521386322.
Y. Hu. Efficient, high-quality force-directed graph drawing. Mathematica Journal, 10(1):37–71, 2005.
C. Huygens. Horologium Oscillatorium. Paris, France, 1673.
H. Ishii and R. Tempo. The PageRank problem, multiagent consensus, and web aggregation: A systems and control viewpoint. IEEE Control Systems Magazine, 34(3):34–53, 2014.
M. O. Jackson. Social and Economic Networks. Princeton University Press, 2010.
J. A. Jacquez and C. P. Simon. Qualitative theory of compartmental systems. SIAM Review, 35(1):43–79, 1993.
A. Jadbabaie, J. Lin, and A. S. Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Transactions on Automatic Control, 48(6):988–1001, 2003.
G. Jongen, J. Anemüller, D. Bollé, A. C. C. Coolen, and C. Perez-Vicente. Coupled dynamics of fast spins and slow exchange interactions in the XY spin glass. Journal of Physics A: Mathematical and General, 34(19):3957–3984, 2001.
L. Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953.
H. K. Khalil. Nonlinear Systems. Prentice Hall, 3rd edition, 2002. ISBN 0130673897.
A. Khanafer, T. Başar, and B. Gharesifard. Stability of epidemic models over directed graphs: A positive systems approach. Automatica, 2015. To appear.
G. Kirchhoff. Über die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Verteilung galvanischer Ströme geführt wird. Annalen der Physik und Chemie, 148(12):497–508, 1847.
I. Z. Kiss, Y. Zhai, and J. L. Hudson. Emerging coherence in a population of chemical oscillators. Science, 296(5573):1676–1678, 2002.
M. S. Klamkin and D. J. Newman. Cyclic pursuit or "the three bugs problem". American Mathematical Monthly, 78(6):631–639, 1971.
D. J. Klein, P. Lee, K. A. Morgansen, and T. Javidi. Integration of communication and control using discrete time Kuramoto models for multivehicle coordination over broadcast networks. IEEE Journal on Selected Areas in Communications, 26(4):695–705, 2008.
G. Kozyreff, A. G. Vladimirov, and P. Mandel. Global coupling with time delay in an array of semiconductor lasers. Physical Review Letters, 85(18):3809–3812, 2000.
D. Krackhardt. Cognitive social structures. Social Networks, 9(2):109–134, 1987.
L. Krick, M. E. Broucke, and B. Francis. Stabilization of infinitesimally rigid formations of multi-robot networks. International Journal of Control, 82(3):423–439, 2009.
J. Kunegis. KONECT: the Koblenz network collection. In International Conference on World Wide Web Companion, pages 1343–1350, 2013.
Y. Kuramoto. Self-entrainment of a population of coupled non-linear oscillators. In H. Araki, editor, Int. Symposium on Mathematical Problems in Theoretical Physics, volume 39 of Lecture Notes in Physics, pages 420–422. Springer, 1975. ISBN 978-3-540-07174-7.
Y. Kuramoto. Chemical Oscillations, Waves, and Turbulence. Springer, 1984. ISBN 0387133224.
A. Lajmanovich and J. A. Yorke. A deterministic model for gonorrhea in a nonhomogeneous population. Mathematical Biosciences, 28(3):221–236, 1976.
P. H. Leslie. On the use of matrices in certain population mathematics. Biometrika, 3(3):183–212, 1945.
Z. Lin, B. Francis, and M. Maggiore. Necessary and sufficient graphical conditions for formation control of unicycles. IEEE Transactions on Automatic Control, 50(1):121–127, 2005.
Z. Lin, B. Francis, and M. Maggiore. State agreement for continuous-time coupled nonlinear systems. SIAM Journal on Control and Optimization, 46(1):288–307, 2007.
C. Liu, D. R. Weaver, S. H. Strogatz, and S. M. Reppert. Cellular construction of a circadian clock: period determination in the suprachiasmatic nuclei. Cell, 91(6):855–860, 1997.
S. Łojasiewicz. Sur les trajectoires du gradient d'une fonction analytique. Seminari di Geometria 1982-1983, pages 115–117, 1984. Istituto di Geometria, Dipartimento di Matematica, Università di Bologna, Italy.
D. G. Luenberger. Introduction to Dynamic Systems: Theory, Models, and Applications. John Wiley & Sons, 1979.
D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, 2nd edition, 1984.
J. A. Marshall, M. E. Broucke, and B. A. Francis. Formations of vehicles in cyclic pursuit. IEEE Transactions on Automatic Control, 49(11):1963–1974, 2004.
A. Mauroy, P. Sacré, and R. J. Sepulchre. Kick synchronization versus diffusive synchronization. In IEEE Conf. on Decision and Control, pages 7171–7183, Maui, HI, USA, Dec. 2012.
W. Mei and F. Bullo. Modeling and analysis of competitive propagation with social conversion. In IEEE Conf. on Decision and Control, pages 6203–6208, Los Angeles, CA, USA, Dec. 2014.
R. Merris. Laplacian matrices of a graph: A survey. Linear Algebra and its Applications, 197:143–176, 1994.
M. Mesbahi and M. Egerstedt. Graph Theoretic Methods in Multiagent Networks. Princeton University Press, 2010.
C. D. Meyer. Matrix Analysis and Applied Linear Algebra. SIAM, 2001. ISBN 0898714540.
D. C. Michaels, E. P. Matyas, and J. Jalife. Mechanisms of sinoatrial pacemaker synchronization: a new hypothesis. Circulation Research, 61(5):704–714, 1987.
P. V. Mieghem. The N-intertwined SIS epidemic network model. Computing, 93(2-4):147–169, 2011.
P. V. Mieghem, J. Omic, and R. Kooij. Virus spread in networks. IEEE/ACM Transactions on Networking, 17(1):1–14, 2009.
B. Mohar. The Laplacian spectrum of graphs. In Y. Alavi, G. Chartrand, O. R. Oellermann, and A. J. Schwenk, editors, Graph Theory, Combinatorics, and Applications, volume 2, pages 871–898. John Wiley & Sons, 1991. ISBN 0471532452.
L. Moreau. Stability of continuous-time distributed consensus algorithms. In IEEE Conf. on Decision and Control, pages 3998–4003, Nassau, Bahamas, 2004.
L. Moreau. Stability of multiagent systems with time-dependent communication links. IEEE Transactions on Automatic Control, 50(2):169–182, 2005.
Z. Néda, E. Ravasz, T. Vicsek, Y. Brechet, and A.-L. Barabási. Physics of the rhythmic applause. Physical Review E, 61(6):6987–6992, 2000.
M. E. J. Newman. Networks: An Introduction. Oxford University Press, 2010. ISBN 0199206651.
I. Noy-Meir. Desert ecosystems: environment and producers. Annual Review of Ecology and Systematics, pages 25–51, 1973.
K.-K. Oh, M.-C. Park, and H.-S. Ahn. A survey of multi-agent formation control: Position-, displacement-, and distance-based approaches. Technical Report GIST DCASL TR 2012-02, Gwangju Institute of Science and Technology, Korea, 2012.
K.-K. Oh, M.-C. Park, and H.-S. Ahn. A survey of multi-agent formation control. Automatica, 53:424–440, 2015.
R. Olfati-Saber. Flocking for multi-agent dynamic systems: Algorithms and theory. IEEE Transactions on Automatic Control, 51(3):401–420, 2006.
R. Olfati-Saber, E. Franco, E. Frazzoli, and J. S. Shamma. Belief consensus and distributed hypothesis testing in sensor networks. In P. J. Antsaklis and P. Tabuada, editors, Network Embedded Sensing and Control (Proceedings of NESC'05 Workshop), Lecture Notes in Control and Information Sciences, pages 169–182. Springer, 2006. ISBN 3540327940.
A. Olshevsky and J. N. Tsitsiklis. On the nonexistence of quadratic Lyapunov functions for consensus algorithms. IEEE Transactions on Automatic Control, 53(11):2642–2645, 2008.
R. W. Owens. An algorithm to solve the Frobenius problem. Mathematics Magazine, 76(4):264–275, 2003.
L. Page. Method for node ranking in a linked database, Sept. 2001. US Patent 6,285,999.
D. A. Paley, N. E. Leonard, R. Sepulchre, D. Grunbaum, and J. K. Parrish. Oscillator models and collective motion. IEEE Control Systems Magazine, 27(4):89–105, 2007.
J. Pantaleone. Stability of incoherence in an isotropic gas of oscillating neutrinos. Physical Review D, 58(7):073002, 1998.
G. Piovan, I. Shames, B. Fidan, F. Bullo, and B. D. O. Anderson. On frame and orientation localization for relative sensing networks. Automatica, 49(1):206–213, 2013.
H. V. Poor. An Introduction to Signal Detection and Estimation. Springer, 1994.
V. Rakočević. On continuity of the Moore-Penrose and Drazin inverses. Matematichki Vesnik, 49(3-4):163–172, 1997.
B. S. Y. Rao and H. F. Durrant-Whyte. A decentralized Bayesian algorithm for identification of tracked targets. IEEE Transactions on Systems, Man & Cybernetics, 23(6):1683–1698, 1993.
W. Ren, R. W. Beard, and E. M. Atkins. Information consensus in multivehicle cooperative control: Collective group behavior through local interaction. IEEE Control Systems Magazine, 27(2):71–82, 2007.
R. Sepulchre, D. A. Paley, and N. E. Leonard. Stabilization of planar collective motion: All-to-all communication. IEEE Transactions on Automatic Control, 52(5):811–824, 2007.
P. N. Shivakumar, J. J. Williams, Q. Ye, and C. A. Marinov. On two-sided bounds related to weakly diagonally dominant M-matrices with application to digital circuit dynamics. SIAM Journal on Matrix Analysis and Applications, 17(2):298–312, 1996.
O. Simeone, U. Spagnolini, Y. Bar-Ness, and S. H. Strogatz. Distributed synchronization in wireless networks. IEEE Signal Processing Magazine, 25(5):81–97, 2008.
J. W. Simpson-Porco, F. Dörfler, and F. Bullo. Droop-controlled inverters are Kuramoto oscillators. In IFAC Workshop on Distributed Estimation and Control in Networked Systems, pages 264–269, Santa Barbara, CA, USA, Sept. 2012.
S. L. Smith, M. E. Broucke, and B. A. Francis. A hierarchical cyclic pursuit scheme for vehicle networks. Automatica, 41(6):1045–1053, 2005.
E. H. Spanier. Algebraic Topology. Springer, 1994.
S. H. Strogatz. From Kuramoto to Crawford: Exploring the onset of synchronization in populations of coupled oscillators. Physica D: Nonlinear Phenomena, 143(1):1–20, 2000.
A. Tahbaz-Salehi and A. Jadbabaie. A necessary and sufficient condition for consensus over random networks. IEEE Transactions on Automatic Control, 53(3):791–795, 2008.
Y. Takeuchi. Global Dynamical Properties of Lotka-Volterra Systems. World Scientific Publishing, 1996. ISBN 9810224710.
H. G. Tanner, A. Jadbabaie, and G. J. Pappas. Flocking in fixed and switching networks. IEEE Transactions on Automatic Control, 52(5):863–868, 2007.
P. A. Tass. A model of desynchronizing deep brain stimulation with a demand-controlled coordinated reset of neural subpopulations. Biological Cybernetics, 89(2):81–88, 2003.
F. Varela, J. P. Lachaux, E. Rodriguez, and J. Martinerie. The brainweb: Phase synchronization and large-scale integration. Nature Reviews Neuroscience, 2(4):229–239, 2001.
T. Vicsek, A. Czirók, E. Ben-Jacob, I. Cohen, and O. Shochet. Novel type of phase transition in a system of self-driven particles. Physical Review Letters, 75(6-7):1226–1229, 1995.
T. J. Walker. Acoustic synchrony: two mechanisms in the snowy tree cricket. Science, 166(3907):891–894, 1969.
G. G. Walter and M. Contreras. Compartmental Modeling with Networks. Birkhäuser, 1999.
J. Wang and N. Elia. Control approach to distributed optimization. In Allerton Conf. on Communications, Control and Computing, pages 557–561, Monticello, IL, USA, 2010.
Y. Wang, D. Chakrabarti, C. Wang, and C. Faloutsos. Epidemic spreading in real networks: An eigenvalue viewpoint. In IEEE Int. Symposium on Reliable Distributed Systems, pages 25–34, Oct. 2003.
A. Watton and D. W. Kydon. Analytical aspects of the N-bug problem. American Journal of Physics, 37(2):220–221, 1969.
A. T. Winfree. Biological rhythms and the behavior of populations of coupled oscillators. Journal of Theoretical Biology, 16(1):15–42, 1967.
J. Wolfowitz. Product of indecomposable, aperiodic, stochastic matrices. Proceedings of American Mathematical Society, 14(5):733–737, 1963.
W. Xia and M. Cao. Sarymsakov matrices and asynchronous implementation of distributed coordination algorithms. IEEE Transactions on Automatic Control, 59(8):2228–2233, 2014.
L. Xiao and S. Boyd. Fast linear iterations for distributed averaging. Systems & Control Letters, 53:65–78, 2004.
L. Xiao, S. Boyd, and S. Lall. A scheme for robust distributed sensor fusion based on average consensus. In Symposium on Information Processing of Sensor Networks, pages 63–70, Los Angeles, CA, USA, Apr. 2005.
R. A. York and R. C. Compton. Quasi-optical power combining using mutually synchronized oscillator arrays. IEEE Transactions on Microwave Theory and Techniques, 39(6):1000–1009, 2002.
S. Zampieri. Lecture Notes on Dynamics over Networks. Minicourse at UC Santa Barbara, Apr. 2013.
D. Zelazo. Graph-Theoretic Methods for the Analysis and Synthesis of Networked Dynamic Systems. PhD thesis, University of Washington, 2009.
D. Zelazo and M. Mesbahi. Edge agreement: Graph-theoretic performance bounds and passivity analysis. IEEE Transactions on Automatic Control, 56(3):544–555, 2011.