Real World Graphs

How do we analyze graphs?

How do different graphs compare?

What makes graphs "special"?

Goals

  • Examine the characteristics of five different real world graphs
  • These graphs all come from different domains
    • Most were not designed to have specific properties
    • All grew organically
    • You would not expect them to have structure or be similar to each other
  • Evaluation metrics
    • Degree distribution, assortativity
    • Clustering coefficient
    • Shortest path distances, eccentricity
    • Resiliency against node removal
  • Compare against a synthetic baseline
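
Two of the metrics listed above can be sketched in a few lines. This is a minimal illustration, assuming an undirected graph stored as a dict of neighbor sets; the helper names and the `toy` graph are hypothetical and not part of the original experiments.

```python
from collections import deque

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def eccentricity(adj, v):
    """Largest shortest-path distance from v to any reachable node."""
    return max(bfs_distances(adj, v).values())

# Hypothetical toy graph: a triangle (0, 1, 2) with a tail 2-3-4.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
degrees = {v: len(nbrs) for v, nbrs in toy.items()}

print(degrees[2])                # 3
print(local_clustering(toy, 0))  # 1.0 (both neighbors are linked)
print(eccentricity(toy, 0))      # 3 (node 4 is three hops away)
```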

Cast of Characters

  • Facebook: Snapshot of Mexico regional network
    • Nodes: 598140, Edges: 4552493, Collected: 2009
  • Web: Web graph subsample from Google
    • Nodes: 875713, Edges: 5105039, Collected: 2002
  • Email: Complete email communication network from Enron
    • Nodes: 36692, Edges: 183831, Collected: 2004
  • P2P: Complete Gnutella peer to peer network
    • Nodes: 62586, Edges: 147892, Collected: August 31 2002
  • Citations: Complete Arxiv high energy physics citation graph
    • Nodes: 34546, Edges: 421578, Collected: 2003

Choosing a Baseline for Comparison

Which graph type best corresponds with real world graphs?
 
  • Complete graph
  • Ring
  • Star
  • Tree
  • Bipartite
  • Random

Random Graphs

  • Graphs we will study are considered to be random graphs
    • Very sparse (i.e. not complete)
    • Not "designed" to have structure (i.e. not a ring or star)
    • Not hierarchical (i.e. not a tree)
    • Not divided into classes of nodes (i.e. not bipartite)
  • Random graph generation
    • Known as Erdős-Rényi graphs, or binomial graphs
    • \(G_{n, p}\): Choose number of nodes \(n\) and probability \(p\)
    • Form each edge \((u, v) \in G\) with probability \(p\)
  • Graph used in these experiments
    • \(G_{10000, 0.001}\) -- Nodes: 10000, Edges: 100000
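
The generation procedure above can be sketched as follows. This is a minimal illustration, assuming an undirected graph without self-loops; the function names and the small parameters used here are hypothetical choices, not the setup used for the experiments.

```python
import random
from itertools import combinations

def gnp_random_graph(n, p, seed=None):
    """Sample G(n, p): each unordered pair becomes an edge with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def edge_count(adj):
    """Each undirected edge is stored twice, once per endpoint."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

g = gnp_random_graph(200, 0.05, seed=1)
expected_edges = 0.05 * 200 * 199 / 2  # p * n * (n - 1) / 2
print(edge_count(g), expected_edges)
```

For \(n = 10000\) and \(p = 0.001\) the same formula gives roughly 50,000 unordered pairs. A total near 100,000 would match sampling each ordered pair \((u, v)\) separately, i.e. a directed graph, which may be how the count above was obtained.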

How to read PDFs, CDFs, and CCDFs
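
As a rough guide to the three curves, the sketch below computes all of them from a list of node degrees. It assumes the convention CCDF(k) = P(degree > k); some texts use P(degree ≥ k) instead. The function name and the sample degree list are hypothetical.

```python
from collections import Counter

def degree_pdf_cdf_ccdf(degrees):
    """Return rows (k, pdf, cdf, ccdf) for each distinct degree k."""
    n = len(degrees)
    counts = Counter(degrees)
    rows = []
    cumulative = 0
    for k in sorted(counts):
        cumulative += counts[k]
        pdf = counts[k] / n          # P(degree == k)
        cdf = cumulative / n         # P(degree <= k)
        ccdf = (n - cumulative) / n  # P(degree > k)
        rows.append((k, pdf, cdf, ccdf))
    return rows

# Hypothetical degree sequence: many low-degree nodes and one hub.
sample = [1, 1, 1, 1, 2, 2, 3, 8]
for row in degree_pdf_cdf_ccdf(sample):
    print(row)
```

On a log-log plot, a long-tailed distribution shows up as a CCDF that falls off slowly and roughly linearly, while the CDF saturates near 1 and hides the tail.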

Degree Distributions (CCDF)

Degree Distributions (CDF)

Clustering Coefficient

\(k_{nn}\) and Assortativity

Shortest Path Distances

Eccentricity

Resiliency
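
One common way to quantify resiliency is to remove nodes and track the size of the largest connected component. The sketch below assumes that measure and compares random removal with removal of the highest-degree nodes first (ranked by their initial degree); the helper names and the star-shaped toy graph are hypothetical.

```python
import random

def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

def removal_order(adj, targeted, seed=None):
    """Highest initial degree first if targeted, otherwise a random order."""
    nodes = list(adj)
    if targeted:
        return sorted(nodes, key=lambda v: len(adj[v]), reverse=True)
    random.Random(seed).shuffle(nodes)
    return nodes

# Hypothetical toy graph: hub 0 connected to leaves 1..9.
star = {0: set(range(1, 10))}
star.update({leaf: {0} for leaf in range(1, 10)})

hub_first = removal_order(star, targeted=True)[:1]
print(largest_component_size(star, hub_first))  # 1: removing the hub shatters the graph
print(largest_component_size(star, [5]))        # 9: removing a leaf barely matters
```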

Takeaways So Far...

  • Real world graphs are significantly different from random graphs
    • Degree distributions have long tails
      • Many, many low degree nodes...
      • But also a small core of high-degree super-nodes
    • Small-world phenomenon
      • More clustering than a random graph...
      • But also relatively short average path lengths
      • Known as the tightly-clustered fringe
  • Significant variations of characteristics among real world graphs
  • No single metric tells the whole story about a graph

What is so important about the degree distribution?

  • Great deal of focus in the literature on the degree distribution
    • Especially power-law degree distributions
  • Are other metrics "linked" to the degree distribution?
  • Conduct an experiment
    • Take real graphs and re-wire them
    • Each node maintains its original degree...
    • But the endpoints of all the edges change
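
One way to carry out such a rewiring is the double edge swap: pick two edges \((a, b)\) and \((c, d)\) and replace them with \((a, d)\) and \((c, b)\). The sketch below assumes that technique on a simple undirected graph; it is not necessarily the procedure used for the results that follow, and the function names and example edge list are hypothetical.

```python
import random
from collections import Counter

def rewire_preserving_degrees(edges, n_swaps, seed=None, max_tries=None):
    """Apply up to n_swaps double edge swaps; every node keeps its degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    if max_tries is None:
        max_tries = 100 * n_swaps
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both ways of pairing the endpoints
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i] = (a, d)
        edges[j] = (c, b)
        done += 1
    return edges

def degree_sequence(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

# Hypothetical input: a 12-node ring plus two chords.
original = [(i, (i + 1) % 12) for i in range(12)] + [(0, 6), (3, 9)]
rewired = rewire_preserving_degrees(original, n_swaps=20, seed=7)
print(degree_sequence(rewired) == degree_sequence(original))  # True
```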

Degree Distributions

Shortest Path Distances

Clustering Coefficient

Takeaways

  • Long-tailed degree distributions are prevalent in real world graphs
    • We would not expect real world graphs to have this feature
    • That so many graphs do tells us this is an important, emergent characteristic
  • But degree distribution is not the whole story
    • Clustering, path lengths, assortativity, etc. are equally important
    • These metrics are not determined by the degree distribution alone

Modeling Real World Graphs

Understanding Emergent Graph Properties

  • Real world graphs have many unexpected features
    • Long tailed degree distributions
    • Tightly clustered fringes
    • Short average path lengths
    • Resiliency against random destruction
  • What natural process creates graphs with these characteristics?
    • Understanding this process can lead to great insight about the natural world
    • Applicable to many domains, e.g. biology, sociology, computer science, etc.

Graph Models

  • Key idea:
    • Create a simple model that generates graphs with desired characteristics
    • Intuition behind algorithm (hopefully) reflects real world processes
  • Example from physics: \(F = ma\)
    • Extremely simple model, only three variables
    • Enables us to predict projectile motion, celestial orbits, etc.
    • Imparts fundamental understanding about the laws and relationships in nature

Erdős-Rényi Model

  • Introduced in 1959
  • Generate a uniformly random graph
    • \(G_{n, p} = (V, E)\)
    • \(n = |V|\)
    • Form each edge \((u, v) \in E\) with probability \(p\)
  • Not a good fit for real world graphs
    • Short-tailed degree distribution
    • Near-zero clustering in sparse graphs (expected clustering coefficient is \(p\))
    • Assortativity is approximately zero
    • Path lengths are too short
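
The first two shortcomings follow from the model definition. As a sketch using standard results for \(G_{n, p}\) (the symbols \(c\) for the mean degree and \(C\) for the local clustering coefficient are notation introduced here, not from the slides):

\[
P(\deg(v) = k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k} \approx \frac{c^{k} \mathrm{e}^{-c}}{k!}, \qquad c = p(n-1)
\]

\[
\mathbb{E}[C] = p
\]

The Poisson tail decays faster than any power law, and two neighbors of a node are linked with probability \(p\) like any other pair, so clustering vanishes as the graph gets sparser. For \(G_{10000, 0.001}\) this gives \(c \approx 10\) and an expected clustering coefficient of about 0.001.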

Watts-Strogatz Model

  • Introduced in 1998
  • Key ideas
    • Start with a uniform, tightly clustered graph (a ring lattice)
    • Randomly rewire edges to introduce "shortcuts"
    • Resulting graph is still highly clustered, but also has short path lengths
  • Model parameters
    • \(G_{n, k, p} = (V, E)\)
    • \(n = |V|\)
    • Connect each node to its \(k\) nearest neighbors in the ring
    • With probability \(p\), rewire each edge \((u, v) \in E\) to \((u, v')\), where \(v' \in V\) is chosen uniformly at random
  • Resulting graph is small-world
    • But does not have a power-law degree distribution
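
The construction can be sketched as below. This is a minimal illustration, assuming \(k\) is even (each node links to \(k/2\) neighbors on each side) and that a rewired edge never creates a self-loop or a duplicate edge; the function name and parameter values are hypothetical.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring lattice on n nodes with k neighbors each, then rewire with probability p."""
    if k % 2 or not 0 < k < n:
        raise ValueError("k must be even and satisfy 0 < k < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for offset in range(1, k // 2 + 1):
            w = (v + offset) % n
            adj[v].add(w)
            adj[w].add(v)
    # Visit every original lattice edge (u, v) once and maybe rewire it to (u, v').
    for offset in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + offset) % n
            if rng.random() < p:
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if not candidates:
                    continue
                new_v = rng.choice(candidates)
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(new_v)
                adj[new_v].add(u)
    return adj

lattice = watts_strogatz(30, 4, 0.0)              # p = 0: pure ring lattice
small_world = watts_strogatz(30, 4, 0.3, seed=2)  # some shortcuts added
print(sum(len(nbrs) for nbrs in small_world.values()) // 2)  # 60 edges, same as the lattice
```

Rewiring moves edges rather than adding them, so the edge count stays at \(nk/2\) for every \(p\).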

Example Watts-Strogatz Graph

Parameters: \(G_{30, 4, 0}\)
 
 
Avg. Path Len: 4.14
Avg. Clustering: 0.5
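
Both numbers can be reproduced directly. The sketch below builds the ring lattice for \(n = 30\), \(k = 4\) with no rewiring and measures it with breadth-first search; the helper names are hypothetical.

```python
from collections import deque

def ring_lattice(n, k):
    """Each node is linked to its k/2 nearest neighbors on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for offset in range(1, k // 2 + 1):
            w = (v + offset) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def average_path_length(adj):
    """Mean hop distance over all ordered pairs of distinct, connected nodes."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def average_clustering(adj):
    """Mean over nodes of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for v, nbrs in adj.items():
        nbrs = list(nbrs)
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbrs[j] in adj[nbrs[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

lattice = ring_lattice(30, 4)
print(round(average_path_length(lattice), 2))  # 4.14
print(average_clustering(lattice))             # 0.5
```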

Average Path Length vs. Clustering Coefficient

Barabási-Albert Model

  • Introduced in 1999
  • Sometimes called Preferential Attachment
    • Exhibits a rich-get-richer pattern
  • Model parameters
    • \(G_{n, m} = (V, E)\)
    • \(n = |V|\)
    • Connect each new node to \(m\) existing nodes
    • Probability of connecting to node \(i\) with degree \(k_i\): \(\Pi(k_i) = \frac{k_i}{\sum\limits_{j \in V} k_j}\)
  • Resulting graph has:
    • Power-law degree distribution \(P(k) \sim k^{-\gamma}\)
    • Scale-free behavior
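
Preferential attachment can be sketched with a simple trick: keep a list in which every node appears once per incident edge, so a uniform draw from that list selects node \(i\) with probability proportional to \(k_i\). The version below is a minimal illustration, assuming the graph starts from \(m\) isolated nodes that the first new node connects to; the function name and parameters are hypothetical.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node attaches to m distinct existing nodes."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    endpoint_pool = []         # node i appears k_i times
    targets = list(range(m))   # the first new node links to the m initial nodes
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        endpoint_pool.extend(targets)
        endpoint_pool.extend([new] * m)
        # Draw m distinct targets for the next node, proportional to degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoint_pool))
        targets = list(chosen)
    return adj

g = barabasi_albert(500, 3, seed=5)
degrees = sorted((len(nbrs) for nbrs in g.values()), reverse=True)
print(degrees[:5], degrees[-5:])  # a few hubs versus many low-degree nodes
```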

Power-laws In Action

Nearest Neighbor Model

  • Introduced in 2003
  • Based on intuition about social dynamics
    • Your friends are likely to be friends with each other
  • Model parameters
    • \(G_{n, u} = (V, E)\)
    • \(n = |V|\)
    • With probability \(u\), add a new node and connect it to a random node
    • Otherwise, randomly close a triangle in the graph
  • Resulting graph has:
    • Power-law degree distribution with \(\gamma > 2\)
    • Tightly-clustered fringe
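
The two growth steps can be sketched as follows. This is a simplified illustration: "closing a triangle" is assumed to mean picking a random node and linking two of its neighbors, which may differ in detail from the published model. The function name, the two-node starting graph, and the parameter values are hypothetical.

```python
import random

def nearest_neighbor_graph(n, u, seed=None):
    """Grow a graph to n nodes by adding nodes (prob. u) or closing triangles (prob. 1 - u)."""
    if not 0 < u <= 1:
        raise ValueError("u must be in (0, 1]")
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed starting point: a single edge
    while len(adj) < n:
        if rng.random() < u:
            new = len(adj)
            target = rng.randrange(new)
            adj[new] = {target}
            adj[target].add(new)
        else:
            center = rng.randrange(len(adj))
            nbrs = list(adj[center])
            if len(nbrs) < 2:
                continue  # no open triangle around this node
            a, b = rng.sample(nbrs, 2)
            adj[a].add(b)  # no effect if a and b are already linked
            adj[b].add(a)
    return adj

g = nearest_neighbor_graph(300, 0.4, seed=11)
print(len(g), sum(len(nbrs) for nbrs in g.values()) // 2)
```

Smaller values of \(u\) spend more steps closing triangles between node arrivals, which adds edges and raises clustering.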

Many, Many Graph Models

  • Random Walk Model (2003)
    • Emulates pattern of friend discovery in social networks
    • Add a new node \(v\), and begin a random walk starting at a random node
    • At each step of the walk, connect \(v\) to that node with probability \(q_v\)
  • Forest Fire Model (2005)
    • Builds graphs with diameters that shrink as they grow larger
    • Add a new node \(v\), and randomly connect it to a node \(w\)
    • With probability \(p\), "burn" (i.e. connect) \(v\) to each of \(w\)'s neighbors
    • Continue this process recursively from each burned node
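
The Forest Fire steps above can be sketched as follows. This is a simplified, undirected illustration of the process as described here; the published model is directed and uses separate forward and backward burning probabilities. The function name and parameter values are hypothetical.

```python
import random

def forest_fire_graph(n, p, seed=None):
    """Each new node links to a random node w, then the fire spreads with probability p."""
    rng = random.Random(seed)
    adj = {0: set()}
    for v in range(1, n):
        w = rng.randrange(v)  # random existing node
        burned = {w}
        frontier = [w]
        while frontier:
            x = frontier.pop()
            for y in adj[x]:
                if y not in burned and rng.random() < p:
                    burned.add(y)
                    frontier.append(y)
        # Connect the new node to everything the fire reached.
        adj[v] = set(burned)
        for x in burned:
            adj[x].add(v)
    return adj

sparse = forest_fire_graph(200, 0.0, seed=3)  # p = 0: the fire never spreads
dense = forest_fire_graph(200, 0.3, seed=3)
print(sum(len(nbrs) for nbrs in sparse.values()) // 2)  # 199 edges, a random tree
print(sum(len(nbrs) for nbrs in dense.values()) // 2)
```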

Fitting Models to Real World Graphs

  • d\(k\) Model (2006)
    • Precisely captures real world graphs using joint degree distributions
      • d\(k\)-1: degree distribution
      • d\(k\)-2: joint degree distribution
      • d\(k\)-3: tri-degree distribution (captures clustering)
      • etc.
    • Very accurate, but very costly
      • State-space (i.e. memory) explodes as \(k\) increases
      • Graph generators for \(k \ge 3\) do not currently exist
  • Kronecker Graphs (2007)
    • Uses Kronecker multiplication to recursively "evolve" an initiator graph
    • Use MLE to fit the evolved graph to a real world graph
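
The recursive "evolution" step is repeated Kronecker multiplication of the initiator's adjacency matrix with itself. The sketch below shows only that deterministic step on a hypothetical 2-node initiator; the stochastic variant (entries as edge probabilities) and the MLE fitting are not shown.

```python
def kronecker_product(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rows_b, cols_b = len(b), len(b[0])
    return [[a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
             for j in range(len(a[0]) * cols_b)]
            for i in range(len(a) * rows_b)]

def kronecker_power(initiator, k):
    """k-th Kronecker power: an N x N initiator yields an N^k x N^k matrix."""
    result = initiator
    for _ in range(k - 1):
        result = kronecker_product(result, initiator)
    return result

# Hypothetical initiator: two linked nodes, with a self-loop on node 0.
initiator = [[1, 1],
             [1, 0]]
k2 = kronecker_power(initiator, 2)
for row in k2:
    print(row)
# [1, 1, 1, 1]
# [1, 0, 1, 0]
# [1, 1, 0, 0]
# [1, 0, 0, 0]
```

Each multiplication squares nothing but multiplies sizes: the node count grows by a factor of \(N\) and the number of nonzero entries by the number of nonzero entries in the initiator, so the same pattern repeats at every scale.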

Microscopic Model

  • Introduced in 2008
  • Models dynamic graphs that grow over time
  • Model parameters: \(N()\), \(\lambda\), \(\alpha\), \(\beta\)
    • Node arrival function \(N()\), typically a quadratic over time \(t\)
    • On arrival, node \(v\) samples its lifetime \(a\) from the exponential distribution \(p_l(a) = \lambda \mathrm{e}^{-\lambda a}\)
    • Attach \(v\) using preferential attachment
    • Node \(v\) with degree \(d_v\) samples its sleep time \(\delta\) from \(p_g(\delta) \propto \delta^{-\alpha} \mathrm{e}^{-\beta d_v \delta}\)
    • When \(v\) wakes up, if its lifetime has not expired, close a triangle that includes \(v\)
  • Complicated model, but produces power-law, tightly clustered graphs
  • One of very few models that capture how graphs evolve over time

Microscopic Model In Action

Comparing Preferential Attachment (PA) and Microscopic Model (RR) to the actual Flickr social graph

Discussion

  • Which model produces the most realistic graphs?
    • That depends on what kind of graph you want
    • Different models produce graphs that emphasize different metrics
      • Power-law degree distribution
      • Clustered fringe
      • Shrinking diameter
      • etc.
  • How do you get the best, most realistic graphs from models?
    • Most models have lots of parameters. How do you choose the right values?
    • Some models are designed to fit real graphs, but these models are very expensive