Today's Topic: User Interactions

Data Gathering Methodology

Crawling Facebook

  • Complete BFS crawl of the 22 largest regional networks in 2008
    • Facebook had ~67M total users, 66% of whom belonged to a regional network
    • Examples: London, Australia, Turkey, France, Toronto, NYC, etc.
    • By default, full profile and friend list visible to other users in your region
  • Dataset summary
    • 10.6M total users, 408M total edges
    • ~50% coverage of each region: could not crawl private users
  • Gathered visible interactions as well as social edges
    • Complete history of incoming wall posts and photo comments for each user
    • Why couldn't we crawl outgoing interactions?
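A minimal sketch of the BFS crawl described above, not the crawler actually used. The fetcher `get_friends` is a hypothetical callback that returns a friend list, or `None` for a private profile that cannot be crawled. Private users still appear as edge endpoints but are never expanded, which is one way coverage can fall short of a full region.

```python
from collections import deque

def bfs_crawl(seed, get_friends):
    """Breadth-first crawl starting from `seed`.

    `get_friends(user)` is a hypothetical fetcher: it returns a list of
    friend IDs, or None when the profile is private.
    """
    seen = {seed}          # every user ever queued
    crawled = set()        # users whose friend list was visible
    edges = set()          # undirected friendship edges
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        friends = get_friends(user)
        if friends is None:            # private profile: skip expansion
            continue
        crawled.add(user)
        for friend in friends:
            edges.add(frozenset((user, friend)))
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return crawled, edges

# Toy region: "c" is private, so it is seen but never crawled.
toy = {"a": ["b", "c"], "b": ["a", "d"], "c": None, "d": ["b"]}
crawled, edges = bfs_crawl("a", toy.get)
```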

Crawling Renren

  • Renren is the largest and oldest OSN in China
    • Launched in 2005
    • Today, boasts over 200 million users
    • At the time (2009), Renren was virtually identical to Facebook
  • Conducted two separate crawls of Renren
    • First crawl: complete BFS crawl of Renren
      • All friend lists are publicly visible, no way to make them private
      • 42M total nodes, 1.6B total edges
    • Second crawl: focused crawl of the PKU network
      • We were able to make many accounts in the Peking University network
      • By default, these accounts can view interactions of other PKU users
      • Crawled 61K users (out of 100K possible)
      • Complete history of wall posts, status updates, blog entries, photos/tags, etc.

Gathering Latent Interactions

  • Each user profile on Renren has a list of the 9 most recent visitors
  • Unique source of data not available on other OSNs
  • Challenge: how to gather continuous history of visitors to profiles?
    • Revisit each profile at regular intervals and grab visitor list

Sequencing Latent Interactions

Prioritized Crawling

  • Issue: visitor list only holds last 9 users
    • Need to crawl frequently to avoid missed visits
    • Limited HTTP requests per day
  • Solution: build a prioritized crawler
    • Visit popular users more frequently
    • Most users can safely be crawled once per day
  • Problem: how do we know we have captured all visitors?
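One way to sketch the prioritized crawler: pick each user's crawl interval so that the expected number of new visits between crawls stays below the 9-entry list size, capped at once per day. The `safety` margin, the per-user visit-rate estimates, and the function names are assumptions for illustration; the daily HTTP request budget is not modeled here.

```python
import heapq

LIST_SIZE = 9  # length of the recent-visitor list, from the slide

def crawl_interval_hours(visits_per_day, max_interval=24.0, safety=0.5):
    """Hours between crawls so that expected new visits per interval stay
    below safety * LIST_SIZE. `safety` is an assumed margin."""
    if visits_per_day <= 0:
        return max_interval
    interval = 24.0 * safety * LIST_SIZE / visits_per_day
    return min(max_interval, interval)

def schedule(visit_rates, horizon_hours):
    """Return (time, user) crawl events in time order, up to horizon_hours.

    visit_rates maps user -> estimated profile visits per day.
    """
    heap = [(0.0, user) for user in visit_rates]
    heapq.heapify(heap)
    events = []
    while heap and heap[0][0] < horizon_hours:
        t, user = heapq.heappop(heap)
        events.append((t, user))
        next_t = t + crawl_interval_hours(visit_rates[user])
        heapq.heappush(heap, (next_t, user))
    return events

# A popular profile is revisited every 2 hours; a typical one once a day.
rates = {"popular": 54.0, "typical": 2.0}
events = schedule(rates, 24.0)
```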

Validating the Methodology

  • Experiment: crawl a subset of users every 15 minutes
  • Extrapolate what visits would have been missed at different crawl frequencies
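The extrapolation can be sketched as a replay: take the visit times recovered from the dense crawl and count how many visits would have scrolled off the 9-entry list under a coarser crawl interval. This assumes per-visit timestamps are available and that every visit adds one entry to the list; both are simplifications, and `missed_visits` is a hypothetical helper.

```python
from collections import Counter

def missed_visits(visit_times, interval, list_size=9):
    """Count visits a crawler polling every `interval` minutes would miss.

    visit_times: visit timestamps in minutes for one profile.
    Within each polling window, only the last `list_size` visits are still
    on the visitor list when the crawler arrives; earlier ones are lost.
    """
    per_window = Counter(int(t // interval) for t in visit_times)
    return sum(max(0, count - list_size) for count in per_window.values())

# Toy profile: one visit every 2 minutes for an hour (30 visits).
visits = list(range(0, 60, 2))
loss_by_interval = {m: missed_visits(visits, m) for m in (15, 30, 60)}
```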

Social Graph Analysis

Degree Distribution

Clustering Distribution

Visible Interactions on Facebook

Interactions Over Social Graphs

  • Analysis of visible interactions on Facebook
    • Data drawn from wall posts and photo comments
    • All interactions are incoming, e.g. friends who write on your wall
    • Keep in mind: in 2008, there was no news-feed!
  • Questions
    • How are interactions distributed amongst your friends?
    • Are all users equally interactive?
    • Is there a relationship between social degree and interactivity?

Interactions Over Friends

Q: How are interactions distributed over friends?
  • A: Interactions are highly skewed towards a small number of friends
  • Almost nobody interacts with >50% of their friends

Interaction Distribution

Q: Are some users more interactive than others?
  • A: Interaction distribution is highly skewed towards a small number of very active users

Interactivity vs. Social Degree

Q: Is there a relationship between social degree and interactivity?
  • A: High degree users are more likely to be highly interactive as well (positive correlation)

Latent Interactions on Renren

Profile Visits, Popularity, and Consumption

  • Switching from Facebook data to Renren data
    • Focus on latent interaction, i.e. profile visits
  • Analysis of latent interactions
    • How is popularity distributed?
    • Is there a relationship between popularity and consumption?
    • Who visits your profile?
  • Comparing latent and visible interactions

Popularity Distribution

Q: How is popularity (total profile views) distributed?
  • A: Highly skewed distribution
  • 60% of users receive <100 total profile views

Popularity vs. Consumption

Q: Are the users who receive the most visits also the users who do the most visiting?
  • A: In general, consumers are not highly correlated with popularity
  • One exception: small contingent of extremely popular and active users

Repeat Profile Visits

Q: How many profile visits come from repeat visitors?
  • A: 70% of users have <50% repeat visitors
  • One exception: 10% of users have 100% repeat visitors
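A small sketch of one possible repeat-visitor metric. The slide does not define the measure; this version assumes it is the fraction of a profile's distinct visitors who visited more than once, which is a definition under which 100% is attainable.

```python
from collections import Counter

def repeat_visitor_fraction(visit_log):
    """visit_log: list of visitor IDs, one entry per visit to one profile.

    Returns the fraction of distinct visitors with more than one visit
    (assumed definition, not taken from the source).
    """
    counts = Counter(visit_log)
    if not counts:
        return 0.0
    repeats = sum(1 for c in counts.values() if c > 1)
    return repeats / len(counts)

# u1 and u2 return, u3 visits once: 2 of 3 distinct visitors are repeats.
frac = repeat_visitor_fraction(["u1", "u2", "u1", "u3", "u1", "u2"])
```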

Reciprocity of Profile Views

Q: If you visit someone's profile, will they visit you back?
  • A: Latent interactions are not reciprocal
  • Surprising result, since Renren users can see who has visited their profile
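Reciprocity can be measured in several ways; a minimal sketch, assuming it is the fraction of distinct (visitor, owner) pairs for which the reverse visit also appears in the log. The input format and function name are illustrative.

```python
def reciprocity(visits):
    """visits: iterable of (visitor, profile_owner) pairs.

    Returns the fraction of distinct directed pairs whose reverse pair
    also occurs. Self-visits are ignored.
    """
    pairs = {(u, v) for u, v in visits if u != v}
    if not pairs:
        return 0.0
    mutual = sum(1 for (u, v) in pairs if (v, u) in pairs)
    return mutual / len(pairs)

# a<->b is mutual; a->c and d->a are one-way.
score = reciprocity([("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")])
```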

Visitor Composition

Q: Who is browsing profiles: friends, friends-of-friends (fofs), or strangers?
  • A: Around 20% of visits come from fofs, and 20% come from strangers
  • Disturbing finding: most people don't consider all the strangers browsing their profile
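The friend / fof / stranger breakdown can be sketched as a lookup against the social graph. This assumes a symmetric friend map and treats a visitor as a friend-of-friend when they share at least one friend with the profile owner; the names are hypothetical.

```python
from collections import Counter

def classify_visitor(owner, visitor, friends):
    """friends: dict mapping user -> set of friend IDs (assumed symmetric)."""
    owner_friends = friends.get(owner, set())
    if visitor in owner_friends:
        return "friend"
    if owner_friends & friends.get(visitor, set()):
        return "fof"       # shares at least one friend with the owner
    return "stranger"

def visitor_composition(owner, visitors, friends):
    """Count visits to `owner` by visitor category."""
    return Counter(classify_visitor(owner, v, friends) for v in visitors)

friends = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
mix = visitor_composition("a", ["b", "c", "d", "d"], friends)
```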

Interactions Over Friends

Q: How are interactions distributed over friends?
  • Like FB, Renren users receive visible interactions from <30% of their friends...
  • However, users receive latent interactions from many more friends

Interaction Distribution

Q: Are some users more interactive than others?
  • A: Latent and visible interactions are both skewed...
  • ...but latent interactions are much less skewed (i.e. more people generate them)

Interaction Graphs

Limits of Social Graphs

  • Existing work focuses on social graphs, assumes all edges are equally important
    • Clearly, this assumption is not correct
  • How does graph structure change if we focus on interactive edges?
  • Methodology: construct new graphs where edges correspond to interactions
    • Visible interaction graphs are a subset of the social graph
      • On Facebook and Renren, you can only visibly interact with friends
    • Latent interaction graphs only partially overlap the social graph (they are not a subset of it)
      • Recall: strangers (i.e. non-friends) can browse profiles

Constructing Interaction Graphs

  • Parameters
    • \(n\): minimum number of interactions necessary to form an edge
    • \(t\): range of time during which interactions must have occurred
  • Directed vs. Undirected graphs
    • Interactions are inherently directed
    • For simplicity, we formulate visible interaction graphs as undirected
      • >60% of visible interactions are reciprocated
    • Formulate latent interaction graphs as directed
      • <30% of latent interactions are reciprocated
  • Alternative construction: weighted graphs
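A minimal sketch of the construction, assuming the interaction log is a list of (source, target, timestamp) tuples and that \(t\) is given as a closed time window. An edge is kept when at least \(n\) interactions fall inside the window; the `directed` flag chooses whether the two directions are counted separately or merged. The per-edge counts computed along the way could also serve as weights for the weighted alternative.

```python
from collections import Counter

def interaction_graph(interactions, n, t_start, t_end, directed=False):
    """Build an interaction graph as a set of edges.

    interactions: iterable of (source, target, timestamp) tuples.
    n: minimum number of interactions needed to form an edge.
    t_start, t_end: closed time window the interactions must fall in.
    """
    counts = Counter()
    for src, dst, ts in interactions:
        if src == dst or not (t_start <= ts <= t_end):
            continue
        # Undirected edges use a sorted pair so both directions merge.
        key = (src, dst) if directed else tuple(sorted((src, dst)))
        counts[key] += 1
    return {edge for edge, c in counts.items() if c >= n}

log = [("a", "b", 1), ("b", "a", 2), ("a", "c", 3), ("a", "b", 50)]
undirected_strict = interaction_graph(log, n=2, t_start=0, t_end=10)
directed_loose = interaction_graph(log, n=1, t_start=0, t_end=10, directed=True)
```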

Parameterizing Interaction Graphs

Q: How does the size of interaction graphs vary as the edge requirements change?
  • A: As expected, more restrictive parameters drastically reduce the number of nodes in the strongly connected component (SCC)

Degree Distributions

Interaction Graph Properties

Interaction Degree vs. Social Degree

Q: What is the relationship between social and interaction degree?
  • Social friends are "free"; social degree essentially unbounded
  • Interactions incur a cost; interaction degree is bounded

Latent Interaction Graphs

Q: How do latent interaction graphs compare to visible interaction and social graphs?
  • A: They fall between the two: fewer edges than the social graph, more than the visible interaction graph

Clustering on Interaction Graphs

Resiliency and Core Size

Discussion

  • Why do we care about interactions on social networks?
  • Which graphs are better: social or interaction graphs?
  • What practical applications could leverage interaction graphs?
  • This data is 4-5 years old. How might interaction graphs be different today?