What can a social network tell us about the underlying latent "social structure," the way in which individuals are similar or dissimilar? Much of social network analysis is, implicitly or explicitly, predicated on the assumption that individuals tend to be more similar to their friends than to strangers. Having explicit access to similarity information instead of merely the noisy signal provided by the presence or absence of edges could improve analysis significantly. We study the following natural question:Given a social network - reflecting the underlying social distances between its nodes - how accurately can we reconstruct the social structure?
It is tempting to model similarities and dissimilarities as distances, so that the social structure is a metric space. However, observed social networks are usually multiplex, in the sense that different edges originate from similarity in one or more among a number of different categories, such as geographical proximity, professional interactions, kinship, or hobbies.
Since close proximity in even one category makes the presence of edges much more likely, an observed social network is more accurately modeled as a union of separate networks. In general, it is a priori not known which network a given edge comes from. While generative models based on a single metric space have been analyzed previously, a union of several networks individually generated from metrics is structurally very different from (and more complex than) networks generated from just one metric.
We begin to address this reconstruction problem formally. The latent "social structure" consists of several metric spaces. Each metric space gives rise to a "distance-based random graph," in which edges are created according to a distribution that depends on the underlying metric space and makes long-range edges less likely than short ones. For a concrete model, we consider Kleinberg's small-world model and some variations thereof. The observed social network is the union of these graphs. All edges are unlabeled, in the sense that the existence of an edge does not reveal which random graph it comes from. Our main result is a near-linear time algorithm which reconstructs from this unlabeled union each of the individual metrics with provably low distortion.
(Joint work with Ittai Abraham Shiri Chechik, and Aleksandrs Slivkins.)
David Kempe received his Ph.D. from Cornell University in 2003, and has been on the faculty in the computer science department at USC since the Fall of 2004, where he is currently an Associate Professor. During the Spring of 2012, he is visiting Microsoft Research, New England.
His primary research interests are in computer science theory and the design and analysis of algorithms, with a particular emphasis on social networks, algorithms for feature selection, and game-theoretic and pricing questions. He is a recipient of the NSF CAREER award, the VSoE Junior Research Award, the ONR Young Investigator Award, and a Sloan Fellowship, in addition to several USC Mentoring awards.