The composition of a group determines much of its behavior: are its members old or young, PhDs or illiterate, artists or scientists? As a result, organizations, governments, and companies are deeply interested in quickly learning the makeup of groups. To approach this problem, we’ve been developing technologies for inferring the demographics of Twitter populations from the text their users write and the networks those users form. Our methods stand out as the most accurate in the literature. In this talk, I’m going to give an overview of the latent attribute inference problem, discuss the advances that we’ve made in solving it, and highlight some of the big issues that still need to be tackled.
Derek Ruths is an assistant professor of Computer Science at McGill University. He joined the faculty in 2009 after completing his PhD in Computer Science at Rice University. A major research direction in his group considers the problem of characterizing and predicting the large-scale dynamics of human behavior in online social platforms. His ongoing work in this area includes quantitatively modeling how communities change over time, measuring and predicting group demographics from unstructured user-generated content, and developing computational methods for assessing discussion topics within a collection of users. His work has been published in top-tier journals and conferences including Science, EMNLP, ICWSM, and PLoS Computational Biology. His research is currently funded by a wide array of organizations including NSERC, SSHRC, tech companies, and the US National Science Foundation, underscoring the broad, interdisciplinary nature of his work.