Pinning Down “Privacy” in Statistical Databases

  • Date
    February 28, 2012
  • Time
    11:30 AM
  • Location
    366 WVH


Consider an agency holding a large database of sensitive personal information, such as medical records, census survey answers, web search records, or genetic data. The agency would like to discover and publicly release global characteristics of the data (say, to inform policy and business decisions) while protecting the privacy of individuals’ records. This problem is known variously as “statistical disclosure control”, “privacy-preserving data mining”, or simply “database privacy”.

In this talk, I will describe “differential privacy”, a notion that emerged from a recent line of work in theoretical computer science seeking to formulate and satisfy rigorous definitions of privacy for such statistical databases. Satisfactory definitions had previously proved elusive, largely because of the difficulty of reasoning about “side information”: knowledge available to an attacker through other channels. Differential privacy provides a meaningful notion of privacy in the presence of arbitrary side information. After explaining some attacks that motivate our approach, I will sketch some of the basic techniques for achieving differential privacy, as well as recent results on differentially private statistical analysis and learning.

Brief Biography

Adam Smith is an associate professor in the Department of Computer Science and Engineering at the Pennsylvania State University. His research interests lie in cryptography and data privacy, and in their connections to information theory, quantum computing, and statistics.

He received his Ph.D. from MIT in 2004 and was subsequently a visiting scholar at the Weizmann Institute of Science and UCLA. He is the recipient of an NSF CAREER award and a Presidential Early Career Award for Scientists and Engineers (PECASE).