# Computer Science Theory at Northeastern

## Theory Seminar

This semester the theory seminar will be every Thursday from 3:30-5:00PM in West Village H 366 (map). The Theory Seminar will occur every week whether or not we have a speaker. If we do not have a speaker, we will have short presentations from students or faculty. Write to Jonathan Ullman with suggestions for speakers or food!

## Spring Semester, 2017

 Jan 19, 20173:30PM Stratis Ioannidis, Northeastern Secure Function Evaluation at Scale 366 WVH Abstract Secure Function Evaluation (SFE) allows an interested party to evaluate a function over private data without learning anything about the inputs other than the outcome of this computation. This offers a strong privacy guarantee: SFE enables, e.g., a medical researcher, a statistician, or a data analyst, to conduct a study over private, sensitive data, without jeopardizing the privacy of the study's participants (patients, online users, etc.). Nevertheless, applying SFE to "big data" poses several challenges. First, beyond any computational overheads due to encryptions and decryptions, executing an algorithm securely may lead to a polynomial blowup in the total work compared to execution in the clear. For large datasets, even going from linear to quadratic complexity is prohibitive. Second, secure evaluations of algorithms should also maintain parallelizability: an algorithm that is easy to parallelize in the clear should also maintain this property in its SFE version, if its execution is to scale. This poses a challenge, as communication patterns between processors may reveal a lot about the private inputs. In this talk, we show that several machine learning and data mining algorithms can be executed securely while leading to only (a) a logarithmic blow-up in total work and (b) a logarithmic increase in the execution time when executed in parallel. Our techniques generalize to any algorithm that is graph-parallel, i.e., can be expressed through a sequence of scatter, gather, and apply operations. This includes several algorithms of practical interest, including page rank, matrix factorization, and training neural networks. Jan 12, 20173:30PM Jonathan Ullman, Northeastern Algorithmic Stability for Adaptive Data Analysis 366 WVH Abstract Adaptivity is an important feature of data analysis - the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated a general formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution $P$ and a set of $n$ independent samples $x$ is drawn from $P$. We seek an algorithm that, given $x$ as input, accurately answers a sequence of adaptively chosen "queries" about the unknown distribution $P$. How many samples $n$ must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? We give new upper bounds on the number of samples n that are needed to answer statistical queries. The bounds improve and simplify the work of Dwork et al. (STOC, 2015), and have been applied in subsequent work by those authors (Science, 2015; NIPS, 2015). As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that the stability notion guaranteed by differential privacy implies low generalization error. Joint work with Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, and Uri Stemmer. Jan 03, 20172:00PM Li-Yang Tan, TTI-Chicago Theory Seminar 166 WVH Abstract We consider the fundamental derandomization problem of deterministically finding a satisfying assignment to a CNF formula that has many satisfying assignments. We give a deterministic algorithm which, given an $n$-variable $poly(n)$-clause CNF formula $F$ that has $|F-1(1)| /eq \epsilon 2n$, runs in time $nO^~(loglogn)2$ for $\epsilon \geq 1/\polylog(n)$ and outputs a satisfying assignment of $F$. Prior to our work the fastest known algorithm for this problem was simply to enumerate over all seeds of a pseudorandom generator for CNFs; using the best known PRGs for CNFs [DETT10], this takes time $n\Omega^~(logn)$ even for constant $\epsilon$. Joint work with Rocco Servedio.