# Computer Science Theory at Northeastern


## Theory Seminar

This semester the theory seminar meets every Thursday from 3:30–5:00 PM in West Village H 366. The seminar runs every week whether or not we have a speaker; when we have no speaker, we will have short presentations from students or faculty. Write to Jonathan Ullman with suggestions for speakers or food!

## Spring Semester, 2017

### Feb 21, 2017, 3:30 PM: Jalaj Upadhyay (Penn State), 366 WVH

**Fast and Space-Optimal Differentially-Private Low-Rank Factorization in the General Turnstile Update Model**

**Abstract.** The problem of low-rank factorization of an $m \times n$ matrix $A$ requires outputting a singular value decomposition: an $m \times k$ matrix $U$, an $n \times k$ matrix $V$, and a $k \times k$ diagonal matrix $D$ such that $U D V^T$ approximates the matrix $A$ in the Frobenius norm. In this talk, we study releasing a differentially-private low-rank factorization of a matrix in the general turnstile update model. We give a differentially-private algorithm instantiated with respect to a privacy level stronger than the privacy levels for this and related problems studied in previous works, namely those of Blocki et al. (FOCS 2012), Dwork et al. (STOC 2014), Hardt and Roth (STOC 2012, STOC 2013), and Hardt and Price (NIPS 2014). We consider two matrices $A$ and $A'$ to be neighboring if $A - A'$ can be represented as an outer product of two unit vectors. Our private algorithm with respect to this privacy level incurs optimal additive error. We also prove a lower bound showing that the space required by this algorithm is optimal up to a logarithmic factor. Based on the paper arXiv:1604.01429.

### Feb 16, 2017, 3:30 PM: Adam Sealfon (MIT), 366 WVH

**Network Oblivious Transfer**

**Abstract.** Motivated by the goal of improving the concrete efficiency of secure multiparty computation (MPC), we study the possibility of implementing an infrastructure for MPC. We propose an infrastructure based on oblivious transfer (OT), which would consist of OT channels between some pairs of parties in the network. We devise information-theoretically secure protocols that allow additional pairs of parties to establish secure OT correlations using the help of other parties in the network, even in the presence of a dishonest majority. Our main technical contribution is an upper bound that matches a lower bound of Harnik, Ishai, and Kushilevitz (CRYPTO 2007), who studied the number of OT channels necessary and sufficient for MPC. In particular, we characterize which $n$-party OT graphs $G$ allow $t$-secure computation of OT correlations between all pairs of parties, showing that this is possible if and only if the complement of $G$ does not contain the complete bipartite graph $K_{n-t,n-t}$ as a subgraph. Joint work with Ranjit Kumaresan and Srinivasan Raghuraman.

### Feb 9, 2017: Talk cancelled due to weather

### Feb 2, 2017, 3:30 PM: Constantinos Daskalakis (MIT), 108 WVH

**Ten Steps of EM Suffice for Mixtures of Two Gaussians**

**Abstract.** We provide global convergence guarantees for the expectation-maximization (EM) algorithm applied to mixtures of two Gaussians with known covariance matrices. We show that EM converges geometrically to the correct mean vectors, and provide simple, closed-form expressions for the convergence rate. As a simple illustration, we show that in one dimension, ten steps of the EM algorithm initialized at infinity result in less than 1% error in estimating the means. Joint work with Christos Tzamos and Manolis Zampetakis.

### Jan 26, 2017: No talk

### Jan 19, 2017, 3:30 PM: Stratis Ioannidis (Northeastern), 366 WVH

**Secure Function Evaluation at Scale**

**Abstract.** Secure Function Evaluation (SFE) allows an interested party to evaluate a function over private data without learning anything about the inputs other than the outcome of the computation. This offers a strong privacy guarantee: SFE enables, e.g., a medical researcher, a statistician, or a data analyst to conduct a study over private, sensitive data without jeopardizing the privacy of the study's participants (patients, online users, etc.). Nevertheless, applying SFE to "big data" poses several challenges. First, beyond any computational overheads due to encryptions and decryptions, executing an algorithm securely may lead to a polynomial blowup in the total work compared to execution in the clear. For large datasets, even going from linear to quadratic complexity is prohibitive. Second, secure evaluations of algorithms should also maintain parallelizability: an algorithm that is easy to parallelize in the clear should retain this property in its SFE version if its execution is to scale. This poses a challenge, as communication patterns between processors may reveal a lot about the private inputs. In this talk, we show that several machine learning and data mining algorithms can be executed securely while incurring only (a) a logarithmic blowup in total work and (b) a logarithmic increase in execution time when run in parallel. Our techniques generalize to any algorithm that is graph-parallel, i.e., that can be expressed through a sequence of scatter, gather, and apply operations. This includes several algorithms of practical interest, including PageRank, matrix factorization, and training neural networks.

### Jan 12, 2017, 3:30 PM: Jonathan Ullman (Northeastern), 366 WVH

**Algorithmic Stability for Adaptive Data Analysis**

**Abstract.** Adaptivity is an important feature of data analysis: the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC 2015) and Hardt and Ullman (FOCS 2014) initiated a general formal study of this problem and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution $P$ and a set $x$ of $n$ independent samples is drawn from $P$. We seek an algorithm that, given $x$ as input, accurately answers a sequence of adaptively chosen "queries" about the unknown distribution $P$. How many samples $n$ must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? We give new upper bounds on the number of samples $n$ needed to answer statistical queries. The bounds improve and simplify the work of Dwork et al. (STOC 2015) and have been applied in subsequent work by those authors (Science, 2015; NIPS, 2015). As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that the stability notion guaranteed by differential privacy implies low generalization error. Joint work with Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, and Uri Stemmer.

### Jan 03, 2017, 2:00 PM: Li-Yang Tan (TTI-Chicago), 166 WVH

**Theory Seminar**

**Abstract.** We consider the fundamental derandomization problem of deterministically finding a satisfying assignment to a CNF formula that has many satisfying assignments. We give a deterministic algorithm which, given an $n$-variable $\mathrm{poly}(n)$-clause CNF formula $F$ with $|F^{-1}(1)| \geq \epsilon 2^n$, runs in time $n^{\tilde{O}((\log\log n)^2)}$ for $\epsilon \geq 1/\mathrm{polylog}(n)$ and outputs a satisfying assignment of $F$. Prior to our work, the fastest known algorithm for this problem was simply to enumerate over all seeds of a pseudorandom generator for CNFs; using the best known PRGs for CNFs [DETT10], this takes time $n^{\tilde{\Omega}(\log n)}$ even for constant $\epsilon$. Joint work with Rocco Servedio.
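The low-rank factorization objective in the Feb 21 abstract can be illustrated, without any privacy, by a rank-$k$ truncated SVD, which minimizes the Frobenius-norm error among rank-$k$ approximations (Eckart–Young). This is only a sketch of the non-private objective, not the talk's private algorithm; the function name and test matrix are our own:

```python
import numpy as np

def low_rank_factorization(A, k):
    """Return U (m x k), D (k x k diagonal), V (n x k) with U D V^T ~ A.

    Plain (non-private) rank-k truncated SVD: by the Eckart-Young theorem
    this minimizes the Frobenius-norm error over all rank-k approximations.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :].T

rng = np.random.default_rng(0)
# A rank-5 matrix plus a little noise.
A = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 40))
A += 0.01 * rng.standard_normal(A.shape)

U, D, V = low_rank_factorization(A, k=5)
err = np.linalg.norm(A - U @ D @ V.T, "fro") / np.linalg.norm(A, "fro")
print(f"relative Frobenius error at k=5: {err:.4f}")
```

The private versions studied in the talk must additionally hide any single outer-product update to $A$, which is what forces the additive error the abstract calls optimal.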
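For the Feb 2 talk's setting, a standard simplification is the symmetric 1D mixture $\tfrac12 N(\mu,1) + \tfrac12 N(-\mu,1)$, where the EM update for the mean estimate collapses to $\lambda \leftarrow \mathbb{E}[x \tanh(\lambda x)]$. The finite-sample sketch below is our own illustration of that update, not the paper's analysis; the specific constants ($\mu = 2$, a large starting point standing in for "initialized at infinity") are chosen for the demo:

```python
import numpy as np

def em_symmetric_1d(x, lam0=10.0, steps=10):
    """EM for the symmetric 1D mixture 0.5*N(mu,1) + 0.5*N(-mu,1).

    With unit variance and balanced weights, the posterior weight of the
    positive component at x is sigmoid(2*lam*x), so the M-step update for
    the mean estimate is lam <- mean(x * tanh(lam * x)), using
    2*sigmoid(2*lam*x) - 1 = tanh(lam*x).
    """
    lam = lam0
    for _ in range(steps):
        lam = np.mean(x * np.tanh(lam * x))
    return lam

rng = np.random.default_rng(1)
mu, n = 2.0, 200_000
signs = rng.choice([-1.0, 1.0], size=n)
x = signs * mu + rng.standard_normal(n)     # sample from the mixture

est = em_symmetric_1d(x, lam0=10.0, steps=10)
print(f"true mu = {mu}, EM estimate after 10 steps = {est:.3f}")
```

Ten iterations from a far-out initialization land close to the true mean, consistent with the geometric convergence the abstract describes.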
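The statistical queries in the Jan 12 abstract are empirical means of functions $q \colon X \to [0,1]$, and the connection to differential privacy can be seen in the simplest mechanism for answering one such query: add Laplace noise of scale $1/(\varepsilon n)$, which suffices because changing one sample moves the empirical mean by at most $1/n$. This toy sketch is not the paper's algorithm, and the dataset and query are hypothetical:

```python
import numpy as np

def private_statistical_query(data, q, eps, rng):
    """Answer mean(q(x) for x in data) for q: X -> [0, 1] via the Laplace
    mechanism. One record changes the empirical mean by at most 1/n (its
    sensitivity), so noise of scale 1/(eps*n) gives eps-differential privacy.
    """
    n = len(data)
    true_answer = np.mean([q(x) for x in data])
    return true_answer + rng.laplace(scale=1.0 / (eps * n))

rng = np.random.default_rng(2)
data = rng.uniform(size=10_000)        # hypothetical dataset in [0, 1]
q = lambda x: float(x > 0.5)           # "what fraction exceeds 0.5?"

ans = private_statistical_query(data, q, eps=1.0, rng=rng)
print(f"noisy answer: {ans:.3f}")
```

The paper's point is the converse direction at scale: because differentially private answers generalize, mechanisms like this remain accurate even against many adaptively chosen queries.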