%%.tex template for scribes
%Use \ell, not l. l is asking for trouble.
%Put punctuations (. , ; :) in equations as if it was text.
%Don't do multiple lines of $$ ... $$, using \begin{align} \\ \\ \end{align} for that
%\subset means <, if you mean \le use \subseteq
%Use \equation, \[ \], \align at will, but try not to use too fancy environments.
%Known limitations:
%Don't define any new commands, especially those for math mode.
%Doing math inside \text inside math screws things up,
%e.g. $ \text{this = $x$}$ won't work.
%don't use \qedhere
%\ref inside math doesn't work (why?). It remains as \ref{claim} in wordpress
\documentclass[12pt]{article}
\usepackage[latin9]{inputenc}
\usepackage{amsthm}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{hyperref}
\makeatletter
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Textclass specific LaTeX commands.
\theoremstyle{plain}
\newtheorem{thm}{\protect\theoremname}
\theoremstyle{plain}
\newtheorem{lem}[thm]{\protect\lemmaname}
\theoremstyle{plain}
\newtheorem{prop}[thm]{\protect\propositionname}
\theoremstyle{remark}
\newtheorem{claim}[thm]{\protect\claimname}
\theoremstyle{plain}
\newtheorem{conjecture}[thm]{\protect\conjecturename}
\theoremstyle{definition}
\newtheorem{problem}[thm]{\protect\problemname}
\theoremstyle{remark}
\newtheorem{rem}[thm]{\protect\remarkname}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% User specified LaTeX commands.
%lyx2wpost preamble August 3 2017.
%This is an evolving preamble which works in tandem with myConfig5.cfg
%and the lyx2wpost script
\usepackage[english]{babel}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Textclass specific LaTeX commands.
\newtheorem{theorem}{Theorem}[section]
%\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{proposition}[theorem]{Proposition}
%\newtheorem{claim}[theorem]{Claim}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{construction}[theorem]{Construction}
%Theorem 1 instead of Theorem 0.1
\renewcommand{\thetheorem}{%
\arabic{theorem}}
\renewenvironment{theorem}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Theorem \thetheorem.}}{\vspace{0.2cm} }
\ifcsname thm\endcsname
\renewenvironment{thm}%
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Theorem \thetheorem.}}{\vspace{0.2cm} }%
\fi
\ifcsname conjecture\endcsname
\renewenvironment{conjecture}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Conjecture \thetheorem.}}{\vspace{0.2cm} }
\fi
\ifcsname corollary\endcsname
\renewenvironment{corollary}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Corollary \thetheorem.}}{\vspace{0.2cm} }
\fi
\ifcsname proposition\endcsname
\renewenvironment{proposition}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Proposition \thetheorem.}}{\vspace{0.2cm} }
\fi
\ifcsname prop\endcsname
\renewenvironment{prop}%
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Proposition \thetheorem.}}{\vspace{0.2cm} }%
\fi
\ifcsname claim\endcsname
\renewenvironment{claim}%
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Claim \thetheorem.}}{\vspace{0.2cm} }%
\fi
\ifcsname definition\endcsname
\renewenvironment{definition}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Definition \thetheorem.}}{\vspace{0.2cm} }
\fi
\ifcsname lemma\endcsname
\renewenvironment{lemma}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Lemma \thetheorem.}}{\vspace{0.2cm} }
\fi
\ifcsname lem\endcsname
\renewenvironment{lem}%
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Lemma \thetheorem.}}{\vspace{0.2cm} }%
\fi
\ifcsname remark\endcsname
\renewenvironment{remark}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Remark \thetheorem.}}{\vspace{0.2cm} }
\fi
\ifcsname rem\endcsname
\renewenvironment{rem}%
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Remark \thetheorem.}}{\vspace{0.2cm} }%
\fi
\ifcsname problem\endcsname
\renewenvironment{problem}%
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Problem \thetheorem.}}{\vspace{0.2cm} }%
\fi
%\renewenvironment{proof}{\par \noindent \textbf{Proof:}}{\hfill $\square$ \newline \newline} %QED \par\vspace{0.5cm}}
\renewcommand{\qedsymbol}{$\square$}
\renewcommand{\paragraph}[1]{\textbf{#1}. }
\makeatother
\providecommand{\claimname}{Claim}
\providecommand{\conjecturename}{Conjecture}
\providecommand{\lemmaname}{Lemma}
\providecommand{\problemname}{Problem}
\providecommand{\propositionname}{Proposition}
\providecommand{\remarkname}{Remark}
\providecommand{\theoremname}{Theorem}
\begin{document}
\global\long\def\E{\mathrm{\mathbb{E}}}
%\global\long\def\e{\mathrm{\mathbb{\epsilon}}}
\noindent
\href{http://www.ccs.neu.edu/home/viola/classes/spepf17.html}{Special
Topics in Complexity Theory}, Fall 2017. Instructor:
\href{http://www.ccs.neu.edu/home/viola/}{Emanuele Viola}
%\href{http://www.ccs.neu.edu/home/viola/classes/spepf17.html}{For the
%class webpage click here.}
\section{Lectures 4-5, Scribe: Matthew Dippel}
These lectures cover some basics of small-bias distributions, and then a
more recent pseudorandom generator for read-once CNF
\cite{GopalanMRTV12}.
\section{Small bias distributions}
\begin{definition}[Small bias distributions]
\label{smallbias_dist}
A distribution $D$ over $\{0, 1\}^n$ has bias $\epsilon$ if no parity test can distinguish it from uniformly random strings with advantage greater than $\epsilon$. More formally, we have:
$$\forall S \subseteq [n], S \neq \emptyset, \quad \left \vert \mathbb{P}_{x \sim D}\left [\bigoplus_{i \in S} x_i = 1 \right] - 1/2\right \vert \leq \epsilon. $$
\end{definition}
In this definition, the $1/2$ is simply the probability that a parity test equals
$1$ (or $0$) under the uniform distribution. We also note that it does not
matter whether the definition asks for the probability of the parity test being $0$ or
being $1$. If a test has probability $1/2 + \epsilon$ of being equal
to $1$, then it has probability $1 - (1/2 + \epsilon) = 1/2 - \epsilon$ of
being $0$, so the bias is the same under either choice.
This can be viewed as a distribution which fools tests $T$ that are
restricted to computing parity functions on a subset of bits.
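As a quick sanity check (not from the lecture), the bias of an explicitly given distribution can be computed by brute force over all parity tests; the helper \texttt{bias} below and the example distributions are illustrative.

```python
# Brute-force bias computation for a distribution given by its support.
from itertools import product

def bias(support, n):
    """Max over nonempty S of |Pr_{x in support}[parity of x on S is 1] - 1/2|."""
    worst = 0.0
    for S in product([0, 1], repeat=n):
        if not any(S):
            continue  # the empty parity test is excluded by the definition
        ones = sum(sum(x[i] for i in range(n) if S[i]) % 2 for x in support)
        worst = max(worst, abs(ones / len(support) - 0.5))
    return worst

# The uniform distribution over all of {0,1}^3 has bias 0, while the
# even-weight strings have bias 1/2 (the full parity is constant on them).
full = list(product([0, 1], repeat=3))
even = [x for x in full if sum(x) % 2 == 0]
print(bias(full, 3), bias(even, 3))  # 0.0 0.5
```

The even-weight example shows that fooling all small parity tests does not by itself control the full parity, which is why the definition quantifies over every nonempty $S$.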
Before we answer the important question of how to construct and
efficiently sample from such a distribution, we will provide one interesting
application of small bias sets to expander graphs.
\begin{theorem}[Expander construction from a small bias set]
\label{smallbias_expander} Let $D$ be a distribution over $\{0, 1\}^n$ with
bias $\epsilon$. Define $G = (V, E)$ as the following graph:
$$V = \{0, 1\}^n, E = \{(x,y) \vert x \oplus y \in \text{support}(D)\}.$$
Then, writing the eigenvalues of the random walk matrix of $G$ in
descending order as $\lambda_1, \lambda_2, \ldots, \lambda_{2^n}$, we have:
$$\max \{|\lambda_2|, |\lambda_{2^n}|\} \leq \epsilon.$$
\end{theorem}
Thus, small-bias sets yield expander graphs. Small-bias sets also turn
out to be equivalent to constructing good linear codes. Although all these
questions had been studied well before the definition of small-bias sets
\cite{NaN90}, the computational perspective has been quite useful, even in
answering old questions. For example, Ta-Shma used this perspective to
construct better codes \cite{Ta-Shma17}.
\iffalse
\begin{proof}
We'll first show a more general lemma which applies to all graphs that are
constructed with a similar template.
\begin{lemma}
\label{setdiff_graph}
Let $S$ be any subset of $\{0, 1\}^n$. Define the graph $G_S$ as $V = \{0, 1\}^n$, $E_S = \{(x, y) \vert x \oplus y \in S\}$. Let $M$ be the $1-0$ valued adjacency matrix of $G$, where we will implicitly index by $x \in \{0, 1\}^n$. Then $M$ has exactly as eigenvectors the $2^n$ vectors represented by the parity functions $\chi_T(x), T \subset [n]$, defined as:
$$\chi_T(x) = (-1)^{\sum_{i \in T} x_i}$$
That is, the entry of $\chi_T$ indexed by $x$ is simply the value of $\chi_T(x)$. Furthermore, the eigenvalue of $\chi_T$ is $\sum_{y \in S}\chi_T(y)$
\end{lemma}
\begin{proof}
For some vector $v$, the $x$th entry of $Mv$ is the real valued sum of the entries of $v$ indexed by the members of $\{x + y | y \in S\}$. Hence we have:
$$(M\chi_T)(x) = \sum_{y \in S}\chi_T(x + y)$$
$$ = \sum_{y \in S}\chi_T(x)\chi_T(y)$$
$$ = \chi_T(x) \sum_{y \in S}\chi_T(y)$$
Since $\sum_{y \in S}\chi_T(y)$ is independent of $x$, all entries of $\chi_T$ can be viewed as being scaled by $\sum_{y \in S}\chi_T(y)$ when the vector is multiplied by $M$. This completes the proof.
\end{proof}
Applying the above to the unnormalized adjacency matrix of the defined graph, we find that all eigenvectors have eigenvalue $\sum_{y \in \text{supp}{D}}\chi_T(y)$. When $T = \emptyset$, this is simply the all ones vector with eigenvalue $|D|$, which normalized to $1$ when we look at the random walk matrix. For all other sets $T$, we have by definition of the small bias distribution that $\left | \sum_{y \in \text{supp}{D}}\chi_T(y) \right | \leq \epsilon |D|$. Hence when normalized, is at most $\epsilon$.
\end{proof}
\fi
\section{Constructions of small bias distributions}
Just like our construction of bounded-wise independent distributions from
the previous lecture, we will construct small-bias distributions using
polynomials over finite fields.
\begin{theorem}[Small bias construction]
\label{smallbias_constructor} Let $\mathcal{F}$ be a finite field of size
$2^\ell$, with elements represented as bit strings of length $\ell$. We
define the generator $G : \mathcal{F}^2 \rightarrow \{0, 1\}^n$ as the
following:
$$G(a, b)_i = \left\langle a^i, b \right\rangle = \sum_{j \leq \ell} (a^i)_j b_j \mod 2.$$
In this notation, a subscript of $j$ indicates taking the $j$th bit of the
representation. Then the output of $G(a, b)$ over uniform $a$ and $b$
has bias $n / 2^{\ell}$.
\end{theorem}
\begin{proof}
Consider the parity test induced by a nonempty subset $S \subseteq [n]$.
Applied to the output of $G$, it simplifies as:
$$\sum_{i \in S}G(a, b)_i = \sum_{i \in S}\left\langle a^i, b \right\rangle = \left\langle \sum_{i \in S} a^i, b \right\rangle.$$
Note that $\sum_{i \in S} a^i$ is the evaluation of the polynomial $P_S (x)
:= \sum_{i \in S} x^i$ at the point $a$. We note that if $P_S(a) \neq 0$,
then the value of $\left\langle P_S(a), b \right\rangle $ is equally likely to
be $0$ or $1$ over the choice of a uniformly random $b$. This follows
from the fact that the inner product of any non-zero bit string with a
uniformly random bit string is equally likely to be $0$ or $1$. Hence in this
case, our generator has no bias.
In the case where $P_S(a) = 0$, the inner product is always $0$,
independent of the value of $b$. In this situation the bias is $1/2$, but
this is conditioned on the event that $P_S(a) = 0$.
We claim that this event has probability at most $n / 2^\ell$. Indeed, for
nonempty $S$, $P_S$ is a non-zero polynomial of degree at most $n$, so it
has at most $n$ roots. But we are selecting $a$ from a field of size $2^\ell$.
Hence the probability of picking a root is at most $n / 2^\ell$.
Hence overall the bias is at most $n/2^\ell$.
\end{proof}
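For toy parameters the bias bound can be checked exhaustively. The sketch below is illustrative and not from the lecture; the choices $\ell = 6$, the reducing polynomial $x^6 + x + 1$ (irreducible over $GF(2)$), and $n = 4$ are assumptions made for the demo.

```python
# Exhaustive check of the generator G(a, b)_i = <a^i, b> over GF(2^ELL).
from itertools import product

ELL = 6
IRRED = 0b1000011  # x^6 + x + 1, an irreducible polynomial over GF(2)

def gf_mul(a, b):
    """Multiplication in GF(2^ELL): carry-less product, reduced mod IRRED."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> ELL:          # degree reached ELL: subtract (xor) the modulus
            a ^= IRRED
    return r

def G(a, b, n):
    """Output bit i is <a^i, b> mod 2, for i = 1, ..., n."""
    out, p = [], a
    for _ in range(n):
        out.append(bin(p & b).count('1') % 2)  # inner product of bit strings
        p = gf_mul(p, a)
    return tuple(out)

n = 4
outputs = [G(a, b, n) for a in range(2**ELL) for b in range(2**ELL)]
worst = max(
    abs(sum(sum(x[i] for i in range(n) if S[i]) % 2 for x in outputs)
        / len(outputs) - 0.5)
    for S in product([0, 1], repeat=n) if any(S))
print(worst <= n / 2**ELL)  # True: the bias is within the claimed bound
```

Enumerating all $2^{2\ell}$ seeds is only feasible at this scale, but it confirms the bound from the proof above.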
\iffalse
Using the above fact, we can more rigorously bound the bias for a specific
parity test $S$ as:
$$\mathbb{E}\left[\sum_{i \in S}G(a, b)_i \right]= \mathbb{E}\left[\left\langle P_S(a), b \right\rangle \right] $$
$$ = \mathcal{P}_a\left[P_S(a) = 0\right]\mathbb{E}_b\left[\left\langle P_S(a), b \right\rangle \right \vert P_S(a)] + \mathcal{P}_a\left[P_S(a) \neq 0\right]\mathbb{E}_b\left[\left\langle P_S(a), b \right\rangle \right \vert P_S(a)]$$
$$= 0 + (1 - p)(1/2)$$
where $p = \mathcal{P}_a\left[P_S(a) = 0\right]$. We already have $p \leq n / 2^l$, and we know the trivial $p \geq 0$. Combining these into the above expression shows that the bias of the above is at most $n / 2^{l + 1}$.
\fi
To make use of the generator, we need to pick a specific $\ell$. Note that
the seed length will be $|a| + |b| = 2\ell$. If we want to achieve bias
$\epsilon$, then we must have $\ell = \log \left(\frac{n}{\epsilon} \right)$. All
logarithms in this lecture are in base $2$. This gives us a seed length
of $2\log \left(\frac{n}{\epsilon} \right)$.
Small-bias distributions are so important that a lot of attention has been devoted to
optimizing the constant ``2'' above. A lower bound of $\log n + (2 - o(1))\log
(1 / \epsilon)$ on the seed length was known. Ta-Shma recently
\cite{Ta-Shma17} gave a nearly matching construction with seed length
$\log n + (2 + o(1))\log (1 / \epsilon)$.
We next give a sense of how to obtain different tradeoffs between $n$ and
$\epsilon$ in the seed length. We specifically focus on getting a nearly
optimal dependence on $n$, because the construction is a simple,
interesting ``derandomization'' of the above one.
\subsection{An improved small bias distribution via bootstrapping}
We will show another construction of small bias distributions that achieves
seed length $(1 + o(1))\log n + O(\log(1/\epsilon))$. It will make use of the
previous construction and proof.
The intuition is the following: the only time we used that $b$ was uniform
was in asserting that if $P_S(a) \neq 0$, then $\left\langle P_S(a), b
\right\rangle$ is uniform. But we don't need $b$ to be uniform for that.
What do we need from $b$? We need that it has small-bias!
Our new generator is $G(a, G'(a', b'))$ where $G$ and $G'$ are as before
but with different parameters. For $G$, we pick $a$ of length $\ell =
\log(n/\epsilon)$, whereas $G'$ just needs to be an $\epsilon$-biased generator
on $\ell$ bits, which, as we just saw, can be done with
$O(\log(\ell/\epsilon))$ bits. This gives a seed length of $\log n + \log \log n
+ O(\log(1/\epsilon))$, as promised.
\iffalse
We still require $a$ and $b = G(a', b')$ to have length $l$. To
clarify between the two, we'll write $|a| = l_{outer}$, and $|a'| = |b'| =
l_{inner}$. We require that the output of $G(a', b')$ both matches the
expected length and has bias $n / 2^{l + 1}$. Then we have that the seed
length of the inner generator is:
$$l_{inner} = 2 \log \left(\frac{l_{outer}}{\epsilon} \right) = 2 \log \left( \frac{\log(n / \epsilon)}{\epsilon} \right)$$
Hence the entire seed length of $H(a, a', b')$ is:
$$l_{outer} + l_{inner} = \log \left(\frac{n}{\epsilon} \right) + 2 \log \left( \frac{\log(n / \epsilon)}{\epsilon} \right)$$
$$ = \log(n) + O(\log \log n) + O(\log(1/\epsilon)$$
\begin{theorem}
The generator $H$ described above has bias $\epsilon = n / 2^l$
\end{theorem}
\begin{proof}
The only event that changes in the generator is when $P_S(a) \neq 0$. Then we wish to figure out the bias of $\left\langle P_S(a), b' \right\rangle$, where $b'$ now has bias $n / 2^l$. But this can be viewed as simply being a parity test on $b'$. Hence by definition it's bias is at most $n / 2^l$ over the choice of $a$. Since the output of $H$ can be expressed as a choice between two latent events, each having bias at most $n / 2^l$, the overall bias is $n / 2^l$.
\end{proof}
It's worth noting that, in the game way we bootstrapped the outer $b = G(a', b')$, there's no reason we can't bootstrap the inner $b'$ as $G(a'', b'')$. However as we continue to do this, the constants start suffering.
\fi
We can of course repeat the argument but the returns diminish.
\section{Connecting small bias to $k$-wise independence}
We will show that using our small bias generators, we can create
distributions which are almost $k$-wise independent. That is, they are very
close to a $k$-wise independent distribution in statistical distance, while
having a substantially shorter seed length than what is required for
$k$-wise independence. In particular, we will show two results:
\begin{itemize}
\item Small bias distributions are themselves close to $k$-wise
independent.
\item We can improve the parameters of the above by feeding a small
bias distribution to the generator for $k$-wise independence from the
previous lectures. This will improve the seed length of simply using a
small bias distribution.
\end{itemize}
Before we can show these, we'll have to take a quick aside into some
fundamental theorems of Fourier analysis of boolean functions.
\subsection{Fourier analysis of boolean functions 101}
Let $f : \{-1, 1\}^n \rightarrow \{-1, 1\}$. Here the switch between $\{0, 1\}$
and $\{-1, 1\}$ is common, but you can think of them as being isomorphic.
One way to think of $f$ is as a vector in $\{-1 , 1\}^{2^n}$, whose $x$th
entry is the value of $f(x)$. If, for each point $S \in \{-1,1\}^n$, we let
$\mathbf{1}_S$ be the indicator function which is $1$ iff $x = S$, once again
written as a vector like $f$ is, then any function $f$ can be written over the
basis of the $\mathbf{1}_S$ vectors, as:
$$f = \sum_S f(S) \mathbf{1}_S.$$
This is the ``standard'' basis.
Fourier analysis is simply a different basis in which to write functions,
which is sometimes more useful. The basis functions are the characters
$\chi_S : \{-1, 1\}^n \rightarrow \{-1, 1\}$, defined for $S \subseteq [n]$ as
$\chi_S(x) := \prod_{i \in S} x_i$. Then any boolean function
$f$ can be expressed as:
$$f(x) = \sum_{S \subseteq [n]}\hat{f}(S)\chi_S(x),$$
where the $\hat{f}(S)$, called the ``Fourier coefficients," can be derived as:
$$\hat{f}(S) = \E_{x \sim U_n} \left[f(x)\chi_S(x) \right],$$
where the expectation is over uniformly random $x$.
\begin{claim}
\label{fourier_sq} For any function $f$ with range $\{-1,1\}$, its Fourier
coefficients satisfy:
$$\sum_{S \subseteq [n]}\hat{f}(S)^2 = 1.$$
\end{claim}
\begin{proof}
We know that $\E[f(x)^2] = 1$, since $f(x)^2 = 1$ for every $x$. We
can re-express this expectation as:
$$\E[f(x)f(x)] = \E\left[\sum_S \hat{f}(S)\chi_S(x) \cdot \sum_T \hat{f}(T)\chi_T(x)\right] = \E\left[\sum_{S, T} \hat{f}(S)\chi_S(x) \hat{f}(T)\chi_T(x)\right].$$
We make use of the following fact: if $S \neq T$, then
$\E[\chi_S(x)\chi_T(x)] = \E[\chi_{S \oplus T}(x)] = 0$, where $S \oplus T$
denotes the symmetric difference. If $S = T$, then the symmetric difference
is the empty set and $\chi_\emptyset \equiv 1$.
Overall, this implies that the above expectation can be simply rewritten as:
$$\sum_{S = T}\hat{f}(S)\hat{f}(T) = \sum_S \hat{f}(S)^2.$$
Since we already decided that the expectation is $1$, the claim follows.
\end{proof}
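For concreteness, the claim can be verified numerically on a small function. The sketch below is illustrative and not from the lecture; Majority on $3$ bits is an assumed example, and all coefficients are computed by brute force from the definition $\hat{f}(S) = \E_x[f(x)\chi_S(x)]$.

```python
# Brute-force Fourier coefficients over {-1,1}^n and a Parseval check.
from itertools import product

def chi(S, x):
    """Character chi_S(x) = prod_{i in S} x_i."""
    out = 1
    for i in S:
        out *= x[i]
    return out

def fourier_coeffs(f, n):
    """f_hat(S) = E_x[f(x) chi_S(x)], x uniform over {-1,1}^n, for every S."""
    xs = list(product([-1, 1], repeat=n))
    subsets = [tuple(i for i in range(n) if b[i])
               for b in product([0, 1], repeat=n)]
    return {S: sum(f(x) * chi(S, x) for x in xs) / len(xs) for S in subsets}

maj = lambda x: 1 if sum(x) > 0 else -1  # Majority on an odd number of bits
fhat = fourier_coeffs(maj, 3)
print(sum(c * c for c in fhat.values()))  # 1.0, as the claim predicts
```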
\section{Small bias distributions are close to $k$-wise independent}
Before we can prove our claim, we formally introduce what we mean for
two distributions to be close. We use the standard notion of
statistical distance, which we repeat here:
\begin{definition}
Let $D_1, D_2$ be two distributions over the same domain $H$. Their
statistical distance, denoted $\text{SD}(D_1, D_2)$ and sometimes also
written $\Delta(D_1, D_2)$, is
$$\Delta(D_1, D_2) = \max _{T \subseteq H} \left| \mathbb{P}[D_1 \in T] - \mathbb{P}[D_2 \in T]\right|.$$
Note that the probabilities are with respect to the individual distributions
$D_1$ and $D_2$. We say that $D_1$ is $\epsilon$-close to
$D_2$ if $\Delta(D_1, D_2) \leq \epsilon$.
\end{definition}
We can now show our result, which is known as \textbf{Vazirani's XOR
Lemma}:
\begin{theorem}
If a distribution $D$ over $\{0, 1\}^n$ has bias $\epsilon$, then $D$ is
$\epsilon 2^{n / 2}$-close to the uniform distribution.
\end{theorem}
\begin{proof}
Let $T$ be a test. To fit the above notation, we can think of $T$ as being
defined as the set of inputs for which $T(x) = 1$. Then we want to bound:
$$|\E[T(D)] - \E[T(U)]|.$$
Expanding $T$ in the Fourier basis, we rewrite this as
$$\left|\E\left[\sum_S \hat{T}(S)\chi_S(D)\right] - \E\left[\sum_S \hat{T}(S)\chi_S(U)\right]\right|= \left|\sum_S \hat{T}(S)\left(\E[\chi_S(D)] - \E[\chi_S(U)]\right)\right|.$$
We know that $\E_U[\chi_S(x)] = 0$ for all nonempty $S$, and $1$ when
$S$ is the empty set. We also know that $|\E_D[\chi_S(x)]| \leq \epsilon$
for all nonempty $S$, and $\E_D[\chi_S(x)] = 1$ when $S$ is the empty set. So the
above can be bounded as:
$$\leq \sum_{S \ne \emptyset} |\hat{T}(S)| \left|\E_D[\chi_S(x)] - \E_U[\chi_S(x)]\right| \leq \sum_S |\hat{T}(S)| \epsilon = \epsilon \sum_S |\hat{T}(S)|.$$
\begin{lemma}
$\sum_S |\hat{T}(S)| \leq 2^{n / 2}$.
\end{lemma}
\begin{proof}
By Cauchy--Schwarz:
$$\sum_S |\hat{T}(S)| \leq 2^{n/2} \sqrt{\sum_S \hat{T}(S) ^2} \leq 2^{n/2},$$
where the last step follows as in Claim \ref{fourier_sq}: since $T$ is
$0/1$-valued, Parseval gives $\sum_S \hat{T}(S)^2 = \E[T^2] \leq 1$.
\end{proof}
Using the above lemma completes the upper bound and the proof of the
theorem.
\end{proof}
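One can verify the bound on small examples. The sketch below is illustrative and not from the lecture; the even-weight distribution on $3$ bits is an assumed example, and both quantities are computed exactly.

```python
# Exact statistical distance vs. the Vazirani XOR lemma bound, for a toy D.
from itertools import product

def sd_from_uniform(support, n):
    """Statistical distance between the uniform distribution on `support`
    and U_n, computed as half the L1 distance."""
    q = 1 / 2**n
    mass = {x: 0.0 for x in product([0, 1], repeat=n)}
    for x in support:
        mass[x] += 1 / len(support)
    return sum(abs(v - q) for v in mass.values()) / 2

def max_bias(support, n):
    """Max over nonempty S of |Pr[parity on S is 1] - 1/2|."""
    worst = 0.0
    for S in product([0, 1], repeat=n):
        if any(S):
            ones = sum(sum(x[i] for i in range(n) if S[i]) % 2 for x in support)
            worst = max(worst, abs(ones / len(support) - 0.5))
    return worst

n = 3
even = [x for x in product([0, 1], repeat=n) if sum(x) % 2 == 0]
print(sd_from_uniform(even, n) <= max_bias(even, n) * 2**(n / 2))  # True
```

For this example the distance is $1/2$ and the bias is $1/2$, so the bound $\epsilon 2^{n/2}$ holds with room to spare.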
\begin{corollary}
Any $k$ bits of an $\epsilon$-biased distribution are $\epsilon 2^{k / 2}$-close
to uniform.
\end{corollary}
Using the corollary above, we see that we can get $\epsilon$ close to a
$k$-wise independent distribution (in the sense of the corollary) by taking
a small bias distribution with $\epsilon' = \epsilon / 2^{k / 2}$. This requires
seed length $\ell = O(\log (n / \epsilon')) = O(\log(2^{k/2}n / \epsilon)) =
O(\log n + k + \log(1 / \epsilon))$. Recall that for exact $k$-wise
independence we required seed length $k \log n$.
\subsection{An improved construction}
\begin{theorem}
Let $G : \{0, 1\}^{k\log n} \rightarrow \{0, 1\}^n$ be the generator previously
described that samples a $k$-wise independent distribution (or any linear
$G$). If we replace the input to $G$ with a small bias distribution of
$\epsilon' = \epsilon / 2^k$, then the output of $G$ is $\epsilon$-close to
being $k$-wise independent.
\end{theorem}
\begin{proof}
Consider any parity test $S$ on $k$ bits of the output of $G$. The map
$G$ is linear: it simply takes its seed and multiplies it by a fixed
matrix over the field $GF(2)$ with two elements. Hence
$S$ corresponds to a parity test $S'$ on the input of $G$, on possibly many bits.
The test $S'$ is not empty, because $G$ samples a $k$-wise independent
distribution. Since we fool $S'$ with error $\epsilon'$, we also fool $S$ with
error $\epsilon'$, and the theorem follows by Vazirani's XOR lemma, since
$\epsilon' 2^{k/2} \leq \epsilon$.
\end{proof}
Using the seed lengths we saw we get the following.
\begin{corollary}
There is a generator for almost $k$-wise independent distributions with
seed length $O(\log\log n + \log(1 / \epsilon) + k)$.
\end{corollary}
\section{Tribes Functions and the GMRTV Generator}
We now move to a more recent result. Consider the Tribes function,
which is a read-once CNF on $k \cdot w$ bits, given by the And of $k$
terms, each on $w$ bits. You should think of $n = k \cdot w$ where $w
\approx \log n$ and $k \approx n/\log n$.
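A minimal sketch of the Tribes function (illustrative; the parameters $k = 4$, $w = 2$ are assumptions chosen for the demo, not the regime $w \approx \log n$ above):

```python
# The Tribes function: a read-once CNF, the And of k Ors on w fresh bits each.
from itertools import product

def tribes(bits, k, w):
    """Tribes_{k,w}(bits): And over k disjoint blocks of the Or of each block."""
    assert len(bits) == k * w
    return all(any(bits[i * w:(i + 1) * w]) for i in range(k))

k, w = 4, 2
vals = [tribes(x, k, w) for x in product([0, 1], repeat=k * w)]
print(sum(vals) / len(vals))  # 0.31640625 = (3/4)**4; each Or accepts w.p. 3/4
```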
We'd like a generator for this class with seed length $O(\log (n/\epsilon))$.
This is still open! (This is just a single function, for which a generator is
trivial, but one can make this challenge precise for example by asking to
fool the Tribes function for any possible negation of the input variables.
These are $2^n$ tests and a generator with seed length $O(\log
(n/\epsilon))$ is unknown.)
The result we saw earlier about fooling And gives a generator with seed
length $O(\log n)$, however the dependence on $\epsilon$ is poor.
Achieving a good dependence on $\epsilon$ has proved to be a challenge.
We now describe a recent generator \cite{GopalanMRTV12} which gives
seed length $O(\log (n/\epsilon)) \cdot (\log \log n)^{O(1)}$. This is incomparable
with the previous $O(\log n)$, and in particular the dependence on $n$ is
always suboptimal. However, when $\epsilon = 1/n$ the generator
\cite{GopalanMRTV12} gives seed length $O(\log n) \log \log n$, which is
better than previously available constructions.
The high-level technique for doing this is based on iteratively restricting
variables, and goes back about 30 years \cite{AjtaiW89}. This technique
seems to have been abandoned for a while, possibly due to the
spectacular successes of Nisan \cite{Nis91,Nis92}. It was revived in
\cite{GopalanMRTV12} (see also \cite{GavinskyLS12}) with an emphasis
on a good dependence on $\epsilon$.
A main tool is this claim, showing that small-bias distributions fool products
of functions with small variance. Critically, we work with non-boolean
functions (which later will be certain averages of boolean functions).
\begin{claim}
\label{var_bound} Let $f_1, f_2, \ldots, f_k : \{0, 1\}^w \rightarrow [0,1]$ be
functions. Further, let $D = (v_1, v_2, \ldots, v_k)$ be an
$\epsilon$-biased distribution over $wk$ bits, where each $v_i$ is $w$ bits
long. Then, for every integer $d$,
$$\left|\E_D\left[\prod_i f_i(v_i)\right] - \prod_i \E_U[f_i(U)]\right| \leq \left(\sum_i \text{var}(f_i) \right)^d + (k2^w)^d\epsilon,$$
where $\text{var}(f) := \E[f^2] - \E^2[f]$ is the variance of $f$ with respect to the
uniform distribution.
\end{claim}
This claim has emerged from a series of works, and this statement is from
a work in progress with Chin Ho Lee. For intuition, note that constant
functions have variance $0$, in which case the claim gives good bounds
(and indeed any distribution fools constant functions). By contrast, for
balanced functions the variance is constant, and the sum of the variances
is about $k$, and the claim gives nothing. Indeed, you can write Inner
Product as a product of nearly balanced functions, and it is known that
small-bias does not fool it. For this claim to kick in, we need each
variance to be at most $1/k$.
In the Tribes function, the And functions have variance $2^{-w}$, and the
sum of the variances is about $1$, so the claim gives nothing. However,
if you perturb the Ands with a little noise, the variance drops polynomially,
and the claim is useful.
\begin{claim}
\label{g_var} Let $f$ be the AND function on $w$ bits. Rewrite it as $f(x,
y)$, where $|x| = |y| = w / 2$. That is, we partition the input into two sets.
Define $g(x)$ as:
$$g(x) = \E_y[f(x, y)],$$
where $y$ is uniform. Then $\text{var}(g) = \Theta(2^{-3w/2}).$
\end{claim}
\begin{proof}
$$\text{var}(g) = \E[g(x)^2] - \left(\E[g(x)]\right)^2 = \E_x[\E_y[f(x,y)]^2] - \left(\E_x[\E_y[f(x,y)]] \right)^2.$$
We know that $\left(\E_x[\E_y[f(x,y)]] \right)$ is simply the expected value
of $f$, and since $f$ is the AND function, this is $2^{-w}$, so the right term
is $2^{-2w}$.
We re-express the left term as $\E_{x,y, y'}[f(x,y)f(x, y')]$, where $y'$ is
an independent copy of $y$. This product is $1$ iff $x$, $y$, and $y'$ are
each the all-ones string. The probability of this happening
is $(2^{-w/2})^3 = 2^{-3w/2}$.
Thus the variance is $2^{-3w/2} - 2^{-2w} = 2^{-3w/2}(1 - 2^{-w/2}) = \Theta(2^{-3w/2})$.
\end{proof}
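The variance computation above can be checked exactly for small even $w$, since $g(x) = 2^{-w/2}$ when $x$ is the all-ones string and $0$ otherwise; the sketch below is illustrative and not from the lecture.

```python
# Exact check of var(g) for g(x) = E_y[And(x, y)], with |x| = |y| = w/2.
from itertools import product

def var_of_averaged_and(w):
    """Variance of g(x) = E_y[And(x, y)] under uniform x, for even w."""
    h = w // 2
    xs = list(product([0, 1], repeat=h))
    # And(x, y) = 1 iff x and y are both all ones, so g(x) = 2^{-h}*[x all ones].
    g = [sum(1 for y in xs if all(x) and all(y)) / len(xs) for x in xs]
    m = sum(g) / len(g)
    return sum(v * v for v in g) / len(g) - m * m

for w in [4, 6, 8]:
    exact = 2**(-3 * w / 2) - 2**(-2 * w)  # = 2^{-3w/2}(1 - 2^{-w/2})
    print(abs(var_of_averaged_and(w) - exact) < 1e-15)  # True
```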
We'll actually apply this claim to the Or function, which has the same
variance as And by De Morgan's laws.
We now present the main inductive step to fool tribes.
\begin{claim}
\label{replacement_bound} Let $f$ be the tribes function, where the first $t
\leq w$ bits of each of the terms are fixed. Let $w' = w - t$ be the free bits
per term, and $k' \leq k$ the number of terms that are non-constant (some
term may have become $0$ after fixing the bits).
Re-express $f$ as $f(x, y) = \bigwedge_{i=1}^{k'} \left(\bigvee(x_i, y_i) \right)$,
where each term's input bits are split in half, so $|x_i| = |y_i| = w' / 2$.
Let $D$ be a small bias distribution with bias $\epsilon^c$ (for a big
enough $c$ to be set later). Then
$$\left\vert \E_{(x, y) \in U^2}[f(x,y)] - \E_{(x, y) \in (D,U)}[f(x,y)] \right\vert \leq \epsilon.$$
\end{claim}
That is, if we replace half of the free bits with a small bias distribution, then
the resulting expectation of the function only changes by a small amount.
To get the generator from this claim, we repeatedly apply Claim
\ref{replacement_bound}, replacing half of the bits of the input with another
small bias distribution. We repeat this until we have a small enough
remaining amount of free bits that replacing all of them with a small bias
distribution causes an insignificant change in the expectation of the output.
At each step, $w$ is cut in half, so the number of repetitions needed to
reduce $w'$ to constant is $R = \log w = \log \log n$. Actually, as
explained below, we'll stop when $w' = c' \log \log (1/\epsilon)$ for a suitable
constant $c'$ (this is where the error bound in the claim above kicks in).
After each replacement, we incur an error of $\epsilon$, and then we incur
the final error from replacing all bits with a small bias distribution. This
final error is negligible by a result which we haven't seen, but which is
close in spirit to the proof we saw that bounded independence fools AND.
The total accumulated error is then $\epsilon' = \epsilon \log \log n$. If we
wish to achieve a specific error $\epsilon$, we can run each small bias
generator with error $\epsilon / \log \log n$.
At each iteration, our small bias distribution requires $O(\log (n /
\epsilon))$ bits, so our final seed length is $O(\log (n / \epsilon))
\text{poly}\log\log(n)$.
\begin{proof}[Proof of Claim \ref{replacement_bound}]
Define $g_i(x_i) = \E_{y_i}[\bigvee(x_i, y_i)]$, and rewrite our target
expression as:
$$\left|\E_{x \in U}\left[\prod_i g_i(x_i)\right] - \E_{x \in D}\left[\prod_i g_i(x_i)\right]\right|.$$
This is in the form of Claim \ref{var_bound}. We also note from Claim
\ref{g_var} that $\text{var}(g_i) = \Theta(2^{-3w'/2})$.
We further assume that $k' \leq 2^{w'} \log (1 / \epsilon)$. For if this is not
true, then the expectation over the first $2^{w'} \log (1 / \epsilon)$ terms is
$\leq \epsilon$, because of the calculation
$$(1 - 2^{-w'})^{2^{w'} \log (1 / \epsilon)} \leq \epsilon.$$
Then we can reason as in the proof that bounded independence fools AND
(i.e., we can run the argument just on the first $2^{w'} \log (1 / \epsilon)$
terms to show that the products are close, and then use the fact that it is
small under uniform, and the fact that adding terms only decreases the
probability under any distribution).
Under the assumption, we can bound the sum of the variances of $g$ as:
$$\sum \text{var}(g_i) \leq k' 2^{-3w' / 2} \leq 2^{-\Omega(w')}\log(1 / \epsilon).$$
If we assume that $w' \ge c' \log \log(1 / \epsilon)$ for a suitable constant
$c'$, then this sum is $\leq 2^{-\Omega(w')}$.
We can then plug this into the bound from Claim \ref{var_bound} to get
$$(2^{-\Omega(w')})^d + (k2^{w'})^d \epsilon^c = 2^{-\Omega(dw')} + 2^{O(dw')}\epsilon^c.$$
Now we set $d$ so that $\Omega(dw') = \log(1 / \epsilon)+1$, and the
bound becomes: $$\epsilon / 2 + (1 / \epsilon)^{O(1)}\epsilon^{c} \leq
\epsilon.$$ By making $c$ large enough the claim is proved.
\end{proof}
In the original paper, they apply these ideas to read-once CNF formulas.
Interestingly, this extension is more complicated and uses additional
ideas. Roughly, the progress measure is going to be number of terms in
the CNF (as opposed to the width). A CNF is broken up into a small
number of Tribes functions, the above argument is applied to each Tribe,
and then they are put together using a general fact that they prove, that if
$f$ and $g$ are fooled by small-bias then also $f \wedge g$ on disjoint
inputs is fooled by small-bias.
\bibliographystyle{alpha}
\bibliography{C:/home/krv/math/OmniBib}
\end{document}