%%.tex template for scribes
%Use \equation, \[ \], \align at will, but try not to use too fancy environments.
%Known limitations:
%Don't define any new commands, especially those for math mode.
%Doing math inside \text inside math screws things up,
%e.g. $ \text{this = $x$}$ won't work.
%don't use \qedhere
\documentclass[12pt]{article}
\usepackage[latin9]{inputenc}
\usepackage{amsthm}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{hyperref}
\title{JustinThaler-Viola-Special-Topics-Lec10-}%%%%%%% remove before sending!!
\makeatletter
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Textclass specific LaTeX commands.
\theoremstyle{plain}
\newtheorem{thm}{\protect\theoremname}
\theoremstyle{plain}
\newtheorem{lem}[thm]{\protect\lemmaname}
\theoremstyle{plain}
\newtheorem{prop}[thm]{\protect\propositionname}
\theoremstyle{remark}
\newtheorem{claim}[thm]{\protect\claimname}
\theoremstyle{plain}
\newtheorem{conjecture}[thm]{\protect\conjecturename}
\theoremstyle{definition}
\newtheorem{problem}[thm]{\protect\problemname}
\theoremstyle{remark}
\newtheorem{rem}[thm]{\protect\remarkname}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% User specified LaTeX commands.
%lyx2wpost preable August 3 2017.
%This is an evolving preamble which works in tandem with myConfig5.cfg
%and the lyx2wpost script
\usepackage[english]{babel}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Textclass specific LaTeX commands.
\newtheorem{theorem}{Theorem}[section]
%\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{proposition}[theorem]{Proposition}
%\newtheorem{claim}[theorem]{Claim}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{construction}[theorem]{Construction}
%Theorem 1 instead of Theorem 0.1
\renewcommand{\thetheorem}{%
\arabic{theorem}}
\renewenvironment{theorem}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Theorem \thetheorem.}}{\vspace{0.2cm} }
\ifcsname thm\endcsname
\renewenvironment{thm}%
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Theorem \thetheorem.}}{\vspace{0.2cm} }%
\fi
\ifcsname conjecture\endcsname
\renewenvironment{conjecture}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Conjecture \thetheorem.}}{\vspace{0.2cm} }
\fi
\ifcsname corollary\endcsname
\renewenvironment{corollary}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Corollary \thetheorem.}}{\vspace{0.2cm} }
\fi
\ifcsname proposition\endcsname
\renewenvironment{proposition}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Proposition \thetheorem.}}{\vspace{0.2cm} }
\fi
\ifcsname prop\endcsname
\renewenvironment{prop}%
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Proposition \thetheorem.}}{\vspace{0.2cm} }%
\fi
\ifcsname claim\endcsname
\renewenvironment{claim}%
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Claim \thetheorem.}}{\vspace{0.2cm} }%
\fi
\ifcsname definition\endcsname
\renewenvironment{definition}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Definition \thetheorem.}}{\vspace{0.2cm} }
\fi
\ifcsname lemma\endcsname
\renewenvironment{lemma}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Lemma \thetheorem.}}{\vspace{0.2cm} }
\fi
\ifcsname lem\endcsname
\renewenvironment{lem}%
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Lemma \thetheorem.}}{\vspace{0.2cm} }%
\fi
\ifcsname remark\endcsname
\renewenvironment{remark}
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Remark \thetheorem.}}{\vspace{0.2cm} }
\fi
\ifcsname rem\endcsname
\renewenvironment{rem}%
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Remark \thetheorem.}}{\vspace{0.2cm} }%
\fi
\ifcsname problem\endcsname
\renewenvironment{problem}%
{\vspace{0.2cm} \par \refstepcounter{theorem} \noindent \textbf{Problem \thetheorem.}}{\vspace{0.2cm} }%
\fi
%%%%%%%%%%%%%%%%%%%%%%%% Biswa's %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\newcommand{\NC}{\mathsf{NC}}
% \newcommand{\PRG}{\mathsf{PRG}}
% \newcommand{\CC}{\mathsf{CC}}
% \newcommand{\Enc}{\mathsf{Enc}}
% \newcommand{\Dec}{\mathsf{Dec}}
% \newcommand{\Obf}{\mathsf{Obf}}
% \newcommand{\sk}{\mathsf{sk}}
% \newcommand{\ct}{\mathsf{ct}}
% \newcommand{\gen}{\mathsf{gen}}
% \newcommand{\PPT}{\mathsf{PPT}}
%\newcommand{\mathsf{SURJ}}{\mathsf{SURJ}}
%\newcommand{\mathsf{OR}}{\mathsf{OR}}
%\newcommand{\mathsf{AND}}{\mathsf{AND}}
%\newcommand{\mathsf{TH}}{\mathsf{TH}}
%%%%%%%%%%%%%%%%%%%%%%%% Biswa's %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\renewenvironment{proof}{\par \noindent \textbf{Proof:}}{\hfill $\square$ \newline \newline} %QED \par\vspace{0.5cm}}
\renewcommand{\qedsymbol}{$\square$}
\renewcommand{\paragraph}[1]{\textbf{#1}. }
\makeatother
\providecommand{\claimname}{Claim}
\providecommand{\conjecturename}{Conjecture}
\providecommand{\lemmaname}{Lemma}
\providecommand{\problemname}{Problem}
\providecommand{\propositionname}{Proposition}
\providecommand{\remarkname}{Remark}
\providecommand{\theoremname}{Theorem}
\begin{document}
\global\long\def\E{\mathrm{\mathbb{E}}}
%\global\long\def\e{\mathrm{\mathbb{\epsilonilon}}}
\noindent
\href{http://www.ccs.neu.edu/home/viola/classes/spepf17.html}{Special
Topics in Complexity Theory}, Fall 2017. Instructor:
\href{http://www.ccs.neu.edu/home/viola/}{Emanuele Viola}
%\href{http://www.ccs.neu.edu/home/viola/classes/spepf17.html}{For the
%class webpage click here.}
\section{Lecture 10, Scribe: Biswaroop Maiti}
This is a guest lecture by Justin Thaler regarding lower bounds on approximate degree \cite{bun2017polynomial,bun2015hardness,bun2017nearly}. We will sketch some details of the lower bounds on the approximate degree of $\mathsf{AND} \circ \mathsf{OR}$ and $\mathsf{SURJ}$, and give some intuition about the techniques used.
Recall the definition of $\mathsf{SURJ}$ from the previous lecture:
\begin{definition} The surjectivity function $\mathsf{SURJ}\colon
\left(\{-1,1\}^{\log R}\right)^N \to \{-1,1\}$, takes input
$x=(x_1, \dots, x_N)$ where each $x_i \in \{-1, 1\}^{\log R}$ is interpreted as an element of $[R]$. $\mathsf{SURJ}(x)$ has value $-1$
if and only if $\forall j \in [R], \exists i\colon x_i = j$.
\end{definition}
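\medskip \noindent \textbf{Example.} As a small illustration (the parameters $N=3$ and $R=2$ are chosen here purely for concreteness and are not taken from the lecture), interpret each $x_i$ as an element of $[2] = \{1, 2\}$. Then
\[ \mathsf{SURJ}(1, 2, 1) = -1 \quad \text{(both range items appear)}, \qquad \mathsf{SURJ}(1, 1, 1) = +1 \quad \text{(range item 2 never appears)}. \]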
Recall from the last lecture that $\mathsf{AND}_R \circ \mathsf{OR}_N \colon \{-1,1\}^{R\times N} \rightarrow \{-1,1\}$ is the block-wise composition of the $\mathsf{AND}$ function on $R$ bits and the $\mathsf{OR}$ function on $N$ bits. In general, we will denote the block-wise composition of two functions $f$ and $g$, where $f$ is defined on $R$ bits and $g$ is defined on $N$ bits, by $f_R \circ g_N$. Here, the outputs of $R$ copies of $g$ are fed into $f$ (with the inputs to the copies of $g$ being pairwise disjoint). The total number of inputs to $f_R \circ g_N$ is $R \cdot N$.
% In these lectures, we finish the proof of the approximate degree lower
% bound for AND-OR function, then we move to the surjectivity function
% SURJ. Finally we discuss quasirandom groups.
\subsection{Lower Bound of $d_{1/3}( \mathsf{SURJ} )$ via lower bound of $d_{1/3}($AND-OR$)$}
%\geq d_{1/3}^{\leq N}(\mathsf{AND-OR})
\begin{claim}\label{l8-9:thm:1}
$d_{1/3}( \mathsf{SURJ} ) = \widetilde{\Theta}(N^{3/4})$ when $R = \Theta(N)$.
\end{claim}
% \begin{theorem}\label{l8-9:thm:2}
% $d_{1/3}^{\leq N}($AND-OR$) \geq \Omega(N^{3/4})$ for some suitable $R = \mathsf{TH}eta(N)$.
% \end{theorem}
We will look at only the lower bound in the claim. We interpret the input as a list of $N$ numbers from $[R]:= \{1, 2, \dots, R\}$. As presented in \cite{bun2017polynomial}, the proof for the lower bound proceeds in the following steps.
\begin{enumerate}
\item Show that to approximate $\mathsf{SURJ}$, it is necessary to approximate the block-composition $\mathsf{AND}_R \circ \mathsf{OR}_N$ on inputs of Hamming weight at most $N$, i.e., show that $d_{1/3}(\mathsf{SURJ}) \geq d_{1/3}^{\leq N}(\mathsf{AND}_R \circ \mathsf{OR}_N)$.
Step 1 was covered in the previous lecture, but we briefly recall a bit of intuition for why the claim in this step is reasonable.
The intuition comes from the fact that the \emph{converse} of the claim is easy to establish, i.e.,
it is easy to show that in order to approximate $\mathsf{SURJ}$, it is \emph{sufficient} to approximate $\mathsf{AND}_R \circ \mathsf{OR}_N$
on inputs of Hamming weight exactly $N$.
This is because $\mathsf{SURJ}$ can be expressed as an $\mathsf{AND}_R$ (over
all range items $r \in [R]$) of the $\mathsf{OR}_N$ (over all inputs $i \in [N]$) of ``Is input $x_i$ equal to $r$''?
Each predicate of the
form in quotes is computed exactly by a polynomial of degree $\log R$, since it depends on only $\log R$ of the input bits,
and exactly $N$ of the predicates (one for each $i \in [N]$) evaluate to TRUE.
%This means that in order to approximate $\mathsf{SURJ}$,
%it is \emph{sufficient} to approximate $\mathsf{AND}_R \circ \mathsf{OR}_N$ on inputs of Hamming weight exactly $N$.
Step 1 of the lower bound proof for $\mathsf{SURJ}$ in \cite{bun2017polynomial} shows a converse, namely that the \emph{only} way to approximate $\mathsf{SURJ}$
is to approximate $\mathsf{AND}_R \circ \mathsf{OR}_N$ on inputs of Hamming weight at most $N$.
\item Show that $d_{1/3}^{\leq N}(\mathsf{AND}_R \circ \mathsf{OR}_N) = \widetilde{\Omega}(N^{3/4})$, i.e., the degree required to approximate $\mathsf{AND}_R \circ \mathsf{OR}_N$ on inputs of Hamming weight at most $N$ is at least $D=\widetilde{\Omega}(N^{3/4})$.\\\\
Step 2 itself proceeds via two substeps:
\begin{enumerate}
\item Give a dual witness $\Phi$ for $\mathsf{AND}_R \circ \mathsf{OR}_N$ that places little mass (namely, total mass less than $(R \cdot N \cdot D)^{-2D}$) on inputs of Hamming weight $> N$.\\
\item By modifying $\Phi$, give a dual witness $\Phi'$ for $\mathsf{AND}_R \circ \mathsf{OR}_N$ that places zero mass on inputs of Hamming weight $> N$.\\
\end{enumerate}
\end{enumerate}
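\medskip \noindent \textbf{Example.} To make the encoding behind Step 1 concrete, here is a small sketch with illustrative parameters $N = 3$ and $R = 2$ (these values are assumptions made for the example, not part of the lecture). Write $y_{r, i} = 1$ if $x_i = r$ and $y_{r,i} = 0$ otherwise. For the list $x = (1, 2, 1)$,
\[ (y_{1,1}, y_{1,2}, y_{1,3}) = (1, 0, 1), \qquad (y_{2,1}, y_{2,2}, y_{2,3}) = (0, 1, 0), \]
so both $\mathsf{OR}_3$ blocks are TRUE and the outer $\mathsf{AND}_2$ is TRUE, matching $\mathsf{SURJ}(x) = -1$. For $x = (1,1,1)$ the second block is $(0,0,0)$, so the outer $\mathsf{AND}_2$ is FALSE. In both cases the vector $y$ has Hamming weight exactly $3 = N$, since each $x_i$ equals exactly one range item.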
In \cite{bun2017polynomial}, both Substeps 2a and 2b proceed entirely in the dual world (i.e., they explicitly manipulate dual witnesses $\Phi$ and $\Phi'$). The main goal of this section of the lecture notes is to explain how to replace Step 2b of the argument of \cite{bun2017polynomial} with a wholly ``primal'' argument.
The intuition of the primal version of Step 2b that we'll cover is as follows. First, we will show that a polynomial $p \colon \{-1, 1\}^{R \cdot N} \to \mathbb{R}$ of degree $D$ that is bounded on the low Hamming weight inputs cannot be too big on the high Hamming weight inputs. In particular, we will prove the following claim.
\begin{claim} \label{key} If $p \colon \{-1, 1\}^{M} \to \mathbb{R}$ is a degree $D$ polynomial that satisfies $|p(x)| \leq 4/3$ on all inputs $x$ of Hamming weight at most $D$, then $|p(x)| \leq (4/3) \cdot D \cdot M^D$ for \emph{all} inputs $x$.
\end{claim}
Second, we will explain that the dual witness $\Phi$ constructed in Step 2a has the following ``primal'' implication:
\begin{claim}
\label{key2}
For $D \approx N^{3/4}$, any polynomial $p$ of degree $D$ satisfying $|p(x) - \left(\mathsf{AND}_R \circ \mathsf{OR}_N\right)(x) | \leq 1/4$ for all inputs $x$ of Hamming weight at most $N$
must satisfy $|p(x)| > (4/3) \cdot D \cdot ( R \cdot N)^D$ for some input $x \in \{-1, 1\}^{R \cdot N}$.
\end{claim}
Combining Claims \ref{key} and \ref{key2}, we conclude that no polynomial $p$ of degree $D \approx N^{3/4}$ can satisfy \begin{equation} \label{eq} |p(x) - (\mathsf{AND}_R \circ \mathsf{OR}_N)(x) | \leq 1/4 \text{ for all inputs } x \text{ of Hamming weight at most } N,\end{equation}
which is exactly the desired conclusion of Step 2.
This is because any polynomial $p$ satisfying Equation \eqref{eq} also satisfies $|p(x)| \leq 5/4 \leq 4/3$ for all $x$ of Hamming weight at most $N$,
and hence Claim \ref{key} implies that \begin{equation} \label{eq2} |p(x)| \leq \frac{4}{3} \cdot D \cdot (R \cdot N)^D \text{ for \emph{all} inputs } x \in \{-1, 1\}^{R \cdot N}.\end{equation}
But Claim \ref{key2} states that any polynomial satisfying both Equations \eqref{eq} and \eqref{eq2} requires degree strictly larger than $D$.
In the remainder of this section, we prove Claims \ref{key} and \ref{key2}.
\subsection{Proof of Claim \ref{key}}
\begin{proof}[Proof of Claim \ref{key}]
For notational simplicity, let us prove this claim for polynomials on domain $\{0, 1\}^{M}$, rather than $\{-1, 1\}^M$.
\medskip \noindent \textbf{Proof in the case that $p$ is symmetric.} Let us assume first that $p$ is symmetric, i.e., $p$ is only a function of the Hamming weight $|x|$ of its input $x$. Then $p(x) = g(|x|)$ for some degree $D$ univariate polynomial $g$ (this is a direct consequence of Minsky-Papert symmetrization, which we have seen in the lectures before). %Therefore, we can consider only low degree univariate polynomials.
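Since $g$ has degree at most $D$, Lagrange interpolation through the points $0, 1, \dots, D$ expresses $g$ as
\[g(t)= \sum_{k=0}^{D} g(k) \cdot \prod_{i \in \{0, \dots, D\},\, i \neq k} \frac{t-i}{k-i}. \]
Here, each $g(k)$ is bounded in magnitude by $|g(k)| \leq 4/3$. For $t \in \{D+1, \dots, M\}$, each of the $D$ numerators satisfies $|t-i| \leq M$, and $\prod_{i \neq k} |k-i| = k! \cdot (D-k)!$, so the $k$-th product has magnitude at most $M^D / (k! \cdot (D-k)!)$. Since $\sum_{k=0}^{D} \frac{1}{k! \cdot (D-k)!} = 2^D/D! \leq 2$, we get, for $D \geq 2$, the final bound: \[|g(t)| \leq (4/3) \cdot 2 \cdot M^D \leq (4/3) \cdot D \cdot M^D.\] (For $t \leq D$ the bound holds trivially, since $|g(t)| \leq 4/3$.)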
\medskip \noindent \textbf{Proof for general $p$.} Let us now consider the case of general (not necessarily symmetric) polynomials $p$. Fix any input $x \in \{0, 1\}^M$.
The goal is to show that $|p(x)| \leq \frac43 D \cdot M^D$.
Let us consider a polynomial $\hat{p}_x \colon \{0,1\}^{|x|} \rightarrow \mathbb{R}$ of degree at most $D$ obtained from $p$ by restricting each input $i$
such that $x_i=0$ to have the value 0. For example, if $M=4$ and $x=(0, 1, 1, 0)$, then $\hat{p}_x(y_2, y_3)=p(0, y_2, y_3, 0)$. We will exploit three properties
of $\hat{p}_x$:
\begin{itemize}
\item[Property 1.] $\deg(\hat{p}_x) \leq \deg(p) \leq D$.
\item[Property 2.] Since $|p(z)| \leq 4/3$ for all inputs $z$ with $|z| \leq D$, $\hat{p}_x$ satisfies the analogous property: $|\hat{p}_x(y)| \leq 4/3$ for all inputs $y$ with $|y| \leq D$.
\item[Property 3.]
If $\mathbf{1}_{|x|}$ denotes
the all-1s vector of length $|x|$, then $\hat{p}_x(\mathbf{1}_{|x|}) = p(x)$.
\end{itemize}
Property 3 means that our goal is
to show that $|\hat{p}_x(\mathbf{1}_{|x|})| \leq \frac43 \cdot D \cdot M^D$.
%We show a proof by example where $p$ is a function on $M=4$ inputs. Look at any arbitrary $x \in \{0,1\}^4$, say $x=\{0,1,1,0\}$. Then, we restrict the polynomial by setting all the variables corresponding to the zero coordinates of $x$, to 0, and introduce variables corresponding to 1-coordinates i.e. we look at the restricted polynomial $\widehat{p_y(x)}:=p(0, y_2, y_3, 0)$. This is still a degree $D$ polynomial but on fewer variables. Since, we get $\widehat{p(x)}$ by setting $y_2=1,y_3=1$, and if we decompose the restricted polynomial in terms of hamming weights, then there is exactly one input that corresponds to $p(x)$ and that is of maximum hamming weight.
\medskip
Let $p^{\text{symm}}_x \colon \{0, 1\}^{|x|} \to \mathbb{R}$ denote the symmetrized version of $\hat{p}_x$, i.e., $p^{\text{symm}}_x(y) = \mathbb{E}_{\sigma}[\hat{p}_x(\sigma(y))]$,
where the expectation is over a random permutation $\sigma$ of $\{1, \dots, |x|\}$, and $\sigma(y)=(y_{\sigma(1)}, \dots, y_{\sigma(|x|)})$.
Since $\sigma(\mathbf{1}_{|x|}) = \mathbf{1}_{|x|}$ for all permutations $\sigma$, $p^{\text{symm}}_x(\mathbf{1}_{|x|}) = \hat{p}_x(\mathbf{1}_{|x|}) = p(x)$.
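But $p^{\text{symm}}_x$ is symmetric, and averaging over permutations preserves both the degree bound (Property 1) and the bound of $4/3$ on inputs of Hamming weight at most $D$ (Property 2). Hence the analysis from the first part of the proof, applied to a polynomial on $|x| \leq M$ variables, implies that $|p^{\text{symm}}_x(y)| \leq \frac43 \cdot D \cdot M^D$ for all inputs
$y$. In particular, letting $y = \mathbf{1}_{|x|}$, we conclude that $|p(x)| \leq \frac43 \cdot D \cdot M^D$ as desired.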
\end{proof}
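\medskip \noindent \textbf{Example.} To make the interpolation step in the symmetric case concrete, here is a worked instance for the illustrative choice $D = 2$ (a sketch under that assumption only). A univariate polynomial $g$ of degree at most $2$ is determined by its values at $0, 1, 2$:
\[ g(t) = g(0) \cdot \frac{(t-1)(t-2)}{2} - g(1) \cdot t(t-2) + g(2) \cdot \frac{t(t-1)}{2}. \]
If $|g(0)|, |g(1)|, |g(2)| \leq 4/3$, then for any $M \geq 2$,
\[ |g(M)| \leq \frac43 \left( \frac{(M-1)(M-2)}{2} + M(M-2) + \frac{M(M-1)}{2} \right) = \frac43 \left( 2M^2 - 4M + 1 \right) \leq \frac43 \cdot 2 \cdot M^2, \]
which agrees with the bound $(4/3) \cdot D \cdot M^D$ for $D = 2$.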
\medskip \textbf{Discussion.} One may try to simplify the analysis of the general case in the proof of Claim \ref{key}
by considering the polynomial $p^{\text{symm}} \colon \{0, 1\}^M \to \mathbb{R}$ defined via $p^{\text{symm}}(x)=\mathbb{E}_{\sigma}[p(\sigma(x))]$,
where the expectation is over permutations $\sigma$ of $\{1, \dots, M\}$.
$p^{\text{symm}}$ is a symmetric polynomial, so the analysis for symmetric polynomials immediately implies that $|p^{\text{symm}}(x)| \leq \frac43 \cdot D \cdot M^D$.
Unfortunately, this does \emph{not} mean that $|p(x)| \leq \frac43 \cdot D \cdot M^D$.
This is because the symmetrized polynomial $p^{\text{symm}}$ averages the values of $p$ over all inputs of a given Hamming weight. A bound on this average does not preclude the case where $p$ is massively positive on some inputs of a given Hamming weight and massively negative on other
inputs of the same Hamming weight, with these values canceling out to give a small average. That is, it is not enough to conclude that, on average over inputs of any given Hamming weight, the magnitude of $p$ is not too big.
Thus, we needed to make sure that when we symmetrize $\hat{p}_x$ to $p^{\text{symm}}_x$, such large cancellations don't happen, and a bound on the average value of $\hat{p}_x$ on a given Hamming weight really gives us a bound on $p$ at the input $x$ itself. We
defined $\hat{p}_x$ so that $\hat{p}_x(\mathbf{1}_{|x|}) = p(x)$. Since there is only \emph{one} input in $\{0, 1\}^{|x|}$ of Hamming weight $|x|$,
$p^{\text{symm}}_x(\mathbf{1}_{|x|})$ does not average $\hat{p}_x$'s values on many inputs, meaning we don't need to worry about massive cancellations.
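\medskip \noindent \textbf{Example.} The cancellation phenomenon can be seen in a toy case (the polynomial below is a hypothetical illustration, not one arising in the proof). For a constant $C > 0$, let $p(x_1, x_2) = C \cdot (x_1 - x_2)$ on $\{0,1\}^2$. Averaging over the two permutations of the coordinates gives
\[ \mathbb{E}_{\sigma}[p(\sigma(x))] = \frac{1}{2} \Big( C \cdot (x_1 - x_2) + C \cdot (x_2 - x_1) \Big) = 0 \quad \text{for every input } x, \]
even though $p(1,0) = C$ and $p(0,1) = -C$. So a bound on the symmetrized polynomial alone says nothing about the magnitude of $p$ on individual inputs of Hamming weight $1$.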
%It is just a consequence of polynomial interpolation that if you have a low degree polynomial that is not too big on low Hamming weight input (for which we need to look at up to layer $D$, where $D$ is the degree), then it cannot be too big on high Hamming weight inputs, in particular it can be at most $\exp(D)$. This is under the assumption that $p$ is symmetric. However, we don't have to do symmetrization, this can be shown for general $p$ that does not have to be symmetric.
\medskip \textbf{A note on the history of Claim \ref{key}.} Claim \ref{key} was implicit in \cite{razborov2010sign}. They explicitly showed a similar bound for symmetric polynomials using a primal view and (implicitly) gave a different (dual) proof of the case for general polynomials.
%They show a dual witness that bounds the magnitude of the polynomial on the maximum hamming weight input in terms of the magnitude of the polynomial on low hamming weight inputs.
%This can be extended for general polynomials as well, but was not explicitly mentioned by them.
% Recall from the last lecture that AND-OR$:\{0,1\}^{R\times N} \rightarrow
% \{0,1\}$ is the composition of the AND function on $R$ bits and the OR
% function on $N$ bits. We also proved the following lemma.
%The statement that it is bounded on low Hamming weight inputs, already tells you that this condition on bounding the value on high Hamming weight inputs is redundant. This is a general statement for any low degree polynomial.
%The intuition here is that the surjectivity function ensures that .. The inputs of each OR are fresh inputs and do not overlap. We have already seen Step 1 before. We will look at the Step 2 in more detail. We would like to find a dual witness that has no weight on the inputs of hamming weight more than $N$. But, in Step 1 we do not put any such restriction.
\subsection{Proof of Claim \ref{key2}}
\subsubsection{Interlude Part 1: Method of Dual Polynomials \cite{bun2017nearly}} A dual polynomial is a dual solution to a certain linear program that captures the approximate degree of any given function $f \colon \{-1, 1\}^n \to \{-1, 1\}$. These polynomials act as certificates of the high approximate degree of $f$. Strong LP duality implies that the technique is lossless, in contrast to the symmetrization techniques which we saw before: for any function $f$ and any $\varepsilon$, there is always some dual polynomial $\Psi$ that witnesses a tight $\varepsilon$-approximate degree lower bound for $f$. A dual polynomial that witnesses the fact that $d_\varepsilon(f) \geq d$ is a function $\Psi \colon \{-1, 1\}^n \rightarrow \mathbb{R}$ satisfying three properties:
\begin{itemize}
\item \textbf{Correlation: } $$\sum_{x \in \{-1,1\}^n }{\Psi(x) \cdot f(x)} > \varepsilon.$$ If $\Psi$ satisfies this condition, it is said to be well-correlated with $f$. \\
\item \textbf{Pure high degree: } For all polynomials $p \colon \{-1, 1\}^n \rightarrow \mathbb{R}$ of degree less than $d$, we have
$$\sum_{x \in \{-1,1\}^n } { p(x) \cdot \Psi(x)} = 0.$$ If
$\Psi$ satisfies this condition, it is said to have \emph{pure high degree} at least $d$.
\item \textbf{ $\ell_1$ norm: } $$\sum_{x \in \{-1,1\}^n }|\Psi(x)| = 1.$$
\end{itemize}
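\medskip \noindent \textbf{Example.} As a sanity check of the three conditions, consider the toy case (chosen here for illustration, not taken from the lecture) of parity on two bits, $f(x_1, x_2) = x_1 x_2$ on $\{-1,1\}^2$, with candidate dual polynomial $\Psi(x) = x_1 x_2 / 4$. Then
\[ \sum_{x} \Psi(x) \cdot f(x) = \sum_{x} \frac{(x_1 x_2)^2}{4} = 1, \qquad \sum_{x} |\Psi(x)| = 4 \cdot \frac14 = 1, \]
and
\[ \sum_{x} \Psi(x) \cdot 1 = \sum_{x} \Psi(x) \cdot x_1 = \sum_{x} \Psi(x) \cdot x_2 = 0, \]
so $\Psi$ is orthogonal to every polynomial of degree less than $2$. Hence $\Psi$ witnesses $d_\varepsilon(f) \geq 2$ for every $\varepsilon < 1$.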
\subsubsection{Interlude Part 2: Applying The Method of Dual Polynomials To Block-Composed Functions}
For any function $f \colon \{-1, 1\}^n \to \{-1, 1\}$, we can write an LP capturing the approximate degree of $f$. We can prove lower bounds on the approximate degree of $f$ by proving lower bounds on the optimal value of this LP. One way to do this is by writing down the Dual of the LP and exhibiting a feasible solution to the Dual. Since the Primal is a minimization problem and the Dual is a maximization problem, weak LP duality guarantees that the value of any feasible Dual solution is a lower bound on the optimal value of the Primal LP. Therefore, exhibiting such a feasible solution, which we call a dual witness, suffices to prove an approximate degree lower bound for $f$.
However, for any given dual witness, some work will be required to verify that the witness indeed meets the criteria imposed by the Dual constraints.
When the function $f$ is a block-wise composition of two functions, say $h$ and $g$, then we can try to construct a good dual witness for $f$ by looking at dual witnesses for each of $h$ and $g$, and combining them carefully, to get the dual witness for $h \circ g$.
The dual witness $\Phi$ constructed in Step 2a for $\mathsf{AND} \circ \mathsf{OR}$ is expressed below in terms of the dual witness of the inner $\mathsf{OR}$ function viz. $\Psi_{\mathsf{OR}}$ and the dual witness of the outer $\mathsf{AND}$, viz. $\Psi_{ \mathsf{AND} }$. % if it follows certain conditions stated below.
\begin{equation} \label{tired} \Phi(x_1 \dots x_R) = 2^R \cdot \Psi_{ \mathsf{AND} }\left( \cdots, \mathsf{sgn}(\Psi_{\mathsf{OR}}(x_i)), \cdots \right) \cdot \prod_{i=1}^R| \Psi_{\mathsf{OR}}(x_i)|. \end{equation}
This method of combining dual witnesses $\Psi_{\mathsf{AND}}$ for the ``outer'' function $\mathsf{AND}$ and $\Psi_{\mathsf{OR}}$ for the ``inner'' function
$\mathsf{OR}$ is referred to in \cite{bun2017polynomial, bun2017nearly} as \emph{dual block composition}.
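\medskip \noindent \textbf{Example.} The following short calculation is a sketch of why dual block composition is typically stated with a normalizing factor of $2^R$. It assumes only that $\Psi_{\mathsf{AND}}$ and $\Psi_{\mathsf{OR}}$ have $\ell_1$-norm $1$ and that $\Psi_{\mathsf{OR}}$ has pure high degree at least $1$. The latter gives $\sum_{x_i} \Psi_{\mathsf{OR}}(x_i) = 0$, so $|\Psi_{\mathsf{OR}}|$ places mass exactly $1/2$ on inputs where $\Psi_{\mathsf{OR}}$ is positive and $1/2$ on inputs where it is negative. Grouping the inputs $(x_1, \dots, x_R)$ by the sign vector $z = (\dots, \mathsf{sgn}(\Psi_{\mathsf{OR}}(x_i)), \dots) \in \{-1,1\}^R$,
\[ \sum_{x_1, \dots, x_R} \left| \Psi_{\mathsf{AND}}\left( \cdots, \mathsf{sgn}(\Psi_{\mathsf{OR}}(x_i)), \cdots \right) \right| \cdot \prod_{i=1}^R |\Psi_{\mathsf{OR}}(x_i)| = \sum_{z \in \{-1,1\}^R} |\Psi_{\mathsf{AND}}(z)| \cdot 2^{-R} = 2^{-R}. \]
Multiplying by $2^R$ therefore yields a function of $\ell_1$-norm exactly $1$.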
\subsubsection{Interlude Part 3: Hamming Weight Decay Conditions} Step 2a of the proof of the $\mathsf{SURJ}$ lower bound from \cite{bun2017polynomial} gave a dual witness $\Phi$ for $\mathsf{AND}_R \circ \mathsf{OR}_N$ (with $R=\Theta(N)$) that has:
\begin{equation} \text{ pure high degree } D=\tilde{\Omega}(N^{3/4}) \label{eqsigh1} \end{equation}
\begin{equation} \ell_1\text{-norm equal to one}, \label{eqsigh2} \end{equation}
\begin{equation}
\text{correlation } .4 \text{ with } \mathsf{AND}_R \circ \mathsf{OR}_N, \label{eqsigh3} \end{equation} and
%also satisfies Equation \eqref{eq3} below. % and \eqref{eq4} below.
%In substeps 2a and 2b of the proof of the lower bound for $\mathsf{SURJ}$, we will reason about dual witnesses $\Phi$ for $\mathsf{AND}_R \circ \mathsf{OR}_N$
%and $\Psi_{\mathsf{OR}}$ for $\mathsf{OR}_N$ that satisfy various notions of ``Hamming-weight decay'', which means that these dual witnesses do not place much mass
%on inputs of high Hamming weight. Specifically, Step 2a of the proof requires showing that %The first condition from the first step states that for inputs of weight more than $N$, the
%% \mathsf{AND} \mathsf{OR} $\mathsf{AND}$ $\mathsf{SURJ}$ $\mathsf{AND_R \cdot OR_N}$
%%\mathsf{AND_R \cdot OR_N}
\begin{equation} \label{eq3}
\sum_{|x|>N} {|\Phi(x)|} \ll (R \cdot N \cdot D)^{-2D}.
\end{equation}
%\begin{equation} \label{eq4}
%\text{For all } t=0, \dots, N,
% \sum_{|x|=t} {|\Phi(x)|} \leq \frac{1}{15 \cdot (1+t)^2}.
%\end{equation}
Equation \eqref{eq3} is a very strong ``Hamming weight decay'' condition: it shows that the total mass that $\Phi$
places on inputs of high Hamming weight is very small. Hamming weight decay conditions
play an essential role in the lower bound analysis for $\mathsf{SURJ}$ from \cite{bun2017polynomial}.
In addition to Equation \eqref{eq3} itself being a Hamming weight decay condition,
\cite{bun2017polynomial}'s proof that $\Phi$ satisfies Equation \eqref{eq3} exploits the fact that the dual witness $\Psi_{\mathsf{OR}}$
for $\mathsf{OR}$ can be chosen to simultaneously have pure high degree $N^{1/4}$, and to satisfy the following weaker Hamming weight decay condition:
\begin{claim} \label{nomore}
There exist constants $c_1, c_2$ such that for all $t=0, \dots, N$,
\begin{equation} \label{done}
\sum_{|x|=t} { |\Psi_{\mathsf{OR}}(x)| } \leq c_1 \cdot \frac{1}{(1+t)^2} \cdot \exp(-c_2 \cdot t/N^{1/4}).
\end{equation}
\end{claim}
(We will not prove Claim \ref{nomore} in these notes; we simply state it to highlight the importance of dual decay to the analysis of $\mathsf{SURJ}$.)
\medskip Dual witnesses satisfying various notions of Hamming weight decay have a natural primal interpretation:
they witness approximate degree lower bounds for the target function ($\mathsf{AND}_R \circ \mathsf{OR}_N$ in the case of Equation \eqref{eq3}, and $\mathsf{OR}_N$
in the case of Equation \eqref{done}) \emph{even when the approximation is allowed to be exponentially large on
inputs of high Hamming weight}. This primal interpretation of dual decay is formalized in the following claim.
\begin{claim} \label{veryclose}
Let $L(t)$ be any function mapping $\{0, 1, \dots, n\}$ to $\mathbb{R}_+$. Suppose $\Psi$ is a dual witness for $f$ on $n$ bits satisfying the following properties:
\begin{itemize}
\item (Correlation): $\sum_{x \in \{-1,1\}^n }{\Psi(x) \cdot f(x)} > 1/3$.
\item ($\ell_1$-norm): $\sum_{x \in \{-1,1\}^n }{|\Psi(x)|} =1$.
\item (Pure high degree): $\Psi$ has pure high degree $D$.
\item (Dual decay): $\sum_{t=0}^n \sum_{|x|=t} |\Psi(x)| \cdot L(t) \leq 1/3 $.
\end{itemize}
Then there is no degree $D$ polynomial $p$ such that for all $t = 0, 1, \dots, n$, \begin{equation} \label{swift} |p(x)-f(x)| \leq L(t) \text{ for all } |x|=t.\end{equation}
\end{claim}
\begin{proof}
Let $p$ be any degree $D$ polynomial. Since $\Psi$ has pure high degree $D$, $\sum_{x \in \{-1, 1\}^n} p(x) \cdot \Psi(x)=0$.
We will now show that if $p$ satisfies Equation \eqref{swift}, then the other two properties satisfied by $\Psi$ (correlation and dual decay)
together imply that $\sum_{x \in \{-1, 1\}^n} p(x) \cdot \Psi(x) >0$, a contradiction.
\begin{align*}
\sum_{x \in \{-1, 1\}^n} \Psi(x) \cdot p(x) &=
\sum_{x \in \{-1, 1\}^n} \Psi(x) \cdot f(x) + \sum_{x \in \{-1, 1\}^n} \Psi(x) \cdot (p(x) - f(x))\\
&> 1/3 - \sum_{x \in \{-1, 1\}^n} |\Psi(x)| \cdot |p(x) - f(x)|\\
&\geq 1/3 - \sum_{t=0}^n \sum_{|x|=t} |\Psi(x)| \cdot L(t) \geq 0.
\end{align*}
Here, the second line exploited that $\Psi$ has correlation greater than $1/3$ with $f$, the first inequality on the third line exploited the assumption
that $p$ satisfies Equation \eqref{swift}, and the final inequality exploited the dual decay condition that $\Psi$ is assumed to satisfy.
\end{proof}
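\medskip \noindent \textbf{Example.} As a sanity check on Claim \ref{veryclose}, consider the special case of a constant error function, $L(t) = 1/3$ for all $t$ (an illustrative choice). The dual decay condition then reads
\[ \sum_{t=0}^n \sum_{|x|=t} |\Psi(x)| \cdot \frac13 = \frac13 \cdot \sum_{x \in \{-1,1\}^n} |\Psi(x)| = \frac13, \]
which holds automatically whenever $\Psi$ has $\ell_1$-norm $1$. In this case the claim reduces to the usual statement that a dual polynomial with correlation greater than $1/3$ and pure high degree $D$ rules out degree $D$ polynomials that approximate $f$ to error $1/3$ on every input. Allowing $L(t)$ to grow with $t$ is what requires the extra decay assumption on $\Psi$.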
\subsubsection{Proof of Claim \ref{key2}}
\begin{proof}
%Claim \ref{key2} follows from Equation \eqref{eq3}, combined with Claim \ref{veryclose}. Specifically,
Let $\Phi$ be the dual witness for $f=\mathsf{AND}_R \circ \mathsf{OR}_N$ constructed in Step 2a of the argument from \cite{bun2017polynomial} (which satisfies Equations \eqref{eqsigh1}-\eqref{eq3}).
Claim \ref{key2} follows so long as we can apply
Claim \ref{veryclose} to $\Phi$ and $f$, with $$L(t) = \begin{cases} 1/4 & \text{if } t \leq N, \\ (R \cdot N \cdot D)^{D} & \text{if } t > N. \end{cases}$$
So we need to show that $\Phi$ satisfies the four properties required to apply Claim \ref{veryclose}. The first three properties
are immediate from Equations \eqref{eqsigh1}-\eqref{eqsigh3}.
The fourth property holds via the following derivation:
\begin{align*} \sum_{t=0}^{R\cdot N} \sum_{|x|=t} |\Phi(x)| \cdot L(t) &=
\sum_{t=0}^{N} \sum_{|x|=t} |\Phi(x)| \cdot L(t) +
\sum_{t=N+1}^{R \cdot N} \sum_{|x|=t} |\Phi(x)| \cdot L(t) \\
&\leq 1/4 +
\sum_{t=N+1}^{R \cdot N} \sum_{|x|=t} |\Phi(x)| \cdot (R \cdot N \cdot D)^{D} < 1/3.\end{align*}
Here, the first inequality holds because $\Phi$ has $\ell_1$-norm equal to 1 and $L(t) = 1/4$ for $t \leq N$, and
the final inequality holds by Equation \eqref{eq3}.
\end{proof}
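\medskip \noindent \textbf{Example.} The arithmetic behind the final inequality can be spelled out as follows (a sketch, reading Equation \eqref{eq3} as an upper bound of $(R \cdot N \cdot D)^{-2D}$ on the mass that $\Phi$ places on inputs of Hamming weight greater than $N$):
\[ (R \cdot N \cdot D)^{D} \cdot \sum_{|x| > N} |\Phi(x)| \leq (R \cdot N \cdot D)^{D} \cdot (R \cdot N \cdot D)^{-2D} = (R \cdot N \cdot D)^{-D}. \]
Since $1/3 - 1/4 = 1/12$, the dual decay condition holds as soon as $(R \cdot N \cdot D)^{D} > 12$, which is the case, for instance, whenever $R \cdot N \cdot D > 12$ and $D \geq 1$.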
%\textbf{Last part}
\section{Generalizing the analysis for $\mathsf{SURJ}$ to prove a nearly linear approximate degree lower bound for $\mathsf{AC}^0$}
Now we take a look at how to extend this kind of analysis for $\mathsf{SURJ}$ to obtain even stronger approximate degree lower bounds for
other functions in $\mathsf{AC}^0$.
Recall that $\mathsf{SURJ}$ can be expressed as an $\mathsf{AND}_R$ (over
all range items $r \in [R]$) of the $\mathsf{OR}_N$ (over all inputs $i \in [N]$) of ``Is input $x_i$ equal to $r$''?
That is, $\mathsf{SURJ}$ simply evaluates $\mathsf{AND}_R \circ \mathsf{OR}_N$ on the inputs $(\dots, y_{j, i}, \dots)$ where $y_{j, i}$ indicates
whether or not input $x_i$ is equal to range item $j \in [R]$.
Our analysis for $\mathsf{SURJ}$ can be viewed as follows: it is a way to turn the $\mathsf{AND}$ function on $R$ bits (which has approximate degree $\Theta(\sqrt{R})$) into a function on close to $R$ bits with polynomially larger approximate degree. Indeed, $\mathsf{SURJ}$ is defined on $N \log R$ bits where, say, $N = 100R$, i.e., it is a function on $100 R \log R$ bits. So, this function is on not much more than $R$ bits, but has approximate degree $\tilde{\Omega}(R^{3/4})$,
polynomially larger than the approximate degree of $\mathsf{AND}_R$. %But we showed that $\mathsf{SURJ}$ has an approximate degree polynomially larger the approximate degree of $\mathsf{AND}_R$. %, which again is $\Theta(\sqrt{R})$).
Hence, the lower bound for $\mathsf{SURJ}$ can be seen as a hardness amplification result. We turn the $\mathsf{AND}$ function on $R$ bits into a function on slightly more bits, but the approximate degree of the new function is significantly larger.
From this perspective, the lower
bound proof for $\mathsf{SURJ}$ showed that, in order to approximate $\mathsf{SURJ}$, it is not enough to approximate the $\mathsf{AND}_R$ function: instead of feeding the inputs directly to the $\mathsf{AND}$ gate itself, we further drive up the degree by feeding the inputs through $\mathsf{OR}_N$ gates. The intuition is that we cannot do much better than approximating the $\mathsf{AND}$ function and then approximating the block-composed $\mathsf{OR}_N$ gates. This additional approximation of the $\mathsf{OR}$ gates gives us the extra exponent in the approximate degree expression.
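\medskip \noindent \textbf{Example.} To compare the two functions on equal footing, one can express both degrees in terms of the number of input bits (a back-of-the-envelope sketch, using the illustrative setting $N = 100R$ from above). For $\mathsf{AND}_R$, the input length is $n = R$ and the approximate degree is $\Theta(n^{1/2})$. For $\mathsf{SURJ}$, the input length is $n = 100 R \log R$, so $R = \Theta(n / \log n)$ and
\[ \tilde{\Omega}(R^{3/4}) = \tilde{\Omega}\left( (n/\log n)^{3/4} \right) = \tilde{\Omega}(n^{3/4}). \]
Up to logarithmic factors, the exponent of the approximate degree as a function of the input length thus increases from $1/2$ to $3/4$.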
We will see two issues that come in the way of naive attempts at generalizing our hardness amplification technique from $\mathsf{AND}_R$
to more general functions. % \circ\mathsf{OR}$ naively.
\subsection{Interlude: Grover's Algorithm}
\textbf{Grover's algorithm} \cite{grover1996fast} is a quantum algorithm that finds, with high probability, the unique input to a black-box function that produces a given output, using $O(\sqrt{N})$ queries to the function, where $N$ is the size of the domain of the function. It was originally devised as a database search algorithm that searches an unsorted database of size $N$ and determines whether or not there is a record in the database that satisfies a given property using $O(\sqrt{N})$ queries. This is strictly better than deterministic and randomized query algorithms, which require $\Omega(N)$ queries in the worst case and in expectation, respectively. Among quantum algorithms, Grover's algorithm is optimal up to a constant factor.
\subsection{Issues: Why a dummy range item is necessary}
In general, let us consider the problem of taking any function $f$ that does not have maximal approximate degree (say, with approximate degree $n^{1-\Omega(1)}$), and turning it into a function on roughly
the same number of bits, but with polynomially larger approximate degree.
%approximating a general function $f$. It is possible that $f$ is actually $\mathsf{SURJ}$.
%Consider the case, when $f$ is a function with a large approximate degree.
In analogy with how $\mathsf{SURJ}(x_1, \dots, x_N)$
equals $\mathsf{AND}_R \circ \mathsf{OR}_N$ evaluated on inputs $(\dots, y_{ji}, \dots)$, where $y_{ji}$ indicates whether or not $x_i=j$,
we can consider the block composition $f_R \circ \mathsf{OR}_N$ evaluated on $(\dots, y_{ji}, \dots)$, and hope that
this function has polynomially larger approximate degree than $f_R$ itself.
% We can look at this as $f_R \circ \mathsf{TH}_K$, where $\mathsf{TH}_K$ is the Threshold gate i.e. it outputs 1 if the number of inputs that are set to 1 is at least $K$. We know $\mathsf{OR}$ is a $\mathsf{TH}_1$ gate. For $k-$distinctness, the function we consider is $f_R=\mathsf{OR}$.
Unfortunately, this does not work. Consider for example the case $f_R = \mathsf{OR}_R$. The function $\mathsf{OR}_R \circ \mathsf{OR}_N = \mathsf{OR}_{R \cdot N}$ evaluates to 1
on all possible vectors $(\dots, y_{ji}, \dots)$, since all such vectors have Hamming weight exactly $N > 0$.
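The Hamming-weight claim can be checked directly. The following short calculation uses only the definition of $y_{ji}$ as the indicator of $x_i = j$; the toy instance ($N = 3$, $R = 2$) is chosen purely for illustration.
\[
\sum_{j=1}^{R} \sum_{i=1}^{N} y_{ji} = \sum_{i=1}^{N} \Big( \sum_{j=1}^{R} y_{ji} \Big) = \sum_{i=1}^{N} 1 = N,
\]
because each $x_i$ equals exactly one range item $j \in [R]$. For instance, with $N = 3$, $R = 2$ and $(x_1, x_2, x_3) = (1, 1, 2)$ we get $(y_{11}, y_{12}, y_{13}, y_{21}, y_{22}, y_{23}) = (1, 1, 0, 0, 0, 1)$, which has weight $3$, so $\mathsf{OR}_{6}$ outputs $1$.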
One way to try to address this is to introduce a dummy range item, all occurrences of which are simply ignored by the function.
That is, we can consider the (hopefully harder) function $G$ to interpret its input as a list of $N$ numbers from the range
$[R]_0 := \{0, 1, \dots, R\}$, rather than range $[R]$, and define $G=f_R \circ \mathsf{OR}_N(y_{1, 1}, \dots, y_{R, N})$ (note that variables
$y_{0, 1}, \dots, y_{0, N}$, which indicate whether or not each input $x_i$ equals range item $0$, are simply ignored).
In fact, in the previous lecture we already used this technique of introducing a ``dummy'' range item, to ease the lower
bound analysis for $\mathsf{SURJ}$ itself. Last lecture we covered Step 1
of the lower bound proof for $\mathsf{SURJ}$, and we let $z_0= \sum_{i = 1}^N y_{0, i}$ denote
the frequency of the dummy range item, 0. The introduction of this dummy range item let us replace the condition $\sum_{j=0}^R z_j = N$ (i.e., the sum of the frequencies of all the range items
is \emph{exactly} $N$) by the condition $\sum_{j=1}^R z_j \leq N$ (i.e., the sum of the frequencies of all the range items is \emph{at most} $N$).
%This in turn substantially eased %is for the $\mathsf{OR}$ function, that generalizes the definition of approximate degree $\mathsf{deg}_{1/3}^{\leq
%N}(\mathsf{AND}-\mathsf{OR})$, as opposed to $\mathsf{\deg}_{1/3}(\mathsf{AND} \circ\mathsf{OR})$.
%We introduce dummy range items that are slack variables for the corresponding LP and they are necessary. We saw this (in previous lectures) when we introduced the variables $z_0=y_{01},
% \dots, y_{0N}$ and replaced the condition $\sum_{j=0}^R z_j = N$ by the condition $\sum_{j=1}^R z_j \leq N$ for the $\mathsf{OR}$ function, that generalizes the definition of approximate degree $\mathsf{deg}_{1/3}^{\leq
%N}(\mathsf{AND} \circ \mathsf{OR})$, as opposed to $\mathsf{\deg}_{1/3}(\mathsf{AND} \circ\mathsf{OR})$.
%An example to see that this step is necessary will be as follows. If we analyze the nested / block composed $\mathsf{OR}$ function on an arbitrary function $f$ and we introduced slack variables / range items $y_{ij}$, then, a bad example will be when $f:= \mathsf{OR}$, because all the lower $\mathsf{OR}$ functions might evaluate to 1 and the function may be expressed as a constant and therefore it can have exact degree 0. If we did $\mathsf{OR}$ on all $y_{ij}$, and all these slack range items were 1, we end up approximating this function by a constant and the approximate degree we would infer will be 0. Therefore, we will need the slack variables at this level of generality.
%If we try to generalize this naive analysis to any arbitrary function $f$ without slack range items, it will not work, because this function can have degree 0, because all the $\mathsf{OR}$ function might be 1 and the function may be expressed as a constant and therefore will have degree 0, viz. when $f:=\mathsf{OR}$. Therefore, generalizing the technique for $\mathsf{SURJ}$ like this does not work.
% The property of $\mathsf{AND}$ that was useful in the analysis of $\mathsf{SURJ}$ was that its dual witness has high approximate degree $\sqrt[]{R}$. In the correlation analysis, it was also used that the block composition $\mathsf{AND} \circ \mathsf{OR}$
% This can be fixed by applying this technique to $f \circ \mathsf{AND}_{\log R}$, instead of to $f$ itself.
\subsection{A dummy range item is not sufficient on its own}
\label{s:explain}
Unfortunately, introducing a dummy range item is not sufficient on its own. That is, even when the range is $[R]_0$
rather than $[R]$, the function $G=f_R \circ \mathsf{OR}_N(y_{1, 1}, \dots, y_{R, N})$
may have approximate degree that is \emph{not} polynomially larger than that of $f_R$ itself.
An example of this is (once again) $f_R = \mathsf{OR}_R$. With a dummy range item,
$\mathsf{OR}_R \circ \mathsf{OR}_N(y_{1, 1}, \dots, y_{R, N})$ evaluates to TRUE if and only if
at least one of the $N$ inputs is \emph{not} equal to the dummy range item $0$. This problem has approximate degree $O(N^{1/2})$
(it can be solved using Grover search).
% generalizing the technique for $\mathsf{SURJ}$ may not work out in general. An example is once again, if we try to find the approximate degree of $f$ by considering $f \circ \mathsf{OR}$. This is because, the approximate degree of $f \circ \mathsf{OR}$ is $\Theta(\widetilde{\mathsf{deg}}(f))$.
%With the slack variables for $f \circ \mathsf{OR}$ what the LP is asking is, does there exist any range item in the inputs that is not a dummy range item? Therefore, all the dummy range items are ignored. In other words, it indicates, if the input contains any dummy range item at all. This can alternatively be done by a quantum search algorithm based on Grover's search that would require $\sqrt[]{N}$ queries and would yield a corresponding bound.
Therefore, the most naive approach at general hardness amplification, even with a dummy range item, does not work.
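To see concretely why the example $f_R = \mathsf{OR}_R$ collapses, one can rewrite the composed function as a single $\mathsf{OR}$. The symbol $w_i$ below is introduced only for this illustration.
\[
\mathsf{OR}_R \circ \mathsf{OR}_N(y_{1,1}, \dots, y_{R,N}) = \bigvee_{j=1}^{R} \bigvee_{i=1}^{N} y_{j,i} = \bigvee_{i=1}^{N} w_i, \qquad w_i := \bigvee_{j=1}^{R} y_{j,i} = 1 - y_{0,i},
\]
so $w_i = 1$ exactly when $x_i \neq 0$. The function is thus $\mathsf{OR}_N$ applied to $N$ bits, each determined by a single input $x_i$, which is why a Grover search over $i \in [N]$ applies.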
\subsection{The approach that works}
%The essential flaw in this approach is that, we must ensure that the approximate degree of the whole function considered, which was $f_R \circ \mathsf{OR}$ in the previous example to be polynomially larger than the approximate degree of $f$ itself, unless $f$ has maximum possible approximate degree.
The approach that succeeds is to consider the block composition $f_R \circ \mathsf{AND}_{\log R} \circ \mathsf{OR}_N$ (i.e., apply the naive approach with a dummy range item
not to $f_R$ itself, but to $f_R \circ \mathsf{AND}_{\log R}$). As the example in Section \ref{s:explain} shows, the $\mathsf{AND}_{\log R}$ gates are crucial here for the analysis to go through.
It is instructive to look at where exactly the lower bound proof for $\mathsf{SURJ}$ breaks down if we try to adapt
it to the function $\mathsf{OR}_R \circ \mathsf{OR}_N = \mathsf{OR}_{R \cdot N}$ (rather than the function $\mathsf{AND}_R \circ \mathsf{OR}_N$ which
we analyzed to prove the lower bound for $\mathsf{SURJ}$). Then we can see why the introduction of the $\mathsf{AND}_{\log R}$ gates
fixes the issue.
When analyzing the more naively defined function $G= \left(\mathsf{OR}_R \circ \mathsf{OR}_N\right)(y_{1, 1}, \dots, y_{R, N})$ (with a dummy range item), Step 1 of the lower bound analysis for $\mathsf{SURJ}$ \emph{does work}
unmodified to imply that in order to approximate $G$, it is necessary to approximate the block composition $\mathsf{OR}_R \circ \mathsf{OR}_N$ on inputs of Hamming weight at most $N$. But Step 2 of the analysis breaks down: one can approximate $\mathsf{OR}_R \circ \mathsf{OR}_N$ on inputs of Hamming weight at most $N$ using degree just $O(N^{1/2})$.
Why does the Step 2 analysis break down for $\mathsf{OR}_R \circ\mathsf{OR}_N$? If one tries to construct a dual witness
$\Phi$ for $\mathsf{OR}_R \circ \mathsf{OR}_N$ by applying dual block composition (cf. Equation \eqref{tired}, but with the dual witness $\Psi_{\mathsf{AND}}$ for $\mathsf{AND}_R$ replaced by a dual witness for $\mathsf{OR}_R$),
$\Phi$ will not be well-correlated with $\mathsf{OR}_R \circ\mathsf{OR}_N$.
Roughly speaking, the correlation analysis thinks of each copy of the inner dual witness $\Psi_{\mathsf{OR}}(x_i)$ as consisting of a sign, $\mathsf{sgn}(\Psi_{\mathsf{OR}}(x_i))$,
and a magnitude $|\Psi_{\mathsf{OR}}(x_i)|$, and the inner dual witness ``makes an error'' on $x_i$ if it outputs the wrong sign, i.e., if
$\mathsf{sgn}(\Psi_{\mathsf{OR}}(x_i)) \neq \mathsf{OR}(x_i)$. The correlation analysis
winds up performing a union bound over the probability (under the product distribution $\prod_{i=1}^{R}|\Psi_{\mathsf{OR}}(x_i)|$) that \emph{any} of the $R$ copies of the inner dual witness
makes an error. Unfortunately, each copy of the inner dual witness makes an error with constant probability under the distribution $|\Psi_{\mathsf{OR}}|$. So at least one of them makes an error under the product distribution with probability very close to 1.
This means that the correlation of the dual-block-composed dual witness $\Phi$ with $\mathsf{OR}_R \circ \mathsf{OR}_N$ is poor.
%When we analyzed the $\mathsf{AND} \circ \mathsf{OR}$ tree for $\mathsf{SURJ}$, we used the fact that the dual witness for $\mathsf{AND}$ had pure degree $\sqrt[]{R}$. In the corresponding correlation analysis, when we consider the dual witness, it is sufficient to consider the block composition $\Psi_\mathsf{AND} \circ \Psi_\mathsf{OR}$ i.e. the block composition of the corresponding dual witnesses. This naive analysis does not go through for $\mathsf{OR} \circ \mathsf{OR}$. This is because, in the analysis of $f \circ \mathsf{OR}$, a union bound of the error terms in the correlation of the dual witness of the inner function, with the actual inner function results in blowing up the error parameter of the function as a whole. Instead,
But if we look at $\mathsf{OR}_R \circ \left(\mathsf{AND}_{\log R} \circ \mathsf{OR}_N\right)$, the correlation analysis \emph{does} go through.
That is, we can give a dual witness $\Psi_{\mathsf{in}}$ for $\mathsf{AND}_{\log R} \circ \mathsf{OR}_N$ and a dual witness $\Psi_{\mathsf{out}}$ for $\mathsf{OR}_R$
such that the dual-block-composition of $\Psi_{\mathsf{out}}$ and $\Psi_{\mathsf{in}}$ is well-correlated with $\mathsf{OR}_R \circ \left(\mathsf{AND}_{\log R} \circ \mathsf{OR}_N\right)$.
This is because \cite{bun2015hardness} showed that for $\epsilon=1-1/(3R)$, $d_{\epsilon}\left(\mathsf{AND}_{\log R} \circ \mathsf{OR}_N\right) = \Omega(N^{1/2})$.
This means that $\left(\mathsf{AND}_{\log R} \circ \mathsf{OR}_N\right)$ has a dual witness $\Psi_{\mathsf{in}}$ that ``makes an error'' with probability just $1/(3R)$.
This probability of making an error is so low that a union bound over all $R$ copies of $\Psi_{\mathsf{in}}$ appearing in the dual-block-composition of $\Psi_{\mathsf{out}}$
and $\Psi_{\mathsf{in}}$ implies that with probability at least $2/3$, \emph{none} of the copies of $\Psi_{\mathsf{in}}$ makes an error.
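Spelled out, the union bound is the following one-line calculation, contrasted with the $\mathsf{OR}_N$ case. The constant $\delta$ is a placeholder, introduced only for this illustration, for the constant error probability of a single copy of $\Psi_{\mathsf{OR}}$.
\[
\Pr[\text{some copy of } \Psi_{\mathsf{in}} \text{ errs}] \leq \sum_{i=1}^{R} \Pr[\text{copy } i \text{ errs}] \leq R \cdot \frac{1}{3R} = \frac{1}{3}.
\]
For the inner dual witness $\Psi_{\mathsf{OR}}$, the same sum is only bounded by $R \cdot \delta$, which exceeds $1$ once $R > 1/\delta$ and so gives no information. Indeed, assuming each copy errs with probability $\delta$, independence of the copies under the product distribution gives
\[
\Pr[\text{no copy of } \Psi_{\mathsf{OR}} \text{ errs}] = (1-\delta)^{R} \leq e^{-\delta R},
\]
which tends to $0$ as $R$ grows.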
In summary, the key difference between $\mathsf{OR}_N$ and $\mathsf{AND}_{\log R} \circ \mathsf{OR}_N$ that allows the lower bound
analysis to go through for the latter but not the former is that the latter has $\epsilon$-approximate degree $\Omega(N^{1/2})$ for $\epsilon = 1-1/(3R)$,
while the former only has $\epsilon$-approximate degree $\Omega(N^{1/2})$ if $\epsilon$ is a constant bounded away from 1.
%\[\Psi(x_1 \cdots x_N)=\Psi_{\mathsf{out}}( \cdots, \mathsf{sgn} (\Psi_{\mathsf{in}}(x_i)), \cdots ) \cdot \prod_{i=1}^R |\Psi_{\mathsf{in}}(x_i)| \]
% if the blocks of $\mathsf{OR}$ are passed through the $\mathsf{AND}$ gates before feeding to the outer $\mathsf{OR}$, the error is under control. In specific, if the correlation of the dual of the inner bit is off by probability $\frac{1}{3R}$, the error in the block with $\mathsf{AND}$ gate on $R$ copies of blocks, reduces to $R \cdot \frac{1}{3R} \sim 1/3$. $\mathsf{AND}_{\log R} \circ \mathsf{OR}_N$ was already analyzed in a prior paper \cite{bun2015hardness} and showed that $\widetilde{\mathsf{deg}}_{1-1/3R}(\mathsf{AND}_{\log R} \circ \mathsf{OR}_N) = \Omega(N^{1/2})$. We extend this technique to the current work \cite{bun2017polynomial}.
To summarize, the $\mathsf{SURJ}$ lower bound can be seen as a way to turn the function $f_R = \mathsf{AND}_R$
into a harder function $G=\mathsf{SURJ}$, meaning that $G$ has polynomially larger approximate degree than $f_R$.
The
right approach to generalize the technique for arbitrary $f_R$ is to (a) introduce a dummy range item, all occurrences of which are
effectively ignored by the harder function $G$, \emph{and} (b) rather than considering the ``inner'' function $\mathsf{OR}_N$,
consider the inner function $\mathsf{AND}_{\log R} \circ \mathsf{OR}_N$, i.e., let $G=f_R \circ \mathsf{AND}_{\log R} \circ \mathsf{OR}_N(y_{1, 1}, \dots, y_{R \log R, N})$. The $\mathsf{AND}_{\log R}$ gates are essential to make sure that the error in the correlation of the inner dual witness is very small, and hence the correlation analysis for
the dual-block-composed dual witness goes through.
Note that $G$ can be interpreted as follows: it breaks the range $[R \log R]_0$ up into $R$ blocks, each of length $\log R$ (the dummy range item is excluded from all
of the blocks),
and for each block it computes a bit indicating whether or not every range item in the block has frequency at least 1. It then feeds these bits into $f_R$.
By recursively applying this construction, starting with $f_R = \mathsf{AND}_R$, we get a function in AC$^0$ with approximate degree $\Omega(n^{1-\delta})$ for
any desired constant $\delta > 0$.
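The block interpretation of $G$ can be written out explicitly. In the sketch below, the assignment of consecutive range items to blocks and the names $B_k$ and $b_k$ are assumptions made for illustration; any fixed partition of $\{1, \dots, R \log R\}$ into $R$ blocks of size $\log R$ would do.
\[
B_k := \{(k-1)\log R + 1, \dots, k \log R\}, \qquad b_k := \bigwedge_{j \in B_k} \bigvee_{i=1}^{N} y_{j,i}, \qquad G(x_1, \dots, x_N) = f_R(b_1, \dots, b_R).
\]
For example, with $R = 4$ the range is $\{0, 1, \dots, 8\}$ and the blocks are $\{1,2\}, \{3,4\}, \{5,6\}, \{7,8\}$. On the input $(x_1, \dots, x_5) = (1, 2, 0, 5, 0)$ we get $(b_1, b_2, b_3, b_4) = (1, 0, 0, 0)$: block $1$ is fully covered, block $3$ is missing item $6$, and the two occurrences of the dummy item $0$ are ignored.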
\subsection{$k$-distinctness}
The very same issue also arises in \cite{bun2017polynomial}'s proof of a lower bound on the approximate degree of the $k$-distinctness function. Step 1 of the lower bound analysis for $\mathsf{SURJ}$ reduces analyzing $k$-distinctness to analyzing $\mathsf{OR}_R \circ \mathsf{TH}^k_N$ (restricted to inputs of Hamming weight at most $N$), where $\mathsf{TH}^k_N$ is the function that evaluates to TRUE if and only if its input has Hamming weight at least $k$. The lower bound proved in \cite{bun2017polynomial} for $k$-distinctness is $\Omega(n^{3/4-1/(2k)})$. Note that $\mathsf{OR}_N$ is the $\mathsf{TH}^1_N$ function, so $\mathsf{OR}_R \circ \mathsf{TH}^k_N$ is ``close'' to $\mathsf{OR}_R \circ \mathsf{OR}_N$, and we have seen that the correlation analysis of the dual witness obtained via dual-block-composition breaks down for $\mathsf{OR}_R \circ \mathsf{OR}_N$.
To overcome this issue, we have to show that $\mathsf{TH}^k_N$ is harder to approximate than $\mathsf{OR}_N$ itself, but we have to give up some small factor in the process: the bound we obtain is somewhat weaker than the $\Omega(n^{3/4})$ lower bound for $\mathsf{SURJ}$. It may seem that this loss is just a technical issue and not intrinsic, but this is not so. In fact, the bound is almost tight: there is a complicated quantum algorithm \cite{belovs2011quantum,belovs2012learning} for $k$-distinctness that makes $O(n^{3/4-1/(2^{k+2}-4)})= n^{3/4-\Omega(1)}$ queries, which we won't elaborate on here.
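To make the gap between the two bounds concrete, the table below simply evaluates the two exponents stated above for a few small values of $k$; it is plain arithmetic on those formulas and carries no information beyond them.
\begin{center}
\begin{tabular}{|c|c|c|}\hline
$k$ & lower-bound exponent $3/4 - 1/(2k)$ & upper-bound exponent $3/4 - 1/(2^{k+2}-4)$ \\ \hline
$2$ & $1/2$ & $2/3$ \\ \hline
$3$ & $7/12$ & $5/7$ \\ \hline
$4$ & $5/8$ & $11/15$ \\ \hline
\end{tabular}
\end{center}
Both exponents approach $3/4$ from below as $k$ grows, the upper bound exponentially fast in $k$ and the lower bound only at rate $1/(2k)$.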
%The key to generalizing the technique for $\mathsf{SURJ}$ is to consider $\mathsf{AND}_{\log R} \circ \mathsf{OR}_N$ and make sure that the correlation of the error in the inner $\mathsf{OR}$ functions with the output is low so that the correlation analysis goes through.
% \subsubsection{Last section rambling scribe}
% This is counting what is the number of slack variables that are not dummy variables
\bibliographystyle{alpha}
\bibliography{biblio.bib}
\end{document}
\begin{lemma} Suppose that distributions $A^0, A^1$ over $\{0,1\}^{n_A}$
are $k_A$-wise indistinguishable distributions; and distributions
$B^0, B^1$ over $\{0,1\}^{n_B}$ are $k_B$-wise
indistinguishable distributions.
Define $C^0, C^1$ over $\{0,1\}^{n_A \cdot n_B}$ as follows:
$C^b$: draw a sample $x \in \{0,1\}^{n_A}$ from $A^b$, and replace
each bit $x_i$ by a sample of $B^{x_i}$ (independently).
Then $C^0$ and $C^1$ are $k_A \cdot k_B$-wise indistinguishable.
\end{lemma}
To finish the proof of the lower bound on the approximate degree of the
AND-OR function, it remains to see that AND-OR can distinguish well the
distributions $C^0$ and $C^1$. For this, we begin with observing that we
can assume without loss of generality that the distributions have disjoint
supports.
\begin{claim}
For any function $f$, and for any $k$-wise indistinguishable distributions
$A^0$ and $A^1$, if $f$ can distinguish $A^0$ and $A^1$ with probability
$\epsilon$ then there are distributions $B^0$ and $B^1$ with the same
properties ($k$-wise indistinguishability yet distinguishable by $f$) and
also with disjoint supports. (By disjoint support we mean for any $x$ either
$\Pr[B^0 = x] = 0$ or $\Pr[B^1 = x] = 0$.)
\end{claim}
\begin{proof}
Let distribution $C$ be the ``common part'' of $A^0$ and $A^1$. That is to
say, we define $C$ such that $\Pr[C = x] := \min \{\Pr[A^0 = x], \Pr[A^1 =
x]\}$ multiplied by some constant that normalizes $C$ into a distribution.
Then we can write $A^0$ and $A^1$ as
\begin{align*}
A^0 &= pC + (1-p) B^0 \,,\\
A^1 &= pC + (1-p) B^1 \,,
\end{align*}
where $p \in [0,1]$, $B^0$ and $B^1$ are two distributions. Clearly $B^0$
and $B^1$ have disjoint supports.
Then we have
\begin{align*}
\E[f(A^0)] - \E[f(A^1)] =&~p \E[f(C)] + (1-p) \E[f(B^0)] \notag\\
&- p \E[f(C)] - (1-p) \E[f(B^1)] \\
=&~(1-p) \big( \E[f(B^0)] - \E[f(B^1)] \big) \\
\leq&~\E[f(B^0)] - \E[f(B^1)] \,.
\end{align*}
Therefore if $f$ can distinguish $A^0$ and $A^1$ with probability
$\epsilon$ then it can also distinguish $B^0$ and $B^1$ with at least that
probability.
Similarly, for all $S \neq \varnothing$ such that $|S| \leq k$, we have
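\[
0 = \E[\chi_S(A^0)] - \E[\chi_S(A^1)] =
(1-p) \big( \E[\chi_S(B^0)] - \E[\chi_S(B^1)] \big) \,.
\]
Since $f$ distinguishes $A^0$ and $A^1$, we have $p < 1$, and hence $\E[\chi_S(B^0)] = \E[\chi_S(B^1)]$. That is, $B^0$ and $B^1$ are $k$-wise indistinguishable.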
\end{proof}
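As a sanity check on the decomposition used in the proof, here is a toy instance over $\{0,1\}^2$. It illustrates only the splitting into a common part and disjoint parts; these particular distributions are chosen for simplicity and are not claimed to be $k$-wise indistinguishable for any $k \geq 1$.
\[
A^0 = \tfrac{1}{2}\,[00] + \tfrac{1}{2}\,[01], \qquad A^1 = \tfrac{1}{2}\,[00] + \tfrac{1}{2}\,[10],
\]
where $[x]$ denotes the point mass on $x$. The pointwise minimum puts mass $1/2$ on $00$ and $0$ elsewhere, so $p = 1/2$, $C = [00]$, $B^0 = [01]$ and $B^1 = [10]$, and indeed $A^b = \tfrac{1}{2} C + \tfrac{1}{2} B^b$ with $B^0, B^1$ having disjoint supports. For $f$ the indicator of $x = 01$ we get $\E[f(A^0)] - \E[f(A^1)] = 1/2 = (1-p)\big(\E[f(B^0)] - \E[f(B^1)]\big)$, matching the identity in the proof.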
Equipped with the above lemma and claim, we can finally prove the
following lower bound on the approximate degree of AND-OR.
\begin{theorem}
$d_{1/3}($AND-OR$) = \Omega(\sqrt{RN})$.
\end{theorem}
\begin{proof}
Let $A^0, A^1$ be $\Omega(\sqrt{R})$-wise indistinguishable distributions
for AND with advantage $0.99$, i.e. $\Pr[\mathrm{AND}(A^1) = 1] >
\Pr[\mathrm{AND}(A^0) = 1] + 0.99$. Let $B^0, B^1$ be
$\Omega(\sqrt{N})$-wise indistinguishable distributions for OR with
advantage $0.99$. By the above claim, we can assume that $A^0, A^1$
have disjoint supports, and the same for $B^0, B^1$. Compose them by
the lemma, getting $\Omega(\sqrt{RN})$-wise indistinguishable
distributions $C^0,C^1$. We now show that AND-OR can distinguish
$C^0, C^1$:
\begin{itemize}
\item $C^0$: First sample $A^0$. As there exists a unique $x = 1^R$
such that $\mathrm{AND}(x)= 1$, $\Pr[A^1 = 1^R] >0$. Thus by
disjointness of support $\Pr[A^0 = 1^R] = 0$. Therefore when
sampling $A^0$ we always get a string with at least one ``$0$''. But
then each ``$0$'' is replaced with a sample from $B^0$. We have $\Pr[B^0 =
0^N] \geq 0.99$, and when $B^0 = 0^N$, AND-OR$=0$.
\item $C^1$: First sample $A^1$, and we know that $A^1 = 1^R$ with
probability at least $0.99$. Each bit ``$1$'' is replaced by a sample
from $B^1$, and we know that $\Pr[B^1 = 0^N] = 0$ by disjointness
of support. Then AND-OR$=1$.
\end{itemize}
Therefore we have $d_{1/3}($AND-OR$)= \Omega(\sqrt{RN})$.
\end{proof}
\subsection{Lower Bound of $d_{1/3}($SURJ$)$}
In this subsection we discuss the approximate degree of the surjectivity
function. This function is defined as follows.
\begin{definition} The surjectivity function SURJ$\colon
\left(\{0,1\}^{\log R}\right)^N \to \{0,1\}$, which takes input
$(x_1, \dots, x_N)$ where $x_i \in [R]$ for all $i$, has value $1$
if and only if $\forall j \in [R], \exists i\colon x_i = j$.
\end{definition}
% \bibliographystyle{alpha}
% \bibliography{C:/home/krv/math/OmniBib}
% \end{document}
First, some history. Aaronson first proved that the approximate degree of
SURJ and other functions on $n$ bits including ``the collision problem'' is
$n^{\Omega(1)}$. This was motivated by an application in quantum
computing. Before this result, even a lower bound of $\omega(1)$ had not
been known. Later Shi improved the lower bound to $n^{2/3}$, see
\cite{AaronsonS04}. The instructor believes that the quantum framework
may have blocked some people from studying this problem, though it may
have very well attracted others. Recently Bun and Thaler \cite{BunT17}
reproved the $n^{2/3}$ lower bound, but in a quantum-free paper, and
introducing some different intuition. Soon after, together with Kothari, they
proved \cite{BunKT17} that the approximate degree of SURJ is
$\Theta(n^{3/4})$.
We shall now prove the $\Omega(n^{3/4})$ lower bound, though one piece
is only sketched. Again we present some things in a different way from
the papers.
For the proof, we consider the AND-OR function under the promise that
the Hamming weight of the $RN$ input bits is at most $N$. Call the
approximate degree of AND-OR under this promise $d_{1/3}^{\leq
N}($AND-OR$)$. Then we can prove the following theorems.
\begin{theorem}\label{l8-9:thm:1}
$d_{1/3}($SURJ$) \geq d_{1/3}^{\leq N}($AND-OR$)$.
\end{theorem}
\begin{theorem}\label{l8-9:thm:2}
$d_{1/3}^{\leq N}($AND-OR$) \geq \Omega(N^{3/4})$ for some suitable $R = \Theta(N)$.
\end{theorem}
In our setting, we consider $R = \Theta(N)$. Theorem \ref{l8-9:thm:1}
shows surprisingly that we can somehow ``shrink'' $\Theta(N^2)$ bits of
input into $N\log N$ bits while maintaining the approximate degree of the
function, under some promise. Without this promise, we just showed in the
last subsection that the approximate degree of AND-OR is $\Omega(N)$
instead of $\Omega(N^{3/4})$ as in Theorem \ref{l8-9:thm:2}.
\begin{proof}[Proof of Theorem \ref{l8-9:thm:1}]
Define an $N \times R$ matrix $Y$ s.t.~the 0/1 variable $y_{ij}$ is the
entry in the $i$-th row $j$-th column, and $y_{ij} = 1$ iff $x_i = j$. We can
prove this theorem in the following steps:
\begin{enumerate}
\item $d_{1/3}($SURJ$(\overline{x})) \geq d_{1/3}($AND-OR$(\overline{y}))$ under
the promise that each row has weight $1$;
\item let $z_j$ be the sum of the $j$-th column, then $d_{1/3}($AND-OR$(\overline{y}))$ under
the promise that each row has weight $1$, is at least $d_{1/3}($AND-OR$(\overline{z}))$
under the promise that $\sum_j z_j = N$;
\item $d_{1/3}($AND-OR$(\overline{z}))$ under the promise that $\sum_j z_j = N$, is
at least $d_{1/3}^{=N}($AND-OR$(\overline{y}))$;
\item we can change ``$=N$'' into ``$\leq N$''.
\end{enumerate}
Now we prove this theorem step by step.
\begin{enumerate}
\item Let $P(x_1, \dots, x_N)$ be a polynomial for SURJ, where $x_i = (x_i)_1, \dots, (x_i)_{\log R}$.
Then we have
\[
(x_i)_k = \sum_{j: k\text{-th bit of }j \text{ is } 1} y_{ij}.
\]
Then the polynomial $P'(\overline{y})$ for AND-OR$(\overline{y})$ is the
polynomial $P(\overline{x})$ with $(x_i)_k$ replaced as above, thus the
degree won't increase. Correctness follows by the promise.
\item This is the most extraordinary step, due to Ambainis
\cite{Ambainis05}. In this notation, AND-OR becomes the indicator
function of $\forall j, z_j \neq 0$. Define
\[
Q(z_1, \dots, z_R) := \mathop{\E}_{\substack{\overline{y} \text{ whose rows have weight } 1\\ \text{and which is consistent with }\overline{z}}} P(\overline{y}).
\]
Clearly it is a good approximation of AND-OR$(\overline{z})$. It remains
to show that it's a polynomial of degree $k$ in $z$'s if $P$ is a
polynomial of degree $k$ in $y$'s.
Let's look at one monomial of degree $k$ in $P$:
$y_{i_1j_1}y_{i_2j_2}\cdots y_{i_kj_k}$. Observe that all $i_\ell$'s are
distinct by the promise, and by $u^2 = u$ over $\{0,1\}$. By chain rule
we have
\[
\E[y_{i_1j_1}\cdots y_{i_kj_k}] = \E[y_{i_1j_1}]\E[y_{i_2j_2}|y_{i_1j_1} = 1] \cdots
\E[y_{i_kj_k}|y_{i_1j_1}=\cdots =y_{i_{k-1}j_{k-1}} = 1].
\]
By symmetry we have $\E[y_{i_1j_1}] = \frac{z_{j_1}}{N}$, which is linear
in $z$'s. To get $\E[y_{i_2j_2}|y_{i_1j_1} = 1]$, we know that every other
entry in row $i_1$ is $0$, so we give away row $i_1$, average over $y$'s
such that $\left\{\begin{array}{ll}
y_{i_1j_1} = 1 &\\
y_{ij} = 0 & j\neq j_1
\end{array}\right.$ under the promise and consistent with $z$'s. Therefore
\[
\E[y_{i_2j_2}|y_{i_1j_1} = 1] = \left\{
\begin{array}{ll}
\frac{z_{j_2}}{N-1} & j_1 \neq j_2,\\
\frac{z_{j_2}-1}{N-1} & j_1 = j_2.
\end{array}\right.
\]
In general we have
\[
\E[y_{i_kj_k}|y_{i_1j_1}=\cdots =y_{i_{k-1}j_{k-1}} = 1]
= \frac{z_{j_k} - \#\{\ell < k \colon j_\ell = j_k\}}{N-k + 1},
\]
which has degree $1$ in $z$'s. Therefore the degree of $Q$ is not
larger than that of $P$.
\item Note that $\forall j$, $z_j = \sum_i y_{ij}$. Hence by replacing $z$'s
by $y$'s, the degree won't increase.
\item We can add a ``slack'' variable $z_0$, or equivalently $y_{01},
\dots, y_{0N}$; then the condition $\sum_{j=0}^R z_j = N$ actually
means $\sum_{j=1}^R z_j \leq N$.
\end{enumerate}
\end{proof}
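The conditional-expectation formulas in Step 2 can be checked on a tiny instance. The sketch below assumes, as the symmetry argument suggests, that the expectation defining $Q$ is uniform over all matrices $\overline{y}$ with rows of weight $1$ that are consistent with $\overline{z}$; the instance $N = R = 2$, $\overline{z} = (1,1)$ is chosen only for illustration. For a degree-$2$ monomial with $i_1 \neq i_2$ the formulas give
\[
\E[y_{i_1 j_1} y_{i_2 j_2}] =
\left\{
\begin{array}{ll}
\frac{z_{j_1} z_{j_2}}{N(N-1)} & j_1 \neq j_2,\\
\frac{z_{j_1}(z_{j_1}-1)}{N(N-1)} & j_1 = j_2.
\end{array}\right.
\]
With $N = R = 2$ and $\overline{z} = (1,1)$, the consistent matrices are exactly the two $2 \times 2$ permutation matrices. Then $\E[y_{11} y_{22}] = 1/2$, since only the identity matrix contributes, which agrees with $z_1 z_2 / (2 \cdot 1) = 1/2$. Likewise $\E[y_{11} y_{21}] = 0$, since column $1$ contains a single $1$, which agrees with $z_1 (z_1 - 1)/(2 \cdot 1) = 0$.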
\begin{proof}[Proof idea for Theorem \ref{l8-9:thm:2}]
First, by the duality argument we can verify that $d_{1/3}^{\leq N}(f) \geq
d$ if and only if there exist $d$-wise indistinguishable distributions $A, B$
such that:
\begin{itemize}
\item $f$ can distinguish $A, B$;
\item $A$ and $B$ are supported on strings of weight $\leq N$.
\end{itemize}
\begin{claim}
$d_{1/3}^{\leq \sqrt{N}}($OR$_N) = \Omega(N^{1/4})$.
\end{claim}
The proof needs a little more information about the weight distribution of
the indistinguishable distributions corresponding to this claim. Basically,
their expected weight is very small.
Now we combine these distributions with the usual ones for AND using the
lemma mentioned at the beginning.
What remains to show is that the final distribution is supported on
Hamming weight $\le N$. Because by construction the $R$ copies of the
distributions for OR are sampled independently, we can use concentration
of measure to prove a tail bound. This gives that all but an exponentially
small measure of the distribution is supported on strings of weight $\le N$.
The final step of the proof consists of slightly tweaking the distributions to
make that measure $0$.
\end{proof}
\subsection{Groups}
Groups have many applications in theoretical computer science.
Barrington \cite{Barrington89} used the permutation group $S_5$ to prove
a very surprising result, which states that the majority function can be
computed efficiently using only a constant number of bits of memory (something which
was conjectured to be false). More recently, catalytic computation
\cite{BuhrmanCKLS14} shows that if we have a lot of memory, but it is full
of junk that cannot be erased, we can still compute more than if we had
little memory. We will see some interesting properties of groups in the
following.
Some famous groups used in computer science are:
\begin{itemize}
\item $\{0,1\}^n$ with bit-wise addition;
\item $\mathbb{Z}_m$ with addition mod $m$ ;
\item $S_n$, which are permutations of $n$ elements;
\item Wreath product $G:= (\mathbb{Z}_m \times \mathbb{Z}_m) \wr \mathbb{Z}_2\,$, whose elements are of the form $(a,b)z$ where $z$ is a ``flip bit'', with the following multiplication rules:
\begin{itemize}
\item $(a, b) 1 = 1 (b, a)$ ;
\item $z\cdot z' := z+z'$ in $\mathbb{Z}_2$ ;
\item $(a,b) \cdot (a',b') := (a+a', b+b')$ is the $\mathbb{Z}_m\times \mathbb{Z}_m$ operation;
\end{itemize}
An example is $(5,7)1 \cdot (2,1) 1 = (5,7) 1 \cdot 1 (1, 2) = (6,9)0$ . Generally we have
\[
(a, b) z \cdot (a', b') z' = \left\{
\begin{array}{ll}
(a + a', b+b')\,(z+z') & z = 0\,,\\
(a+b', b + a')\,(z+z') & z = 1\,;
\end{array}\right.
\]
\item $SL_2(q) := \{2\times 2$ matrices over $\mathbb{F}_q$ with determinant $1\},$
in other words, group of matrices $\begin{pmatrix}
a & b\\
c & d
\end{pmatrix}$ such that $ad - bc = 1$.
\end{itemize}
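For the wreath product, the general multiplication rule follows from the three basic rules in a few lines. The derivation below uses only the flip rule $(a,b)1 = 1(b,a)$, applied with $(a,b)$ replaced by $(b',a')$; the elements $g$ and $h$ are arbitrary choices made for illustration.
\[
(a,b)1 \cdot (a',b')z' = (a,b) \cdot \big(1 (a',b')\big) \cdot z' = (a,b) \cdot (b',a') \cdot 1 \cdot z' = (a+b', b+a')\,(1+z'),
\]
while for $z = 0$ no flip occurs and $(a,b)0 \cdot (a',b')z' = (a+a', b+b')\,z'$. The group is not abelian for $m \geq 2$: taking $g = (1,0)0$ and $h = (0,0)1$,
\[
g \cdot h = (1,0)1, \qquad h \cdot g = (0+0, 0+1)\,(1+0) = (0,1)1 \neq g \cdot h.
\]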
The group $SL_2(q)$ was invented by Galois. (If you haven't already, read his
biography on Wikipedia.)
\paragraph{Quiz}
Among these groups, which is the ``least abelian''? The latter can be
defined in several ways. We focus on this: If we have two high-entropy
distributions $X, Y$ over $G$, does $X \cdot Y$ have more entropy? For
example, if $X$ and $Y$ are uniform over some $\Omega(|G|)$ elements,
is $X\cdot Y$ close to uniform over $G$? By ``close to'' we mean that the
statistical distance from the uniform distribution is less than a small
constant. For $G=(\{0,1\}^n, +)$, if $Y=X$ is uniform over $\{0\}\times
\{0,1\}^{n-1}$, then $X\cdot Y$ is the same, so there is no entropy
increase even though $X$ and $Y$ are uniform on half the elements.
\begin{definition}[Measure of Entropy]
For $\lVert A\rVert_2 = \left(\sum_xA(x)^2\right)^{\frac{1}{2}}$, we think of $\lVert A\rVert^2_2 = 100 \frac{1}{|G|}$ for ``high entropy''.
\end{definition}
Note that $\lVert A\rVert^2_2$ is exactly the ``collision probability'', i.e. $\Pr[A = A']$.
We regard the collision probability of the uniform distribution $U$ as very small, i.e.
$\lVert U\rVert^2_2 = \frac{1}{|G|} \approx \lVert \overline{0}\rVert^2_2$. Then we have
\begin{align*}
\lVert A - U \rVert^2_2 &= \sum_x \left(A(x) - \frac{1}{|G|}\right)^2\\
&= \sum_x \left( A(x)^2 - 2A(x) \frac{1}{|G|} + \frac{1}{|G|^2} \right) \\
&= \lVert A \rVert^2_2 - \frac{1}{|G|} \\
&= \lVert A \rVert^2_2 - \lVert U \rVert^2_2\\
&\approx \lVert A \rVert^2_2\,.
\end{align*}
\begin{theorem}[\cite{Gowers08}, \cite{BabaiNP08}]
If $X, Y$ are independent over $G$, then
\[
\lVert X\cdot Y - U \rVert_2 \leq \lVert X \rVert_2 \lVert Y \rVert_2 \sqrt{\frac{|G|}{d}},
\]
where $d$ is the minimum dimension of a nontrivial irreducible representation of $G$.
\end{theorem}
By this theorem, for high entropy distributions $X$ and $Y$, we get
$\lVert X\cdot Y - U \rVert_2 \leq \frac{O(1)}{\sqrt{|G|d}}$, thus we have
\begin{equation} \label{eq:d}
\lVert X\cdot Y - U \rVert_1 \leq \sqrt{|G|} \lVert X\cdot Y - U \rVert_2 \leq \frac{O(1)}{\sqrt{d}}.
\end{equation}
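For completeness, here is how the two inequalities above can be obtained. The constant $c$ is a placeholder for the constant in the high-entropy condition (for instance $c = 100$ in the definition above), so that $\lVert X \rVert_2^2 \leq c/|G|$ and $\lVert Y \rVert_2^2 \leq c/|G|$. Plugging into the theorem,
\[
\lVert X\cdot Y - U \rVert_2 \leq \sqrt{\frac{c}{|G|}} \cdot \sqrt{\frac{c}{|G|}} \cdot \sqrt{\frac{|G|}{d}} = \frac{c}{\sqrt{|G|\, d}},
\]
and by the Cauchy--Schwarz inequality, for any $v \colon G \to \mathbb{R}$,
\[
\lVert v \rVert_1 = \sum_{x \in G} |v(x)| \cdot 1 \leq \Big(\sum_{x \in G} v(x)^2\Big)^{1/2} \Big(\sum_{x \in G} 1\Big)^{1/2} = \sqrt{|G|}\, \lVert v \rVert_2 .
\]
Combining the two with $v = X \cdot Y - U$ gives $\lVert X\cdot Y - U \rVert_1 \leq c/\sqrt{d}$.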
If $d$ is large, then $X \cdot Y$ is very close to uniform. The following
table shows the $d$'s for the groups we've introduced.
\begin{table}[h]
\centering
\begin{tabular}{|c|c|c|c|c|c|}\hline
$G$ & $\{0,1\}^n$ & $\mathbb{Z}_m$ & $(\mathbb{Z}_m \times \mathbb{Z}_m) \wr \mathbb{Z}_2$ & $A_n$ & $SL_2(q)$\\\hline
$d$ & $1$ & $1$ & should be very small & $\frac{\log |G|}{\log \log |G|}$ & $|G|^{1/3}$ \\ \hline
\end{tabular}
\end{table}
Here $A_n$ is the alternating group of even permutations. We can see
that for the first groups, Equation (\ref{eq:d}) doesn't give non-trivial
bounds.
But for $A_n$ we get a non-trivial bound, and for $SL_2(q)$ we get a
strong bound: we have $\lVert X\cdot Y - U \rVert_1 \leq
\frac{1}{|G|^{\Omega(1)}}$.
\bibliographystyle{alpha}
\bibliography{biblio.bib}
\end{document}