How MAXMEAN works:

(this explanation is based on Lecture 3 by David P. Williamson on 
Approximation Algorithms, course 6.891 at MIT, Feb. 2000.
http://www.cs.princeton.edu/courses/archive/spr03/cs594/dpw/)

Note: Below we are going for a fraction of the total number of clauses while
      Williamson goes for a fraction of the optimal number of clauses (OPT).

MAXMEAN works with weighted constraints, where the constraints are
formulated in terms of a finite set of relations.
It is best to first consider a special case: the MaxSAT problem,
where the relations are Or relations.

Flipping Coins (Johnson's Algorithm)
====================================

MAXMEAN works by setting k variables to true. For MaxSAT it suffices
to set each variable to true with probability 1/2 to get a P-optimal assignment.
This corresponds to k = n/2.

Consider a random variable Y[j] such that Y[j] = 1 if clause j is satisfied and
Y[j] = 0 otherwise.
Let W = sum from j=1 to m (w[j]*Y[j]). We want to compute the expected total
weight of satisfied clauses:
E(W) = sum from j=1 to m (w[j]*E(Y[j]))    (by linearity of expectation)
     = sum from j=1 to m (w[j]*Prob[clause j is satisfied])
     = sum from j=1 to m (w[j]*(1-(1/2)**l[j])) >= 1/2 * sum from j=1 to m (w[j]),
       where l[j] = # literals in clause j and the inequality holds since l[j] >= 1.
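
A minimal sketch of this computation in Python; the clause encoding below (a
(weight, literals) pair, with literal +i standing for x_i and -i for !x_i) is my
own choice for illustration, not MAXMEAN's representation:

  import random

  # a clause is (weight, [literals]); literal +i means x_i, -i means !x_i
  clauses = [(1, [1]), (1, [2]), (1, [-1, -2])]

  def expected_weight(clauses, p=0.5):
      # E(W) when each variable is set to true independently with probability p
      total = 0.0
      for w, lits in clauses:
          prob_fail = 1.0              # the clause fails only if every literal is false
          for lit in lits:
              prob_fail *= (1 - p) if lit > 0 else p
          total += w * (1 - prob_fail)
      return total

  def flip_coins(clauses, p=0.5):
      # one run of Johnson's coin-flipping algorithm
      variables = {abs(l) for _, lits in clauses for l in lits}
      return {v: random.random() < p for v in variables}

  print(expected_weight(clauses), ">=", 0.5 * sum(w for w, _ in clauses))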

It turns out that if we could guarantee even one trillionth more than the
fraction 1/2, we would have P = NP.

Derandomization
===============

We make the coin-flipping algorithm deterministic using the Method
of Conditional Expectations, attributed to Erdős and Spencer.
This method is general and can also be used for the general MAXMEAN case.

We set the first variable the best way:
if E[W | x1=1] > E[W | x1=0] then x1=1 else x1=0

By the definition of conditional expectation, we can express
E[W] in terms of E[W | x1=1] and E[W | x1=0]:

E[W] = E[W | x1=1]*Prob[x1=1] + E[W | x1=0]*Prob[x1=0]
     = 1/2*(E[W | x1=1] + E[W | x1=0])

If we set x1 to b1 as above, then the overall expectation is:

E[W | x1=b1] >= E[W].

So we don't lose anything by setting x1 to b1. Note that this does not
depend on p = 1/2: we can generalize this approach to other coin flip
probabilities.

We set the remaining variables the same way, and we are guaranteed an assignment
that satisfies clauses of total weight at least E[W].
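
A minimal sketch of this derandomization, reusing the (weight, literals) clause
encoding from the previous sketch; the helper names are mine, not taken from MAXMEAN:

  def cond_expectation(clauses, fixed, p=0.5):
      # E[W | variables in `fixed` set as given], the rest flipped with probability p
      total = 0.0
      for w, lits in clauses:
          prob_fail = 1.0
          for lit in lits:
              v = abs(lit)
              if v in fixed:
                  if (lit > 0) == fixed[v]:
                      prob_fail = 0.0    # a literal is already true: clause satisfied
                      break
                  # a literal that is already false contributes factor 1 to prob_fail
              else:
                  prob_fail *= (1 - p) if lit > 0 else p
          total += w * (1 - prob_fail)
      return total

  def derandomize(clauses, p=0.5):
      # method of conditional expectations: E[W | choices so far] never decreases
      fixed = {}
      for v in sorted({abs(l) for _, lits in clauses for l in lits}):
          e1 = cond_expectation(clauses, {**fixed, v: True}, p)
          e0 = cond_expectation(clauses, {**fixed, v: False}, p)
          fixed[v] = e1 > e0
      return fixed

  example = [(1, [1]), (1, [2]), (1, [-1, -2])]
  print(derandomize(example))   # satisfies total weight >= expected_weight(example)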

We can think of this derandomization as a walk down a tree,
where each choice of a variable value corresponds to the choice of a branch
of the tree. The expectation at each node is the average of
the expected values of the nodes below it. Thus we can ensure that,
as we make those choices, the conditional expectation stays at least as large as it was initially.

The method of conditional expectations allows us to give deterministic
variants of randomized algorithms. Why discuss the randomized algorithms? 
The randomized algorithms are easier to state and analyze than their corresponding
deterministic variants.


Flipping Biased Coins
=====================

Algorithm MAXMEAN uses biased coins to get better approximation algorithms.
First we look at 2-satisfiable CNFs: CNFs in which every pair of clauses is
simultaneously satisfiable. 2-satisfiable CNFs can be viewed (after renaming
variables if necessary) as MaxSAT instances in which all length-1 clauses are positive.
Now we set each x[i] to 1 with probability p >= 1/2.

Lemma 3: Prob[clause j is satisfied] >= min(p, 1-p**2)

Proof:
If l[j]=1 then Prob[C[j] is satisfied] = p, since
every length-1 clause appears positively. If l[j]>=2 then
Prob[C[j] is satisfied] = 1 - p**a * (1-p)**b >= 1 - p**(a+b) >= 1 - p**2,
where a is the number of negated literals in the clause and
b is the number of unnegated literals. The first inequality
follows since 1-p <= p and the second since a+b >= 2.

Let's choose p so that the two bounds coincide. We set p = 1-p**2, which
implies p = (sqrt(5)-1)/2 = g, approximately 0.618.

We can state now: flipping biased coins with probability g is
expected to satisfy at least the fraction g of the total clause weight.

Proof: E[W] = sum from j=1 to m (w[j]*Prob[C[j] is satisfied]) >=
  g * sum from j=1 to m (w[j])
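
A minimal numerical check of this bound in Python, with the same clause encoding
as in the earlier sketches; the instance is a small 2-satisfiable formula:

  import math

  G = (math.sqrt(5) - 1) / 2              # g satisfies g = 1 - g**2, about 0.618

  def sat_prob(lits, p):
      # probability a clause is satisfied when each variable is 1 with probability p
      prob_fail = 1.0
      for lit in lits:
          prob_fail *= (1 - p) if lit > 0 else p
      return 1 - prob_fail

  # a 2-satisfiable instance: all length-1 clauses are positive
  clauses = [(1, [1]), (1, [2]), (1, [3]),
             (1, [-1, -2]), (1, [-1, -3]), (1, [-2, -3])]

  expected = sum(w * sat_prob(lits, G) for w, lits in clauses)
  print(expected, ">=", G * sum(w for w, _ in clauses))   # Lemma 3: each clause >= g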

It turns out that the fraction g is also best possible:
http://www.ccs.neu.edu/research/demeter/biblio/goldenmean.html

Now let's derive the polynomial 
sum from 1 to m (t[R[i]]*appSAT(x, R[i]))
from http://www.ccs.neu.edu/research/demeter/biblio/partial-sat-II.html
mechanically, using the formulas in the above paper.
We focus only on Or(x) and Or(!x !y), and we assume that a fraction t1 of the
clauses have length 1 and a fraction t2 have length 2 (t1 + t2 = 1).

We get (by consulting the truth tables)

t1*x + t2*((1-x)**2 + 2*x*(1-x)) =
t1*x + t2*(1-x**2) = mean(x)

Intuitively, x is the probability with which we set the variables to 1.
Note that this is an approximation to the real average but it is close enough.
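
A minimal sketch of the mechanical derivation: here appSAT is computed by
enumerating truth tables numerically (the encoding of relations as Python
functions is my own choice; the symbolic formula from the paper is not reproduced):

  from itertools import product

  def appSAT(x, relation, arity):
      # probability that `relation` holds when each argument is 1 independently
      # with probability x
      total = 0.0
      for bits in product((0, 1), repeat=arity):
          prob = 1.0
          for b in bits:
              prob *= x if b == 1 else (1 - x)
          if relation(*bits):
              total += prob
      return total

  or1 = lambda a: a == 1                        # Or(x)
  or2 = lambda a, b: not (a == 1 and b == 1)    # Or(!x !y)

  def mean(x, t1=0.5, t2=0.5):
      return t1 * appSAT(x, or1, 1) + t2 * appSAT(x, or2, 2)

  # appSAT(x, or1, 1) == x and appSAT(x, or2, 2) == 1 - x**2, matching the
  # polynomial derived above
  print(mean(0.5), 0.5 * 0.5 + 0.5 * (1 - 0.5**2))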

The real average would be:
(see http://www.ccs.neu.edu/research/demeter/biblio/goldenmean.html, page 415)

k*x + (bin(n,2) - bin(k,2))*y
------------------------------  = mean(k,n)
n*x  + bin(n,2)*y

Let's test this on the following formula (3 variables, all clauses with weight 1),
with x = 3, y = 3, t1 = 1/2, t2 = 1/2:

x1 
     x2  
         x3
!x1 !x2
!x1     !x3
    !x2 !x3

mean(0,3) = 1/2
mean(1,3) = 2/3
mean(2,3) = 2/3
mean(3,3) = 1/2

mean(x) = 1/2*x + 1/2*(1-x**2)
mean(0) = 1/2
mean(1/3) = 5.5/9
mean(2/3) = 5.5/9
mean(1) = 1/2
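
A minimal sketch that reproduces both tables for this example; comb is Python's
binomial coefficient, and the two function names are mine:

  from math import comb

  def mean_x(x, t1=0.5, t2=0.5):
      return t1 * x + t2 * (1 - x**2)

  def mean_kn(k, n, x=3, y=3):
      # x and y as in the formula above; both are 3 for this example
      num = k * x + (comb(n, 2) - comb(k, 2)) * y
      den = n * x + comb(n, 2) * y
      return num / den

  for k in range(4):
      print(k, mean_kn(k, 3), mean_x(k / 3))
  # mean_kn: 1/2, 2/3, 2/3, 1/2      mean_x: 1/2, 5.5/9, 5.5/9, 1/2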

There is a difference between mean(x) and mean(k,n) but it
becomes very small as n gets big.
See Theorem 2.1 in:
http://www.ccs.neu.edu/research/demeter/biblio/partial-sat-II.html

Given a set of relations, the polynomial mean(x) can be generated 
automatically using the formula in 
http://www.ccs.neu.edu/research/demeter/biblio/partial-sat-II.html

Once we have the maximizing x, we set k = x*n and then work with mean(k,n).
So we use mean(x) to optimize efficiently, but the computation in MAXMEAN
must be done with mean(k,n).
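
A minimal sketch of this step, using a simple grid search over [0,1] (how
MAXMEAN actually maximizes the polynomial may differ); mean is the t1 = t2 = 1/2
polynomial from the example above:

  def maximize_mean(mean, steps=10_000):
      # return (best x, best value) over the interval [0, 1] by grid search
      best_x, best_val = 0.0, mean(0.0)
      for i in range(1, steps + 1):
          x = i / steps
          if mean(x) > best_val:
              best_x, best_val = x, mean(x)
      return best_x, best_val

  mean = lambda x: 0.5 * x + 0.5 * (1 - x**2)
  x_star, val = maximize_mean(mean)
  n = 3
  k = round(x_star * n)               # continue with mean(k, n) from here on
  print(x_star, val, k)               # x* = 1/2, mean(x*) = 5/8, k = 2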

Note that in
http://www.ccs.neu.edu/home/lieber/p-optimal/karl-algo-extremal.pdf
the Method of Conditional Expectations used is different from the one described above.

Let's try the following:

Instead of using 

mean[k-1](S[v=1]) > mean[k](S[v=0])

we could use:

max mean(x) for S[v=1] > max mean(x) for S[v=0]

where the max is over the interval [0,1].
This approach has the advantage that we only have to deal with
mean(x) and not with mean(k,n). It might also lead
to better results, because we take the structure of the reduced formulas
into account and repeat the (fast) maximization computation at every step.

It is also easy to see that the result will be correct because at least one of

max mean(x) for S <= max mean(x) for S[v=1]
max mean(x) for S <= max mean(x) for S[v=0]

must hold.
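
A minimal, self-contained sketch of this alternative rule; mean_given computes
mean(x) for the formula reduced by a partial assignment, and the clause encoding
is the same (weight, literals) convention as in the earlier sketches:

  clauses = [(1, [1]), (1, [2]), (1, [3]),
             (1, [-1, -2]), (1, [-1, -3]), (1, [-2, -3])]

  def mean_given(clauses, fixed, x):
      # mean(x) of the reduced formula: expected fraction of the total weight
      # satisfied when the remaining variables are set to 1 with probability x
      total = sum(w for w, _ in clauses)
      sat = 0.0
      for w, lits in clauses:
          prob_fail = 1.0
          for lit in lits:
              v = abs(lit)
              if v in fixed:
                  if (lit > 0) == fixed[v]:
                      prob_fail = 0.0
                      break
              else:
                  prob_fail *= (1 - x) if lit > 0 else x
          sat += w * (1 - prob_fail)
      return sat / total

  def max_mean(clauses, fixed, steps=1000):
      # max of mean(x) over the interval [0, 1], by grid search
      return max(mean_given(clauses, fixed, i / steps) for i in range(steps + 1))

  fixed = {}
  for v in (1, 2, 3):
      # set v the way that gives the larger maximum of mean(x)
      fixed[v] = max_mean(clauses, {**fixed, v: True}) > max_mean(clauses, {**fixed, v: False})
  print(fixed)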