Algorithms and Data
Spring 2011
Homework 7

Due date: Monday March 12, 2011
At beginning of lecture

Reading:
http://mathworld.wolfram.com/GraphCartesianProduct.html
http://en.wikipedia.org/wiki/Function_problem

In this homework Piazza will play a prominent role.
In playground LEAF(2), explained below, every team of students
(most teams have two students; a few have one) must:

-- propose ONE claim (saying that they have an algorithm that is much faster
   then the brute-force algorithm that would take 30+ years).
-- oppose or agree with TWO claims.

Each refutation protocol communication will be posted on Piazza in a separate
post (note or question).
Start posting on Piazza soon. It will be a part of your grade.

We investigate the following theme:
We have a computational problem which is hard to solve in general.
Fortunately, we have only inputs that satisfy a specific property.
This allows us to speed up the algorithm for the subset that
satisfies the property.

We have seen this before: k-coloring a graph is NP-hard but
2-coloring a graph is polynomial.
Finding a satisfying assignment for a CNF is NP-hard but
is polynomial if the clauses are of length 2.

We first introduce the LeafCovering problem which is
computationally hard and then we look at a special case 
for which we want to find a polynomial-time algorithm.

According to Wikipedia, there are 5 kinds of computational problems:
decision, search, counting, optimization and function problems.
The LeafCovering problem is a function problem
that uses a counting problem in the solution approach.

Terminology:
Ti=(Vi,Ei) is a tree with nodes Vi and edges Ei.
(view the i as subscripts)
DAG = directed acyclic graph

Definition: Graph Cartesian product (GCP) 
  Given n directed trees T1, ..., Tn,  
  where Ti=(Vi,Ei), the graph Cartesian product
  is a DAG
  GCP(T1, ..., Tn) that has as nodes the tuples (v1, ... ,vn) where vi in Vi and
  the following edges:
     Node (v1, ..., vi, ..., vn) has as children all nodes
     (v1, ..., vi', ..., vn), where vi' is a child of vi in Ti, and all
     other components are the same.

Note that GCP is a DAG (directed acyclic graph), so leaves don't repeat.
Each leaf appears exactly once.

The LeafCovering Problem
-------------------------

  INSTANCE:
  Given n directed trees T1, ..., Tn defining the 
  graph Cartesian product GCP(T1, ... .Tn),
  and a set of nodes, M, in GCP(T1, ... .Tn).

  SOLUTION:
  Does each leaf in GCP(T1, ..., Tn) have some ancestor in M?
  The answer is either "yes" (all leaves are covered) or "no". 
  If the answer is "no", you also provide a non-covered leaf, called a witness.

As a connection to your Theory of Computation course:
LeafCovering is co-NP hard.

Example of Graph Cartesian Product:

Tree T1
Baum : Node | Leaf.

Tree T2:
Exp : Lit | Bin.
Bin : Add | Sub.

GCP(T1,T2):
http://www.ccs.neu.edu/home/lieber/courses/algorithms/cs4800/sp12/homeworks/hw7/graphCartesianproduct1.png

Another example:
Tree T1,T2:
T:L|R.

GCP(T1,T2):
TT : LT | RT | TL | TR
LT : LL | LR
RT : RL | RR
TL : LL | RL
TR : LR | RR

There are a total of 9 nodes.
The leaves are: LL,RL,LR,RR.

If, for example, M = (TL,TR), then there is no uncovered node 
while if M = (RT,TL) then LR is a witness (the only one).
You can think of this problem as a cube covering problem (here we
cover the 2x2 cube).

This looks like a rather abstract problem but it 
has a very useful application:
It is a type checking problem that multiple dispatch
programming systems must answer. 
There are many more applications.

There is a straight-forward brute-force algorithm:
Explore all leaves of the GCP and check whether each is covered
by at least one element of M.
Because there are exponentially many leaves, the brute-force
algorithm is not practical.

Can you avoid an explicit representation
of the graph Cartesian product which would clearly lead to an
exponential algorithm?

Now we turn our attention to the special case
LeafCovering(2) where |M|=2.
We assume that M only contains 2 elements.

Can you propose a polynomial-time algorithm for LeafCovering(2)?

======================
PLAYGROUND
======================
We will explore the LeafCovering(2) problem in a playground
called LEAF(2)
on Piazza and in your homework.
For the playground, we simplify the problem further
by assuming that all trees are of this form:

   T
  / \
 /   \
L     R

(all edges are directed downward)
I.e., all trees have one root and two children.

Therefore, an instance now consists of:

n: the number of directed trees. 
M: a set of n tuples, e.g., if n=3, M={(L,T,T), (T,R,L)}

With this simplification it
is much easier to define inputs and the problem is still
interesting and a solution will generalize to
the original problem.
We could use the term cube covering problem instead of leaf covering problem.
We try to cover the n-dimensional cube.

Because we often have many more Ts than Ls and Rs in the nodes
in M, we choose a more economical representation:

(!v3,v9) is a convenient way to represent (T,T,L,T,T,T,T,T,R).
So we represent a node in GCP as a set of literals. 
The first variable is v1.

During the refutation protocol on Piazza you don't reveal your algorithms:
they are your trade secret until after the due date.
On Piazza you will use instances where a brute-force algorithm
would fail to find a solution in a reasonable amount of time.
If we assume the set-up of the table 2.1 on page 34
of the textbook, and choose n to be
50 or larger, it takes 36 years with brute-force.
Therefore, when you attack a claim use an instance where n>=50.

In LEAF(2) the claims are of the form:
LEAF-Claim(Proposer,2,r,s,t) meaning:

r is a constant factor.
s is the exponent of n.
t is the cross-over point (we used to call this n0 in the Landau notation).
We are ready with an algorithm for LeafCovering(2) that uses at most
r*(n^s) milliseconds if n >= t.
n is the number of input trees.
Your objective is to keep r and s as small as possible.
Highest priority is to have a small s and then to have a small r.
t should be at most 10. 
Notice that we not only strive for good asymptotic run-time
but also for good practical run-time with small constants.
The 2 appears in the claim because |M|=2.

With your claim
LEAF-Claim(Proposer,2,r,s,t) you say:
if you give me an instance for LeafCovering(2), I will give you
the result quickly (defined by r,s and t).
In other words,
you claim that for all inputs of the special kind with parameter n>=t,
you will find a solution quickly in r*(n^s) milliseconds.
You make this prediction based on the analysis of your algorithm
and by running your algorithm on various test cases and using the
data to determine the constants r, s, t. You may want to use
Wolfram Alpha to solve the fitting problem. For example,
if your analysis shows that you have a cubic running time,
then you need to solve a cubic fitting problem.
Some of you do solve similar fitting problems in your
physics courses.

Try:
Wolfram Alpha:
cubic fit {10.1,1.2},{12.6, 2.8},{14.8,7.6},{16.0,12.8},{17.5,15.1}

For the purpose of the playground, it is important that the 
validity of the solution can be checked quickly. Therefore,
we require that the instance given has a non-covered leaf.
We use again the honor system: when you try to
refute a claim you must give an instance which has
a non-covered leaf. After the due date you may be asked
to reveal an uncovered leaf for some of the instances
that you provided.
In other words, as opposer you never reveal the uncovered leaf
unless asked by the teaching staff.

Note that the proposer might find a different uncovered leaf
than the secret one the opposer has. 

Another description of a claim is:
For all instances in a specific subset which have a witness
there exists a witness (trivial)
and the proposer has a fast algorithm (makes claim interesting) to find a witness.

The refutation protocol is:
Opposer provides instance Inst to proposer
for which the opposer knows a non-covered leaf (witness).
Proposer solves instance Inst and delivers the solution (a witness)
and the run-time for obtaining the solution (from the input object,
excludes parsing time.)

If the solution is not valid (the leaf is covered) or the run-time 
is larger than predicted, the refutation is successful.

Note that we can efficiently check whether the leaf is uncovered:
Check for both elements of M whether there is a path to
the leaf. This can be done efficiently.

We use the honor system: you must report the run-time faithfully.
What you report can be checked after you have publicized
your algorithm after the due date.
You must follow the scientific method and make your run-times 
reproducible.

Claim LEAF-Claim(Proposer,2,r1,s1,t1) is stronger than
claim LEAF-Claim(Proposer,2,r2,s2,t2) 
if s1<s2 or (if s1=s2 and r1<r2)
This kind of strengthening is different than in previous playgrounds.

Some slides about visualizing the notation used in LEAF(2) see:
http://www.ccs.neu.edu/home/lieber/courses/algorithms/cs4800/sp12/lectures/LeafCovering.ppt

JSON language:

We represent an n-tuple of L,R,T by
numbering (starting from 1) the elements of the tuple from 1 to n.
If L is at position j, we use -j.
If R is at position j, we use j.
T is not mentioned. All the missing numbers represent Ts.

(!v3,v9) is a convenient way to represent (T,T,L,T,T,T,T,T,R).
We write this now as: [-3,9].
[-1,2,3,4,5,6,7,-8] for n=8 represents the leaf: [L,R,R,R,R,R,R,L]

{"instance": "LeafCovering", "n": 3, "M": [[-1,-2,3],[1,3]]
}

Solution
{"solution": "LeafCovering", "answer": "covered"}
{"solution": "LeafCovering", "answer": "uncovered",
  "uncovered_leaf": [-1,-2,3] }


With GSON we can use the following classes in Java:
class Instance{
String instance;
int n;
int [][] M;
}

class Solution{
String solution;
String answer;
int[] uncovered_leaf;
}
===========================

What to turn in:

For LEAF(2):

Pointers to your Piazza interactions.
A pointer to your proposed claim and your two
opposition or agreement attempts.

Play the game with the above playground definition.
Find a fast algorithm through interacting with your partner
and on Piazza with the entire class.

Turn in your algorithm with an argument of correctness 
and describe how you tested the algorithm.

For LEAF(3):
There is a different playground for each k.
Describe how your algorithm generalizes to k=3.

Remember the hint I gave earlier: use a counting problem
for the solution. You heard about the counting problem
most likely in Discrete Structures.