A co-NP-hard problem and a polynomial-time special case

Reading:
http://mathworld.wolfram.com/GraphCartesianProduct.html
http://en.wikipedia.org/wiki/Function_problem

We investigate the following theme:
We have a computational problem which is hard to solve in general.
Fortunately, we have only inputs that satisfy a specific property.
This allows us to speed up the algorithm for the subset that
satisfies the property.

We have seen this before: k-coloring a graph is NP-hard but
2-coloring a graph is polynomial.
Finding a satisfying assignment for a CNF is NP-hard but
is polynomial if the clauses are of length 2.

We first introduce the LeafCovering problem which is
computationally hard and then we look at a special case 
for which we want to find a polynomial-time algorithm.

According to Wikipedia, there are 5 kinds of computational problems:
decision, search, counting, optimization and function problems.
The LeafCovering problem is a function problem
that uses a counting problem in the solution approach.

Terminology:
Ti=(Vi,Ei) is a tree with nodes Vi and edges Ei.
(view the i as subscripts)
DAG = directed acyclic graph

Definition: Graph Cartesian product (GCP) 
  Given n directed trees T1, ..., Tn,  
  where Ti=(Vi,Ei), the graph Cartesian product
  is a DAG
  GCP(T1, ..., Tn) that has as nodes the tuples (v1, ... ,vn) where vi in Vi and
  the following edges:
     Node (v1, ..., vi, ..., vn) has as children all nodes
     (v1, ..., vi', ..., vn), where vi' is a child of vi in Ti, and all
     other components are the same.

Note that GCP is a DAG (directed acyclic graph), so leaves don't repeat.
Each leaf appears exactly once.

The LeafCovering Problem
-------------------------

  INSTANCE:
  Given n directed trees T1, ..., Tn defining the 
  graph Cartesian product GCP(T1, ... ,Tn),
  and a set of nodes, M, in GCP(T1, ... ,Tn).

  SOLUTION:
  Does each leaf in GCP(T1, ..., Tn) have some ancestor in M?
  The answer is either "yes" (all leaves are covered) or "no". 
  If the answer is "no", you also provide a non-covered leaf, called a witness.

Claim: LeafCovering is co-NP hard.
Compare with CLRS exercise 34.4-4:
Whether a boolean formula is a tautology is complete for co-NP.

Example of Graph Cartesian Product:

Tree T1
Baum : Node | Leaf.

Tree T2:
Exp : Lit | Bin.
Bin : Add | Sub.

GCP(T1,T2):
http://www.ccs.neu.edu/home/lieber/courses/algorithms/cs4800/sp12/homeworks/hw7/graphCartesianproduct1.png

Another example:
Tree T1,T2:
T:L|R.

GCP(T1,T2):
TT : LT | RT | TL | TR
LT : LL | LR
RT : RL | RR
TL : LL | RL
TR : LR | RR

There are a total of 9 nodes.
The leaves are: LL,RL,LR,RR.

If, for example, M = (TL,TR), then there is no uncovered node 
while if M = (RT,TL) then LR is a witness (the only one).
You can think of this problem as a cube covering problem (here we
cover the 2x2 cube).

This looks like a rather abstract problem but it 
has a very useful application:
It is a type checking problem that multiple dispatch
programming systems must answer. 
There are many more applications.

There is a straight-forward brute-force algorithm:
Explore all leaves of the GCP and check whether each is covered
by at least one element of M.
Because there are exponentially many leaves, the brute-force
algorithm is not practical.

Can you avoid an explicit representation
of the graph Cartesian product which would clearly lead to an
exponential algorithm?

Now we turn our attention to the special case
LeafCovering(2) where |M|=2.
We assume that M only contains 2 elements.

Can you propose a polynomial-time algorithm for LeafCovering(2)?

======================
DEBATES
======================
We will explore the LeafCovering(2) problem in a debate called
called LEAF(2)
on Piazza and in your homework.
For the debate, we simplify the problem further
by assuming that all trees are of this form:

   T
  / \
 /   \
L     R

(all edges are directed downward)
I.e., all trees have one root and two children.

Therefore, an instance now consists of:

n: the number of directed trees. 
M: a set of n tuples, e.g., if n=3, M={(L,T,T), (T,R,L)}

With this simplification it
is much easier to define inputs and the problem is still
interesting and a solution will generalize to
the original problem.
We could use the term cube covering problem instead of leaf covering problem.
We try to cover the n-dimensional cube.

Because we often have many more Ts than Ls and Rs in the nodes
in M, we choose a more economical representation:

(!v3,v9) is a convenient way to represent (T,T,L,T,T,T,T,T,R).
So we represent a node in GCP as a set of literals. 
The first variable is v1.

During the debates on Piazza you don't reveal your algorithms:
they are your trade secret until after the due date.
On Piazza you will use instances where a brute-force algorithm
would fail to find a solution in a reasonable amount of time.
If we assume the set-up 
of the textbook, and choose n to be
50 or larger, it takes 36 years with brute-force.
Therefore, when you attack a claim use an instance where n>=50.

In LEAF(2) the claims are of the form:
LEAF-Claim(2,r,s,t) meaning:

r is a constant factor.
s is the exponent of n.
t is the cross-over point (we used to call this n0 in the Landau notation).
We are ready with an algorithm for LeafCovering(2) that uses at most
r*(n^s) milliseconds if n >= t.
n is the number of input trees.
Your objective is to keep r and s as small as possible.
Highest priority is to have a small s and then to have a small r.
t should be at most 10. 
Notice that we not only strive for good asymptotic run-time
but also for good practical run-time with small constants.
The 2 appears in the claim because |M|=2.

LEAF-Claim(2,r,s,t) says:
There exists an algorithm LC for LeafCovering(2) so that
for all instances inp(n) of LeafCovering(2) with n>=t and
for which there exists an uncovered leaf,
LC correctly solves inp(n)
in at most r*(n^s) milliseconds.

You make this prediction based on the analysis of your algorithm
and by running your algorithm on various test cases and using the
data to determine the constants r, s, t. You may want to use
Wolfram Alpha to solve the fitting problem. For example,
if your analysis shows that you have a cubic running time,
then you need to solve a cubic fitting problem.
Some of you do solve similar fitting problems in your
physics courses.

Try:
Wolfram Alpha:
cubic fit {10.1,1.2},{12.6, 2.8},{14.8,7.6},{16.0,12.8},{17.5,15.1}

For the purpose of the debates, it is important that the 
validity of the solution can be checked quickly. Therefore,
we require that the instance given has a non-covered leaf.
We use again the honor system: when you try to
refute a claim you must give an instance which has
a non-covered leaf. After the due date you may be asked
to reveal an uncovered leaf for some of the instances
that you provided.
In other words, as falsifier you never reveal the uncovered leaf
unless asked by the teaching staff.

Note that the verifier might find a different uncovered leaf
than the secret one the opposer has. 

Another description of a claim is:
For all instances in a specific subset which have a witness
there exists a witness (trivial)
and the verifier has a fast algorithm (makes claim interesting) to find a witness.

If the solution is not valid (the leaf is covered) or the run-time 
is larger than predicted, the falsifier is successful.

Note that we can efficiently check whether the leaf is uncovered:
Check for both elements of M whether there is a path to
the leaf. This can be done efficiently.

We use the honor system: you must report the run-time faithfully.
What you report can be checked after you have publicized
your algorithm after the due date.
You must follow the scientific method and make your run-times 
reproducible.

Claim LEAF-Claim(2,r1,s1,t1) is stronger than
claim LEAF-Claim(2,r2,s2,t2) 
if s1<s2 or (if s1=s2 and r1<r2)

Some slides about visualizing the notation used in LEAF(2) see:
http://www.ccs.neu.edu/home/lieber/courses/algorithms/cs4800/sp12/lectures/LeafCovering.ppt

JSON language:

We represent an n-tuple of L,R,T by
numbering (starting from 1) the elements of the tuple from 1 to n.
If L is at position j, we use -j.
If R is at position j, we use j.
T is not mentioned. All the missing numbers represent Ts.

(!v3,v9) is a convenient way to represent (T,T,L,T,T,T,T,T,R).
We write this now as: [-3,9].
[-1,2,3,4,5,6,7,-8] for n=8 represents the leaf: [L,R,R,R,R,R,R,L]

{"instance": "LeafCovering", "n": 3, "M": [[-1,-2,3],[1,3]]
}

Solution
{"solution": "LeafCovering", "answer": "covered"}
{"solution": "LeafCovering", "answer": "uncovered",
  "uncovered_leaf": [-1,-2,3] }


With GSON we can use the following classes in Java:
class Instance{
String instance;
int n;
int [][] M;
}

class Solution{
String solution;
String answer;
int[] uncovered_leaf;
}
===========================

What to turn in:

For LEAF(2):

Pointers to your Piazza interactions.
A pointer to your proposed claim and your two
opposition or agreement attempts.

Play the game with the above playground definition.
Find a fast algorithm through interacting with your partner
and on Piazza with the entire class.

Turn in your algorithm with an argument of correctness 
and describe how you tested the algorithm.

For LEAF(3):
There is a different playground for each k.
Describe how your algorithm generalizes to k=3.

Remember the hint I gave earlier: use a counting problem
for the solution. You heard about the counting problem
most likely in Discrete Structures.