A co-NP-hard problem and a polynomial-time special case Reading: http://mathworld.wolfram.com/GraphCartesianProduct.html http://en.wikipedia.org/wiki/Function_problem We investigate the following theme: We have a computational problem which is hard to solve in general. Fortunately, we have only inputs that satisfy a specific property. This allows us to speed up the algorithm for the subset that satisfies the property. We have seen this before: k-coloring a graph is NP-hard but 2-coloring a graph is polynomial. Finding a satisfying assignment for a CNF is NP-hard but is polynomial if the clauses are of length 2. We first introduce the LeafCovering problem which is computationally hard and then we look at a special case for which we want to find a polynomial-time algorithm. According to Wikipedia, there are 5 kinds of computational problems: decision, search, counting, optimization and function problems. The LeafCovering problem is a function problem that uses a counting problem in the solution approach. Terminology: Ti=(Vi,Ei) is a tree with nodes Vi and edges Ei. (view the i as subscripts) DAG = directed acyclic graph Definition: Graph Cartesian product (GCP) Given n directed trees T1, ..., Tn, where Ti=(Vi,Ei), the graph Cartesian product is a DAG GCP(T1, ..., Tn) that has as nodes the tuples (v1, ... ,vn) where vi in Vi and the following edges: Node (v1, ..., vi, ..., vn) has as children all nodes (v1, ..., vi', ..., vn), where vi' is a child of vi in Ti, and all other components are the same. Note that GCP is a DAG (directed acyclic graph), so leaves don't repeat. Each leaf appears exactly once. The LeafCovering Problem ------------------------- INSTANCE: Given n directed trees T1, ..., Tn defining the graph Cartesian product GCP(T1, ... ,Tn), and a set of nodes, M, in GCP(T1, ... ,Tn). SOLUTION: Does each leaf in GCP(T1, ..., Tn) have some ancestor in M? The answer is either "yes" (all leaves are covered) or "no". If the answer is "no", you also provide a non-covered leaf, called a witness. Claim: LeafCovering is co-NP hard. Compare with CLRS exercise 34.4-4: Whether a boolean formula is a tautology is complete for co-NP. Example of Graph Cartesian Product: Tree T1 Baum : Node | Leaf. Tree T2: Exp : Lit | Bin. Bin : Add | Sub. GCP(T1,T2): http://www.ccs.neu.edu/home/lieber/courses/algorithms/cs4800/sp12/homeworks/hw7/graphCartesianproduct1.png Another example: Tree T1,T2: T:L|R. GCP(T1,T2): TT : LT | RT | TL | TR LT : LL | LR RT : RL | RR TL : LL | RL TR : LR | RR There are a total of 9 nodes. The leaves are: LL,RL,LR,RR. If, for example, M = (TL,TR), then there is no uncovered node while if M = (RT,TL) then LR is a witness (the only one). You can think of this problem as a cube covering problem (here we cover the 2x2 cube). This looks like a rather abstract problem but it has a very useful application: It is a type checking problem that multiple dispatch programming systems must answer. There are many more applications. There is a straight-forward brute-force algorithm: Explore all leaves of the GCP and check whether each is covered by at least one element of M. Because there are exponentially many leaves, the brute-force algorithm is not practical. Can you avoid an explicit representation of the graph Cartesian product which would clearly lead to an exponential algorithm? Now we turn our attention to the special case LeafCovering(2) where |M|=2. We assume that M only contains 2 elements. Can you propose a polynomial-time algorithm for LeafCovering(2)? ====================== DEBATES ====================== We will explore the LeafCovering(2) problem in a debate called called LEAF(2) on Piazza and in your homework. For the debate, we simplify the problem further by assuming that all trees are of this form: T / \ / \ L R (all edges are directed downward) I.e., all trees have one root and two children. Therefore, an instance now consists of: n: the number of directed trees. M: a set of n tuples, e.g., if n=3, M={(L,T,T), (T,R,L)} With this simplification it is much easier to define inputs and the problem is still interesting and a solution will generalize to the original problem. We could use the term cube covering problem instead of leaf covering problem. We try to cover the n-dimensional cube. Because we often have many more Ts than Ls and Rs in the nodes in M, we choose a more economical representation: (!v3,v9) is a convenient way to represent (T,T,L,T,T,T,T,T,R). So we represent a node in GCP as a set of literals. The first variable is v1. During the debates on Piazza you don't reveal your algorithms: they are your trade secret until after the due date. On Piazza you will use instances where a brute-force algorithm would fail to find a solution in a reasonable amount of time. If we assume the set-up of the textbook, and choose n to be 50 or larger, it takes 36 years with brute-force. Therefore, when you attack a claim use an instance where n>=50. In LEAF(2) the claims are of the form: LEAF-Claim(2,r,s,t) meaning: r is a constant factor. s is the exponent of n. t is the cross-over point (we used to call this n0 in the Landau notation). We are ready with an algorithm for LeafCovering(2) that uses at most r*(n^s) milliseconds if n >= t. n is the number of input trees. Your objective is to keep r and s as small as possible. Highest priority is to have a small s and then to have a small r. t should be at most 10. Notice that we not only strive for good asymptotic run-time but also for good practical run-time with small constants. The 2 appears in the claim because |M|=2. LEAF-Claim(2,r,s,t) says: There exists an algorithm LC for LeafCovering(2) so that for all instances inp(n) of LeafCovering(2) with n>=t and for which there exists an uncovered leaf, LC correctly solves inp(n) in at most r*(n^s) milliseconds. You make this prediction based on the analysis of your algorithm and by running your algorithm on various test cases and using the data to determine the constants r, s, t. You may want to use Wolfram Alpha to solve the fitting problem. For example, if your analysis shows that you have a cubic running time, then you need to solve a cubic fitting problem. Some of you do solve similar fitting problems in your physics courses. Try: Wolfram Alpha: cubic fit {10.1,1.2},{12.6, 2.8},{14.8,7.6},{16.0,12.8},{17.5,15.1} For the purpose of the debates, it is important that the validity of the solution can be checked quickly. Therefore, we require that the instance given has a non-covered leaf. We use again the honor system: when you try to refute a claim you must give an instance which has a non-covered leaf. After the due date you may be asked to reveal an uncovered leaf for some of the instances that you provided. In other words, as falsifier you never reveal the uncovered leaf unless asked by the teaching staff. Note that the verifier might find a different uncovered leaf than the secret one the opposer has. Another description of a claim is: For all instances in a specific subset which have a witness there exists a witness (trivial) and the verifier has a fast algorithm (makes claim interesting) to find a witness. If the solution is not valid (the leaf is covered) or the run-time is larger than predicted, the falsifier is successful. Note that we can efficiently check whether the leaf is uncovered: Check for both elements of M whether there is a path to the leaf. This can be done efficiently. We use the honor system: you must report the run-time faithfully. What you report can be checked after you have publicized your algorithm after the due date. You must follow the scientific method and make your run-times reproducible. Claim LEAF-Claim(2,r1,s1,t1) is stronger than claim LEAF-Claim(2,r2,s2,t2) if s1