Algorithms and Data Spring 2011 Homework 7 Due date: Monday March 12, 2011 At beginning of lecture Reading: http://mathworld.wolfram.com/GraphCartesianProduct.html http://en.wikipedia.org/wiki/Function_problem In this homework Piazza will play a prominent role. In playground LEAF(2), explained below, every team of students (most teams have two students; a few have one) must: -- propose ONE claim (saying that they have an algorithm that is much faster then the brute-force algorithm that would take 30+ years). -- oppose or agree with TWO claims. Each refutation protocol communication will be posted on Piazza in a separate post (note or question). Start posting on Piazza soon. It will be a part of your grade. We investigate the following theme: We have a computational problem which is hard to solve in general. Fortunately, we have only inputs that satisfy a specific property. This allows us to speed up the algorithm for the subset that satisfies the property. We have seen this before: k-coloring a graph is NP-hard but 2-coloring a graph is polynomial. Finding a satisfying assignment for a CNF is NP-hard but is polynomial if the clauses are of length 2. We first introduce the LeafCovering problem which is computationally hard and then we look at a special case for which we want to find a polynomial-time algorithm. According to Wikipedia, there are 5 kinds of computational problems: decision, search, counting, optimization and function problems. The LeafCovering problem is a function problem that uses a counting problem in the solution approach. Terminology: Ti=(Vi,Ei) is a tree with nodes Vi and edges Ei. (view the i as subscripts) DAG = directed acyclic graph Definition: Graph Cartesian product (GCP) Given n directed trees T1, ..., Tn, where Ti=(Vi,Ei), the graph Cartesian product is a DAG GCP(T1, ..., Tn) that has as nodes the tuples (v1, ... ,vn) where vi in Vi and the following edges: Node (v1, ..., vi, ..., vn) has as children all nodes (v1, ..., vi', ..., vn), where vi' is a child of vi in Ti, and all other components are the same. Note that GCP is a DAG (directed acyclic graph), so leaves don't repeat. Each leaf appears exactly once. The LeafCovering Problem ------------------------- INSTANCE: Given n directed trees T1, ..., Tn defining the graph Cartesian product GCP(T1, ... .Tn), and a set of nodes, M, in GCP(T1, ... .Tn). SOLUTION: Does each leaf in GCP(T1, ..., Tn) have some ancestor in M? The answer is either "yes" (all leaves are covered) or "no". If the answer is "no", you also provide a non-covered leaf, called a witness. As a connection to your Theory of Computation course: LeafCovering is co-NP hard. Example of Graph Cartesian Product: Tree T1 Baum : Node | Leaf. Tree T2: Exp : Lit | Bin. Bin : Add | Sub. GCP(T1,T2): http://www.ccs.neu.edu/home/lieber/courses/algorithms/cs4800/sp12/homeworks/hw7/graphCartesianproduct1.png Another example: Tree T1,T2: T:L|R. GCP(T1,T2): TT : LT | RT | TL | TR LT : LL | LR RT : RL | RR TL : LL | RL TR : LR | RR There are a total of 9 nodes. The leaves are: LL,RL,LR,RR. If, for example, M = (TL,TR), then there is no uncovered node while if M = (RT,TL) then LR is a witness (the only one). You can think of this problem as a cube covering problem (here we cover the 2x2 cube). This looks like a rather abstract problem but it has a very useful application: It is a type checking problem that multiple dispatch programming systems must answer. There are many more applications. There is a straight-forward brute-force algorithm: Explore all leaves of the GCP and check whether each is covered by at least one element of M. Because there are exponentially many leaves, the brute-force algorithm is not practical. Can you avoid an explicit representation of the graph Cartesian product which would clearly lead to an exponential algorithm? Now we turn our attention to the special case LeafCovering(2) where |M|=2. We assume that M only contains 2 elements. Can you propose a polynomial-time algorithm for LeafCovering(2)? ====================== PLAYGROUND ====================== We will explore the LeafCovering(2) problem in a playground called LEAF(2) on Piazza and in your homework. For the playground, we simplify the problem further by assuming that all trees are of this form: T / \ / \ L R (all edges are directed downward) I.e., all trees have one root and two children. Therefore, an instance now consists of: n: the number of directed trees. M: a set of n tuples, e.g., if n=3, M={(L,T,T), (T,R,L)} With this simplification it is much easier to define inputs and the problem is still interesting and a solution will generalize to the original problem. We could use the term cube covering problem instead of leaf covering problem. We try to cover the n-dimensional cube. Because we often have many more Ts than Ls and Rs in the nodes in M, we choose a more economical representation: (!v3,v9) is a convenient way to represent (T,T,L,T,T,T,T,T,R). So we represent a node in GCP as a set of literals. The first variable is v1. During the refutation protocol on Piazza you don't reveal your algorithms: they are your trade secret until after the due date. On Piazza you will use instances where a brute-force algorithm would fail to find a solution in a reasonable amount of time. If we assume the set-up of the table 2.1 on page 34 of the textbook, and choose n to be 50 or larger, it takes 36 years with brute-force. Therefore, when you attack a claim use an instance where n>=50. In LEAF(2) the claims are of the form: LEAF-Claim(Proposer,2,r,s,t) meaning: r is a constant factor. s is the exponent of n. t is the cross-over point (we used to call this n0 in the Landau notation). We are ready with an algorithm for LeafCovering(2) that uses at most r*(n^s) milliseconds if n >= t. n is the number of input trees. Your objective is to keep r and s as small as possible. Highest priority is to have a small s and then to have a small r. t should be at most 10. Notice that we not only strive for good asymptotic run-time but also for good practical run-time with small constants. The 2 appears in the claim because |M|=2. With your claim LEAF-Claim(Proposer,2,r,s,t) you say: if you give me an instance for LeafCovering(2), I will give you the result quickly (defined by r,s and t). In other words, you claim that for all inputs of the special kind with parameter n>=t, you will find a solution quickly in r*(n^s) milliseconds. You make this prediction based on the analysis of your algorithm and by running your algorithm on various test cases and using the data to determine the constants r, s, t. You may want to use Wolfram Alpha to solve the fitting problem. For example, if your analysis shows that you have a cubic running time, then you need to solve a cubic fitting problem. Some of you do solve similar fitting problems in your physics courses. Try: Wolfram Alpha: cubic fit {10.1,1.2},{12.6, 2.8},{14.8,7.6},{16.0,12.8},{17.5,15.1} For the purpose of the playground, it is important that the validity of the solution can be checked quickly. Therefore, we require that the instance given has a non-covered leaf. We use again the honor system: when you try to refute a claim you must give an instance which has a non-covered leaf. After the due date you may be asked to reveal an uncovered leaf for some of the instances that you provided. In other words, as opposer you never reveal the uncovered leaf unless asked by the teaching staff. Note that the proposer might find a different uncovered leaf than the secret one the opposer has. Another description of a claim is: For all instances in a specific subset which have a witness there exists a witness (trivial) and the proposer has a fast algorithm (makes claim interesting) to find a witness. The refutation protocol is: Opposer provides instance Inst to proposer for which the opposer knows a non-covered leaf (witness). Proposer solves instance Inst and delivers the solution (a witness) and the run-time for obtaining the solution (from the input object, excludes parsing time.) If the solution is not valid (the leaf is covered) or the run-time is larger than predicted, the refutation is successful. Note that we can efficiently check whether the leaf is uncovered: Check for both elements of M whether there is a path to the leaf. This can be done efficiently. We use the honor system: you must report the run-time faithfully. What you report can be checked after you have publicized your algorithm after the due date. You must follow the scientific method and make your run-times reproducible. Claim LEAF-Claim(Proposer,2,r1,s1,t1) is stronger than claim LEAF-Claim(Proposer,2,r2,s2,t2) if s1