Complexity
==========

In the Fall quarter, we discussed unsolvable problems. In particular, we studied the unsolvability of the halting problem using a self-referential argument. A natural question to ask next is "how hard are the solvable problems?", with hardness measured by how long it takes to compute the solution. Clearly this depends on the nature of the problem. We have seen several problems during the course of this quarter: Snafooz, stable marriage, array manipulation problems such as the maxsum problem and the longest decreasing sequence, and ICPC problems. Some of these were hard (Snafooz); others seemed hard but we could solve them efficiently once we thought carefully.

Complexity of an algorithm: its running time on a worst-case input, expressed in terms of the size of the input. E.g., quicksort is O(n log n) on average (O(n^2) in the worst case), bubble sort is O(n^2), the homework problem for turnpikes was O(n), etc.

Complexity of a problem: the complexity of the best algorithm for the problem.

We would like to solve problems "fast" and "efficiently". What does it mean to be "fast" and "efficient"? The generally accepted notion of efficiency is polynomial time: O(n), O(n log n), O(n^2), O(n^3), etc. In contrast, O(2^n), O(n!), and O(n^{log n}) are all considered inefficient, and for good reason.

Traveling Salesperson Problem (TSP): A salesperson needs to visit n cities. Is there a route of length <= d? Compare this with finding the shortest path between two cities, or even between all pairs of cities. The shortest paths problem can be solved very efficiently using greedy techniques. Unfortunately, greedy techniques break down for the Traveling Salesperson Problem very soon. The brute-force technique takes time O(n!).

Some numbers for perspective:

10^9    = number of instructions per second on a PC
10^{12} = number of instructions per second on a supercomputer
~3 x 10^7 = number of seconds per year
~10^{10} years = age of the universe
10^{79} = number of electrons estimated in the universe

Now, 1000! > 2^{1000} > 10^{300}. But even if every electron in the universe were a supercomputer that had been running since the Big Bang, the total number of instructions executed would be only about 10^{79} x 10^{17} x 10^{12} = 10^{108}, vastly smaller than 10^{300}.

Clearly we would like to do better than brute force.
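To see the brute-force approach concretely, here is a minimal sketch of a factorial-time TSP solver; it fixes one city as the start and tries all (n-1)! orderings of the rest. The distance matrix is made up purely for illustration.

```python
# Brute-force TSP: try every possible tour. This is the O(n!) algorithm
# that the numbers above show is hopeless for large n.
from itertools import permutations

def tsp_brute_force(dist):
    """Return the cost of the cheapest tour visiting every city once.

    dist[i][j] is the distance from city i to city j. City 0 is fixed as
    the start, so only (n-1)! permutations of the remaining cities are
    tried.
    """
    n = len(dist)
    best = float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)           # close the cycle back to city 0
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        best = min(best, cost)
    return best

# A hypothetical 4-city instance (symmetric distances, chosen arbitrarily).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(tsp_brute_force(dist))  # 18
```

Already at n = 20 this loop would examine 19! > 10^{17} tours, which is why the question of beating brute force matters.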
But can we do better? Is TSP intrinsically hard? Is Snafooz intrinsically hard?

The Holy Grail of Computer Science: The P vs NP problem
=======================================================

The Clay Mathematics Institute came up with a list of 7 problems in 2000, referred to as the Millennium problems. Problem #4 is the P vs NP problem. One million dollars for solving this problem.

What are P and NP anyway?

P  = the class of problems that can be solved deterministically in polynomial time.
NP = the class of problems for which a proposed solution can be checked for validity in polynomial time. (The name stands for non-deterministic polynomial time.)

For instance, consider the TSP problem. Suppose somebody computed a tour and claimed that it has cost <= d. We can verify that claim easily by adding up the edge lengths along the tour. So TSP is in NP. Another definition of NP: the class of problems that can be solved in polynomial time by "inspired guessing".

Other example problems in NP:
=============================

Factoring: Given a non-prime number, determine two nontrivial factors of the number. If we are given two numbers that are alleged factors, then we can simply multiply them and check whether the product is the original number.

Satisfiability (SAT): Given a Boolean formula, is there an assignment to the variables that makes the formula true? Given an assignment, we can check efficiently whether the assignment makes the formula true. So SAT is in NP.

Which is bigger, P or NP? Clearly P is a subset of NP: if we can solve a problem from scratch in polynomial time, we can certainly check a proposed solution in polynomial time. Are they the same? Most likely not. If not, then there should be some problems in NP that are not in P. What do these problems look like? Perhaps Satisfiability, TSP, and Factoring are all of that kind?

Even though we do not know the answer to this question, we have been able to identify some of the hardest problems in NP: the so-called NP-complete problems. A problem is NP-complete if it is in NP and every problem in NP reduces to it in polynomial time; consequently, if it turns out that it is in P, then all of NP is in P!

Picture for P, NP, and NPC.

Is there a problem that is in NPC? Yes.
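The certificate-checking that defines NP can be made concrete for SAT. Here is a minimal verifier; the clause-list encoding (a positive integer for a variable, a negative integer for its negation) and the example formula are conventions chosen for illustration, not anything fixed by the lecture.

```python
# Verifying a claimed SAT solution. A CNF formula is a list of clauses,
# each clause a list of literals: 2 means x2, -2 means (not x2).

def verify_assignment(cnf, assignment):
    """Return True iff every clause contains at least one true literal.

    assignment maps variable number -> bool. This runs in time linear in
    the size of the formula, which is exactly what puts SAT in NP.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )

# Hypothetical formula: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
cnf = [[1, -2], [2, 3], [-1, -3]]

print(verify_assignment(cnf, {1: True, 2: True, 3: False}))  # True
print(verify_assignment(cnf, {1: True, 2: False, 3: True}))  # False
```

Note what the verifier does not do: it says nothing about how to *find* a satisfying assignment, only how to check one quickly once it is guessed.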
In a landmark result (early 1970s), Cook showed that SAT is NP-complete. This means that if somebody solves SAT in polynomial time, then all the other problems in NP (including Factoring) can be solved in polynomial time.

NP-completeness proofs
======================

How do we prove that a problem is NP-complete? The idea is to argue that this problem is at least as hard as every other problem in NP. How do we even talk about "every other problem" in NP? This is indeed very challenging, which is why Cook's result is a seminal one. Once SAT was proven to be NP-complete, however, one can show that another problem in NP is NP-complete by showing that it is at least as hard as SAT! How to do that? Reduce SAT to this problem: transform any SAT instance, in polynomial time, into an instance of the new problem with the same yes/no answer.

Example of a reduction
======================

Clique: Given n people, does there exist a group of size k such that every pair of people in the group know each other? Again, if we are given a group that is claimed to be a clique, we can check that easily. So Clique is in NP.

We will reduce SAT to Clique. Given a SAT instance S, we will come up with a Clique instance (G, k) such that S is satisfiable if and only if G has a clique of size k.

Illustrate the reduction.

Implications of NP-completeness
===============================

A number of real-life problems are NP-complete. What lies in the future?

(1) P != NP: Hope that the worst case does not arise. Try an approximate solution. Change the problem. Exploit the NP-completeness (crypto). Change the model (quantum computing).

(2) P = NP: That is, these seemingly hard problems are really not that hard, and conventional machines could implement "inspired guessing". Considered very unlikely. Would have an adverse impact on most applied cryptography.
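The SAT-to-Clique reduction illustrated in lecture can also be sketched in code. This uses the standard textbook construction (which may differ in details from the one drawn in class): make one vertex per literal occurrence, and connect two vertices when they come from different clauses and are not contradictory; then the formula is satisfiable iff the graph has a clique with one vertex per clause.

```python
# Sketch of the SAT-to-Clique reduction, using the same clause encoding
# as before (positive/negative ints as literals). The example formulas
# are made up for illustration.
from itertools import combinations

def sat_to_clique(cnf):
    """Map a CNF formula S to a Clique instance (G, k) such that
    S is satisfiable iff G has a clique of size k = number of clauses.

    Vertices are (clause_index, literal) pairs. Two vertices are adjacent
    when they lie in different clauses and their literals do not
    contradict each other (x and not x).
    """
    vertices = [(i, lit) for i, clause in enumerate(cnf) for lit in clause]
    edges = {
        frozenset([u, v])
        for u, v in combinations(vertices, 2)
        if u[0] != v[0] and u[1] != -v[1]
    }
    return vertices, edges, len(cnf)

def has_clique(vertices, edges, k):
    """Brute-force clique check (exponential, fine for tiny instances)."""
    return any(
        all(frozenset([u, v]) in edges for u, v in combinations(group, 2))
        for group in combinations(vertices, k)
    )

cnf = [[1, -2], [2, 3], [-1, -3]]   # satisfiable, e.g. x1=T, x2=T, x3=F
v, e, k = sat_to_clique(cnf)
print(has_clique(v, e, k))           # True

unsat = [[1], [-1]]                  # x1 and (not x1): unsatisfiable
v, e, k = sat_to_clique(unsat)
print(has_clique(v, e, k))           # False
```

The reduction itself runs in polynomial time; only the final clique search is expensive, which is the whole point: a fast algorithm for Clique would yield a fast algorithm for SAT.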