Complexity
==========

In the Fall quarter, we discussed unsolvable problems. In particular, we studied the unsolvability of the halting problem using a self-referential argument. A natural question to ask next is "how hard are the solvable problems?", with hardness measured by how long it takes to compute the solution. Clearly this depends on the nature of the problem. We have seen several problems during the course of this quarter: Snafooz, stable marriage, array manipulation problems such as the maxsum problem and the longest decreasing sequence, and ICPC problems. Some of these were hard (Snafooz); others seemed hard but we could solve them efficiently once we thought carefully.

Complexity of an algorithm: its running time on a worst-case input, expressed in terms of the size of the input. E.g., quicksort is O(n log n) on average (O(n^2) in the worst case), bubble sort is O(n^2), the homework problem for turnpikes was O(n), etc.

Complexity of a problem: the complexity of the best algorithm for the problem.

We would like to solve problems "fast" and "efficiently". What does it mean to be "fast" and "efficient"? The generally accepted notion of efficiency is polynomial time: O(n), O(n log n), O(n^2), O(n^3), etc. In contrast, O(2^n), O(n!), and O(n^{log n}) are all considered inefficient, and for good reason.

Traveling Salesperson Problem (TSP): A salesperson needs to visit n cities. Is there a route of length <= d? Compare this with finding the shortest path between two cities, or even between all pairs of cities. The shortest paths problem can be solved very efficiently using greedy techniques. Unfortunately, greedy techniques break down for the Traveling Salesperson Problem very soon. The brute-force technique takes time O(n!).

Some numbers for perspective:

10^9    = number of instructions per second on a PC
10^{12} = number of instructions per second on a supercomputer
~3 x 10^7 = number of seconds per year
~10^{10} years = age of the universe
10^{79} = number of electrons estimated in the universe

Now, 1000! > 2^{1000} > 10^{300}. But even if every electron in the universe were a supercomputer that had been running since the Big Bang, the total number of instructions executed would be only about 10^{79} x 10^{17} x 10^{12} = 10^{108}, vastly smaller than 10^{300}.

Clearly we would like to do better than brute force.
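To see the brute-force approach concretely, here is a minimal sketch of a factorial-time TSP solver; it fixes one city as the start and tries all (n-1)! orderings of the rest. The distance matrix is made up purely for illustration.

```python
# Brute-force TSP: try every possible tour. This is the O(n!) algorithm
# that the numbers above show is hopeless for large n.
from itertools import permutations

def tsp_brute_force(dist):
    """Return the cost of the cheapest tour visiting every city once.

    dist[i][j] is the distance from city i to city j. City 0 is fixed as
    the start, so only (n-1)! permutations of the remaining cities are
    tried.
    """
    n = len(dist)
    best = float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)           # close the cycle back to city 0
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        best = min(best, cost)
    return best

# A hypothetical 4-city instance (symmetric distances, chosen arbitrarily).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(tsp_brute_force(dist))  # 18
```

Already at n = 20 this loop would examine 19! > 10^{17} tours, which is why the question of beating brute force matters.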
But can we do better? Is TSP intrinsically hard? Is Snafooz intrinsically hard?

The Holy Grail of Computer Science: The P vs NP problem
=======================================================

The Clay Mathematics Institute came up with a list of 7 problems in 2000, referred to as the Millennium problems. Problem #4 is the P vs NP problem. One million dollars for solving this problem.

What are P and NP anyway?

P  = the class of problems that can be solved deterministically in polynomial time.
NP = the class of problems for which a proposed solution can be checked for validity in polynomial time. (The name stands for non-deterministic polynomial time.)

For instance, consider the TSP problem. Suppose somebody computed a tour and claimed that it has cost <= d. We can verify that claim easily by adding up the edge lengths along the tour. So TSP is in NP. Another definition of NP: the class of problems that can be solved in polynomial time by "inspired guessing".

Other example problems in NP:
=============================

Factoring: Given a non-prime number, determine two nontrivial factors of the number. If we are given two numbers that are alleged factors, then we can simply multiply them and check whether the product is the original number.

Satisfiability (SAT): Given a Boolean formula, is there an assignment to the variables that makes the formula true? Given an assignment, we can check efficiently whether the assignment makes the formula true. So SAT is in NP.

Which is bigger, P or NP? Clearly P is a subset of NP: if we can solve a problem from scratch in polynomial time, we can certainly check a proposed solution in polynomial time. Are they the same? Most likely not. If not, then there should be some problems in NP that are not in P. What do these problems look like? Perhaps Satisfiability, TSP, and Factoring are all of that kind?

Even though we do not know the answer to this question, we have been able to identify some of the hardest problems in NP: the so-called NP-complete problems. A problem is NP-complete if it is in NP and every problem in NP reduces to it in polynomial time; consequently, if it turns out that it is in P, then all of NP is in P!

Picture for P, NP, and NPC.

Is there a problem that is in NPC? Yes.
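The certificate-checking that defines NP can be made concrete for SAT. Here is a minimal verifier; the clause-list encoding (a positive integer for a variable, a negative integer for its negation) and the example formula are conventions chosen for illustration, not anything fixed by the lecture.

```python
# Verifying a claimed SAT solution. A CNF formula is a list of clauses,
# each clause a list of literals: 2 means x2, -2 means (not x2).

def verify_assignment(cnf, assignment):
    """Return True iff every clause contains at least one true literal.

    assignment maps variable number -> bool. This runs in time linear in
    the size of the formula, which is exactly what puts SAT in NP.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )

# Hypothetical formula: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
cnf = [[1, -2], [2, 3], [-1, -3]]

print(verify_assignment(cnf, {1: True, 2: True, 3: False}))  # True
print(verify_assignment(cnf, {1: True, 2: False, 3: True}))  # False
```

Note what the verifier does not do: it says nothing about how to *find* a satisfying assignment, only how to check one quickly once it is guessed.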
In a landmark result (early 1970s), Cook showed that SAT is NP-complete. This means that if somebody solves SAT in polynomial time, then all the other problems in NP (including Factoring) can be solved in polynomial time.

NP-completeness proofs
======================

How do we prove that a problem is NP-complete? The idea is to argue that this problem is at least as hard as every other problem in NP. How do we even talk about "every other problem" in NP? This is indeed very challenging, which is why Cook's result is a seminal one. Once SAT was proven to be NP-complete, however, one can show that another problem in NP is NP-complete by showing that it is at least as hard as SAT! How to do that? Reduce SAT to this problem: transform any SAT instance, in polynomial time, into an instance of the new problem with the same yes/no answer.

Example of a reduction
======================

Clique: Given n people, does there exist a group of size k such that every pair of people in the group know each other? Again, if we are given a group that is claimed to be a clique, we can check that easily. So Clique is in NP.

We will reduce SAT to Clique. Given a SAT instance S, we will come up with a Clique instance (G, k) such that S is satisfiable if and only if G has a clique of size k.

Illustrate the reduction.

Implications of NP-completeness
===============================

A number of real-life problems are NP-complete. What lies in the future?

(1) P != NP: Hope that the worst case does not arise. Try an approximate solution. Change the problem. Exploit the NP-completeness (crypto). Change the model (quantum computing).

(2) P = NP: That is, these seemingly hard problems are really not that hard, and conventional machines could implement "inspired guessing". Considered very unlikely. Would have an adverse impact on most applied cryptography.
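The SAT-to-Clique reduction illustrated in lecture can also be sketched in code. This uses the standard textbook construction (which may differ in details from the one drawn in class): make one vertex per literal occurrence, and connect two vertices when they come from different clauses and are not contradictory; then the formula is satisfiable iff the graph has a clique with one vertex per clause.

```python
# Sketch of the SAT-to-Clique reduction, using the same clause encoding
# as before (positive/negative ints as literals). The example formulas
# are made up for illustration.
from itertools import combinations

def sat_to_clique(cnf):
    """Map a CNF formula S to a Clique instance (G, k) such that
    S is satisfiable iff G has a clique of size k = number of clauses.

    Vertices are (clause_index, literal) pairs. Two vertices are adjacent
    when they lie in different clauses and their literals do not
    contradict each other (x and not x).
    """
    vertices = [(i, lit) for i, clause in enumerate(cnf) for lit in clause]
    edges = {
        frozenset([u, v])
        for u, v in combinations(vertices, 2)
        if u[0] != v[0] and u[1] != -v[1]
    }
    return vertices, edges, len(cnf)

def has_clique(vertices, edges, k):
    """Brute-force clique check (exponential, fine for tiny instances)."""
    return any(
        all(frozenset([u, v]) in edges for u, v in combinations(group, 2))
        for group in combinations(vertices, k)
    )

cnf = [[1, -2], [2, 3], [-1, -3]]   # satisfiable, e.g. x1=T, x2=T, x3=F
v, e, k = sat_to_clique(cnf)
print(has_clique(v, e, k))           # True

unsat = [[1], [-1]]                  # x1 and (not x1): unsatisfiable
v, e, k = sat_to_clique(unsat)
print(has_clique(v, e, k))           # False
```

The reduction itself runs in polynomial time; only the final clique search is expensive, which is the whole point: a fast algorithm for Clique would yield a fast algorithm for SAT.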