Algorithms CSG113 / Fall 2008
Karl Lieberherr                Code and ideas contributed by Bryan Chadwick


Homework 2:

Due date: September 24.

Reading in text book:
Chapter 4.1, 4.2, 4.4, 4.5, 4.6
and the definition of NP in 8.3

Read the Google paper about how they use MapReduce:
http://googlesystem.blogspot.com/2008/01/google-reveals-more-mapreduce-stats.html

This homework addresses chapter 3: Graph Algorithms/Topological Sorting.

Total: 15 points

Part 1: 4 points
==============================
MapReduce Algorithms

This is an exercise about finding an efficient algorithm
for a problem by finding a way of expressing the 
algorithm in a restricted computational model.

Google is successfully using the MapReduce programming model.
Here are two paragraphs from the Google MapReduce paper (2004):

MapReduce is a programming model and an associated
implementation for processing and generating large
data sets. Users specify a map function that processes a
key/value pair to generate a set of intermediate key/value
pairs, and a reduce function that merges all intermediate
values associated with the same intermediate key. Many
real world tasks are expressible in this model, as shown
in the paper.
Programs written in this functional style are automatically
parallelized and executed on a large cluster of commodity
machines. The run-time system takes care of the
details of partitioning the input data, scheduling the program's
execution across a set of machines, handling machine
failures, and managing the required inter-machine
communication. This allows programmers without any
experience with parallel and distributed systems to easily
utilize the resources of a large distributed system.
===================

We apply the MapReduce programming model to graph algorithms
but without executing the algorithms on a cluster consisting 
of hundreds of machines. Instead we express MapReduce
in DemeterF and run the program on a single machine.

Here is a simple DemeterF program inspired by Google's
MapReduce paper: it computes word occurences in a list
of ducuments.
The idea is to create an entry or pair (w,1) for each word w
and to merge those entries together.

The class dictionary and behavior file, etc. are here:
/home/lieber/.www/courses/csg113/f08/homeworks/hw2/mapReduceDemeterF

This program uses the TUCombiner function object to 
express the map and the reduction conveniently.
The map is expressed here:
(It is a bit confusing that the map returns a Map)

public Map<Word,Integer> combine(){ return empty; }
Map<Word,Integer> combine(Word n){ return empty.put(n,1); }

The first line creates an empty Map and the second line
creates a map containing one entry (or pair): (n,1)
for word n.

The reduction is expressed by the fold function which
folds two maps together. For example the Map-object:
((n,1) (m,3)) and the Map-object ((n,6) (o,7)) are folded
together: ((n,7) (m,3) (o,7)).

ttp://www.ccs.neu.edu/research/demeter/DemeterF/doc/edu/neu/ccs/demeterf/TUCombiner.html
tells you more about TUCombiner.

Please note that the Java class Map is implemented in the DemeterF library.
It is a functional Map but has many of the same features as the 
Java library Map class.

Your task is to implement an important part of topological sorting
using the MapReduce programming model.

Use the following adjacency list class dictionary:

nogen List(X) : Cons(X) | Empty(X).
nogen Cons(X) = <first> X <rest> List(X).
nogen Empty(X) = .

Test = <g> Graph.

Graph = "graph(" +*l *s <adj> List(Adj) -*l ")".
Adj = <node> Node *s "=>" *s "(" <es> List(Edge) ")" *l.
Edge = <targ> Node.
Node = <id> ident.

Find the UNKNOWNs in this code and run it with DemeterF.

Graph{{
    public List<Node> nodes(){
        return new Traversal(new Bc(){
                Node combine(Adj a, Node node){ return node; }
            }).<List<Node>>traverse(adj);
    }

    static class NodeComp implements java.util.Comparator<Node>{
        public int compare(Node a, Node b){
            return a.id.compareTo(b.id);
        }
    }
    // Find the in-degrees of all the nodes in this graph
    public Map<Node,Integer> inDegrees(){
        Map<Node,Integer> in = TUCombiner.traverse(this,new TUCombiner<Map<Node,Integer>>(){
                Map<Node,Integer> empty = Map.<Node,Integer>create(new NodeComp());
                
                public Map<Node,Integer> combine(){ return empty; }
                public Map<Node,Integer> fold(Map<Node,Integer> a, Map<Node,Integer> b){
                    return a.merge(b, new Map.Merge<Integer>(){
                            public Integer merge(Integer i, Integer j){ return i+j; }
                        });
                }
                Map<Node,Integer> combine(Edge e){ return UNKNOWN1 }
                Map<Node,Integer> combine(Node n){ return UNKNOWN2 }
            });
        return in;
    }

    public List<Node> topoSortImp(){
        List<Adj> als = adj;
        List<Node> topo = List.<Node>create();

        while(!als.isEmpty()){
            List<Entry<Node,Integer>> indeg = new Graph(als).inDegrees().toList();
            final Entry<Node,Integer> e = indeg.find(new List.Pred<Entry<Node,Integer>>(){
                    public boolean huh(Entry<Node,Integer> e){ return e.val == UNKNOWN3; }
                });
            als = als.remove(new List.Pred<Adj>(){
                    public boolean huh(Adj a){ return a.node.equals(UNKNOWN4); }
                });
            topo = topo.append(e.key);
        }
        return topo;
    }
}}

In the above code,
you should recognize the algorithm on page 103 / Chapter 3 of the text book.
The recursion has been replaced by an explicit while loop.

Part 2: 5 points
  2 points: cycle decision correct
  3 points: cycle is printed correctly

==============================================================================

Exercise 3, chapter 3, page 107 but the running time of your algorithm
may be slower than O(m+n).

In other words, enhance the algorithm from part 1 so that it detects cycles.

Do a careful analysis of the running time of your algorithm.
Note that the data structures used, like List and Map are all functional.
Discuss possibilities in which the algorithm can be speeded up.

An interesting question is whether the elegant style of the algorithm
affects its asymptotic running-time.

Turn in your enhanced algorithm (program.cd, program.beh, *.java,
if you have any and several test cases in program.input).
and the analysis of its running time.


Part 3: 6 points
==================================================================================
2 points for game rounds
1 point for other questions

Form a team of two to three students
and play a few rounds of
SDG/secret as described in
http://www.ccs.neu.edu/home/lieber/evergreen/specker/secret-hiding.html

Plug in MAX-SAT as the maximization problem (given a weighted
CNF, maximize the fraction of satsfied constraints).
Remember the pair notation: (length, number of positive literals).
Consider:
d1 = (SDG/secret MAX-SAT (2,0) (1,1), p1, Matt)
d2 = (SDG/secret MAX-SAT (1,0) (1,1), p2, Matt)
How would you choose p1 and p2?

Now switch back to SDG/classic:
How would you choose p2 in (SDG/classic MAX-SAT (1,0) (1,1), p2, Matt)?

For lots of extra credit: How would you choose 
p1 in (SDG/classic MAX-SAT (2,0) (1,1), p1, Matt)?

Justify the answers by giving algorithms and 
analyzing their running times.