Algorithms CSG113 / Fall 2008 Karl Lieberherr Code and ideas contributed by Bryan Chadwick Homework 2: Due date: September 24. Reading in text book: Chapter 4.1, 4.2, 4.4, 4.5, 4.6 and the definition of NP in 8.3 Read the Google paper about how they use MapReduce: http://googlesystem.blogspot.com/2008/01/google-reveals-more-mapreduce-stats.html This homework addresses chapter 3: Graph Algorithms/Topological Sorting. Total: 15 points Part 1: 4 points ============================== MapReduce Algorithms This is an exercise about finding an efficient algorithm for a problem by finding a way of expressing the algorithm in a restricted computational model. Google is successfully using the MapReduce programming model. Here are two paragraphs from the Google MapReduce paper (2004): MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. =================== We apply the MapReduce programming model to graph algorithms but without executing the algorithms on a cluster consisting of hundreds of machines. Instead we express MapReduce in DemeterF and run the program on a single machine. Here is a simple DemeterF program inspired by Google's MapReduce paper: it computes word occurences in a list of ducuments. The idea is to create an entry or pair (w,1) for each word w and to merge those entries together. The class dictionary and behavior file, etc. are here: /home/lieber/.www/courses/csg113/f08/homeworks/hw2/mapReduceDemeterF This program uses the TUCombiner function object to express the map and the reduction conveniently. The map is expressed here: (It is a bit confusing that the map returns a Map) public Map combine(){ return empty; } Map combine(Word n){ return empty.put(n,1); } The first line creates an empty Map and the second line creates a map containing one entry (or pair): (n,1) for word n. The reduction is expressed by the fold function which folds two maps together. For example the Map-object: ((n,1) (m,3)) and the Map-object ((n,6) (o,7)) are folded together: ((n,7) (m,3) (o,7)). ttp://www.ccs.neu.edu/research/demeter/DemeterF/doc/edu/neu/ccs/demeterf/TUCombiner.html tells you more about TUCombiner. Please note that the Java class Map is implemented in the DemeterF library. It is a functional Map but has many of the same features as the Java library Map class. Your task is to implement an important part of topological sorting using the MapReduce programming model. Use the following adjacency list class dictionary: nogen List(X) : Cons(X) | Empty(X). nogen Cons(X) = X List(X). nogen Empty(X) = . Test = Graph. Graph = "graph(" +*l *s List(Adj) -*l ")". Adj = Node *s "=>" *s "(" List(Edge) ")" *l. Edge = Node. Node = ident. Find the UNKNOWNs in this code and run it with DemeterF. Graph{{ public List nodes(){ return new Traversal(new Bc(){ Node combine(Adj a, Node node){ return node; } }).>traverse(adj); } static class NodeComp implements java.util.Comparator{ public int compare(Node a, Node b){ return a.id.compareTo(b.id); } } // Find the in-degrees of all the nodes in this graph public Map inDegrees(){ Map in = TUCombiner.traverse(this,new TUCombiner>(){ Map empty = Map.create(new NodeComp()); public Map combine(){ return empty; } public Map fold(Map a, Map b){ return a.merge(b, new Map.Merge(){ public Integer merge(Integer i, Integer j){ return i+j; } }); } Map combine(Edge e){ return UNKNOWN1 } Map combine(Node n){ return UNKNOWN2 } }); return in; } public List topoSortImp(){ List als = adj; List topo = List.create(); while(!als.isEmpty()){ List> indeg = new Graph(als).inDegrees().toList(); final Entry e = indeg.find(new List.Pred>(){ public boolean huh(Entry e){ return e.val == UNKNOWN3; } }); als = als.remove(new List.Pred(){ public boolean huh(Adj a){ return a.node.equals(UNKNOWN4); } }); topo = topo.append(e.key); } return topo; } }} In the above code, you should recognize the algorithm on page 103 / Chapter 3 of the text book. The recursion has been replaced by an explicit while loop. Part 2: 5 points 2 points: cycle decision correct 3 points: cycle is printed correctly ============================================================================== Exercise 3, chapter 3, page 107 but the running time of your algorithm may be slower than O(m+n). In other words, enhance the algorithm from part 1 so that it detects cycles. Do a careful analysis of the running time of your algorithm. Note that the data structures used, like List and Map are all functional. Discuss possibilities in which the algorithm can be speeded up. An interesting question is whether the elegant style of the algorithm affects its asymptotic running-time. Turn in your enhanced algorithm (program.cd, program.beh, *.java, if you have any and several test cases in program.input). and the analysis of its running time. Part 3: 6 points ================================================================================== 2 points for game rounds 1 point for other questions Form a team of two to three students and play a few rounds of SDG/secret as described in http://www.ccs.neu.edu/home/lieber/evergreen/specker/secret-hiding.html Plug in MAX-SAT as the maximization problem (given a weighted CNF, maximize the fraction of satsfied constraints). Remember the pair notation: (length, number of positive literals). Consider: d1 = (SDG/secret MAX-SAT (2,0) (1,1), p1, Matt) d2 = (SDG/secret MAX-SAT (1,0) (1,1), p2, Matt) How would you choose p1 and p2? Now switch back to SDG/classic: How would you choose p2 in (SDG/classic MAX-SAT (1,0) (1,1), p2, Matt)? For lots of extra credit: How would you choose p1 in (SDG/classic MAX-SAT (2,0) (1,1), p1, Matt)? Justify the answers by giving algorithms and analyzing their running times.