Project 2

CSU 670 Fall 2008
Out: September 19, 2008
Due September 26, 2008


PART 1:
========================================

For this project we fill in the unspecified concepts from project 1. What was left unspecified was the concept of a type T of a derivative, the concept of raw material of type T and the concept of finishing the raw material with a certain quality.

Our raw material are Boolean formulas in conjunctive normal form (CNF). A CNF consists of a list of clauses. The same clause may appear several times: so our CNFs are a bag of clauses. Each clause consists of a list of literals. A literal is either a positive or a negative variable.

raw material = CNF = bag of clauses

The concept of finishing the raw material amounts to finding an assignment that satisfies many clauses (ideally it maximizes the number of satisfied clauses over all possible assignments). An assignment assigns true or false to each variable of a CNF. An assignment is represented by a clause: a positive literal means the variable is set to true, a negative literal means the variable is set to false. A clause is satisfied by an assignment if at least one literal in the clause is satisfied. A positive literal is satisfied if its variable is assigned true and a negative literal is satisfied if its variable is assigned false. The quality of an assignment is the fraction of satisfied clauses among the total number of clauses.

finished product = CNF plus assignment

We need to define the concept of type of a CNF. The type of a CNF says what types of clauses the CNF contains. The type of a clause is a pair: the first element is the length of the clause (the number of literals) and the second element the number of positive literals. The clause (a !b !c) is of type (3,1) and the clause (!a !b !c !d) is of type (4,0). The type of a CNF is the union of the clause types. For example the CNF ((a) (b) (c) (!a !b) (!b !c)) is of type ((1,1) (2,0)).

Note that when you buy a derivative you buy it based on its type and its price. So you will need to figure out whether a price for a given type is fair. The uncertainty you have is different than with the weather, or exchange rates.

uncertainty
-----------

weather:  how nice will the weather be?
knowledge to reduce uncertainty: model of weather behavior.

interest rates:  how high will the rate be?
knowledge to reduce uncertainty: model of economy behavior.

Specker Derivative Game: how much can I satisfy in the CNF I get?
knowledge to reduce uncertainty: model of CNF

type of raw material = list of relations. Each relation is given by a pair of numbers.

Complete the object-oriented design from project 1 with the new information.

What to turn in: Your extended object-oriented design as a cd file.

PART  2:
=====================================================================
MapReduce Algorithms

Read the Google paper about how they use MapReduce:
http://googlesystem.blogspot.com/2008/01/google-reveals-more-mapreduce-stats.html

This is an exercise about finding an efficient algorithm
for a problem by finding a way of expressing the 
algorithm in a restricted computational model.

Google is successfully using the MapReduce programming model.
Here are two paragraphs from the Google MapReduce paper (2004):

MapReduce is a programming model and an associated
implementation for processing and generating large
data sets. Users specify a map function that processes a
key/value pair to generate a set of intermediate key/value
pairs, and a reduce function that merges all intermediate
values associated with the same intermediate key. Many
real world tasks are expressible in this model, as shown
in the paper.
Programs written in this functional style are automatically
parallelized and executed on a large cluster of commodity
machines. The run-time system takes care of the
details of partitioning the input data, scheduling the program's
execution across a set of machines, handling machine
failures, and managing the required inter-machine
communication. This allows programmers without any
experience with parallel and distributed systems to easily
utilize the resources of a large distributed system.
===================

We apply the MapReduce programming model to 
implementing SDG
but without executing the algorithms on a cluster consisting 
of hundreds of machines. Instead we express MapReduce
in DemeterF and run the program on a single machine.

Here is a simple DemeterF program inspired by Google's
MapReduce paper: it computes word occurences in a list
of ducuments.
The idea is to create an entry or pair (w,1) for each word w
and to merge those entries together.

The class dictionary and behavior file, etc. are here:
/home/lieber/.www/courses/csu670/f08/project/project2/mapReduceDemeterF

This program uses the TUCombiner function object to 
express the map and the reduction conveniently.
The map is expressed here:
(It is a bit confusing that the map returns a Map)

public Map combine(){ return empty; }
Map combine(Word n){ return empty.put(n,1); }

The first line creates an empty Map and the second line
creates a map containing one entry (or pair): (n,1)
for word n.

The reduction is expressed by the fold function which
folds two maps together. For example the Map-object:
((n,1) (m,3)) and the Map-object ((n,6) (o,7)) are folded
together: ((n,7) (m,3) (o,7)).

ttp://www.ccs.neu.edu/research/demeter/DemeterF/doc/edu/neu/ccs/demeterf/TUCombiner.html
tells you more about TUCombiner.

Please note that the Java class Map is implemented in the DemeterF library.
It is a functional Map but has many of the same features as the 
Java library Map class.

Your task is to implement a program to collect all
derivatives that are currently for sale in the store into a list
using the MapReduce programming model.

For help inspect the files:
general.cd
classic.cd
ListTUCombiner.java

and find the UNKNOWN below:


	/** Returns a list of derivatives that are available for sale
	 * @param store
	 * @return {@link List} 
	 */
	public static List findDerivativesForSale(List> store) {
		return TUCombiner.traverse(store, new DerivativesForSale());
	}

/** Class for traversal
 * @author animesh
 *
 */
class DerivativesForSale extends ListTUCombiner{
	UNKNOWN
	}	
}


Part 3:
==============================================================================

Form a team of two to three students
and play a few rounds of
SDG/secret as described in
http://www.ccs.neu.edu/home/lieber/evergreen/specker/secret-hiding.html

Plug in MAX-SAT as the maximization problem (given a weighted
CNF, maximize the fraction of satsfied constraints).
Remember the pair notation: (length, number of positive literals).
Consider:
d1 = (SDG/secret MAX-SAT (2,0) (1,1), p1, Matt)
d2 = (SDG/secret MAX-SAT (1,0) (1,1), p2, Matt)
How would you choose p1 and p2?

Now switch back to SDG/classic:
How would you choose p2 in (SDG/classic MAX-SAT (1,0) (1,1), p2, Matt)?

For lots of extra credit: How would you choose 
p1 in (SDG/classic MAX-SAT (2,0) (1,1), p1, Matt)?

Justify the answers.