Make sure you check the syllabus for the due date. Please use the notations adopted in class, even if the problem is stated in the book using a different notation.
We are not looking for very long answers (if you find yourself writing more than one or two pages of typed text per problem, you are probably on the wrong track). Try to be concise; also keep in mind that good ideas and explanations matter more than exact details.
Housing data, training and testing sets (description).
The last column are the labels. For
both subproblems, submit a report with a plot (if
necessary), results and comments. DO NOT SUBMIT CODE; but have it ready
for a demo if we ask. The report is expected to be about 2 pages or so
(be concise). Generally, the grading on programming assignments will be
done during the demo.
a) Using housing data, build a decision tree (or regression tree) from the training set. Since the features are numeric values, you will need to use thresholds mechanisms. How well does it do on the testing set?
b) Using housing data, apply regression on the training set to find a linear fit with the labels. Implement both the gradient descent regression and the linear algebra exact solution. Compare the training and testing errors (sum of square differences between prediction and actual label).
TIP: You can try to normalize the columns (features) if the gradient
descent does not seem to converge to the exact solution. Do not normalize labels.
TIP2: Learning rate necessary for gradient descent to work might be smaller that you think (0.0001 or so); better, try a variable learning rate that changes across iterations.
a) Prove that for any arbitrary tree, with possible unequal branching ratios throughout, there exists a binary tree that implements the same classification functionality.
b) Consider a tree with just two levels - a root node connected to B leaf nodes (B>=2) . What are then upper and the lower limits on the number of levels in a functionally equivalent binary tree, as a function of B?
c) As in b), what are the upper and lower limits on number of nodes in a functionally equivalent binary tree?
Write down explicit formulas for normal equations solution presented in class for the case of one input dimension.
(Essentially assume the data is (x_i,y_i) i=1,2,..., m and you are looking for h(x) = ax+b that realizes the minimum mean square error. The problem asks you to write down explicit formulas for a and b.)
HINT: Do not simply copy the formulas from here (but do read the article): either take the general formula derived in class and make the calculations (inverse, multiplications, transpose) for one dimension or derive the formulas for a and b from scratch; in either case show the derivations. You can compare your end formulas with the ones linked above.
a) Somebody tosses a fair coin and if the result is heads, you get
nothing, otherwise you get $5. How much would you be pay to play this
game? What if the win is $500 instead of $5?
b) Suppose you play instead the following game: At the beginning of each game you pay an entry fee of $100. A coin is tossed until a head appears, counting n = the number of tosses it took to see the first head. Your reward is 2^n (that is: if a head appears first at the 4th toss, you get $16). Would you be willing to play this game (why)?
c) Lets assume you answered "yes" at part b (if you did not, you need to fix your math on expected values). What is the probability that you make a profit in one game?
d) [ExtraCredit] After about how many games (estimate) the probability of making a profit overall is bigger than 50% ?
With the notation used in class (and notes), prove that