CS6140 Machine Learning
HW5 - Features

Make sure you check the syllabus for the due date. Please use the notations adopted in class, even if the problem is stated in the book using a different notation.
SpamBase-Poluted dataset: the same datapoints as in the original Spambase dataset, only with a lot more columns (features): either random values, or somewhat loose features, or duplicated original features.

SpamBase-Poluted with missing values dataset: train, test. The same dataset, except that some values (picked at random) have been deleted.
    
Digits Dataset (Training data, labels. Testing data, labels): about 60,000 images, each 28x28 pixels, representing digit scans. Each image is labeled with the digit represented, one of 10 classes: 0, 1, 2, ..., 9.
    
PROBLEM 5: Implement Kernel PCA for linear regression (Optional, no credit)
 
Dataset: 1000 2-dim datapoints, TwoSpirals
Dataset: 1000 2-dim datapoints, ThreeCircles
      
      
A) First, train a Linear Regression (using a library implementation) and confirm that it doesn't work, i.e., it has a high classification error or a high Root Mean Squared Error.
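A minimal Matlab sketch for part A, assuming the 2-dim datapoints are already loaded into an N x 2 matrix X and the class labels into an N x 1 vector y coded as 0/1 (the variable names, the 0/1 coding, and the 0.5 threshold are illustrative assumptions, not part of the assignment):

	% baseline: ordinary least-squares linear regression on the raw 2-dim data
	Xb = [ones(size(X,1),1), X];        % add an intercept column
	w = Xb \ y;                         % least-squares fit
	yhat = Xb * w;                      % regression predictions
	% regression and classification errors of the baseline
	rmse = sqrt(mean((y - yhat).^2));   % Root Mean Squared Error
	err = mean((yhat > 0.5) ~= y);      % 0/1 error, thresholding predictions at 0.5
	fprintf('baseline RMSE = %.3f, classification error = %.3f\n', rmse, err);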
B) Run Kernel PCA with a Gaussian kernel to obtain a representation with D features. For reference, these are the steps we demoed in class (Matlab):
	% get pairwise squared euclidean distances
	N = size(X,1);                                     % number of datapoints
	X2 = dot(X,X,2);
	DIST_euclid = bsxfun(@plus, X2, X2') - 2 * X * X';
	% get an NxN Gaussian kernel matrix
	sigma = 3;
	K = exp(-DIST_euclid/sigma);
	% center (normalize) the kernel to correspond to zero-mean features
	U = ones(N)/N;
	Kn = K - U*K - K*U + U*K*U;
	% obtain kernel eigenvalues and eigenvectors; sort with the largest eigenvalue first
	[V,D] = eig(Kn,'vector');
	[D,sorteig] = sort(D,'descend');
	V = V(:,sorteig);
	% project the data onto the kernel principal components (columns of V)
	XG = Kn*V;
	% get the first 3 dimensions
	X3G = XG(:,1:3);
	% get the first 20 dimensions
	X20G = XG(:,1:20);
	% get the first 100 dimensions
	X100G = XG(:,1:100);
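A note on the demo above: since the columns of V are eigenvectors of the centered kernel Kn, Kn*V gives each point's coordinates along the kernel principal components, up to a per-component scale factor (the standard kernel-PCA derivation rescales each eigenvector by one over the square root of its eigenvalue). That rescaling only stretches individual columns of XG, and a subsequent linear regression can absorb it into its weights, so it can be omitted here. If eig returns small imaginary parts due to numerical asymmetry, symmetrizing first with Kn = (Kn + Kn')/2 is a safe precaution.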
    
    
C) Retrain the Linear Regression on the transformed D-dim data. How large does D need to be to get good performance?
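A possible sketch of the sweep in part C, continuing with XG and y from the code above (the particular values of D tried and the 0.5 decision threshold are illustrative choices, not prescribed by the assignment):

	% retrain least-squares linear regression on the first d kernel-PCA features
	for d = [3 20 100 200 500]
	    XD = [ones(size(XG,1),1), XG(:,1:d)];    % d transformed features + intercept
	    w = XD \ y;
	    yhat = XD * w;
	    rmse = sqrt(mean((y - yhat).^2));
	    err = mean((yhat > 0.5) ~= y);
	    fprintf('D = %3d : RMSE = %.3f, classification error = %.3f\n', d, rmse, err);
	end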