## Machine Learning Day

### Lab 2.A: Regularized Least Squares (RLS)

This lab is about applying linear Regularized Least Squares (RLS) to classification, exploring the role of the regularization parameter and how the generalization error depends on the size and dimensionality of the training set, the noise in the data, etc. It also introduces Leave-One-Out Cross-Validation (LOOCV), the extreme case of K-Fold CV, which is useful for small training sets.

### Getting started

• Get the code file, add the directory to MATLAB path (or set it as current/working directory).
• Use the editor to write/save and run/debug longer scripts and functions.
• Use the command window to try/test commands, view variables and see the use of functions.
• Use `plot` (for 1D data), `imshow` or `imagesc` (for 2D matrices), and `scatter` or `scatter3` to visualize variables of different types.
• Work your way through the examples below, by following the instructions.

### 1. Classification data generation

1. Use `MixGauss` to generate a 2-dimensional, 2-class training set `[X, Y]`, with classes centered at (-0.5,-0.5) and (0.5,0.5), variance 0.5 for both, and 5 points per class. Adjust the output labels `Y` to be {1,-1}, e.g. using `Y(Y==2)=-1`.
2. Generate a corresponding test set `[Xte, Yte]` with 200 points per class from the same distribution.
3. Add noise to the data by randomly flipping a percentage of the point labels (e.g. `p = 0.2`), using the provided function `flipLabels`. You will obtain a new set of training `Y` and test `Yte` label vectors.
4. Plot the various datasets using `scatter`, e.g., `scatter(X(:,1), X(:,2), markerSize, Y);`
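As a reference for what `MixGauss` and `flipLabels` are expected to do (their actual MATLAB signatures may differ), here is a minimal NumPy sketch of the same sampling and label-flipping logic; note that a variance of 0.5 corresponds to a standard deviation of sqrt(0.5):

```python
import numpy as np

def mix_gauss(means, sigmas, n):
    """Sample n points per class from isotropic Gaussians.

    means: (d, k) array of class centers; sigmas: length-k std devs.
    Returns X of shape (n*k, d) and integer labels Y in {0, ..., k-1}.
    """
    d, k = means.shape
    X = np.vstack([np.random.randn(n, d) * sigmas[c] + means[:, c]
                   for c in range(k)])
    Y = np.repeat(np.arange(k), n)
    return X, Y

def flip_labels(Y, p, seed=None):
    """Flip each +/-1 label independently with probability p."""
    rng = np.random.default_rng(seed)
    flips = rng.random(Y.shape) < p
    return np.where(flips, -Y, Y)

# Two classes centered at (-0.5,-0.5) and (0.5,0.5), variance 0.5 each
means = np.array([[-0.5, 0.5], [-0.5, 0.5]])
X, Y = mix_gauss(means, sigmas=[np.sqrt(0.5)] * 2, n=5)
Y = np.where(Y == 1, -1, 1)      # relabel {0,1} -> {1,-1}
Yn = flip_labels(Y, p=0.2)       # noisy labels
```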

### 2. RLS classification

Complete the code in functions `regularizedLSTrain` and `regularizedLSTest` for training and testing a regularized Least Squares classifier. Try the functions on the 2-class problem from Section 1.
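The functions themselves are part of the lab's MATLAB code; the closed-form solution they are meant to implement can be sketched in NumPy as follows (assuming the common convention of scaling `lambda` by the number of training points — the lab's code may use a slightly different scaling):

```python
import numpy as np

def regularized_ls_train(X, Y, lam):
    """Closed-form RLS: solve (X'X + lam*n*I) w = X'Y."""
    n, d = X.shape
    A = X.T @ X + lam * n * np.eye(d)
    return np.linalg.solve(A, X.T @ Y)

def regularized_ls_test(w, Xte):
    """Predict class labels as the sign of the linear score."""
    return np.sign(Xte @ w)

# Toy usage on well-separated points
X = np.array([[-1.0, -1.0], [-0.8, -1.2], [1.0, 1.0], [1.2, 0.8]])
Y = np.array([-1.0, -1.0, 1.0, 1.0])
w = regularized_ls_train(X, Y, lam=0.1)
Ypred = regularized_ls_test(w, X)
err = np.mean(Ypred != Y)   # fraction of misclassified points
```

With `lambda > 0` the system is always well-conditioned, so `solve` is preferable to explicitly inverting the matrix.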

1. Pick a value for `lambda`, evaluate the classification performance by comparing the estimated outputs to the true ones, and plot the data in a way that visualizes the obtained results (e.g., a scatter plot with the misclassified points labeled differently).
Note: To visualize the separating function, i.e., the areas of the 2D plane associated with each class, you can use the function `separatingFRLS`. Superimpose the training and test set data to analyze the generalization properties of the solution.
2. Check the effect of regularization by changing `lambda`, and the effect of noise by changing the flip percentage `p`.
3. Perform parameter selection using leave-one-out cross-validation, through the provided `looCVRLS`, to select `lambda` from a logarithmic range of values, e.g. between 1e-4 and the maximum eigenvalue of the linear kernel matrix `C = X*X'`.
• Plot the training and validation errors for the different values of lambda.
• Apply the best model to the test set and check the classification error.
• Show the separating function and generalization of the solution.
4. Repeat the whole procedure (data generation, parameter selection, test) multiple times and compare the test error of RLS with that of ordinary least squares (OLS), i.e., RLS with `lambda = 0`. Does regularization improve classification performance?
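`looCVRLS` is provided with the lab; a brute-force NumPy sketch of the same leave-one-out loop (retraining on the n-1 remaining points for each held-out point and each candidate `lambda`) might look like this:

```python
import numpy as np

def loo_cv_rls(X, Y, lambdas):
    """Brute-force LOOCV: for each lambda, train on n-1 points,
    classify the held-out one, and average the 0/1 error."""
    n = len(Y)
    val_err = np.zeros(len(lambdas))
    for j, lam in enumerate(lambdas):
        errs = 0
        for i in range(n):
            mask = np.arange(n) != i          # leave point i out
            Xtr, Ytr = X[mask], Y[mask]
            A = Xtr.T @ Xtr + lam * (n - 1) * np.eye(X.shape[1])
            w = np.linalg.solve(A, Xtr.T @ Ytr)
            errs += np.sign(X[i] @ w) != Y[i]
        val_err[j] = errs / n
    best = lambdas[np.argmin(val_err)]
    return best, val_err

# Logarithmic grid from 1e-4 up to the largest eigenvalue of X*X'
X = np.vstack([np.random.randn(20, 2) - 0.5, np.random.randn(20, 2) + 0.5])
Y = np.r_[-np.ones(20), np.ones(20)]
lam_max = np.linalg.eigvalsh(X @ X.T).max()
lambdas = np.logspace(-4, np.log10(lam_max), 10)
best_lam, val_err = loo_cv_rls(X, Y, lambdas)
```

The provided `looCVRLS` presumably also returns the training errors; the same loop can accumulate them by evaluating `w` on `Xtr` as well, which is what you would plot alongside the validation curve.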

### 3. (Optional)

1. Classification for high-dimensional data: Generate the same classes as in Section 1, with the Gaussians now residing in a D-dimensional space, e.g., try `D=10, N=5*D`. How should you choose the class mean vectors?
• Check what happens when varying `lambda`, the input space dimension D (which affects the distance between points), the size of the training set, and the noise.
• Perform parameter selection using leave-one-out or hold-out cross-validation for `lambda` and find the error of the best model. Does regularization help classification performance?
2. Modify `regularizedLSTrain` and `regularizedLSTest` to incorporate an offset b in the linear model (i.e., y = <w,x> + b). Compare the solutions with and without offset, on a 2-class dataset with classes centered at (0,0) and (1,1), each with variance 0.35.
3. Modify `regularizedLSTrain` and `regularizedLSTest` to handle multiclass problems.
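One common way to implement both extensions, sketched here in NumPy (the lab's MATLAB code may organize this differently): the offset is obtained by appending a constant-1 column to the data matrix, and multiclass problems are handled one-vs-all, with one-hot targets at training time and an argmax over the per-class scores at prediction time. For simplicity this sketch regularizes b along with w; leaving b unregularized requires a little extra bookkeeping.

```python
import numpy as np

def rls_train_offset(X, Y, lam):
    """RLS with offset: append a constant-1 column so the model is
    y = <w, x> + b (here b is regularized along with w, for simplicity)."""
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])
    A = Xb.T @ Xb + lam * n * np.eye(Xb.shape[1])
    wb = np.linalg.solve(A, Xb.T @ Y)
    return wb[:-1], wb[-1]            # weights w and offset b

def rls_test_offset(w, b, Xte):
    return np.sign(Xte @ w + b)

def rls_train_multiclass(X, Y, lam):
    """One-vs-all RLS: one +/-1 target column per class."""
    classes = np.unique(Y)
    T = np.where(Y[:, None] == classes[None, :], 1.0, -1.0)  # n x k targets
    n, d = X.shape
    W = np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ T)
    return W, classes

def rls_test_multiclass(W, classes, Xte):
    """Predict the class whose score <w_c, x> is largest."""
    return classes[np.argmax(Xte @ W, axis=1)]

# Toy check for the offset: classes centered at (0,0) and (1,1),
# which are not separable by a hyperplane through the origin
X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
Y = np.array([-1.0, -1.0, 1.0, 1.0])
w, b = rls_train_offset(X, Y, lam=0.01)
pred = rls_test_offset(w, b, X)
```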