## Machine Learning Day

### Lab 2.B: Kernel Regularized Least Squares (KRLS)

This lab is about Regularized Least Squares under the kernel formulation, the use of nonlinear kernels and the classification of nonlinearly separable datasets. This is the second part of the RLS lab.

### Getting started

• Get the code file and add its directory to the MATLAB path (or set it as the current working directory).
• Use the editor to write/save and run/debug longer scripts and functions.
• Use the command window to try/test commands, view variables and see the use of functions.
• Use `plot` (for 1D), `imshow`, `imagesc` (for 2D matrices), `scatter`, `scatter3` to visualize variables of different types.
• Work your way through the examples below, by following the instructions.

### 1. Kernel Regularized Least Squares

Complete the code of functions `regularizedKernLSTrain` and `regularizedKernLSTest` that perform training and testing using kernel RLS.

1. Data: load the "two moons" dataset by typing `load('data/moons_dataset.mat')` and visualize the training and the test set using `scatter`.
2. Study and try the `KernelMatrix` function for computing linear, Gaussian and polynomial kernels from the data.
3. Use a linear kernel (`kernel='linear'`) and check the resulting separating function on the training set (use `separatingFKernRLS`).
4. Use a Gaussian kernel (`kernel='gaussian'`) and try a range of kernel and regularization parameters (e.g., sigma in [0.01, 5], lambda in [0, 1]). Check how the separating function changes.
5. (Optional) Repeat 1.4 after adding label noise using `flipLabels`, with p in [0.05, 0.1]. How do the separating function and the errors on the test set change with lambda, and why?
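
As a point of reference for the steps above, the following is a minimal, self-contained sketch of kernel RLS training and testing. It does not use the lab's `KernelMatrix`, `regularizedKernLSTrain` or `regularizedKernLSTest`, whose signatures are not given here. The synthetic XOR-style data, the helper names `sqdist` and `gaussK`, the Gaussian kernel form `exp(-||x-y||^2 / (2*sigma^2))` and the `lambda * n` scaling of the regularizer are all assumptions made for illustration; the provided code may use different conventions.

```matlab
% Sketch only: toy data and helper names are hypothetical, not the lab's code.
% Requires implicit expansion (MATLAB R2016b or later).
rng(0);                                        % reproducible toy data
centers = [2 2; -2 -2; 2 -2; -2 2];            % XOR-style layout
labels  = [1; 1; -1; -1];
Xtr = kron(centers, ones(25, 1)) + 0.5 * randn(100, 2);
Ytr = kron(labels,  ones(25, 1));
Xte = kron(centers, ones(10, 1)) + 0.5 * randn(40, 2);
Yte = kron(labels,  ones(10, 1));
n = size(Xtr, 1);

% Pairwise squared Euclidean distances between rows of A and rows of B
sqdist = @(A, B) max(sum(A.^2, 2) + sum(B.^2, 2)' - 2 * (A * B'), 0);
% Gaussian kernel (assumed parameterization)
gaussK = @(A, B, sigma) exp(-sqdist(A, B) / (2 * sigma^2));

sigma = 1; lambda = 1e-3;

% Training: solve (K + lambda*n*I) c = y for the coefficient vector c
K = gaussK(Xtr, Xtr, sigma);
c = (K + lambda * n * eye(n)) \ Ytr;

% Testing: evaluate f(x) = sum_i c_i k(x, x_i) and threshold at zero
Ypred = sign(gaussK(Xte, Xtr, sigma) * c);
errGauss = mean(Ypred ~= Yte);

% Same procedure with a linear kernel, for comparison on the same data
cLin = (Xtr * Xtr' + lambda * n * eye(n)) \ Ytr;
errLin = mean(sign(Xte * Xtr' * cLin) ~= Yte);

fprintf('test error: gaussian %.3f, linear %.3f\n', errGauss, errLin);
```

The XOR layout is not linearly separable, so the comparison between `errGauss` and `errLin` mirrors what steps 3 and 4 ask you to observe on the moons data.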

### 2. Parameter selection

Apply hold-out cross validation (using the provided `HoldoutCVKernRLS`) for selecting the regularization and Gaussian kernel parameters `(lambda, sigma)`. Indicative values for the hold-out percentage and the number of repetitions are `pho = 0.2, rep=51` respectively.

1. Fix sigma and select lambda from a logarithmic range of values, e.g. between 1e-5 and the maximum eigenvalue of the kernel matrix of the training set. For example, set `intSigma = 0.5; intLambda = logspace(-5, ...)`.
2. Plot the validation and training (and optionally test) errors against lambda on a logarithmic x-axis (use `semilogx`, or `set(gca, 'xscale', 'log')` with a regular `plot`).
3. A rule of thumb for choosing a single 'reasonable' sigma is to compute the average distance between neighboring points in the training set. Apply this rule using concepts from kNN, using the provided function `autosigma` and compare the performance on the test set with that of Part 1.
4. Repeat cross-validation for a noisy set, e.g. with p in [0.05, 0.1].
5. Fix lambda to a small value, e.g. `intLambda = 0.001;` and repeat 2.1 and 2.2 to select a suitable sigma using a search range, e.g. `[0.01, 3]`. Check the effects of overfitting and oversmoothing by selecting, instead, a small or a large sigma (e.g., sigma=0.01 or sigma=5). How would you choose a reasonable set of values `intSigma` to search for "good" sigmas?
6. Select a good lambda and sigma simultaneously and plot the separating function for the KRLS solution obtained using those values (use `separatingFKernRLS`). For efficiency, use the grids for lambda and sigma from steps 2.1 and 2.5 and a smaller number of repetitions, e.g. `rep = 11`. Compare the performance for a noisy set as in step 2.4. Note: the training and validation errors are now surfaces over the lambda and sigma grids, which you can view using `surf` or `mesh`.
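
The hold-out procedure in steps 2.1 to 2.3 can be sketched as follows. This is not the provided `HoldoutCVKernRLS` or `autosigma`, whose interfaces and internals are not shown here. The toy data, the 20-point lambda grid, the use of the median over repetitions, the `lambda * n` scaling, and the choice `k = 5` in the neighbor-distance rule are all assumptions for illustration.

```matlab
% Sketch only: hypothetical stand-in for hold-out CV over lambda at fixed sigma.
% Requires implicit expansion (MATLAB R2016b or later).
rng(1);
centers = [2 2; -2 -2; 2 -2; -2 2];
labels  = [1; 1; -1; -1];
X = kron(centers, ones(25, 1)) + 0.7 * randn(100, 2);
Y = kron(labels,  ones(25, 1));
n = size(X, 1);

sqdist = @(A, B) max(sum(A.^2, 2) + sum(B.^2, 2)' - 2 * (A * B'), 0);
gaussK = @(A, B, sigma) exp(-sqdist(A, B) / (2 * sigma^2));

intSigma = 0.5; pho = 0.2; rep = 51;
K = gaussK(X, X, intSigma);

% Lambda grid: from 1e-5 up to the largest eigenvalue of the kernel matrix
lambdaMax = max(eig((K + K') / 2));            % symmetrize against round-off
intLambda = logspace(-5, log10(lambdaMax), 20);

nVal  = round(pho * n);                        % size of the held-out part
trErr = zeros(rep, numel(intLambda));
vlErr = zeros(rep, numel(intLambda));
for r = 1:rep
    idx = randperm(n);                         % fresh random split each time
    vl = idx(1:nVal);
    tr = idx(nVal + 1:end);
    Ktr = K(tr, tr);
    Kvl = K(vl, tr);
    for j = 1:numel(intLambda)
        c = (Ktr + intLambda(j) * numel(tr) * eye(numel(tr))) \ Y(tr);
        trErr(r, j) = mean(sign(Ktr * c) ~= Y(tr));
        vlErr(r, j) = mean(sign(Kvl * c) ~= Y(vl));
    end
end

% Aggregate over repetitions (median is an assumption) and pick lambda
medTr = median(trErr, 1);
medVl = median(vlErr, 1);
[~, best] = min(medVl);
lambdaBest = intLambda(best);

% Rule-of-thumb sigma: mean distance to the k-th nearest neighbor
k = 5;                                         % hypothetical choice
Dsorted = sort(sqrt(sqdist(X, X)), 2);         % column 1 is the self-distance
sigmaRule = mean(Dsorted(:, k + 1));
```

To view the curves for step 2.2, call `semilogx(intLambda, medTr, intLambda, medVl)`. For step 2.6, the same loop can be nested inside a loop over a sigma grid, which turns `medTr` and `medVl` into matrices suitable for `surf` or `mesh`.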

### 3. (Optional)

1. Repeat Section 2.6 by subsampling the training set at random (e.g. 70, 50, 30, 20) and `p=0.05` noise. Study how the parameters vary with the size of the training set.
2. Repeat Section 1 with the polynomial kernel (`kernel = 'polynomial'`) and parameters `lambda` in `[0,10]` and `deg`, the exponent of the polynomial in `[2,10]`.
3. Apply parameter selection (as in Section 2.6) with a polynomial kernel and a suitable range of exponents and regularization parameters. Compare the resulting error with the one obtained with the Gaussian kernel and think about the role of the optimal exponent for the kernel.
4. Analyze the eigenvalues of the kernel matrix for the polynomial kernel (use `eig`) for different values of `deg` by plotting them using `semilogy`. What happens as `deg` increases? Why?
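
One way to collect the spectra for step 3.4 is sketched below. It assumes the inhomogeneous polynomial kernel `(x'*y + 1)^deg`, which may differ from the definition used in the provided `KernelMatrix`, and it runs on random toy data rather than the moons dataset. The relative tolerance `1e-8` used for the numerical rank is also a hypothetical choice.

```matlab
% Sketch only: assumed kernel form (x'*y + 1)^deg on random 2D toy data.
rng(2);
X = randn(50, 2);
degs = [2 4 6 8 10];

evs     = zeros(size(X, 1), numel(degs));      % one spectrum per column
numRank = zeros(1, numel(degs));
for i = 1:numel(degs)
    K  = (X * X' + 1) .^ degs(i);              % polynomial kernel matrix
    ev = sort(eig((K + K') / 2), 'descend');   % symmetrize against round-off
    numRank(i) = sum(ev > 1e-8 * ev(1));       % eigenvalues above tolerance
    % Clamp round-off negatives so they can be shown on a log axis
    evs(:, i) = max(ev, eps(ev(1)));
end

disp([degs; numRank]);                         % degree vs. numerical rank
```

Calling `semilogy(evs)` then draws one curve per degree. For 2D inputs, the feature space of this kernel has dimension `(deg + 1) * (deg + 2) / 2`, which bounds the rank of `K` and is a useful starting point for interpreting the plot.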