## Machine Learning Day

### Lab 3: Dimensionality reduction and feature selection

In this lab we will look into the problems of dimensionality reduction through Principal Component Analysis (PCA) and feature selection through Orthogonal Matching Pursuit (OMP).

- Get the code file and add the directory to MATLAB path (or set it as current/working directory).
- Work your way through the examples below, by following the instructions.

### 1. Data generation

Generate a 2-class dataset of D-dimensional points with N points for each class. Start with N = 100 and D = 30 and create a train and a test set.

- The first two variables (dimensions) of each point are generated by `MixGauss`, i.e., drawn from two Gaussian distributions with centroids (1, 1) and (-1, -1) and sigmas 0.7. Set the output labels of the two classes to {1, -1} respectively and plot these two variables in the 2D plane using `scatter(X(:,1), X(:,2), markerSize, Y);`.
- The remaining (D-2) variables are generated as Gaussian noise with `sigma_noise = 0.01`, i.e., `X_noise = sigma_noise*randn(2*N, D-2);`.
- The final D-dimensional train and test data matrices are then given by `X = [X X_noise];`, of size 2N x D, storing the 2N instances (N per class) in a D-dimensional space.
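For reference, the generation procedure above can be sketched outside MATLAB. The following NumPy version is an assumption of what `MixGauss` plus the noise step produce; the `make_dataset` helper is hypothetical and not part of the lab code:

```python
import numpy as np

def make_dataset(N=100, D=30, sigma=0.7, sigma_noise=0.01, seed=0):
    """Sketch of the lab's data generation: two Gaussian classes in the
    first two dimensions, pure noise in the remaining D-2 (assumes
    MixGauss simply samples N points around each centroid)."""
    rng = np.random.default_rng(seed)
    # Two informative dimensions: one Gaussian blob per class.
    X_pos = rng.normal(loc=(1.0, 1.0), scale=sigma, size=(N, 2))
    X_neg = rng.normal(loc=(-1.0, -1.0), scale=sigma, size=(N, 2))
    X2 = np.vstack([X_pos, X_neg])
    Y = np.concatenate([np.ones(N), -np.ones(N)])   # labels in {1, -1}
    # D-2 noise dimensions, shared by both classes.
    X_noise = sigma_noise * rng.standard_normal((2 * N, D - 2))
    X = np.hstack([X2, X_noise])                    # final 2N x D matrix
    return X, Y

X, Y = make_dataset()
print(X.shape, Y.shape)  # (200, 30) (200,)
```

Generating the train and test sets amounts to calling this twice with different seeds.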

### 2. Principal Component Analysis

- Compute the principal components of the training set using the provided function `PCA`. Select the number of principal components by setting `k < D`.
- For the data projected on the subspace of the first k components, `X_p`, plot the first dimension (as points on a line, e.g., `plot(X_p(:,1), ones(size(X_p,1),1), 'o')`), the first two (`scatter(X_p(:,1), X_p(:,2), markerSize, Ytr);`) and the first three (`scatter3(X_p(:,1), X_p(:,2), X_p(:,3), markerSize, Ytr);`).
- How would you interpret the resulting plots? What is the effective dimensionality of this dataset?
- Visualize the square root of the first k = 10 eigenvalues. Can you infer the dimensionality of the data from this distribution?
- Visualize the eigenvectors associated with the largest, second largest and third largest eigenvalues, e.g., `scatter(1:D, abs(V(:,1)))` for the first.
- Repeat the above with datasets generated with different values of `sigma_noise`, e.g., in [0, 2]. To what extent is data visualization by PCA affected by noise?
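As a cross-check of the provided `PCA` function, here is a minimal NumPy sketch of the same computation; the `pca` helper and its return signature are assumptions, not the lab's actual interface:

```python
import numpy as np

def pca(X, k):
    """Minimal PCA sketch (an assumption of what the provided PCA
    function computes): center the data, eigendecompose the covariance,
    and return the top-k projections, eigenvalues, and eigenvectors."""
    Xc = X - X.mean(axis=0)                 # center each column
    C = Xc.T @ Xc / Xc.shape[0]             # D x D covariance matrix
    eigvals, V = np.linalg.eigh(C)          # eigh returns ascending order
    idx = np.argsort(eigvals)[::-1]         # sort descending
    eigvals, V = eigvals[idx], V[:, idx]
    X_p = Xc @ V[:, :k]                     # project onto top-k components
    return X_p, eigvals, V

# Toy check: data lying (mostly) along one direction has one dominant
# eigenvalue, so the effective dimensionality is ~1.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1)) @ np.array([[3.0, 1.0, 0.0]])
X += 0.01 * rng.standard_normal((200, 3))
X_p, eigvals, V = pca(X, 2)
print(eigvals[0] / eigvals.sum())  # close to 1
```

The eigenvalue spectrum is exactly what the exercise asks you to inspect: a sharp drop after the first few values reveals the effective dimensionality.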

### 3. Variable selection with OMP

- Standardize the data matrix so that each column has mean 0 and standard deviation 1. Use the statistics of the train set X (mean and standard deviation) to standardize the corresponding test set Xte. An example of a vectorized implementation:

```matlab
m = mean(X);
s = std(X);
X = bsxfun(@rdivide, bsxfun(@minus, X, m), s);
```

- Apply the orthogonal matching pursuit algorithm (function `OMatchingPursuit`) with T repetitions to obtain T-1 coefficients for a sparse approximation of the training set Xt. Plot the resulting coefficients w using `stem(1:D, w)`. What is the output when setting `T = 3`, and what is the interpretation of the indices of these first active dimensions (coefficients)?
- Check the predicted labels on the training (and test) set when approximating the output using w:

```matlab
Ypred = sign(Xt * w);
err = calcErr(Yt, Ypred);
```

Plot the train and test errors on the same plot for increasing T. How do the two errors change with the number of iterations of the method?

- By applying cross-validation on the training set through the provided `holdoutCVOMP`, find the optimal number of iterations in the range `intIter = 2:D` (indicative values: `pho = 0.2, rip = 30`). Plot the training and validation error on the same plot. How do the errors change with the number of iterations?
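The greedy selection performed by `OMatchingPursuit` can be sketched as follows. This NumPy version (the `omp` helper is hypothetical) assumes the standard OMP loop: pick the column most correlated with the current residual, then refit the coefficients of the active set by least squares:

```python
import numpy as np

def omp(X, y, T):
    """Sketch of orthogonal matching pursuit (an assumption of what
    OMatchingPursuit does): at each of T iterations, select the column
    most correlated with the residual and refit the active set."""
    n, d = X.shape
    w = np.zeros(d)
    active = []                             # indices of selected columns
    r = y.astype(float).copy()              # residual, starts at y
    for _ in range(T):
        corr = np.abs(X.T @ r)              # correlation with residual
        corr[active] = 0.0                  # never re-pick an active column
        active.append(int(np.argmax(corr)))
        # Least-squares refit restricted to the active columns.
        w_a, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        w = np.zeros(d)
        w[active] = w_a
        r = y - X @ w                       # update residual
    return w, active
```

On the lab's dataset, where only the first two columns carry class information, the first active indices should mostly be the two informative dimensions (indices 1 and 2 in MATLAB's 1-based indexing), which is the interpretation asked for in the `T = 3` question above.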

### 4. (Optional)

Compare the results of Parts 2 and 3 and evaluate the benefits of the two different methods for dimensionality reduction and feature selection when choosing `N >> D`, `N ~= D` and `N << D`.