Machine Learning Day

Lab 3: Dimensionality reduction and feature selection

In this lab we will look into the problems of dimensionality reduction through Principal Component Analysis (PCA) and feature selection through Orthogonal Matching Pursuit (OMP).

1. Data generation

Generate a 2-class dataset of D-dimensional points with N points for each class. Start with N = 100 and D = 30 and create a train and a test set.

  1. The first two variables (dimensions) of each point will be generated by MixGauss, i.e., drawn from two Gaussian distributions with centroids (1, 1) and (-1, -1) and sigmas 0.7. Adjust the output labels of the two classes to be {1, -1}, respectively, and plot these variables in the 2D plane using scatter(X(:,1), X(:,2), markerSize, Y);.
  2. The remaining (D-2) variables will be generated as Gaussian noise with sigma_noise = 0.01: X_noise = sigma_noise*randn(2*N, D-2);.
  3. The final D-dimensional train and test data matrices will then be given by X = [X X_noise];, each of size 2N x D, storing the 2N instances (N per class) in a D-dimensional space.
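The generation steps above can be sketched in NumPy (Python is used here for a self-contained example; the lab's MixGauss helper is replaced by direct Gaussian sampling, and all names are illustrative):

```python
import numpy as np

def make_dataset(N=100, D=30, sigma=0.7, sigma_noise=0.01, seed=0):
    """Two Gaussian classes in the first 2 dims, noise in the remaining D-2."""
    rng = np.random.default_rng(seed)
    # Class +1 around (1, 1), class -1 around (-1, -1)
    X1 = rng.normal(loc=[1, 1], scale=sigma, size=(N, 2))
    X2 = rng.normal(loc=[-1, -1], scale=sigma, size=(N, 2))
    X = np.vstack([X1, X2])
    Y = np.concatenate([np.ones(N), -np.ones(N)])
    # Remaining D-2 dimensions: low-variance Gaussian noise
    X_noise = sigma_noise * rng.standard_normal((2 * N, D - 2))
    X = np.hstack([X, X_noise])          # final 2N x D data matrix
    return X, Y

Xtr, Ytr = make_dataset(seed=0)          # train set
Xte, Yte = make_dataset(seed=1)          # test set
```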

2. Principal Component Analysis

  1. Compute the principal components of the training set, using the provided function PCA. Select the number of principal components by setting k<D.
  2. For the data projected on the subspace of the first k components, X_p, plot the first dimension (as points on a line, e.g., plot(X_p(:,1), zeros(size(X_p,1),1), 'o');), the first two (scatter(X_p(:,1), X_p(:,2), markerSize, Ytr);) and the first three (scatter3(X_p(:,1), X_p(:,2), X_p(:,3), markerSize, Ytr);).
  3. How would you interpret the meaning of the resulting plots? What is the effective dimensionality of this dataset?
  4. Visualize the square roots of the first k = 10 eigenvalues. Can you infer the dimensionality of the dataset from this distribution?
  5. Visualize the eigenvectors associated with the largest (e.g., scatter(1:D, abs(V(:,1)))), the second largest and the third largest eigenvalues.
  6. Repeat the above with a dataset of different sigma_noise, e.g., in [0, 2]. To what extent is data visualization by PCA affected by noise?
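The provided PCA function is not shown in this handout; the following is a minimal NumPy stand-in, under the assumption that it returns the projected data together with the eigenvalues and eigenvectors of the centered covariance matrix (names are illustrative):

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components.

    Returns the projected data X_p, the eigenvalues d (descending),
    and the matrix V whose columns are the corresponding eigenvectors.
    """
    Xc = X - X.mean(axis=0)                  # center the data
    C = Xc.T @ Xc / Xc.shape[0]              # covariance matrix (D x D)
    d, V = np.linalg.eigh(C)                 # eigh returns ascending order
    idx = np.argsort(d)[::-1]                # sort eigenvalues descending
    d, V = d[idx], V[:, idx]
    X_p = Xc @ V[:, :k]                      # projection onto k components
    return X_p, d, V
```

Plotting np.sqrt(d[:10]) reproduces the eigenvalue visualization of step 4, and abs(V[:, 0]) the eigenvector visualization of step 5.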

3. Variable selection with OMP

  1. Standardize the data matrix, so that each column has mean 0 and standard deviation 1. Use the statistics from the train set X (mean and standard deviation) to standardize the corresponding test set Xte. An example of a vectorized implementation:
    m = mean(X);
    s = std(X);
    X = bsxfun(@rdivide, bsxfun(@minus, X, m), s);
    Xte = bsxfun(@rdivide, bsxfun(@minus, Xte, m), s);
  2. Run the orthogonal matching pursuit algorithm (function OMatchingPursuit) for T repetitions, to obtain T-1 coefficients for a sparse approximation of the training set Xt. Plot the resulting coefficient vector w using stem(1:D, w). What is the output when setting T = 3, and how do you interpret the indices of these first active dimensions (non-zero coefficients)?
  3. Check the predicted labels on the training (and test) set when approximating the output using w:
    Ypred = sign(Xt * w);
    err = calcErr(Yt, Ypred);
    Plot the errors on the same plot for increasing T. How do the train and test error change with the number of iterations of the method?
  4. Apply cross-validation on the training set through the provided holdoutCVOMP to find the optimal number of iterations in the range intIter = 2:D (indicative values: pho = 0.2, rip = 30). Plot the training and validation errors on the same plot. How do the errors change with the number of iterations?
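The provided OMatchingPursuit is likewise not shown in the handout; the following is a compact NumPy sketch of orthogonal matching pursuit (T iterations of greedy column selection followed by a least-squares refit on the active set; the actual function's interface may differ):

```python
import numpy as np

def omp(X, Y, T):
    """Orthogonal matching pursuit: greedy sparse approximation of Y.

    At each of T iterations, select the column most correlated with the
    current residual, then refit by least squares on the active set.
    """
    n, D = X.shape
    w = np.zeros(D)
    active = []
    r = Y.astype(float).copy()              # residual starts as the target
    for _ in range(T):
        corr = np.abs(X.T @ r)              # correlation with each column
        corr[active] = 0                    # do not reselect active columns
        j = int(np.argmax(corr))
        active.append(j)
        # Least-squares refit restricted to the active columns
        w_a, *_ = np.linalg.lstsq(X[:, active], Y, rcond=None)
        w = np.zeros(D)
        w[active] = w_a
        r = Y - X @ w                       # update the residual
    return w

# Prediction and error as in step 3:
# Ypred = np.sign(X @ w); err = np.mean(Ypred != Y)
```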

4. (Optional)

Compare the results of Parts 2 and 3 and evaluate the benefits of the two methods for dimensionality reduction and feature selection when N >> D, when N and D are comparable (N ≈ D), and when N << D.