This lab is about linear Regularized Least Squares. Follow the instructions below. Think hard before you call the instructors!
Download:
the zip file (unzip it in a local folder)
Set the Matlab path to include the local folder
We start again from data generation, using the function MixGauss:
1.A Generate a 2-class training set where the classes are centered at (-1,-1) and (1,1), each with variance 0.35. Generate 100 points per class: input Xtr, output Ytr.
Adjust the output labels so that they take values in {-1, 1}:
Ytr(Ytr==2) = -1;
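Putting it together, a sketch of the generation step (the exact signature of MixGauss is in its help; here we assume MixGauss(means, sigmas, n) takes one column of means per class, one sigma per class, and n points per class):

% Assumed signature -- check "help MixGauss"; also check whether the
% "sigmas" argument holds standard deviations or variances.
[Xtr, Ytr] = MixGauss([-1 1; -1 1], [0.35 0.35], 100);  % columns of means: (-1,-1) and (1,1)
Ytr(Ytr == 2) = -1;                                     % relabel class 2 as -1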
1.B Generate the corresponding test set with 300 points per class: input Xts, output Yts. Remember to assign {-1, 1} to the test labels, consistently with the training set.
1.C Add some noise to the previously generated data with the function flipLabels (type "help flipLabels" for guidance). You will obtain new training and test output vectors: Ytrn, Ytsn.
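For instance (assuming flipLabels(Y, p) returns a copy of Y with a fraction or percentage p of the labels flipped; the actual convention is in its help):

p = 0.05;                      % hypothetical noise level: flip 5% of the labels
Ytrn = flipLabels(Ytr, p);     % noisy training labels
Ytsn = flipLabels(Yts, p);     % noisy test labels, same noise level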
Plot the various datasets with the function scatter, e.g.:
figure;
hold on;
scatter(Xtr(Ytr==1,1), Xtr(Ytr==1,2), 25, 'r', '.');
scatter(Xtr(Ytr==-1,1), Xtr(Ytr==-1,2), 25, 'b', '.');
title('training set');
2.A Have a look at the code of functions regularizedLSTrain and regularizedLSTest, and complete them.
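If you get stuck, a minimal sketch of the two functions is below; it implements the standard RLS solution w = (X'X + lambda*n*I)^(-1) X'Y, but the provided skeletons may organize the code differently:

function w = regularizedLSTrain(Xtr, Ytr, lambda)
% Regularized least squares: solves (Xtr'*Xtr + lambda*n*I) w = Xtr'*Ytr
    [n, d] = size(Xtr);
    w = (Xtr' * Xtr + lambda * n * eye(d)) \ (Xtr' * Ytr);
end

function Ypred = regularizedLSTest(w, Xts)
% Evaluates the linear model on the test inputs (real-valued scores)
    Ypred = Xts * w;
end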
2.B Try the functions on the previously generated 2-class data from section 1. Pick a "reasonable" lambda and store the predictions in the Ypred vector.
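A possible call sequence (the value of lambda is just a starting guess to experiment with):

lambda = 0.01;                               % a "reasonable" first guess
w = regularizedLSTrain(Xtr, Ytrn, lambda);   % train on the noisy labels
Ypred = regularizedLSTest(w, Xts);           % real-valued scores; sign(Ypred) is the predicted class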
2.C Think of how to plot the data to get a glimpse of the obtained results. A possible way is:
figure;
scatter(Xts(:,1), Xts(:,2), 25, Ytsn);
hold on;
sel = (sign(Ypred) ~= Ytsn);
scatter(Xts(sel,1), Xts(sel,2), 200, Ytsn(sel), 'x');
2.D To evaluate the classification performance, compare the estimated outputs with those previously generated.
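For example, a standard choice is the misclassification rate on the (noisy) test labels:

errTest = mean(sign(Ypred) ~= Ytsn);       % fraction of misclassified test points
fprintf('test error = %.3f\n', errTest);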
2.E To visualize the separating function (and thus get a more general view of which areas are associated with each class) you may use the routine separatingFRLS (type "help separatingFRLS" in the Matlab shell; if you still have doubts on how to use it, have a look at the code).
Superimpose on the separating function both the training data (Xtr, Ytrn) and the test data (Xts, Ytsn), on two separate plots, to analyze the generalization properties of your solution.
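A possible pattern is sketched below; the call to separatingFRLS is an assumption (we suppose it draws the separating function on the current figure, given the training data and lambda), so check its help for the actual arguments:

figure;
separatingFRLS(Xtr, Ytrn, lambda);            % assumed signature -- see "help separatingFRLS"
hold on;
scatter(Xtr(:,1), Xtr(:,2), 25, Ytrn);        % superimpose the training data
title('separating function + training set');

figure;
separatingFRLS(Xtr, Ytrn, lambda);
hold on;
scatter(Xts(:,1), Xts(:,2), 25, Ytsn);        % superimpose the test data
title('separating function + test set');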
3.A Repeat all the experiments in section 2 for different datasets (e.g. vary the training set size, the mean positions, the percentage of flipped labels, the shape of the classes).
3.B Think of how to generate a problem where the Gaussians live in a 200-dimensional space (clearly you will not be able to plot this dataset...).
3.C Consider a high-dimensional dataset. For example, produce two Gaussians with centers [1; zeros(199,1)] and [-1; zeros(199,1)], variances [0.5, 0.5], and 50 points per class. Check what happens as lambda varies. How would you evaluate the quality of the solution?
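A possible setup, reusing the MixGauss signature assumed in section 1 (the size of the test set and the grid of lambda values are arbitrary choices):

m1 = [1; zeros(199,1)];   m2 = [-1; zeros(199,1)];
[Xtr, Ytr] = MixGauss([m1 m2], [0.5 0.5], 50);    % 50 training points per class
Ytr(Ytr == 2) = -1;
[Xts, Yts] = MixGauss([m1 m2], [0.5 0.5], 200);   % a separate test set for evaluation
Yts(Yts == 2) = -1;
for lambda = [1e-4 1e-3 1e-2 1e-1 1 10]           % sweep over lambda
    w = regularizedLSTrain(Xtr, Ytr, lambda);
    errTest = mean(sign(regularizedLSTest(w, Xts)) ~= Yts);
    fprintf('lambda = %g, test error = %.3f\n', lambda, errTest);
end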
3.D Generate a training set and a test set using AnisotropicMixGauss(mean1, Sigma1, mean2, Sigma2) with means [-2;0] and [2;0], and Sigma1 = Sigma2 = [1, -1; -1, 0.1]. Check the impact of lambda.
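For instance (assuming AnisotropicMixGauss returns [X, Y] with labels {1, 2}, like MixGauss; check its help for the actual output):

Sigma = [1, -1; -1, 0.1];
[Xtr, Ytr] = AnisotropicMixGauss([-2;0], Sigma, [2;0], Sigma);  % assumed output -- see its help
Ytr(Ytr == 2) = -1;                                             % if labels come out as {1,2}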
3.E Modify the regularizedLSTrain and regularizedLSTest functions to incorporate an offset in the linear model. Compare the solutions with and without offset on a 2-class dataset where the classes are centered on (0,0) and (1,1) respectively, each with variance 0.35.
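One common way to add the offset (a sketch, not the only option) is to append a constant feature, so the model becomes X*w + b with b stored as the last weight; note that this also penalizes b, while an alternative is to center the data before training:

Xtr1 = [Xtr, ones(size(Xtr,1), 1)];           % constant feature acting as the offset
Xts1 = [Xts, ones(size(Xts,1), 1)];
w = regularizedLSTrain(Xtr1, Ytrn, lambda);   % w(end) plays the role of the offset b
Ypred = regularizedLSTest(w, Xts1);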
3.F Modify the regularizedLSTrain and regularizedLSTest functions to handle multiclass problems.
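A standard design (one possibility among many) is one-vs-all: assuming multiclass labels Ytr in {1, ..., T}, encode them as an n-by-T matrix of +/-1 and solve one RLS problem per column; with the sketch of regularizedLSTrain above this works as-is, since the backslash operator handles multiple right-hand sides:

T = numel(unique(Ytr));                   % number of classes
n = size(Xtr, 1);
Yoh = -ones(n, T);                        % one-vs-all encoding: +1 for own class, -1 otherwise
for t = 1:T
    Yoh(Ytr == t, t) = 1;
end
W = regularizedLSTrain(Xtr, Yoh, lambda); % one weight vector per class (column)
scores = regularizedLSTest(W, Xts);       % n_test-by-T score matrix
[~, Ypred] = max(scores, [], 2);          % predict the highest-scoring class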