## Machine Learning Day

### Lab 0: Data Generation

This first (optional) lab is focused on getting started with MATLAB/Octave and working with data for ML. The goal is to provide basic familiarity with MATLAB syntax, along with some preliminary data generation, processing and visualization.

### MATLAB/Octave resources

The labs are designed for MATLAB/Octave. Below you can find a number of resources to get you started.

• MATLAB getting started tutorial for an introduction to the environment, syntax and conventions.
• MATLAB has very thorough documentation, both online and built-in. In the command window, type `help functionName` for a short usage summary or `doc functionName` to open the full documentation page.
• Built-in tutorials: in the command window, enter `demo`.
• Comprehensive MATLAB reference and introduction (pdf).

### Getting Started

• Get the code file and add its directory to the MATLAB path (or set it as the current/working directory).
• Use the editor to write/save and run/debug longer scripts and functions.
• Use the command window to try/test commands, view variables and see the use of functions.
• Use `plot` (for 1D), `imshow`, `imagesc` (for 2D matrices), `scatter`, `scatter3` (for 2D and 3D point sets) to visualize variables of different types.
• Work your way through the examples below, by following the instructions.
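
The plotting commands listed above can be tried on synthetic data before touching the lab code. The following is only a minimal sketch: the variables `x`, `M` and `P` are placeholders invented for illustration and are not part of the lab files.

```matlab
% Synthetic data of three different shapes (all names are illustrative).
x = linspace(0, 2*pi, 100);    % 1 x 100 row vector
M = rand(20, 30);              % 20 x 30 matrix with entries in (0, 1)
P = randn(200, 3);             % 200 points in 3D, one point per row

figure; plot(x, sin(x));                      % 1D signal as a curve
figure; imagesc(M); colorbar;                 % 2D matrix as a color-coded image
figure; scatter(P(:,1), P(:,2));              % first two coordinates as a 2D point cloud
figure; scatter3(P(:,1), P(:,2), P(:,3));     % all three coordinates as a 3D point cloud
```

Each `figure` call opens a new window, so the four plots can be compared side by side.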

### 1. Optional - MATLAB Warm-up

1. Create a column vector `v = [1; 2; 3]` and a row vector `u = [1,2,3]`
• What happens with the command `v'`? What is the corresponding algebraic/matrix operation?
• Create `z = [5;4;3]` and try basic numerical operations of addition and subtraction with `v`.
• What happens with `u + z`?
2. Create the matrices `A = [1 2 3; 4 5 6; 7 8 9]` and `B = A'`
• What kind of matrix is `C = A + B`?
• Explore what happens with `A(:,1)`, `A(1,:)`, `A(2:3,:)` and `A(:)`.
3. Use the product operator `*`
• What happens with `2*u`, `u*2`, `2*v`?
• What happens with `u*v` and `v*u`, why? With `A*v`, `u*A` and `A*u`?
• Use `size` and/or `length` functions to find the dimensions of vectors and matrices.
4. Use the element-wise operators `.*` and `./`, e.g., `u.*z` and `z./u`
• What happens with `v.*z` and `v./z`?
• Why aren't `A*A` and `A.*A` the same?
5. Use the functions `zeros`, `ones`, `rand`, `randn`
• Create 3 x 5 matrices of all zeros, of all ones, of random numbers uniformly distributed between 2 and 3, and of random numbers drawn from a Gaussian with variance 2.
6. Use the functions `eye` and `diag`
• Create a 3 x 3 identity matrix and a matrix whose diagonal is the vector `v`.
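
For items 5 and 6, one possible approach is sketched below. It relies on `rand` sampling uniformly from (0, 1) and `randn` sampling from a zero-mean, unit-variance Gaussian, so a shift or a scaling gives the requested distributions. The zero mean of the Gaussian is an assumption, since the exercise only fixes the variance, and the variable names are illustrative.

```matlab
Z = zeros(3, 5);              % 3 x 5 matrix of zeros
O = ones(3, 5);               % 3 x 5 matrix of ones
U = 2 + rand(3, 5);           % uniform on (2, 3): shift the (0, 1) samples by 2
G = sqrt(2) * randn(3, 5);    % Gaussian with variance 2: scale by the standard deviation sqrt(2)

I3 = eye(3);                  % 3 x 3 identity matrix
v = [1; 2; 3];
Dv = diag(v);                 % 3 x 3 matrix with v on the diagonal, zeros elsewhere
```

Note that `diag` works in both directions: applied to a vector it builds a diagonal matrix, and applied to a matrix it extracts the diagonal as a column vector.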

### 2. Core - Data generation

The function `MixGauss(means, sigmas, n)` generates datasets where the distribution of each class is an isotropic Gaussian with a given mean and variance, according to the values in the matrices/vectors `means` and `sigmas`. Study the function code or type `help MixGauss` in the MATLAB command window. The function `scatter` can be used to plot points in 2D.

1. Generate and visualize a simple dataset:
`[X, C] = MixGauss([[0;0], [1;1]], [0.5, 0.25], 1000);`
`figure; scatter(X(:,1), X(:,2), 25, C);`
2. Generate more complex datasets:
• 4-class dataset: the classes must live in the 2D space and be centered on the corners of the unit square (0,0), (0,1), (1,1), (1,0), all with variance 0.2.
• 2-class dataset: manipulate the data to obtain a 2-class problem where data on opposite corners share the same class. Hint: if you generated the data following the suggested center order, you can use the function `mod` to quickly obtain two labels, e.g. `Y = mod(C, 2)`.
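
The two datasets above can be sketched without calling `MixGauss`, which makes the sampling step explicit. This is a stand-in written under stated assumptions, not the lab's implementation: class labels are assumed to run from 1 to 4 in the suggested corner order, `n = 100` points per class is an arbitrary choice, and the noise is scaled by `sqrt(variance)` so that each coordinate has variance 0.2. Check the `MixGauss` code to see how it interprets `sigmas` before comparing results.

```matlab
% Class centers as columns, in the suggested order: (0,0), (0,1), (1,1), (1,0).
means = [0 0 1 1;
         0 1 1 0];
variance = 0.2;               % same variance for every class
n = 100;                      % points per class (arbitrary choice)
d = size(means, 1);           % input dimension (2)
p = size(means, 2);           % number of classes (4)

% Labels 1..p, each repeated n times: [1;...;1;2;...;2;...].
C = kron((1:p)', ones(n, 1));

% Each row is its class center plus isotropic Gaussian noise.
X = means(:, C)' + sqrt(variance) * randn(n * p, d);

% Opposite corners share a label: classes 1 and 3 map to 1, classes 2 and 4 map to 0.
Y = mod(C, 2);

figure; scatter(X(:,1), X(:,2), 25, C);   % 4-class view
figure; scatter(X(:,1), X(:,2), 25, Y);   % 2-class view
```

The relabeling works only because of the center order: the odd-numbered classes sit on one diagonal of the square and the even-numbered ones on the other.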

### 3. Optional - Extra practice

1. Generate datasets with larger variances, higher-dimensional input spaces, etc.
2. Add noise to the data by flipping the labels of random points.
3. For a given dataset, compute the distances between all pairs of input points (vectorize your code and avoid using a `for` loop). How does the mean distance change with the number of dimensions?
4. Generate regression data: Consider a regression model defined by a linear function with coefficients `w` and Gaussian noise of level (SNR) `delta`.
• Create a MATLAB function that takes as input the number of points `n`, the number of dimensions `D`, the D-dimensional vector `w` and the scalar `delta`, and returns an (n x D) matrix `X` and an (n x 1) vector `Y`.
• Plot the underlying (linear) function and the noisy output on the same figure.
• Test/visualize the 1-D and 2-D cases, but make the function generic to account for higher dimensional data.
5. Generate regression data using a 1-D model with a non-linear function.
6. Generate a dataset (either for regression or for classification) where most of the input variables are "noise", i.e., they are unrelated to the output.
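
As a starting point for exercises 3 and 4, the sketch below shows one possible approach to each. Several choices are assumptions rather than part of the lab: `delta` is treated as the standard deviation of additive Gaussian noise, inputs are drawn uniformly from (-1, 1), and all variable names and sizes are illustrative. The regression part is written as a script, and its body can be moved into a function file with inputs `n`, `D`, `w`, `delta` and outputs `X`, `Y`.

```matlab
% --- Exercise 3 (sketch): all pairwise Euclidean distances, no for loop ---
nd = 200; Dd = 5;                          % illustrative sizes
Xd = randn(nd, Dd);                        % one point per row
sq = sum(Xd.^2, 2);                        % nd x 1 squared norms
% ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a'*b, evaluated for all pairs at once.
D2 = bsxfun(@plus, sq, sq') - 2 * (Xd * Xd');
D2 = max(D2, 0);                           % clip small negative values caused by rounding
Dist = sqrt(D2);                           % nd x nd distance matrix
offDiag = ~eye(nd);                        % exclude the zero self-distances
meanDist = mean(Dist(offDiag));

% --- Exercise 4 (sketch): linear regression data with Gaussian noise ---
n = 100; D = 1;
w = 2 * ones(D, 1);                        % hypothetical coefficient vector
delta = 0.1;                               % assumed: noise standard deviation
X = 2 * rand(n, D) - 1;                    % inputs uniform on (-1, 1)
Yclean = X * w;                            % underlying linear function
Y = Yclean + delta * randn(n, 1);          % noisy outputs

if D == 1
    [xs, idx] = sort(X(:, 1));             % sort so the line is drawn left to right
    figure; plot(xs, Yclean(idx), '-'); hold on;
    scatter(X(:, 1), Y, 15); hold off;
    legend('underlying function', 'noisy output');
end
```

Rerunning the first part with different values of `Dd` and comparing `meanDist` gives a direct way to explore the question in exercise 3. The data generation in the second part is valid for any `D`; only the plot is restricted to the 1-D case.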