## Machine Learning Day

### Lab 0: Data Generation

This first (optional) lab is focused on getting started with MATLAB/Octave and working with data for ML. The goal is to provide basic familiarity with MATLAB syntax, along with some preliminary data generation, processing and visualization.

### MATLAB/Octave resources

The labs are designed for MATLAB/Octave. Below you can find a number of resources to get you started.

- MATLAB getting started tutorial for an introduction to the environment, syntax and conventions.
- MATLAB has very thorough documentation, both online and built in. In the command window, type `help functionName` (check use) or `doc functionName` (pull up documentation).
- Built-in tutorials: in the command window, enter `demo`.
- Comprehensive MATLAB reference and introduction (pdf)

- MIT Open CourseWare: Introduction to MATLAB
- Stanford/Coursera Octave Tutorial (video)

- Writing Fast MATLAB Code (pdf): Profiling, JIT, vectorization, etc.
- Stack Overflow: MATLAB tutorial for programmers

### Getting Started

- Get the code file and add its directory to the MATLAB path (or set it as the current/working directory).
- Use the editor to write/save and run/debug longer scripts and functions.
- Use the command window to try/test commands, view variables and see the use of functions.
- Use `plot` (for 1D), `imshow`, `imagesc` (for 2D matrices), `scatter`, `scatter3` to visualize variables of different types.
- Work your way through the examples below, by following the instructions.
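
As a rough illustration of the visualization functions listed above, the following sketch plots three kinds of made-up data. The variables `x`, `y`, `M` and `P` are placeholders invented for this example and are not part of the lab code.

```matlab
% Illustrative data only (not from the lab files)
x = linspace(0, 2*pi, 100);   % 1D grid
y = sin(x);                   % 1D signal
M = rand(20, 30);             % 2D matrix
P = randn(200, 3);            % 200 points in 3D

figure; plot(x, y); title('plot: 1D signal');
figure; imagesc(M); colorbar; title('imagesc: 2D matrix');
% 4th argument = marker size, 5th = per-point color value
figure; scatter(P(:,1), P(:,2), 25, P(:,3)); title('scatter: 2D points');
figure; scatter3(P(:,1), P(:,2), P(:,3)); title('scatter3: 3D points');
```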

### 1. Optional - MATLAB Warm-up

- Create a column vector `v = [1; 2; 3]` and a row vector `u = [1,2,3]`
  - What happens with the command `v'`? What is the corresponding algebraic/matrix operation?
  - Create `z = [5;4;3]` and try basic numerical operations of addition and subtraction with `v`.
  - What happens with `u + z`?
- Create the matrices `A = [1 2 3; 4 5 6; 7 8 9]` and `B = A'`
  - What kind of matrix is `C = A + B`?
  - Explore what happens with `A(:,1)`, `A(1,:)`, `A(2:3,:)` and `A(:)`.
- Use the product operator `*`
  - What happens with `2*u`, `u*2`, `2*v`?
  - What happens with `u*v` and `v*u`, why? With `A*v`, `u*A` and `A*u`?
  - Use `size` and/or `length` functions to find the dimensions of vectors and matrices.
- Use the element-wise operators `.*` and `./`, e.g., `u.*z` and `z./u`
  - What happens with `v.*z` and `v./z`?
  - Why aren't `A*A` and `A.*A` the same?
- Use the functions `zeros`, `ones`, `rand`, `randn`
  - Create 3 x 5 matrices of all zeros, of all ones, of random numbers uniformly distributed between 2 and 3, and of random numbers drawn from a Gaussian with variance 2.

- Use the functions `eye` and `diag`
  - Create a 3 x 3 identity matrix and a matrix whose diagonal is the vector `v`.
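
The warm-up steps above can be sketched in one short script. This is only one possible way to work through them. The helper variable names (`vt`, `inner`, `outer`, `prodEw`, `ratioEw`, `U`, `G`, and so on) are made up for illustration, and the values in the comments follow from the vectors defined at the top.

```matlab
v = [1; 2; 3];               % 3x1 column vector
u = [1, 2, 3];               % 1x3 row vector
z = [5; 4; 3];

vt = v';                     % transpose: 1x3 row vector
s  = v + z;                  % [6; 6; 6]
% u + z mixes a row and a column: recent MATLAB/Octave versions expand it
% to a 3x3 matrix, older MATLAB versions report a dimension error.

A = [1 2 3; 4 5 6; 7 8 9];
B = A';
C = A + B;                   % A plus its transpose is symmetric
isSym = isequal(C, C');

inner = u * v;               % (1x3)*(3x1) -> scalar 14
outer = v * u;               % (3x1)*(1x3) -> 3x3 matrix
Av = A * v;                  % [14; 32; 50]
uA = u * A;                  % [30 36 42]
% A * u fails: inner dimensions (3 and 1) do not agree.

prodEw  = v .* z;            % element-wise product: [5; 8; 9]
ratioEw = v ./ z;            % element-wise ratio:   [0.2; 0.5; 1]

Z = zeros(3, 5);
O = ones(3, 5);
U = 2 + rand(3, 5);          % uniform between 2 and 3
G = sqrt(2) * randn(3, 5);   % zero-mean Gaussian with variance 2
I3 = eye(3);                 % 3x3 identity
Dv = diag(v);                % diagonal matrix with v on the diagonal
```

Scaling `randn` by `sqrt(2)` gives variance 2 because variance scales with the square of the multiplier.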

### 2. Core - Data generation

The function `MixGauss(means, sigmas, n)` generates datasets where the distribution of each class is an isotropic Gaussian with a given mean and variance, according to the values in matrices/vectors `means` and `sigmas`. Study the function code or type `help MixGauss` in the MATLAB command window. The function `scatter` can be used to plot points in 2D.
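
To make the description above concrete, here is a minimal stand-in for this kind of generator, written as a plain script. It is not the code of `MixGauss` itself. It assumes that `means` holds one column per class, that each entry of `sigmas` is a standard deviation (take a square root first if the real function expects variances), that `n` is the number of points per class, and that labels run from 1 to the number of classes. Check `help MixGauss` for the actual conventions.

```matlab
% Hypothetical stand-in for MixGauss; the real function may differ.
means  = [[0;0], [1;1]];     % d x p matrix, one column per class mean
sigmas = [0.5, 0.25];        % assumed: one standard deviation per class
n      = 1000;               % assumed: number of points per class

[d, p] = size(means);
X = zeros(n*p, d);           % one row per point
C = zeros(n*p, 1);           % class label of each point
for k = 1:p
    rows = (k-1)*n + (1:n);
    % Isotropic Gaussian: same spread in every direction around the mean
    X(rows, :) = repmat(means(:,k)', n, 1) + sigmas(k) * randn(n, d);
    C(rows) = k;             % assumed label convention: 1..p
end

figure; scatter(X(:,1), X(:,2), 25, C);   % color points by class
```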

- Generate and visualize a simple dataset:

`[X, C] = MixGauss([[0;0], [1;1]], [0.5, 0.25], 1000);`

`figure; scatter(X(:,1), X(:,2), 25, C);`

- Generate more complex datasets:
  - 4-class dataset: the classes must live in the 2D space and be centered on the corners of the unit square (0,0), (0,1), (1,1), (1,0), all with variance 0.2.
  - 2-class dataset: manipulate the data to obtain a 2-class problem where data on opposite corners share the same class. **Hint**: if you generated the data following the suggested center order, you can use the function `mod` to quickly obtain two labels, e.g., `Y = mod(C, 2)`.
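
One possible sketch of the two datasets above follows. So that it runs on its own, it draws the points directly with `randn` instead of calling `MixGauss`. The per-class count `n` is an arbitrary choice, and the labels are assumed to be 1 to 4 in the suggested corner order.

```matlab
n = 250;                            % points per class (illustrative choice)
means = [0 0 1 1;                   % x-coordinates of the corners
         0 1 1 0];                  % y-coordinates, order (0,0),(0,1),(1,1),(1,0)
sigma = sqrt(0.2);                  % variance 0.2 -> standard deviation

C = kron((1:4)', ones(n, 1));       % labels 1..4, n copies of each
X = means(:, C)' + sigma * randn(4*n, 2);   % each point = its corner + noise

Y = mod(C, 2);                      % classes 1,3 -> 1 and classes 2,4 -> 0
figure; scatter(X(:,1), X(:,2), 25, Y);
```

With this ordering, corners (0,0) and (1,1) share one label and corners (0,1) and (1,0) share the other, which gives an XOR-like problem that no single straight line separates.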

### 3. Optional - Extra practice

- Generate datasets with larger variances, higher-dimensional input spaces, etc.
- Add noise to the data by flipping the labels of random points.
- For a dataset compute the distances among all input points (use vectorization in your code, avoid using a `for` loop). How does the mean distance change with the number of dimensions?
- Generate regression data: Consider a regression model defined by a linear function with coefficients `w` and Gaussian noise of level (SNR) `delta`.
  - Create a MATLAB function with input the number of points `n`, the number of dimensions `D`, the D-dimensional vector `w` and the scalar `delta` and output an (n x D) matrix `X` and an (n x 1) vector `Y`.
  - Plot the underlying (linear) function and the noisy output on the same figure.
  - Test/visualize the 1-D and 2-D cases, but make the function generic to account for higher dimensional data.
- Generate regression data using a 1-D model with a non-linear function.
- Generate a dataset (either for regression or for classification) where most of the input variables are "noise", i.e., they are unrelated to the output.
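
A few of the exercises above can be sketched as follows, under stated assumptions. The labels are made-up ±1 values, 10% of them are flipped, the inputs are drawn from standard distributions chosen for illustration, and `delta` is interpreted as the standard deviation of additive Gaussian noise, which may not match the intended definition of the noise level. The regression part is written as a script; wrapping it in a function with inputs `n`, `D`, `w`, `delta` is left as in the exercise.

```matlab
n = 200; D = 5;
X = randn(n, D);                          % illustrative inputs

% Label noise: flip the labels of a random 10% of the points
labels = [ones(n/2, 1); -ones(n/2, 1)];   % made-up +1/-1 labels
idx = randperm(n);
idx = idx(1:round(0.1 * n));              % indices of points to flip
noisy = labels;
noisy(idx) = -noisy(idx);

% Pairwise Euclidean distances without a for loop, using
% ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a'*b
sq = sum(X.^2, 2);                        % squared norm of each row
D2 = bsxfun(@plus, sq, sq') - 2 * (X * X');
Dist = sqrt(max(D2, 0));                  % clip tiny negatives from rounding
meanDist = sum(Dist(:)) / (n * (n - 1));  % mean over distinct pairs

% Linear regression data: Y = X*w + noise
w = (1:D)';                               % illustrative coefficients
delta = 0.1;                              % assumed: noise standard deviation
Xr = rand(n, D);                          % assumed: inputs uniform in [0,1]
Yclean = Xr * w;                          % underlying linear function
Yr = Yclean + delta * randn(n, 1);        % noisy outputs
```

Repeating the distance computation for several values of `D` and comparing `meanDist` is one way to approach the question about dimensionality.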