Topics covered in Supervised Machine Learning

It is usually a good idea to perform feature scaling to help your model converge faster. This is especially true if your input features have widely different ranges of values. For that, you will use the StandardScaler class from scikit-learn. This computes the z-score of your inputs. As a refresher, the z-score is given by the equation:

\[z = \frac{x - \mu}{\sigma}\]

where $\mu$ is the mean of the feature values and $\sigma$ is the standard deviation.
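
To make the formula concrete, here is a minimal sketch (with hypothetical values) showing that StandardScaler produces the same result as computing the z-score by hand:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature data
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# z-score by hand: subtract the mean, divide by the (population) standard deviation
z_manual = (x - x.mean()) / x.std()

# StandardScaler computes the same statistics internally
z_scaler = StandardScaler().fit_transform(x)

print(np.allclose(z_manual, z_scaler))  # True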

Initialize the class

scaler_linear = StandardScaler()

Compute the mean and standard deviation of the training set then transform it

X_train_scaled = scaler_linear.fit_transform(x_train)

print(f"Computed mean of the training set: {scaler_linear.mean_.squeeze():.2f}")
print(f"Computed standard deviation of the training set: {scaler_linear.scale_.squeeze():.2f}")

Train the model

Next, you will create and train a regression model. For this lab, you will use the LinearRegression class but take note that there are other linear regressors which you can also use.
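
As an aside (a minimal sketch, not part of this lab's flow), other scikit-learn linear regressors such as Ridge, Lasso, and SGDRegressor expose the same fit()/predict() interface, so swapping one in is a small change:

from sklearn.linear_model import Ridge

# Ridge is ordinary least squares plus L2 regularization;
# alpha=1.0 is just an illustrative choice
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train_scaled, y_train)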

Initialize the class

linear_model = LinearRegression()

Train the model

linear_model.fit(X_train_scaled, y_train)

Evaluate the Model

To evaluate the performance of your model, you will measure the error for the training and cross validation sets. For the training error, recall the equation for calculating the mean squared error (MSE):

\[J_{train}(\vec{w}, b) = \frac{1}{2m_{train}}\left[\sum_{i=1}^{m_{train}}(f_{\vec{w},b}(\vec{x}_{train}^{(i)}) - y_{train}^{(i)})^2\right]\]

Scikit-learn also has a built-in mean_squared_error() function that you can use. Take note though that as per the documentation, scikit-learn’s implementation only divides by m and not 2*m, where m is the number of examples.
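
To see that dividing by 2 reconciles the two, here is a self-contained sketch with hypothetical values:

import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical targets and predictions
y_true = np.array([2.0, 4.0, 6.0])
y_pred = np.array([2.5, 3.5, 6.5])

# J_train as defined above: sum of squared errors divided by 2*m
m = len(y_true)
mse_manual = np.sum((y_pred - y_true) ** 2) / (2 * m)

# scikit-learn divides by m only, so halve its result to match
mse_sklearn = mean_squared_error(y_true, y_pred) / 2

print(np.isclose(mse_manual, mse_sklearn))  # True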

Feed the scaled training set and get the predictions

yhat = linear_model.predict(X_train_scaled)

Use scikit-learn’s utility function and divide by 2

print(f"training MSE (using sklearn function): {mean_squared_error(y_train, yhat) / 2}")

You can then compute the MSE for the cross validation set with essentially the same equation:
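
\[J_{cv}(\vec{w}, b) = \frac{1}{2m_{cv}}\left[\sum_{i=1}^{m_{cv}}(f_{\vec{w},b}(\vec{x}_{cv}^{(i)}) - y_{cv}^{(i)})^2\right]\]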

As with the training set, you will also scale the cross validation set. An important detail: you must use the mean and standard deviation of the training set when scaling it, so that the inputs are transformed the way the model expects. To gain intuition, consider this scenario (a numeric sketch follows the list):

  • Say that your training set has an input feature equal to 500, which is scaled down to 0.5 using the z-score.
  • After training, your model is able to accurately map this scaled input x=0.5 to the target output y=300.
  • Now let’s say that you deployed this model and one of your users fed it a sample equal to 500.
  • If you compute this input sample’s z-score using any other values of the mean and standard deviation, it might not be scaled to 0.5, and your model will most likely make a wrong prediction (i.e. not equal to y=300).
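
Here is a minimal numeric sketch of that scenario (the training values are hypothetical, chosen so that 500 scales to exactly 0.5):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training feature with mean 400 and standard deviation 200
x_train_demo = np.array([[200.0], [600.0]])
scaler_demo = StandardScaler().fit(x_train_demo)

# A deployed sample of 500 scaled with the training statistics: (500 - 400) / 200
print(scaler_demo.transform([[500.0]]))  # [[0.5]] -- what the model expects

# Re-fitting on the lone sample yields different statistics and a different value
print(StandardScaler().fit_transform([[500.0]]))  # [[0.]] -- leads to a wrong prediction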

You will scale the cross validation set below using the same StandardScaler you fitted earlier, calling only its transform() method instead of fit_transform().

Scale the cross validation set using the mean and standard deviation of the training set

X_cv_scaled = scaler_linear.transform(x_cv)

print(f"Mean used to scale the CV set: {scaler_linear.mean_.squeeze():.2f}")
print(f"Standard deviation used to scale the CV set: {scaler_linear.scale_.squeeze():.2f}")

Feed the scaled cross validation set

yhat = linear_model.predict(X_cv_scaled)

Use scikit-learn’s utility function and divide by 2

print(f"Cross validation MSE: {mean_squared_error(y_cv, yhat) / 2}")

Adding Polynomial Features

Create the additional features

First, you will generate the polynomial features from your training set. The code below demonstrates how to do this using the PolynomialFeatures class. It will create a new input feature which has the squared values of the input x (i.e. degree=2).
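
As a quick self-contained illustration (toy values, single input feature), degree=2 with include_bias=False maps each x to the pair [x, x^2]:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x_demo = np.array([[2.0], [3.0]])  # hypothetical inputs
poly_demo = PolynomialFeatures(degree=2, include_bias=False)
print(poly_demo.fit_transform(x_demo))
# [[2. 4.]
#  [3. 9.]]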

Instantiate the class to make polynomial features

poly = PolynomialFeatures(degree=2, include_bias=False)

Compute the number of features and transform the training set

X_train_mapped = poly.fit_transform(x_train)

Preview the first 5 elements of the new training set. Left column is x and right column is x^2

Note: The e+<number> in the output denotes how many places the decimal point should be moved. For example, 3.24e+03 is equal to 3240.

print(X_train_mapped[:5])

You will then scale the inputs as before to narrow down the range of values.

Instantiate the class

scaler_poly = StandardScaler()

Compute the mean and standard deviation of the training set then transform it

X_train_mapped_scaled = scaler_poly.fit_transform(X_train_mapped)

Preview the first 5 elements of the scaled training set.

print(X_train_mapped_scaled[:5])

You can then proceed to train the model. After that, you will measure the model’s performance against the cross validation set. As before, make sure to perform the same transformations you did on the training set: add the same number of polynomial features, then scale the range of values.
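
As an aside, scikit-learn's Pipeline can bundle the mapping, scaling, and model into one object so that the identical transformations are applied to any set you predict on. A minimal sketch, not used in the rest of this lab:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression

# Each step is fitted on the training set only; predict() then applies
# the stored transformations automatically
pipeline = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scaler", StandardScaler()),
    ("model", LinearRegression()),
])
pipeline.fit(x_train, y_train)
yhat_cv = pipeline.predict(x_cv)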

Initialize the class

model = LinearRegression()

Train the model

model.fit(X_train_mapped_scaled, y_train)

Compute the training MSE

yhat = model.predict(X_train_mapped_scaled)
print(f"Training MSE: {mean_squared_error(y_train, yhat) / 2}")

Add the polynomial features to the cross validation set

X_cv_mapped = poly.transform(x_cv)

Scale the cross validation set using the mean and standard deviation of the training set

X_cv_mapped_scaled = scaler_poly.transform(X_cv_mapped)

Compute the cross validation MSE

yhat = model.predict(X_cv_mapped_scaled)
print(f"Cross validation MSE: {mean_squared_error(y_cv, yhat) / 2}")

You can create a loop that contains all the steps in the previous code cells. Here is one implementation that adds polynomial features up to degree=10.

Initialize lists to store the training MSEs, cross validation MSEs, models, and scalers

train_mses = []
cv_mses = []
models = []
scalers = []

Loop 10 times, each iteration adding one more polynomial degree than the last.

for degree in range(1, 11):

    # Add polynomial features to the training set
    poly = PolynomialFeatures(degree, include_bias=False)
    X_train_mapped = poly.fit_transform(x_train)

    # Scale the training set
    scaler_poly = StandardScaler()
    X_train_mapped_scaled = scaler_poly.fit_transform(X_train_mapped)
    scalers.append(scaler_poly)

    # Create and train the model
    model = LinearRegression()
    model.fit(X_train_mapped_scaled, y_train)
    models.append(model)

    # Compute the training MSE
    yhat = model.predict(X_train_mapped_scaled)
    train_mse = mean_squared_error(y_train, yhat) / 2
    train_mses.append(train_mse)

    # Add polynomial features and scale the cross validation set,
    # reusing the poly and scaler fitted on the training set
    X_cv_mapped = poly.transform(x_cv)
    X_cv_mapped_scaled = scaler_poly.transform(X_cv_mapped)

    # Compute the cross validation MSE
    yhat = model.predict(X_cv_mapped_scaled)
    cv_mse = mean_squared_error(y_cv, yhat) / 2
    cv_mses.append(cv_mse)

Plot the results

degrees = range(1, 11)
utils.plot_train_cv_mses(degrees, train_mses, cv_mses, title="degree of polynomial vs. train and CV MSEs")

Get the model with the lowest CV MSE (add 1 because list indices start at 0)

This also corresponds to the degree of the polynomial added

degree = np.argmin(cv_mses) + 1
print(f"Lowest CV MSE is found in the model with degree={degree}")
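
Since the loop saved each degree's model and scaler, the winning combination can be pulled out by index. A sketch, assuming a hypothetical array x_new of raw inputs:

# The lists were appended in degree order, so index = degree - 1
best_model = models[degree - 1]
best_scaler = scalers[degree - 1]

# New raw inputs must get the same mapping and scaling before predicting
poly = PolynomialFeatures(degree, include_bias=False)
x_new_mapped = poly.fit_transform(x_new)  # x_new is hypothetical
x_new_scaled = best_scaler.transform(x_new_mapped)
yhat_new = best_model.predict(x_new_scaled)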