Supervised Machine Learning
It is usually a good idea to perform feature scaling to help your model converge faster. This is especially true if your input features have widely different ranges of values. For that, you will use the StandardScaler class from scikit-learn. It computes the z-score of your inputs. As a refresher, the z-score is given by the equation:

\[z = \frac{x - \mu}{\sigma}\]

where $\mu$ is the mean of the feature values and $\sigma$ is the standard deviation.
from sklearn.preprocessing import StandardScaler

# Initialize the class
scaler_linear = StandardScaler()

# Compute the mean and standard deviation of the training set then transform it
X_train_scaled = scaler_linear.fit_transform(x_train)

print(f"Computed mean of the training set: {scaler_linear.mean_.squeeze():.2f}")
print(f"Computed standard deviation of the training set: {scaler_linear.scale_.squeeze():.2f}")
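If you want to sanity-check what fit_transform() did, the minimal sketch below reproduces the z-score by hand with NumPy (it assumes x_train is a NumPy array; StandardScaler uses the population standard deviation, which is also NumPy's default):

import numpy as np

# z-scores computed manually from the training data; should match X_train_scaled
z_manual = (x_train - x_train.mean(axis=0)) / x_train.std(axis=0)
print(np.allclose(z_manual, X_train_scaled))  # expected: True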
Train the model
Next, you will create and train a regression model. For this lab, you will use the LinearRegression class but take note that there are other linear regressors which you can also use.
from sklearn.linear_model import LinearRegression

# Initialize the class
linear_model = LinearRegression()

# Train the model
linear_model.fit(X_train_scaled, y_train)
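As an example of swapping in one of those other linear regressors, the sketch below uses scikit-learn's Ridge class, which adds L2 regularization (the alpha value here is just an illustration, not a tuned choice):

from sklearn.linear_model import Ridge

# Ridge is a linear regressor with L2 regularization; alpha controls its strength
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train_scaled, y_train)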
Evaluate the Model
To evaluate the performance of your model, you will measure the error for the training and cross validation sets. For the training error, recall the equation for calculating the mean squared error (MSE):
\[J_{train}(\vec{w}, b) = \frac{1}{2m_{train}}\left[\sum_{i=1}^{m_{train}}\left(f_{\vec{w},b}(\vec{x}_{train}^{(i)}) - y_{train}^{(i)}\right)^2\right]\]

Scikit-learn also has a built-in mean_squared_error() function that you can use. Take note though that, as per the documentation, scikit-learn's implementation only divides by m and not 2*m, where m is the number of examples.
from sklearn.metrics import mean_squared_error

# Feed the scaled training set and get the predictions
yhat = linear_model.predict(X_train_scaled)

# Use scikit-learn's utility function and divide by 2
print(f"training MSE (using sklearn function): {mean_squared_error(y_train, yhat) / 2}")
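To convince yourself that this matches the equation for $J_{train}$ above, a quick manual computation with NumPy (assuming y_train and yhat are 1-D arrays of the same shape) should print the same value:

import numpy as np

# sum of squared errors divided by 2*m, exactly as in the equation above
m = len(y_train)
mse_manual = np.sum((yhat - y_train) ** 2) / (2 * m)
print(f"training MSE (computed manually): {mse_manual}")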
You can then compute the MSE for the cross validation set with basically the same equation. One important note: when using the z-score, you have to scale the cross validation set with the mean and standard deviation of the training set, not statistics computed from the cross validation set itself. One way to gain intuition is with these sample values:
- Say that your training set has an input feature equal to 500 which is scaled down to 0.5 using the z-score.
- After training, your model is able to accurately map this scaled input x=0.5 to the target output y=300.
- Now let's say that you deployed this model and one of your users fed it a sample equal to 500.
- If you get this input sample's z-score using any other values of the mean and standard deviation, then it might not be scaled to 0.5 and your model will most likely make a wrong prediction (i.e. not equal to y=300). A quick numeric sketch of this follows below.
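To make the pitfall concrete, here is a small sketch with made-up numbers: mu and sigma are hypothetical training statistics chosen so that 500 maps to 0.5, while mu_other and sigma_other stand in for statistics fitted on some other data.

# hypothetical training statistics: (500 - 200) / 600 = 0.5
mu, sigma = 200, 600
print((500 - mu) / sigma)              # 0.5 -- the value the model was trained on

# statistics fitted on different data give a different z-score
mu_other, sigma_other = 100, 200
print((500 - mu_other) / sigma_other)  # 2.0 -- the model never saw inputs scaled like this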
You will scale the cross validation set below by using the same StandardScaler you used earlier, but only calling its transform() method instead of fit_transform().
# Scale the cross validation set using the mean and standard deviation of the training set
X_cv_scaled = scaler_linear.transform(x_cv)

print(f"Mean used to scale the CV set: {scaler_linear.mean_.squeeze():.2f}")
print(f"Standard deviation used to scale the CV set: {scaler_linear.scale_.squeeze():.2f}")
# Feed the scaled cross validation set
yhat = linear_model.predict(X_cv_scaled)

# Use scikit-learn's utility function and divide by 2
print(f"Cross validation MSE: {mean_squared_error(y_cv, yhat) / 2}")
Adding Polynomial Features
Create the additional features
First, you will generate the polynomial features from your training set. The code below demonstrates how to do this using the PolynomialFeatures class. It will create a new input feature which has the squared values of the input x (i.e. degree=2).
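If you want to see the mapping on its own first, here is a small sketch on a toy array (the values are made up for illustration); with degree=2 and include_bias=False, each input x is mapped to the pair [x, x^2]:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# toy inputs: each row is one example with a single feature
x_toy = np.array([[2], [3], [4]])
print(PolynomialFeatures(degree=2, include_bias=False).fit_transform(x_toy))
# [[ 2.  4.]
#  [ 3.  9.]
#  [ 4. 16.]]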
from sklearn.preprocessing import PolynomialFeatures

# Instantiate the class to make polynomial features
poly = PolynomialFeatures(degree=2, include_bias=False)

# Compute the number of features and transform the training set
X_train_mapped = poly.fit_transform(x_train)
# Preview the first 5 elements of the new training set. Left column is x and right column is x^2
# Note: The e+<number> in the output denotes how many places the decimal point should be moved.
# For example, 3.24e+03 is equal to 3240
print(X_train_mapped[:5])

You will then scale the inputs as before to narrow down the range of values.
# Instantiate the class
scaler_poly = StandardScaler()

# Compute the mean and standard deviation of the training set then transform it
X_train_mapped_scaled = scaler_poly.fit_transform(X_train_mapped)

# Preview the first 5 elements of the scaled training set
print(X_train_mapped_scaled[:5])

You can then proceed to train the model. After that, you will measure the model's performance against the cross validation set. Like before, you should make sure to perform the same transformations as you did on the training set: add the same number of polynomial features, then scale the range of values.
# Initialize the class
model = LinearRegression()

# Train the model
model.fit(X_train_mapped_scaled, y_train)

# Compute the training MSE
yhat = model.predict(X_train_mapped_scaled)
print(f"Training MSE: {mean_squared_error(y_train, yhat) / 2}")
# Add the polynomial features to the cross validation set
X_cv_mapped = poly.transform(x_cv)

# Scale the cross validation set using the mean and standard deviation of the training set
X_cv_mapped_scaled = scaler_poly.transform(X_cv_mapped)

# Compute the cross validation MSE
yhat = model.predict(X_cv_mapped_scaled)
print(f"Cross validation MSE: {mean_squared_error(y_cv, yhat) / 2}")
You can create a loop that contains all the steps in the previous code cells. Here is one implementation that adds polynomial features up to degree=10.
# Initialize lists that will contain the errors, models, and scalers
train_mses = []
cv_mses = []
models = []
scalers = []
# Loop 10 times, each iteration adding one more polynomial degree than the last
for degree in range(1, 11):

    # Add polynomial features to the training set
    poly = PolynomialFeatures(degree, include_bias=False)
    X_train_mapped = poly.fit_transform(x_train)

    # Scale the training set
    scaler_poly = StandardScaler()
    X_train_mapped_scaled = scaler_poly.fit_transform(X_train_mapped)
    scalers.append(scaler_poly)

    # Create and train the model
    model = LinearRegression()
    model.fit(X_train_mapped_scaled, y_train)
    models.append(model)

    # Compute the training MSE
    yhat = model.predict(X_train_mapped_scaled)
    train_mse = mean_squared_error(y_train, yhat) / 2
    train_mses.append(train_mse)

    # Add polynomial features and scale the cross validation set,
    # reusing the transformations fitted on the training set
    X_cv_mapped = poly.transform(x_cv)
    X_cv_mapped_scaled = scaler_poly.transform(X_cv_mapped)

    # Compute the cross validation MSE
    yhat = model.predict(X_cv_mapped_scaled)
    cv_mse = mean_squared_error(y_cv, yhat) / 2
    cv_mses.append(cv_mse)
# Plot the results
degrees = range(1, 11)
utils.plot_train_cv_mses(degrees, train_mses, cv_mses, title="degree of polynomial vs. train and CV MSEs")
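Note that utils here is a helper module that ships with the lab. If you do not have it, a minimal matplotlib stand-in (an assumption, not the lab's actual implementation) could look like this:

import matplotlib.pyplot as plt

# plot training and cross validation MSEs against the polynomial degree
plt.plot(degrees, train_mses, marker='o', label='training MSEs')
plt.plot(degrees, cv_mses, marker='o', label='CV MSEs')
plt.xlabel('degree of polynomial')
plt.ylabel('MSE')
plt.title('degree of polynomial vs. train and CV MSEs')
plt.legend()
plt.show()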
import numpy as np

# Get the model with the lowest CV MSE (add 1 because list indices start at 0)
# This also corresponds to the degree of the polynomial added
degree = np.argmin(cv_mses) + 1
print(f"Lowest CV MSE is found in the model with degree={degree}")
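Once you have the best degree, remember that any new input has to go through the same transformations before the model can use it. The sketch below assumes x_new is a hypothetical array of raw inputs shaped like x_train (it is not one of the lab's variables); refitting PolynomialFeatures is safe here because the mapping depends only on the degree.

# x_new is hypothetical raw input data, shaped like x_train
poly = PolynomialFeatures(degree, include_bias=False)
x_new_mapped = poly.fit_transform(x_new)

# reuse the scaler and model that were fitted for the winning degree
x_new_scaled = scalers[degree - 1].transform(x_new_mapped)
yhat_new = models[degree - 1].predict(x_new_scaled)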