Evaluation Technique
If J_train is not too high, the algorithm does not have a high bias problem; and if J_cv is not much worse than J_train, it does not have a high variance problem either.
If your learning algorithm has high bias, the key indicator is that J_train is high.
The key indicator for high variance is that J_cv is much greater than J_train.
If you're training a neural network, there are some applications where, unfortunately, you have high bias and high variance at the same time.
One way to recognize that situation is if J_train is high, so you're not doing well on the training set, and on top of that the cross-validation error is much larger still than the training error. Intuitively, the model overfits part of the input space while underfitting another part: it neither fits the training data well nor generalizes well.
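Below is a minimal sketch of computing J_train and J_cv for a regression model. It assumes you already have NumPy arrays x (features, shape (m, n)) and y (targets); the split fractions, polynomial degree, and the factor of 1/2 in the cost (matching the 1/(2m) convention) are illustrative choices, not anything prescribed here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline

# Split the data: 60% training, then split the remaining 40% evenly
# into cross-validation and test sets.
x_train, x_temp, y_train, y_temp = train_test_split(x, y, test_size=0.40, random_state=1)
x_cv, x_test, y_cv, y_test = train_test_split(x_temp, y_temp, test_size=0.50, random_state=1)

# Fit a degree-4 polynomial regression model on the training set only.
model = make_pipeline(PolynomialFeatures(degree=4), StandardScaler(), LinearRegression())
model.fit(x_train, y_train)

# J_train: error on the data the model was fit to.
# J_cv: error on data the model has never seen.
# A high J_train suggests high bias; J_cv much larger than J_train suggests high variance.
j_train = mean_squared_error(y_train, model.predict(x_train)) / 2
j_cv = mean_squared_error(y_cv, model.predict(x_cv)) / 2
print(f"J_train = {j_train:.3f}, J_cv = {j_cv:.3f}")
```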
The value of lambda, the regularization parameter, controls how much you trade off keeping the parameters w small versus fitting the training data well.
If lambda were very large, the algorithm would be highly motivated to keep the parameters w very small, so w_1, w_2, and indeed all of the parameters end up very close to zero. The model then behaves almost like a constant function f(x) ≈ b, which underfits the data (high bias), much like fitting a degree-zero polynomial (d = 0).
Setting lambda equal to zero means no regularization at all, so you end up with the wiggly curve that overfits the data (high variance).
By trying out a large range of possible values for lambda, fitting the parameters with each of those regularization values, and then evaluating performance on the cross-validation set, you can pick the best value of the regularization parameter.
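A sketch of this procedure, reusing the x_train/y_train and x_cv/y_cv splits from the earlier snippet; the candidate grid of lambda values is just an example, and Ridge regression is used here as one way to implement L2-regularized linear regression (its alpha parameter plays the role of lambda):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

lambdas = [0.001, 0.01, 0.1, 1, 10, 100]
cv_errors = []
for lam in lambdas:
    # Fit the regularized model on the training set with this lambda.
    model = make_pipeline(PolynomialFeatures(degree=4), StandardScaler(), Ridge(alpha=lam))
    model.fit(x_train, y_train)
    # Evaluate on the cross-validation set.
    cv_errors.append(mean_squared_error(y_cv, model.predict(x_cv)) / 2)

# Pick the lambda with the lowest cross-validation error.
best_lambda = lambdas[int(np.argmin(cv_errors))]
print(f"best lambda by J_cv: {best_lambda}")
```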
In order to judge whether the training error is high, it turns out to be more useful to compare it against a baseline level of performance, for example:
- human-level performance, since humans are very good at understanding speech data, processing images, or understanding text;
- a competing algorithm, perhaps a previous implementation someone else has built, or even a competitor's algorithm;
- a guess based on prior experience.
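A small sketch of using such a baseline to separate the two gaps. The numbers here are made-up placeholders; in practice you would measure the baseline error (e.g., human-level error), J_train, and J_cv on your own task.

```python
baseline_error = 0.106   # e.g., measured human-level error (placeholder value)
j_train = 0.108          # training error (placeholder value)
j_cv = 0.148             # cross-validation error (placeholder value)

bias_gap = j_train - baseline_error   # large gap -> high bias
variance_gap = j_cv - j_train         # large gap -> high variance

print(f"gap baseline -> train (bias indicator):   {bias_gap:.3f}")
print(f"gap train -> cv (variance indicator):     {variance_gap:.3f}")
```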
I know that we're used to thinking that having more data is good, but if your algorithm has high bias, then throwing more training data at it will not, by itself, bring the error rate down very much. More data alone does not fix a high bias problem.
If a learning algorithm suffers from high variance, then getting more training data is indeed likely to help: extrapolating the learning curve to the right, you can expect J_cv to keep coming down. More data can fix a high variance problem.
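A sketch of plotting such a learning curve: train on increasingly large subsets of the training data and watch how J_train and J_cv evolve. It reuses the splits and the regularized pipeline from the earlier snippets; the subset sizes and lambda value are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

sizes = np.linspace(20, len(x_train), 10, dtype=int)
train_errs, cv_errs = [], []
for m in sizes:
    # Refit the model using only the first m training examples.
    model = make_pipeline(PolynomialFeatures(degree=4), StandardScaler(), Ridge(alpha=1.0))
    model.fit(x_train[:m], y_train[:m])
    train_errs.append(mean_squared_error(y_train[:m], model.predict(x_train[:m])) / 2)
    cv_errs.append(mean_squared_error(y_cv, model.predict(x_cv)) / 2)

plt.plot(sizes, train_errs, label="J_train")
plt.plot(sizes, cv_errs, label="J_cv")
plt.xlabel("training set size m")
plt.ylabel("error")
plt.legend()
plt.show()
```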
Bias-Variance Tradeoff
Large neural networks, when trained on small or moderate sized datasets, are low-bias machines.
A large neural network with well-chosen regularization will usually do as well as or better than a smaller one: so long as the regularization is chosen appropriately, making the network larger rarely hurts. TensorFlow lets you choose different values of lambda for different layers, although for simplicity you can use the same value of lambda for all the weights in all of the layers, as in the sketch below. This lets you implement regularization in your neural network.
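A minimal sketch of L2-regularizing every layer of a Keras network with the same lambda; the layer sizes, lambda value, and binary-classification setup are illustrative choices, and the commented-out fit call assumes X_train and y_train exist.

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import L2

lambda_ = 0.01  # same regularization strength for every layer, for simplicity

model = Sequential([
    Dense(25, activation="relu", kernel_regularizer=L2(lambda_)),
    Dense(15, activation="relu", kernel_regularizer=L2(lambda_)),
    Dense(1, activation="sigmoid", kernel_regularizer=L2(lambda_)),
])

model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
)

# model.fit(X_train, y_train, epochs=100)  # X_train, y_train assumed to exist
```

Nothing else changes in the training loop; the kernel_regularizer simply adds lambda times the sum of squared weights of each layer to the loss being minimized.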