How do you choose which feature to split on at each node? Choose the split that maximizes purity (or, equivalently, minimizes impurity).
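As a rough illustration of "maximize purity", the sketch below scores each candidate feature by the weighted average entropy of the two child nodes its split produces, and picks the feature with the lowest value. It assumes binary (True/False) features stored in dicts, 0/1 labels, and weighted child entropy as the impurity score. The helper names (`weighted_child_entropy`, `best_feature`) and the toy features `a` and `b` are hypothetical. The entropy function is the one defined under "Measuring purity".

```python
import math

def entropy(p1):
    # Binary entropy in bits; by convention 0 * log2(0) = 0.
    if p1 in (0.0, 1.0):
        return 0.0
    p0 = 1.0 - p1
    return -p1 * math.log2(p1) - p0 * math.log2(p0)

def weighted_child_entropy(xs, ys, feature):
    # Assumption: xs is a list of dicts mapping feature name -> bool,
    # ys is a list of 0/1 labels. Split on `feature` into two branches.
    left = [y for x, y in zip(xs, ys) if x[feature]]
    right = [y for x, y in zip(xs, ys) if not x[feature]]
    total = len(ys)
    h = 0.0
    for branch in (left, right):
        if branch:
            p1 = sum(branch) / len(branch)
            # Weight each child's entropy by its share of the examples.
            h += len(branch) / total * entropy(p1)
    return h

def best_feature(xs, ys, features):
    # Lowest remaining impurity = highest purity after the split.
    return min(features, key=lambda f: weighted_child_entropy(xs, ys, f))

xs = [
    {"a": True, "b": True},
    {"a": True, "b": False},
    {"a": False, "b": True},
    {"a": False, "b": False},
]
ys = [1, 1, 0, 0]
print(best_feature(xs, ys, ["a", "b"]))  # prints "a"
```

Here feature `a` separates the classes perfectly (both children have entropy 0), while `b` leaves each child half positive and half negative (weighted entropy 1.0), so `a` is chosen.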
When do you stop splitting? When a node is 100% one class; when splitting a node would result in the tree exceeding a maximum depth; when the improvement in purity score is below a threshold; or when the number of examples in a node is below a threshold.
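The four stopping criteria can be collected into a single check, sketched below. The function and its parameter names (`max_depth`, `min_gain`, `min_examples`) are hypothetical, and the depth convention (root at depth 0, so splitting a node at depth `d` creates children at depth `d + 1`) is an assumption; real libraries expose similar knobs under their own names.

```python
def should_stop(labels, depth, max_depth, best_gain, min_gain, min_examples):
    # labels: class labels of the examples reaching this node.
    # Assumption: root is depth 0; children of a depth-d node are at d + 1.

    # 1) Node is 100% one class.
    if len(set(labels)) <= 1:
        return True
    # 2) Splitting would make the tree exceed the maximum depth.
    if depth + 1 > max_depth:
        return True
    # 3) Improvement in purity from the best available split is too small.
    if best_gain < min_gain:
        return True
    # 4) Too few examples in the node.
    if len(labels) < min_examples:
        return True
    return False

print(should_stop([1, 0, 1, 0], depth=1, max_depth=3,
                  best_gain=0.5, min_gain=0.1, min_examples=2))  # prints False
```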
Measuring purity:
Entropy as a measure of impurity:
$p_0 = 1 - p_1$
$H(p_1) = -p_1 \log_2(p_1) - p_0 \log_2(p_0) = -p_1 \log_2(p_1) - (1 - p_1) \log_2(1 - p_1)$
Note: by convention, $0 \log_2(0) = 0$.
If you look in open-source packages, you may also hear about the Gini criterion, another function that looks a lot like the entropy function and also works well for building decision trees.
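A small sketch of the entropy formula above, alongside the binary Gini impurity $1 - p_1^2 - p_0^2$ for comparison. Both are 0 for a pure node and peak at $p_1 = 0.5$ (entropy at 1.0, Gini at 0.5), which is why they behave similarly as split criteria. (For reference, scikit-learn's `DecisionTreeClassifier` offers both via its `criterion` parameter, with `"gini"` as the default.) The function names here are illustrative only.

```python
import math

def entropy(p1):
    # H(p1) = -p1*log2(p1) - (1 - p1)*log2(1 - p1), with 0*log2(0) = 0.
    if p1 in (0.0, 1.0):
        return 0.0
    p0 = 1.0 - p1
    return -p1 * math.log2(p1) - p0 * math.log2(p0)

def gini(p1):
    # Binary Gini impurity: 1 - p1^2 - p0^2.
    p0 = 1.0 - p1
    return 1.0 - p1 ** 2 - p0 ** 2

for p in (0.0, 1 / 6, 0.5, 5 / 6, 1.0):
    print(f"p1={p:.3f}  entropy={entropy(p):.3f}  gini={gini(p):.3f}")
```

For example, a node with 1 positive out of 6 examples ($p_1 = 1/6$) has entropy of about 0.65, while a 50/50 node has the maximum entropy of 1.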