Decision tree model

Intro to Decision Trees


Learning Process

  1. How to choose what feature to split on at each node? Maximize purity (or minimize impurity)
  2. When do you stop splitting? (See the sketch after this list.)
     - When a node is 100% one class
     - When splitting a node would result in the tree exceeding a maximum depth
     - When improvements in purity score are below a threshold
     - When the number of examples in a node is below a threshold
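
As a concrete illustration (not from the notes), these criteria map onto hyperparameters of scikit-learn's `DecisionTreeClassifier`. A minimal sketch, assuming scikit-learn is installed; note that a 100%-pure node always stops splitting by default, so the first stopping criterion needs no parameter:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    criterion="entropy",         # split on the feature that maximizes purity
    max_depth=3,                 # stop: tree would exceed a maximum depth
    min_impurity_decrease=0.01,  # stop: purity improvement below a threshold
    min_samples_split=10,        # stop: too few examples in a node to split
)
clf.fit(X, y)
print(clf.get_depth())           # depth actually reached after early stopping
```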

Measuring purity:

Entropy as a measure of impurity. Let $p_1$ be the fraction of examples in a node that are positive, and $p_0 = 1 - p_1$ the fraction that are negative. Then

$$H(p_1) = -p_1 \log_2(p_1) - p_0 \log_2(p_0) = -p_1 \log_2(p_1) - (1 - p_1) \log_2(1 - p_1)$$

Note: by convention, "$0 \log_2(0)$" is taken to be $0$. If you look in open-source packages you may also hear about the Gini criterion, another function that looks a lot like the entropy function and that also works well for building decision trees.
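
A minimal sketch of this entropy function in Python (the function name `entropy` and the use of NumPy are my own choices, not from the notes):

```python
import numpy as np

def entropy(p1):
    """Entropy of a binary node, where p1 is the fraction of positive examples."""
    if p1 == 0 or p1 == 1:
        return 0.0  # convention: 0 * log2(0) = 0
    p0 = 1 - p1
    return -p1 * np.log2(p1) - p0 * np.log2(p0)

print(entropy(0.5))  # 1.0 -> maximally impure (50/50 split)
print(entropy(1.0))  # 0.0 -> pure node
```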

Information Gain

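Information gain is the reduction in entropy achieved by a split: the entropy of the root node minus the weighted average entropy of the two branches,

$$\text{Information gain} = H\!\left(p_1^{\text{root}}\right) - \left( w^{\text{left}} H\!\left(p_1^{\text{left}}\right) + w^{\text{right}} H\!\left(p_1^{\text{right}}\right) \right)$$

where $w^{\text{left}}$ and $w^{\text{right}}$ are the fractions of the root's examples that go to the left and right branches. A minimal sketch in Python (function names are my own choices; it repeats the entropy definition above so it runs on its own):

```python
import numpy as np

def entropy(p1):
    """Binary entropy, with the convention 0 * log2(0) = 0."""
    if p1 == 0 or p1 == 1:
        return 0.0
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

def information_gain(p1_root, p1_left, p1_right, w_left):
    """Entropy of the root minus the weighted entropy of the two branches."""
    w_right = 1 - w_left
    return entropy(p1_root) - (w_left * entropy(p1_left) + w_right * entropy(p1_right))

# Example: root has 5/10 positive examples; a split sends 4 examples left
# (3 of them positive) and 6 right (2 positive), so w_left = 0.4.
print(information_gain(0.5, 3/4, 2/6, 0.4))  # ~0.125
```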