5002 Decision Tree Memo
Types of Decision Trees
- ID3: information gain
- C4.5: information gain normalized by the split information (the entropy of the split)
- CART: Gini index
Entropy
Entropy measures how informative a distribution is. The formula is: $\text{Entropy} = -\sum_i p_i \log_2 p_i$ (logarithm base two). The greater the entropy, the less informative the distribution. For example:
- P(head) = 0.5, P(tail) = 0.5: entropy = 1
- P(head) = 1, P(tail) = 0: entropy = 0
We adopt the convention $0 \log 0 = 0$ in this calculation.
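As a quick illustration, here is a minimal Python sketch of the entropy calculation (the function name entropy is my own; skipping zero probabilities implements the $0 \log 0 = 0$ convention):

```python
import math

def entropy(probs):
    # Shannon entropy in bits; skipping p == 0 terms implements 0 * log 0 = 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 -> fair coin, least informative
print(entropy([1.0, 0.0]))  # 0.0 -> certain outcome, most informative
```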
Information gain
Suppose we have a table with the columns: race, income, label.
We calculate the base information (entropy): Info(label) = Entropy(p_yes, p_no).
Then we calculate the information associated with an attribute: Info(race) = p_black * Entropy(p_yes, p_no | black) + p_white * Entropy(p_yes, p_no | white), where p_black is the proportion of rows whose race is black (and likewise p_white).
The information gain is then: Gain(race) = Info(label) - Info(race)
We split on the attribute with the greatest gain.
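A sketch of the ID3 computation on a hypothetical toy table (the rows and helper names below are made up for illustration, not taken from the memo):

```python
import math
from collections import Counter

def entropy_of(values):
    # Entropy (in bits) of the empirical distribution of a list of values.
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain(rows, attr, target="label"):
    # ID3: Info(label) minus the weighted entropy after splitting on attr.
    base = entropy_of([r[target] for r in rows])            # Info(label)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy_of(subset)
    return base - remainder                                 # Gain(attr)

# Hypothetical rows with the columns from the memo.
rows = [
    {"race": "black", "income": "low",  "label": "yes"},
    {"race": "black", "income": "high", "label": "no"},
    {"race": "white", "income": "low",  "label": "yes"},
    {"race": "white", "income": "high", "label": "no"},
]
print(info_gain(rows, "race"))    # 0.0 -> race tells us nothing here
print(info_gain(rows, "income"))  # 1.0 -> income determines the label, so we split on it
```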
C4.5
We use a new measurement, the gain ratio: GainRatio(race) = (Info(label) - Info(race)) / SplitInfo(race), where SplitInfo(race) is the entropy of the attribute's own value distribution, i.e. Entropy(p_black, p_white).
Comparison with ID3
ID3 tends to favor attributes with many distinct values, since finer splits inflate the information gain. C4.5 penalizes such attributes by dividing the gain by the split information.
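A sketch of the gain ratio under the same assumptions (the table and the extra "id" column are hypothetical; the id column is only there to show why the penalty matters):

```python
import math
from collections import Counter

def entropy_of(values):
    # Entropy (bits) of the empirical distribution of a list of values.
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(rows, attr, target="label"):
    # C4.5: information gain divided by the split information, i.e. the entropy
    # of the attribute's own value distribution (it grows with the number of values).
    base = entropy_of([r[target] for r in rows])          # Info(label)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy_of(subset)
    gain = base - remainder                               # same gain as ID3
    split_info = entropy_of([r[attr] for r in rows])      # SplitInfo(attr)
    return gain / split_info

rows = [
    {"id": 1, "race": "black", "label": "yes"},
    {"id": 2, "race": "black", "label": "no"},
    {"id": 3, "race": "white", "label": "yes"},
    {"id": 4, "race": "white", "label": "no"},
]
print(gain_ratio(rows, "race"))  # 0.0: gain of 0 bits
print(gain_ratio(rows, "id"))    # 0.5: gain of 1 bit, but penalized by split info of 2 bits
```

ID3 would pick "id" outright (gain 1 versus 0), even though a unique identifier generalizes to nothing; C4.5's division by the split information dampens exactly this effect.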
CART
CART uses the Gini index to calculate the Info:
$\text{Gini}(p_1, \dots, p_k) = 1 - \sum_i p_i^2$
Info(label) = Gini(p_yes, p_no)
Info(white) = Gini(p_yes, p_no | white)
Info(black) = Gini(p_yes, p_no | black)
Info(race) = p_black * Info(black) + p_white * Info(white)
Gain(race) = Info(label) - Info(race)
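A matching sketch with the Gini index (same hypothetical table shape as before; note that classic CART uses binary splits, while this mirrors the weighted multiway form above):

```python
from collections import Counter

def gini_of(values):
    # Gini index of the empirical distribution: 1 - sum(p_i^2).
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())

def gini_gain(rows, attr, target="label"):
    # Gain(attr) = Gini(label) - weighted Gini of the subsets after splitting on attr.
    base = gini_of([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * gini_of(subset)
    return base - remainder

rows = [
    {"race": "black", "label": "yes"},
    {"race": "black", "label": "no"},
    {"race": "white", "label": "yes"},
    {"race": "white", "label": "yes"},
]
print(gini_gain(rows, "race"))  # 0.375 - 0.25 = 0.125
```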
Intuition of Gini
Suppose we have two kinds of balls, x and y. The probability of drawing x is p_x, and the probability of drawing y is p_y. If we draw two balls independently (with replacement), the probability that the two draws are of different kinds is 1 - p_x^2 - p_y^2, which is exactly the Gini index.
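A quick numeric check of that identity (the probabilities are arbitrary; the two draws are assumed independent, i.e. with replacement):

```python
p_x, p_y = 0.7, 0.3             # hypothetical probabilities, p_x + p_y = 1
print(2 * p_x * p_y)            # ~0.42: P(x then y) + P(y then x)
print(1 - p_x**2 - p_y**2)      # ~0.42: the Gini index, same value
```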