Decision Tree - Overfitting
Overfitting is a significant practical difficulty for decision tree models and many other predictive models. Overfitting happens when the learning algorithm continues to develop hypotheses that reduce training set error at the cost of an increased test set error. There are several approaches to avoiding overfitting in building decision trees:
- Pre-pruning: stop growing the tree earlier, before it perfectly classifies the training set.
- Post-pruning: allow the tree to perfectly classify the training set, and then prune it.
Practically, the second approach (post-pruning overfit trees) is more successful because it is not easy to precisely estimate when to stop growing the tree.
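As a minimal illustration of the contrast, assuming a scikit-learn setting (not part of the original text), pre-pruning corresponds to growth-limiting parameters such as max_depth and min_samples_leaf, whereas an unrestricted tree keeps splitting until it fits the training set almost perfectly:

```python
# Minimal sketch (assumes scikit-learn): pre-pruning limits growth up front,
# while an unrestricted tree tends to overfit the training set.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growing early via max_depth / min_samples_leaf.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre.fit(X_train, y_train)

# No pruning: the tree keeps splitting until it (nearly) perfectly
# classifies the training set, which invites overfitting.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("pre-pruned: train %.2f  test %.2f"
      % (pre.score(X_train, y_train), pre.score(X_test, y_test)))
print("full tree : train %.2f  test %.2f"
      % (full.score(X_train, y_train), full.score(X_test, y_test)))
```

Typically the full tree scores near 1.0 on the training set but worse than the pre-pruned tree on the test set, which is the overfitting pattern described above.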
The important step of tree pruning is to define a criterion to be used to determine the correct final tree size, using one of the following methods:
1. Use a dataset distinct from the training set (a validation set) to evaluate the effect of post-pruning nodes from the tree.
2. Build the tree using the training set, then apply a statistical test (e.g., error estimation or the Chi2 test) to estimate whether pruning or expanding a particular node is likely to produce an improvement beyond the training set.
The first method is the most common approach. In this approach, the available data are separated into two sets of examples: a training set, which is used to build the decision tree, and a validation set, which is used to evaluate the impact of pruning the tree (a sketch of this follows below). The second method is also a common approach. Here, we explain error estimation and the Chi2 test.
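Before turning to those tests, here is a minimal sketch of the first (validation set) method, assuming scikit-learn, which implements post-pruning as cost-complexity pruning rather than the exact node-by-node procedure described here: grow the tree on the training set, then pick the pruning strength that scores best on a held-out validation set.

```python
# Sketch of method 1 (assumes scikit-learn, which exposes post-pruning as
# cost-complexity pruning): grow the tree on the training set, then use a
# separate validation set to choose how aggressively to prune.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Candidate pruning strengths (ccp_alpha) for the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)

# Refit once per candidate alpha and keep the tree that scores best
# on the validation set.
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    score = tree.score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score

print("best ccp_alpha=%.5f, validation accuracy=%.3f"
      % (best_alpha, best_score))
```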
Post-pruning using Error estimation
The error estimate for a sub-tree is the weighted sum of the error estimates of all its leaves. The error estimate (e) for a node is:

e = \frac{f + \frac{z^2}{2N} + z\sqrt{\frac{f}{N} - \frac{f^2}{N} + \frac{z^2}{4N^2}}}{1 + \frac{z^2}{N}}

where f is the observed error rate on the training examples reaching the node, N is the number of training examples reaching the node, and z is the standard normal deviate corresponding to the chosen confidence level.
In the following example, we set z to 0.69, which corresponds to a confidence level of 75%.
The error estimate at the parent node is 0.46, and since the weighted error estimate of its children (0.51) increases with the split, we do not want to keep the children (the split is pruned).
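A minimal sketch of this calculation in Python. The node counts below are illustrative, since the original example's figure is not reproduced here; with these counts the estimates come out close to the 0.46 and 0.51 figures quoted above (small differences are rounding).

```python
# Sketch of the pessimistic error estimate used for post-pruning.
# Node counts are illustrative (the original figure is not reproduced here).
import math

def error_estimate(f, N, z=0.69):
    """Upper confidence bound on a node's error rate.

    f: observed error rate on the training examples reaching the node
    N: number of training examples reaching the node
    z: standard normal deviate (0.69 ~ 75% confidence)
    """
    return ((f + z**2 / (2 * N)
             + z * math.sqrt(f / N - f**2 / N + z**2 / (4 * N**2)))
            / (1 + z**2 / N))

# Hypothetical split: error estimate at the parent node vs. its children.
parent = error_estimate(f=5 / 14, N=14)
children = [(2 / 6, 6), (1 / 2, 2), (2 / 6, 6)]  # (f, N) per leaf

# Error estimate of the sub-tree = weighted sum over its leaves.
subtree = sum(N * error_estimate(f, N) for f, N in children) / 14

# Prune the split if it does not lower the estimated error.
print("parent e=%.2f, sub-tree e=%.2f -> %s"
      % (parent, subtree, "prune" if subtree >= parent else "keep"))
```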
Post-pruning using Chi2 test
In the Chi2 test, we construct the corresponding frequency table and calculate the Chi2 value and its probability.
Chi2 = 0.21, probability = 0.90, degrees of freedom = 2
If we require the probability to be less than a limit (e.g., 0.05) in order to split, then since 0.90 is far above that limit, we decide not to split the node.
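A minimal sketch of this test, assuming SciPy's chi2_contingency. The frequency table is hypothetical, since the original table is not reproduced here; it is chosen as a 2x3 table so that the degrees of freedom match the example's value of 2.

```python
# Sketch of the Chi2 significance test for a candidate split (assumes SciPy).
# The frequency table is hypothetical; a 2x3 table gives df = 2 as above.
from scipy.stats import chi2_contingency

# Rows: class labels; columns: values of the candidate attribute.
observed = [[10, 12, 9],
            [ 9, 11, 10]]

chi2, p, dof, expected = chi2_contingency(observed)
print("Chi2=%.2f, probability=%.2f, degrees of freedom=%d" % (chi2, p, dof))

# Split only if the result is significant (probability below the limit).
alpha = 0.05
print("split" if p < alpha else "do not split")
```

With a nearly uniform table like this one, the probability is high and the node is left unsplit, matching the decision in the example above.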