The decision trees algorithm performs equal-range discretization at each node, based on the node's support (the number of cases) and the actual range of values that the continuous attribute takes at that node. This explains the split example you describe, and also why the same attribute may be discretized differently at another point in the tree.
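To make this concrete, here is a minimal sketch of equal-range (equal-width) discretization at a single node. The bucket-count heuristic (square root of the node support) is an assumption for illustration only; the actual rule the algorithm uses is not documented here.

import math

def equal_range_bins(values, n_bins=None):
    """Split the observed range of `values` at this node into equal-width bins."""
    if n_bins is None:
        # Hypothetical heuristic: more cases at the node -> more buckets.
        n_bins = max(2, int(math.sqrt(len(values))))
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n_bins)]

# The same attribute discretized at two different nodes:
root_cases = [5, 12, 18, 25, 33, 41, 47, 52, 60, 75, 88, 95]
child_cases = [5, 12, 18, 25]          # a child sees fewer cases and a narrower range
print(equal_range_bins(root_cases))    # wider range, more buckets
print(equal_range_bins(child_cases))   # narrower range, fewer buckets

Because both the support and the value range shrink as you move down the tree, the bucket boundaries at a child node generally differ from those at its parent.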
The COMPLEXITY_PENALTY parameter is a floating-point number between 0 and 1, used to inhibit the growth of the decision tree. The value is subtracted from 1 and used as a factor in determining the likelihood of a split. The deeper the branch of a decision tree, the less likely a split becomes; the complexity penalty influences that likelihood. A low complexity penalty increases the likelihood of a split, while a high complexity penalty decreases it. The default value depends on the number of attributes in the model: for 1 to 9 attributes, the value is 0.5; for 10 to 99 attributes, the value is 0.9; and for 100 or more attributes, the value is 0.99.
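The following sketch illustrates this behavior. The default values match the table above; the way (1 - penalty) damps the split likelihood at increasing depth is a simplified assumption for illustration, not the exact internal formula.

def default_complexity_penalty(n_attributes):
    """Default COMPLEXITY_PENALTY by attribute count (documented defaults)."""
    if n_attributes < 10:
        return 0.5
    elif n_attributes < 100:
        return 0.9
    return 0.99

def split_factor(penalty, depth):
    """Hypothetical: (1 - penalty) applied per level makes deep splits less likely."""
    return (1 - penalty) ** depth

for attrs in (5, 50, 500):
    p = default_complexity_penalty(attrs)
    print(attrs, p, [round(split_factor(p, d), 4) for d in (1, 2, 3)])

Note how quickly the factor decays with depth at the higher defaults: with many attributes, the tree is penalized much more aggressively for growing deep branches.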