Wednesday, March 7, 2012

continuous attribute & Complexity_Penalty

I have a question about the discretization of continuous attribute values. How does it work? I need this information for my thesis. I have a continuous attribute, namely SKS, with range 0-20. When I use the Microsoft Decision Tree algorithm, this attribute is split into SKS <= 18 and SKS > 18. I want to know how the algorithm finds 18 as the split point rather than another value. One more question about the Microsoft Decision Tree algorithm, concerning the COMPLEXITY_PENALTY parameter: how does it affect the algorithm? For example, if I set the value to 0.1, what does that mean and how does it influence the growth of the tree? Thanks a lot in advance for your kindness in answering my questions. :-)

The decision trees algorithm does equal-range discretization at each node based on the node support (# of cases) and the actual range of values for the continuous attribute in question at that node. This explains the split example you describe and also why the same attribute may be discretized differently at another point in the tree.
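To make the idea concrete, here is a minimal sketch of per-node equal-range discretization. This is an illustration of the general technique, not Microsoft's actual implementation; the function name and bucket count are my own choices.

```python
# Sketch of equal-range (equal-width) discretization at a single tree node.
# NOTE: illustrative only -- not the Microsoft Decision Trees source code.
def equal_range_boundaries(values, n_buckets):
    """Split the observed range of `values` into n_buckets equal-width bins
    and return the interior boundaries, which become candidate split points."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    return [lo + i * width for i in range(1, n_buckets)]

# Example: if the SKS values at a node span 0..20 and the node is divided
# into 10 buckets, the candidate boundaries land at 2, 4, ..., 18 -- which
# is how a split like "SKS <= 18" can arise.
print(equal_range_boundaries([0, 20], 10))
# -> [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
```

Because the boundaries depend on the minimum, maximum, and case count at each node, a child node with a narrower observed range will produce different candidate split points for the very same attribute.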

The COMPLEXITY_PENALTY parameter is a floating-point number between 0 and 1. It is used to inhibit the growth of the decision tree: the value is subtracted from 1 and the result is used as a factor in determining the likelihood of a split. The deeper the branch of a decision tree, the less likely a split becomes; the complexity penalty influences that likelihood. A low complexity penalty increases the likelihood of a split, while a high complexity penalty decreases it. The default value depends on the number of attributes in the model: for 1 to 9 attributes the value is 0.5; for 10 to 99 attributes it is 0.9; and for 100 or more attributes it is 0.99.
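One plausible way to picture the effect (an assumption on my part; the documented behavior only says that 1 - COMPLEXITY_PENALTY acts as a factor on the split likelihood, and that deeper branches split less readily) is a score that gets multiplied by that factor once per level of depth:

```python
# Hedged illustration of how (1 - COMPLEXITY_PENALTY) could damp splitting
# with depth. The exact formula inside the algorithm is not public; this
# only shows the direction and relative magnitude of the effect.
def split_factor(complexity_penalty, depth):
    """Hypothetical damping factor applied to the split score at `depth`."""
    return (1.0 - complexity_penalty) ** depth

for cp in (0.1, 0.9):
    factors = [round(split_factor(cp, d), 4) for d in range(1, 4)]
    print(f"COMPLEXITY_PENALTY={cp}: depth 1-3 factors = {factors}")
```

With a penalty of 0.1 the factor decays slowly (0.9, 0.81, 0.729), so the tree keeps splitting and grows deep; with 0.9 it collapses quickly (0.1, 0.01, 0.001), so the tree stays shallow. That is the sense in which setting the parameter to 0.1 encourages a larger tree.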
