A class to prune a decision tree using the Cost Complexity method.
(See "Classification and Regression Trees" by Leo Breiman et al.)
There are two running modes in CCPruner: (i) one may select a prune strength and prune back the tree \( T_{max}\) until the criterion:
\[ \alpha < \frac{R(t) - R(T_t)}{|\sim T_t| - 1} \]
is true for all nodes \( t \) in \( T \), where \( T_t \) is the branch of \( T \) rooted at \( t \) and \( \sim T_t \) is its set of terminal nodes, or (ii) the algorithm finds the sequence of critical points \( \alpha_k < \alpha_{k+1} < \dots < \alpha_K \) such that \( T_K = root(T_{max}) \) and then selects the optimally pruned subtree, defined as the subtree with the best quality index on the validation sample.
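The criterion can be motivated by the cost-complexity measure of Breiman et al. The symbol \( R_\alpha \) is introduced here for illustration only and does not appear in the description above. A sketch of the argument, assuming \( |\sim T_t| > 1 \):

\[ R_\alpha(T) = R(T) + \alpha \, |\sim T| \]

Collapsing the branch \( T_t \) into the single terminal node \( t \) does not increase this measure once

\[ R(t) + \alpha \le R(T_t) + \alpha \, |\sim T_t| \quad \Longleftrightarrow \quad \alpha \ge \frac{R(t) - R(T_t)}{|\sim T_t| - 1} \]

For example, a node with \( R(t) = 0.20 \) whose branch has two terminal nodes with \( R(T_t) = 0.15 \) has the critical value \( (0.20 - 0.15)/(2 - 1) = 0.05 \); for any smaller \( \alpha \) the branch is kept.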
Definition at line 62 of file CostComplexityPruneTool.h.
Public Member Functions | |
| CostComplexityPruneTool (SeparationBase *qualityIndex=nullptr) | |
| Constructor for the cost complexity pruning tool. | |
| virtual | ~CostComplexityPruneTool () |
| Destructor for the cost complexity pruning tool. | |
| virtual PruningInfo * | CalculatePruningInfo (DecisionTree *dt, const IPruneTool::EventSample *testEvents=nullptr, Bool_t isAutomatic=kFALSE) |
| Steers the pruning process. | |
Public Member Functions inherited from TMVA::IPruneTool | |
| IPruneTool () | |
| virtual | ~IPruneTool () |
| Double_t | GetPruneStrength () const |
| Bool_t | IsAutomatic () const |
| void | SetAutomatic () |
| void | SetPruneStrength (Double_t alpha) |
Private Member Functions | |
| void | InitTreePruningMetaData (DecisionTreeNode *n) |
| Initialises the pruning "meta data" for each node: the cost complexity, the critical alpha, the minimal alpha down the tree, etc. | |
| MsgLogger & | Log () const |
| output stream to save logging information | |
| void | Optimize (DecisionTree *dt, Double_t weights) |
| Once the critical \( \alpha \) values (at which the corresponding nodes would be pruned away) have been established in InitTreePruningMetaData, handles the two modes: automatic pruning and fixed-parameter pruning. | |
Private Attributes | |
| MsgLogger * | fLogger |
| Int_t | fOptimalK |
| the optimal index of the prune sequence | |
| std::vector< DecisionTreeNode * > | fPruneSequence |
| map of weakest links (i.e., branches to prune) -> pruning index | |
| std::vector< Double_t > | fPruneStrengthList |
| map of alpha -> pruning index | |
| std::vector< Double_t > | fQualityIndexList |
| map of R(T) -> pruning index | |
| SeparationBase * | fQualityIndexTool |
| the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) } | |
Additional Inherited Members | |
Public Types inherited from TMVA::IPruneTool | |
| typedef std::vector< const Event * > | EventSample |
Protected Attributes inherited from TMVA::IPruneTool | |
| Double_t | B |
| Double_t | fPruneStrength |
| regularization parameter in pruning | |
| Double_t | S |
#include <TMVA/CostComplexityPruneTool.h>
| CostComplexityPruneTool::CostComplexityPruneTool | ( | SeparationBase * | qualityIndex = nullptr | ) |
Constructor for the cost complexity pruning tool.
Definition at line 68 of file CostComplexityPruneTool.cxx.
| virtual | CostComplexityPruneTool::~CostComplexityPruneTool () |
Destructor for the cost complexity pruning tool.
Definition at line 89 of file CostComplexityPruneTool.cxx.
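| virtual PruningInfo * | CostComplexityPruneTool::CalculatePruningInfo (DecisionTree *dt, const IPruneTool::EventSample *testEvents=nullptr, Bool_t isAutomatic=kFALSE) |
Steers the pruning process.
It triggers the calculation of the pruning sequence, the tree quality, and related quantities.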
Implements TMVA::IPruneTool.
Definition at line 98 of file CostComplexityPruneTool.cxx.
| void | CostComplexityPruneTool::InitTreePruningMetaData (DecisionTreeNode *n) | private |
Initialises the pruning "meta data" for each node: the cost complexity, the critical alpha, the minimal alpha down the tree, etc.
Definition at line 181 of file CostComplexityPruneTool.cxx.
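The kind of per-node bookkeeping described here can be illustrated with a minimal, self-contained sketch. It is not the TMVA implementation: `Node`, `initMetaData`, and all field names are hypothetical stand-ins (not `TMVA::DecisionTreeNode`), and the critical alpha is assumed to follow the standard definition of Breiman et al., \( (R(t) - R(T_t)) / (|\sim T_t| - 1) \).

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <memory>

// Hypothetical stand-in for a binary decision-tree node (NOT TMVA::DecisionTreeNode).
struct Node {
    double nodeR = 0.0;                 // R(t): quality index of t if it were a terminal node
    std::unique_ptr<Node> left, right;  // both null for a terminal node

    // Pruning meta data, filled in by initMetaData():
    double subTreeR = 0.0;              // R(T_t): sum of R over the terminal nodes below t
    int nTerminal = 1;                  // |~T_t|: number of terminal nodes below t
    double alpha = 0.0;                 // critical alpha of t
    double alphaMinSubtree = 0.0;       // smallest critical alpha found in the branch T_t
};

// Recursively fill the meta data, children first, then the node itself.
// Assumption: every internal node has exactly two children.
void initMetaData(Node& n) {
    constexpr double kInf = std::numeric_limits<double>::infinity();
    if (!n.left || !n.right) {
        // Terminal node: nothing below it can be pruned away.
        n.subTreeR = n.nodeR;
        n.nTerminal = 1;
        n.alpha = kInf;
        n.alphaMinSubtree = kInf;
        return;
    }
    initMetaData(*n.left);
    initMetaData(*n.right);
    n.subTreeR = n.left->subTreeR + n.right->subTreeR;
    n.nTerminal = n.left->nTerminal + n.right->nTerminal;
    assert(n.nTerminal > 1);  // guaranteed for an internal node with two children
    n.alpha = (n.nodeR - n.subTreeR) / (n.nTerminal - 1);
    n.alphaMinSubtree = std::min({n.alpha, n.left->alphaMinSubtree, n.right->alphaMinSubtree});
}
```

In this sketch the node whose `alpha` equals the root's `alphaMinSubtree` is the weakest link, that is, the first branch to collapse as the prune strength increases.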
| MsgLogger & | CostComplexityPruneTool::Log () const | inline, private |
output stream to save logging information
Definition at line 87 of file CostComplexityPruneTool.h.
| void | CostComplexityPruneTool::Optimize (DecisionTree *dt, Double_t weights) | private |
Once the critical \( \alpha \) values (at which the corresponding nodes would be pruned away) have been established in InitTreePruningMetaData, two modes are handled:
automatic pruning, which finds the value of \( \alpha \) for which the test sample gives minimal error on the tree with all nodes pruned that have \( \alpha_{critical} < \alpha \), and fixed-parameter pruning, which uses the prune strength selected by the user.
Definition at line 236 of file CostComplexityPruneTool.cxx.
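The two selection rules can be sketched as follows. This is an illustration under stated assumptions, not the code in CostComplexityPruneTool.cxx: the function names are hypothetical, the critical alphas are assumed to be sorted in increasing order, and the tie-breaking rule (prefer the more heavily pruned tree) is a choice made for this example.

```cpp
#include <cstddef>
#include <vector>

// Automatic mode (sketch): return the index of the pruning step with the smallest
// validation error. On ties the later step, i.e. the smaller tree, wins.
// Returns -1 if the list is empty.
int selectOptimalK(const std::vector<double>& validationError) {
    int best = -1;
    double bestError = 0.0;
    for (std::size_t k = 0; k < validationError.size(); ++k) {
        if (best < 0 || validationError[k] <= bestError) {
            best = static_cast<int>(k);
            bestError = validationError[k];
        }
    }
    return best;
}

// Fixed-parameter mode (sketch): return the index of the last pruning step whose
// critical alpha lies below the requested prune strength, or -1 if no step qualifies.
// Assumption: criticalAlpha is sorted in increasing order.
int lastStepBelow(const std::vector<double>& criticalAlpha, double pruneStrength) {
    int k = -1;
    for (std::size_t i = 0; i < criticalAlpha.size() && criticalAlpha[i] < pruneStrength; ++i) {
        k = static_cast<int>(i);
    }
    return k;
}
```

With validation errors `{0.30, 0.22, 0.22, 0.27}` the automatic rule picks index 2; with critical alphas `{0.01, 0.05, 0.20}` and a prune strength of 0.1 the fixed-parameter rule stops at index 1.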
| MsgLogger * | CostComplexityPruneTool::fLogger | mutable, private |
Definition at line 86 of file CostComplexityPruneTool.h.
| Int_t | CostComplexityPruneTool::fOptimalK | private |
the optimal index of the prune sequence
Definition at line 77 of file CostComplexityPruneTool.h.
| std::vector< DecisionTreeNode * > | CostComplexityPruneTool::fPruneSequence | private |
map of weakest links (i.e., branches to prune) -> pruning index
Definition at line 73 of file CostComplexityPruneTool.h.
| std::vector< Double_t > | CostComplexityPruneTool::fPruneStrengthList | private |
map of alpha -> pruning index
Definition at line 74 of file CostComplexityPruneTool.h.
| std::vector< Double_t > | CostComplexityPruneTool::fQualityIndexList | private |
map of R(T) -> pruning index
Definition at line 75 of file CostComplexityPruneTool.h.
| SeparationBase * | CostComplexityPruneTool::fQualityIndexTool | private |
the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }
Definition at line 71 of file CostComplexityPruneTool.h.