A helper class to prune a decision tree using the Cost Complexity method (see Classification and Regression Trees by Leo Breiman et al.).
There are two running modes in CCPruner: (i) one may select a prune strength and prune back the tree \( T_{max}\) until the criterion:
\[ \alpha < \frac{R(T) - R(t)}{|\sim T_t| - 1} \]
is true for all nodes \( t \) in \( T \), or (ii) the algorithm finds the sequence of critical points \( \alpha_k < \alpha_{k+1} < \dots < \alpha_K \) such that \( T_K = root(T_{max}) \) and then selects the optimally-pruned subtree, defined to be the subtree with the best quality index for the validation sample.
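Mode (ii) can be illustrated on a toy tree. The sketch below is not TMVA code: `ToyNode`, `branchStats`, `findWeakestLink`, `criticalAlphas` and `toyTree` are hypothetical names, and the per-node quantity is taken in the textbook form from Breiman et al., \( g(t) = (R(t) - R(T_t)) / (|\sim T_t| - 1) \), where \( T_t \) is the branch rooted at \( t \). Reading the criterion above this way is an assumption. Each pass prunes the internal node with the smallest \( g(t) \) (the weakest link) and records that value as the next critical \( \alpha \), until only the root is left.

```cpp
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical toy node for illustration; this is not a TMVA type.
struct ToyNode {
    double r;   // R(t): quality index of node t if it were made a leaf
    int left;   // index of the left child, or -1 for a terminal node
    int right;  // index of the right child, or -1 for a terminal node
};

// Returns {|~T_t|, R(T_t)}: the number of terminal nodes in the branch
// rooted at t and the sum of R over those terminal nodes.
std::pair<int, double> branchStats(const std::vector<ToyNode>& tree, int t) {
    const ToyNode& n = tree[t];
    if (n.left < 0) return {1, n.r};
    const auto l = branchStats(tree, n.left);
    const auto r = branchStats(tree, n.right);
    return {l.first + r.first, l.second + r.second};
}

// Visits every internal node reachable from t and keeps the one with the
// smallest g(t) = (R(t) - R(T_t)) / (|~T_t| - 1) (assumed textbook form).
void findWeakestLink(const std::vector<ToyNode>& tree, int t,
                     int& best, double& bestG) {
    const ToyNode& n = tree[t];
    if (n.left < 0) return;  // terminal nodes cannot be pruned further
    const auto s = branchStats(tree, t);
    const double g = (n.r - s.second) / (s.first - 1);
    if (g < bestG) { bestG = g; best = t; }
    findWeakestLink(tree, n.left, best, bestG);
    findWeakestLink(tree, n.right, best, bestG);
}

// Prunes the weakest link repeatedly until only the root remains and
// returns the critical alpha recorded at each step.
std::vector<double> criticalAlphas(std::vector<ToyNode> tree) {
    assert(!tree.empty());
    std::vector<double> alphas;
    while (tree[0].left >= 0) {
        int best = -1;
        double bestG = std::numeric_limits<double>::infinity();
        findWeakestLink(tree, 0, best, bestG);
        assert(best >= 0);
        tree[best].left = tree[best].right = -1;  // collapse the branch into a leaf
        alphas.push_back(bestG);
    }
    return alphas;
}

// Five-node example: node 0 is the root, node 1 is internal, nodes 2-4 are leaves.
std::vector<ToyNode> toyTree() {
    return {
        {0.5,    1,  2},
        {0.25,   3,  4},
        {0.125, -1, -1},
        {0.0625, -1, -1},
        {0.125, -1, -1},
    };
}
```

For this toy tree, `criticalAlphas(toyTree())` prunes node 1 first (critical value 0.0625) and then the root (critical value 0.125), so the recorded values increase along the sequence as described above.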
Definition at line 62 of file CCPruner.h.
Public Types | |
| typedef std::vector< Event * > | EventList |
Public Member Functions | |
| CCPruner (DecisionTree *t_max, const DataSet *validationSample, SeparationBase *qualityIndex=nullptr) | |
| constructor | |
| CCPruner (DecisionTree *t_max, const EventList *validationSample, SeparationBase *qualityIndex=nullptr) | |
| constructor | |
| ~CCPruner () | |
| std::vector< TMVA::DecisionTreeNode * > | GetOptimalPruneSequence () const |
| return the list of pruning locations (nodes) that define the optimally-pruned subtree of T_max | |
| Float_t | GetOptimalPruneStrength () const |
| Float_t | GetOptimalQualityIndex () const |
| void | Optimize () |
| determine the pruning sequence | |
| void | SetPruneStrength (Float_t alpha=-1.0) |
Private Attributes | |
| Float_t | fAlpha |
| regularization parameter in CC pruning | |
| Bool_t | fDebug |
| debug flag | |
| Int_t | fOptimalK |
| index of the optimal tree in the pruned tree sequence | |
| Bool_t | fOwnQIndex |
| flag indicating whether fQualityIndex is owned by this object | |
| std::vector< TMVA::DecisionTreeNode * > | fPruneSequence |
| map of weakest links (i.e., branches to prune) -> pruning index | |
| std::vector< Float_t > | fPruneStrengthList |
| map of alpha -> pruning index | |
| SeparationBase * | fQualityIndex |
| the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) } | |
| std::vector< Float_t > | fQualityIndexList |
| map of R(T) -> pruning index | |
| DecisionTree * | fTree |
| (pruned) decision tree | |
| const DataSet * | fValidationDataSet |
| the event sample (as a DataSet) used to select the optimally-pruned tree | |
| const EventList * | fValidationSample |
| the event sample (as an EventList) used to select the optimally-pruned tree | |
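The three lists and `fOptimalK` above are all keyed by the same pruning index. A minimal sketch of how such bookkeeping could be read out is given below. It is not the TMVA implementation: `PruneRecord`, `optimalK` and `optimalPruneSequence` are hypothetical names, node ids are plain integers, "best quality index" is assumed to mean the smallest value (as for a misclassification rate), and the convention that tree k is reached by applying the first k entries of the prune sequence is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-index bookkeeping; not a TMVA type.
struct PruneRecord {
    std::vector<int>   pruneSequence;      // pruning index -> id of the node pruned at that step
    std::vector<float> pruneStrengthList;  // pruning index -> critical alpha
    std::vector<float> qualityIndexList;   // pruning index -> R(T) on the validation sample
};

// Returns the pruning index with the smallest validation quality index,
// or -1 if nothing was recorded. Assumes that lower is better.
int optimalK(const PruneRecord& rec) {
    assert(rec.pruneStrengthList.size() == rec.qualityIndexList.size());
    int best = -1;
    for (std::size_t k = 0; k < rec.qualityIndexList.size(); ++k) {
        if (best < 0 || rec.qualityIndexList[k] < rec.qualityIndexList[best]) {
            best = static_cast<int>(k);
        }
    }
    return best;
}

// Returns the nodes to prune to reach tree k, assuming tree k is obtained
// by applying the first k entries of the prune sequence in order.
std::vector<int> optimalPruneSequence(const PruneRecord& rec, int k) {
    std::vector<int> out;
    for (int i = 0; i < k && i < static_cast<int>(rec.pruneSequence.size()); ++i) {
        out.push_back(rec.pruneSequence[i]);
    }
    return out;
}
```

Keeping the lists parallel means a single index is enough to recover the prune strength, the validation quality and the set of branches to remove for the selected subtree.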
#include <TMVA/CCPruner.h>
| typedef std::vector<Event*> TMVA::CCPruner::EventList |
Definition at line 64 of file CCPruner.h.
| CCPruner::CCPruner (DecisionTree *t_max, const EventList *validationSample, SeparationBase *qualityIndex=nullptr) |
constructor
Definition at line 69 of file CCPruner.cxx.
| CCPruner::CCPruner (DecisionTree *t_max, const DataSet *validationSample, SeparationBase *qualityIndex=nullptr) |
constructor
Definition at line 92 of file CCPruner.cxx.
| CCPruner::~CCPruner | ( | ) |
Definition at line 115 of file CCPruner.cxx.
| std::vector< DecisionTreeNode * > CCPruner::GetOptimalPruneSequence | ( | ) | const |
return the list of pruning locations (nodes) that define the optimally-pruned subtree of T_max
Definition at line 240 of file CCPruner.cxx.
| Float_t TMVA::CCPruner::GetOptimalPruneStrength | ( | ) | const |
inline
Definition at line 89 of file CCPruner.h.
| Float_t TMVA::CCPruner::GetOptimalQualityIndex | ( | ) | const |
inline
Definition at line 85 of file CCPruner.h.
| void CCPruner::Optimize | ( | ) |
determine the pruning sequence
Definition at line 124 of file CCPruner.cxx.
| void TMVA::CCPruner::SetPruneStrength | ( | Float_t | alpha = -1.0 | ) |
inline
Definition at line 110 of file CCPruner.h.
| Float_t TMVA::CCPruner::fAlpha |
private
regularization parameter in CC pruning
Definition at line 93 of file CCPruner.h.
| Bool_t TMVA::CCPruner::fDebug |
private
debug flag
Definition at line 106 of file CCPruner.h.
| Int_t TMVA::CCPruner::fOptimalK |
private
index of the optimal tree in the pruned tree sequence
Definition at line 105 of file CCPruner.h.
| Bool_t TMVA::CCPruner::fOwnQIndex |
private
flag indicating whether fQualityIndex is owned by this object
Definition at line 97 of file CCPruner.h.
| std::vector<TMVA::DecisionTreeNode*> TMVA::CCPruner::fPruneSequence |
private
map of weakest links (i.e., branches to prune) -> pruning index
Definition at line 101 of file CCPruner.h.
| std::vector<Float_t> TMVA::CCPruner::fPruneStrengthList |
private
map of alpha -> pruning index
Definition at line 102 of file CCPruner.h.
| SeparationBase* TMVA::CCPruner::fQualityIndex |
private
the quality index used to calculate R(t), R(T) = sum[t in ~T]{ R(t) }
Definition at line 96 of file CCPruner.h.
| std::vector<Float_t> TMVA::CCPruner::fQualityIndexList |
private
map of R(T) -> pruning index
Definition at line 103 of file CCPruner.h.
| DecisionTree* TMVA::CCPruner::fTree |
private
(pruned) decision tree
Definition at line 99 of file CCPruner.h.
| const DataSet* TMVA::CCPruner::fValidationDataSet |
private
the event sample (as a DataSet) used to select the optimally-pruned tree
Definition at line 95 of file CCPruner.h.
| const EventList* TMVA::CCPruner::fValidationSample |
private
the event sample (as an EventList) used to select the optimally-pruned tree
Definition at line 94 of file CCPruner.h.