Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
BernsteinCorrection.cxx
Go to the documentation of this file.
1// @(#)root/roostats:$Id$
2// Author: Kyle Cranmer 28/07/2008
3
4/*************************************************************************
5 * Copyright (C) 1995-2008, Rene Brun and Fons Rademakers. *
6 * All rights reserved. *
7 * *
8 * For the licensing terms see $ROOTSYS/LICENSE. *
9 * For the list of contributors see $ROOTSYS/README/CREDITS. *
10 *************************************************************************/
11
12/** \class RooStats::BernsteinCorrection
13 \ingroup Roostats
14
15BernsteinCorrection is a utility in RooStats to augment a nominal PDF with a polynomial
16correction term. This is useful for incorporating systematic variations to the nominal PDF.
17The Bernstein basis polynomials are particularly appropriate because they are positive definite.
18
19This tool was inspired by the work of Glen Cowan together with Stephan Horner, Sascha Caron,
20Eilam Gross, and others.
21The initial implementation is independent work. The major step forward in the approach was
22to provide a well defined algorithm that specifies the order of polynomial to be included
23in the correction. This is an empirical algorithm, so in addition to the nominal model it
24needs either a real data set or a simulated one. In the early work, the nominal model was taken
25to be a histogram from Monte Carlo simulations, but in this implementation it is generalized to an
26arbitrary PDF (which includes a RooHistPdf). The algorithm basically consists of a
27hypothesis test of an nth-order correction (null) against a n+1-th order correction (alternate).
28The quantity q = -2 log LR is used to determine whether the n+1-th order correction is a major
29improvement to the n-th order correction. The distribution of q is expected to be roughly
30\f$\chi^2\f$ with one degree of freedom if the n-th order correction is a good model for the data.
31 Thus, one only moves to the n+1-th order correction of q is relatively large. The chance that
32one moves from the n-th to the n+1-th order correction when the n-th order correction
33(eg. a type 1 error) is sufficient is given by the Prob(\f$\chi^2_1\f$ > threshold). The constructor
34of this class allows you to directly set this tolerance (in terms of probability that the n+1-th
35 term is added unnecessarily).
36
37To do:
38Add another method to the utility that will make the sampling distribution for -2 log lambda
39for various m vs. m+1 order corrections using a nominal model and perhaps having two ways of
40generating the toys (either via a histogram or via an independent model that is supposed to
41 reflect reality). That will allow one to make plots like Glen has at the end of his DRAFT
42 very easily.
43
44*/
45
46
48
49#include "RooGlobalFunc.h"
50#include "RooDataSet.h"
51#include "RooRealVar.h"
52#include "RooAbsPdf.h"
53#include "RooFitResult.h"
54#include "TMath.h"
55#include <string>
56#include <vector>
57#include <cstdio>
58#include <sstream>
59#include <iostream>
60
61#include "RooEffProd.h"
62#include "RooWorkspace.h"
63#include "RooDataHist.h"
64#include "RooHistPdf.h"
65
66#include "RooBernstein.h"
67
69
70
71
72using namespace RooFit;
73using namespace RooStats;
74using std::vector;
75
76////////////////////////////////////////////////////////////////////////////////
77
79 fMaxDegree(10), fMaxCorrection(100), fTolerance(tolerance){
80}
81
82
83////////////////////////////////////////////////////////////////////////////////
84/// Main method for Bernstein correction.
85
87 const char* nominalName,
88 const char* varName,
89 const char* dataName){
90 // get ingredients out of workspace
91 RooRealVar* x = wks->var(varName);
92 RooAbsPdf* nominal = wks->pdf(nominalName);
93 RooAbsData* data = wks->data(dataName);
94
95 if (!x || !nominal || !data) {
96 std::cout << "Error: wrong name for pdf or variable or dataset - return -1 " << std::endl;
97 return -1;
98 }
99
100 std::cout << "BernsteinCorrection::ImportCorrectedPdf - Doing initial Fit with nominal model " << std::endl;
101
102 // initialize alg, by checking how well nominal model fits
104
105 std::unique_ptr<RooFitResult> nominalResult{nominal->fitTo(*data,Save(),Minos(false), Hesse(false),PrintLevel(printLevel))};
106 double lastNll= nominalResult->minNll();
107
108 if (nominalResult->status() != 0 ) {
109 std::cout << "BernsteinCorrection::ImportCorrectedPdf - Error fit with nominal model failed - exit" << std::endl;
110 return -1;
111 }
112
113 // setup a log
114 std::stringstream log;
115 log << "------ Begin Bernstein Correction Log --------" << std::endl;
116
117 // Local variables that we want to keep in scope after loop
119 vector<RooRealVar*> coefficients;
120 double q = 1E6;
121 Int_t degree = -1;
122
123 // The while loop
124 bool keepGoing = true;
125 while( keepGoing ) {
126 degree++;
127
128 // we need to generate names for vars on the fly
129 std::stringstream str;
130 str<<"_"<<degree;
131
132 RooRealVar* newCoef = new RooRealVar(("c"+str.str()).c_str(),
133 "Bernstein basis poly coefficient",
134 1.0, 0., fMaxCorrection);
135 coeff.add(*newCoef);
136 coefficients.push_back(newCoef);
137 // Since pdf is normalized - coefficient for degree 0 is fixed to be 1
138 if (degree == 0) {
139 newCoef->setVal(1);
140 newCoef->setConstant(true);
141 continue;
142 }
143
144 // make the polynomial correction term
145 RooBernstein* poly = new RooBernstein("poly", "Bernstein poly", *x, coeff);
146
147 // make the corrected PDF = nominal * poly
148 RooEffProd* corrected = new RooEffProd("corrected","",*nominal,*poly);
149
150 // check to see how well this correction fits
151 std::unique_ptr<RooFitResult> result{corrected->fitTo(*data,Save(),Minos(false), Hesse(false),PrintLevel(printLevel))};
152
153 if (result->status() != 0) {
154 std::cout << "BernsteinCorrection::ImportCorrectedPdf - Error fit with corrected model failed" << std::endl;
155 return -1;
156 }
157
158
159 // Hypothesis test between previous correction (null)
160 // and this one (alternate). Use -2 log LR for test statistic
161 q = 2*(lastNll - result->minNll()); // -2 log lambda, goes like significance^2
162 // check if we should keep going based on rate of Type I error
163 keepGoing = (degree < 1 || TMath::Prob(q,1) < fTolerance );
164 if (degree >= fMaxDegree) keepGoing = false;
165
166 if(!keepGoing){
167 // terminate loop, import corrected PDF
168 //RooMsgService::instance().setGlobalKillBelow(RooFit::FATAL) ;
169 wks->import(*corrected);
170 //RooMsgService::instance().setGlobalKillBelow(RooFit::DEBUG) ;
171 } else {
172 // memory management
173 delete corrected;
174 delete poly;
175 }
176
177 // for the log
178 if(degree != 0){
179 log << "degree = " << degree
180 << " -log L("<<degree-1<<") = " << lastNll
181 << " -log L(" << degree <<") = " << result->minNll()
182 << " q = " << q
183 << " P(chi^2_1 > q) = " << TMath::Prob(q,1) << std::endl;
184 }
185
186 // update last result for next iteration in loop
187 lastNll = result->minNll();
188 }
189
190 log << "------ End Bernstein Correction Log --------" << std::endl;
191 std::cout << log.str();
192
193 return degree;
194}
195
196
197////////////////////////////////////////////////////////////////////////////////
198/// Create sampling distribution for q given degree-1 vs. degree corrections
199
201 const char* nominalName,
202 const char* varName,
203 const char* dataName,
206 Int_t degree,
207 Int_t nToys){
208 // get ingredients out of workspace
209 RooRealVar* x = wks->var(varName);
210 RooAbsPdf* nominal = wks->pdf(nominalName);
211 RooAbsData* data = wks->data(dataName);
212
213 if (!x || !nominal || !data) {
214 std::cout << "Error: wrong name for pdf or variable or dataset ! " << std::endl;
215 return;
216 }
217
218 // setup a log
219 std::stringstream log;
220 log << "------ Begin Bernstein Correction Log --------" << std::endl;
221
222 // Local variables that we want to keep in scope after loop
223 RooArgList coeff; // n-th degree correction
224 RooArgList coeffNull; // n-1 correction
225 RooArgList coeffExtra; // n+1 correction
226 vector<RooRealVar*> coefficients;
227
228 //cout << "make coefs" << std::endl;
229 for(int i = 0; i<=degree+1; ++i) {
230 // we need to generate names for vars on the fly
231 std::stringstream str;
232 str<<"_"<<i;
233
234 RooRealVar* newCoef = new RooRealVar(("c"+str.str()).c_str(),
235 "Bernstein basis poly coefficient",
236 1., 0., fMaxCorrection);
237
238 // keep three sets of coefficients for n-1, n, n+1 corrections
239 if(i<degree) coeffNull.add(*newCoef);
240 if(i<=degree) coeff.add(*newCoef);
241 coeffExtra.add(*newCoef);
242 coefficients.push_back(newCoef);
243 }
244
245 // make the polynomial correction term
247 = new RooBernstein("poly", "Bernstein poly", *x, coeff);
248
249 // make the polynomial correction term
251 = new RooBernstein("polyNull", "Bernstein poly", *x, coeffNull);
252
253 // make the polynomial correction term
255 = new RooBernstein("polyExtra", "Bernstein poly", *x, coeffExtra);
256
257 // make the corrected PDF = nominal * poly
259 = new RooEffProd("corrected","",*nominal,*poly);
260
262 = new RooEffProd("correctedNull","",*nominal,*polyNull);
263
265 = new RooEffProd("correctedExtra","",*nominal,*polyExtra);
266
267
268 std::cout << "made pdfs, make toy generator" << std::endl;
269
270 // make a PDF to generate the toys
271 RooDataHist dataHist("dataHist","",*x,*data);
272 RooHistPdf toyGen("toyGen","",*x,dataHist);
273
275
277 if (printLevel < 0) {
278 RooMsgService::instance().setGlobalKillBelow(RooFit::FATAL);
279 }
280
281
282 // TH1F* samplingDist = new TH1F("samplingDist","",20,0,10);
283 double q = 0;
284 double qExtra = 0;
285 // do toys
286 for (int i = 0; i < nToys; ++i) {
287 std::cout << "on toy " << i << std::endl;
288
289 std::unique_ptr<RooDataSet> tmpData{toyGen.generate(*x, data->numEntries())};
290 // check to see how well this correction fits
291 std::unique_ptr<RooFitResult> result{
292 corrected->fitTo(*tmpData, Save(), Minos(false), Hesse(false), PrintLevel(printLevel))};
293
294 std::unique_ptr<RooFitResult> resultNull{
295 correctedNull->fitTo(*tmpData, Save(), Minos(false), Hesse(false), PrintLevel(printLevel))};
296
297 std::unique_ptr<RooFitResult> resultExtra{
298 correctedExtra->fitTo(*tmpData, Save(), Minos(false), Hesse(false), PrintLevel(printLevel))};
299
300 // Hypothesis test between previous correction (null)
301 // and this one (alternate). Use -2 log LR for test statistic
302 q = 2 * (resultNull->minNll() - result->minNll());
303
304 qExtra = 2 * (result->minNll() - resultExtra->minNll());
305
306 samplingDist->Fill(q);
308 if (printLevel > 0) {
309 std::cout << "NLL Results: null " << resultNull->minNll() << " ref = " << result->minNll() << " extra"
310 << resultExtra->minNll() << std::endl;
311 }
312 }
313
314 RooMsgService::instance().setGlobalKillBelow(msglevel);
315
316 // return samplingDist;
317}
ROOT::Detail::TRangeCast< T, true > TRangeDynCast
TRangeDynCast is an adapter class that allows the typed iteration through a TCollection.
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void data
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void char Point_t Rectangle_t WindowAttributes_t Float_t Float_t Float_t Int_t Int_t UInt_t UInt_t Rectangle_t result
float * q
Abstract base class for binned and unbinned datasets.
Definition RooAbsData.h:57
Abstract interface for all probability density functions.
Definition RooAbsPdf.h:40
RooFit::OwningPtr< RooFitResult > fitTo(RooAbsData &data, CmdArgs_t const &... cmdArgs)
Fit PDF to given dataset.
Definition RooAbsPdf.h:157
RooArgList is a container object that can hold multiple RooAbsArg objects.
Definition RooArgList.h:22
Bernstein basis polynomials are positive-definite in the range [0,1].
Container class to hold N-dimensional binned data.
Definition RooDataHist.h:40
The class RooEffProd implements the product of a PDF with an efficiency function.
Definition RooEffProd.h:19
A probability density function sampled from a multidimensional histogram.
Definition RooHistPdf.h:30
static RooMsgService & instance()
Return reference to singleton instance.
Variable that can be changed from the outside.
Definition RooRealVar.h:37
Int_t ImportCorrectedPdf(RooWorkspace *, const char *, const char *, const char *)
Main method for Bernstein correction.
double fMaxCorrection
maximum correction factor at any point (default is 100)
BernsteinCorrection(double tolerance=0.05)
double fTolerance
probability to add an unnecessary term
Int_t fMaxDegree
maximum polynomial degree correction (default is 10)
void CreateQSamplingDist(RooWorkspace *wks, const char *nominalName, const char *varName, const char *dataName, TH1F *, TH1F *, Int_t degree, Int_t nToys=500)
Create sampling distribution for q given degree-1 vs. degree corrections.
Persistable container for RooFit projects.
1-D histogram with a float per channel (see TH1 documentation)
Definition TH1.h:877
RooCmdArg Hesse(bool flag=true)
RooCmdArg Save(bool flag=true)
RooCmdArg Minos(bool flag=true)
RooCmdArg PrintLevel(Int_t code)
Double_t x[n]
Definition legend1.C:17
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition CodegenImpl.h:65
MsgLevel
Verbosity level for RooMsgService::StreamConfig in RooMsgService.
Namespace for the RooStats classes.
Definition CodegenImpl.h:59
Double_t Prob(Double_t chi2, Int_t ndf)
Computation of the probability for a certain Chi-squared (chi2) and number of degrees of freedom (ndf...
Definition TMath.cxx:637