Logo ROOT  
Reference Guide
 
_rdataframe.pyzdoc
You can use RDataFrame in Python thanks to the dynamic Python/C++ translation of [PyROOT](https://root.cern/manual/python). In general, the interface
is the same as in C++; a simple example follows.

~~~{.py}
import ROOT

df = ROOT.RDataFrame("myTree", "myFile.root")
sum = df.Filter("x > 10").Sum("y")
print(sum.GetValue())
~~~

### User code in the RDataFrame workflow

#### C++ code

In the simple example shown above, a C++ expression is passed to the Filter() operation as a string
(`"x > 10"`), even though we call the method from Python. Indeed, under the hood, the analysis computations run in
C++, while Python serves as the interface language.

To perform more complex operations that don't fit into a simple expression string, you can just-in-time compile
C++ functions, via the C++ interpreter cling, and use those functions in an expression. See the following
snippet for an example:

~~~{.py}
# JIT a C++ function from Python
ROOT.gInterpreter.Declare("""
bool myFilter(float x) {
    return x > 10;
}
""")

df = ROOT.RDataFrame("myTree", "myFile.root")
# Use the function in an RDF operation
sum = df.Filter("myFilter(x)").Sum("y")
print(sum.GetValue())
~~~

You can also load a pre-compiled C++ library that provides the function, together with its header, and use the function in the same way:

~~~{.py}
ROOT.gSystem.Load("path/to/myLibrary.so") # Library with the myFilter function
ROOT.gInterpreter.Declare('#include "myLibrary.h"') # Header with the declaration of the myFilter function
df = ROOT.RDataFrame("myTree", "myFile.root")
sum = df.Filter("myFilter(x)").Sum("y")
print(sum.GetValue())
~~~

A more thorough explanation of how to use C++ code from Python can be found in the [PyROOT manual](https://root.cern/manual/python/#loading-user-libraries-and-just-in-time-compilation-jitting).

#### Python code

ROOT also offers the option to compile Python functions with fundamental types and arrays thereof using [Numba](https://numba.pydata.org/).
Such compiled functions can then be used in a C++ expression provided to RDataFrame.
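
The Python function to compile is decorated with `ROOT.Numba.Declare`, which takes the list of parameter types and the return type:

~~~{.py}
@ROOT.Numba.Declare(["float"], "bool")
def myFilter(x):
    return x > 10

df = ROOT.RDataFrame("myTree", "myFile.root")
sum = df.Filter("Numba::myFilter(x)").Sum("y")
print(sum.GetValue())
~~~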
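
It also works with collections: `RVec` objects of fundamental types are converted to and from NumPy arrays:

~~~{.py}
@ROOT.Numba.Declare(['RVec<float>', 'int'], 'RVec<float>')
def pypowarray(numpyvec, pow):
    return numpyvec**pow

df.Define('array', 'ROOT::RVecF{1.,2.,3.}')\
  .Define('arraySquared', 'Numba::pypowarray(array, 2)')
~~~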
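
Because the function body operates on a NumPy array, its logic can be checked in plain Python before it is decorated. The following is a minimal sketch, assuming NumPy is installed; `pypowarray_py` is a hypothetical undecorated copy of the function above, and the `float32` input is an assumption standing in for the `RVec<float>` argument.

```python
import numpy as np

def pypowarray_py(numpyvec, pow):
    # Same body as the decorated function, but without ROOT.Numba.Declare
    return numpyvec**pow

# float32 array standing in for the RVec<float> argument (assumption)
array = np.array([1., 2., 3.], dtype=np.float32)
array_squared = pypowarray_py(array, 2)
print(array_squared)
```

Once the plain function behaves as expected, the decorator can be added to make it available in RDataFrame expressions.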

### Interoperability with NumPy

#### Conversion to NumPy arrays

Eventually, you probably would like to inspect the content of the RDataFrame or process the data further
with Python libraries. For this purpose, the `AsNumpy()` function returns the columns of your RDataFrame as a dictionary of NumPy arrays.

##### Scalar columns
If your column contains scalar values of fundamental types (e.g., integers, floats), `AsNumpy()` produces NumPy arrays with the appropriate `dtype`:
~~~{.py}
rdf = ROOT.RDataFrame(10).Define("int_col", "1").Define("float_col", "2.3")
print(rdf.AsNumpy(["int_col", "float_col"]))
# Output: {'int_col': array([...], dtype=int32), 'float_col': array([...], dtype=float64)}
~~~
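
Since the result is a plain dictionary of NumPy arrays, the columns can be processed with ordinary NumPy operations. The sketch below does not call ROOT: `cols` is a hand-built stand-in for the dictionary shown in the output above (an assumption, so that the snippet runs without ROOT), and the mask is only an illustration.

```python
import numpy as np

# Stand-in for rdf.AsNumpy(["int_col", "float_col"]) on a 10-entry RDataFrame
cols = {
    "int_col": np.full(10, 1, dtype=np.int32),
    "float_col": np.full(10, 2.3, dtype=np.float64),
}

# Every column holds one value per entry
n_entries = {name: len(values) for name, values in cols.items()}

# Column-wise arithmetic and selection with a boolean mask
total = cols["int_col"] + cols["float_col"]
selected = cols["float_col"][cols["int_col"] > 0]
print(n_entries, total.dtype, selected.size)
```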

##### Collection columns
If your column contains collections of fundamental types (e.g., std::vector<int>), `AsNumpy()` produces a NumPy array with `dtype=object` where each
element is a NumPy array holding the collection of the corresponding entry.

If the collection at a certain entry contains values of fundamental types, or if it is a regularly shaped multi-dimensional array of a fundamental type,
the NumPy array for that entry has the `dtype` of the collection's value type:
~~~{.py}
rdf = rdf.Define("v_col", "std::vector<int>{{1, 2, 3}}")
print(rdf.AsNumpy(["v_col", "int_col", "float_col"]))
# Output: {'v_col': array([array([1, 2, 3], dtype=int32), ...], dtype=object), ...}
~~~
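
An object array of per-entry arrays needs slightly different handling than a scalar column. The sketch below builds a stand-in for `v_col` by hand (an assumption, so that it runs without ROOT) and shows one way to obtain per-entry sizes and a flat array of all values.

```python
import numpy as np

# Hand-built stand-in for the 'v_col' array: one inner int32 array per entry
entries = [np.array([1, 2, 3], dtype=np.int32) for _ in range(4)]
v_col = np.empty(len(entries), dtype=object)
for i, entry in enumerate(entries):
    v_col[i] = entry  # assign element-wise so NumPy keeps an array of arrays

# Per-entry sizes and a flat array over all values
sizes = np.array([len(entry) for entry in v_col])
flat = np.concatenate(list(v_col))
print(v_col.dtype, sizes, flat.dtype, flat.shape)
```

The inner arrays keep their own `dtype`, so the flattened result is again a regular numeric array.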

#### Processing data stored in NumPy arrays

If your data is stored in NumPy arrays and you want to process it with ROOT, you can
create an RDataFrame using `ROOT.RDF.FromNumpy`. The factory function accepts a dictionary where
the keys are the column names and the values are NumPy arrays, and returns a new RDataFrame with the provided
columns.

Data is read directly from the arrays: no copies are performed.

~~~{.py}
import numpy
import ROOT

# Read data from NumPy arrays
# The column names in the RDataFrame are taken from the dictionary keys
x, y = numpy.array([1, 2, 3]), numpy.array([4, 5, 6])
df = ROOT.RDF.FromNumpy({"x": x, "y": y})

# Use RDataFrame as usual, e.g. write out a ROOT file
df.Define("z", "x + y").Snapshot("tree", "file.root")
~~~
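
Before handing a dictionary to `ROOT.RDF.FromNumpy`, it can be useful to sanity-check it in plain Python. The sketch below is not part of ROOT: `check_columns` is a hypothetical helper, and the equal-length check reflects the assumption that all columns of one dataset should have the same number of entries.

```python
import numpy as np

def check_columns(columns):
    """Hypothetical helper: validate a {name: array} dict before passing it to FromNumpy."""
    lengths = set()
    for name, values in columns.items():
        if not isinstance(name, str):
            raise TypeError(f"column name {name!r} is not a string")
        if not isinstance(values, np.ndarray):
            raise TypeError(f"column {name!r} is not a NumPy array")
        lengths.add(len(values))
    # Assumption: columns of one dataset should all have the same number of entries
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    return columns

columns = check_columns({"x": np.array([1, 2, 3]), "y": np.array([4, 5, 6])})
# df = ROOT.RDF.FromNumpy(columns)  # requires ROOT, so it is left commented out here
print(list(columns))
```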

### Interoperability with [AwkwardArray](https://awkward-array.org/doc/main/user-guide/how-to-convert-rdataframe.html)

To convert an RDataFrame to Awkward, use ak.from_rdataframe(). Its `columns` argument takes a tuple of strings naming the RDataFrame columns. By default, the function returns an ak.Array.

~~~{.py}
import awkward as ak
import ROOT

array = ak.from_rdataframe(
    df,
    columns=(
        "x",
        "y",
        "z",
    ),
)
~~~

To convert Awkward arrays to an RDataFrame, use ak.to_rdataframe().

The function takes a dictionary of the form { <column name string> : <awkward array> } and always returns an RDataFrame object.

The arrays given for each column must have equal length:

~~~{.py}
array_x = ak.Array(
    [
        {"x": [1.1, 1.2, 1.3]},
        {"x": [2.1, 2.2]},
        {"x": [3.1]},
        {"x": [4.1, 4.2, 4.3, 4.4]},
        {"x": [5.1]},
    ]
)
array_y = ak.Array([1, 2, 3, 4, 5])
array_z = ak.Array([[1.1], [2.1, 2.3, 2.4], [3.1], [4.1, 4.2, 4.3], [5.1]])

# All columns must have the same number of entries
assert len(array_x) == len(array_y) == len(array_z)

df = ak.to_rdataframe({"x": array_x, "y": array_y, "z": array_z})
~~~

### Construct histogram and profile models from a tuple

The Histo1D(), Histo2D(), Histo3D(), Profile1D() and Profile2D() methods return
histograms and profiles, respectively, which can be constructed from a model argument.

In Python, you can specify the constructor arguments of such a histogram or
profile model with a Python tuple, as shown in the example below:

~~~{.py}
# First argument is a tuple with the arguments to construct a TH1D model
h = df.Histo1D(("histName", "histTitle", 64, 0., 128.), "myColumn")
~~~
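
Because the model is an ordinary tuple, it can be built programmatically, for example when booking the same kind of histogram for several columns. The sketch below is plain Python and makes no ROOT calls; `th1d_model`, the `binning` dictionary and the column name `otherColumn` are hypothetical, and the tuple layout follows the example above.

```python
def th1d_model(column, nbins, xlow, xhigh):
    """Hypothetical helper: build the (name, title, nbins, xlow, xhigh) tuple for one column."""
    return (f"h_{column}", f"Distribution of {column}", nbins, xlow, xhigh)

# Hypothetical binning per column: (number of bins, lower edge, upper edge)
binning = {"myColumn": (64, 0., 128.), "otherColumn": (50, -5., 5.)}
models = {col: th1d_model(col, *bins) for col, bins in binning.items()}

# With a ROOT RDataFrame `df` at hand, each tuple would be passed as the first argument:
# histos = {col: df.Histo1D(model, col) for col, model in models.items()}
print(models["myColumn"])
```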

### AsRNode helper function

The ROOT::RDF::AsRNode function casts an RDataFrame node to the generic ROOT::RDF::RNode type. From Python, it can be used to pass any RDataFrame node as an argument of a C++ function, as shown below:

~~~{.py}
ROOT.gInterpreter.Declare("""
ROOT::RDF::RNode MyTransformation(ROOT::RDF::RNode df) {
    auto myFunc = [](float x){ return -x;};
    return df.Define("y", myFunc, {"x"});
}
""")

# Cast the RDataFrame head node
df = ROOT.RDataFrame("myTree", "myFile.root")
df_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df))

# ... or any other node
df2 = df.Filter("x > 42")
df2_transformed = ROOT.MyTransformation(ROOT.RDF.AsRNode(df2))
~~~