ROOT Reference Guide
 
hadd.cxx
1/**
2 \file hadd.cxx
3 \brief This program will merge compatible ROOT objects, such as histograms, Trees and RNTuples,
4 from a list of root files and write them to a target root file.
5 In order for a ROOT object to be mergeable, it must implement the Merge() function.
6 Non-mergeable objects will have all instances copied as-is into the target file.
7 The target file must not be identical to one of the source files.
8
9 Syntax:
10 ```{.cpp}
11 hadd [flags] targetfile source1 source2 ... [flags]
12 ```
13
14 Flags can be passed before or after the positional arguments.
15 The first positional (non-flag) argument will be interpreted as the targetfile.
16 After that, the first sequence of positional arguments will be interpreted as the input files.
17 If two sequences of positional arguments are separated by flags, hadd will emit an error and abort.
18
19 By default, any argument starting with `-` is interpreted as a flag. If you want to pass filenames
20 starting with `-` you need to pass them after `--`:
21 ```{.cpp}
22 hadd [flags] -- -file1 -file2 ...
23 ```
24 Note that in this case you need to pass ALL positional arguments after `--`.
25
26 If a flag requires an argument, the argument can be specified in any of these ways:
27
28 # All equally valid:
29 -j 16
30 -j16
31 -j=16
32
33 The first syntax is the preferred one since it's backward-compatible with previous versions of hadd.
34 The -f flag is an exception to this rule: it only supports the `-f[0-9]` syntax.
35
36 Note that merging multiple flags is NOT supported: `-jfa` will be interpreted as -j=fa, which is invalid!
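
As a rough illustration of the three accepted spellings, the sketch below extracts a flag's value from `-j 16`, `-j16` or `-j=16`. `FlagValue` is a hypothetical helper written for this example. It is not hadd's own parser (see `FlagArg` further down for that), and it leaves out the type conversion and default-value handling that the real code performs.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, not part of hadd. Extracts the value of `flag` (given
// without the leading '-') from args[idx]. When the value sits in the next
// argument, idx is advanced to point at it. Returns nullopt if args[idx] is
// not this flag or if the value is missing.
std::optional<std::string> FlagValue(const std::vector<std::string> &args, std::size_t &idx, const std::string &flag)
{
   assert(idx < args.size());
   const std::string &arg = args[idx];
   // The argument must look like "-<flag>..."
   if (arg.size() < flag.size() + 1 || arg[0] != '-' || arg.compare(1, flag.size(), flag) != 0)
      return std::nullopt;

   std::string rest = arg.substr(1 + flag.size());
   if (rest.empty()) {
      // "-j 16": the value is the following argument
      if (idx + 1 >= args.size())
         return std::nullopt;
      return args[++idx];
   }
   // "-j=16": drop a single '='; "-j16": use the remainder as it is
   if (rest[0] == '=')
      rest.erase(0, 1);
   return rest;
}
```

With the arguments `-j 16` the helper returns `16` and moves the index onto the value. With `-jfa` it returns `fa`, which shows why merged flags cannot work: everything after the flag name is taken as its value.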
37
38 The flags are as follows:
39
40 \param -a Append to the output
41 \param -cachesize <SIZE> Resize the prefetching cache used to speed up I/O operations (use 0 to disable).
42 \param -d <DIR> Carry out the partial multiprocess execution in the specified directory
43 \param -dbg Enable verbosity. If -j was specified, do not delete the partial files
44 stored inside the working directory.
45 \param -experimental-io-features <FEATURES> Enables the corresponding experimental feature for output trees.
46 \see ROOT::Experimental::EIOFeatures
47 \param -f Force overwriting of output file.
48 \param -f[0-9] Set target compression level. 0 = uncompressed, 9 = highly compressed. Default is 101
49 (kDefaultZLIB). You can also specify the full compression algorithm, e.g. -f505.
50 \param -fk Sets the target file to contain the baskets with the same compression as the input files
51 (unless -O is specified). Compresses the meta data using the compression level specified
52 in the first input or the compression setting after fk (for example 505 when using -fk505)
53 \param -ff The compression level used is the one specified in the first input
54 \param -j [N_JOBS] Parallelise the execution in `N_JOBS` processes. If the number of processes is not specified,
55 or is 0, use the system maximum.
56 \param -k Skip corrupt or non-existent files, do not exit
57 \param -L <FILE> Read the list of objects from FILE and either only merge or skip those objects depending on
58 the value of "-Ltype". FILE must contain one object name per line, which cannot contain
59 whitespaces or '/'. You can also pass TDirectory names, which apply to the entire directory
60 content. Lines beginning with '#' are ignored. If this flag is passed, "-Ltype" MUST be
61 passed as well.
62 \param -Ltype <SkipListed|OnlyListed> Sets the type of operation performed on the objects listed in FILE given with the
63 "-L" flag. "SkipListed" will skip all the listed objects; "OnlyListed" will only merge those
64 objects. If this flag is passed, "-L" must be passed as well.
65 \param -n <N_FILES> Open at most `N_FILES` files at once (use 0 to request the system maximum, which is also
66 the default)
67 \param -O Re-optimize basket size when merging TTree
68 \param -T Do not merge Trees
69 \param -v [LEVEL] Explicitly set the verbosity level: 0 requests no output, 99 is the default
70 \return hadd returns a status code: 0 if OK, 1 otherwise
71
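The number accepted by -f packs two pieces of information: ROOT encodes a compression setting as 100 * algorithm + level, so 505 means algorithm 5 at level 5, and the default 101 is algorithm 1 at level 1. The sketch below decodes a setting using the same range checks as `ValidCompressionSettings` in this file. `CompressionSetting`, `IsValidSetting` and `Decode` are names made up for this illustration and do not exist in hadd.

```cpp
#include <cassert>

// Hypothetical type for illustration only.
struct CompressionSetting {
   int algorithm; // hundreds digit, e.g. 1 in the default 101, 5 in 505
   int level;     // 0 (uncompressed) to 9 (highly compressed)
};

// Same acceptance rule as ValidCompressionSettings: 0, the deprecated
// single-digit aliases 1..9, or 100..509 with a 0 as the middle digit.
bool IsValidSetting(int setting)
{
   if (setting >= 0 && setting <= 9)
      return true;
   return setting >= 100 && setting <= 509 && (setting / 10) % 10 == 0;
}

// Splits a valid setting into algorithm and level.
CompressionSetting Decode(int setting)
{
   assert(IsValidSetting(setting));
   if (setting >= 1 && setting <= 9)
      setting += 100; // deprecated alias: N is read as 10N
   return {setting / 100, setting % 100};
}
```

For instance, `Decode(505)` yields algorithm 5 and level 5, while a value such as 110 is rejected because its middle digit is not 0.
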
72 For example, assume 3 files f1, f2, f3 containing histograms hn and Trees Tn:
73 - f1 with h1 h2 h3 T1
74 - f2 with h1 h4 T1 T2
75 - f3 with h5
76 the result of
77 ```
78 hadd -f x.root f1.root f2.root f3.root
79 ```
80 will be a file x.root with h1 h2 h3 h4 h5 T1 T2
81 where
82 - h1 will be the sum of the 2 histograms in f1 and f2
83 - T1 will be the merge of the Trees in f1 and f2
84
85 The files may contain sub-directories.
86
87 If the source files contain histograms and Trees, one can skip
88 the Trees with
89 ```
90 hadd -T targetfile source1 source2 ...
91 ```
92
93 Wildcarding and indirect files are also supported
94 ```
95 hadd result.root myfil*.root
96 ```
97 will merge all files in myfil*.root
98 ```
99 hadd result.root file1.root @list.txt file2.root myfil*.root
100 ```
101 will merge file1.root, file2.root, all files in myfil*.root
102 and all files in the indirect text file list.txt ("@" as the first
103 character of the file indicates an indirect file. An indirect file
104 is a text file containing a list of other files, including other
105 indirect files, one line per file).
106
107 If the source and target compression levels are identical (default),
108 the program uses the TChain::Merge function with option "fast", i.e.
109 the merge is done without unzipping or unstreaming the baskets
110 (a direct copy of the raw bytes on disk). The "fast" mode is typically
111 5 times faster than the mode that unzips and unstreams the baskets.
112
113 If the option -cachesize is used, hadd will resize (or disable if 0) the
114 prefetching cache used to speed up I/O operations.
115
116 For options that take a size as argument, a decimal number of bytes is expected.
117 If the number ends with a `k`, `m`, `g`, etc., the number is multiplied
118 by 1000 (1kB), 1000000 (1MB), 1000000000 (1GB), etc.
119 If this prefix is followed by `i`, the number is instead multiplied by the traditional
120 1024 (1KiB), 1048576 (1MiB), 1073741824 (1GiB), etc.
121 The prefix can be optionally followed by `B`, whose casing is ignored,
122 e.g. 1k, 1K, 1Kb and 1KB are the same.
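
The suffix rules above can be sketched as follows. `ParseSize` is a hypothetical helper, not the routine hadd or ROOT actually uses, and it simplifies matters: it accepts only whole numbers, knows only the `k`, `m`, `g` and `t` prefixes, and does not guard against overflow.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>

// Lower-cases one character (std::tolower expects an unsigned char value).
static char Lower(char c)
{
   return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical helper: converts strings such as "16", "1k", "1KiB" or "2MB"
// into a number of bytes. Returns nullopt for malformed input.
std::optional<std::uint64_t> ParseSize(const std::string &text)
{
   std::size_t pos = 0;
   std::uint64_t value = 0;
   while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
      value = value * 10 + static_cast<std::uint64_t>(text[pos] - '0');
      ++pos;
   }
   if (pos == 0)
      return std::nullopt; // the string must start with digits

   std::uint64_t base = 1000; // decimal multiplier unless 'i' follows the prefix
   std::size_t power = 0;     // 1 for k, 2 for m, 3 for g, 4 for t
   if (pos < text.size()) {
      const std::string prefixes = "kmgt";
      const auto found = prefixes.find(Lower(text[pos]));
      if (found != std::string::npos) {
         power = found + 1;
         ++pos;
         if (pos < text.size() && Lower(text[pos]) == 'i') {
            base = 1024;
            ++pos;
         }
      }
   }
   // Optional trailing 'B', in either case
   if (pos < text.size() && Lower(text[pos]) == 'b')
      ++pos;
   if (pos != text.size())
      return std::nullopt; // trailing characters that are not a known suffix

   for (std::size_t i = 0; i < power; ++i)
      value *= base;
   return value;
}
```

Under these assumptions `1k`, `1K`, `1Kb` and `1KB` all give 1000 bytes, while `1KiB` gives 1024.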
123
124 \note By default histograms are added. However hadd does not support the case where
125 histograms have their bit TH1::kIsAverage set.
126
127 \authors Rene Brun, Dirk Geppert, Sven A. Schmidt, Toby Burnett
128*/
129#include "Compression.h"
130#include "TClass.h"
131#include "TFile.h"
132#include "TFileMerger.h"
133#include "THashList.h"
134#include "TKey.h"
135#include "TSystem.h"
136#include "TUUID.h"
137
138#include <ROOT/RConfig.hxx>
139#include <ROOT/StringConv.hxx>
140#include <ROOT/TIOFeatures.hxx>
141
142#include "haddCommandLineOptionsHelp.h"
143
144#include <climits>
145#include <cstdlib>
146#include <filesystem>
147#include <fstream>
148#include <iostream>
149#include <optional>
150#include <sstream>
151#include <string>
152
153#ifndef R__WIN32
155#endif
156
157////////////////////////////////////////////////////////////////////////////////
158
159inline std::ostream &Err()
160{
161 std::cerr << "Error in <hadd>: ";
162 return std::cerr;
163}
164
165inline std::ostream &Warn()
166{
167 std::cerr << "Warning in <hadd>: ";
168 return std::cerr;
169}
170
171inline std::ostream &Info()
172{
173 std::cerr << "Info in <hadd>: ";
174 return std::cerr;
175}
176
177using IntFlag_t = uint32_t;
178
179struct HAddArgs {
182 bool fForce;
185 bool fDebug;
188
189 std::optional<std::string> fWorkingDir;
190 std::optional<IntFlag_t> fNProcesses;
191 std::optional<std::string> fObjectFilterFile;
192 std::optional<Int_t> fObjectFilterType;
193 std::optional<TString> fCacheSize;
194 std::optional<ROOT::TIOFeatures> fFeatures;
195 std::optional<IntFlag_t> fMaxOpenedFiles;
196 std::optional<IntFlag_t> fVerbosity;
197 std::optional<IntFlag_t> fCompressionSettings;
198
201 // This is set to true if and only if the user passed `--`. In this special
202 // case, we must not stop parsing positional arguments even if we find one
203 // that starts with a `-`.
205};
206
208
209static EFlagResult FlagToggle(const char *arg, const char *flagStr, bool &flagOut)
210{
211 const auto argLen = strlen(arg);
212 const auto flagLen = strlen(flagStr);
213 if (argLen == flagLen && strncmp(arg, flagStr, flagLen) == 0) {
214 if (flagOut)
215 Warn() << "duplicate flag: " << flagStr << "\n";
216 flagOut = true;
218 }
220}
221
222// NOTE: not using std::stoi or similar because they have bad error checking.
223// std::stoi will happily parse "120notvalid" as 120.
224static std::optional<IntFlag_t> StrToUInt(const char *str)
225{
226 if (!str)
227 return {};
228
229 uint32_t res = 0;
230 do {
231 if (!isdigit(*str))
232 return {};
233 if (res > (UINT32_MAX - (*str - '0')) / 10) // appending this digit would overflow: error
234 return {};
235 res *= 10;
236 res += *str - '0';
237 } while (*++str);
238
239 return res;
240}
241
242template <typename T>
247
248template <typename T>
249static FlagConvResult<T> ConvertArg(const char *);
250
251template <>
253{
254 return {arg, EFlagResult::kParsed};
255}
256
257template <>
259{
260 // Don't even try to parse arg if it doesn't look like a number.
261 if (!isdigit(*arg))
262 return {0, EFlagResult::kIgnored};
263
264 auto intOpt = StrToUInt(arg);
265 if (intOpt)
266 return {*intOpt, EFlagResult::kParsed};
267
268 Err() << "error parsing integer argument '" << arg << "'\n";
269 return {0, EFlagResult::kErr};
270}
271
272template <>
274{
276 std::stringstream ss;
277 ss.str(arg);
278 std::string item;
279 while (std::getline(ss, item, ',')) {
280 if (!features.Set(item))
281 Warn() << "ignoring unknown feature request: " << item << "\n";
282 }
284}
285
287{
288 TString cacheSize;
289 int size;
292 Err() << "could not parse the cache size passed after -cachesize: '" << arg << "'\n";
293 return {"", EFlagResult::kErr};
295 double m;
296 const char *munit = nullptr;
298 Warn() << "the cache size passed after -cachesize is too large: " << arg << " is greater than " << m << munit
299 << ". We will use the maximum value.\n";
300 return {std::to_string(m) + munit, EFlagResult::kParsed};
301 } else {
302 cacheSize = "cachesize=";
303 cacheSize.Append(arg);
304 }
305 return {cacheSize, EFlagResult::kParsed};
306}
307
309{
310 if (strcmp(arg, "SkipListed") == 0)
312 if (strcmp(arg, "OnlyListed") == 0)
314
315 Err() << "invalid argument for -Ltype: '" << arg << "'. Can only be 'SkipListed' or 'OnlyListed' (case matters).\n";
316 return {{}, EFlagResult::kErr};
317}
318
319// Parses a flag that is followed by an argument of type T.
320// If `defaultVal` is provided, the following argument is optional and will be set to `defaultVal` if missing.
321// `conv` is used to convert the argument from string to its type T.
322template <typename T>
323static EFlagResult
324FlagArg(int argc, char **argv, int &argIdxInOut, const char *flagStr, std::optional<T> &flagOut,
325 std::optional<T> defaultVal = std::nullopt, FlagConvResult<T> (*conv)(const char *) = ConvertArg<T>)
326{
327 int argIdx = argIdxInOut;
328 const char *arg = argv[argIdx] + 1;
329 int argLen = strlen(arg);
330 int flagLen = strlen(flagStr);
331 const char *nxtArg = nullptr;
332
333 if (strncmp(arg, flagStr, flagLen) != 0)
335
336 bool argIsSeparate = false;
337 if (argLen > flagLen) {
338 // interpret anything after the flag as the argument.
339 nxtArg = arg + flagLen;
340 // Ignore one '=', if present
341 if (nxtArg[0] == '=')
342 ++nxtArg;
343 } else if (argLen == flagLen) {
344 argIsSeparate = true;
345 if (argIdx + 1 < argc) {
346 ++argIdxInOut;
348 } else {
349 Err() << "expected argument after '-" << flagStr << "' flag.\n";
350 return EFlagResult::kErr;
351 }
352 } else {
354 }
355
356 auto converted = conv(nxtArg);
357 if (converted.fResult == EFlagResult::kParsed) {
358 flagOut = converted.fValue;
359 } else if (converted.fResult == EFlagResult::kIgnored) {
360 if (defaultVal && argIsSeparate) {
362 // If we had tried parsing the next argument, step back one arg idx.
364 } else {
365 Err() << "the argument after '-" << flagStr << "' flag was not of the expected type.\n";
366 return EFlagResult::kErr;
367 }
368 } else {
369 return EFlagResult::kErr;
370 }
371
373}
374
376{
377 // Must be a number between 0 and 509 (with a 0 in the middle)
378 if (compSettings == 0)
379 return true;
380 // We also accept [1-9] as aliases of [101-109], but it's discouraged.
381 if (compSettings >= 1 && compSettings <= 9) {
382 Warn() << "interpreting " << compSettings << " as " << 100 + compSettings
383 << "."
384 " This behavior is deprecated, please use the full compression settings.\n";
385 return true;
386 }
387 return (compSettings >= 100 && compSettings <= 509) && ((compSettings / 10) % 10 == 0);
388}
389
390// The -f flag has a somewhat complicated logic.
391// We have 4 cases:
392// 1. -f
393// 2. -ff
394// 3. -fk
395// 4. -f[0-509]
396//
397// and a combination thereof (e.g. -fk101, -ff202, -ffk, -fk209)
398// -ff and -f[0-509] are incompatible.
399//
400// ALL these flags imply '-f' ("force overwrite"), but only if they parse successfully.
401// This means that if we see a -f[something] and that "something" doesn't parse to a valid
402// number between 0 and 509, or f or k, we consider the flag invalid and skip it without
403// setting any state.
404//
405// Note that we don't allow `-f [0-9]` because that would be a backwards-incompatible
406// change with the previous arg parsing semantic, changing the meaning of a cmdline like:
407//
408// $ hadd -f 200 f.root g.root # <- '200' is the output file, not an argument to -f!
409static EFlagResult FlagF(const char *arg, HAddArgs &args)
410{
411 if (arg[0] != 'f')
413
414 args.fForce = true;
415 const char *cur = arg + 1;
416 while (*cur) {
417 switch (cur[0]) {
418 case 'f':
420 Warn() << "duplicate flag: -ff\n";
421 if (args.fCompressionSettings) {
422 std::cerr
423 << "[err] Cannot specify both -ff and -f[0-9]. Either use the first input compression or specify it.\n";
424 return EFlagResult::kErr;
425 } else
426 args.fUseFirstInputCompression = true;
427 break;
428 case 'k':
429 if (args.fKeepCompressionAsIs)
430 Warn() << "duplicate flag: -fk\n";
431 args.fKeepCompressionAsIs = true;
432 break;
433 default:
434 if (isdigit(cur[0])) {
435 if (args.fUseFirstInputCompression) {
436 Err() << "cannot specify both -ff and -f[0-9]. Either use the first input compression or "
437 "specify it.\n";
438 return EFlagResult::kErr;
439 } else if (!args.fCompressionSettings) {
440 if (auto compLv = StrToUInt(cur)) {
443 // we can't see any other argument after the number, so we return here to avoid
444 // incorrectly parsing the rest of the characters in `arg`.
446 } else {
447 Err() << *compLv << " is not a supported compression setting.\n";
448 return EFlagResult::kErr;
449 }
450 } else {
451 Err() << "failed to parse compression settings '" << cur << "' as an integer.\n";
452 return EFlagResult::kErr;
453 }
454 } else {
455 Err() << "cannot specify -f[0-9] multiple times!\n";
456 return EFlagResult::kErr;
457 }
458 } else {
459 Err() << "invalid flag: " << arg << "\n";
460 return EFlagResult::kErr;
461 }
462 }
463 ++cur;
464 }
465
467}
468
469// Returns nullopt if any of the flags failed to parse.
470// If an unknown flag is encountered, it will print a warning and go on.
471static std::optional<HAddArgs> ParseArgs(int argc, char **argv)
472{
473 HAddArgs args{};
474
475 enum {
481
482 for (int argIdx = 1; argIdx < argc; ++argIdx) {
483 const char *argRaw = argv[argIdx];
484 if (!*argRaw)
485 continue;
486
487 if (!args.fNoFlagsAfterPositionalArguments && argRaw[0] == '-' && argRaw[1] != '\0') {
488 if (argRaw[1] == '-' && argRaw[2] == '\0') {
489 // special case `--`: force parsing to consider all future args as positional arguments.
491 Err()
492 << "found `--`, but we've already parsed (or are still parsing) a sequence of positional arguments!"
493 " This is not supported: you must have exactly one sequence of positional arguments, so if you"
494 " need to use `--` make sure to pass *all* positional arguments after it.\n";
495 return {};
496 }
497 args.fNoFlagsAfterPositionalArguments = true;
498 continue;
499 }
500
501 // parse flag
503
504 const char *arg = argRaw + 1;
505 bool validFlag = false;
506
507#define PARSE_FLAG(func, ...) \
508 do { \
509 if (!validFlag) { \
510 const auto res = func(__VA_ARGS__); \
511 if (res == EFlagResult::kErr) \
512 return {}; \
513 validFlag = res == EFlagResult::kParsed; \
514 } \
515 } while (0)
516
517 // NOTE: if two flags have the same prefix (e.g. -Ltype and -L) always put the longest one first!
518 PARSE_FLAG(FlagToggle, arg, "T", args.fNoTrees);
519 PARSE_FLAG(FlagToggle, arg, "a", args.fAppend);
520 PARSE_FLAG(FlagToggle, arg, "k", args.fSkipErrors);
521 PARSE_FLAG(FlagToggle, arg, "O", args.fReoptimize);
522 PARSE_FLAG(FlagToggle, arg, "dbg", args.fDebug);
523 PARSE_FLAG(FlagArg, argc, argv, argIdx, "d", args.fWorkingDir);
524 PARSE_FLAG(FlagArg, argc, argv, argIdx, "j", args.fNProcesses, {0});
525 PARSE_FLAG(FlagArg, argc, argv, argIdx, "Ltype", args.fObjectFilterType, {}, ConvertFilterType);
526 PARSE_FLAG(FlagArg, argc, argv, argIdx, "L", args.fObjectFilterFile);
527 PARSE_FLAG(FlagArg, argc, argv, argIdx, "cachesize", args.fCacheSize, {}, ConvertCacheSize);
528 PARSE_FLAG(FlagArg, argc, argv, argIdx, "experimental-io-features", args.fFeatures);
529 PARSE_FLAG(FlagArg, argc, argv, argIdx, "n", args.fMaxOpenedFiles);
530 PARSE_FLAG(FlagArg, argc, argv, argIdx, "v", args.fVerbosity, {99});
531 PARSE_FLAG(FlagF, arg, args);
532
533#undef PARSE_FLAG
534
535 if (!validFlag)
536 Warn() << "unknown flag: " << argRaw << "\n";
537
538 } else if (!args.fOutputArgIdx) {
539 // First positional argument is the output
540 args.fOutputArgIdx = argIdx;
543 } else {
544 // We should be in the same positional argument group as the output, error otherwise
546 if (!args.fFirstInputIdx) {
547 args.fFirstInputIdx = argIdx;
548 }
549 } else {
550 Err() << "seen a positional argument '" << argRaw
551 << "' after some flags."
552 " Positional arguments were already parsed at this point (from '"
553 << argv[args.fOutputArgIdx]
554 << "' onwards), and you can only have one sequence of them, so you cannot pass more."
555 " Please group your positional arguments all together so that hadd works as you expect.\n"
556 "Cmdline: ";
557 for (int i = 0; i < argc; ++i)
558 std::cerr << argv[i] << " ";
559 std::cerr << "\n";
560
561 return {};
562 }
563 }
564 }
565
566 return args;
567}
568
569// Returns the flags to add to the file merger's flags, or -1 in case of errors.
570static Int_t ParseFilterFile(const std::optional<std::string> &filterFileName,
571 std::optional<Int_t> objectFilterType, TFileMerger &fileMerger)
572{
573 if (filterFileName) {
574 std::ifstream filterFile(*filterFileName);
575 if (!filterFile) {
576 Err() << "error opening filter file '" << *filterFileName << "'\n";
577 return -1;
578 }
580 std::string line;
581 std::string objPath;
582 int nObjects = 0;
583 while (std::getline(filterFile, line)) {
584 std::istringstream ss(line);
585 // only read exactly 1 token per line (strips any whitespaces and such)
586 objPath.clear();
587 ss >> objPath;
588 if (!objPath.empty() && objPath[0] != '#') {
589 filteredObjects.Append(objPath + ' ');
590 ++nObjects;
591 }
592 }
593
594 if (nObjects) {
595 Info() << "added " << nObjects << " object(s) from filter file '" << *filterFileName << "'\n";
596 fileMerger.AddObjectNames(filteredObjects);
597 } else {
598 Warn() << "no objects were added from filter file '" << *filterFileName << "'\n";
599 }
600
601 assert(objectFilterType.has_value());
602 const auto filterFlag = *objectFilterType;
604 return filterFlag;
605 }
606 return 0;
607}
608
609static bool FilesAreEquivalent(std::string_view source, std::string_view target)
610{
611 const bool sourceHasProtocol = source.find("://") != std::string_view::npos;
612 const bool targetHasProtocol = target.find("://") != std::string_view::npos;
614 return false;
615
616 // We cannot use std::filesystem functions for file paths that have a protocol.
618 return source == target;
619
620 return std::filesystem::exists(target) && std::filesystem::equivalent(source, target);
621}
622
623int main(int argc, char **argv)
624{
625 if (argc < 3 || "-h" == std::string(argv[1]) || "--help" == std::string(argv[1])) {
627 return (argc == 2 && ("-h" == std::string(argv[1]) || "--help" == std::string(argv[1]))) ? 0 : 1;
628 }
629
630 const auto argsOpt = ParseArgs(argc, argv);
631 if (!argsOpt)
632 return 1;
633 const HAddArgs &args = *argsOpt;
634
636 Int_t maxopenedfiles = args.fMaxOpenedFiles.value_or(0);
637 Int_t verbosity = args.fVerbosity.value_or(99);
638 Int_t newcomp = args.fCompressionSettings.value_or(-1);
639 TString cacheSize = args.fCacheSize.value_or("");
640
641 // For the -j flag (nProcesses), we check if the flag is present and, if so, if it has a
642 // valid value (i.e. any value > 0).
643 // If the flag is present at all, we do multiprocessing. If the value of nProcesses is invalid,
644 // we default to the number of cpus on the machine.
645 Bool_t multiproc = args.fNProcesses.has_value();
646 int nProcesses;
647 if (args.fNProcesses && *args.fNProcesses > 0) {
648 nProcesses = *args.fNProcesses;
649 } else {
650 SysInfo_t s;
651 gSystem->GetSysInfo(&s);
652 nProcesses = s.fCpus;
653 }
654 if (multiproc)
655 Info() << "parallelizing with " << nProcesses << " processes.\n";
656
657 // If the user specified a workingDir, use that. Otherwise, default to the system temp dir.
658 std::string workingDir;
659 if (!args.fWorkingDir) {
661 } else if (args.fWorkingDir && gSystem->AccessPathName(args.fWorkingDir->c_str())) {
662 Err() << "could not access the directory specified: " << *args.fWorkingDir << ".\n";
663 return 1;
664 } else {
665 workingDir = *args.fWorkingDir;
666 }
667
668 // Verify that -L and -Ltype are either both present or both absent.
669 if (args.fObjectFilterFile.has_value() != args.fObjectFilterType.has_value()) {
670 Err() << "-L must always be passed along with -Ltype.\n";
671 return 1;
672 }
673
674 const char *targetname = 0;
675 if (!args.fOutputArgIdx) {
676 Err() << "missing output file.\n";
677 return 1;
678 }
679 if (!args.fFirstInputIdx) {
680 Err() << "missing input file.\n";
681 return 1;
682 }
684
685 if (verbosity > 1)
686 Info() << "target file: " << targetname << "\n";
687
688 if (args.fCacheSize)
689 Info() << "Using " << cacheSize << "\n";
690
691 ////////////////////////////// end flags processing /////////////////////////////////
692
693 gSystem->Load("libTreePlayer");
694
696 fileMerger.SetMsgPrefix("hadd");
697 fileMerger.SetPrintLevel(verbosity - 1);
698 if (maxopenedfiles > 0) {
699 fileMerger.SetMaxOpenedFiles(maxopenedfiles);
700 }
701 // The following section will collect all input filenames into a vector,
702 // including those listed within an indirect file.
703 // If any file cannot be accessed, hadd errors out, unless args.fSkipErrors is true
704 std::vector<std::string> allSubfiles;
705 for (int a = args.fFirstInputIdx; a < argc; ++a) {
706 if (!args.fNoFlagsAfterPositionalArguments && argv[a] && argv[a][0] == '-') {
707 break;
708 }
709 if (argv[a] && argv[a][0] == '@') {
710 std::ifstream indirect_file(argv[a] + 1);
711 if (!indirect_file.is_open()) {
712 Err() << "could not open indirect file " << (argv[a] + 1) << std::endl;
713 if (!args.fSkipErrors)
714 return 1;
715 } else {
716 std::string line;
717 while (indirect_file) {
718 if (std::getline(indirect_file, line) && line.length()) {
719 if (gSystem->AccessPathName(line.c_str(), kReadPermission) == kTRUE) {
720 Err() << "could not validate the file name \"" << line << "\" within indirect file "
721 << (argv[a] + 1) << std::endl;
722 if (!args.fSkipErrors)
723 return 1;
724 } else if (FilesAreEquivalent(line, targetname)) {
725 Err() << "file " << line << " cannot be both the target and an input!\n";
726 if (!args.fSkipErrors)
727 return 1;
728 } else {
729 allSubfiles.emplace_back(line);
730 }
731 }
732 }
733 }
734 } else {
735 const char *line = argv[a];
737 Err() << "could not validate argument \"" << line << "\" as input file " << std::endl;
738 if (!args.fSkipErrors)
739 return 1;
740 } else if (FilesAreEquivalent(line, targetname)) {
741 Err() << "file " << line << " cannot be both the target and an input!\n";
742 if (!args.fSkipErrors)
743 return 1;
744 } else {
745 allSubfiles.emplace_back(line);
746 }
747 }
748 }
749 if (allSubfiles.empty()) {
750 Err() << "could not find any valid input file " << std::endl;
751 return 1;
752 }
753 // The next snippet determines the output compression if unset
754 if (newcomp == -1) {
756 // grab from the first file.
757 TFile *firstInput = TFile::Open(allSubfiles.front().c_str());
758 if (firstInput && !firstInput->IsZombie())
759 newcomp = firstInput->GetCompressionSettings();
760 else
762 delete firstInput;
763 fileMerger.SetMergeOptions(TString("FirstSrcCompression"));
764 } else {
766 fileMerger.SetMergeOptions(TString("DefaultCompression"));
767 }
768 }
769 if (verbosity > 1) {
770 if (args.fKeepCompressionAsIs && !args.fReoptimize)
771 Info() << "compression setting for meta data: " << newcomp << '\n';
772 else
773 Info() << "compression setting for all output: " << newcomp << '\n';
774 }
775 if (args.fAppend) {
776 if (!fileMerger.OutputFile(targetname, "UPDATE", newcomp)) {
777 Err() << "error opening target file for update: " << targetname << ".\n";
778 return 2;
779 }
780 } else if (!fileMerger.OutputFile(targetname, args.fForce, newcomp)) {
781 Err() << "error opening target file (does " << targetname << " exist?).\n";
782 if (!args.fForce)
783 Info() << "pass \"-f\" argument to force re-creation of output file.\n";
784 return 1;
785 }
786
787 auto step = (allSubfiles.size() + nProcesses - 1) / nProcesses;
788 if (multiproc && step < 3) {
789 // At least 3 files per process
790 step = 3;
791 nProcesses = (allSubfiles.size() + step - 1) / step;
792 Info() << "each process should handle at least 3 files for efficiency."
793 " Setting the number of processes to: "
794 << nProcesses << std::endl;
795 }
796 if (nProcesses == 1)
798
799 std::vector<std::string> partialFiles;
800
801#ifndef R__WIN32
802 // this is commented out only to try to prevent false positive detection
803 // from several anti-virus engines on Windows, and multiproc is not
804 // supported on Windows anyway
805 if (multiproc) {
806 auto uuid = TUUID();
807 auto partialTail = uuid.AsString();
808 for (auto i = 0; (i * step) < allSubfiles.size(); i++) {
809 std::stringstream buffer;
810 buffer << workingDir << "/partial" << i << "_" << partialTail << ".root";
811 partialFiles.emplace_back(buffer.str());
812 }
813 }
814#endif
815
816 auto merger) {
817 if (args.fReoptimize) {
818 merger.SetFastMethod(kFALSE);
819 } else {
820 if (!args.fKeepCompressionAsIs && merger.HasCompressionChange()) {
821 // Don't warn if the user has requested any re-optimization.
822 Warn() << "Sources and Target have different compression settings\n"
823 "hadd merging will be slower\n";
824 }
825 }
826 merger.SetNotrees(args.fNoTrees);
827 merger.SetMergeOptions(TString(merger.GetMergeOptions()) + " " + cacheSize);
830 merger.SetIOFeatures(features);
833 if (extraFlags < 0)
834 return false;
836 if (args.fAppend)
838 else
840 Bool_t status = merger.PartialMerge(fileMergerFlags);
841 return status;
842 };
843
844 auto sequentialMerge = [&](TFileMerger &merger, int start, int nFiles) {
845 for (auto i = start; i < (start + nFiles) && i < static_cast<int>(allSubfiles.size()); i++) {
846 if (!merger.AddFile(allSubfiles[i].c_str())) {
847 if (args.fSkipErrors) {
848 Warn() << "skipping file with error: " << allSubfiles[i] << std::endl;
849 } else {
850 Err() << "exiting due to error in " << allSubfiles[i] << std::endl;
851 return kFALSE;
852 }
853 }
854 }
855 return merger);
856 };
857
858 auto parallelMerge = [&](int start) {
860 mergerP.SetMsgPrefix("hadd");
861 mergerP.SetPrintLevel(verbosity - 1);
862 if (maxopenedfiles > 0) {
863 mergerP.SetMaxOpenedFiles(maxopenedfiles / nProcesses);
864 }
865 if (!mergerP.OutputFile(partialFiles[start / step].c_str(), newcomp)) {
866 Err() << "error opening target partial file\n";
867 exit(1);
868 }
869 return sequentialMerge(mergerP, start, step);
870 };
871
872 auto reductionFunc = [&]() {
873 for (const auto &pf : partialFiles) {
874 fileMerger.AddFile(pf.c_str());
875 }
876 return fileMerger);
877 };
878
879 Bool_t status;
880
881#ifndef R__WIN32
882 if (multiproc) {
884 auto res = p.Map(parallelMerge, ROOT::TSeqI(0, allSubfiles.size(), step));
885 status = std::accumulate(res.begin(), res.end(), 0U) == partialFiles.size();
886 if (status) {
887 status = reductionFunc();
888 } else {
889 Err() << "failed at the parallel stage\n";
890 }
891 if (!args.fDebug) {
892 for (const auto &pf : partialFiles) {
893 gSystem->Unlink(pf.c_str());
894 }
895 }
896 } else {
897 status = sequentialMerge(fileMerger, 0, allSubfiles.size());
898 }
899#else
900 status = sequentialMerge(fileMerger, 0, allSubfiles.size());
901#endif
902
903 if (status) {
904 if (verbosity == 1) {
905 Info() << "merged " << allSubfiles.size() << " (" << fileMerger.GetMergeList()->GetEntries()
906 << ") input (partial) files into " << targetname << ".\n";
907 }
908 return 0;
909 } else {
910 if (verbosity == 1) {
911 Err() << "failure during the merge of " << allSubfiles.size() << " ("
912 << fileMerger.GetMergeList()->GetEntries() << ") input (partial) files into " << targetname << ".\n";
913 }
914 return 1;
915 }
916}
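
The work-splitting arithmetic in `main` (the `step` computation) can be isolated into a small function. This is a sketch under assumptions: `Partition` and `PlanPartition` are hypothetical names, and unlike hadd the function always applies the minimum of 3 files per process, not only when -j was passed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for illustration only.
struct Partition {
   std::size_t filesPerProcess; // called `step` in main
   std::size_t nProcesses;
};

// Splits nFiles among nProcesses using ceiling division. If that would give
// each process fewer than minFilesPerProcess files, the chunk size is raised
// to the minimum and the number of processes is reduced accordingly.
Partition PlanPartition(std::size_t nFiles, std::size_t nProcesses, std::size_t minFilesPerProcess = 3)
{
   assert(nFiles > 0 && nProcesses > 0 && minFilesPerProcess > 0);
   std::size_t step = (nFiles + nProcesses - 1) / nProcesses; // ceil(nFiles / nProcesses)
   if (step < minFilesPerProcess) {
      step = minFilesPerProcess;
      nProcesses = (nFiles + step - 1) / step; // ceil(nFiles / step)
   }
   return {step, nProcesses};
}
```

For example, 10 input files with 8 requested processes would give 2 files each, so the plan falls back to 3 files per process and 4 processes.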
int main()
Definition Prototype.cxx:12
#define a(i)
Definition RSha256.hxx:99
size_t size(const MatrixT &matrix)
retrieve the size of a square matrix
bool Bool_t
Definition RtypesCore.h:63
int Int_t
Definition RtypesCore.h:45
constexpr Bool_t kFALSE
Definition RtypesCore.h:94
constexpr Bool_t kTRUE
Definition RtypesCore.h:93
ROOT::Detail::TRangeCast< T, true > TRangeDynCast
TRangeDynCast is an adapter class that allows the typed iteration through a TCollection.
winID h TVirtualViewer3D TVirtualGLPainter p
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void char Point_t Rectangle_t WindowAttributes_t Float_t Float_t Float_t Int_t Int_t UInt_t UInt_t Rectangle_t Int_t Int_t Window_t TString Int_t GCValues_t GetPrimarySelectionOwner GetDisplay GetScreen GetColormap GetNativeEvent const char const char dpyName wid window const char font_name cursor keysym reg const char only_if_exist regb h Point_t winding char text const char depth char const char Int_t count const char ColorStruct_t color const char Pixmap_t Pixmap_t PictureAttributes_t attr const char char ret_data h unsigned char height h Atom_t Int_t ULong_t ULong_t unsigned char prop_list Atom_t Atom_t target
@ kReadPermission
Definition TSystem.h:55
R__EXTERN TSystem * gSystem
Definition TSystem.h:572
TIOFeatures provides the end-user with the ability to change the IO behavior of data written via a TT...
This class provides a simple interface to execute the same task multiple times in parallel,...
This class provides file copy and merging services.
Definition TFileMerger.h:30
@ kAll
Merge all type of objects (default)
Definition TFileMerger.h:87
@ kIncremental
Merge the input file with the content of the output file (if already existing).
Definition TFileMerger.h:82
@ kSkipListed
Skip objects specified in fObjectNames list.
Definition TFileMerger.h:91
@ kOnlyListed
Only the objects specified in fObjectNames list.
Definition TFileMerger.h:90
@ kRegular
Normal merge, overwriting the output file.
Definition TFileMerger.h:81
@ kFailOnError
The merging process will stop and yield failure when encountering invalid objects.
@ kSkipOnError
The merging process will skip invalid objects and continue.
A ROOT file is an on-disk file, usually with extension .root, that stores objects in a file-system-li...
Definition TFile.h:131
static TFile * Open(const char *name, Option_t *option="", const char *ftitle="", Int_t compress=ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault, Int_t netopt=0)
Create / open a file.
Definition TFile.cxx:4131
Basic string class.
Definition TString.h:139
TString & Append(const char *cs)
Definition TString.h:572
virtual int GetSysInfo(SysInfo_t *info) const
Returns static system info, like OS type, CPU type, number of CPUs, RAM size, etc. into the SysInfo_t s...
Definition TSystem.cxx:2470
virtual int Load(const char *module, const char *entry="", Bool_t system=kFALSE)
Load a shared library.
Definition TSystem.cxx:1869
virtual Bool_t AccessPathName(const char *path, EAccessMode mode=kFileExists)
Returns FALSE if one can access a file using the specified access mode.
Definition TSystem.cxx:1308
virtual int Unlink(const char *name)
Unlink, i.e. remove, a file.
Definition TSystem.cxx:1393
virtual const char * TempDirectory() const
Return a user configured or systemwide directory to create temporary files in.
Definition TSystem.cxx:1494
This class defines a UUID (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDent...
Definition TUUID.h:42
static EFlagResult FlagArg(int argc, char **argv, int &argIdxInOut, const char *flagStr, std::optional< T > &flagOut, std::optional< T > defaultVal=std::nullopt, FlagConvResult< T >(*conv)(const char *)=ConvertArg< T >)
Definition hadd.cxx:324
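The file header describes three equivalent spellings for a flag that takes an argument (`-j 16`, `-j16`, `-j=16`). The following is a minimal, ROOT-free sketch of how such a flag could be recognized; it is not the body of `FlagArg`. The name `FlagValueSketch` and its signature are hypothetical, and the sketch assumes that an empty value after `=` counts as missing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical helper (not part of hadd.cxx): if argv[idx] is `flag`,
// return its value and leave idx on the last argument consumed.
// Returns std::nullopt if argv[idx] is a different flag or the value is missing.
std::optional<std::string> FlagValueSketch(int argc, const char *const *argv, int &idx, const char *flag)
{
   assert(idx >= 0 && idx < argc);
   const char *arg = argv[idx];
   const std::size_t flagLen = std::strlen(flag);
   if (std::strncmp(arg, flag, flagLen) != 0)
      return std::nullopt; // argv[idx] does not start with this flag

   const char *rest = arg + flagLen;
   if (*rest == '=') { // "-j=16"
      ++rest;
      if (*rest == '\0')
         return std::nullopt; // assumption: "-j=" with no value is an error
      return std::string(rest);
   }
   if (*rest != '\0') // "-j16" (and "-jfa", which yields the value "fa")
      return std::string(rest);
   if (idx + 1 < argc) { // "-j 16": the value is the next argument
      ++idx;
      return std::string(argv[idx]);
   }
   return std::nullopt; // flag was the last argument
}
```

Because the match is a plain prefix comparison, `-jfa` is read as `-j` with the value `fa`, which matches the header's note that merged flags are not supported. Converting the returned string to a number is left to a separate step.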
EFlagResult
Definition hadd.cxx:207
static bool ValidCompressionSettings(int compSettings)
Definition hadd.cxx:375
FlagConvResult< IntFlag_t > ConvertArg< IntFlag_t >(const char *arg)
Definition hadd.cxx:258
#define PARSE_FLAG(func,...)
static FlagConvResult< Int_t > ConvertFilterType(const char *arg)
Definition hadd.cxx:308
static bool FilesAreEquivalent(std::string_view source, std::string_view target)
Definition hadd.cxx:609
static Int_t ParseFilterFile(const std::optional< std::string > &filterFileName, std::optional< Int_t > objectFilterType, TFileMerger &fileMerger)
Definition hadd.cxx:570
static FlagConvResult< T > ConvertArg(const char *)
uint32_t IntFlag_t
Definition hadd.cxx:177
static std::optional< HAddArgs > ParseArgs(int argc, char **argv)
Definition hadd.cxx:471
FlagConvResult< ROOT::TIOFeatures > ConvertArg< ROOT::TIOFeatures >(const char *arg)
Definition hadd.cxx:273
std::ostream & Warn()
Definition hadd.cxx:165
std::ostream & Info()
Definition hadd.cxx:171
static FlagConvResult< TString > ConvertCacheSize(const char *arg)
Definition hadd.cxx:286
static EFlagResult FlagF(const char *arg, HAddArgs &args)
Definition hadd.cxx:409
static EFlagResult FlagToggle(const char *arg, const char *flagStr, bool &flagOut)
Definition hadd.cxx:209
static std::optional< IntFlag_t > StrToUInt(const char *str)
Definition hadd.cxx:224
std::ostream & Err()
Definition hadd.cxx:159
static constexpr const char kCommandLineOptionsHelp[]
void ToHumanReadableSize(value_type bytes, Bool_t si, Double_t *coeff, const char **units)
Return the size expressed in 'human readable' format.
EFromHumanReadableSize FromHumanReadableSize(std::string_view str, T &value)
Convert strings like the following into byte counts 5MB, 5 MB, 5M, 3.7GB, 123b, 456kB,...
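To illustrate the kind of conversion described here, the following is a minimal, ROOT-free sketch that turns strings such as `5MB` or `5MiB` into a byte count. It is not ROOT's `FromHumanReadableSize`. The name `ParseSizeSketch` is hypothetical, and the conventions are assumptions of this sketch: prefixes K, M, G, T are powers of 1000, a following `i` switches to powers of 1024, and a trailing `b` or `B` is treated as bytes.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not ROOT's implementation): parse "<number>[ ]<prefix>[i][B]".
std::optional<unsigned long long> ParseSizeSketch(std::string_view text)
{
   const std::string s(text);
   char *end = nullptr;
   const double coeff = std::strtod(s.c_str(), &end); // leading numeric factor
   if (end == s.c_str() || !std::isfinite(coeff) || coeff < 0)
      return std::nullopt;

   std::size_t cur = static_cast<std::size_t>(end - s.c_str());
   auto upperAt = [&s](std::size_t i) { return std::toupper(static_cast<unsigned char>(s[i])); };

   // Skip white space between the number and the unit ("5 MB").
   while (cur < s.size() && std::isspace(static_cast<unsigned char>(s[cur])))
      ++cur;

   int power = 0; // exponent applied to the base (1000 or 1024)
   if (cur < s.size()) {
      switch (upperAt(cur)) {
      case 'K': power = 1; ++cur; break;
      case 'M': power = 2; ++cur; break;
      case 'G': power = 3; ++cur; break;
      case 'T': power = 4; ++cur; break;
      default: break; // no prefix: plain bytes
      }
   }
   double base = 1000.0;
   if (power > 0 && cur < s.size() && upperAt(cur) == 'I') { // "KiB", "MiB", ...
      base = 1024.0;
      ++cur;
   }
   if (cur < s.size() && upperAt(cur) == 'B') // optional byte suffix
      ++cur;
   assert(cur <= s.size());
   if (cur != s.size())
      return std::nullopt; // trailing characters that are not a known unit

   const double bytes = coeff * std::pow(base, power);
   if (bytes > 9.0e18)
      return std::nullopt; // would not fit the integer conversion below
   return static_cast<unsigned long long>(std::llround(bytes));
}
```

Returning `std::optional` keeps "could not parse" distinct from a legitimate size of 0; the real function instead reports its status through an `EFromHumanReadableSize` return value and writes the result to `value`.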
EFlagResult fResult
Definition hadd.cxx:245
bool fNoFlagsAfterPositionalArguments
Definition hadd.cxx:204
bool fKeepCompressionAsIs
Definition hadd.cxx:186
bool fForce
Definition hadd.cxx:182
std::optional< TString > fCacheSize
Definition hadd.cxx:193
std::optional< IntFlag_t > fCompressionSettings
Definition hadd.cxx:197
bool fNoTrees
Definition hadd.cxx:180
std::optional< Int_t > fObjectFilterType
Definition hadd.cxx:192
int fFirstInputIdx
Definition hadd.cxx:200
std::optional< IntFlag_t > fNProcesses
Definition hadd.cxx:190
bool fUseFirstInputCompression
Definition hadd.cxx:187
std::optional< std::string > fObjectFilterFile
Definition hadd.cxx:191
bool fSkipErrors
Definition hadd.cxx:183
std::optional< IntFlag_t > fVerbosity
Definition hadd.cxx:196
std::optional< IntFlag_t > fMaxOpenedFiles
Definition hadd.cxx:195
std::optional< std::string > fWorkingDir
Definition hadd.cxx:189
int fOutputArgIdx
Definition hadd.cxx:199
bool fDebug
Definition hadd.cxx:185
bool fReoptimize
Definition hadd.cxx:184
std::optional< ROOT::TIOFeatures > fFeatures
Definition hadd.cxx:194
bool fAppend
Definition hadd.cxx:181
@ kUseCompiledDefault
Use the compile-time default setting.
Definition Compression.h:53
Int_t fCpus
Definition TSystem.h:162