-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added parameter optimization and evaluation functionality for iotools benchmarks #11
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks, @DanteNiewenhuis! 👍
I have attached some preliminary comments, but I guess it's up to @jblomer to finally approve and merge.
gen_atlas.cxx
Outdated
size_t pagesize = 65536; | ||
size_t clustersize = 52428800; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size_t pagesize = 65536; | |
size_t clustersize = 52428800; | |
size_t pageSize = (64 * 1024); | |
size_t clusterSize = (50 * 1000 * 1000); |
Or even better, IMO, set it to 0
or -1U
, and only call SetApproxUnzippedPageSize()
, etc., if a value was actually set by the user.
If unset, importer
should be using the default value from RNTupleWriteOptions
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments in gen_atlas.cxx
apply here.
gen_h1.cxx
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments in gen_atlas.cxx
apply here.
gen_lhcb.cxx
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment in gen_atlas.cxx
apply here.
|
||
|
||
import sys | ||
sys.path.insert(0, "/home/dante-niewenhuis/Documents/CERN-parameters-new") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think changing this path is not mentioned in the README.md. Also, what is it used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, Python cannot import packages that are in the parent folder of a file.
To fix this, I add the optimization_tools folder to the paths so Python can find it.
I have changed the string so it is clearer that it is something the user should to change.
page_sizes = [(16, "KB"), (32, "KB"), (64, "KB"), (128, "KB"), (256, "KB"), (512, "KB"), | ||
(1, "MB"), (2, "MB"), (4, "MB"), (8, "MB"), (16, "MB")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per the convertToByte()
function, MB here refers to a power-of-two unit, i.e. a mebibyte.
However, we typically use the MB (1000 * 1000) bytes as the unit for on-disk data (see, e.g. the default value for the fApproxZippedClusterSize
member in RNTupleWriteOptions).
return False | ||
|
||
|
||
def convertToByte(value: int, form: str = "b") -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to duplicate this function in Parameters.py. I think we should get rid of one of them.
""" | ||
for line in benchmark_output.split("\n"): | ||
if target in line: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for _ in tqdm(range(evaluations)): | ||
|
||
# Reset cache | ||
os.system('sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iotools repository already has clear_page_cache.c. The recipe in the Makefile also sets the suid bit, so that no sudo
is required after
$ make clear_page_cache
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, thanks a lot!
Some minor comments.
Feel free to add your name and/or contact details to the README.
.gitignore
Outdated
gen_h1 | ||
gen_lhcb | ||
|
||
*.pyc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing newline at the end of the file
gen_atlas.cxx
Outdated
size_t pagesize = (64 * 1024); | ||
size_t clustersize = (50 * 1000 * 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and in the other generators: use zero to indicate the default page size should be used if the option is not set. I think also the file name modification should only be done if there is a deviation from the default.
gen_cms.cxx
Outdated
{ | ||
std::cout << "Usage: " << progname << " -i <ttjet_13tev_june2019.root> -o <ntuple-path> -c <compression> [-m(t)]" | ||
std::cout << "Usage: " << progname << " -o <ntuple-path> -c <compression> -p <page-size> -x <cluster-size> [-m(t)] <H1 root file>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::cout << "Usage: " << progname << " -o <ntuple-path> -c <compression> -p <page-size> -x <cluster-size> [-m(t)] <H1 root file>" | |
std::cout << "Usage: " << progname << " -o <ntuple-path> -c <compression> -p <page-size> -x <cluster-size> [-m(t)] <CMS root file>" |
optimization_tools/README.md
Outdated
@@ -0,0 +1,12 @@ | |||
This folder contains python files used to benchmark and optimize parameters based on the iotools benchmarks. | |||
|
|||
To run the files, first the Paths in Benchmarking/variables.py need to be updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To run the files, first the Paths in Benchmarking/variables.py need to be updated. | |
To run the files, first the paths in Benchmarking/variables.py need to be updated. |
|
||
|
||
import sys | ||
sys.path.insert(0, "/home/dante-niewenhuis/Documents/CERN-parameters-new") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sys.path.insert(0, "/home/dante-niewenhuis/Documents/CERN-parameters-new") | |
sys.path.insert(0, "CERN-parameters-new") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not work in this case because "CERN-parameters-new" is a parent of the current working directory
# root [12] auto ntuple = ROOT::Experimental::RNTupleReader::Open("DecayTree" , "generated/B2HHH~none.ntuple") | ||
# root [13] ntuple->PrintInfo(ROOT::Experimental::ENTupleInfo::kStorageDetails) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add to the comment what it means / why it is there
|
||
for _ in tqdm(range(evaluations)): | ||
# Reset cache | ||
os.system('sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this we have the clean_page_cache
utility
Added python files that can be used to optimize the parameters of reading and writing and RNTuple file.
The main functionality is evaluating single parameter configurations (see tutorials/evaluate_parameters.py),
and optimizing parameters for a specific benchmark (see tutorials/run_annealer.py).
Three different methods of optimizing parameters are provided in Algorithms.py:
To use the files, first the paths in variables.py have to be updated.
This update also adds two flags to the generator files for defining page and cluster size:
-p define page size in bytes (default 65536)
-x define cluster size in bytes (default 52428800)