-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add benchmarking of various mounting strategies #67
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #67 +/- ##
==========================================
+ Coverage 60.43% 60.71% +0.27%
==========================================
Files 9 10 +1
Lines 685 840 +155
Branches 169 193 +24
==========================================
+ Hits 414 510 +96
- Misses 251 310 +59
Partials 20 20 ☔ View full report in Codecov by Sentry. |
since s3 can have different latencies at different times of the day, let's also make sure we have some estimate of s3 latency during each benchmark if these benchmarks take some time to run. if they run in mins, i would be less worried about latencies, and in such a scenario we should just get multiple estimates to create some error bars. |
How? |
this is old, but something like this: https://github.com/dvassallo/s3-benchmark |
@satra This seems like something that should be done separately from |
c237d13
to
2c7bebe
Compare
@yarikoptic Problem: |
since it all in motion, I think it would be ok to point to that branch you have for dandi-cli with pydantic 2.0 compat |
tools/run.sh
Outdated
@@ -3,7 +3,7 @@ set -ex | |||
|
|||
PYTHON="$HOME"/miniconda3/bin/python | |||
DANDISETS_PATH=/mnt/backup/dandi/dandisets-healthstatus/dandisets | |||
MOUNT_PATH=/mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse | |||
MOUNT_PATH=/tmp/dandisets-fuse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's do it under some safer user-specific location, e.g. /var/run/user/$UID/
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse
is the path we've been using on drogon for FUSE-mounting the Dandisets. Also, tools/run.sh
is just for generating the healthstatus reports; the shell script for generating the benchmarks is unfinished and hasn't been committed yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a general security concern/practice to avoid: As /tmp
is allowed to be written by anyone, it is not recommended to rely on pre-defined paths under there, since attacker potentially cause a problem by establishing a symlink to the target file/location "on your behalf". /mnt/backup/dandi/dandisets-healthstatus/
is allowed to be written only by dedicated user(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to update the allowed sudo mount
commands to operate on /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse
instead of /tmp/dandisets-fuse
, and also update /etc/davfs2/davfs2.conf
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, on drogon I made following changes under /etc
diff --git a/davfs2/davfs2.conf b/davfs2/davfs2.conf
index 2f0053e..e6034e3 100644
--- a/davfs2/davfs2.conf
+++ b/davfs2/davfs2.conf
@@ -46,7 +46,7 @@
# ignore_dav_header 0
# use_compression 0
# min_propset 0
-# follow_redirect 0
+follow_redirect 1
# sharepoint_href_bug 0
# server_charset
# connect_timeout 10 # seconds
diff --git a/sudoers.d/fuse b/sudoers.d/fuse
new file mode 100644
index 0000000..fa21fd9
--- /dev/null
+++ b/sudoers.d/fuse
@@ -0,0 +1,4 @@
+jwodder ALL=(ALL:ALL) NOPASSWD: /usr/bin/umount /tmp/dandisets-fuse
+jwodder ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t davfs https\://webdav.dandiarchive.org /tmp/dandisets-fuse
+jwodder ALL=(ALL:ALL) NOPASSWD: /usr/bin/umount /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse
+jwodder ALL=(ALL:ALL) NOPASSWD: /usr/bin/mount -t davfs https\://webdav.dandiarchive.org /mnt/backup/dandi/dandisets-healthstatus/dandisets-fuse
Please provide results of running such benchmarking across possible solutions e.g. on typhon. (should be less busy ATM) |
scrape that about typhon, I forgot that we rely on having |
@yarikoptic You initially said here to run the benchmarks on smaug. |
if benchmarks rely on full clone of |
@yarikoptic I need permission to
Note that the colons in the URLs need to be escaped when adding them to the sudoers file. Also, |
done |
@yarikoptic 42 seconds |
hm. Any ideas on why fuse solution takes that long? how long it takes with datalad-fuse? |
@yarikoptic I don't know why it's so slow with FUSE, and I don't know how long it would take with FUSE, as the benchmark code kills the process at the 2-hour timeout. |
please make time out 5 hours and run against both fuse solutions -- datalad-fuse and dandidav + davfs2 |
ideally: profile datalad-fuse while running the test to see where it spends time. |
@yarikoptic The matnwb test on datalad-fuse exceeded the five-hour time limit as well. How exactly should I profile it? Just use py-spy? |
First - py-spy would not hurt indeed. Then I would have probably added log lines at DEBUG level within datalad-fuse to see what is actually taking time there if py-spy was not conclusive. |
@yarikoptic Is there a way to get datalad's logs to include timestamps? |
yes, there is also a number of other possibly helpful options (available through env vars or even git config since defined in common_cfg) for augmenting logging behavior:
|
Disregard@yarikoptic When I try running
I don't know why the EDIT: I realized that I wasn't setting the environment for the |
@yarikoptic I believe |
Let's try on |
@yarikoptic I finally got a run that didn't time out by using a 44 MB asset from Dandiset 000005. Here's the output, converted to a table:
|
So |
@yarikoptic Those are the "fusefs" entries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall -- great, thank you!
Let's just avoid using pre-defined paths under /tmp/
Closes #66.
To do:
pynwb_open_load_ns
matnwb_nwbRead
DANDI_CACHE=ignore dandi ls
(to load metadata) on a single local assetsub-mouse1-fni16/sub-mouse1-fni16_ses-161228151100.nwb
in 000016 is suggested as a possible candidate.