-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wire O_DIRECT also to Uncached I/O #17218
base: master
Are you sure you want to change the base?
Conversation
The change is quite invasive on DMU/DBUF APIs. In some places it was not obvious what is better or where to stop, so I am open to comments on whether we'd like to change more, while already there, or less, to keep some parts of compatibility if that is even possible at all with so big change. |
Would it makes sense to activate this when the user does a
|
@tonyhutter I am not familiar with that fcntl, but sounds possible as a next step. I was thinking about |
Ah yes, |
742a1cb
to
7579173
Compare
Before Direct I/O was implemented, I've implemented lighter version I called Uncached I/O. It uses normal DMU/ARC data path with some optimizations, but evicts data from caches as soon as possible and reasonable. Originally I wired it only to a primarycache property, but now completing the integration all the way up to the VFS. While Direct I/O has the lowest possible memory bandwidth usage, it also has a significant number of limitations. It require I/Os to be page aligned, does not allow speculative prefetch, etc. The Uncached I/O does not have those limitations, but instead require additional memory copy, though still one less than regular cached I/O. As such it should fill the gap in between. Considering this I've disabled annoying EINVAL errors on misaligned requests, adding a tunable for those who wants to test their applications. To pass the information between the layers I had to change a number of APIs. But as side effect upper layers can now control not only the caching, but also speculative prefetch. I haven't wired it to VFS yet, since it require looking on some OS specifics. But while there I've implemented speculative prefetch of indirect blocks for Direct I/O, controllable via all the same mechanisms. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Fixes openzfs#17027
|
*/ | ||
ioflag &= ~O_DIRECT; | ||
if (os->os_direct == ZFS_DIRECT_ALWAYS) { | ||
/* Force either direct of uncached I/O. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"direct or uncached"
So, I am not particularly in favor of a strict being disabled by default. I think we already murky the waters enough when a write I/O is My concern is people are expecting one thing and get another transparently. This leads to confusion and makes ZFS seem like it is not doing what they strictly asked it to do (do not make any copies of my data). I also don't like the idea that prefetching can happen if a user explicitly requested I think if we decided to add another dataset property setting for I think we don't need to complicate things more in my opinion. This should be an opt in feature if anything with maybe a new dataset value for |
My patch actually improves this case, evicting the data from DBUF and ARC caches as soon as possible, reducing cache effects as user asked.
I don't think it has to be strict other than for software testing. The man page on Linux says: "The handling of misaligned O_DIRECT I/Os also varies; they can either fail with EINVAL or fall back to buffered I/O.", so relaxed behavior is not a violation.
We've already found several examples of software having no idea about alignment, but using O_DIRECT, including so general as systemd extensions. Considering they are "broken", I bet more typical Linux file systems to do not enforce alignment. And for us being strict just increase suffering. I am not interested to fix all software in the world, but happy to provide them a testing tool in a shape of a tunable.
The data prefetch will only activate on misaligned I/Os. So obviously user already does not know what he is doing. It is trivial to actually disable it now, if you insist. But I considered that somebody might use O_DIRECT with Direct I/O disabled via module parameter, just to reduce cache trashing on some parts of workload, and prefetch really give performance improvement in many cases even with NVMe pools if application queue depth is insufficient.
I was actually thinking about additional
It is likely the only way to use Direct I/O with SMB and NFS also, since they have no concepts of alignment in protocol. So it seems the world is somewhat less "perfect" than we'd like. ;) |
Before Direct I/O was implemented, I've implemented lighter version I called Uncached I/O. It uses normal DMU/ARC data path with some optimizations, but evicts data from caches as soon as possible and reasonable. Originally I wired it only to a primarycache property, but now completing the integration all the way up to the VFS.
While Direct I/O has the lowest possible memory bandwidth usage, it also has a significant number of limitations. It require I/Os to be page aligned, does not allow speculative prefetch, etc. The Uncached I/O does not have those limitations, but instead require additional memory copy, though still one less than regular cached I/O. As such it should fill the gap in between. Considering this I've disabled annoying EINVAL errors on misaligned requests, adding a tunable for those who wants to test their applications.
To pass the information between the layers I had to change a number of APIs. But as side effect upper layers can now control not only the caching, but also speculative prefetch. I haven't wired it to VFS yet, since it require looking on some OS specifics. But while there I've implemented speculative prefetch of indirect blocks for Direct I/O, controllable via all the same mechanisms.
Fixes #17027
How Has This Been Tested?
Basic read/write tests with and without O_DIRECT, observing proper cache behavior.
Types of changes
Checklist:
Signed-off-by
.