-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Seek::stream_len
impl for File
#125087
base: master
Are you sure you want to change the base?
Conversation
r? @ChrisDenton rustbot has assigned @ChrisDenton. Use |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
☔ The latest upstream changes (presumably #127360) made this pull request unmergeable. Please resolve the merge conflicts. |
Alas, this is insufficient. There are other filesystems with questionable
|
Amazing, I did not foresee this.
For this particular example, this doesn't look insufficient, since it doesn't regress anything: use std::fs::File;
use std::io;
use std::io::Seek as _;
use std::io::SeekFrom;
fn main() -> io::Result<()> {
let file = File::open("/sys/kernel/oops_count")?;
println!("{}", (&file).seek(SeekFrom::End(0))?);
Ok(())
} This outputs 4096 on my machine, the same as the
|
EDIT: Seems like I was incorrect. It uses some heuristic to
|
Ok, maybe seeking is nonsense on those files too and they're only meant to be used via But that still leaves FUSE drivers which can implement basically arbitrary behavior. And any approach distinguishing good from bad filesystems would require additional syscalls. |
What is Rust's platform support for bad FUSE drivers? Who decides which of the methods to trust ( I can kinda see not crashing the Rust program if a FUSE driver decides to return |
FUSE is just the canary in the coal mine. sysfs shows that in-kernel filesystems do strange things too. If we have 2 examples already then I assume there's some network filesystem will also do weird things when weird remote machines are involved. So I'd say the trust hierarchy when it comes to "how many bytes does this file contain is" read > seek > stat It's unfortunate that |
I had assumed
So after checking again, it seems that even for procfs |
Sorry, just catching up. Given the above, @the8472 are you happy that this is at least no worse than the status quo? I.e. if seek or stat is returning wrong or nonsense values then that's a problem in either case. |
I think it would be better to only do this for regular files. Non-regular ones have weird behavior in general. E.g. directories have a size according to Even with that restriction I think someone will still eventually run into issues around FUSE, network filesystems or weird filesystems. That said, stream_len is an unstable API and it's a convenience method, so we don't necessarily have to provide ironclad guarantees. Maybe we should just rephrase the documentation of
this would be better
I mean, it always applies to trait methods that trait impls can override them and can provide different behavior, so this isn't special. But perhaps it's better to not suggest that implementations will only ever behave exactly like that. |
I think we should evaluate the proc / sysfs case separately from the FUSE case. For the former, having a heuristic like "this is exactly 4096 bytes" seems...acceptable, if painful. For the latter, I think if people write FUSE filesystems with suffciently weird behavior, it's fine to get issues like trusting the size. We shouldn't crash, but it's fine if we don't have ideal behavior. |
Given all the above I would be inclined to accept this PR, maybe with the adjustment to the documentation suggested by the8472. However, I'm not really comfortable unilaterally approving if not all libs-team members are on board. Can I ask for a libs consensus here? |
Summary: This changes It was initially believed that this might change the results, but we have not found any case in which that happens. Both the old and the new method return the same bogus results for /proc and /sys. We haven't found a case in which they differ. A concern regarding FUSE filesystems was raised, which can return arbitrary data from |
c006cca
to
c8ebd72
Compare
Done. |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon. |
☔ The latest upstream changes (presumably #137065) made this pull request unmergeable. Please resolve the merge conflicts. |
Apologies that this has waited so long after FCP to get merged. r=me once rebased. |
c8ebd72
to
a529985
Compare
Rebased. |
@bors r+ rollup |
…plett Optimize `Seek::stream_len` impl for `File` It uses the file metadata on Unix with a fallback for files incorrectly reported as zero-sized. It uses `GetFileSizeEx` on Windows. This reduces the number of syscalls needed for determining the file size of an open file from 3 to 1.
…plett Optimize `Seek::stream_len` impl for `File` It uses the file metadata on Unix with a fallback for files incorrectly reported as zero-sized. It uses `GetFileSizeEx` on Windows. This reduces the number of syscalls needed for determining the file size of an open file from 3 to 1.
…iaskrgr Rollup of 9 pull requests Successful merges: - rust-lang#125087 (Optimize `Seek::stream_len` impl for `File`) - rust-lang#136986 (Apply unsafe_op_in_unsafe_fn to the standard library) - rust-lang#137012 (add docs and ut for bootstrap util cc-detect) - rust-lang#137072 (Load all builtin targets at once instead of one by one in check-cfg) - rust-lang#137102 (Rework `name_regions` to not rely on reverse scc graph for non-member-constrain usages) - rust-lang#137112 (Don't project into `NonNull` when dropping a `Box`) - rust-lang#137114 (Add an example for `std::error::Error`) - rust-lang#137117 (Fix test that relies on error language) - rust-lang#137119 (fix broken `x {doc, build} core`) r? `@ghost` `@rustbot` modify labels: rollup
@bors r- |
It uses the file metadata on Unix with a fallback for files incorrectly reported as zero-sized. It uses `GetFileSizeEx` on Windows. This reduces the number of syscalls needed for determining the file size of an open file from 3 to 1.
It can only describe the inner workings of the default implementation, other implementations might not be implemented using seeks at all.
a529985
to
5e0836d
Compare
Should be fixed. |
☔ The latest upstream changes (presumably #137176) made this pull request unmergeable. Please resolve the merge conflicts. |
It uses the file metadata on Unix with a fallback for files incorrectly reported as zero-sized. It uses
GetFileSizeEx
on Windows.This reduces the number of syscalls needed for determining the file size of an open file from 3 to 1.