Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
pageserver read path: abort on fatal IO errors from disk / filesystem (…
…#10786) Before this PR, an IO error returned from the kernel, e.g., due to a bad disk, would get bubbled up, all the way to a user-visible query failing. This is against the IO error handling policy where we have established and is hence being rectified in this PR. [[(internal Policy document link)]](https://github.com/neondatabase/docs/blob/bef44149f746d6705c709b6d9c5e342c0ecac49c/src/storage/handling_io_and_logical_errors.md#L33-L35) The practice on the write path seems to be that we call `maybe_fatal_err()` or `fatal_err()` fairly high up the stack. That is, regardless of whether std::fs, tokio::fs, or VirtualFile is used to perform the IO. For the read path, I choose a centralized approach in this PR by checking for errors as close to the kernel interface as possible. I believe this is better for long-term consistency. To mitigate the problem of missing context if we abort so far down in the stack, the `on_fatal_io_error` now captures and logs a backtrace. I grepped the pageserver code base for `fs::read` to convince myself that all non-VirtualFile reads already handle IO errors according to policy. Refs - fixes #10454
- Loading branch information