safekeeper: avoid or optimize segment fsyncs #9664

Closed
Tracked by #9624
erikgrinaker opened this issue Nov 6, 2024 · 2 comments
Comments

erikgrinaker (Contributor) commented Nov 6, 2024

The WAL receiver attempts to batch writes and avoid fsyncs via NoFlushAppendRequest, flushing only once per second:

```rust
// Don't flush the WAL on every append, only periodically via flush_ticker.
// This batches multiple appends per fsync. If the channel is empty after
// sending the reply, we'll schedule an immediate flush.
if let ProposerAcceptorMessage::AppendRequest(append_request) = msg {
    msg = ProposerAcceptorMessage::NoFlushAppendRequest(append_request);
    dirty = true;
}
```
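
For illustration, the batching pattern described above boils down to something like the following sketch, with stand-in functions for the actual WAL writes (this is just the shape of the receive loop, not the safekeeper's real code):

```rust
use std::time::Duration;
use tokio::{sync::mpsc, time};

// Stand-ins for the real WAL write and flush paths.
async fn append_no_flush(_record: &[u8]) -> std::io::Result<()> { Ok(()) }
async fn flush_wal() -> std::io::Result<()> { Ok(()) }

async fn ingest_loop(mut rx: mpsc::Receiver<Vec<u8>>) -> std::io::Result<()> {
    let mut flush_ticker = time::interval(Duration::from_secs(1));
    let mut dirty = false;
    loop {
        tokio::select! {
            maybe_record = rx.recv() => {
                let Some(record) = maybe_record else { break };
                // Buffered write only; the fsync is deferred to the ticker.
                append_no_flush(&record).await?;
                dirty = true;
            }
            _ = flush_ticker.tick(), if dirty => {
                // One fsync covers every append since the last tick.
                flush_wal().await?;
                dirty = false;
            }
        }
    }
    if dirty {
        flush_wal().await?;
    }
    Ok(())
}
```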

However, we unconditionally fsync after writing out a complete 16 MB segment:

```rust
// If we reached the end of a WAL segment, flush and close it.
self.fdatasync_file(&file).await?;
```

We also unconditionally fsync 3 times when we create a new segment:

```rust
if let Err(e) = durable_rename(&tmp_path, &wal_file_partial_path, !self.no_sync).await {
```
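
For context, durable_rename() implements the usual crash-safe rename pattern, where every durability point costs an fsync. A rough sketch of that pattern (illustrative only, not Neon's actual helper):

```rust
// Rough sketch of a crash-safe rename; the real durable_rename lives in
// Neon's utils crate and may differ in detail.
use std::path::Path;
use tokio::fs;

async fn durable_rename_sketch(tmp: &Path, dst: &Path) -> std::io::Result<()> {
    // 1. Make sure the new segment's contents are on disk before it becomes visible.
    fs::File::open(tmp).await?.sync_all().await?;
    // 2. Atomically move it into place.
    fs::rename(tmp, dst).await?;
    // 3. Persist the rename itself by fsyncing the parent directory.
    let dir = dst.parent().expect("segment path has a parent directory");
    fs::File::open(dir).await?.sync_all().await?;
    Ok(())
}
```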

This implies that we fsync four times for every 16 MB segment. At 1 GB/s that's 64 segments per second, i.e. 256 fsyncs per second. We should try to optimize this.

Also note that the control file is fsynced every time the commit LSN crosses a segment boundary. That's an additional 64 fsyncs per second. See #9663.

@erikgrinaker erikgrinaker added a/performance Area: relates to performance of the system c/storage/safekeeper Component: storage: safekeeper labels Nov 6, 2024
@erikgrinaker erikgrinaker self-assigned this Nov 6, 2024
erikgrinaker (Contributor, Author) commented:
The durable_rename() fsyncs (3 in total) are responsible for 50% of WAL ingest duration with large files. We should try to do these fsyncs concurrently, and/or defer them until the first WAL segment flush and piggyback on that.
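
A minimal sketch of the concurrent variant, assuming the segment-file and directory fsyncs can be issued in parallel (illustrative names, not the actual safekeeper code):

```rust
// Illustrative only: fsync the just-renamed segment file and its parent
// directory concurrently rather than back to back.
use std::path::Path;
use tokio::fs::File;

async fn fsync_segment_and_dir(segment: &Path, wal_dir: &Path) -> std::io::Result<()> {
    let (seg, dir) = tokio::try_join!(File::open(segment), File::open(wal_dir))?;
    // Both fsyncs are in flight at once; we still wait for both before
    // acknowledging durability.
    tokio::try_join!(seg.sync_all(), dir.sync_all())?;
    Ok(())
}
```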

erikgrinaker (Contributor, Author) commented:
We removed an unnecessary fsync for every segment in #9686, and moved control file flushes off the WAL ingest path in #9698.

If we need to reduce fsyncs further, the easiest option is probably to increase the segment size from 16 MB to e.g. 128 MB; see #9687.
