
Prevent the shredder from making oversized batches #5720


Open
alexpyattaev wants to merge 3 commits into master

Conversation


@alexpyattaev commented Apr 9, 2025

Problem

The shredder would make oversized (>64 shreds, i.e. larger than 32:32) erasure batches whenever it finished a given batch of Events: e.g. it would make 3 batches of 64 shreds (32:32) and then one batch of 59 data shreds (59:59, i.e. 118 shreds total). Moreover, under low-ish load most Event batches are small enough that they always end up as a single >64-shred erasure batch. This put a higher load on the Reed-Solomon (RSC) and Merkle tree decoders, and it is also not what people reading the specs expect.

Note: Apparently this was done intentionally, i.e. not a bug. There is some chance we should not be doing this.

Summary of Changes

  • Modify the batch formation code to stick to the 64-shreds-per-batch (32:32) limit (see the sketch below).
  • Modify tests that expected batches larger than 32 data shreds.
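
For context, a minimal sketch of the old splitting behavior (illustrative only: `chunk` here stands for the 32-data-shred budget per erasure set, while the real `chunk_size` in `make_shreds_from_data` is a byte count):

```rust
// Sketch only: models the old `while` condition, not the actual shredder.
fn old_batch_sizes(mut n: usize, chunk: usize) -> Vec<usize> {
    let mut batches = Vec::new();
    while n >= 2 * chunk || n == chunk {
        batches.push(chunk);
        n -= chunk;
    }
    if n > 0 {
        // The tail takes everything left over: up to 2 * chunk - 1 = 63
        // data shreds, i.e. an erasure set of up to 63:63 = 126 shreds.
        batches.push(n);
    }
    batches
}

fn main() {
    // Three 32:32 sets, then one oversized 59:59 set.
    assert_eq!(old_batch_sizes(155, 32), vec![32, 32, 32, 59]);
    // Low load: 55 data shreds become a single oversized 55:55 set.
    assert_eq!(old_batch_sizes(55, 32), vec![55]);
}
```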

@alexpyattaev added the noCI Suppress CI on this Pull Request label Apr 9, 2025
@alexpyattaev requested a review from bw-solana April 9, 2025 11:40
@alexpyattaev added CI Pull Request is ready to enter CI and removed noCI Suppress CI on this Pull Request labels Apr 9, 2025
@anza-team removed the CI Pull Request is ready to enter CI label Apr 9, 2025
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.3%. Comparing base (f2e68b5) to head (d747287).
Report is 17 commits behind head on master.

@@            Coverage Diff            @@
##           master    #5720     +/-   ##
=========================================
- Coverage    83.3%    83.3%   -0.1%     
=========================================
  Files         824      824             
  Lines      373719   373738     +19     
=========================================
- Hits       311509   311499     -10     
- Misses      62210    62239     +29     

@alexpyattaev requested a review from AshwinSekar April 9, 2025 20:07
@alexpyattaev marked this pull request as ready for review April 9, 2025 20:07
```diff
@@ -1178,7 +1178,8 @@ pub(super) fn make_shreds_from_data(
     };
     // Split the data into erasure batches and initialize
     // data and coding shreds for each batch.
-    while data.len() >= 2 * chunk_size || data.len() == chunk_size {
+    while data.len() >= chunk_size {
```
@AshwinSekar Apr 9, 2025

Imagine you have 321 shreds worth of data, is_last_in_slot = true.

The old code will make 9 batches of 32:32 shreds (288) and 1 batch of 33:33 shreds, no padding.

This code will make 10 batches of 32:32 shreds (320) and 1 batch of 32:32 shreds where 31 of those shreds are padding.
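
A quick sketch of that arithmetic (again illustrative: `chunk` models the 32-data-shred budget, and the tail padding follows the description above rather than the actual code):

```rust
// Sketch: the new condition takes full chunks while possible and pads
// the final partial chunk up to a full 32:32 set.
fn new_batch_sizes(mut n: usize, chunk: usize) -> (Vec<usize>, usize) {
    let mut batches = Vec::new();
    while n >= chunk {
        batches.push(chunk);
        n -= chunk;
    }
    let padding = if n > 0 { chunk - n } else { 0 };
    if n > 0 {
        batches.push(chunk); // partial tail padded to a full set
    }
    (batches, padding)
}

fn main() {
    // 321 data shreds: eleven 32:32 sets, 31 of the shreds being padding.
    assert_eq!(new_batch_sizes(321, 32), (vec![32; 11], 31));
}
```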

@alexpyattaev (Author)

Well yes, but the reality is that the code typically deals with e.g. 55 shreds worth of data and ends up making 59:59 batches. This consistently overstresses the Reed-Solomon decoder. A better solution to your scenario would be to make 2 batches of 16:16 and 17:16 respectively.
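
A sketch of that even-split idea (`split_tail` is a hypothetical helper, not part of this PR):

```rust
// Hypothetical: split an undersized-but-nonempty tail of n data shreds
// into two near-equal batches instead of one oversized or padded set.
fn split_tail(n: usize) -> (usize, usize) {
    (n.div_ceil(2), n / 2)
}

fn main() {
    // The 33-shred tail from the example above: 17 + 16 data shreds,
    // i.e. batches of roughly 17:16 and 16:16.
    assert_eq!(split_tail(33), (17, 16));
}
```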


The very last batch in a slot needs to be at least 32:32.

To be clear, I'm fine with requiring that each batch is <= 32:32 and increasing our target later.
I'm just wondering if we can avoid unnecessary padding for the last set. So if is_last_in_slot = true, allow an exemption for the very last batch to be greater than 32:32.

Or we could add the small set in the middle if we need to be firm on the 32 max size.
Using the example above, we would make 9 batches of 32:32 (288), one batch of 1:17 (289), and finally one batch of 32:32.
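
A sketch of that "small set in the middle" layout (hypothetical helper; the 1:17 coding ratio of the small set is taken from the comment above and not modeled here):

```rust
// Hypothetical: emit full 32-data-shred sets, but move the remainder
// into a small set just before the last one, so the slot still ends
// on a full 32:32 set and needs no padding.
fn middle_remainder_layout(n: usize, chunk: usize) -> Vec<usize> {
    let full = n / chunk;
    let rem = n % chunk;
    let mut out = vec![chunk; full.saturating_sub(1)];
    if rem > 0 {
        out.push(rem); // the small set, e.g. 1 data shred
    }
    if full > 0 {
        out.push(chunk); // final full set closes the slot
    }
    out
}

fn main() {
    // 321 data shreds: nine 32s, a 1-data-shred set, then a final 32.
    assert_eq!(
        middle_remainder_layout(321, 32),
        [vec![32; 9], vec![1], vec![32]].concat()
    );
}
```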


My understanding from @lidatong is that they highly desire batches that are exactly 32:32 in size, so that they can hard-code this assumption into other areas of the code and simplify a bunch of logic.

@alexpyattaev (Author)

Yes, 32:32 would simplify the logic responsible for figuring out in which coding batch a given shred would land (right now that calls for something like a hashmap with FEC set index as key, which is slow), and a fixed FEC set size would allow a simple vector to be used instead. This in turn simplifies signaling for repair, etc. I think we can try hardwiring 32:32 and measure on testnet whether it is actually bad at all. It's minimal effort, and it is easy enough to observe how many zeros we ultimately end up shipping on the wire.
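
A sketch of the lookup simplification being described (hypothetical types; assumes `fec_set_index` is the index of the first data shred in its set, so with fixed 32:32 sets it is always a multiple of 32):

```rust
use std::collections::HashMap;

struct FecSetState; // placeholder for per-set repair/decoding state

// With variable-size FEC sets, finding the set a shred lands in calls
// for a map keyed by fec_set_index:
struct VariableSets {
    sets: HashMap<u32, FecSetState>,
}

// With fixed 32:32 sets, a plain vector indexed by fec_set_index / 32
// does the same job without hashing:
struct FixedSets {
    sets: Vec<FecSetState>,
}

impl FixedSets {
    fn get(&self, fec_set_index: u32) -> Option<&FecSetState> {
        debug_assert_eq!(fec_set_index % 32, 0);
        self.sets.get((fec_set_index / 32) as usize)
    }
}

fn main() {
    let fixed = FixedSets { sets: vec![FecSetState, FecSetState] };
    assert!(fixed.get(32).is_some()); // second set starts at shred 32
}
```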


yep, we're aligned
