Prevent the shredder from making oversized batches #5720
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

@@            Coverage Diff            @@
##           master    #5720     +/-  ##
=========================================
- Coverage     83.3%    83.3%    -0.1%
=========================================
  Files          824      824
  Lines       373719   373738     +19
=========================================
- Hits        311509   311499     -10
- Misses       62210    62239     +29
@@ -1178,7 +1178,8 @@ pub(super) fn make_shreds_from_data(
     };
     // Split the data into erasure batches and initialize
     // data and coding shreds for each batch.
-    while data.len() >= 2 * chunk_size || data.len() == chunk_size {
+    while data.len() >= chunk_size {
Imagine you have 321 shreds worth of data and is_last_in_slot = true.
The old code will make 9 batches of 32:32 shreds (288) and 1 batch of 33:33 shreds, no padding.
This code will make 10 batches of 32:32 shreds (320) and 1 final 32:32 batch in which 31 of the 32 data shreds are padding.
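For illustration, here is a minimal sketch (my own code, not the shredder's) of how the two loop conditions chunk n data shreds into erasure batches, assuming chunk_size = 32 data shreds per batch:

```rust
/// Sketch: how many data shreds land in each batch under the old vs.
/// new loop condition (32 data shreds per full batch).
fn batch_sizes(mut n: usize, old_condition: bool) -> Vec<usize> {
    const CHUNK: usize = 32; // data shreds per batch
    let mut batches = Vec::new();
    loop {
        let keep_going = if old_condition {
            n >= 2 * CHUNK || n == CHUNK // old loop condition
        } else {
            n >= CHUNK // new loop condition
        };
        if !keep_going {
            break;
        }
        batches.push(CHUNK);
        n -= CHUNK;
    }
    if n > 0 {
        // The leftover becomes the final batch: oversized (33..=63 data
        // shreds) under the old code, padded up to 32 under the new one.
        batches.push(n);
    }
    batches
}

fn main() {
    // 321 data shreds, as in the example above:
    assert_eq!(batch_sizes(321, true).len(), 10); // 9 x 32, then a 33:33 batch
    assert_eq!(*batch_sizes(321, true).last().unwrap(), 33);
    assert_eq!(batch_sizes(321, false).len(), 11); // 10 x 32, then 1 shred padded to 32:32
    assert_eq!(*batch_sizes(321, false).last().unwrap(), 1);
}
```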
well yes, but the reality is that the code typically deals with e.g. 55 shreds worth of data and ends up making a 59:59 batch. This consistently overstresses the RSC decoder. A better solution for your scenario would be to make 2 batches, of 16:16 and 17:16 respectively.
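As a hypothetical sketch of that even-split idea (the helper name and shape are mine, not from this PR): divide n data shreds into ceil(n / 32) batches whose sizes differ by at most one, so 33 shreds become 17 + 16 rather than one 33:33 batch.

```rust
/// Hypothetical helper (not in the PR): split `n` data shreds into
/// ceil(n / max) batches whose sizes differ by at most one.
fn even_split(n: usize, max: usize) -> Vec<usize> {
    let num_batches = n.div_ceil(max);
    let base = n / num_batches;
    let extra = n % num_batches; // the first `extra` batches get one more shred
    (0..num_batches)
        .map(|i| base + usize::from(i < extra))
        .collect()
}

fn main() {
    assert_eq!(even_split(33, 32), vec![17, 16]); // 17:16 and 16:16 batches
    assert_eq!(even_split(55, 32), vec![28, 27]); // instead of one 55:55 batch
}
```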
The very last batch in a slot needs to be 32:32.
To be clear, I'm fine with requiring that each batch is <= 32:32 and increasing our target later.
I'm just wondering if we can avoid unnecessary padding for the last set. So if is_last_in_slot = true, allow an exemption for the very last batch to be greater than 32:32.
Or we could add the small set in the middle if we need to be firm on the 32 max size.
Using the example above, we would make 9 batches of 32:32 (288), one batch of 1:17 (289), and finally one batch of 32:32 (321).
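A hypothetical sketch of that "small set in the middle" plan (my own illustration, assuming the final batch of a slot must be exactly 32 data shreds):

```rust
/// Hypothetical planner: keep every batch at most 32 data shreds, place
/// any remainder as a small batch just before the end, and keep the very
/// last batch at exactly 32.
fn plan_batches(n: usize) -> Vec<usize> {
    const CHUNK: usize = 32;
    assert!(n >= CHUNK);
    let rest = n - CHUNK;    // reserve the final 32:32 batch
    let full = rest / CHUNK; // full 32-shred batches up front
    let small = rest % CHUNK; // the small mid-stream batch, if any
    let mut batches = vec![CHUNK; full];
    if small > 0 {
        batches.push(small);
    }
    batches.push(CHUNK);
    batches
}

fn main() {
    // 321 shreds: 9 batches of 32 (288), one of 1 (289), final 32 (321).
    assert_eq!(plan_batches(321), [vec![32; 9], vec![1, 32]].concat());
}
```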
My understanding from @lidatong is that they highly desire batches that are exactly 32:32 in size, so that they can hard-code this assumption into other areas of the code and simplify a bunch of logic.
Yes, 32:32 would simplify the logic responsible for figuring out in which coding batch a given shred lands (right now that calls for something like a hashmap with the FEC set index as key, which is slow), and a fixed FEC set size would allow a simple vector to be used instead. This in turn simplifies signaling for repair, etc. I think we can try hardwiring 32:32 and measure whether it is actually bad at all on testnet. It's minimal effort, and it is easy enough to observe how many zeros we ultimately end up shipping on the wire.
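To illustrate the point (my sketch, not code from the PR): with a fixed FEC set size, mapping a shred to its batch becomes integer arithmetic over a Vec instead of a HashMap lookup keyed by FEC set index.

```rust
use std::collections::HashMap;

const DATA_SHREDS_PER_FEC_SET: u32 = 32;

// Variable batch sizes: the batch holding a shred can only be found via a
// map keyed by FEC set index (the index of the batch's first data shred).
fn lookup_variable<T>(batches: &HashMap<u32, T>, fec_set_index: u32) -> Option<&T> {
    batches.get(&fec_set_index)
}

// Fixed 32:32 batches: the batch position follows from the shred index
// alone, so a plain Vec indexed by batch number suffices.
fn lookup_fixed<T>(batches: &[T], shred_index: u32) -> Option<&T> {
    batches.get((shred_index / DATA_SHREDS_PER_FEC_SET) as usize)
}

fn main() {
    let fixed = vec!["batch0", "batch1"];
    assert_eq!(lookup_fixed(&fixed, 37), Some(&"batch1")); // shred 37 -> batch 1
    let variable = HashMap::from([(0u32, "batch0"), (32u32, "batch1")]);
    assert_eq!(lookup_variable(&variable, 32), Some(&"batch1"));
}
```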
yep, we're aligned
Problem
The shredder would make an oversized (>64 shreds) batch whenever it finished a given batch of entries (e.g. it would make 3 batches of 64 shreds (32:32 each) and one oversized batch of 59:59). Worse, under low-ish load, most entry batches are small enough that they always result in a single oversized batch. This puts a higher load on the RSC and Merkle tree decoders, and it is also not what people reading the specs expect.
Note: apparently the old behavior was intentional, i.e. not a bug, so there is some chance we should not be making this change.
Summary of Changes