-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
retry token bucket not set #1234
Labels
bug
This issue is a bug.
p1
This is a high priority issue
potential-regression
Marking this issue as a potential regression to be checked by team member
Comments
github-merge-queue bot
pushed a commit
to smithy-lang/smithy-rs
that referenced
this issue
Jan 14, 2025
…odes (#3964) ## Motivation and Context <!--- Why is this change required? What problem does it solve? --> <!--- If it fixes an open issue, please link to the issue here --> awslabs/aws-sdk-rust#1234 ## Description <!--- Describe your changes in detail --> PR adds a new interceptor registered as part of the default retry plugin components that ensures a token bucket is _always_ present and available to the retry strategy. The buckets are partitioned off the retry partition (which defaults to the service name and is already set by the default plugin). We use a `static` variable in the runtime for this which means that token buckets can and will apply to every single client that uses the same retry partition. The implementation tries to avoid contention on this new global lock by only consulting it if the retry partition is overridden after client creation. For AWS SDK clients I've updated the default retry partition clients are created with to include the region when set. Now the default partition for a client will be `{service}-{region}` (e.g. `sts-us-west-2`) rather than just the service name (e.g. `sts`). This partitioning is a little more granular and closer to what we want/expect as failures in one region should not cause throttling to another (and vice versa for success in one should not increase available quota in another). I also updated the implementation to follow the SEP a little more literally/closely as far as structure which fixes some subtle bugs. State is updated in one place and we ensure that the token bucket is always consulted (before the token bucket could be skipped in the case of adaptive retries returning a delay and the adaptive rate limit was updated in multiple branches). ## Testing <!--- Please describe in detail how you tested your changes --> <!--- Include details of your testing environment, and the tests you ran to --> <!--- see how your change affects other areas of the code, etc. --> ## Checklist <!--- If a checkbox below is not applicable, then please DELETE it rather than leaving it unchecked --> - [x ] For changes to the smithy-rs codegen or runtime crates, I have created a changelog entry Markdown file in the `.changelog` directory, specifying "client," "server," or both in the `applies_to` key. - [ x] For changes to the AWS SDK, generated SDK code, or SDK runtime crates, I have created a changelog entry Markdown file in the `.changelog` directory, specifying "aws-sdk-rust" in the `applies_to` key. ---- _By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice._
Fix will go out in next |
Comments on closed issues are hard for our team to see. |
landonxjames
pushed a commit
to smithy-lang/smithy-rs
that referenced
this issue
Jan 14, 2025
…odes (#3964) <!--- Why is this change required? What problem does it solve? --> <!--- If it fixes an open issue, please link to the issue here --> awslabs/aws-sdk-rust#1234 <!--- Describe your changes in detail --> PR adds a new interceptor registered as part of the default retry plugin components that ensures a token bucket is _always_ present and available to the retry strategy. The buckets are partitioned off the retry partition (which defaults to the service name and is already set by the default plugin). We use a `static` variable in the runtime for this which means that token buckets can and will apply to every single client that uses the same retry partition. The implementation tries to avoid contention on this new global lock by only consulting it if the retry partition is overridden after client creation. For AWS SDK clients I've updated the default retry partition clients are created with to include the region when set. Now the default partition for a client will be `{service}-{region}` (e.g. `sts-us-west-2`) rather than just the service name (e.g. `sts`). This partitioning is a little more granular and closer to what we want/expect as failures in one region should not cause throttling to another (and vice versa for success in one should not increase available quota in another). I also updated the implementation to follow the SEP a little more literally/closely as far as structure which fixes some subtle bugs. State is updated in one place and we ensure that the token bucket is always consulted (before the token bucket could be skipped in the case of adaptive retries returning a delay and the adaptive rate limit was updated in multiple branches). <!--- Please describe in detail how you tested your changes --> <!--- Include details of your testing environment, and the tests you ran to --> <!--- see how your change affects other areas of the code, etc. --> <!--- If a checkbox below is not applicable, then please DELETE it rather than leaving it unchecked --> - [x ] For changes to the smithy-rs codegen or runtime crates, I have created a changelog entry Markdown file in the `.changelog` directory, specifying "client," "server," or both in the `applies_to` key. - [ x] For changes to the AWS SDK, generated SDK code, or SDK runtime crates, I have created a changelog entry Markdown file in the `.changelog` directory, specifying "aws-sdk-rust" in the `applies_to` key. ---- _By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice._
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
bug
This issue is a bug.
p1
This is a high priority issue
potential-regression
Marking this issue as a potential regression to be checked by team member
Describe the bug
While investigating a cross SDK issue I discovered we are not setting a token bucket in the config bag. This causes retries to only fail when max attempts are exceeded not when the token bucket has exhausted it's retry quota.
This appears to have been unintentionally removed in smithy-lang/smithy-rs#3071.
Regression Issue
Expected Behavior
We should fail long before a large number of max attempts are hit if every request is failing as the retry quota should be exhausted
Current Behavior
Failure only happens when max attempts are exceeded.
Without a token bucket configured the standard retry strategy never acquires permits.
Reproduction Steps
Reproduction, logs show we only ever fail on max retries.
Possible Solution
Token bucket should always be present. We probably want it tied to the retry partition at the very least although our default retry partitioning seems to only use the service name and doesn't account for e.g. region
Additional Information/Context
No response
Version
Environment details (OS name and version, etc.)
macos
Logs
No response
The text was updated successfully, but these errors were encountered: