Fix flaky upgrade test #1368

Merged 4 commits on Jan 23, 2024

@lrascao lrascao commented Nov 26, 2023

Description

The upgrade test from 1.11.0 to 1.12.0 is flaky. In this version we upgrade the Configuration CRD by adding two new fields: controlPlaneTrustDomain and sentryAddress.
The flake stems from a known race condition that arises when a CRD and a matching CR are updated close together (described here). This patch introduces a basic retry mechanism. Each try needs a fresh client because of the OpenAPI schema caching that happens in the kubectl client.
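
The retry-with-a-fresh-client idea can be sketched roughly as follows. This is an illustration only, not the code in this PR: `retryWithFreshClient`, `errStaleSchema`, and the integer "client" are hypothetical stand-ins, and the 10 ms delay is an assumption. The point it shows is that the client is rebuilt before every attempt, so a schema cached by an earlier client is never reused.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStaleSchema is a hypothetical error standing in for the
// "unknown field" validation failure seen when the old schema is cached.
var errStaleSchema = errors.New("validation failed: unknown field")

// retryWithFreshClient runs apply up to attempts times. newClient is called
// before every attempt, so nothing cached by an earlier client is reused.
// It returns nil on the first success, or the last error wrapped with the
// attempt count.
func retryWithFreshClient[C any](
	attempts int,
	delay time.Duration,
	newClient func() (C, error),
	apply func(C) error,
) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	var lastErr error
	for i := 1; i <= attempts; i++ {
		client, err := newClient()
		if err == nil {
			err = apply(client)
		}
		if err == nil {
			return nil
		}
		lastErr = err
		if i < attempts {
			// Wait before the next try so the updated CRD schema can be served.
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}

func main() {
	built := 0
	// Simulated client: a counter recording how many clients were built.
	newClient := func() (int, error) { built++; return built, nil }
	// Simulated upgrade: the first two clients still see the old schema.
	apply := func(id int) error {
		if id < 3 {
			return errStaleSchema
		}
		return nil
	}
	err := retryWithFreshClient(5, 10*time.Millisecond, newClient, apply)
	fmt.Println("error:", err, "clients built:", built)
}
```

Passing a constructor rather than a client instance is what forces the rebuild on every attempt; a real implementation would construct whatever client the upgrade path actually uses.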

Issue reference

No issue has been referenced so far, but a few occurrences of the flake can be seen in the GHA history.

The error is almost always the same:

        {"time":"2023-11-26T12:19:15.977671678Z","status":"info","msg":"Dapr control plane version 1.11.0 detected in namespace dapr-cli-tests"}
        {"time":"2023-11-26T12:19:16.073032862Z","status":"info","msg":"Starting upgrade..."}
        {"time":"2023-11-26T12:19:17.416345982Z","status":"failure","msg":"Failed to upgrade Dapr: error validating \"\": error validating data: [ValidationError(Configuration.spec.mtls): unknown field \"controlPlaneTrustDomain\" in io.dapr.v1alpha1.Configuration.spec.mtls, ValidationError(Configuration.spec.mtls): unknown field \"sentryAddress\" in io.dapr.v1alpha1.Configuration.spec.mtls]"}
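
When scanning CI logs for this flake, a small helper along these lines could tell the failure apart from unrelated ones. It is a sketch under assumptions: `logLine` and `isSchemaRaceFailure` are hypothetical names, not part of the Dapr CLI, and the matching relies only on the JSON fields and message text visible in the log lines above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// logLine mirrors the JSON fields visible in the CLI output above.
type logLine struct {
	Time   string `json:"time"`
	Status string `json:"status"`
	Msg    string `json:"msg"`
}

// isSchemaRaceFailure reports whether raw is a "failure" log line whose
// message is a validation error about an unknown field, which is the
// symptom of the CRD/CR race described in this PR.
func isSchemaRaceFailure(raw string) (bool, error) {
	var l logLine
	if err := json.Unmarshal([]byte(raw), &l); err != nil {
		return false, fmt.Errorf("parsing log line: %w", err)
	}
	match := l.Status == "failure" &&
		strings.Contains(l.Msg, "error validating data") &&
		strings.Contains(l.Msg, "unknown field")
	return match, nil
}

func main() {
	lines := []string{
		`{"time":"2023-11-26T12:19:16.073032862Z","status":"info","msg":"Starting upgrade..."}`,
		`{"time":"2023-11-26T12:19:17.416345982Z","status":"failure","msg":"Failed to upgrade Dapr: error validating \"\": error validating data: [ValidationError(Configuration.spec.mtls): unknown field \"controlPlaneTrustDomain\" in io.dapr.v1alpha1.Configuration.spec.mtls, ValidationError(Configuration.spec.mtls): unknown field \"sentryAddress\" in io.dapr.v1alpha1.Configuration.spec.mtls]"}`,
	}
	for _, line := range lines {
		match, err := isSchemaRaceFailure(line)
		fmt.Println(match, err)
	}
}
```

Matching on message substrings is brittle by nature; it is meant for triage of existing logs, not as a signal the retry logic should depend on.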

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

  • Code compiles correctly
  • Created/updated tests
  • [] Extended the documentation

Signed-off-by: Luis Rascao <luis.rascao@gmail.com>
@lrascao lrascao marked this pull request as ready for review November 26, 2023 18:13
@lrascao lrascao requested review from a team as code owners November 26, 2023 18:13

codecov bot commented Dec 6, 2023

Codecov Report

Attention: 58 lines in your changes are missing coverage. Please review.

Comparison is base (f5eb4fd) 22.76% compared to head (00a5cac) 22.55%.

❗ Current head 00a5cac differs from pull request most recent head 8d81dc1. Consider uploading reports for the commit 8d81dc1 to get more accurate results

Files                        Patch %   Lines
pkg/kubernetes/upgrade.go    0.00%     58 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1368      +/-   ##
==========================================
- Coverage   22.76%   22.55%   -0.22%     
==========================================
  Files          40       40              
  Lines        4713     4758      +45     
==========================================
  Hits         1073     1073              
- Misses       3562     3607      +45     
  Partials       78       78              

@mukundansundar mukundansundar left a comment

@lrascao
Thanks for the contribution.
Please update the documentation in the README to note that the upgrade client now retries 5 times on failure.

yaron2 previously approved these changes Dec 13, 2023

lrascao commented Dec 18, 2023

@lrascao Thanks for the contribution. Please update the documentation in the README to note that the upgrade client now retries 5 times on failure.

done, added a note to the README. lmk if something else is needed

Signed-off-by: Luis Rascao <luis.rascao@gmail.com>
@lrascao lrascao force-pushed the fix-flaky-upgrade-test branch from f3fefdb to f946c34 Compare December 18, 2023 09:37

lrascao commented Dec 18, 2023

@mukundansundar looks like the dapr_cli CI failure is due to the upload artifact step using master (actions/upload-artifact@master). v4 was released and changed the upload semantics, so pinning it to v3 is probably the fastest fix.

@mukundansundar (Collaborator)

@mukundansundar looks like the dapr_cli CI failure is due to the upload artifact step using master (actions/upload-artifact@master). v4 was released and changed the upload semantics, so pinning it to v3 is probably the fastest fix.

Agreed ...

@dapr-bot (Collaborator)

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@dapr-bot dapr-bot added the stale label Jan 18, 2024
@mukundansundar mukundansundar merged commit 50e1dff into dapr:master Jan 23, 2024
@mukundansundar mukundansundar added this to the v1.13 milestone Jan 23, 2024
@lrascao lrascao deleted the fix-flaky-upgrade-test branch February 29, 2024 09:20
4 participants