-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve traffic strategy for canary and partition release #219
Conversation
f5d14c8
to
9390655
Compare
@@ -106,6 +166,12 @@ func (m *canaryReleaseManager) runCanary(c *RolloutContext) error { | |||
return err | |||
} else if done { | |||
canaryStatus.CurrentStepState = v1beta1.CanaryStepStateTrafficRouting | |||
// if it is partition style and the last batch, we can skip the CanaryStepStateTrafficRouting step | |||
// to bypass the bug mentioned above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plz be more specific about the bug, leave the bug url
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/* | ||
The following check is used to solve scenario like this: | ||
steps: | ||
- replicas: 1 # frist batch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
frist -> first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/trafficrouting/manager.go
Outdated
// if canary svc has been already cleaned up, just return | ||
// even DisableGenerateCanaryService is true, canary svc still exists, because canary service is stable service | ||
if err = m.Get(context.TODO(), client.ObjectKeyFromObject(cService), cService); err != nil { | ||
if err = m.Get(context.TODO(), client.ObjectKeyFromObject(cService), cService); err != nil && stableServiceRestored { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err != nil && !errors.IsNotFound(err), the logic will skip the if clause and start restore stable service, it seems better to just return error and retry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's ok because we will get the stable Service again in the following restoring stable service
whatever, i have refactored this logic, looking forward to your check again, many thanks!
7e1281a
to
7c48f3e
Compare
To avoid this issue, we restore stable Service before scaling the stable pods down to zero. | ||
This ensures that the backends behind the stable ingress remain active, preventing the bug from being triggered. | ||
*/ | ||
expectedReplicas, _ := intstr.GetScaledValueFromIntOrPercent(currentStep.Replicas, int(c.Workload.Replicas), true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems a best effort workaround, if the stable replicas is down and not ready, the stable service will also have zero endpoints thus trigger the bug.
pkg/trafficrouting/manager.go
Outdated
@@ -181,6 +180,13 @@ func (m *Manager) DoTrafficRouting(c *TrafficRoutingContext) (bool, error) { | |||
if serviceModified { | |||
return false, nil | |||
} | |||
} else if c.DisableGenerateCanaryService { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plz comment the field CanaryStrategy.DisableGenerateCanaryService and explain in what case it is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove subset related logic, i will submit a dedicated pr related to this in the future
// only if both conditions are met | ||
if canaryServiceRemoved && stableServiceRestored { | ||
/* | ||
even both of the conditions are met, we call Finalise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even both of the conditions are met
if both of the conditions are met
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic here is bloated and complex, i will simplify it in the coming pr
Signed-off-by: yunbo <yunbo10124scut@gmail.com>
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: zmberg The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Ⅰ. Describe what this PR does
add an additional step before Upgrade step within runCanary function in order to:
Ⅱ. Does this pull request fix one issue?
Ⅲ. Special notes for reviews