Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migrate experimental-snapshot-catchup-entries flag to snapshot-catchup-entries #19352

Merged

Conversation

ajaysundark
Copy link
Contributor

Migrate experimental-snapshot-catchup-entries flag to snapshot-catchup-entries

Sub task from #19141

Note:

  1. This PR also handles a fix to an unintended bug from this feature PR, where the field-name was set as experimental-snapshot-catch-up-entries when cmd-line flag experimental-snapshot-catchup-entries was used.
  2. YAML field experimental-snapshot-catch-up-entries will set flag experimental-snapshot-catchup-entries correctly at FlagsExplicitlySet. This is required to mark / identify the field if used from 'config-file' as a deprecated flag usage.
  3. experimental-snapshot-catch-up-entries YAML field usage is maintained the same field-name for backward compability.

@k8s-ci-robot
Copy link

Hi @ajaysundark. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ajaysundark
Copy link
Contributor Author

/cc @siyuanfoundation @ahrtr

Copy link

codecov bot commented Feb 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.90%. Comparing base (f197f44) to head (5802231).
Report is 2 commits behind head on main.

Additional details and impacted files
Files with missing lines Coverage Δ
server/embed/config.go 80.11% <100.00%> (+0.17%) ⬆️
server/etcdmain/config.go 76.08% <100.00%> (+0.26%) ⬆️

... and 22 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #19352      +/-   ##
==========================================
- Coverage   68.93%   68.90%   -0.04%     
==========================================
  Files         420      420              
  Lines       35756    35759       +3     
==========================================
- Hits        24649    24638      -11     
- Misses       9685     9700      +15     
+ Partials     1422     1421       -1     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f197f44...5802231. Read the comment docs.

@ahrtr
Copy link
Member

ahrtr commented Feb 6, 2025

/ok-to-test

@siyuanfoundation
Copy link
Contributor

siyuanfoundation commented Feb 6, 2025

Need to change in tests/framework/e2e/cluster.go

func WithSnapshotCatchUpEntries(count uint64) EPClusterOption {
--	return func(c *EtcdProcessClusterConfig) { c.ServerConfig.SnapshotCatchUpEntries = count }
++     return func(c *EtcdProcessClusterConfig) { c.ServerConfig.ExperimentalSnapshotCatchUpEntries = count }
}

to keep using the --experimental-snapshot-catchup-entries in e2e and robustness test.

We do not want to use the non-experimental flag yet in those tests because of
Opened kubernetes/kubernetes#125031

@siyuanfoundation siyuanfoundation self-requested a review February 6, 2025 17:25
@ajaysundark ajaysundark force-pushed the migrate-experimental-snapshot-catch-up-entries branch from da2d4d8 to 4605e40 Compare February 6, 2025 19:06
@ajaysundark ajaysundark force-pushed the migrate-experimental-snapshot-catch-up-entries branch 2 times, most recently from d39df53 to 09c560c Compare February 6, 2025 19:39
@ajaysundark ajaysundark force-pushed the migrate-experimental-snapshot-catch-up-entries branch from d11c45d to 30f96c0 Compare February 6, 2025 19:57
@siyuanfoundation
Copy link
Contributor

/retest

@ajaysundark
Copy link
Contributor Author

This is blocked by #19354 as we have test framework dependency on the ExperimentalSnapshotCatchupEntries.

For example:

TestMixVersionsSnapshotByMockingPartition would need ExperimentalSnapshotCatchupEntries to be set here: framework/e2e/cluster.go#L232 whereas TestRecoverSnapshotBackend require this to be SnapshotCatchupEntries.

@ahrtr
Copy link
Member

ahrtr commented Feb 7, 2025

/retest

@ahrtr
Copy link
Member

ahrtr commented Feb 7, 2025

/test pull-etcd-e2e-amd64

1 similar comment
@ahrtr
Copy link
Member

ahrtr commented Feb 7, 2025

/test pull-etcd-e2e-amd64

@@ -229,7 +229,7 @@ func WithSnapshotCount(count uint64) EPClusterOption {
}

func WithSnapshotCatchUpEntries(count uint64) EPClusterOption {
return func(c *EtcdProcessClusterConfig) { c.ServerConfig.SnapshotCatchUpEntries = count }
return func(c *EtcdProcessClusterConfig) { c.ServerConfig.ExperimentalSnapshotCatchUpEntries = count }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The e2e test failure of TestRecoverSnapshotBackend is caused by this change. You need to update the line below to use the ExperimentalSnapshotCatchUpEntries

return config.ServerConfig.SnapshotCount + config.ServerConfig.SnapshotCatchUpEntries

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or it might be even better to set both SnapshotCatchUpEntries and ExperimentalSnapshotCatchUpEntries in this function WithSnapshotCatchUpEntries; then no need to update the line below,

return config.ServerConfig.SnapshotCount + config.ServerConfig.SnapshotCatchUpEntries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review, @ahrtr. I considered setting both the flags at test framework, but we explicitly disallow doing so (we validate for both flags being set), as that could allow misconfigurations in real world use-cases.

Copy link
Member

@ahrtr ahrtr Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am aware of that setting both flags are disallowed, but what we are talking about here is just test code.

Also note that the robustness test cases test both main and release branches, while ExperimentalSnapshotCatchUpEntries doesn't exist in stable release branches, so setting both fields here might be best choice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can revert this change, and keep using c.ServerConfig.SnapshotCatchUpEntries = count in WithSnapshotCatchUpEntries.

Note this is a special case, the field ExperimentalSnapshotCatchUpEntries is a new field being added in this PR. While SnapshotCatchUpEntries is the existing field.

cc @ajaysundark @siyuanfoundation

Copy link
Contributor Author

@ajaysundark ajaysundark Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in response. The issue is that if we set both SnapshotCatchupEntries and ExperimentalSnapshotCatchupEntries at function WithSnapshotCatchupEntries:

the tests fail with both the flags are set.

{"level":"warn","ts":"2025-02-11T18:35:13.533795Z","caller":"etcdmain/etcd.go:66","msg":"failed to verify flags","error":"cannot set --experimental-snapshot-catchup-entries and --snapshot-catchup-entries at the same time, please use --snapshot-catchup-entries only"}

Another possible option I tried is setting one of the two flags based on the version (if version < 3.6), say at InitEtcdProcessCluster here:

func InitEtcdProcessCluster(t testing.TB, cfg *EtcdProcessClusterConfig) (*EtcdProcessCluster, error) {

like:

if cfg.Version != CurrentVersion && UsesExperimentalSnapshotCatchupEntriesFlag(BinPath.EtcdLastRelease) {
	cfg.ServerConfig.ExperimentalSnapshotCatchUpEntries = cfg.ServerConfig.SnapshotCatchUpEntries
	cfg.ServerConfig.SnapshotCatchUpEntries = etcdserver.DefaultSnapshotCatchUpEntries
}

but for some reason it's failing TestMixVersionsSnapshotByMockingPartition: timing out the waiting for Verify logs to check leader has saved snapshot. Any thoughts? cc @ahrtr / @siyuanfoundation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

increasing the timeout to more than 30s helped with the test run:
PASSES="e2e" TIMEOUT=5m TESTCASE=TestMixVersionsSnapshotByMockingPartition ./scripts/test.sh

@ahrtr
Copy link
Member

ahrtr commented Feb 7, 2025

Note that I will cut branch release-3.6 about 2 hours later. If the comments can be resolved before that time, then it should be able to be merged before cutting branch; otherwise, you will have to resubmit the PR against the new release-3.6.

@ahrtr
Copy link
Member

ahrtr commented Feb 7, 2025

otherwise, you will have to resubmit the PR against the new release-3.6.

This might not be correct. We can still merge this to main, then backport to 3.6. Eventually we will cleanup all experimental flags in main.

@ahrtr
Copy link
Member

ahrtr commented Feb 11, 2025

@ajaysundark are you still working on this PR? It's almost ready to be merged with just a couple of comments. If you don't have time, let us know. I am targeting to get it done by tomorrow.

@ajaysundark ajaysundark force-pushed the migrate-experimental-snapshot-catch-up-entries branch 2 times, most recently from eb85033 to 189127a Compare February 11, 2025 19:05
…p-entries

Signed-off-by: Ajay Sundar Karuppasamy <ajaysundar@google.com>
@ajaysundark ajaysundark force-pushed the migrate-experimental-snapshot-catch-up-entries branch from 189127a to 5802231 Compare February 11, 2025 19:17
@ajaysundark
Copy link
Contributor Author

/retest-required

if err != nil {
return false
}
v3_6 := semver.Version{Major: 3, Minor: 6}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's reuse

V3_6 = semver.Version{Major: 3, Minor: 6}

Comment on lines +460 to +463
if cfg.Version != CurrentVersion && UsesExperimentalSnapshotCatchupEntriesFlag(BinPath.EtcdLastRelease) {
cfg.ServerConfig.ExperimentalSnapshotCatchUpEntries = cfg.ServerConfig.SnapshotCatchUpEntries
cfg.ServerConfig.SnapshotCatchUpEntries = etcdserver.DefaultSnapshotCatchUpEntries
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not be correct. Let me do a minor refactorig.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to #19388

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or please feel free to cherry pick my PR into this PR. @ajaysundark

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review and follow-up changes. I'll take a look at #19388 to get more context.
Also, looks like the cherry-pick is not necessary as we handled merging this PR and following up with required changes in yours. Please clarify if my understanding is correct.

@ahrtr
Copy link
Member

ahrtr commented Feb 11, 2025

@fuweid @ivanvc let's just merge this PR. I will do a quick followup PR to resolve the comments.

@ahrtr
Copy link
Member

ahrtr commented Feb 11, 2025

cc @siyuanfoundation

Copy link
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just merge this PR. I will do a quick followup PR to resolve the comments.

OK. The alternative is to carry this commit in your pull request and close this one.

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ajaysundark, fuweid, siyuanfoundation
Once this PR has been reviewed and has the lgtm label, please assign serathius for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ahrtr ahrtr merged commit 53d44f0 into etcd-io:main Feb 11, 2025
37 checks passed
@ahrtr
Copy link
Member

ahrtr commented Feb 11, 2025

OK. The alternative is to carry this commit in your pull request and close this one.

Not every people is comfortable with this approach :(

@ahrtr
Copy link
Member

ahrtr commented Feb 11, 2025

/cherry-pick release-3.6

@k8s-infra-cherrypick-robot

@ahrtr: new pull request created: #19389

In response to this:

/cherry-pick release-3.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ahrtr added a commit to ahrtr/etcd that referenced this pull request Feb 11, 2025
Also resolved the review comment in etcd-io#19352

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
ahrtr added a commit to ahrtr/etcd that referenced this pull request Feb 11, 2025
Also resolved the review comment in etcd-io#19352

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
k8s-infra-cherrypick-robot pushed a commit to k8s-infra-cherrypick-robot/etcd that referenced this pull request Feb 11, 2025
Also resolved the review comment in etcd-io#19352

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

6 participants