OCPBUGS-48780: Fix IBMCloud DNS Propagation Issues in E2E #1164
Skipping CI for Draft Pull Request.
/test e2e-ibmcloud-operator
Force-pushed from 66cbc36 to c029765.
/test e2e-ibmcloud-operator
Force-pushed from c029765 to 133382a.
/test e2e-ibmcloud-operator
Force-pushed from afee45c to 34c2e67.
/test e2e-ibmcloud-operator
To make sure I didn't break anything:
Success!
Force-pushed from 4f8b44e to 34c2e67.
Yup, it failed, that's a good thing 👍 Added the
Force-pushed from 34c2e67 to 9f6e396.
/test e2e-ibmcloud-operator
Success, but failed on deprovisioning:
Failure with:
Looks like a flake in which the router didn't admit the route within 1 minute. That's odd, but I doubt it's specific to IBMCloud.
@gcs278: This pull request references Jira Issue OCPBUGS-42045, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Force-pushed from 9f6e396 to 941bff1.
Okay, DNS failed again. Increased the warmup period to 5 minutes, which is what I found in #1132 (comment).
@gcs278: This pull request references Jira Issue OCPBUGS-48780, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Refactored to only increase the DNS resolution timeout to 10 minutes:
/test e2e-ibmcloud-operator
Force-pushed from e50c9bd to 3bfdce3.
Fix IBMCloud DNS resolution issues in our E2E tests by extending the DNS resolution timeout to 10 minutes. This fix defines the timeout as a documented constant and updates all DNS resolution logic to reference it. Testing showed that new IBMCloud DNS records resolved within 7 minutes for external queries (e.g., from the test runner cluster). Setting the timeout to 10 minutes provides a reasonable buffer for DNS propagation across all platforms.
During testing, we found that IBMCloud's DNS resolution works well from outside the cluster (e.g., from the test runner cluster). However, internal DNS queries within the test cluster that are made before the record has propagated trigger an unchangeable ~30-minute negative caching TTL. This E2E test fix introduces an internal warmup period for IBMCloud clusters to mitigate the negative caching issue. Only one test, TestUnmanagedDNSToManagedDNSInternalIngressController, requires this workaround.
Force-pushed from 3bfdce3 to 7ee110b.
Darn, I forgot to run it:
/test e2e-ibmcloud-operator
/retest
Success, round 1 (hasn't finished, but I see the e2e jobs passed):
/test e2e-ibmcloud-operator
/lgtm
Success, round 2:
Installation failure, not related:
Installation failure, not related:
Installation failure; Terraform is timing out. I guess I should have stopped while I was ahead earlier...
/test e2e-ibmcloud-operator
Installation failure, not related:
As far as DNS propagation is concerned, the last run was a success, but the failure looks like a DNS-related flake (though a real bug) on IBMCloud, so I filed https://issues.redhat.com/browse/OCPBUGS-49684 (maybe a win for adding this E2E job?).
/test e2e-ibmcloud-operator
@gcs278: all tests passed! Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
4 successes, looks good to me. This is an E2E fix for our presubmit job, which is optional and does not run by default, so there is no risk that it will hold up CI for 4.19.
/label acknowledge-critical-fixes-only
/hold cancel
@gcs278: Jira Issue OCPBUGS-48780: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-48780 has been moved to the MODIFIED state. In response to this:
[ART PR BUILD NOTIFIER] Distgit: ose-cluster-ingress-operator
Fix IBMCloud DNS resolution issues in our E2E tests with two fixes:
Fix 1:
Extend the DNS resolution timeout to 10 minutes. This fix defines the timeout as a documented constant and updates all DNS resolution logic to reference it.
Testing showed that new IBMCloud DNS records resolved within 7 minutes for external queries (e.g., from the test runner cluster). Setting the timeout to 10 minutes provides a reasonable buffer for DNS propagation across all platforms.
Fix 2:
During testing, we found that IBMCloud's DNS resolution works well from outside the cluster (e.g., from the test runner cluster). However, internal DNS queries within the test cluster trigger an unchangeable ~30-minute negative caching TTL: a lookup made before the record has propagated caches the negative answer for about 30 minutes.
This fix introduces an internal warmup period for IBMCloud clusters to mitigate the negative caching issue. Only one test, TestUnmanagedDNSToManagedDNSInternalIngressController, requires this workaround.