OCPBUGS-48675, OCPBUGS-48808: Ensure build job is deleted when rebuild is triggered #4807

Open
wants to merge 1 commit into
base: master
Contributor

@umohnani8 umohnani8 commented Jan 24, 2025

Ensure that the build job under the MOSB is deleted before deleting the MOSB when the rebuild annotation is added to the MOSC. This ensures proper cleanup so that the build can start again.

Closes https://issues.redhat.com/browse/OCPBUGS-48675
Closes https://issues.redhat.com/browse/OCPBUGS-48808

- What I did
Ensure job is cleaned up when rebuild is triggered.

- How to verify it
Follow the steps in the bug https://issues.redhat.com/browse/OCPBUGS-48675. With this fix, the rebuild should be triggered.

- Description for the changelog
Ensure rebuild is actually triggered when the rebuild annotation is added to the MOSC.
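
The cleanup order described above can be sketched as follows. This is a minimal in-memory model, not the controller's actual code: `FakeCluster`, `handle_rebuild`, and the `build-` job-name prefix are illustrative assumptions (the prefix mirrors the job names that appear in logs later in this thread).

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """In-memory stand-in for the API server (hypothetical, for illustration)."""
    jobs: set = field(default_factory=set)
    mosbs: set = field(default_factory=set)
    deleted: list = field(default_factory=list)  # records the deletion order

    def delete_job(self, name):
        self.jobs.discard(name)
        self.deleted.append(("Job", name))

    def delete_mosb(self, name):
        self.mosbs.discard(name)
        self.deleted.append(("MachineOSBuild", name))


def handle_rebuild(cluster, mosb_name):
    """Clean up the previous build so that a fresh one can be created."""
    job_name = "build-" + mosb_name  # assumed naming convention
    if job_name in cluster.jobs:
        cluster.delete_job(job_name)    # 1. remove the build Job first
    if mosb_name in cluster.mosbs:
        cluster.delete_mosb(mosb_name)  # 2. then remove the MOSB itself
    return cluster.deleted


cluster = FakeCluster(jobs={"build-worker-abc"}, mosbs={"worker-abc"})
order = handle_rebuild(cluster, "worker-abc")
print(order)
```

Deleting the Job before the MOSB means no stale Job with the same name is left behind to block or confuse the next build.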

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Jan 24, 2025
@openshift-ci-robot
Contributor

@umohnani8: This pull request references Jira Issue OCPBUGS-48675, which is invalid:

  • expected the bug to target the "4.19.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Jan 24, 2025
Contributor

openshift-ci bot commented Jan 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: umohnani8

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 24, 2025
@umohnani8
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 27, 2025
@openshift-ci-robot
Contributor

@umohnani8: This pull request references Jira Issue OCPBUGS-48675, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

@openshift-ci openshift-ci bot requested a review from sergiordlr January 27, 2025 14:09
@umohnani8 umohnani8 force-pushed the rebuild branch 3 times, most recently from 5851477 to df830f8 Compare January 27, 2025 18:55
@umohnani8 umohnani8 changed the title OCPBUGS-48675: Ensure build job is deleted when rebuild is triggered OCPBUGS-48675, OCPBUGS-48808: Ensure build job is deleted when rebuild is triggered Jan 27, 2025
@openshift-ci-robot openshift-ci-robot added jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. and removed jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Jan 27, 2025
@openshift-ci-robot
Contributor

@umohnani8: This pull request references Jira Issue OCPBUGS-48675, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

The bug has been updated to refer to the pull request using the external bug tracker.

This pull request references Jira Issue OCPBUGS-48808, which is invalid:

  • expected the bug to target the "4.19.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Jan 27, 2025
@openshift-ci-robot
Contributor

@umohnani8: This pull request references Jira Issue OCPBUGS-48675, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

This pull request references Jira Issue OCPBUGS-48808, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Jan 27, 2025
@umohnani8 umohnani8 force-pushed the rebuild branch 3 times, most recently from b5c7393 to 6d22ac3 Compare January 28, 2025 18:13
@sergiordlr

sergiordlr commented Jan 29, 2025

We used an IPI cluster on AWS.

We followed the reproduction steps (but using the v1 API) and the build was triggered correctly. However, when the build finished, the MOSB resource was not updated with the right status and remained in the "failed" status.

We could see this error in the os-builder pod:

I0129 11:33:15.496245       1 reconciler.go:617] MachineOSBuild "infra-mosc-a5bfd00980f3d0037a43c2f47701a289" transitioned from transient state (Building) -> terminal state (Succeeded); update needed
I0129 11:33:15.503256       1 reconciler.go:798] Finished updating Job "build-infra-mosc-a5bfd00980f3d0037a43c2f47701a289" after 11.713956ms
E0129 11:33:15.503284       1 wrappedqueue.go:257] "Unhandled Error" err="Updating Job \"build-infra-mosc-a5bfd00980f3d0037a43c2f47701a289\" failed: could not set status on MachineOSBuild \"infra-mosc-a5bfd00980f3d0037a43c2f47701a289\": could not update status on MachineOSBuild \"infra-mosc-a5bfd00980f3d0037a43c2f47701a289\": MachineOSBuild.machineconfiguration.openshift.io \"infra-mosc-a5bfd00980f3d0037a43c2f47701a289\" is invalid: status.conditions: Invalid value: \"array\": once a Failed condition is set, conditions are immutable"
I0129 11:33:15.504392       1 wrappedqueue.go:258] Dropping item <kind: "Job", name: "build-infra-mosc-a5bfd00980f3d0037a43c2f47701a289", func: "(*OSBuildController).updateJob"> out of queue machineosbuilder: Updating Job "build-infra-mosc-a5bfd00980f3d0037a43c2f47701a289" failed: could not set status on MachineOSBuild "infra-mosc-a5bfd00980f3d0037a43c2f47701a289": could not update status on MachineOSBuild "infra-mosc-a5bfd00980f3d0037a43c2f47701a289": MachineOSBuild.machineconfiguration.openshift.io "infra-mosc-a5bfd00980f3d0037a43c2f47701a289" is invalid: status.conditions: Invalid value: "array": once a Failed condition is set, conditions are immutable

This is the state of the cluster:

$ oc get job
NAME                                                STATUS     COMPLETIONS   DURATION   AGE
build-infra-mosc-a5bfd00980f3d0037a43c2f47701a289   Complete   1/1           2m51s      14m

$ oc get pods |grep build-
build-infra-mosc-a5bfd00980f3d0037a43c2f47701a289-pbwv8          0/2     Completed   0              14m

# The MOSB resource is still reporting a failed status
$ oc get machineosbuild
NAME                                          PREPARED   BUILDING   SUCCEEDED   INTERRUPTED   FAILED   AGE
infra-mosc-a5bfd00980f3d0037a43c2f47701a289   False      False      False       False         True     14m


A must-gather file has been created; it is available in a comment on the Jira ticket.
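
The error in the log suggests a validation rule of roughly the following shape. This is only a model inferred from the error message: the real rule lives in the MachineOSBuild CRD validation, which is not shown in this thread, and the function names here are hypothetical.

```python
def failed_is_true(conditions):
    """True if the condition list contains Failed=True."""
    return any(c["type"] == "Failed" and c["status"] == "True" for c in conditions)


def validate_status_update(old_conditions, new_conditions):
    """Return an error string if the update would be rejected, else None."""
    if failed_is_true(old_conditions) and new_conditions != old_conditions:
        return "once a Failed condition is set, conditions are immutable"
    return None


# A MOSB that already failed cannot be moved to Succeeded in place.
stale = [{"type": "Failed", "status": "True"},
         {"type": "Succeeded", "status": "False"}]
wanted = [{"type": "Failed", "status": "False"},
          {"type": "Succeeded", "status": "True"}]
rejected = validate_status_update(stale, wanted)

# A freshly created MOSB has no Failed=True condition, so the same update passes.
fresh = [{"type": "Building", "status": "True"}]
accepted = validate_status_update(fresh, wanted)
print(rejected, accepted)
```

Under this reading, a rebuild that reuses the failed MOSB object would keep hitting the rejection, whereas one that starts from a newly created MOSB would not.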

@umohnani8
Contributor Author

@sergiordlr I am not able to reproduce the issue you mentioned above. I ran through the steps described in both bugs, and in both cases the MOSB status has Succeeded set to True after the build completes successfully.

Following the steps in https://issues.redhat.com/browse/OCPBUGS-48808 to interrupt the build and then add the rebuild annotation:

✗ oc get machineosbuild -w
NAME                                      PREPARED   BUILDING   SUCCEEDED   INTERRUPTED   FAILED
worker-4dc747efeb553fa3ff75699afcc9a93c   False      True       False       False         False
worker-4dc747efeb553fa3ff75699afcc9a93c   False      False      False       True          False
worker-4dc747efeb553fa3ff75699afcc9a93c   False      False      False       True          False
worker-4dc747efeb553fa3ff75699afcc9a93c                                                   
worker-4dc747efeb553fa3ff75699afcc9a93c   False      True       False       False         False
worker-4dc747efeb553fa3ff75699afcc9a93c   False      False      True        False         False

Following the steps in https://issues.redhat.com/browse/OCPBUGS-48675 to create a MOSC with incorrect secrets, fix the secrets after the failure, and then add the rebuild annotation:

➜  4.18 oc get machineosbuild -w
NAME                                           PREPARED   BUILDING   SUCCEEDED   INTERRUPTED   FAILED
worker-mosc-91b288712d4f58bfbea47dfc1e4b3fad   False      True       False       False         False
worker-mosc-91b288712d4f58bfbea47dfc1e4b3fad   False      False      False       False         True
worker-mosc-91b288712d4f58bfbea47dfc1e4b3fad   False      False      False       False         True
worker-mosc-91b288712d4f58bfbea47dfc1e4b3fad                                                   
worker-mosc-91b288712d4f58bfbea47dfc1e4b3fad   False      False      False       False         True
worker-mosc-91b288712d4f58bfbea47dfc1e4b3fad   False      True       False       False         False
worker-mosc-91b288712d4f58bfbea47dfc1e4b3fad   False      False      True        False         False

Can you please tell me which steps you followed when you saw the issue mentioned above?
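
To compare runs like the two watch outputs above, the rows can be reduced to a sequence of phases. The helper below is a rough sketch with hypothetical names; it treats a row with no condition columns as `NoConditions`, on the assumption that such a row is an object whose conditions have not been populated yet.

```python
COLUMNS = ["PREPARED", "BUILDING", "SUCCEEDED", "INTERRUPTED", "FAILED"]


def phase(row):
    """Map one `oc get machineosbuild` row to the condition that is True."""
    flags = row.split()[1:]  # drop the NAME column
    if len(flags) != len(COLUMNS):
        return "NoConditions"  # row printed with empty condition columns
    active = [col.capitalize() for col, flag in zip(COLUMNS, flags) if flag == "True"]
    return active[0] if active else "NoConditions"


def transitions(rows):
    """Collapse consecutive identical phases into a transition sequence."""
    phases = []
    for row in rows:
        current = phase(row)
        if not phases or phases[-1] != current:
            phases.append(current)
    return phases


rows = [
    "worker-4dc747efeb553fa3ff75699afcc9a93c   False   True    False   False   False",
    "worker-4dc747efeb553fa3ff75699afcc9a93c   False   False   False   True    False",
    "worker-4dc747efeb553fa3ff75699afcc9a93c   False   False   False   True    False",
    "worker-4dc747efeb553fa3ff75699afcc9a93c",
    "worker-4dc747efeb553fa3ff75699afcc9a93c   False   True    False   False   False",
    "worker-4dc747efeb553fa3ff75699afcc9a93c   False   False   True    False   False",
]
result = transitions(rows)
print(" -> ".join(result))
```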

Ensure that the build job under the MOSB is deleted before
deleting the MOSB when the rebuild annotation is added to
the MOSC. This ensures proper cleanup so that the build can
start again.

Signed-off-by: Urvashi <umohnani@redhat.com>
Contributor

openshift-ci bot commented Jan 30, 2025

@umohnani8: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-upgrade-out-of-change | 63b5ea6 | link | false | /test e2e-aws-ovn-upgrade-out-of-change |
| ci/prow/e2e-gcp-op | 63b5ea6 | link | true | /test e2e-gcp-op |
| ci/prow/e2e-vsphere-ovn-upi | 63b5ea6 | link | false | /test e2e-vsphere-ovn-upi |
| ci/prow/e2e-azure-ovn-upgrade-out-of-change | 63b5ea6 | link | false | /test e2e-azure-ovn-upgrade-out-of-change |
| ci/prow/e2e-gcp-op-ocl | 63b5ea6 | link | false | /test e2e-gcp-op-ocl |
| ci/prow/e2e-aws-ovn | 63b5ea6 | link | true | /test e2e-aws-ovn |
| ci/prow/e2e-aws-ovn-upgrade | 63b5ea6 | link | true | /test e2e-aws-ovn-upgrade |
| ci/prow/okd-scos-e2e-aws-ovn | 63b5ea6 | link | false | /test okd-scos-e2e-aws-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
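
For triage, the failures reported above can be split into required and optional jobs. The snippet below is a small sketch over data copied from the bot's table; `rerun_commands` is a hypothetical helper, not part of Prow.

```python
# (test name, required) pairs copied from the failure table above.
FAILED = [
    ("e2e-aws-ovn-upgrade-out-of-change", False),
    ("e2e-gcp-op", True),
    ("e2e-vsphere-ovn-upi", False),
    ("e2e-azure-ovn-upgrade-out-of-change", False),
    ("e2e-gcp-op-ocl", False),
    ("e2e-aws-ovn", True),
    ("e2e-aws-ovn-upgrade", True),
    ("okd-scos-e2e-aws-ovn", False),
]


def rerun_commands(failed, required_only=False):
    """Build the per-test `/test` comments, optionally for required jobs only."""
    return ["/test " + name
            for name, required in failed
            if required or not required_only]


required = rerun_commands(FAILED, required_only=True)
print("\n".join(required))
```

Commenting `/retest-required`, as the bot suggests, covers the same three mandatory jobs in one command.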

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
3 participants