Missing Toleration for Migration pod. #1774

Closed
5 of 11 tasks
Jaryllan opened this issue Mar 14, 2024 · 1 comment
@Jaryllan
Please confirm the following

  • I agree to follow this project's code of conduct.
  • I have checked the current issues for duplicates.
  • I understand that AWX is open source software provided for free and that I might not receive a timely response.
  • I am NOT reporting a (potential) security vulnerability. (These should be emailed to security@ansible.com instead.)

Bug Summary

All other pods have tolerations, but the migration job is missing both tolerations and a node selector.

AWX version

24.0.0

Select the relevant components

  • UI
  • UI (tech preview)
  • API
  • Docs
  • Collection
  • CLI
  • Other

Installation method

kubernetes

Modifications

no

Ansible version

core 2.14.6

Operating system

Ubuntu Server 22.04

Web browser

No response

Steps to reproduce

AWX was created with the following configuration. Tolerations can be defined for every pod except the migration pod.

apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  ingress_class_name: nginx
  ingress_path: /awx
  ingress_type: ingress
  node_selector: |
    node-role.kubernetes.io/control-plane: ""
  postgres_selector: |
    node-role.kubernetes.io/control-plane: ""
  postgres_tolerations: |
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists
  task_node_selector: |
    node-role.kubernetes.io/control-plane: ""
  task_tolerations: |
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists
  tolerations: |
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists
  web_node_selector: |
    node-role.kubernetes.io/control-plane: ""
  web_tolerations: |
    - effect: NoSchedule
      key: node-role.kubernetes.io/control-plane
      operator: Exists

Expected results

The migration pod runs to Completed status and the task pod reaches the Running state.

Actual results

The migration pod is stuck in Pending status and the task pod is stuck in Init status.

Additional information

No response

@github-actions bot added the needs_triage, type:bug, and community labels Mar 14, 2024
@fosterseth fosterseth transferred this issue from ansible/awx Mar 14, 2024
@ranvit (Contributor) commented Apr 2, 2024

I was running AWX in a Kubernetes cluster where every node has a taint, so every pod needs a toleration in order to get scheduled.

The migration job stayed in a Pending state because it was never scheduled onto a node. As a result, I was seeing the same errors reported elsewhere, such as relation "conf_setting" does not exist (#568) and relation "django_migrations" does not exist (#1610).

I kept killing and restarting the operator, postgres, and web pods, and I never noticed that the migration job pod was not starting up.

Once I realized the migration pod needed a toleration, I added it via kubectl edit <migration pod>. The pod was then able to get scheduled, the migrations ran, and all the errors in the postgres and web pods were resolved.
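For anyone hitting the same thing, a toleration added this way might look like the fragment below. This is only a sketch, assuming the control-plane taint from the configuration in the original report; the key, operator, and effect have to match whatever taints your own nodes carry. Kubernetes permits adding tolerations to an existing pod, which is why this can work on a pod that is still Pending.

```yaml
# Fragment of the migration pod as seen in `kubectl edit <migration pod>`.
# Only the added toleration is shown; leave the rest of the pod spec as-is.
spec:
  tolerations:
    # Assumed taint: copied from the AWX spec in the original report.
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
```

Note that this is a one-off fix: the edit applies to that single pod and is lost if the job's pod is recreated.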

Long term, I can't patch this job with kustomize because it seems to get templated by the operator. So we need a PR that adds the postgres_tolerations and postgres_node_selector to the migration job template.
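To illustrate the request, the rendered migration Job would need something like the following in its pod template. This is a sketch of the desired outcome, not the operator's actual template: the Job name is a placeholder, the container section is omitted, and the selector and toleration values are copied from the configuration in the original report.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: awx-migration-example   # placeholder name, not the operator's real naming
  namespace: awx
spec:
  template:
    spec:
      # Assumed values: the same selector and toleration used for the other pods.
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      # containers, restartPolicy, etc. omitted for brevity
```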
